JP2007114972A - Data processing method of structured document, data processing program and data processor

Info

Publication number: JP2007114972A
Application number: JP2005304968A
Authority: JP
Inventors: Shigeru Yoshida (吉田 茂)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2005-10-19
Filing date: 2005-10-19
Publication date: 2007-05-10
Anticipated expiration: 2025-10-19
Also published as: JP4887726B2

Abstract

PROBLEM TO BE SOLVED: To provide a method for handling a CSV-compressed XML document in the same way as an uncompressed document while saving resources, in which the CSV-compressed document is loaded into an associative array, the CSV compression is undone on the associative array, and the size of the memory region is managed.

SOLUTION: For a large-volume XML document in record form, a plurality of elements that are not to be processed are joined together in CSV format. During data processing, the kinds of CSV-formatted elements are read from a header. When application software calls for a CSV-formatted element, the CSV format is undone before the element is accessed, and the records whose CSV format has been undone, and their number, are managed. Memory is thereby saved and the load reduced without the application being aware of the CSV formatting.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a technique for simplifying an API (Application Programming Interface) specialized for structured documents, mainly data-type XML (eXtensible Markup Language) documents that are expressed in tabular form and handled like a database.

In recent years, all kinds of systems belonging to individuals, companies, local governments, and others have been connected through the Internet, and Web (World Wide Web) services, EDI (Electronic Data Interchange), and EC (Electronic Commerce) are increasingly carried out in cooperation with one another. Realizing these systems requires wide-ranging information exchange. For such data exchange and data processing, XML, which gains flexible expressive power by structuring data, is well suited to computer processing and is attracting attention as a common base format.

The base specification of XML, XML 1.0, was drawn up by the W3C (World Wide Web Consortium) in February 1998 to make SGML (Standard Generalized Markup Language), standardized by ISO (International Organization for Standardization) in 1986, easier to use on the Internet. HTML (HyperText Markup Language), the language for creating Web pages, has a fixed set of tags specialized for display, and so cannot meet the requirement of having a computer process information on the basis of tag information. XML is a language structure in which users freely define tags that give meaning to character strings in a document, which makes information processing by computer possible.

When operations such as search, update, or deletion are performed on an XML document, standard API software expands the document into a DOM (Document Object Model) tree structure and operates on it. However, expansion into a DOM tree requires an enormous amount of working memory, 5 to 10 times the size of the original data, and items that are never used are expanded as well, which takes time. The present invention therefore provides a technique that reduces the resources required for data processing of XML documents without making the user aware of the mechanism.

XML documents are classified by their characteristics into two types: document-type XML documents with long element content, such as magazines, manuals, and dictionaries, and data-type XML documents with many tags and short element content, such as slips and schedules. The present invention mainly targets data-type XML documents and, in particular, simplifies the API by specializing in XML documents that are expressed in a tabular form and handled like a database.

In the following, to make clear the problems of conventional XML documents, XML technology, the current state of APIs, and the prior-application technology are described.
(1) About XML
Here, terminology is defined based on the XML standard. A character string enclosed in a pair of "<" and ">" is called a tag; "<string>" is a start tag and "</string>" is an end tag; the whole character string from the start tag to the end tag is an element; the character string between the start tag and the end tag is the element content; the name of the element written in the tag is the element name (or tag name); and additional information attached to an element is an attribute.

A structured document describes its data structure by embedding tags in the document itself. Embedding the data structure in the document as tags gives flexibility and extensibility when data items are added, deleted, or changed. In addition, giving tags names that are meaningful to a human reader makes the data easy to inspect.
(2) Standard APIs for handling XML documents
For XML documents, which are typical structured documents, two standard interface (API) specifications have been defined for handling XML documents from application software: DOM (Document Object Model) and SAX (Simple API for XML). SAX reads an XML document as a stream, so it consumes little memory and is generally fast. It is therefore suited to simple processing that only refers to the output in the order it arrives.

DOM, on the other hand, is generally slow and consumes a large amount of memory, but because it expands the elements of a document into a hierarchical tree, programs are easy to write even for complex processing. DOM is therefore the API mainly used for updating XML documents.
(3) Prior-application technology for saving resources (see FIG. 17)
The standard API (DOM) consumes a large amount of working memory and is slow because it expands all elements in memory, including elements that are not used in data processing. Processing time and memory usage are proportional to the number of elements in the XML document. To resolve these inconveniences, the applicant has previously filed applications for a method called "XML CSV compression" (Patent Documents 1, 2, and 3). In this method, the elements in an XML document are divided into two groups, elements that require random access and elements for which batch access is sufficient, and a plurality of batch-access elements are converted into a single element in CSV (Comma Separated Values) format. This reduces the effective number of elements, which reduces the memory usage of the standard API and speeds up processing.
JP 2003-203067 A; Japanese Patent Application No. 2004-082589; Japanese Patent Application No. 2005-506707
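
As a minimal sketch of the CSV compression idea described above (not the code of the prior applications), the JavaScript below joins the batch-access elements of one record into a single comma-separated element. The function name `compressRecord`, the combined element name `info`, and the sample element names are assumptions made for illustration, and escaping of commas inside values is omitted.

```javascript
// Hypothetical helper: join the batch-access elements of a record into one
// CSV element. All names here are illustrative, not taken from the patent.
function compressRecord(record, batchNames, csvElementName) {
  const compressed = {};
  // Keep the randomly accessed elements as they are.
  for (const [name, content] of Object.entries(record)) {
    if (!batchNames.includes(name)) {
      compressed[name] = content;
    }
  }
  // Join the batch-access element contents in the order given by batchNames.
  // Assumption: values contain no commas, so no escaping is done.
  compressed[csvElementName] = batchNames.map((n) => record[n] ?? "").join(",");
  return compressed;
}

const record = { id: "0002", name: "Jiro Tanaka", title: "general", extension: "2333", fax: "456" };
const batchNames = ["title", "extension", "fax"];
const compressedRecord = compressRecord(record, batchNames, "info");
console.log(compressedRecord);
```

In this sketch the record shrinks from five elements to three, which is the reduction in element count that the prior applications rely on to lower DOM memory usage.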

As described above, XML is a flexible data representation format, but when a document is expanded into a tree structure and manipulated with standard API software (DOM), it consumes a large amount of memory and places a heavy load on the CPU. The applicant has previously proposed "XML CSV compression", in which elements of an XML document that XML application software does not process are put into CSV format to reduce the CPU load and save memory.

With the prior-application technology, however, when a CSV-compressed XML document is processed in application software, elements that have not been converted to CSV (raw elements) and elements that have been converted to CSV (CSV elements) must be distinguished and handled differently. Raw elements can be used as they are, whereas CSV elements must first have the CSV conversion undone. Although resources are saved, the user has to distinguish the two kinds of elements and handle them separately, which makes programming the application software complicated and laborious.

The present invention therefore provides a technique in which, for a large-volume XML document in record form, a plurality of elements that are not to be processed are joined together in CSV format; at data-processing time, the kinds of CSV elements are read from a header; when a CSV element is called for by application software, its CSV conversion is undone before it is accessed; and the records whose CSV conversion has been undone, and their number, are managed. Memory is thereby saved and the load reduced without the application being aware of the CSV conversion.

A first invention relates to a data processing method for a structured document composed in record form, the method causing a computer to execute: an element classification step of grouping a plurality of elements in a record of the structured document into first elements to be accessed individually and second elements to be accessed collectively; a compression conversion step of joining the elements in the record that are second elements with a delimiter, compressing them into a single element, attaching a header indicating the kinds of those elements, and storing the result in memory; an element determination step of, when application software accesses the plurality of elements in the record, first reading the header information and determining from it whether an element is a second element compressed in the compression conversion step; and an element access step of allowing the element content to be accessed as it is when the element in the record is not a second element, and, when the element is a second element, decomposing the element content expressed with the delimiter into individual element contents, expanding them in memory, and then allowing access.

That is, according to the first invention, element classification means divides the plurality of elements in each record of the structured document into first elements to be accessed individually and second elements to be accessed collectively. Compression conversion means joins the second elements of each record with a delimiter, compresses them into one element, attaches a header indicating the kinds of those elements, and stores the result in memory. When application software accesses the plurality of elements in the record, element determination means first reads the header information and determines from it whether an element is a compressed second element. Element access means allows the element content to be accessed as it is when the element is not a second element; when it is a second element, the means decomposes the element content expressed with the delimiter into individual element contents, expands them in memory, and then allows access. This configuration reduces memory usage and CPU load, and lets the application software access the target element without being aware of the compression.
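
The element determination and element access steps can be sketched as follows. This is an illustrative reading of the method, not the patented implementation: the header shape (`csvElementName`, `csvFieldNames`) and the function name `readElement` are assumptions.

```javascript
// Hypothetical header: names the CSV-compressed element and lists, in order,
// the element names packed inside it.
const header = { csvElementName: "info", csvFieldNames: ["title", "extension", "fax"] };

// Return the content of `name` without the caller knowing whether the element
// is stored raw (first element) or inside the CSV element (second element).
function readElement(rec, hdr, name) {
  const position = hdr.csvFieldNames.indexOf(name);
  if (position === -1) {
    // Not a CSV-compressed element: hand the content over as it is.
    return rec[name];
  }
  // CSV-compressed element: split the combined content, then pick the field.
  const fields = rec[hdr.csvElementName].split(",");
  return fields[position];
}

const storedRecord = { id: "0002", name: "Jiro Tanaka", info: "general,2333,456" };
console.log(readElement(storedRecord, header, "name"));      // stored raw
console.log(readElement(storedRecord, header, "extension")); // taken from the CSV element
```

The caller uses the same call for both kinds of element, which is the point of the method: the header lookup, not the application code, decides whether decompression is needed.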

A second invention relates to the data processing method for a structured document according to the first invention, wherein, when the structured document is expanded in memory, the structured document file is read as stream data and each element of the structured document is assigned to an array and stored.

That is, according to the second invention, the structured document file is read sequentially as stream data and each element of the structured document is assigned to an array in memory and stored. The file therefore does not need to be accessed and read repeatedly, memory is saved, and the document can be expanded in memory efficiently and at high speed.

A third invention relates to the data processing method for a structured document according to the first or second invention, wherein the expansion of the structured document into memory uses a first array, to which the first elements and the second element expressed collectively with the delimiter are assigned for each record, and a second array, to which the collective content of the second element is assigned after being decomposed into individual element contents.

That is, according to the third invention, when the structured document is expanded in memory, two arrays are kept in memory: a first array holding, for each record, the individually accessed first elements and the second element compressed into one with the delimiter, and a second array holding the collective second element content decomposed into individual element contents. When the same record is accessed again, the compressed element has already been expanded, so the decomposed element contents can be handed over as they are.

The inventions described above provide the following effects.

Until now, compressing elements in a structured document has reduced main memory consumption roughly in proportion to the fraction of elements eliminated by joining elements in a record with a delimiter, but the application software had to be programmed to distinguish between raw elements and combined elements in a record. According to the present invention, programming can be done without being aware of this distinction, so the performance improvement from combined, compressed elements can be obtained together with ease of programming.

Further, because the structured document file is read sequentially as stream data and each element of the structured document is assigned to an array in memory, the file does not need to be accessed and read repeatedly, so memory is saved and the document can be expanded in memory efficiently and at high speed.

Furthermore, when the structured document is expanded in memory, two arrays are kept in memory: a first array holding, for each record, the individually accessed first elements and the second element compressed into one with the delimiter, and a second array holding the collective second element content decomposed into individual element contents. When the same record is accessed again, the compressed element has already been expanded, so the decomposed element contents can be handed over as they are.

Embodiments of the present invention are described below with reference to the drawings. In the embodiments, the structured documents concerned are data-type XML documents expressed in tabular form and handled like a database, and are therefore referred to below as XML documents. The compression format in which elements in a record of the XML document are combined using a delimiter can use various character strings, not only commas, but is referred to here as CSV compression for convenience.

FIG. 1 shows the basic configuration of a data processing system for structured documents according to an embodiment of the present invention. The data processing system comprises a PC terminal of an end user 40; various application software 30 developed by an application developer 50; and, stored in a device such as a server connected via a network, CSV-compressed XML documents A and A' (CSV-compressed structured documents), a data processing program 10, and API software 20 (an XML parser) that passes data to the application software 30.

The data processing program 10 comprises: element classification means 11, which groups a plurality of elements in a record of the structured document into first elements to be accessed individually and second elements to be accessed collectively; compression/decompression conversion means 12, which joins the elements in the record that are second elements in CSV format, compresses them into a single element, attaches a header indicating the kinds of those elements, and stores the result in memory; element determination means 13, which, when the application software 30 accesses the plurality of elements in the record, first reads the header information for the CSV compression and determines from it whether an element is a compressed second element; and element access means 14, which allows the element content to be accessed as it is when the element in the record is not a second element and, when the element is a second element, decomposes the content of the CSV-compressed combined element into individual element contents, expands them in memory, and then allows access.

In this data processing system, the end user 40 uses the various application software 30 from the screen of his or her own terminal to issue search and update instructions for the CSV-compressed XML documents A and A' to be accessed, which are stored in a device such as a server connected to the network.

In the device such as a server storing the CSV-compressed XML documents A and A', data operations are performed via the API software 20. The data processing program 10 processes the original XML document A, and the resulting XML document A' is expanded in memory and returned to the application software 30 via the API software 20. At the PC terminal of the end user 40, the XML document A' is converted from XML to HTML and displayed on the screen by a browser.

FIG. 2 shows an example hardware configuration of a computer that implements the data processing system for structured documents according to an embodiment of the present invention.

The computer 100 shown in the figure comprises a CPU (Central Processing Unit) 101, a memory 102, an input device 103, an output device 104, an auxiliary storage device 105, a medium drive device 106, a portable recording medium 107, and a network connection device 108, connected to one another by a bus 109. The configuration shown in the figure is only an example, and the invention is not limited to it.

The CPU 101 is a central processing unit that controls the entire computer 100. The memory 102 is a memory such as a RAM (Random Access Memory) that temporarily holds programs or data stored in the auxiliary storage device 105 (or the portable recording medium 107) when a program is executed or data is updated. Using the programs and data read into the memory 102, the CPU 101 implements the functions of the element classification means 11, the compression/decompression conversion means 12, the element determination means 13, and the element access means 14 of FIG. 1 described above.

The auxiliary storage device 105 is a storage device fitted with a magnetic disk, an optical disk, a magneto-optical disk, or the like, and stores the programs and data for implementing the functions of the present invention. As data, the original XML document A input from outside, the XML document A' resulting from processing, and the like are stored temporarily. The medium drive device 106 reads programs and data stored on the portable recording medium 107, such as an FD (Flexible Disk), a CD-ROM (Compact Disk Read Only Memory), or a magneto-optical disk.

The network connection device 108 connects to a network and enables programs, data, and the like to be exchanged with external information processing devices.

In the following embodiment, FIG. 3 shows the basic API of the present invention for handling CSV-compressed XML documents, and FIGS. 4, 5, and 6 illustrate how the XML document is managed, stored, and accessed. FIGS. 7 and 8 show examples of application software written in JavaScript using this API. FIGS. 9 to 14 show flowcharts of the methods of the API of FIG. 3.

FIG. 3 shows an example of the format of the application programming interface (API) according to an embodiment of the present invention. Concrete examples for this API in JavaScript are given below.
(0) Creating a CSV-compressed document operation object: this is the object constructor. Object = new CSVCDocument creates an object with the data it handles initialized.
(1) Loading a CSV-compressed document: the file is read with Object.openCSVCFile(input CSV-compressed file name, record element name), and the return value is an error status. The file stored in the auxiliary storage device 105 is loaded into the memory 102, the CSV-compressed file name and record element name are obtained, the header of the CSV-compressed document file is read, and the data is stored in the data and management associative arrays.
(2) Closing a CSV-compressed document: the file is closed with Object.closeCSVCDocument(), and the return value is an error status.
(3) Number of records in the XML document: the number of records is obtained with Object.recordLength(), and the return value is the number of records of the record element specified at open time.
(4) Reading a record element: a record element is read with Content = Object.getElement(record number, element name), and the return value is the element content or an error status.
(5) Writing a record element: a record element is written with Object.putElement(record number, element name, element content), and the return value is an error status.
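
A hedged usage sketch of the API listed above follows. The method names (`openCSVCFile`, `closeCSVCDocument`, `recordLength`, `getElement`, `putElement`) come from the text; everything else is an assumption. In particular, the class below is a small in-memory stand-in that takes already-parsed records and header information from a hypothetical `source` object instead of reading a file, uses 1-based record numbers, and returns 0 as the "no error" status and -1 on error.

```javascript
// In-memory stand-in for the API in the text. Method names follow the text;
// the internals, status codes, and the `source` argument are assumptions.
class CSVCDocument {
  constructor() {
    this.records = [];
    this.csvElementName = null;
    this.csvFieldNames = [];
  }
  // Stand-in for loading a file: `source` is an already-parsed document.
  openCSVCFile(source, recordElementName) {
    if (!source || source.recordName !== recordElementName) return -1;
    this.records = source.records.map((r) => ({ ...r }));
    this.csvElementName = source.csvElementName;
    this.csvFieldNames = source.csvFieldNames;
    return 0;
  }
  closeCSVCDocument() {
    this.records = [];
    return 0;
  }
  recordLength() {
    return this.records.length;
  }
  getElement(recordNo, elementName) {
    const rec = this.records[recordNo - 1]; // assumption: 1-based record numbers
    if (!rec) return -1;
    const pos = this.csvFieldNames.indexOf(elementName);
    if (pos === -1) return rec[elementName];
    return rec[this.csvElementName].split(",")[pos];
  }
  putElement(recordNo, elementName, content) {
    const rec = this.records[recordNo - 1];
    if (!rec) return -1;
    const pos = this.csvFieldNames.indexOf(elementName);
    if (pos === -1) {
      rec[elementName] = content;
    } else {
      // Rewrite one field inside the combined CSV element.
      const fields = rec[this.csvElementName].split(",");
      fields[pos] = content;
      rec[this.csvElementName] = fields.join(",");
    }
    return 0;
  }
}

const doc = new CSVCDocument();
doc.openCSVCFile(
  {
    recordName: "employee",
    csvElementName: "info",
    csvFieldNames: ["title", "extension"],
    records: [{ name: "Jiro Tanaka", info: "general,2333" }],
  },
  "employee"
);
doc.putElement(1, "extension", "2400");
console.log(doc.recordLength(), doc.getElement(1, "name"), doc.getElement(1, "extension"));
doc.closeCSVCDocument();
```

The application code calls `getElement` and `putElement` with the same arguments for raw and CSV-compressed elements, which mirrors the stated goal of hiding the CSV conversion behind the API.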

FIG. 4 shows an example data structure of the compression object (management associative array) for a structured document according to an embodiment of the present invention. In the open processing of FIG. 3 (1), the CSV-compressed document file is read and its data is stored in the management associative array. The header of the XML document is read, and the relationship between the elements in a record and the CSV elements is stored as an associative array for data management.

Associative array names are prefixed with the object name "Doc", as in "DocArray". The record elements form a one-dimensional array, and each element of that array, that is, each record, is an associative array. In this associative array, the stored element content is read and written using the element name as the subscript.

For example, item 2001 stores the file name fname as "employee list-csv.xml"; item 2002 the record name recname as "employee"; item 2003 the number of records recnum as "1000"; item 2004 the current record number cur_recno as "1"; item 2005 the data associative array name arrayname as "DocArray"; item 2006 the CSV data associative array name csv_arrayname as "DocCSVArray"; item 2007 the CSV element name csvelem as "information"; item 2008 whether the putElement method has been used, putElementFlag, as "1 or 0"; and item 2009 the management array names and other data of FIG. 6 (a), (b), and (c), such as "DocAccessRecord" and "DocElementDisc".
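
For illustration, the management associative array above could be written as a plain JavaScript object. The field names and values follow the items listed in the text; representing them as an object literal, storing `recnum` and `cur_recno` as numbers, and omitting item 2009 are assumptions.

```javascript
// Management data for one opened document, mirroring items 2001 to 2008.
const docManagement = {
  fname: "employee list-csv.xml", // 2001: file name
  recname: "employee",            // 2002: record element name
  recnum: 1000,                   // 2003: number of records
  cur_recno: 1,                   // 2004: current record number
  arrayname: "DocArray",          // 2005: data associative array name
  csv_arrayname: "DocCSVArray",   // 2006: CSV data associative array name
  csvelem: "information",         // 2007: name of the CSV-compressed element
  putElementFlag: 0,              // 2008: whether putElement has been used (1 or 0)
};

console.log(Object.keys(docManagement).length);
```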

FIG. 5 shows an example of data arrays storing the record elements of a compressed structured document according to an embodiment of the present invention (part 1: associative arrays). In the data arrays of FIG. 5, each record of the CSV-compressed document is stored as it is in an associative array within the first one-dimensional array. When a CSV element is accessed, it is further expanded into an associative array within the second one-dimensional data array. Following the same naming convention as before, this associative array is named, for example, "DocCSVArray". In the second associative array, element content is accessed using the names of the CSV-compressed elements as subscripts, based on the information read from the CSV compression header described above.

For example, the first data array entry DocArray[2] is expanded in memory as an associative array. For the items subscripted by the element names "@id", "氏名" (name), "所属" (department), and "情報" (information), it stores the element contents "0002", "田中次郎" (Jiro Tanaka), "営業" (Sales), and "一般,2333,456,tanaka@yyyyy", respectively. In the second data array entry DocCSVArray[1], the accessed CSV-converted elements are expanded in memory as an associative array. For the items subscripted by the element names "役職" (title), "内線" (extension), "Fax", and "Email", it stores the element contents "一般" (general staff), "2333", "456", and "tanaka@yyyyy", respectively.
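The relationship between the two data arrays can be pictured with a small Python sketch. It is only an illustration under assumptions: the variable names, the use of dicts keyed by record number, and the list csv_names (standing in for the names read from the CSV compression header) are not taken from the patent.

```python
# Illustrative model of the two data arrays (assumed layout, not the patent's code).
# Names of the CSV-converted elements, as they would be read from the CSV compression header.
csv_names = ["役職", "内線", "Fax", "Email"]

# First data array: one associative array per record. The CSV element "情報"
# stays a single comma-joined string until it is accessed.
doc_array = {
    2: {"@id": "0002", "氏名": "田中次郎", "所属": "営業",
        "情報": "一般,2333,456,tanaka@yyyyy"},
}

# Second data array: an entry is created only when a record's CSV element is accessed.
doc_csv_array = {}
doc_csv_array[1] = dict(zip(csv_names, doc_array[2]["情報"].split(",")))

print(doc_csv_array[1]["内線"])
```

Records that are never accessed keep only the compact joined string, which is where the memory saving described in the text comes from.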

FIG. 6 shows an example of the data arrays that store the record elements of a compressed structured document according to the embodiment of the present invention (part 2: ordinary arrays). Here, in the first and second arrays of FIG. 5-1, element numbers (1 to 4 in the first array, 5 to 8 in the second) are used as subscripts in place of element names, and the element contents are stored under them. The associative arrays of FIG. 5 use memory efficiently even when some elements do not appear, whereas this configuration, being an ordinary array, reserves the whole memory area even when an element may or may not appear in a record. When the set of elements in a record always appears, however, this configuration is more suitable and allows faster processing.

FIG. 7 shows the management information used when the CSV-converted elements of each record are expanded in memory according to the embodiment of the present invention.

(a) shows the array that, when the CSV format is expanded, associates each record with the associative array into which its CSV-converted elements were expanded. A one-dimensional array "DocAccessRecord" with one entry per record is used: for each accessed record, it stores the subscript of the DocCSVArray entry into which that record's CSV element was expanded. For example, indexes 1 and 2 are stored as the DocCSVArray positions used by the second and third records.

(b) shows the array that associates records with the expanded associative arrays when they are written back into CSV format. For example, the first and second entries of CSV_RecordArray store 2 and 3, the numbers of the records that use the respective csvArray positions.

(c) shows the associative array used to identify CSV-converted element names. The one-dimensional associative array "DocDisc" identifies whether an element of a record is stored in the first associative array DocArray or in the second associative array DocCSVArray. This array is created when the CSV compression header is read at initialization. For example, for the element names "氏名" (name), "所属" (department), "役職" (title), and so on, the flags "0", "0", "1", and so on indicate whether CSV conversion is used.

(d) shows the array that records the CSV element name and the order of the CSV-converted element names. The one-dimensional array "DocOrder" is also created when the CSV compression header is read. It gives the order in which the elements are arranged when the elements of the second data associative array are joined and written back in the original CSV format. Because associative arrays in some programming languages do not preserve insertion order, the order is stored separately. The CSV element name is stored at subscript 0.
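As a rough sketch of how the four management arrays (a) to (d) relate to one another, the following Python fragment builds them from header-like information. The header contents, the variable names, and the use of 0 as a "not expanded" marker are assumptions for illustration only.

```python
# Hypothetical header contents: the CSV element name and the names packed into it.
csv_element = "情報"
csv_names = ["役職", "内線", "Fax", "Email"]
raw_names = ["氏名", "所属"]
record_count = 3

# (a) record number -> slot in DocCSVArray. Index 0 is unused; 0 means "not expanded" (assumed).
doc_access_record = [0] * (record_count + 1)
# (b) slot in DocCSVArray -> number of the record using it (reverse mapping for write-back).
csv_record_array = {}
# (c) element name -> 1 if CSV-converted, 0 if stored directly in DocArray.
doc_disc = {**{n: 0 for n in raw_names}, **{n: 1 for n in csv_names}}
# (d) CSV element name at subscript 0, then the packed names in their join order.
doc_order = [csv_element] + csv_names

# Records 2 and 3 are accessed in turn and expanded into slots 1 and 2.
for slot, recno in enumerate([2, 3], start=1):
    doc_access_record[recno] = slot
    csv_record_array[slot] = recno

print(doc_access_record, csv_record_array)
```

Keeping (a) and (b) as a pair makes both directions cheap: (a) answers "where is this record expanded?" on access, and (b) answers "whose data is in this slot?" when a slot has to be written back.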

Because the second associative array DocCSVArray, into which CSV elements are expanded, consumes a large amount of main memory, its storage locations are managed by age at access time so that the number of expanded entries is held to a fixed number.

Management methods include the well-known techniques of (1) LRU (Least Recently Used) and (2) LFU (Least Frequently Used): when the storage locations are full, one location is freed and the newly expanded CSV element is stored there. LRU frees the location that was used longest ago, and LFU frees the location that is used least frequently.

The LRU logic above can be implemented by having a counter count the number of locations used so far. Once the count exceeds the number of entries in the associative array and the array is full, the oldest location is freed next and the new CSV element is stored there. For example, if the associative array has 256 entries, the counter starts from zero and the location numbered by the count is allocated each time; once the count exceeds 256, the location given by the count modulo 256 (the remainder of dividing the count by 256) is freed and reallocated.
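The counter scheme can be sketched in a few lines of Python. The class and method names are hypothetical. Note that reusing slots strictly in allocation order coincides with LRU only when an expanded record is not touched again after it is allocated; a strict LRU would also have to reorder slots on every access.

```python
class SlotCounter:
    """Counter-based slot allocation as described in the text (circular reuse)."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.count = 0  # number of slots handed out so far

    def next_slot(self):
        """Return (slot, reused); reused is True once the array has wrapped around."""
        slot = self.count % self.capacity
        reused = self.count >= self.capacity
        self.count += 1
        return slot, reused


counter = SlotCounter(capacity=4)
history = [counter.next_slot() for _ in range(6)]
print(history)
```

When reused is True, the caller would first write the old occupant of that slot back into CSV form before expanding the new record into it.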

The LFU logic is implemented by counting how often the CSV element assigned to each associative array entry is used, and then freeing and reallocating locations starting from the one with the smallest count. Adopting either of these management methods keeps the number of expanded CSV elements constant.
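A minimal sketch of the frequency-based selection, assuming a dict that maps each slot number to its use count. The function name and the tie-breaking rule (lowest slot number) are illustrative choices, not taken from the patent.

```python
def pick_lfu_slot(use_counts):
    """Return the slot with the smallest use count (ties go to the lowest slot number)."""
    return min(use_counts, key=lambda slot: (use_counts[slot], slot))


use_counts = {0: 5, 1: 2, 2: 7, 3: 2}
victim = pick_lfu_slot(use_counts)
use_counts[victim] = 1  # the slot is reassigned, so its count restarts
print(victim)
```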

FIG. 8 shows management information used when the CSV-converted elements of each record are expanded in memory according to the embodiment of the present invention (a variation of FIG. 7(c)). The following uses the management information of FIG. 8 as an example. The number of each element is obtained by referring to the associative array of (c'), and the number of raw (unconverted) elements (three in the figure) is held in memory. An element whose number is equal to or greater than this value is judged to be a CSV-converted element and is expanded into the second memory area.
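The threshold test can be illustrated as follows. This sketch assumes zero-based element numbers and assumes that the three raw elements are "@id", "氏名", and "所属"; the figure itself is not reproduced here, so both points are guesses made for the example.

```python
# (c') element name -> element number (assumed zero-based, raw elements first).
element_number = {"@id": 0, "氏名": 1, "所属": 2,
                  "役職": 3, "内線": 4, "Fax": 5, "Email": 6}
RAW_COUNT = 3  # number of raw (unconverted) elements held in memory


def is_csv_converted(name):
    """An element numbered RAW_COUNT or higher lives in the second array."""
    return element_number[name] >= RAW_COUNT


def csv_index(name):
    """Position of a CSV-converted element inside the ordinary second array."""
    return element_number[name] - RAW_COUNT


print(is_csv_converted("所属"), is_csv_converted("内線"), csv_index("内線"))
```

A single integer comparison replaces the per-name flag lookup of FIG. 7(c), which fits the switch from associative arrays to ordinary arrays.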

The major difference between this configuration and that of FIG. 7(c) is that the data arrays DocArray and DocCSVArray change from associative arrays to ordinary arrays. Using ordinary arrays makes access faster than with associative arrays. This configuration is more suitable when the set of elements in a record always appears.

FIG. 9 shows an example program using the API according to the embodiment of the present invention (part 1: modifying a specific element). It is an example of application software for the API of the present invention: a program that sequentially accesses "社員名簿-csv.xml" (an employee list) of FIG. 17 through this API.

After a CSV-compressed document object is created, "社員名簿-csv.xml" is loaded with the openCSVCFile method. Next, a for statement scans all records, the getElement method retrieves the content of the "氏名" (name) element, and the record for "田中次郎" (Jiro Tanaka) is searched for. When that record is found, the content of its "Email" element is rewritten to "j.tanaka@yyy". After that, a for statement scans all records again and outputs the element names and element contents of each record.

FIG. 10 shows an example program using the API according to the embodiment of the present invention (part 2: updating an XML document). It is an example program that updates the original file "社員名簿-csv.xml" by rewriting it with "従業員名簿-csv-change.xml", which describes the changes.

First, the original file and the change file are loaded with openCSVCFile to create the CSV-compressed document objects Doc1 and Doc2, respectively. Next, the changes are taken out one record at a time, and all records of the original are scanned for a record whose id attribute matches. When a record with the same id attribute is found, all elements of that record are rewritten. Because this example scans the original many times, stream processing would have to read the data from disk repeatedly, whereas here the data is expanded in memory and can be processed at high speed. Moreover, in the present invention the elements other than the attribute, name, and department are grouped in CSV format, so memory consumption is greatly reduced, and by using the API of the present invention the elements grouped in CSV format can be handled without the programmer being aware of the grouping.

The flow of each method of the API object of the present invention is described below with reference to FIGS. 11 to 16.

FIG. 11 shows the flow for creating the API object for a structured document according to the embodiment of the present invention. It shows the flow by which the object CSVCDocument creates the CSV-compressed document object (management associative array) shown in FIG. 4.

In step S11, the object for the CSV-compressed document (the management associative array), which manages the various items, is created, and in step S12 control returns to the caller.

FIG. 12 shows the flow of the open processing for a CSV-compressed structured document file according to the embodiment of the present invention. In the file open processing openCSVCFile, the CSV-compressed document file given as the first argument is read using SAX, which reads an XML document as a stream, and the elements within the record element given as the second argument are expanded into the first data associative array "DocArray" shown in FIG. 5.

First, in step S21, the XML file name and the record element name are received as arguments and stored in the object (management associative array).

Next, in step S22, the CSV element name and the names of the elements stored in it are read from the CSV compression header of the target XML document, and information that distinguishes whether an element's content is stored in the first data associative array "DocArray" or in the second data associative array "DocCSVArray" is created in the one-dimensional associative array "DocCSVDisc". The various arrays for managing CSV element expansion are also created.

Then, in step S23, SAX is used to expand, record by record, the in-record elements and the CSV element into the first data associative array, and the number of records is counted as they are expanded. In step S24, the counted number of records is stored in the object. In step S25, control returns to the caller.
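The record-by-record expansion and counting can be sketched with Python's standard xml.sax module. This is a simplified illustration, not the patent's implementation: the class name RecordLoader, the root element name "名簿", and the sample document are hypothetical, records are assumed to be flat, and reading the CSV compression header (step S22) is left out because its format is not given in this section.

```python
import xml.sax


class RecordLoader(xml.sax.ContentHandler):
    """Fills a DocArray-like dict from a stream of flat records and counts them."""

    def __init__(self, record_name):
        super().__init__()
        self.record_name = record_name
        self.doc_array = {}    # record number -> {element name: content}
        self.record_count = 0
        self._current = None   # record being filled, or None outside a record
        self._name = None      # element currently open inside the record
        self._text = []

    def startElement(self, name, attrs):
        if name == self.record_name:
            self.record_count += 1
            # Attributes are stored under "@name", following the "@id" convention in the text.
            self._current = {"@" + k: v for k, v in attrs.items()}
        elif self._current is not None:
            self._name, self._text = name, []

    def characters(self, content):
        if self._name is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self.record_name:
            self.doc_array[self.record_count] = self._current
            self._current = None
        elif name == self._name:
            self._current[name] = "".join(self._text)
            self._name = None


sample = ('<名簿><従業員 id="0002"><氏名>田中次郎</氏名><所属>営業</所属>'
          '<情報>一般,2333,456,tanaka@yyyyy</情報></従業員></名簿>')
loader = RecordLoader("従業員")
xml.sax.parseString(sample.encode("utf-8"), loader)
print(loader.record_count, loader.doc_array[1]["情報"])
```

The CSV element is stored as one string, so loading does no splitting at all; expansion is deferred until an application asks for one of the packed elements.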

FIG. 13 shows the flow of the close processing for a CSV-compressed structured document file according to the embodiment of the present invention. The close processing closeCSVCDocument for a compressed structured document file is as follows.

In step S31, whether the putElement method has been used is read from the CSV-compressed document object and checked. If it has not been used, the CSV-compressed document file has only been read, so it is simply closed in step S34. If it has been used, the CSV-compressed document file has been written to, so step S32 determines whether there is a next record to output. If all target records have been processed, the file is closed in step S34.

If there is a record that has not yet been processed, the next unprocessed record is set in step S33. Then, in step S35, it is determined whether the CSV element of that record has already been expanded into the second associative array.

If the CSV element of the record has been expanded into the second associative array, then in step S36 the contents of the record's second associative array "DocCSVArray" are converted back to CSV format with the join function, and the content of the CSV element in the first data associative array "DocArray" is replaced with the result. In step S37, the record's unconverted elements and its CSV element are taken from the associative array, and the contents of "DocArray" are written out in XML document form using SAX. The processing from step S32 to step S37 is repeated until all target records are finished.

If, in step S35, the CSV element of the record has not been expanded into the second associative array, processing jumps to step S37 and the XML document is written out using SAX.
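The write-back loop of steps S32 to S37 might look roughly like the following Python sketch, which uses XMLGenerator from the standard xml.sax.saxutils module as the SAX-style writer. The function name, the argument layout, and the use of a dict for the access record are assumptions; the check of putElementFlag in step S31 is left to the caller.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_records(doc_array, doc_csv_array, doc_access_record, doc_order, record_name):
    """Re-join expanded CSV elements and serialize every record (steps S32 to S37)."""
    csv_element, csv_names = doc_order[0], doc_order[1:]
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    for recno in sorted(doc_array):                 # S32, S33: take the next record
        record = doc_array[recno]
        slot = doc_access_record.get(recno)         # S35: was it expanded?
        if slot is not None:                        # S36: join back into CSV form
            record[csv_element] = ",".join(doc_csv_array[slot][n] for n in csv_names)
        attrs = {k[1:]: v for k, v in record.items() if k.startswith("@")}
        gen.startElement(record_name, attrs)        # S37: write the record out
        for name, content in record.items():
            if not name.startswith("@"):
                gen.startElement(name, {})
                gen.characters(content)
                gen.endElement(name)
        gen.endElement(record_name)
    return out.getvalue()


doc_array = {1: {"@id": "0002", "氏名": "田中次郎",
                 "情報": "一般,2333,456,tanaka@yyyyy"}}
doc_csv_array = {1: {"役職": "一般", "内線": "2333", "Fax": "456",
                     "Email": "j.tanaka@yyy"}}
xml_text = write_records(doc_array, doc_csv_array, {1: 1},
                         ["情報", "役職", "内線", "Fax", "Email"], "従業員")
print(xml_text)
```

The join order comes from the stored name list rather than from the dict, which mirrors the role of the DocOrder array described for FIG. 7(d).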

Regarding the split and join functions used here, scripting languages support functions for splitting and joining CSV-format strings as standard.

For example, in JavaScript, this is done by specifying the CSV-format string and the array that stores the separated CSV-converted elements as follows.

・separated array = split(delimiter, CSV-format string);
・joined CSV-format string = join(delimiter, array);
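The two notations above are schematic. In JavaScript itself the corresponding built-ins are the methods String.prototype.split and Array.prototype.join, and Python offers the same pair as str.split and str.join. The short Python sketch below shows the round trip on the sample value from the text; it assumes that no field contains the delimiter, since a plain split would otherwise cut a value in two.

```python
# Round trip of a CSV element: split into fields, change one field, join again.
csv_text = "一般,2333,456,tanaka@yyyyy"

fields = csv_text.split(",")     # separated array
fields[3] = "j.tanaka@yyy"       # update the Email field
rejoined = ",".join(fields)      # joined CSV-format string

print(fields)
print(rejoined)
```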

FIG. 14 shows the flow of the record count read processing for a compressed structured document file according to the embodiment of the present invention. The record count read processing recordLength is as follows.

In step S41, the number of records stored in the CSV-compressed document object is read, and in step S42 control returns to the caller.

FIG. 15 shows the flow of the processing that reads the element contents of a structured document according to the embodiment of the present invention. The element content read processing getElement is as follows.

In step S51, the record number is received as the first argument and the in-record element name as the second. In step S52, the associative array "DocCSVDisc" is used to determine whether the received element name is a CSV-converted element. If it is not, in step S53 the element content is read from the first data associative array "DocArray" by looking it up by element name. If it is, in step S54 the one-dimensional array "DocAccessRecord" is checked to see whether the CSV element of this record has been expanded in memory.

If it has not been expanded, in step S55 the number of a free entry of the second data associative array "DocCSVArray" is written to the one-dimensional array "DocAccessRecord", and the CSV element is expanded into "DocCSVArray" using the split function. Then, in step S56, the element content is read from the second associative array by specifying the storage position and the element name.

If the CSV element of this record has already been expanded in memory, in step S56 the location in "DocCSVArray" where it is expanded is read from "DocAccessRecord", and the element content is read from "DocCSVArray" by element name. The element content read in this way is returned as the return value in step S57.
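The read path of steps S51 to S57 can be sketched in Python over plain dicts. The function name get_element, the layout of the doc dict, and the way a free slot is chosen (the next unused number, with no eviction) are assumptions made to keep the example short.

```python
def get_element(doc, recno, name):
    """Sketch of getElement (steps S51 to S57) over plain dicts."""
    if not doc["disc"].get(name):                 # S52: not a CSV-converted element
        return doc["array"][recno][name]          # S53: read it directly
    slot = doc["access"].get(recno)               # S54: already expanded?
    if slot is None:                              # S55: expand on first access
        slot = len(doc["csv_array"]) + 1          # assumed: next unused slot, no eviction
        doc["access"][recno] = slot
        csv_element, names = doc["order"][0], doc["order"][1:]
        values = doc["array"][recno][csv_element].split(",")
        doc["csv_array"][slot] = dict(zip(names, values))
    return doc["csv_array"][slot][name]           # S56, S57


doc = {
    "array": {2: {"@id": "0002", "氏名": "田中次郎", "所属": "営業",
                  "情報": "一般,2333,456,tanaka@yyyyy"}},
    "csv_array": {},
    "access": {},
    "disc": {"氏名": 0, "所属": 0, "役職": 1, "内線": 1, "Fax": 1, "Email": 1},
    "order": ["情報", "役職", "内線", "Fax", "Email"],
}
person = get_element(doc, 2, "氏名")
extension = get_element(doc, 2, "内線")
print(person, extension)
```

The caller uses the same call for both kinds of element, which is the sense in which the API hides the CSV grouping.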

FIG. 16 shows the flow of the processing that writes the element contents of a structured document according to the embodiment of the present invention. The element content write processing putElement is as follows.

First, in step S61, the record number, the in-record element name, and the element content to write are received as arguments. In step S62, the associative array "DocCSVDisc" is used to determine whether the received element name is a CSV-converted element. If it is not, in step S63 the element content is written to the first data associative array "DocArray" under the element name. If it is, in step S64 the one-dimensional array "DocAccessRecord" is checked to see whether the CSV element of this record has been expanded in memory.

If step S64 finds that the CSV element has not been expanded, in step S65 the number of a free entry of the second data associative array "DocCSVArray" is written to the one-dimensional array "DocAccessRecord", and the CSV element is expanded into "DocCSVArray" using the split function. Then, in step S66, the element content is written into the second associative array by specifying the storage position and the element name.

If step S64 finds that the CSV element of this record has already been expanded in memory, the location in "DocCSVArray" where it is expanded is read from "DocAccessRecord", and the element content is written to "DocCSVArray" under the element name. In step S67, control returns to the caller.
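The write path can be sketched in the same style. The function name put_element, the doc dict layout, and the free-slot choice are again hypothetical. Setting the flag inside the function is an assumption inferred from the close processing, which checks whether putElement has been used.

```python
def put_element(doc, recno, name, content):
    """Sketch of putElement (steps S61 to S67) over plain dicts."""
    doc["put_element_flag"] = 1                   # assumed: lets close know a write happened
    if not doc["disc"].get(name):                 # S62: not a CSV-converted element
        doc["array"][recno][name] = content       # S63: write it directly
        return
    slot = doc["access"].get(recno)               # S64: already expanded?
    if slot is None:                              # S65: expand before writing
        slot = len(doc["csv_array"]) + 1          # assumed: next unused slot, no eviction
        doc["access"][recno] = slot
        csv_element, names = doc["order"][0], doc["order"][1:]
        values = doc["array"][recno][csv_element].split(",")
        doc["csv_array"][slot] = dict(zip(names, values))
    doc["csv_array"][slot][name] = content        # write into the expanded entry


doc = {
    "array": {2: {"@id": "0002", "氏名": "田中次郎", "所属": "営業",
                  "情報": "一般,2333,456,tanaka@yyyyy"}},
    "csv_array": {},
    "access": {},
    "disc": {"氏名": 0, "所属": 0, "役職": 1, "内線": 1, "Fax": 1, "Email": 1},
    "order": ["情報", "役職", "内線", "Fax", "Email"],
    "put_element_flag": 0,
}
put_element(doc, 2, "Email", "j.tanaka@yyy")
put_element(doc, 2, "所属", "開発")
print(doc["csv_array"][1], doc["array"][2]["所属"])
```

In this sketch the joined string in the first array is left stale after a write to a packed element; it is refreshed only when the entry is written back, for example at close time.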

The example above described the case where a record has a single CSV element. Of course, even when there are multiple CSV elements, the names of the elements stored in each CSV element can be described in the header of the CSV-compressed document, read in the same way as above, and managed in the API using the two data associative arrays.

The example above also described the case where the API, which allows programming without awareness of CSV compression, provides a function for updating element contents. The same applies when insertion and deletion functions are added to the API. Because the aim of the present invention is to let CSV compression be used without awareness of it, and insertion and deletion are not essential to that aim, their description is omitted.

As described above, because the present invention is configured as an API in which the entire structured document is stored in arrays, various data operations over the whole structured document can be performed easily using only intuitive array operations. In addition, giving the record element name yields an array structure that reflects the record elements, so the inside and outside of a record are distinguished and each record can be handled as an object. Furthermore, this API format makes it easy to access element contents under a different element name, and it also enables operations such as changing the hierarchy within a record, changing element names, and inserting and deleting records.

The embodiments of the present invention described above are as set out in the following supplementary notes.
(Supplementary note 1) A data processing method for a structured document configured in a record format, the method causing a computer to execute:
an element classification step of grouping a plurality of elements in a record of the structured document into a first element to be accessed individually and a second element to be accessed collectively;
a compression conversion step of joining the elements in the record that are designated as the second element with a delimiter, compressing them into a single element, attaching a header indicating the types of those elements, and storing the result in memory;
an element determination step of, when application software accesses the plurality of elements in the record, first reading the header information and determining from the header information whether an element corresponds to the second element compressed in the compression conversion step; and
an element access step of, when the element in the record does not correspond to the second element, allowing its content to be accessed as is, and, when the element corresponds to the second element, decomposing the element content expressed with the delimiter into individual element contents, expanding them in memory, and then allowing access.
(Supplementary note 2) The structured document data processing method according to supplementary note 1, wherein, when the structured document is expanded in memory, the structured document file is read as stream data and each element of the structured document is assigned to and stored in an array.
(Supplementary note 3) The structured document data processing method according to supplementary note 1 or 2, wherein the expansion of the structured document in memory uses a first array, to which the first element and the second element expressed collectively with the delimiter are assigned for each record, and a second array, to which the content of the second element is assigned after being decomposed into individual element contents.
(Supplementary note 4) The structured document data processing method according to any one of supplementary notes 1 to 3, wherein, when the element contents grouped by the delimiter in the second element are decomposed into individual element contents and expanded into the second array for access, and the second array would exceed a predetermined capacity, array entries previously expanded in the second array are joined with the delimiter and written back to the first array, after which the individual element contents are expanded into the array entries thus written back.
(Supplementary note 5) The structured document data processing method according to any one of supplementary notes 1 to 4, wherein the structured document expanded in memory is rewritten and, for each record, the contents of the first array for the first element are output as a structured document, while for the second element the contents of the first array are output if it has not been expanded into the second array, or the contents of the individual elements are joined with the delimiter and output collectively if it has been expanded into entries of the second array.
(Supplementary note 6) The structured document data processing method according to any one of supplementary notes 1 to 5, wherein, for the second array into which the elements grouped by the delimiter are expanded, a counter counts at access time the number of locations used so far, and the location used longest ago or the location used least frequently is freed, thereby holding the number of expanded entries to a fixed number.
(Supplementary note 7) A data processing program for a structured document configured in a record format, the program causing a computer to execute:
an element classification step of grouping a plurality of elements in a record of the structured document into a first element to be accessed individually and a second element to be accessed collectively;
a compression conversion step of joining the elements in the record that are designated as the second element with a delimiter, compressing them into a single element, attaching a header indicating the types of those elements, and storing the result in memory;
an element determination step of, when application software accesses the plurality of elements in the record, first reading the header information and determining from the header information whether an element corresponds to the second element compressed in the compression conversion step; and
an element access step of, when the element in the record does not correspond to the second element, allowing its content to be accessed as is, and, when the element corresponds to the second element, decomposing the element content expressed with the delimiter into individual element contents, expanding them in memory, and then allowing access.
(Supplementary note 8) A data processing apparatus for a structured document configured in a record format, comprising:
element classification means for grouping a plurality of elements in a record of the structured document into a first element to be accessed individually and a second element to be accessed collectively;
compression conversion means for joining the elements in the record that are designated as the second element with a delimiter, compressing them into a single element, attaching a header indicating the types of those elements, and storing the result in memory;
element determination means for, when application software accesses the plurality of elements in the record, first reading the header information and determining from the header information whether an element corresponds to the second element compressed in the compression conversion step; and
element access means for, when the element in the record does not correspond to the second element, allowing its content to be accessed as is, and, when the element corresponds to the second element, decomposing the element content expressed with the delimiter into individual element contents, expanding them in memory, and then allowing access.

A diagram showing the basic configuration of a data processing system for structured documents according to an embodiment of the present invention.
A diagram showing an example hardware configuration of a computer that implements the structured document data processing system according to an embodiment of the present invention.
A diagram showing an example format definition of the application programming interface (API) according to an embodiment of the present invention.
A diagram showing an example data structure of the compressed object (management associative array) of a structured document according to an embodiment of the present invention.
A diagram showing an example array for data storing the record elements of a compressed structured document according to an embodiment of the present invention (part 1: associative array).
A diagram showing an example array for data storing the record elements of a compressed structured document according to an embodiment of the present invention (part 2: ordinary array).
A diagram showing the management information used when the CSV-converted elements of each record are expanded in memory according to an embodiment of the present invention.
A diagram showing the management information used when the CSV-converted elements of each record are expanded in memory according to an embodiment of the present invention (modification of FIG. 7(c)).
A diagram showing an example program using the API according to an embodiment of the present invention (part 1: modifying a specific element).
A diagram showing an example program using the API according to an embodiment of the present invention (part 2: updating an XML document).
A diagram showing the flow of creating an API object for a structured document according to an embodiment of the present invention.
A diagram showing the flow of open processing for a CSV-compressed structured document file according to an embodiment of the present invention.
A diagram showing the flow of close processing for a CSV-compressed structured document file according to an embodiment of the present invention.
A diagram showing the flow of processing for reading the number of records of a compressed structured document file according to an embodiment of the present invention.
A diagram showing the flow of processing for reading the element content of a structured document according to an embodiment of the present invention.
A diagram showing the flow of processing for writing the element content of a structured document according to an embodiment of the present invention.
A diagram showing the structure of a structured document in which elements not subject to access are compressed, according to the invention of the prior application.
A diagram showing the basic configuration (Example 2) of a fraud prevention system according to an embodiment of the present invention.

Explanation of symbols

10 Data processing program
11 Element classification means
12 Compression/decompression conversion means
13 Element judging means
14 Element access means
20 API software (XML parser)
30 Application software
40 End user
50 Application developer
100 Computer
101 CPU
102 Memory
103 Input device
104 Output device
105 Auxiliary storage device
106 Medium drive device
107 Portable recording medium
108 Network connection device
109 Bus

Claims

1. A data processing method for a structured document configured in a record format, the method causing a computer to execute:
an element partitioning step of grouping a plurality of elements in a record of the structured document into a first element to be individually accessed and a second element to be collectively accessed;
a compression conversion step of compressing and converting the elements of the target record that are treated as the second element into a single element by joining them with a delimiter, and storing the result in memory with a header indicating the type of the element;
an element determination step of, when application software accesses the plurality of elements in the record, first reading the header information and determining from the header information whether an element corresponds to the second element compressed and converted in the compression conversion step; and
an element access step of accessing the content of the element as it is when the element in the record does not correspond to the second element, and, when the element in the record corresponds to the second element, accessing the element content after the content joined by the delimiter has been decomposed into individual element contents and expanded in memory.
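For illustration only, the four steps of the method can be sketched for a single record as follows. The function names, the dictionary-based record, and the `csv_names` header key are assumptions made for this sketch and do not appear in the specification; the delimiter is taken to be a comma and element contents are assumed to contain no commas.

```python
DELIMITER = ","


def compress_record(record, individual_names):
    """Element partitioning and compression conversion for one record.

    `record` maps element names to string contents. Elements named in
    `individual_names` stay as first elements; all others are joined into
    one delimiter-separated string (the second element).
    """
    first = {k: v for k, v in record.items() if k in individual_names}
    second_names = [k for k in record if k not in individual_names]
    joined = DELIMITER.join(record[k] for k in second_names)
    header = {"csv_names": second_names}  # records which elements were joined
    return header, first, joined


def read_element(header, first, joined, name):
    """Element determination and element access."""
    if name not in header["csv_names"]:
        return first[name]                # first element: access as it is
    fields = joined.split(DELIMITER)      # second element: expand, then access
    return fields[header["csv_names"].index(name)]


record = {"id": "7", "name": "Sato", "zip": "100-0001", "city": "Tokyo"}
header, first, joined = compress_record(record, {"id", "name"})
print(read_element(header, first, joined, "city"))
```

The application only ever calls `read_element`, so it does not need to know whether a given element was stored individually or inside the joined string.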

2. The structured document data processing method according to claim 1, wherein, when the structured document is expanded in memory, the structured document file is read as stream data and each element of the structured document is assigned to an array and stored.
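A minimal sketch of reading a record-format XML document as a stream and assigning its elements to an array is given below. It uses `xml.etree.ElementTree.iterparse` from the Python standard library as a stand-in for the stream reader; the function name `load_records`, the tag names, and the use of one dictionary per record are assumptions made for this sketch.

```python
import io
import xml.etree.ElementTree as ET


def load_records(stream, record_tag):
    """Read XML incrementally and store each record's elements in an array."""
    records = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == record_tag:
            # One array entry per record: element name -> element content.
            records.append({child.tag: (child.text or "") for child in elem})
            elem.clear()  # release the parsed subtree to keep memory use low
    return records


xml_bytes = (
    b"<table>"
    b"<rec><id>1</id><name>A</name></rec>"
    b"<rec><id>2</id><name>B</name></rec>"
    b"</table>"
)
records = load_records(io.BytesIO(xml_bytes), "rec")
```

Processing each record as soon as its closing tag arrives avoids holding a full document tree for the record contents, which is the point of reading the file as stream data.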

3. The structured document data processing method according to claim 1, wherein the expansion of the structured document into memory uses a first array, to which the first elements and the second element collectively represented with the delimiter are assigned for each record, and a second array, to which the collected content of the second element is assigned after being divided into individual element contents.
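The relationship between the two arrays can be sketched as follows, assuming that the second array is filled lazily, only for records whose joined element is actually accessed. The class name `RecordStore` and its fields are hypothetical, and the comma delimiter is split naively.

```python
class RecordStore:
    """First array of compressed records plus a lazily filled second array."""

    def __init__(self, csv_names, delimiter=","):
        self.csv_names = csv_names  # names of the elements that were joined
        self.delimiter = delimiter
        self.first = []             # first array: one entry per record
        self.second = {}            # second array: record index -> split contents

    def add(self, individual, joined):
        """Store a record: individual elements plus the joined string."""
        self.first.append({"individual": individual, "csv": joined})

    def get(self, index, name):
        """Return an element's content, expanding the record only if needed."""
        entry = self.first[index]
        if name in entry["individual"]:
            return entry["individual"][name]
        if index not in self.second:
            self.second[index] = entry["csv"].split(self.delimiter)
        return self.second[index][self.csv_names.index(name)]


store = RecordStore(["zip", "city"])
store.add({"id": "1"}, "100-0001,Tokyo")
store.add({"id": "2"}, "530-0001,Osaka")
city = store.get(1, "city")  # only record 1 is expanded into the second array
```

Records that are never asked for a joined element stay compressed in the first array, so memory is spent only on the records the application touches.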

4. A data processing program for a structured document configured in a record format, the program causing a computer to execute:
an element partitioning step of grouping a plurality of elements in a record of the structured document into a first element to be individually accessed and a second element to be collectively accessed;
a compression conversion step of compressing and converting the elements of the target record that are treated as the second element into a single element by joining them with a delimiter, and storing the result in memory with a header indicating the type of the element;
an element determination step of, when application software accesses the plurality of elements in the record, first reading the header information and determining from the header information whether an element corresponds to the second element compressed and converted in the compression conversion step; and
an element access step of accessing the content of the element as it is when the element in the record does not correspond to the second element, and, when the element in the record corresponds to the second element, accessing the element content after the content joined by the delimiter has been decomposed into individual element contents and expanded in memory.

5. A data processing apparatus for a structured document configured in a record format, comprising:
element classification means for grouping a plurality of elements in a record of the structured document into a first element to be individually accessed and a second element to be collectively accessed;
compression conversion means for compressing and converting the elements of the target record that are treated as the second element into a single element by joining them with a delimiter, and storing the result in memory with a header indicating the type of the element;
element judging means for, when application software accesses the plurality of elements in the record, first reading the header information and judging from the header information whether an element corresponds to the second element compressed and converted by the compression conversion means; and
element access means for accessing the content of the element as it is when the element in the record does not correspond to the second element, and, when the element in the record corresponds to the second element, accessing the element content after the content joined by the delimiter has been decomposed into individual element contents and expanded in memory.