JP5517263B2

JP5517263B2 - Chunk generating device, chunk reading device, chunk generating method and program

Info

Publication number: JP5517263B2
Application number: JP2011150059A
Authority: JP
Inventors: 隆幸中村; 豊荒川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-07-06
Filing date: 2011-07-06
Publication date: 2014-06-11
Anticipated expiration: 2031-07-06
Also published as: JP2013016112A

Description

The present invention relates to a server configuration method for processing information in a sensor information database.

Conventionally, uTupleSpace (see, for example, Non-Patent Document 1) has been known as an information management system that handles sensor information, that is, information such as sensor measurement data. One feature of uTupleSpace is that sensor information can be expressed freely in the uTuple format, a sequence of "key = value" pairs. Another feature is that it introduces a method of generating chunks of sensor data as a means of efficiently storing, searching, and transferring sensor information.

Also conventionally, the multidimensional search tree Ubi-tree (see, for example, Non-Patent Document 2) has been known as a data structure for collectively indexing information that has a plurality of values, such as information in the uTuple format.

Non-Patent Document 1: Keiichiro Kashiwagi et al., "Design and implementation of a new uTupleSpace that enables storage and retrieval of large amounts of schemaless data", Multimedia, Distributed, Cooperative, and Mobile (DICOMO2010) Symposium, July 2010
Non-Patent Document 2: Yutaka Arakawa et al., "Improvement of indexing technology UBI-tree for ubiquitous data", IEICE Technical Report, vol. 110, no. 162, DE2010-22, pp. 47-52, August 2010

The simplest method of generating chunks is to take the accumulated sensor information in chronological order, oldest first, a fixed number of items at a time, and move it as a chunk into a separate file from the area where sensor information is first accumulated, called the primary DB (or temporary DB).
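
The simplest method above can be sketched as follows. This is a minimal illustration, not the implementation of Non-Patent Document 1: the record type, the function name `take_oldest_chunk`, and the use of key "D" as a sortable date value are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical record type: one piece of sensor information as a key/value mapping.
Record = Dict[str, float]


def take_oldest_chunk(primary_db: List[Record], chunk_size: int) -> List[Record]:
    """Remove and return the `chunk_size` oldest records, ordered by key "D".

    Assumes every record carries a sortable date value under "D".
    """
    primary_db.sort(key=lambda record: record["D"])  # oldest first
    chunk = primary_db[:chunk_size]
    del primary_db[:chunk_size]  # the records leave the primary (temporary) DB
    return chunk
```

Each returned list would then be written to its own chunk file. Because the records are selected only by age, nothing guarantees that records with similar values end up in the same chunk.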

To further improve chunk efficiency, considering that search results are handled in units of chunks, data that is often returned together as the result of a single search expression, that is, data with similar values, should be gathered into the same chunk as far as possible. One conceivable way to do this is to accumulate a certain amount of sensor information in the temporary DB, extract data with similar values from it by some means, and move that data into a separate file as a chunk.

In that case, one conceivable way of extracting data with similar values is to exploit the properties of a tree structure and select nodes that are placed close together in the tree. Adopting Ubi-tree as the tree structure that manages the sensor information is also conceivable.

However, the chunk generation scheme suggested by the above prior art has the problem that the temporary DB must be kept large. Suppose, for example, that one million pieces of sensor information are to be collected into one chunk file. To extract similar data from a population and collect it into one file, the population must be sufficiently large relative to the destination file. If the selection is made from, say, 1000 times the amount of data, then 1 million × 1000 = 1 billion pieces of sensor information must be accumulated and managed in a single temporary DB.

Managing data of this size places a heavy load on memory and secondary storage, and the processing needed to extract similar data from such an enormous amount of data is also heavy.

Also conventionally, as disclosed in Non-Patent Document 1, "chunks are stored in a text file format in which data rows are arranged". When a chunk is a text file in which sensor information is listed line by line, an application program that reads the chunk file is easy to write, and the contents of the chunk file are human-readable and easy to handle. On the other hand, to obtain only the data rows that are actually needed from a chunk file as a search result, the entire chunk file must be read and every row must be tested for a match, which makes the processing load high.
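
The cost of the plain text format can be seen in a short sketch of a range search over such a chunk. The row layout `A=35 D=20110611 T=23.5` (space-separated "key=value" pairs with numeric values) and the function name `scan_chunk` are assumptions for illustration; the source does not specify the exact syntax.

```python
from typing import Dict, Iterable, Iterator


def scan_chunk(lines: Iterable[str], key: str, lo: float, hi: float) -> Iterator[Dict[str, float]]:
    """Yield the rows whose value for `key` lies in [lo, hi].

    Every row is parsed and tested, so the cost grows with the size of the
    whole chunk, not with the number of matching rows.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = {k: float(v) for k, v in (pair.split("=", 1) for pair in line.split())}
        if key in row and lo <= row[key] <= hi:
            yield row
```

The reader is only a few lines long, which reflects the ease-of-processing advantage, but it cannot skip any part of the file.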

Alternatively, a chunk file format is conceivable in which, in addition to the textual description of the sensor information, index information such as a tree structure for speeding up searches is created separately and stored in the same file or in a separate file. With this method, only part of the chunk file needs to be read to obtain the data rows that are actually needed, and there is no need to test every row for a match. On the other hand, the processing by which an application program reads the chunk file is not easy to write, and the contents of the file are not human-readable and are difficult to handle.

In other words, ease of chunk processing and speed of chunk processing could not both be achieved.

The chunk generation device according to the present invention is a device that generates a chunk, which is a file listing a plurality of pieces of information. The device holds a temporary pool, first chunks, and a chunk pool. The temporary pool manages information in a tree structure. A first chunk is a file listing fewer pieces of information than the number to be listed in the chunk that the device is to generate. The chunk pool manages, in a tree structure, tag information indicating the data range and the identification information of each first chunk. The device has at least: a function of registering information in the temporary pool; a function of taking out a group of pieces of information that are neighbors in the tree structure from the temporary pool and creating a first chunk; a function of generating tag information for the first chunk and registering it in the chunk pool; and a function of taking out a group of pieces of tag information that are neighbors in the tree structure from the chunk pool and creating a second chunk that includes at least the information contained in the first chunks indicated by that tag information.

When creating the second chunk from the information contained in k first chunks, the function of creating the second chunk may include in the second chunk, as its constituent elements, at least index information for looking up the in-file positions of the contents of the k first chunks, and the sequence of the contents of the k first chunks.
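
One way to picture a second chunk made of index information plus the contents of k first chunks is sketched below. The concrete layout is an assumption chosen for illustration and is not taken from the source: the first line is a JSON list with one entry per first chunk (key range, byte offset, byte length), followed by the first chunk contents concatenated unchanged. The name `build_l_chunk` is likewise hypothetical.

```python
import json
from typing import List, Tuple

# One first chunk: (minimum key, maximum key, text of its rows, newline-terminated).
FirstChunk = Tuple[float, float, str]


def build_l_chunk(first_chunks: List[FirstChunk]) -> bytes:
    """Return the bytes of a second chunk: one index line, then k contents."""
    index = []
    body = b""
    for lo, hi, text in first_chunks:
        data = text.encode("utf-8")
        # Offsets are measured from the start of the body, in bytes.
        index.append({"min": lo, "max": hi, "offset": len(body), "length": len(data)})
        body += data
    header = json.dumps(index).encode("utf-8") + b"\n"
    return header + body
```

Under this layout the body remains ordinary, human-readable text, while the single index line lets a reader jump directly to the section it needs.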

The temporary pool and the chunk pool may use the UBI-Tree search tree algorithm as the tree structure; the data range of the first chunk indicated by the tag information of the first chunk may be expressed as a set of values for a plurality of keys; and the index information of the second chunk may be index information for looking up the in-file position of the first chunk contents from a set of values for a plurality of keys.

The chunk reading device according to the present invention is a device that reads a chunk. It takes as input a chunk created by the chunk generation device and has at least: a function of reading the index information; a function of selecting the index information that matches a search condition; a function of moving the file read position to the in-file position indicated by the index information; and a function of reading the first chunk contents from that in-file position.
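
The four reading functions can be sketched as a single routine. The file layout used here is an assumption for illustration only (a first line holding a JSON index whose entries give `min`, `max`, `offset`, and `length`, with offsets counted in bytes from the end of that line); the source does not fix a concrete format, and `read_matching` is a hypothetical name.

```python
import io
import json
from typing import BinaryIO, List


def read_matching(f: BinaryIO, lo: float, hi: float) -> List[str]:
    """Return the rows of every section whose key range overlaps [lo, hi]."""
    index = json.loads(f.readline())  # 1. read the index information
    body_start = f.tell()
    rows: List[str] = []
    for entry in index:
        if entry["max"] < lo or entry["min"] > hi:
            continue  # 2. keep only index entries that match the condition
        f.seek(body_start + entry["offset"])  # 3. move the read position
        data = f.read(entry["length"])  # 4. read that first chunk's contents
        rows.extend(data.decode("utf-8").splitlines())
    return rows


# Illustrative input with one section covering keys 0 to 45 (simplified rows).
sample_index = [{"min": 0, "max": 45, "offset": 0, "length": 14}]
sample = json.dumps(sample_index).encode("utf-8") + b"\n" + b"A=0\nA=35\nA=45\n"
print(read_matching(io.BytesIO(sample), 30, 40))
```

Note that this sketch returns whole sections: a section is read if its range overlaps the query, and any row-level filtering would be a further step applied only to the sections actually read.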

The chunk generation method according to the present invention is a method executed by a chunk generation device that holds a temporary pool, first chunks, and a chunk pool, that includes a temporary pool registration unit, a first chunk creation unit, a tag information generation unit, and a second chunk creation unit, and that generates a chunk, which is a file listing a plurality of pieces of information. The method executes, in order: a procedure in which the temporary pool registration unit registers information in the temporary pool and the temporary pool manages the information in a tree structure; a procedure in which the first chunk creation unit takes out a group of neighboring pieces of information in the tree structure from the temporary pool and creates a first chunk listing fewer pieces of information than the number to be listed in the chunk to be generated; a procedure in which the tag information generation unit generates tag information indicating the identification information of the first chunk and registers it in the chunk pool, and the chunk pool manages the data range of each first chunk and the tag information of the first chunk in a tree structure; and a procedure in which the second chunk creation unit takes out a group of neighboring pieces of tag information in the tree structure from the chunk pool and creates a second chunk that includes at least the information contained in the first chunks indicated by that tag information.

In the chunk generation method according to the present invention, when the second chunk creation unit creates the second chunk from the information contained in k first chunks, it may include in the second chunk, as its constituent elements, at least index information for looking up the in-file positions of the contents of the k first chunks, and the sequence of the contents of the k first chunks.

The chunk generation program according to the present invention is a program that causes a computer to hold a temporary pool, first chunks, and a chunk pool, to manage information in a tree structure in the temporary pool, and to manage, in a tree structure in the chunk pool, tag information indicating the data range and the identification information of each first chunk, where a first chunk is a file listing fewer pieces of information than the number to be listed in the chunk to be generated, and that causes the computer to realize each function of the chunk generation device according to the present invention.

As described above, the present invention makes it possible to realize a chunk generation device, a chunk reading device, a chunk generation method, and a chunk generation program with which chunks can be generated efficiently and the generated chunks can be read easily and at high speed.

Apparatus configuration in the present embodiment.
Data structure held by the temporary pool (101).
Contents of each piece of sensor information (102) held by the temporary pool (101).
Contents of the sensor information (102) about to be newly registered in the temporary pool (101).
Data structure held by the temporary pool (101) after the registration processing.
Operation of taking out a group of neighboring sensor information from the temporary pool (101).
Contents of the created S chunk (104) file with the file name S7.
Data structure held by the S chunk pool (105).
Contents of the S chunk (104) files corresponding to S1 and S2 among the tag information (106) held by the S chunk pool (105).
Contents of the new tag information (301) about to be registered in the S chunk pool (105) following the extraction of the group of neighboring sensor information from the temporary pool (101) and the generation of the S chunk (104).
Data structure held by the S chunk pool (105) after the registration processing.
Operation of taking out a group of neighboring tag information from the S chunk pool (105).
Contents of the created L chunk (108) file with the file name L03.

One embodiment of the present invention is described below.
FIG. 1 shows the apparatus configuration in the present embodiment.
The chunk generation device (100) is connected to a network (109) and includes a temporary pool registration unit (110), an S chunk creation unit (111), an S chunk pool registration unit (112), and an L chunk creation unit (113). The S chunk corresponds to the first chunk, and the L chunk corresponds to the second chunk.

In addition, the chunk generation device (100) includes a temporary pool (101) and an S chunk pool (105), which hold data. The temporary pool (101) holds sensor information (102) in a binary tree data structure, and the S chunk pool (105) holds tag information (106) in a binary tree data structure. The S chunk pool (105) corresponds to the chunk pool.

The chunk generation device (100) further includes an S chunk storage device (103) that holds S chunks and an L chunk storage device (107) that holds L chunks.

FIG. 2 shows the data structure held by the temporary pool (101).
In the example of this figure, six pieces of sensor information are held. For example, the notation "35: (a)" means that the node of the tree holds the information body (a) shown in FIG. 3, and that the node is managed in the tree structure with the specific value "35" contained in that information body as its primary key.

By the known properties of a binary tree, these pieces of sensor information are stored in sorted order. That is, ignoring the vertical axis and looking only at the horizontal axis, the primary keys in this example are ordered from the left as 0 → 35 → 333 → 599 → 2017 → 3776. Methods for managing such a binary tree data structure (insertion, search, deletion) are widely known.

Note that storing data in sorted order so that close data lies nearby, as in this example, is not peculiar to binary trees but is a property widely observed in tree structures in general. This is because a tree structure fundamentally exists to index data in an ordered way so that searches can be performed quickly; having the stored data ordered, with close data placed nearby, is therefore a basic requirement of the principle.

FIG. 3 shows the contents of each piece of sensor information (102) held by the temporary pool (101).
Sensor information in the present embodiment is information in which each item has a relatively small data size and contains a plurality of values. The present invention works effectively for sensor information with these characteristics. In particular, in the present embodiment, each piece of sensor information is described in the uTuple data format, which consists of an arbitrary number of "key = value" pairs.

The information referred to in the present invention covers various kinds of information that satisfy the above characteristics and is not limited to sensor information (102). Specific examples include values measured by sensor devices, such as temperature, humidity, current or voltage, fluid flow rate, substance concentration, brightness, noise, position, and acceleration. The information is not limited to these and may also be information obtained from sources other than sensors, for example via the Web or the Internet. Furthermore, in addition to these values, the information may include metadata indicating the characteristics and state of the sensor, the measurement date and time, and the like.

In the example of this figure, each piece of sensor information has numeric values for three keys, A, D, and T, where A represents altitude, D represents date, and T represents temperature. For example, sensor information (a) represents an altitude of 35 m, a date of June 11, 2011, and a temperature of 23.5 degrees Celsius, and is air temperature data measured on the highest mountain in the 23 wards of Tokyo. Similarly, sensor information (c) is air temperature data measured on the highest mountain in Tokyo Metropolis (elevation 2017 m), and sensor information (d) is air temperature data measured on the highest mountain in Japan (elevation 3776 m). In this example, the temporary pool (101) is managed by applying the binary tree data structure to the key "A" (altitude). Note that the scope of the present invention is not limited to sensor information described in the uTuple data format; it is applicable to sensor information in general that has the characteristics described above.
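
As a rough illustration of such a record, sensor information (a) can be held as a key/value mapping. The textual syntax used here (space-separated `key=value` pairs) and the encoding of the date as the integer 20110611 are assumptions; the source states only that the format is a sequence of "key = value" pairs with numeric values.

```python
from typing import Dict, Union

Number = Union[int, float]


def parse_utuple(text: str) -> Dict[str, Number]:
    """Parse an assumed textual form such as 'A=35 D=20110611 T=23.5'."""
    record: Dict[str, Number] = {}
    for pair in text.split():
        key, value = pair.split("=", 1)
        try:
            record[key] = int(value)
        except ValueError:
            record[key] = float(value)
    return record


def format_utuple(record: Dict[str, Number]) -> str:
    """Write a record back in the same assumed textual form."""
    return " ".join(f"{key}={value}" for key, value in record.items())


# Sensor information (a): altitude 35 m, June 11, 2011, 23.5 degrees Celsius.
sensor_a = parse_utuple("A=35 D=20110611 T=23.5")
primary_key = sensor_a["A"]  # the temporary pool is ordered by key "A"
```

Because a record is just a set of pairs, records with different or additional keys fit the same representation without a fixed schema.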

FIG. 4 shows the contents of the sensor information (102) that is about to be newly registered in the temporary pool (101).
In the present embodiment, the sensor information (g) shown in this figure newly arrives at the temporary pool registration unit (110) through the network (109), and the registration of this information is described in detail below.

The temporary pool registration unit (110) receives the information and registers the sensor information (102) in the temporary pool (101) by inserting it into the temporary pool (101). The primary key of this information is A (altitude), so its key value is 45. The method of inserting such a value into a binary tree is well known, and the result is shown in FIG. 5.

FIG. 5 shows the data structure held by the temporary pool (101) after the registration processing.
At this point, the chunk generation device (100) starts the following series of operations, which create an S chunk from the temporary pool. The trigger may be the completion of the registration of the sensor information (102) in the temporary pool (101) by the temporary pool registration unit (110) described above; alternatively, the operations may be started asynchronously with the completion of the registration, by means such as a timer.

First, the started S chunk creation unit (111) takes out a group of neighboring sensor information from the temporary pool (101). Here, "neighboring" means that the data values are close to one another; based on the earlier discussion of the properties of tree structures in general, it can also be said to mean data placed next to one another in the tree structure. A concrete example of the processing is as follows.

FIG. 6 shows the operation of taking out a group of neighboring sensor information from the temporary pool (101).
The S chunk creation unit (111) selects a specific node as the point of interest (261), selects data in its vicinity, and defines that data as the extraction range (262). Here, the unit is assumed to operate with the number of data items in the extraction range fixed at three, against seven data items in the tree structure. It selects (a) as the point of interest (261) and takes the data (a) at the point of interest together with the data (e) and (g) that make up its subtree as the extraction range.

In this example, the point of interest (261) is selected as the data with the oldest date. Other possible methods include selecting the data with the newest date; selecting the data with the largest or smallest value for a key other than the date (in the example of FIG. 3, a key other than D, that is, A or T); selecting the data most frequently retrieved by searches; or trying several or all candidate points of interest and selecting the one for which the range of values across all the data in the extraction range is smallest.

The S chunk creation unit (111) reads the sensor information (102) in the extraction range (262) determined above from the temporary pool (101) and writes its contents to a new file. This file corresponds to an S chunk (104). The file is stored in the S chunk storage device (103) and is given a file name that does not duplicate that of any existing S chunk (104). In this example, a serial number is assigned and the file is created with the file name "S7". The extracted sensor information (102) is then deleted from the temporary pool (101).
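
The S chunk creation step can be approximated as follows. This sketch swaps the binary tree for a list sorted by key "A" and replaces "the point of interest and its subtree" with "the window of adjacent entries around the point of interest that has the smallest key spread", so it is a stand-in for the technique rather than the embodiment itself. The function names and the returned (file name, text) pair are assumptions; writing the text to the S chunk storage device is left out.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]  # hypothetical record type with keys such as "A" and "D"


def pick_window(pool: List[Record], focus: int, size: int, key: str = "A") -> Tuple[int, int]:
    """Return [start, end) of the `size`-wide window that contains index `focus`
    and has the smallest spread of `key`. `pool` must be sorted by `key`."""
    first = max(0, focus - size + 1)
    last = min(focus, len(pool) - size)
    candidates = [(pool[s + size - 1][key] - pool[s][key], s) for s in range(first, last + 1)]
    if not candidates:
        raise ValueError("pool is smaller than the requested chunk size")
    _, start = min(candidates)
    return start, start + size


def make_s_chunk(pool: List[Record], size: int, serial: int) -> Tuple[str, str]:
    """Take `size` neighboring records out of `pool`; return (file name, text)."""
    pool.sort(key=lambda r: r["A"])
    # Point of interest: the record with the oldest date, as in the example.
    focus = min(range(len(pool)), key=lambda i: pool[i]["D"])
    start, end = pick_window(pool, focus, size)
    rows = pool[start:end]
    del pool[start:end]  # extracted records are removed from the temporary pool
    text = "".join(" ".join(f"{k}={v}" for k, v in r.items()) + "\n" for r in rows)
    return f"S{serial}", text
```

With the seven keys of FIG. 5 (0, 35, 45, 333, 599, 2017, 3776), a point of interest at key 35, and a size of three, the window chosen is 0, 35, 45, which agrees with the range "0 to 45" that the text gives for S7.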

FIG. 7 shows the contents of the created S chunk (104) file with the file name S7.
The uTuple descriptions of the sensor information (e), (a), and (g) contained in the extraction range (262) are written out in text format in order of the value of the primary key ("A" in this example).

Next, following the creation of the new S chunk (104) described above, the S chunk pool registration unit (112) newly registers the tag information (106) corresponding to that S chunk in the S chunk pool (105). To explain this processing, the state of the data held before the registration is shown first, and the registration procedure is then described concretely.

FIG. 8 shows the data structure held by the S chunk pool (105).
In the example of this figure, six pieces of tag information are held. For example, the notation "50 to 85: S1" is one piece of tag information and indicates that the node of the tree is the tag information corresponding to the information body with the file name S1 shown in FIG. 9. Tag information thus does not contain the information body itself; it contains only a pointer to the corresponding information body (here, the file name "S1") and the primary key value used for searching (here, "50 to 85").

When such range information is stored in a tree structure as the primary key, if the tree structure used by the S chunk pool (105) is a data structure that can directly handle a value range or a plurality of values as an index, the two numbers 50 and 85 in the above example may both be used for storage in the tree. In the present embodiment, however, a binary tree is used, which can directly handle only a single value as an index, so the tree is built using only the start value "50". Thus, looking at the start values, the six nodes in FIG. 8 are ordered from the left as 45 → 50 → 65 → 85 → 333 → 599. Whichever tree structure is used, the data is stored in sorted order so that close data lies nearby, as already discussed.

FIG. 9 shows the contents of the S chunk (104) files corresponding to S1 and S2 among the tag information (106) held by the S chunk pool (105).
For example, in the S chunk (104) with the file name S1, the minimum value of the key "A" within the file is 50 and the maximum is 85. The tag information (106) corresponding to this S chunk therefore has the value range "50 to 85" and the file name "S1" as its values. Similarly, for the S chunk (104) with the file name S2, the corresponding tag information (106) has the value range "45 to 65" and the file name "S2" as its values.
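
Deriving tag information from an S chunk amounts to taking the minimum and maximum of the primary key over its rows. The sketch below assumes rows written as space-separated `key=value` pairs; the `Tag` type and `make_tag` function are hypothetical names introduced for illustration.

```python
from typing import Iterable, List, NamedTuple


class Tag(NamedTuple):
    start: float  # minimum value of the primary key in the S chunk
    end: float  # maximum value of the primary key in the S chunk
    file_name: str  # pointer to the information body


def make_tag(file_name: str, rows: Iterable[str], key: str = "A") -> Tag:
    """Scan the rows of one S chunk and summarize its key range."""
    values: List[float] = []
    for row in rows:
        for pair in row.split():
            name, value = pair.split("=", 1)
            if name == key:
                values.append(float(value))
    if not values:
        raise ValueError(f"no value for key {key!r} in {file_name}")
    return Tag(min(values), max(values), file_name)
```

The tag holds only the range and the file name, so it stays small no matter how many rows the S chunk contains.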

The state of the data held before the S chunk pool registration unit (112) registers the tag information (106) has now been shown; the registration procedure is described below.

FIG. 10 shows the contents of the new tag information (301) that is about to be registered in the S chunk pool (105) following the extraction of the group of neighboring sensor information from the temporary pool (101) and the generation of the S chunk (104).

In the S chunk (104) with the file name S7 created by the above procedure, the minimum value of the key "A" is 0 and the maximum is 45, as shown in FIG. 7. The S chunk pool registration unit (112) therefore generates, for this S chunk, the S7 tag information (301), which has the value range "0 to 45" and the file name "S7" as its values.

Next, the S chunk pool registration unit (112) registers the tag information (106) in the S chunk pool (105) by inserting the generated tag information (301) of S7 into the S chunk pool (105). In the present embodiment, the tag information is inserted into the binary tree structure using its start value "0" as the primary key. The insertion method is well known, and the result is as shown in FIG. 11.
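A minimal sketch of this insertion, assuming a plain unbalanced binary search tree keyed on the start value of each range. The `Node` class and function names are hypothetical, and only the three tags whose ranges appear in the text (S1, S2, S7) are inserted, so the resulting tree is not the seven-node tree of FIG. 11.

```python
class Node:
    """One tree node holding a single tag (value range plus S chunk file name)."""

    def __init__(self, tag):
        self.tag = tag
        self.left = None
        self.right = None


def insert(root, tag):
    """Insert `tag` into a binary search tree keyed on tag['min']; return the root."""
    if root is None:
        return Node(tag)
    if tag["min"] < root.tag["min"]:
        root.left = insert(root.left, tag)
    else:
        root.right = insert(root.right, tag)
    return root


def in_order(node):
    """List file names in ascending order of the primary key."""
    if node is None:
        return []
    return in_order(node.left) + [node.tag["file"]] + in_order(node.right)


root = None
for tag in [
    {"min": 50, "max": 85, "file": "S1"},
    {"min": 45, "max": 65, "file": "S2"},
    {"min": 0, "max": 45, "file": "S7"},  # the newly generated tag
]:
    root = insert(root, tag)

order = in_order(root)
```

Because the tree is ordered by start value, tags with nearby ranges end up adjacent in the traversal.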

FIG. 11 shows the data structure held by the S chunk pool (105) after the registration process.
At this point, the chunk generation device (100) starts the following series of operations, which create an L chunk from the S chunk pool. The trigger may be the completion of the registration of the tag information (106) in the S chunk pool (105) by the S chunk pool registration unit (112) described above; alternatively, the operations may be started asynchronously with respect to that completion, by means such as a timer.

First, the activated L chunk creation unit (113) extracts a neighboring tag information group from the S chunk pool (105). Here, "neighboring" means that the data values are close to one another; based on the earlier discussion of the properties common to tree structures, it can also be taken to mean data placed adjacent to one another in the tree structure. A concrete example of the processing follows.

FIG. 12 shows the operation of extracting a neighboring tag information group from the S chunk pool (105).
The L chunk creation unit (113) selects a specific node as the attention point (321), selects the data in its vicinity, and defines that data as the extraction range (322). In this example, the tree structure holds seven data items and the number of data items in the extraction range is fixed at the constant three; the attention point (321) and the data that make up its subtree form the extraction range.

In this example, the attention point (321) is selected as the data whose S chunk has the oldest generation date, that is, whose S chunk file name has the smallest serial number. Other selection methods are possible: selecting the data with the newest generation date; selecting the data whose individual rows stored in the S chunk, that is, individual sensor information items, have the largest or smallest value for an arbitrary key; selecting the data most frequently retrieved by searches; or trying several or all candidate attention points and selecting the one for which the value width of all data in the extraction range is smallest.
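The oldest-first selection and the subtree-based extraction range can be sketched as below. This is an illustration under stated assumptions: the nested-dict tree, the helper names, and the node S4 with its range are invented (S4 exists only so that one node lies outside the extraction range), while S1, S2, and S7 and their ranges come from the text.

```python
import re


def node(file, lo, hi, left=None, right=None):
    """Build one tree node as a plain dict (hypothetical layout)."""
    return {"file": file, "min": lo, "max": hi, "left": left, "right": right}


tree = node("S4", 90, 120,
            left=node("S1", 50, 85,
                      left=node("S2", 45, 65,
                                left=node("S7", 0, 45))))


def walk(n):
    """Yield every node of the subtree rooted at `n` in ascending key order."""
    if n is not None:
        yield from walk(n["left"])
        yield n
        yield from walk(n["right"])


def serial(n):
    """Serial number embedded in a file name such as 'S7'."""
    return int(re.sub(r"\D", "", n["file"]))


def extraction_range(root, size=3):
    """Pick the oldest S chunk as the attention point and return its subtree's file names."""
    attention = min(walk(root), key=serial)
    # Simplification: if the subtree holds more than `size` nodes, the first `size`
    # in key order are kept, which may not be the selection the original intends.
    return [n["file"] for n in walk(attention)][:size]


picked = extraction_range(tree)
```

Swapping `serial` for another scoring function would give the alternative selection methods listed above.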

The L chunk creation unit (113) reads the tag information (106) in the extraction range (322) determined above from the S chunk pool (105), reads the S chunk (104) files corresponding to that tag information (106) from the S chunk storage device (103), and writes their contents to a new file in the file format shown in FIG. 13. This file is the L chunk (108). The file is stored in the L chunk storage device (107) and is given a file name that does not collide with any existing L chunk (108). In this example, a serial number is assigned and the file is created with the file name "L03". The extracted tag information (106) is then deleted from the S chunk pool (105).

The chunk generation device (100) may make the L chunk (108) files stored in the L chunk storage device (107) available to other computers via the network (109), using a file sharing means such as NFS, CIFS, FTP, HTTP, WebDAV, GoogleFS, or HadoopDFS.

FIG. 13 shows the contents of the created L chunk (108) file with the file name L03.
The file consists of four blocks. The second to fourth blocks are verbatim copies of the contents of the S chunk (104) files read in the above procedure. In this example, the extraction range (322) indicates the tag information corresponding to the three S chunks S7, S2, and S1, so the contents of these S chunk files are copied in that order, each followed by the delimiter "[EOB]". These blocks store the bodies of the sensor information.

The first block of the file lists the tag information of the subsequent blocks, one line per subsequent block, followed by a delimiter. Here, tag information does not contain the information body; it contains only a pointer to the corresponding information body and the value of the primary key used for searching. For example, the first line stores, as the pointer to the information body, the hexadecimal offset in bytes from the beginning of the file at which the block holding the first S chunk body (that is, the second block) starts, together with the information that the value range for the key "A" is "0 to 45".
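The file layout described above can be sketched as a small writer. Several details are assumptions rather than facts from FIG. 13: the exact head-line syntax (`_offset=<hex>,A=<min>~<max>`), the fixed eight-digit width of the offset, the UTF-8 encoding, and every sample row except "A=35,D=201110611,T=23.5". The "_offset=" label and the "[EOB]" delimiter are taken from the text.

```python
EOB = "[EOB]\n"


def head_line(offset, key, lo, hi):
    """Format one tag line; the syntax after '_offset=' is an assumed layout."""
    return f"_offset={offset:08x},{key}={lo}~{hi}\n"


def build_l_chunk(s_chunks, key="A"):
    """Build L chunk bytes from (lo, hi, body_text) triples given in output order."""
    # Fixed-width offsets make the head block size known before the offsets are.
    head_size = sum(len(head_line(0, key, lo, hi).encode("utf-8"))
                    for lo, hi, _ in s_chunks) + len(EOB.encode("utf-8"))
    head, blocks, offset = [], [], head_size
    for lo, hi, body in s_chunks:
        block = (body + EOB).encode("utf-8")  # S chunk contents copied verbatim
        head.append(head_line(offset, key, lo, hi))
        blocks.append(block)
        offset += len(block)
    return ("".join(head) + EOB).encode("utf-8") + b"".join(blocks)


# Sample bodies; rows other than the A=35 row are invented for illustration.
s7 = "A=0,T=25.0\nA=35,D=201110611,T=23.5\nA=45,T=22.9\n"
s2 = "A=45,T=22.8\nA=60,T=22.0\nA=65,T=21.7\n"
s1 = "A=50,T=22.5\nA=70,T=21.1\nA=85,T=20.3\n"
l03 = build_l_chunk([(0, 45, s7), (45, 65, s2), (50, 85, s1)])
```

Writing the offsets at a fixed width is one simple way to avoid a second pass; it limits the file to offsets that fit in eight hexadecimal digits.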

Because the tag information is written in the first block of the L chunk (108), reading the L chunk (108) file and locating the required sensor information inside it can be done efficiently, as follows. The operation of the chunk reading device according to the present invention is described below.

The chunk reading device, which is connected to the network (109), transfers an L chunk file generated in the chunk generation device (100) using the file sharing means described above.
Next, it reads and searches the contents of the file by the following procedure.

For the sake of explanation, assume that the chunk reading device has transferred the L chunk file "L03" and wants to find, from that file, the temperature measured at altitudes in the range of 10 m to 40 m.
First, the device reads only the first block of the file into memory.
Next, for the tag information in each line of the first block, the device extracts the lines for which the intersection of the primary key value range and the search range is not empty. In this example, the range "A = 0 to 45" and the search range "10 to 40" have the non-empty intersection 10 to 40, so this line is extracted. The other lines are not extracted because their intersections are empty.
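The intersection test in this step can be written in a few lines. The sketch assumes closed intervals, which the text does not state explicitly; the function name is hypothetical.

```python
def intersection(lo1, hi1, lo2, hi2):
    """Return the overlap of two closed ranges as (lo, hi), or None if it is empty."""
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    return (lo, hi) if lo <= hi else None


# Tag ranges of the three blocks in L03 and the search range 10 to 40.
tags = {"S7": (0, 45), "S2": (45, 65), "S1": (50, 85)}
matches = [name for name, (lo, hi) in tags.items()
           if intersection(lo, hi, 10, 40) is not None]
```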

Next, the device locates the information body by following the pointer in the tag information of the extracted line.
In this example, the hexadecimal number following "_offset=" in the first line is the offset, from the beginning of the file, of the second block, which stores the corresponding information body.

Next, the device reads the information body from that position into memory. That is, it seeks within the file and reads from that position until the delimiter appears.

Next, the device examines each line of the information body read into memory and extracts the lines that match the search range.
In this example, the second block contains three lines of sensor information, and the only line whose altitude falls within the search range of 10 to 40 is "A=35,D=201110611,T=23.5", so this line is extracted. This is the search result, and the temperature sought is therefore 23.5 degrees Celsius.
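The whole reading procedure (head block, range test, seek, read to delimiter, row filter) can be sketched end to end. As before, the head-line syntax, the fixed-width offsets, and all sample rows except the A=35 row are assumptions; an in-memory `io.BytesIO` stands in for the transferred file.

```python
import io

EOB = b"[EOB]\n"


def sample_l_chunk():
    """Build a small L chunk in the assumed layout (invented rows except A=35)."""
    parts = [("A=0~45", "A=0,T=25.0\nA=35,D=201110611,T=23.5\nA=45,T=22.9\n"),
             ("A=45~65", "A=45,T=22.8\nA=60,T=22.0\nA=65,T=21.7\n"),
             ("A=50~85", "A=50,T=22.5\nA=70,T=21.1\nA=85,T=20.3\n")]

    def line(off, rng):
        return f"_offset={off:08x},{rng}\n".encode("utf-8")

    start = sum(len(line(0, rng)) for rng, _ in parts) + len(EOB)
    head, body = b"", b""
    for rng, text in parts:
        head += line(start + len(body), rng)
        body += text.encode("utf-8") + EOB
    return head + EOB + body


def read_until_eob(f):
    """Read lines from the current position up to, but not including, the delimiter."""
    lines = []
    for raw in iter(f.readline, b""):
        if raw == EOB:
            break
        lines.append(raw.decode("utf-8").rstrip("\n"))
    return lines


def parse_fields(text):
    return dict(field.split("=", 1) for field in text.split(","))


def search(f, key, lo, hi):
    """Return rows whose `key` lies in [lo, hi], reading only the matching blocks."""
    f.seek(0)
    hits = []
    for tag_line in read_until_eob(f):            # read the first block only
        tag = parse_fields(tag_line)
        t_lo, t_hi = (float(x) for x in tag[key].split("~"))
        if max(t_lo, lo) > min(t_hi, hi):          # empty intersection: skip block
            continue
        f.seek(int(tag["_offset"], 16))            # follow the pointer
        for row in read_until_eob(f):              # read up to the delimiter
            if lo <= float(parse_fields(row)[key]) <= hi:
                hits.append(row)
    return hits


hits = search(io.BytesIO(sample_l_chunk()), "A", 10, 40)
```

Only the head block and the one matching body block are read; the other two blocks are never touched.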

In the above procedure, although the L chunk stores nine sensor information items in total, only three lines of sensor information bodies were actually read from the file into memory. Even counting the tag information, the total was 3 + 3 = 6 lines. If the same search were performed on a file listing the nine sensor information items flat, all nine lines would have to be read from the file into memory, so the above procedure achieves more efficient reading and searching by comparison.

The present embodiment has been described with an example in which each S chunk contains three sensor information items and each L chunk contains 3 × 3 = 9. With 100 × 100 or 1000 × 1000 items, the efficiency gap widens further and the advantage of the above processing grows. This efficiency stems from the fact that the L chunk file format of the present embodiment forms a kind of tree structure by itself. That is, the L chunk forms a two-level tree in which the first block is the first-level node (the root node) and the second to last blocks are the second-level nodes.
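As a rough worked comparison, assume an L chunk with k body blocks of m lines each and a search that matches exactly one block. The symbols k, m, R and the single-match assumption are illustrative and do not appear in the original.

```latex
\[
  R_{\text{chunk}} = k + m, \qquad R_{\text{flat}} = k\,m
\]
\[
  k = m = 3:\ 6 \text{ vs. } 9, \qquad
  k = m = 100:\ 200 \text{ vs. } 10^{4}, \qquad
  k = m = 1000:\ 2000 \text{ vs. } 10^{6}
\]
```

Here R counts the lines read from the file: k tag lines plus m body lines for the L chunk format, against all k·m lines for a flat file.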

In the example of the present embodiment, the root node has three subsequent blocks and each subsequent block contains three lines of sensor information, so the file forms a ternary tree. Because of this tree structure, a chunk reading device such as the one described above, which exploits the properties of tree structures, can search the sensor information in the L chunk efficiently.

A further advantage of this L chunk file format is that, while forming a tree structure, it is written entirely in text and need not contain hard-to-read binary information. Specifically, the second and subsequent blocks are simply the uTuple representations of the sensor information listed as they are, so they are highly readable for humans, and a program that reads all of the sensor information can be written very easily.

The first block also has a simple format, so it too is highly readable for humans, and a program that reads or skips the tag information can be written easily. Combining this ease of handling with the efficiency of a tree structure is one of the features of the L chunk file format of the present embodiment.

With the present embodiment, an L chunk file in a format that enables such efficient reading can be generated efficiently using a two-stage tree structure. The conventional method uses a one-stage tree structure, that is, only the temporary pool: it extracts nine sensor information items and then rearranges them to produce the above L chunk file structure. The method of the present embodiment makes this rearrangement unnecessary and can therefore generate the L chunk more efficiently.

Furthermore, compared with the conventional method that uses only the one-stage tree structure, the method of the present embodiment can keep the temporary pool small. This is because, to collect neighboring sensor information effectively into an extraction range, the population must be sufficiently large relative to the number of items extracted. Suppose this ratio is, for example, 0.1%, that is, the population must be 1000 times the number extracted. To store, for example, 1000 × 1000 = 1 million sensor information items in each L chunk, the conventional method requires the temporary pool to hold 1000 × 1000 × 1000 = 1 billion sensor information items, which imposes a heavy processing load.

According to the present embodiment, 1000 items at a time are extracted from the temporary pool, so it needs to hold only 1000 × 1000 = 1 million items; likewise, 1000 items at a time are extracted from the S chunk pool, so it too needs to hold only 1000 × 1000 = 1 million items. The processing load is thus greatly reduced, and the target L chunk can be generated efficiently.
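The pool sizes in this comparison can be written out as follows. The symbols are introduced here for illustration only: n is the number of items taken per extraction (1000) and r is the required ratio of population to extracted items (1000).

```latex
\[
  N_{\text{conventional}} = r \cdot n^{2} = 10^{3} \cdot 10^{6} = 10^{9}
\]
\[
  N_{\text{temporary pool}} = r \cdot n = 10^{6}, \qquad
  N_{\text{S chunk pool}} = r \cdot n = 10^{6}
\]
```

Under these assumptions the two pools together hold 2 × 10^6 entries, against 10^9 for the single-pool method.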

In the present embodiment, the final target chunk file (here, the L chunk) is generated using a two-stage tree structure consisting of the temporary pool and the S chunk pool. The scope of the present invention is not limited to this. The approach applied to S chunks can also be applied to L chunks to form an L chunk pool and generate an "LL chunk" that collects a plurality of L chunks, giving chunk generation with a three-stage tree structure. Four or five stages are likewise possible, but the ease-of-processing advantage described above diminishes, so about two to three stages is appropriate.

In the present embodiment, the temporary pool (101) and the S chunk pool (105) are implemented using a binary tree data structure. The scope of the present invention is not limited to this; any tree structure, including an AVL tree, B-tree, B+ tree, or R-tree, can be applied to either or both of them. In particular, the multidimensional search tree Ubi-tree can be applied, in which case the notable advantages described next are obtained. Ubi-tree is a data structure that can handle multiple values and value ranges as an index at once.

Furthermore, it can handle key types, that is, dimensions, numbering in the hundreds. It can also process data efficiently even when each data item has values in some dimensions and not in others. Because of these features, sensor information written in the uTuple data format can be stored directly in a Ubi-tree and searched. When such a Ubi-tree is used for the temporary pool and the S chunk pool of the present embodiment, all values, that is, the entire uTuple data, can be stored in the tree structure as the primary key. As a result, the set of sensor information or tag information that forms the closest neighborhood with all values taken into account can be obtained as the extraction range.

Furthermore, since the tag information records the range of the primary key, with a Ubi-tree, which can take all values as the primary key, the tag information can record the ranges of all values. The tag information in the first block of the L chunk then lists the ranges of all values. In the example above, only range conditions on altitude, that is, "A", could be searched efficiently; with a Ubi-tree, efficient searching within the L chunk becomes possible for range conditions on any kind of value, or any combination of them. Note in particular that, in this case, the "kind of tree structure" formed by the L chunk, as pointed out above, is naturally a Ubi-tree structure itself.

As described above, using a Ubi-tree as the tree structure in the present embodiment yields a notable effect through synergy with the tag information generation method and the L chunk file format.

The device of the present invention can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided through a network.

The present invention can be applied to the information and communications industry.

100: Chunk generation device
101: Temporary pool
102: Sensor information
103: S chunk storage device
104: S chunk
105: S chunk pool
106: Tag information
107: L chunk storage device
108: L chunk
111: S chunk creation unit
112: S chunk pool registration unit
113: L chunk creation unit
261, 321: Attention point
262, 322: Extraction range
301: Tag information of S7

Claims

A device for generating a chunk, which is a file listing a plurality of information,
The device maintains a temporary pool, a first chunk and a chunk pool;
The temporary pool manages information in a tree structure,
The first chunk is a file that lists a smaller number of information than the number of information that should be included in the chunk that the device intends to generate,
The chunk pool manages tag information indicating a data range of each first chunk and identification information of the first chunk in a tree structure,
A function of registering information in the temporary pool;
A function of extracting a neighboring information group in a tree structure from the temporary pool and creating the first chunk;
A function of generating tag information of the first chunk and registering it in the chunk pool;
A function of extracting a neighboring tag information group in a tree structure from the chunk pool and creating a second chunk including at least information included in the first chunk indicated by the tag information;
A chunk generation device characterized by comprising:

The chunk generation device according to claim 1, wherein
the function of creating the second chunk,
when creating the second chunk from the information contained in k first chunks,
includes in the second chunk, as its constituent elements, at least index information for searching for the positions within the file of the contents of the k first chunks, and a sequence of the contents of the k first chunks.

The chunk generation device according to claim 1, wherein:
The temporary pool and the chunk pool use a UBI-Tree search tree algorithm as the tree structure,
The data range of the first chunk indicated by the tag information of the first chunk is expressed as a set of values for a plurality of keys,
The chunk generation device, wherein the index information of the second chunk is index information for searching a file position of the contents of the first chunk from a set of values for a plurality of keys.

A device that reads chunks,
A function of reading the index information with the chunk created by the chunk generation device according to claim 2 as input;
A function to select index information that matches the search conditions;
A function to move the file reading position to the position in the file indicated by the index information;
A function of reading the contents of the first chunk from the position in the moved file;
A chunk reading apparatus characterized by comprising:

A chunk generation method performed by a chunk generation device that holds a temporary pool, a first chunk, and a chunk pool, that comprises a temporary pool registration unit, a first chunk creation unit, a tag information generation unit, and a second chunk creation unit, and that generates a chunk, which is a file listing a plurality of information, the method executing in order:
a procedure in which the temporary pool registration unit registers information in the temporary pool, the temporary pool managing the information in a tree structure;
a procedure in which the first chunk creation unit extracts a neighboring information group in the tree structure from the temporary pool and creates the first chunk, which lists a smaller number of information than the number of information that the chunk to be generated should contain;
a procedure in which the tag information generation unit generates tag information indicating the data range of the first chunk and the identification information of the first chunk and registers it in the chunk pool, the chunk pool managing the tag information of each first chunk in a tree structure; and
a procedure in which the second chunk creation unit extracts a neighboring tag information group in the tree structure from the chunk pool and creates a second chunk including at least the information included in the first chunks indicated by the tag information.

The chunk generation method according to claim 5, wherein
the second chunk creation unit,
when creating the second chunk from the information contained in k first chunks,
includes in the second chunk, as its constituent elements, at least index information for searching for the positions within the file of the contents of the k first chunks, and a sequence of the contents of the k first chunks.

The chunk generation method according to claim 5, wherein:
The temporary pool and the chunk pool use a UBI-Tree search tree algorithm as the tree structure,
The data range of the first chunk indicated by the tag information of the first chunk is expressed as a set of values for a plurality of keys,
The chunk generation method, wherein the index information of the second chunk is index information for searching a file position of the contents of the first chunk from a set of values for a plurality of keys.

A chunk generation program that causes a computer to hold a temporary pool, a first chunk, and a chunk pool, wherein
the temporary pool manages information in a tree structure,
the first chunk is a file that lists a smaller number of information than the number of information that the chunk to be generated should contain, and
the chunk pool manages, in a tree structure, tag information indicating the data range of each first chunk and the identification information of that first chunk,
the program causing the computer to implement each function according to any one of claims 1 to 3.