JPWO2009028050A1 - Multi-core compatible data processing method, multi-core processing apparatus, and program for manipulating tabular data

Info

Publication number: JPWO2009028050A1
Application number: JP2009529900A
Authority: JP
Inventors: 古庄　晋二
Original assignee: Turbo Data Laboratories Inc
Current assignee: Turbo Data Laboratories Inc
Priority date: 2007-08-28
Filing date: 2007-08-28
Publication date: 2010-11-25
Anticipated expiration: 2027-08-28
Also published as: WO2009028050A1; JP5208117B2

Abstract

A method for building a data structure for a multi-core processing apparatus from tabular data, in which: a control unit creates an array that stores the block numbers corresponding to the records assigned to each arithmetic unit; the arithmetic units create, in parallel, arrays that store the record sequence numbers of their assigned records; the arithmetic units create, in parallel and for each data item, arrays that store item value access information for accessing the item values contained in the assigned records; and the arithmetic units create, in parallel and for each data item, item values expanded so that they can be accessed through the item value access information.

Description

The present invention relates to a data processing method in which tabular data, represented as an array of records containing item values corresponding to data items, is divided among and operated on by a plurality of arithmetic units, and in particular to a data processing method that constructs tabular data and obtains item values from the tabular data.

The present invention also relates to a multi-core processing apparatus in which tabular data, represented as an array of records containing item values corresponding to data items, is divided among and operated on by a plurality of arithmetic units, and in particular to a multi-core processing apparatus that constructs tabular data and obtains item values from the tabular data.

Furthermore, the present invention relates to a program for causing a computer equipped with a multi-core processor to execute the above data processing method, to a computer program product, and to a recording medium on which the computer program is recorded.

High-speed processing of large-scale data has long been in demand in many industrial fields. Large-scale data processing continues to be accelerated both by the development of hardware technologies, such as faster memory access through caching and prefetching, faster memory itself, and faster computation through processor parallelization, and by the development of data processing algorithms.

The present inventor has proposed basic data processing algorithms for processing large-scale data at high speed, for example the "on-memory data processing algorithm" described in Patent Document 1. This technique is based on the idea of decomposing tabular data into components by item (that is, by column) rather than by record (that is, by row) as in the prior art. More specifically, tabular data is represented by a data structure consisting of (1) an array representing the record order, (2) a value table in which the unique item values belonging to an item are arranged in a predetermined order (for example, ascending order), and (3) an array representing, for each record, the position at which the corresponding item value is stored in the value table. Adopting this data structure makes operations on tabular data such as searching, sorting, merging, and joining fast.
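The three-part structure described above can be illustrated with a minimal Python sketch. The function name `decompose_column`, the sample `ages` column, and the helper `item_value` are illustrative assumptions and do not come from Patent Document 1; the sketch only shows how a record order array, a value table, and a position array can together reproduce a column.

```python
def decompose_column(values):
    """Split one column into a sorted value table and a position array.

    value_table   : unique item values in ascending order
    pointer_array : for each record, the index of its value in value_table
    """
    value_table = sorted(set(values))
    position = {v: i for i, v in enumerate(value_table)}
    pointer_array = [position[v] for v in values]
    return value_table, pointer_array


# Hypothetical sample column with five records.
ages = [31, 25, 31, 40, 25]
value_table, pointer_array = decompose_column(ages)

# The record order array initially lists the records in their original order.
record_order = list(range(len(ages)))


def item_value(k):
    """Item value of the k-th record in the current record order."""
    return value_table[pointer_array[record_order[k]]]


# Sorting by this column only permutes the record order array;
# the value table and the position array are left untouched.
sorted_order = sorted(record_order, key=lambda r: pointer_array[r])
```

Because the value table is already ordered, comparing two records reduces to comparing two small integers in the position array, which is one plausible reason the text credits this layout with fast sorting and searching.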

The present inventor has also proposed various on-memory data processing algorithms suited to parallel processors, such as distributed-memory multiprocessor systems and shared-memory multiprocessor systems. For example, Patent Document 2 describes a search and sort algorithm for a distributed-memory multiprocessor system, and Patent Document 3 describes an aggregation algorithm. Patent Document 4 describes an efficient sort algorithm for a shared-memory multiprocessor system.

In recent years, processor architectures that contain a plurality of (or many) cores inside a single processor have been proposed. The Cell Broadband Engine™ is known as one example of a multi-core processor (see Non-Patent Document 1). This type of processor is intended for applications such as high-speed processing of multimedia data and distributed computing. In this architecture, each core has a dedicated local memory, although not a large one, and can perform computation independently of the other cores. In practice, the capacity of the local memory is insufficient for processing multimedia data and the like, so an external global memory is provided. A multi-core processor architecture scales well because its parallelism and processing capacity increase by adding cores rather than by relying on higher clock speeds.

Accordingly, it is desirable to apply such a multi-core processor architecture not only to multimedia data processing but also to the various applications that demand high speed.

Non-Patent Document 1: "Cell Broadband Engine Architecture", Version 1.01, October 3, 2006, [retrieved July 26, 2007], Internet (URL: http://cell.scei.co.jp/pdf/CBE_Architecture_v101.pdf)
Patent Document 1: International Publication No. 00/10103
Patent Document 2: International Publication No. 2005/041066
Patent Document 3: International Publication No. 2005/041067
Patent Document 4: International Publication No. 2006/126467

The present inventor has recognized the importance of technology that uses the highly scalable multi-core processor architecture described above to process large-scale tabular data at high speed.

However, in an application that uses a large amount of memory, such as a database, the data to be processed cannot all be held in the local memory attached to a core, which makes the data processing algorithm more complex. For example, random access to data too large to fit in the local memory attached to each core causes frequent accesses to the external global memory and severely degrades processing performance. A new data structure that avoids this problem is therefore needed.

It is therefore desirable to provide a data processing method that, in a computer equipped with a multi-core processor, processes tabular data represented as an array of records containing item values corresponding to data items, using a small working memory and without degrading parallel processing performance.

It is also desirable to provide a multi-core information processing apparatus that processes tabular data represented as an array of records containing item values corresponding to data items, using a small working memory and without degrading parallel processing performance.

It is further desirable to provide a program, a computer program product, and a recording medium on which the computer program is recorded that, in a computer equipped with a multi-core processor, process tabular data represented as an array of records containing item values corresponding to data items, using a small working memory and without degrading parallel processing performance.

According to at least one embodiment of the present invention, tabular data is described in two types of data format so that it can be handled with a small local memory without degrading the parallel processing performance of the multi-core processor. The first type of data is a group of arrays that are divided into pieces small enough to be guaranteed to fit in the local memory and that are held in the global memory (or on disk). Because data of the first type can be transferred from the global memory to the local memory in a single transfer, random access to it incurs no delay. The second type of data is a group of arrays that are held in the global memory (or on disk) and that, whenever a large amount of data is accessed, are guaranteed to be accessed contiguously in a predetermined order (for example, ascending or descending order). Because data of the second type does not fit in the local memory as it stands, it is transferred from the global memory to the local memory by sequential access, one local-memory-sized portion at a time. Of course, for data of the first type, elements stored in the global memory may in some cases also be accessed directly, without sequential access.
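The difference between the two data types can be sketched in Python as follows. This is only an illustration under stated assumptions: `LOCAL_CAPACITY`, `load_whole`, and `stream_in_order` are hypothetical names, Python lists stand in for the global and local memories, and a real multi-core processor would use its own transfer mechanism rather than list slicing.

```python
LOCAL_CAPACITY = 4  # hypothetical local-memory capacity, in elements


def load_whole(global_array):
    """First type: an array small enough to be copied in one transfer.

    Once copied, its elements can be accessed randomly inside local memory.
    """
    if len(global_array) > LOCAL_CAPACITY:
        raise ValueError("first-type arrays must fit in local memory")
    return list(global_array)


def stream_in_order(global_array, capacity=LOCAL_CAPACITY):
    """Second type: a large array that is only ever read sequentially.

    Yields consecutive portions, each small enough for local memory.
    """
    for start in range(0, len(global_array), capacity):
        yield global_array[start:start + capacity]


small = load_whole([7, 3, 9])        # random access is now local
picked = small[2]

chunks = list(stream_in_order(list(range(10))))
```

The design point the sketch tries to capture is that neither access pattern ever requires random access into an array larger than the local memory.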

In this document, tabular data means data represented as an array of records containing item values corresponding to data items.

In this document, a multi-core processing apparatus or processor means an apparatus comprising a plurality of arithmetic units each including a dedicated local memory, a global memory connected to the plurality of arithmetic units, a bus connecting the plurality of arithmetic units, and at least one control unit connected to the global memory and the plurality of arithmetic units.

At least one embodiment of the present invention constructs tabular data on a multi-core processing apparatus by combining the component decomposition concept described above with the two data types described above.

To this end, according to at least one embodiment of the present invention, the (plural or many) records of the tabular data are divided into blocks identified by block numbers. Initially, each block corresponds to the arithmetic unit responsible for processing the records contained in that block. The records for which an arithmetic unit is responsible are referred to in this document as its assigned records. A block number array, in which the block numbers are stored in the order of the original record position numbers, is created in the global memory. The block number array is data of the second type. The original record position number corresponds to the position at which each record is held in the original tabular data, for example its row number.

To identify its assigned records, each arithmetic unit can access a record sequence number array in which the record sequence numbers of the assigned records (initially identical to the original record position numbers) are stored in record sequence number order. The record sequence number array is data of the first type and is transferred from the global memory to the local memory in each arithmetic unit as needed.
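The block number array and the per-unit record sequence number arrays can be sketched in Python as follows. The contiguous, equal-sized partitioning and the names `build_block_numbers` and `build_record_sequence_arrays` are assumptions made for illustration; the text does not prescribe how records are assigned to blocks.

```python
def build_block_numbers(n_records, n_units):
    """Control unit: one block number per record, in original record order.

    Assumes contiguous blocks of (nearly) equal size, one block per unit.
    """
    block_size = -(-n_records // n_units)  # ceiling division
    return [pos // block_size for pos in range(n_records)]


def build_record_sequence_arrays(block_numbers, n_units):
    """Each unit's array lists its assigned records in ascending order.

    Initially the record sequence number equals the original record
    position number, as stated in the text.
    """
    arrays = [[] for _ in range(n_units)]
    for position, block in enumerate(block_numbers):
        arrays[block].append(position)
    return arrays


block_numbers = build_block_numbers(n_records=10, n_units=4)
record_sequence = build_record_sequence_arrays(block_numbers, n_units=4)
```

In this sketch a single loop fills all the per-unit arrays; in the described apparatus each arithmetic unit would build its own array in parallel in its local memory.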

In addition, to access the item values contained in its assigned records, each arithmetic unit can access an item value access information array in which item value access information is stored in record sequence number order. The item value access information array is data of the first type and is transferred from the global memory to the local memory in each arithmetic unit as needed.

The item values contained in the assigned records of each arithmetic unit are held in the global memory, for each data item, in such a way that the arithmetic unit can access them using the item value access information array, and they are transferred from the global memory to the local memory in each arithmetic unit as needed.

For each data item, the item values of the tabular data are constructed in the global memory as a global item value array in which the unique item values are stored in a predetermined order (ascending or descending). The global item value array is data of the second type. So that each arithmetic unit can use the item value access information array to access the item values contained in its assigned records, two further arrays are constructed in the global memory for each data item and transferred to the local memory in each arithmetic unit as needed: a local item value number array, in which the local item value numbers identifying the item values contained in the assigned records are stored in the order of the original record position numbers, and an item value designation pointer array, in which item value designation pointers specifying the positions at which the item values represented by the local item value numbers are stored in the global item value array are stored in a predetermined order (ascending or descending). The local item value number array and the item value designation pointer array are data of the first type. In this way, the item values of the tabular data are expanded, for each data item, into a global item value array, a local item value number array, and an item value designation pointer array.
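The three arrays named above can be sketched for a single data item in Python. The function `expand_item`, the sample city names, and the use of `bisect_left` to locate values are illustrative assumptions; this direct construction is a simplification and is not the merge-based procedure of the embodiment.

```python
from bisect import bisect_left


def expand_item(blocks):
    """Expand one data item, given its item values grouped by block.

    Returns:
      global_values : unique values of the item across all blocks, ascending
      local_numbers : per block, a local item value number for each record
      pointers      : per block, for each local item value number,
                      the position of that value in global_values
    """
    global_values = sorted({v for block in blocks for v in block})
    local_numbers, pointers = [], []
    for block in blocks:
        local_values = sorted(set(block))
        rank = {v: i for i, v in enumerate(local_values)}
        local_numbers.append([rank[v] for v in block])
        pointers.append([bisect_left(global_values, v) for v in local_values])
    return global_values, local_numbers, pointers


# Hypothetical item "city", with records already divided into two blocks.
blocks = [["Tokyo", "Osaka", "Tokyo"], ["Kyoto", "Osaka"]]
global_values, local_numbers, pointers = expand_item(blocks)

# Item value of record 0 in block 0: local number, then pointer, then value.
value = global_values[pointers[0][local_numbers[0][0]]]
```

The two per-block arrays are bounded by the block size, so they fit the first data type, while the global array may be large and is therefore kept in the global memory.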

In accordance with the above concept, one embodiment of the present invention provides a method for constructing tabular data in a global memory of a multi-core processing apparatus, the multi-core processing apparatus comprising:
a plurality of arithmetic units each including a dedicated local memory;
the global memory, connected to the plurality of arithmetic units;
a bus connecting the plurality of arithmetic units; and
at least one control unit connected to the global memory and the plurality of arithmetic units,
the tabular data being represented as an array of records containing item values corresponding to data items and being divided among and operated on by the plurality of arithmetic units,
the method comprising the steps of:
the control unit dividing the records into blocks each containing the assigned records for which one arithmetic unit is responsible, creating a block number array that stores the block number corresponding to each record in the order of the original record position numbers in the tabular data, and storing the block number array in the global memory;
the plurality of arithmetic units operating in parallel to create, in the local memory of each arithmetic unit, a record sequence number array that stores the original record position numbers of the assigned records in record sequence number order, and transferring it to the global memory;
the plurality of arithmetic units operating in parallel to create, in the local memory of each arithmetic unit, an item value access information array that stores, in record sequence number order, item value access information for accessing the item values contained in the assigned records, and transferring it to the global memory; and
the plurality of arithmetic units operating in parallel to expand, for each data item, the item values in the local memory of each arithmetic unit so that the item values contained in the assigned records can be accessed using the item value access information, and transferring the expanded item values to the global memory.

The above method further comprises the steps of:
the control unit referring to the block number array in the global memory to determine the block number of the block containing a given record and the arithmetic unit responsible for the given record;
the control unit notifying the determined arithmetic unit of the record sequence number of the given record;
the arithmetic unit notified of the record sequence number transferring the record sequence number array and the item value access information array relating to its assigned records from the global memory to its local memory;
the arithmetic unit notified of the record sequence number identifying, in the transferred record sequence number array, the position at which the notified record sequence number is stored;
the arithmetic unit notified of the record sequence number identifying, in the transferred item value access information array, the item value access information specified by the identified position; and
the arithmetic unit notified of the record sequence number obtaining from the global memory, for each data item, the item value specified by the identified item value access information, and transferring the obtained item value to the global memory.
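The retrieval steps listed above can be traced with a short Python sketch for one data item. All names and sample arrays here are hypothetical, and the sketch assumes that the item value access information is a local item value number that is resolved through an item value designation pointer array, as in the array layout described earlier in this section.

```python
from bisect import bisect_left

# Hypothetical arrays held in "global memory" for one data item.
block_numbers = [0, 0, 0, 1, 1]            # per original record position
record_sequence = [[0, 1, 2], [3, 4]]      # per block, ascending
local_numbers = [[1, 0, 1], [0, 1]]        # item value access information
pointers = [[1, 2], [0, 1]]                # per block, into global_values
global_values = ["Kyoto", "Osaka", "Tokyo"]


def fetch_item_value(record_no):
    # Control unit: look up the block, and hence the responsible unit.
    block = block_numbers[record_no]

    # Arithmetic unit: copy its first-type arrays into local memory.
    local_sequence = list(record_sequence[block])
    local_access = list(local_numbers[block])
    local_pointers = list(pointers[block])

    # Locate the notified record sequence number (the array is ascending).
    position = bisect_left(local_sequence, record_no)
    if position == len(local_sequence) or local_sequence[position] != record_no:
        raise KeyError(record_no)

    # Resolve the access information to the stored item value.
    return global_values[local_pointers[local_access[position]]]


results = [fetch_item_value(r) for r in range(5)]
```

A binary search is used here only because the sample record sequence number arrays are kept in ascending order; the source does not state which search the arithmetic unit performs.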

In the above method, the step in which the plurality of arithmetic units operate in parallel to expand the item values for each data item in the local memory of each arithmetic unit and store the expanded item values in the global memory includes the steps of:
the plurality of arithmetic units operating in parallel to, for each data item, transfer the item values contained in a single block from the global memory to the local memory, create in the local memory a local item value work array that stores the unique values among the item values contained in the single block in a predetermined order and a local item value number array that stores, in the order of the original record position numbers of the assigned records contained in the single block, local item value numbers each specifying the position at which the item value contained in an assigned record is stored in the local item value work array, and transfer these arrays to the global memory;
the plurality of arithmetic units operating in parallel to execute, for each data item, a merge process that takes a pair of sets, each associated with one of a pair of blocks and each consisting of a block number work array storing the block numbers corresponding to the unique values among the item values contained in the block, the local item value work array, and a local item value designation pointer work array storing pointers that specify the positions at which the item values contained in the block are stored in the local item value work array, and creates from that pair a set associated with the block into which the pair of blocks is merged, consisting of a further block number work array, a further local item value work array, and a further local item value designation pointer work array;
the plurality of arithmetic units operating in parallel and hierarchically to repeat, for each data item, the merge process until the blocks have been merged into a single final block, and transferring the resulting final block number work array, final local item value work array, and final local item value designation pointer work array to the global memory; and
the plurality of arithmetic units operating in parallel to, for each data item, distribute the elements of the final local item value designation pointer work array according to the block numbers specified by the corresponding elements of the final block number work array and arrange them in a predetermined order, thereby creating an item value designation pointer array that stores pointers specifying the positions at which the item values represented by the local item value numbers are stored in the final local item value work array, which coincides with the global item value array storing the item values in a predetermined order, and transferring the item value designation pointer array to the global memory.
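The hierarchical merge can be sketched in Python under one possible reading of the work arrays. In this sketch each set is a triple `(block_nos, values, ptrs)` in which entry `j` states that block `block_nos[j]` contains the value `values[ptrs[j]]`; this entry layout, the function names, and the sample data are assumptions for illustration and may differ from the arrangement used in the embodiment. The sketch runs sequentially, whereas the described arithmetic units would merge different pairs in parallel.

```python
def initial_triple(block_no, block_values):
    """Work arrays for a single block before any merging."""
    values = sorted(set(block_values))
    return [block_no] * len(values), values, list(range(len(values)))


def merge_pair(a, b):
    """Merge two triples; pointers are remapped into the merged value array."""
    a_blocks, a_vals, a_ptrs = a
    b_blocks, b_vals, b_ptrs = b
    merged, a_map, b_map = [], [], []
    i = j = 0
    while i < len(a_vals) or j < len(b_vals):
        take_a = j == len(b_vals) or (i < len(a_vals) and a_vals[i] <= b_vals[j])
        take_b = i == len(a_vals) or (j < len(b_vals) and b_vals[j] <= a_vals[i])
        value = a_vals[i] if take_a else b_vals[j]
        if take_a:                      # old index i maps to the new slot
            a_map.append(len(merged))
            i += 1
        if take_b:                      # equal values share one new slot
            b_map.append(len(merged))
            j += 1
        merged.append(value)
    ptrs = [a_map[p] for p in a_ptrs] + [b_map[p] for p in b_ptrs]
    return a_blocks + b_blocks, merged, ptrs


def merge_all(triples):
    """Merge pairs level by level until a single triple remains."""
    while len(triples) > 1:
        nxt = [merge_pair(triples[k], triples[k + 1])
               for k in range(0, len(triples) - 1, 2)]
        if len(triples) % 2:
            nxt.append(triples[-1])     # odd one out moves up unchanged
        triples = nxt
    return triples[0]


def distribute(block_nos, ptrs, n_blocks):
    """Split the final pointers back out by block number."""
    out = [[] for _ in range(n_blocks)]
    for block, ptr in zip(block_nos, ptrs):
        out[block].append(ptr)
    return out


blocks = [["Tokyo", "Osaka", "Tokyo"], ["Kyoto", "Osaka"], ["Nara"]]
final_blocks, global_values, final_ptrs = merge_all(
    [initial_triple(n, vals) for n, vals in enumerate(blocks)])
pointer_arrays = distribute(final_blocks, final_ptrs, len(blocks))
```

In this reading, the final value array plays the role of the global item value array, and each distributed list plays the role of one block's item value designation pointer array.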

According to another embodiment of the present invention, a multi-core processing apparatus that carries out the above method is provided. According to this embodiment, the multi-core processing apparatus comprises:
a plurality of arithmetic units each including a dedicated local memory;
a global memory connected to the plurality of arithmetic units;
a bus connecting the plurality of arithmetic units; and
at least one control unit connected to the global memory and the plurality of arithmetic units,
and constructs, in the global memory, tabular data that is represented as an array of records containing item values corresponding to data items and that is divided among and operated on by the plurality of arithmetic units. In this multi-core information processing apparatus,
the control unit includes means for dividing the records into blocks each containing the assigned records for which one arithmetic unit is responsible, creating a block number array that stores the block number corresponding to each record in the order of the original record position numbers in the tabular data, and storing the block number array in the global memory, and
each arithmetic unit includes:
means for operating in parallel with the other arithmetic units to create, in the local memory of the arithmetic unit, a record sequence number array that stores the original record position numbers of the assigned records in record sequence number order, and transferring it to the global memory;
means for operating in parallel with the other arithmetic units to create, in the local memory of the arithmetic unit, an item value access information array that stores, in record sequence number order, item value access information for accessing the item values contained in the assigned records, and transferring it to the global memory; and
means for operating in parallel with the other arithmetic units to expand, for each data item, the item values in the local memory of the arithmetic unit so that the item values contained in the assigned records can be accessed using the item value access information, and transferring the expanded item values to the global memory.

In the above multi-core processing apparatus,
the control unit further includes:
means for referring to the block number array in the global memory to determine the block number of the block containing a given record and the arithmetic unit responsible for the given record; and
means for notifying the determined arithmetic unit of the record sequence number of the given record,
and the arithmetic unit notified of the record sequence number further includes:
means for transferring the record sequence number array and the item value access information array relating to its assigned records from the global memory to its local memory;
means for identifying, in the transferred record sequence number array, the position at which the notified record sequence number is stored;
means for identifying, in the transferred item value access information array, the item value access information specified by the identified position; and
means for obtaining from the global memory, for each data item, the item value specified by the identified item value access information, and transferring the obtained item value to the global memory.

また、上記マルチコア型処理装置において、
各演算ユニットが、
他の演算ユニットと並列的に動作して、データ項目毎に上記項目値を各演算ユニット内の上記ローカルメモリに展開し、上記展開された項目値を上記グローバルメモリに格納する手段と、
他の演算ユニットと並列的に動作して、データ項目毎に、単一のブロックに含まれる上記項目値を上記グローバルメモリから上記ローカルメモリへ転送し、上記単一のブロックに含まれる項目値のうちの一意の値を所定の順序で格納するローカル項目値作業配列、及び、上記単一のブロックに含まれる上記担当レコードの上記原始レコード位置番号の順番に、上記担当レコードに含まれる項目値が上記ローカル項目値作業配列中に格納されている位置を指定するローカル項目値番号を格納するローカル項目値番号配列を上記ローカルメモリ上に作成し、上記グローバルメモリへ転送する手段と、
他の演算ユニットと並列的に動作して、データ項目毎に、１対のブロックに関連した、上記ブロックに含まれる上記項目値のうちの一意の値に対応する上記ブロック番号を格納したブロック番号作業配列と、上記ローカル項目値作業配列と、上記ブロックに含まれる上記項目値が上記ローカル項目値作業配列中で格納されている位置を指定するポインタを格納するローカル項目値指定ポインタ作業配列とからなる１対の組から、上記１対のブロックがマージされたブロックに関連した、さらなるブロック番号作業配列と、さらなるローカル項目値作業配列と、さらなるローカル項目値指定ポインタ作業配列とからなる組を作成するマージ処理を実行する手段と、
他の演算ユニットと並列的かつ階層的に動作して、データ項目毎に、最終的な１個のブロックにマージされるまで上記マージ処理を繰り返し、得られた最終的なブロック番号作業配列と、最終的なローカル項目値作業配列と、最終的なローカル項目値指定ポインタ作業配列とを上記グローバルメモリへ転送する手段と、
In the multi-core processing apparatus, each arithmetic unit further includes:
means for operating in parallel with the other arithmetic units to expand the item values of each data item in the local memory of the arithmetic unit and to store the expanded item values in the global memory;
means for operating in parallel with the other arithmetic units to transfer, for each data item, the item values included in a single block from the global memory to the local memory, to create in the local memory (i) a local item value work array that stores the unique values among the item values included in the single block in a predetermined order and (ii) a local item value number array that stores, in the order of the source record position numbers of the assigned records included in the single block, local item value numbers each designating the position at which the item value of an assigned record is stored in the local item value work array, and to transfer the local item value number array to the global memory;
means for operating in parallel with the other arithmetic units to execute, for each data item, a merge process that takes, for each of a pair of blocks, a set consisting of a block number work array that stores the block numbers corresponding to the unique values among the item values included in the block, the local item value work array, and a local item value designation pointer work array that stores pointers designating the positions at which the item values included in the block are stored in the local item value work array, and that creates from the pair of sets a set consisting of a further block number work array, a further local item value work array, and a further local item value designation pointer work array for the block into which the pair of blocks is merged;
means for operating in parallel and hierarchically with the other arithmetic units to repeat the merge process for each data item until the blocks are merged into one final block, and to transfer the resulting final block number work array, final local item value work array, and final local item value designation pointer work array to the global memory; and
means for operating in parallel with the other arithmetic units to create, for each data item, an item value designation pointer array by distributing the elements of the final local item value designation pointer work array to the block numbers designated by the corresponding elements of the final block number work array and arranging them in a predetermined order, and to transfer the item value designation pointer array to the global memory, the item value designation pointer array storing pointers that designate the positions at which the item values represented by the local item value numbers are stored in the final local item value work array, which coincides with the global item value array storing the item values in a predetermined order.
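The per-block step recited above (building a local item value work array and a local item value number array for a single block) can be illustrated with a short Python sketch. This is a minimal sequential sketch, not the parallel implementation described in the claims; the function name `compile_block` and the sample item values are assumptions introduced here for illustration.

```python
def compile_block(item_values):
    """Return (local_vl, local_vno) for the item values of one block.

    local_vl  : unique item values of the block in ascending order
                (plays the role of the local item value work array)
    local_vno : for each assigned record, the position of its item
                value in local_vl (the local item value number array)
    """
    local_vl = sorted(set(item_values))
    position = {value: i for i, value in enumerate(local_vl)}
    local_vno = [position[value] for value in item_values]
    return local_vl, local_vno


# Hypothetical item values of the four assigned records of one block.
block_values = ["West", "South", "West", "North"]
local_vl, local_vno = compile_block(block_values)
print(local_vl)   # ['North', 'South', 'West']
print(local_vno)  # [2, 1, 2, 0]
```

Each record's item value can be recovered as `local_vl[local_vno[k]]`, so the pair of arrays carries the same information as the block's original column.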

Furthermore, according to another embodiment of the present invention, there is provided a computer program product that is loaded into a computer comprising:
a plurality of arithmetic units each including a dedicated local memory;
a global memory connected to the plurality of arithmetic units;
a bus connecting the plurality of arithmetic units; and
at least one control unit connected to the global memory and the plurality of arithmetic units,
and that causes the computer to execute a method of constructing, in the global memory, tabular data that is represented as an array of records including item values corresponding to data items and that is operated on in a shared manner by the plurality of arithmetic units.

Furthermore, according to another embodiment of the present invention, there is provided a storage medium on which is recorded a computer program that is loaded into a computer comprising:
a plurality of arithmetic units each including a dedicated local memory;
a global memory connected to the plurality of arithmetic units;
a bus connecting the plurality of arithmetic units; and
at least one control unit connected to the global memory and the plurality of arithmetic units,
and that causes the computer to execute a method of constructing, in the global memory, tabular data that is represented as an array of records including item values corresponding to data items and that is operated on in a shared manner by the plurality of arithmetic units.

According to at least one embodiment of the present invention, tabular data is represented by a group of data that is divided so that each part fits in the local memory of an arithmetic unit of the multi-core processor, and a group of data that can be accessed sequentially in a predetermined order. This allows each arithmetic unit to access the global memory efficiently. Therefore, according to at least one embodiment of the present invention, large-scale tabular data can be processed at high speed in a computer equipped with a multi-core processor.

Schematic diagram of a multi-core processing apparatus according to an embodiment of the present invention.
Schematic diagram of a computer system according to an embodiment of the present invention.
Diagram showing an example of tabular data for explaining the data management mechanism underlying an embodiment of the present invention.
Explanatory diagram of the basic data management mechanism underlying an embodiment of the present invention.
Explanatory diagram of a data structure for a multi-core information processing apparatus according to an embodiment of the present invention.
Explanatory diagram of a data structure for a multi-core information processing apparatus according to an embodiment of the present invention.
Explanatory diagram of a data structure for a multi-core information processing apparatus according to an embodiment of the present invention.
Explanatory diagram of a data structure for a multi-core information processing apparatus according to an embodiment of the present invention.
Flowchart of a method for constructing a data structure for a multi-core processing apparatus in a global memory according to an embodiment of the present invention.
Flowchart of an item value acquisition method for the data structure for a multi-core processing apparatus according to an embodiment of the present invention.
Schematic flowchart of compile processing according to an embodiment of the present invention.
Explanatory diagram of order information creation processing according to an embodiment of the present invention.
Explanatory diagram of order information creation processing according to an embodiment of the present invention.
Schematic diagram of in-block compile processing according to an embodiment of the present invention.
Schematic diagram of in-block compile processing according to an embodiment of the present invention.
Schematic diagram of in-block compile processing according to an embodiment of the present invention.
Schematic diagram of one example of in-block compile processing.
Explanatory diagram of the initialization process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the first-stage merge process of in-block compile processing.
Explanatory diagram of the second-stage merge process of in-block compile processing.
Explanatory diagram of the second-stage merge process of in-block compile processing.
Explanatory diagram of the second-stage merge process of in-block compile processing.
Explanatory diagram of the second-stage merge process of in-block compile processing.
Explanatory diagram of the second-stage merge process of in-block compile processing.
Explanatory diagram of the second-stage merge process of in-block compile processing.
Explanatory diagram of the third-stage merge process of in-block compile processing.
Explanatory diagram of the third-stage merge process of in-block compile processing.
Explanatory diagram of the third-stage merge process of in-block compile processing.
Explanatory diagram of the third-stage merge process of in-block compile processing.
Explanatory diagram of the value list creation process of in-block compile processing.
Schematic diagram of the merge process in inter-block compile processing according to an embodiment of the present invention.
Schematic diagram of the merge process in inter-block compile processing according to an embodiment of the present invention.
Schematic diagram of the merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the result of in-block compile processing to which inter-block compile processing according to an embodiment of the present invention is applied.
Explanatory diagram of the result of in-block compile processing to which inter-block compile processing according to an embodiment of the present invention is applied.
Explanatory diagram of the first-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the first-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the first-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the second-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the second-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the second-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the second-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Diagram explaining the result of the third-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Diagram explaining the result of the third-stage merge process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the distribution process in inter-block compile processing according to an embodiment of the present invention.
Diagram explaining the result of the distribution process in inter-block compile processing according to an embodiment of the present invention.
Explanatory diagram of the block grouping process in inter-block compile processing according to an alternative embodiment of the present invention.
Diagram explaining the result of the block grouping process in inter-block compile processing according to an alternative embodiment of the present invention.

Explanation of symbols

100 Multi-core processing apparatus
101 Multi-core processor chip
110, 120, 130, 140 Arithmetic unit
111, 121, 131, 141 Core
112, 122, 132, 142 Local memory
150 In-chip bus
160, 161, 162, 163 Bus
170, 171, 172, 173 Global memory
200 Computer system
202 Multi-core processing apparatus
210 CPU
212 RAM
214 ROM
216 Fixed storage device
218 CD-ROM
220 CD-ROM driver
222 I/F
224 Input device
226 Display device
228 Bus
500 Tabular data
501 Data item "School"
502 Data item "Age"
510 Record 0
511 Record 14
520, 521, ..., 527 Block
530 Order information
531 Item information "School"
532 Item information "Age"
540 Block number array
551-0, 551-1, ..., 551-7 Record order number array
552-0, 552-1, ..., 552-7 Item value access information array
560-0, 560-1, ..., 560-7 Block information "School"
561-0, 561-1, ..., 561-7 Local item value number array "School"
562-0, 562-1, ..., 562-7 Item value designation pointer array "School"
570 Global item value array "School"
580-0, 580-1, ..., 580-7 Block information "Age"
581-0, 581-1, ..., 581-7 Local item value number array "Age"
582-0, 582-1, ..., 582-7 Item value designation pointer array "Age"
590 Global item value array "Age"

Hereinafter, various modes for carrying out the present invention will be described in detail with reference to the drawings.

[Multi-core processing apparatus]
First, a multi-core processing apparatus that implements data processing according to an embodiment of the present invention will be described. FIG. 1 is a schematic diagram of an embodiment of the multi-core processing apparatus. In the multi-core processing apparatus 100, a plurality of arithmetic units 110, 120, 130, 140 (for example, two, four, or eight; four in this example) are provided on a multi-core processor chip 101. Each arithmetic unit 110, 120, 130, 140 includes a core 111, 121, 131, 141 for data processing and a local memory 112, 122, 132, 142 dedicated to that core. The arithmetic units 110, 120, 130, 140 are connected to one another by an in-chip bus 150, which is preferably a ring bus. Because the arithmetic units are connected by the in-chip bus 150, they can exchange data at high speed. Furthermore, the arithmetic units 110, 120, 130, 140 are connected to global memories 170, 171, 172, 173 external to the chip 101 via buses 160, 161, 162, 163 that support DMA transfer.

The local memories 112, 122, 132, 142 in the chip each have a storage capacity of, for example, about 256 KB (kilobytes), whereas the global memories 170, 171, 172, 173 are large-capacity memories of several tens of GB (gigabytes). In the figure, the global memories 170, 171, 172, 173 are drawn separately. This indicates that, because bus performance would become a bottleneck if every core accessed the global memory over a single bus, each core is provided with a dedicated memory interface (not shown) through which it accesses the external global memory. Of course, even with such a configuration, the global memory can be managed so that it appears as one logically contiguous memory as a whole, as in a NUMA (non-uniform memory access) scheme. In an alternative embodiment, the arithmetic units are connected via a single bus to a physically unified external global memory.

Furthermore, in a processor such as the Cell Broadband Engine™ mentioned above, a general-purpose processor core and arithmetic processor cores are mounted on one chip. The general-purpose processor core can control the operation of the plurality of arithmetic processor cores. Accordingly, the multi-core processor preferably includes a control unit such as a general-purpose processor core; however, the control unit need not be mounted on the chip and may be provided outside the chip.

The control unit and the arithmetic units, or the arithmetic units themselves, can communicate with one another using, for example, a mailbox or a signal mechanism.

[Computer system configuration]
FIG. 2 is a schematic diagram of a computer system 200 that manipulates tabular data according to an embodiment of the present invention. The computer system 200 includes a multi-core processing apparatus 202, such as the one shown in FIG. 1, in which tabular data represented as an array of records including item values corresponding to data items is operated on in a shared manner by a plurality of arithmetic units. As shown in FIG. 2, the computer system 200 further includes a CPU 210 that controls the entire system and its individual components by executing programs; a memory 212, such as a RAM (Random Access Memory), that stores work data and the like; a ROM (Read Only Memory) 214 that stores programs and the like; a fixed storage device 216 such as a hard disk; a CD-ROM driver 220 for accessing a CD-ROM 218; an interface (I/F) 222 connected to the CD-ROM driver 220 and to an external terminal leading to an external network or the like (not shown); an input device 224 such as a keyboard and a mouse; and a display device 226 such as a computer monitor. The multi-core processing apparatus 202, the CPU 210, the RAM 212, the ROM 214, the fixed storage device 216, the I/F 222, the input device 224, and the display device 226 are connected to one another via a bus 228.

A program that causes the multi-core processing apparatus 202 and the CPU 210 of the computer system 200 to manipulate tabular data may be stored on the CD-ROM 218 and read by the CD-ROM driver 220, or may be stored in the ROM 214 in advance. A program once read from the CD-ROM 218 may also be stored in a predetermined area of the fixed storage device 216. Alternatively, the program may be supplied from outside via a network (not shown), the external terminal, and the I/F 222.

The multi-core processor system according to an embodiment of the present invention is realized by causing the computer system 200 to execute a program for manipulating tabular data.

In the computer system 200 shown in FIG. 2, the CPU 210 is provided in addition to the multi-core processing apparatus 202 and controls the entire system and its individual components. However, the present invention is not limited to this embodiment; in an alternative embodiment, a control unit included in the multi-core processing apparatus 202 controls the entire system and its individual components.

[Data management mechanism based on information blocks]
FIG. 3 shows an example of tabular data used to explain the data management mechanism underlying an embodiment of the present invention. Using the data management mechanism proposed in International Publication No. WO 00/10103 mentioned above, this tabular data is stored in a computer as the data structure shown in FIG. 4. Note that this data structure is a data structure for tabular data placed in the memory of a computer, and was proposed in order to search, sort, and aggregate large-scale tabular data using the hardware resources, in particular the processor and memory, of a commercially available computer such as a personal computer.

Note that this document clearly distinguishes between "information representing the position at which a record is stored in the original tabular data" (the source record position number) and "information representing the order in which records are arranged" (the record order number). Every record has a source record position number associated with it. The source record position number is virtual information used to identify an individual record including item values corresponding to data items. In general, the records of tabular data are not always arranged in the order of their source record position numbers. For example, if the original tabular data is sorted in ascending order of the item values of some item, the order of the records in the resulting tabular data differs from the order of the records in the original tabular data. The records in the original tabular data may, however, happen to be arranged in the order of their source record position numbers, in which case the source record position numbers and the record order numbers initially coincide.

As shown in FIG. 4, the order number of each record of the tabular data (the record order number) is associated with the source record position number by a record order designation array 401 (hereinafter abbreviated as "OrdSet"). The record order designation array 401 stores source record position numbers in the order of the record order numbers. In the example of FIG. 4, the records are arranged in the order of their source record position numbers.
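The distinction between source record position numbers and record order numbers can be made concrete with a small Python sketch. It is only an illustration under assumed data: the `ages` column and its values are hypothetical, and `ord_set` stands in for the array OrdSet.

```python
# Hypothetical "Age" column, indexed by source record position number.
ages = [12, 8, 11, 10]

# Initially the records are in source order, so OrdSet[i] == i:
# record order number i maps to source record position number i.
ord_set = list(range(len(ages)))

# Sorting the table by age does not move any record data. Only the
# mapping from record order number to source record position changes.
sorted_ord_set = sorted(ord_set, key=lambda src_pos: ages[src_pos])

print(ord_set)         # [0, 1, 2, 3]
print(sorted_ord_set)  # [1, 3, 2, 0]
```

After the sort, record order number 0 refers to source record position 1 (age 8), while the record stored at source position 0 (age 12) now has record order number 3.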

Here, the array notation used in this specification is explained. In general, an element of an array A with subscript i is written A[i]. In the drawings, each element A[i] of an array is shown inside a region enclosed by solid lines, and the boundary between element A[i] and element A[i+1] is shown by a dotted line. The subscript i of element A[i] is shown to the left of the element. Array subscripts are integers starting from 0.

Returning to FIG. 4, for gender, the source record position number corresponding to record order number 0 of the tabular data is found to be "0" from OrdSet[0]. The actual gender value for the record whose source record position number is "0", that is, "male" or "female", can be obtained by referring to an item value number array 402 (hereinafter abbreviated as "VNo"), which is an array of pointers into an item value array 403 (hereinafter abbreviated as "VL"), a value list in which the actual values are sorted in a predetermined order (for example, ascending or descending). The pointer array 402 stores, in the order of the source record position numbers stored in the array OrdSet 401, pointers to elements of the actual value list 403. Thus, the gender item value corresponding to record "0" of the tabular data can be obtained by (1) taking the source record position number 0 corresponding to record order number 0 from the array OrdSet 401, (2) taking the element "1" corresponding to source record position number 0 from the pointer array 402 to the value list, and (3) taking, from the value list 403, the element "female" pointed to by the element "1" taken from the pointer array 402.
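The three-step lookup described above can be sketched in Python as follows. Only the facts that VNo[0] is 1 and that VL[1] is "female" come from the text; the remaining array contents and the function name `item_value` are assumptions made for illustration.

```python
# OrdSet: record order number -> source record position number.
ord_set = [0, 1, 2]
# VNo: source record position number -> position in the value list.
vno = [1, 0, 1]
# VL: value list in a predetermined order (this order is assumed here).
vl = ["male", "female"]


def item_value(order_no):
    """Return the item value of the record with the given order number."""
    src_pos = ord_set[order_no]  # step (1): OrdSet lookup
    value_no = vno[src_pos]      # step (2): pointer array lookup
    return vl[value_no]          # step (3): value list lookup


print(item_value(0))  # female
```

Because every lookup is a plain array index, the cost of retrieving one item value does not depend on the number of records.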

Item values can be obtained in the same way for the other records, and likewise for age and height.

In this way, tabular data is represented by combinations of a value list VL and a pointer array VNo to the value list; such a combination is specifically called an "information block." In FIG. 4, the information blocks for gender, age, and height are shown as information blocks 408, 409, and 410, respectively.

If a single computer has a single memory (physically there may be several, but it is a single memory in the sense that it is placed in and accessed through a single address space), that computer need only store in the memory the ordered-set array OrdSet and the value list VL and pointer array VNo constituting each information block. In the various embodiments of the present invention, however, tabular data is manipulated by a multi-core processing apparatus made up of a plurality of arithmetic units each having a small dedicated local memory. A new mechanism for holding tabular data is therefore proposed in order to achieve efficient parallel processing.

[Data structure for multi-core processing apparatus]
Next, a data structure for a multi-core processing apparatus according to an embodiment of the present invention will be described. FIGS. 5A to 5D are explanatory diagrams of the data structure according to an embodiment of the present invention. FIG. 5A shows an example of the original tabular data. The tabular data 500 shown in FIG. 5A is represented as an array of records including item values corresponding to a data item 501 named "School" (for example, "West", "South", "North", and "East") and item values corresponding to a data item 502 named "Age" (for example, "12", "8", "11", and "10"). To simplify the explanation, it is assumed that the records of this original tabular data 500 are arranged in the order of their source record position numbers, that is, that the source record position number identifying each record coincides with the record order number indicating the record's position in the order. The record 510 at the head of the array is the record assigned source record position number 0. In record 510, the item value of the data item "School" is "West" and the item value of the data item "Age" is "12".

In the data structure for a multi-core processing apparatus according to an embodiment of the present invention, the records of this tabular data are divided into blocks 520, 521, ..., 527 identified by block numbers (in this example, eight block numbers from 0 to 7). Initially, each block corresponds to the arithmetic unit of the multi-core processing apparatus that is in charge of processing the records included in that block.

The data structure for a multi-core processing apparatus consists of information on ordering (order information), which associates the order of the records (that is, the record order numbers) with the storage locations of item values in the data structure, and information on the item values of each data item (item information). The order information corresponds functionally to the record order designation array OrdSet in the data management mechanism underlying the embodiment described above, and the item information likewise corresponds to the information block. Both the order information and the item information are held in the global memory, and parts of them are transferred to the local memory of each arithmetic unit as needed. FIG. 5B shows the order information 530, and FIGS. 5C and 5D show the item information 531 and 532 for the data item "School" and the data item "Age", respectively.

The order information 530 includes a block number array 540 in which block numbers are stored in the order of the record order numbers. In the data structure of this embodiment, the arithmetic unit responsible for operating on a record is determined for each record. The records are therefore divided into the groups of records handled by each arithmetic unit, that is, the assigned records, and a block number is allocated to each group of assigned records. If the block number array is BlkNo and the record order number is i, then BlkNo[i] is the block number of the block to which the record with record order number i belongs. The block number array 540 is an integer array whose size equals the number of records. The block number array 540 is data of the second type. For example, in FIGS. 5A to 5D, the records with record order numbers 0 to 3 are included in the block with block number 0, the records with record order numbers 4 to 7 are included in the block with block number 1, and so on.
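The block number array can be illustrated with a minimal Python sketch. It assumes contiguous blocks of equal size and a total of 32 records (8 blocks of 4 records, inferred from the example rather than stated); the function name `build_block_numbers` is hypothetical.

```python
def build_block_numbers(num_records, num_blocks):
    """Return BlkNo, where BlkNo[i] is the block number of record order number i.

    Assumption: records are split into contiguous, equally sized blocks.
    """
    block_size = -(-num_records // num_blocks)  # ceiling division
    return [i // block_size for i in range(num_records)]


# Assumed table size: 32 records divided into 8 blocks.
BlkNo = build_block_numbers(num_records=32, num_blocks=8)
print(BlkNo[0:8])   # records 0-3 fall in block 0, records 4-7 in block 1
print(BlkNo[14])    # block number of the record with record order number 14
```

With these assumed sizes, the record with record order number 14 falls in block 3, which agrees with the worked acquisition example later in the description.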

According to the data structure of this embodiment, all records are divided into assigned records corresponding to the blocks, so each block needs information that associates each of its assigned records with a record of the original tabular data. The order information 530 therefore includes, for each block, a record order number array 551-1, 551-2, ..., 551-7 in which the record order numbers of the assigned records are stored in record order number order. The record order number array is sometimes referred to below as GOrd. For example, in FIGS. 5A to 5D, the record order numbers of the assigned records belonging to the block with block number 0 are 0, 1, 2, 3, those belonging to the block with block number 1 are 4, 5, 6, 7, and so on. Each record order number array is an integer array with the same size as the number of assigned records belonging to its block. Because the record order number arrays are divided into sizes that fit in the local memory of each arithmetic unit, they are data of the first type. A record order number array is therefore transferred from the global memory to the local memory in each arithmetic unit as necessary.

Furthermore, since the item values included in each record are held in the form of the item information described later, each arithmetic unit needs to obtain address information for accessing the item values included in its assigned records, that is, item value access information. According to the data structure of this embodiment, the order information 530 therefore further includes, for each block, an item value access information array 552-1, 552-2, ..., 552-7 in which the item value access information of the assigned records is stored in record order number order. The item value access information array is an integer array whose size matches the number of assigned records. It is also data of the first type and is transferred from the global memory to the local memory in each arithmetic unit as necessary. The item value access information array is sometimes called LOrd. For example, in FIGS. 5A to 5D, the item values of the record with record order number 0, which is included in the block with block number 0, can be accessed within that block using item value access information 0, and the item values of the record with record order number 5, which is included in the block with block number 1, can be accessed within that block using item value access information 1.

Next, according to this embodiment, item information is held separately for each data item. For example, in FIGS. 5A to 5D, item information 531 for the data item "School" and item information 532 for the data item "Age" are constructed in the global memory. The item values included in the assigned records of each block are held in the global memory so that each arithmetic unit can access them, for each data item, using the item value access information array, and are transferred from the global memory to the local memory in each arithmetic unit as necessary. The item values themselves are constructed in the global memory, for each data item, as a global item value array in which the unique item values are stored in a predetermined order (ascending or descending). For example, in FIGS. 5A to 5D, the item values of the data item "School" are held in the global memory as the global item value array 570, and the item values of the data item "Age" are held in the global memory as the global item value array 590. The global item value array is data of the second type. Because it stores the item values themselves, the global item value array may have any of various data types, such as integer, floating point, or character string.

The item information is structured so that an item value stored in the global item value array can be identified using the item value access information associated with an assigned record. To this end, the item information includes, for each data item, a local item value number array and an item value designation pointer array. The local item value number array stores, in source record position number order, the local item value numbers that identify the item values included in the assigned records. The item value designation pointer array stores, in local item value number order, the item value designation pointers that designate the positions in the global item value array at which the item values represented by the local item value numbers are stored. A local item value number array and an item value designation pointer array are provided for each block. The local item value number array is an integer array whose size matches the number of assigned records; it is data of the first type and is sometimes called VNo. The item value designation pointer array is an integer array whose size equals the number of unique item values included in the assigned records; it is data of the first type and is sometimes called LVL. Because both arrays are data of the first type, they are constructed in the global memory for each block and transferred from the global memory to the local memory in each arithmetic unit as necessary.

In the example of FIGS. 5A to 5D, the item information 531 for the data item "School" includes local item value number arrays 561-0, 561-1, ..., 561-7, item value designation pointer arrays 562-0, 562-1, ..., 562-7, and the global item value array 570. The local item value number array and the item value designation pointer array are divided by block. In the figure, for example, the value of the first element of the local item value number array VNo is "1". This means that the item value number of the item value included in the record designated by item value access information "0" is "1". By referring to the second element of the item value designation pointer array LVL, that is, LVL[1], the item value whose item value number is "1" is found to be the third element of the global item value array GVL, that is, GVL[2]. The same applies to the other blocks and to the other data items.
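The chain of references from item value access information through VNo and LVL to GVL can be sketched in Python as follows. The array contents are illustrative values chosen only so that VNo[0] = 1 and LVL[1] = 2 as in the example above; they are not taken from FIG. 5, and the function name `item_value` is hypothetical.

```python
# Illustrative arrays for one block and one data item (assumed values).
GVL = ["East", "North", "South", "West"]  # unique item values, ascending order
VNo = [1, 0, 1, 2]  # local item value number for each assigned record
LVL = [0, 2, 3]     # local item value number -> position in GVL


def item_value(access_info):
    """Resolve item value access information to the item value itself."""
    local_number = VNo[access_info]  # number of the value within the block
    pointer = LVL[local_number]      # position of that value in GVL
    return GVL[pointer]


# Access information 0 -> VNo[0] = 1 -> LVL[1] = 2 -> GVL[2]
print(item_value(0))
print([item_value(a) for a in range(len(VNo))])
```

Because LVL has one entry per value that actually occurs in the block, VNo only needs to distinguish the values present in that block, while GVL stores each value once for the whole table.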

As described above, according to the data structure of this embodiment, the item values included in the records belonging to each block are represented by the local item value numbers assigned to the item values within the block, the item value designation pointers that associate the local item value numbers with item values in the global item value array, and the global item value array itself.

[Construction of the data structure for a multi-core processing apparatus]
FIG. 6 is a flowchart of a method for constructing, in the global memory, a data structure for a multi-core processing apparatus according to an embodiment of the present invention. According to this method, the control unit of the multi-core processing apparatus divides the records of the tabular data into blocks containing the records assigned to each arithmetic unit, creates a block number array that stores the block number corresponding to each record in the order of the records in the tabular data, and stores it in the global memory (step 602). Next, the arithmetic units, operating in parallel, each create in local memory a record order number array that stores the record order numbers of the assigned records in record order number order, and transfer it to the global memory (step 604). This is easily achieved by, for example, having the control unit notify each arithmetic unit of the first record order number of its assigned records and the number of its assigned records. The arithmetic units then operate in parallel to create, each in its local memory, an item value access information array that stores, in record order number order, the item value access information for accessing the item values included in the assigned records, and transfer it to the global memory (step 606). Each arithmetic unit only needs to store sequential numbers starting from 0, as many as the number of its assigned records, in the item value access information array. Finally, the arithmetic units operate in parallel to expand the item values of each data item in local memory so that the item values included in the assigned records can be accessed using the item value access information, and transfer the expanded item values to the global memory (step 608). For example, the arithmetic units cooperate to create in local memory, for each data item, a local item value number array that stores the local item value numbers assigned to the item values within the block in source record position number order, and an item value designation pointer array that stores, in local item value number order, the item value designation pointers associating the local item value numbers with item values in the global item value array. They transfer these arrays to the global memory and create in the global memory a global item value array in which the unique item values included in all records are stored in a predetermined order (ascending or descending).
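Steps 602 to 606 can be sketched in Python as follows. This is only an illustration under stated assumptions: worker threads stand in for arithmetic units, ordinary Python lists stand in for both local and global memory, blocks are contiguous and equally sized, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def arithmetic_unit_order_arrays(first_record, count):
    """Work of one arithmetic unit, given the notification from the control unit."""
    gord = list(range(first_record, first_record + count))  # step 604
    lord = list(range(count))                               # step 606
    return gord, lord


def build_order_information(num_records, block_size):
    # Step 602: the control unit creates the block number array.
    blk_no = [i // block_size for i in range(num_records)]
    num_blocks = -(-num_records // block_size)  # ceiling division
    # The control unit notifies each unit of its first record and record count.
    firsts = [b * block_size for b in range(num_blocks)]
    counts = [min(block_size, num_records - f) for f in firsts]
    # Steps 604 and 606: the units build their arrays in parallel.
    with ThreadPoolExecutor(max_workers=max(1, num_blocks)) as pool:
        per_block = list(pool.map(arithmetic_unit_order_arrays, firsts, counts))
    gord = [g for g, _ in per_block]
    lord = [l for _, l in per_block]
    return blk_no, gord, lord


blk_no, gord, lord = build_order_information(num_records=32, block_size=4)
print(gord[1], lord[1])
```

Each unit needs only two integers from the control unit, so the units require no coordination with one another in these steps.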

Note that the order of step 602 and step 604 can be interchanged. If the number of assigned records is relatively small and each arithmetic unit can hold the record order number array and the item value access information array in local memory at the same time, each arithmetic unit may create both arrays together and then transfer them to the global memory. Alternatively, the control unit may create the record order number array and the item value access information array for each block directly in the global memory.

The above description assumes that the source record position numbers and the record order numbers match, but they need not. For example, the data structure for a multi-core processing apparatus can be constructed even if the records of the original tabular data have been sorted so that the source record position numbers no longer match the initial record order numbers. Specifically, the control unit only needs to divide the records of the tabular data into blocks containing the records assigned to each arithmetic unit, create a block number array that stores the block number corresponding to each record in the order of the record order numbers of the records of the tabular data, and store it in the global memory.

[Item value acquisition processing]
Next, acquisition of item values of tabular data held in the data structure for a multi-core processing apparatus according to an embodiment of the present invention will be described. FIG. 7 is a flowchart of an item value acquisition method according to an embodiment of the present invention. As described with reference to FIGS. 5A to 5D, item values are held in the global memory, for each data item, in the form of item information. The control unit can therefore easily acquire, for example, the item values included in a single record. The control unit is not, however, suited to acquiring the item values included in many records at once. This embodiment therefore considers the situation in which many arithmetic units operate simultaneously to acquire the item values included in many records at once. Even in this situation, the basic operation of item value acquisition is a process in which one particular arithmetic unit acquires the item values included in one of its assigned records. To allow many arithmetic units to operate simultaneously, each arithmetic unit transfers the information needed to acquire item values from the global memory to its local memory and performs the various operations in local memory.

As shown in FIG. 7, the control unit first refers to the block number array in the global memory and determines the block number of the block containing the given record whose item values are to be acquired, and the arithmetic unit responsible for that block (step 702). The control unit then notifies the arithmetic unit identified by the determined block number of the record order number of the given record (step 704). The arithmetic unit transfers the record order number array and the item value access information array for its assigned records from the global memory to its local memory (step 706). Next, the arithmetic unit locates the position at which the notified record order number is stored in the record order number array transferred to local memory (step 708). The arithmetic unit then identifies, in the item value access information array transferred to local memory, the item value access information designated by that position (step 710). Finally, for each data item, the arithmetic unit acquires, from the global item value array held in the global memory, the item value designated by the identified item value access information, and transfers the acquired item value to the global memory (step 712).

An example of data acquisition according to this embodiment will now be described in more detail using the data structure shown in FIGS. 5A to 5D. Consider, for example, acquiring the item values of the record with record order number 14, indicated by reference numeral 511 in FIGS. 5A to 5D. The control unit reads the element of the block number array 540 at subscript 14, that is, BlkNo[14] = 3. This shows that the block number corresponding to the target record is 3. The control unit therefore notifies the arithmetic unit responsible for the records included in block number 3 of the record order number of the target record, 14. The correspondence between block numbers and arithmetic units is managed by, for example, the control unit. The arithmetic unit notified of record order number 14 then loads the record order number array 551-3 for the block with block number 3 from the global memory into local memory. In an alternative embodiment, the control unit transfers the record order number array 551-3 for the block with block number 3 from the global memory to the local memory of the arithmetic unit.

Next, the arithmetic unit searches the record order number array 551-3 for the position at which record order number 14 is stored. This storage position is also called the rank of the target record among the assigned records of this arithmetic unit. In this embodiment the record order number array is in ascending order, so the storage position can be found efficiently by, for example, the well-known binary search (bisection) method. In this example, the storage position is 2.

Next, the arithmetic unit transfers the item value access information array 552-3 for the block with block number 3 from the global memory to local memory. In an alternative embodiment, the item value access information array 552-3 is transferred from the global memory to the local memory of the arithmetic unit by the control unit. In yet another embodiment, the item value access information array 552-3 is transferred from the global memory to the local memory of the arithmetic unit at the same time as the record order number array 551-3. Although this is not repeated below, a transfer of data from the global memory to local memory may be performed by either the control unit or the arithmetic unit unless otherwise specified. Furthermore, when the local memory of the arithmetic unit has capacity to spare relative to the amount of data to be transferred, two or more pieces of data may be transferred from the global memory to local memory together. The arithmetic unit acquires the value stored in the item value access information array 552-3 at the position given by the rank of the target record, that is, LOrd[2] = 2.

Next, the arithmetic unit loads the local item value number array 561-3 for the data item "School" and the block with block number 3 from the global memory into local memory. The arithmetic unit uses the previously acquired value, that is, LOrd[2] = 2, as the offset for acquiring the local item value number from the local item value number array 561-3. That is, the arithmetic unit acquires VNo[2] = 0 from the local item value number array 561-3.

Next, the arithmetic unit transfers the item value designation pointer array 562-3 for the data item "School" and the block with block number 3 from the global memory to local memory. From the item value designation pointer array 562-3, the arithmetic unit acquires the item value designation pointer designated by the previously acquired local item value number 0, that is, LVL[0] = 1.

Finally, for the data item "School", the arithmetic unit refers to the global item value array 570 held in the global memory and acquires the item value included in the target record, that is, GVL[1] = "North". In an alternative embodiment, the arithmetic unit notifies the control unit of the previously acquired item value designation pointer, 1, and the control unit acquires from the global item value array 570 the item value designated by the notified item value designation pointer.

The arithmetic unit then performs, for the data item "Age", the same operations as those performed for the data item "School", and thereby acquires the item value 9 included in the target record.
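The acquisition for the data item "School" can be traced with a short Python sketch. Only the index values quoted in the worked example (BlkNo[14] = 3, rank 2, LOrd[2] = 2, VNo[2] = 0, LVL[0] = 1, GVL[1] = "North") come from the description; the remaining array entries, the table size of 32 records, the dictionary layout, and the function name `get_item_value` are assumptions made for illustration.

```python
from bisect import bisect_left

# Assumed contents; only the entries quoted in the worked example are from the text.
BlkNo = [i // 4 for i in range(32)]              # BlkNo[14] == 3
GOrd = {3: [12, 13, 14, 15]}                     # record order numbers of block 3
LOrd = {3: [0, 1, 2, 3]}                         # LOrd[2] == 2
VNo = {"School": {3: [2, 1, 0, 2]}}              # VNo[2] == 0, other entries assumed
LVL = {"School": {3: [1, 2, 3]}}                 # LVL[0] == 1, other entries assumed
GVL = {"School": ["East", "North", "South", "West"]}  # GVL[1] == "North"


def get_item_value(record_order_number, item):
    block = BlkNo[record_order_number]             # step 702
    gord = GOrd[block]                             # step 706 (transfer to local memory)
    rank = bisect_left(gord, record_order_number)  # step 708, binary search
    if rank == len(gord) or gord[rank] != record_order_number:
        raise KeyError(record_order_number)
    access = LOrd[block][rank]                     # step 710
    local_number = VNo[item][block][access]
    pointer = LVL[item][block][local_number]
    return GVL[item][pointer]                      # step 712


print(get_item_value(14, "School"))  # North
```

The binary search relies on the record order number array being in ascending order, as stated for this embodiment.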

[Compile processing of tabular data]
Next, the compile processing that creates data for a multi-core processing apparatus from tabular data according to an embodiment of the present invention will be described. The compile processing is described below in relation to the data structure shown in FIGS. 5A to 5D. FIG. 8 is a schematic flowchart of the compile processing according to an embodiment of the present invention.

Order information creation: According to this embodiment, order information consisting of the block number array, the record order number arrays, and the item value access information arrays is first created in the global memory (step 802). As described above, the block number array is created by the control unit, and the record order number arrays and the item value access information arrays are created in parallel by the arithmetic units and transferred to the global memory.

Intra-block compilation: Next, the arithmetic units operate in parallel to create, for each data item, a local item value number array that stores local item value numbers in the order of the source record position numbers of the assigned records included in a single block, and transfer it to the global memory (step 804). At the same time, the arithmetic units also create a local item value work array that stores the unique values among the item values included in the assigned records in a predetermined order (for example, ascending or descending), and transfer it to the global memory.
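For one block and one data item, intra-block compilation can be sketched in Python as follows. The function name `compile_block` and the sample values are hypothetical, and ascending order is assumed as the predetermined order.

```python
def compile_block(values):
    """Return (VNo, wVL) for the item values of one block's assigned records.

    wVL holds the unique values in ascending order; VNo[k] is the position
    in wVL of the value of the k-th assigned record.
    """
    wvl = sorted(set(values))
    number_of = {value: number for number, value in enumerate(wvl)}
    vno = [number_of[value] for value in values]
    return vno, wvl


# Hypothetical item values of one block, in source record position order.
vno, wvl = compile_block(["South", "East", "South", "West"])
print(vno, wvl)
```

Each block is compiled from its own records only, which is why this step can run on all arithmetic units at once without communication.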

Inter-block compilation 1 (merge): Next, the arithmetic units operate in parallel and hierarchically to perform, for each data item, a merge process. Each block has an associated set consisting of a block number work array, a local item value work array, and a local item value designation pointer work array, in which pointers designating the positions at which item values are stored in the local item value work array are stored in local item value number order. From a pair of such sets associated with two blocks, the merge process creates the set of a block number work array, a local item value work array, and a local item value designation pointer work array associated with the block obtained by merging the two blocks. The arithmetic units repeat this merge process until everything has been merged into a single block, and transfer the final block number work array, the final local item value work array, and the final local item value designation pointer work array to the global memory (step 806). The final local item value work array is identical to the global item value array.
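The merge step can be sketched in Python under an assumed layout of the work arrays, since this passage does not spell the layout out. In the sketch, each set is a tuple `(blk, wvl, ptr)`: `wvl` is the sorted list of unique values, `ptr[k]` is the position in `wvl` of the k-th local item value, and `blk[k]` is the block that entry came from. The function names and sample values are hypothetical, and the hierarchy is walked sequentially rather than in parallel.

```python
def merge_sets(a, b):
    """Merge two (blk, wvl, ptr) work sets into the set of the merged block."""
    blk_a, wvl_a, ptr_a = a
    blk_b, wvl_b, ptr_b = b
    merged, map_a, map_b = [], [], []  # map_x[old position] = new position
    i = j = 0
    while i < len(wvl_a) or j < len(wvl_b):
        if j == len(wvl_b) or (i < len(wvl_a) and wvl_a[i] < wvl_b[j]):
            value = wvl_a[i]
        else:
            value = wvl_b[j]
        if i < len(wvl_a) and wvl_a[i] == value:
            map_a.append(len(merged))
            i += 1
        if j < len(wvl_b) and wvl_b[j] == value:
            map_b.append(len(merged))
            j += 1
        merged.append(value)  # a value shared by both sides is stored once
    ptr = [map_a[p] for p in ptr_a] + [map_b[p] for p in ptr_b]
    return blk_a + blk_b, merged, ptr


def merge_all(sets):
    """Merge pairwise, level by level, until a single set remains."""
    while len(sets) > 1:
        sets = [merge_sets(sets[k], sets[k + 1]) if k + 1 < len(sets) else sets[k]
                for k in range(0, len(sets), 2)]
    return sets[0]


block0 = ([0, 0, 0], ["East", "South", "West"], [0, 1, 2])
block1 = ([1, 1], ["North", "South"], [0, 1])
blk, gvl, ptr = merge_all([block0, block1])
print(blk, gvl, ptr)
```

Within one level of the hierarchy the pairs are independent of each other, so in the apparatus each pair could be handled by a different arithmetic unit.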

Inter-block compilation 2 (distribution): Finally, the arithmetic units operate in parallel to distribute, for each data item, the elements of the final local item value designation pointer work array among the blocks according to the block numbers given by the corresponding elements of the final block number work array, and arrange them in a predetermined order. This creates the item value designation pointer arrays, which store the pointers designating the positions at which the item values represented by the local item value numbers are stored in the final local item value work array, that is, in the global item value array. The item value designation pointer arrays are then transferred to the global memory (step 808).
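The distribution step can be sketched in Python as follows. The sketch assumes that the final work arrays are parallel lists in which `ptr[k]` is a position in the global item value array and `blk[k]` is the block that entry belongs to, with each block's entries already in local item value number order; the function name `distribute` and the sample values are hypothetical.

```python
def distribute(blk, ptr, num_blocks):
    """Split the final pointer work array into one LVL array per block."""
    lvl = [[] for _ in range(num_blocks)]
    for block_number, pointer in zip(blk, ptr):
        # Appending preserves the order of entries within each block.
        lvl[block_number].append(pointer)
    return lvl


# Hypothetical final work arrays for two blocks.
blk = [0, 0, 0, 1, 1]
ptr = [0, 2, 3, 1, 2]
print(distribute(blk, ptr, num_blocks=2))
```

After this step, each block's LVL array maps its local item value numbers directly to positions in the global item value array, and the work arrays are no longer needed.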

Through the above steps, the following are created in the global memory from the tabular data shown in FIG. 5A: the block number array, the record order number arrays, and the item value access information arrays shown in FIG. 5B; the local item value number arrays, the item value designation pointer arrays, and the global item value array for the data item "School" shown in FIG. 5C; and the local item value number arrays, the item value designation pointer arrays, and the global item value array for the data item "Age" shown in FIG. 5D.

Adopting such a data structure for a multi-core processing apparatus provides the following advantages.
Advantage 1: Memory saving
The same value is never stored more than once in the global item value array, that is, the value list, so memory is saved.
Advantage 2: High-speed computation
In inter-block sorting, the computed ranks are valid for all blocks.
Advantage 3: Reduction of global memory access through communication between arithmetic units
The first phase of inter-block sorting requires the most global memory accesses, O(n*log(n)). The global memory accesses of the other processes are O(n). Using inter-block communication reduces the amount of global memory access to exactly 1/3.

The compilation process according to an embodiment of the present invention will now be described in more detail with reference to the tabular data shown in FIGS. 5A to 5D. FIGS. 9A and 9B are explanatory diagrams of the order information creation process according to an embodiment of the present invention. The data shown in FIGS. 9A and 9B is the same as the data shown in FIGS. 5A to 5D, and the order information 530 of FIG. 9B is created from the tabular data 500 of FIG. 9A. The order information creation process is as described above.

FIGS. 10A to 10C are schematic diagrams of the intra-block compilation process according to an embodiment of the present invention. The intra-block compilation process creates, from the tabular data shown in FIG. 10A, the item information for the data item "School" and the item information for the data item "Age" shown in FIGS. 10B and 10C. As shown in the figures, the item information includes a local item value number array VNo and a local item value work array wVL. The intra-block compilation process is executed in parallel by the arithmetic units, one block at a time, for the blocks Block-0, Block-1, ..., Block-7.

An example of the intra-block compilation process for one block will now be described. FIG. 11 is a schematic diagram of an example of the intra-block compilation process. In this example, a local item value number array VNo and a local item value work array wVL are created from a five-row block School (number of records = 5) for the data item "School", that is, from an item value array School in which the item values are stored in order of original record position number. The local item value work array wVL is a list of values in which the unique item values extracted from the item value array School are stored in a predetermined order (for example, ascending or descending order). The local item value number array VNo, where i denotes the original record position number, is an array such that the relationship
    School[i] = wVL[VNo[i]]
holds between School[i], an element of the original item value array School, and the local item value work array wVL[j]. The following processing is executed by an arithmetic unit using its own local memory.
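The relationship School[i] = wVL[VNo[i]] can be checked with a short Python sketch. It is not the staged pairwise merge described in this specification: it uses Python's built-in sort and a dictionary lookup only to show what the two arrays contain. The function name `compile_block` and the five sample rows are illustrative assumptions.

```python
def compile_block(school):
    """Build (VNo, wVL) for one block using a plain sort (illustrative only)."""
    w_vl = sorted(set(school))                      # unique values, ascending
    position = {value: k for k, value in enumerate(w_vl)}
    v_no = [position[value] for value in school]    # one entry per record
    return v_no, w_vl

# Hypothetical five-row block for the data item "School".
school = ["West", "South", "West", "South", "South"]
v_no, w_vl = compile_block(school)

# Every original item value is recoverable through VNo and wVL.
assert all(school[i] == w_vl[v_no[i]] for i in range(len(school)))
print(v_no, w_vl)
```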

First, the working data is initialized. FIG. 12 is an explanatory diagram of the initialization of the intra-block compilation process. The size of the local item value number array VNo is equal to the number of rows in the block, and the array is initialized with integers starting from 0, such that VNo[i] = i (i = 0, 1, 2, ...). The conversion array TR is an array used to convert the local item value number array, and has the same size as VNo. The pointer array PTR is an array that expresses the ordering of the item values; each element of PTR represents the position of an item value in the item value array School. The pointer array PTR paired with the original item value array School represents a list of item values arranged in a predetermined order. The work pointer array wPTR is used, in combination with the original item value array School, to represent the list of item values after the pairs in the block have been merged.

The intra-block compilation process is realized by merging the pairs within the block in sequence. For example, when the block contains five records, the magnitudes of the item values are first compared within each of three pairs: the pair of the first and second records (first pair), the pair of the third and fourth records (second pair), and the fifth record alone (it has no partner, but is called the third pair for convenience). This is the first-stage merge. Next, a comparison between the third pair and the second pair, and a comparison for the first pair alone (no pair exists to be matched with the first pair), are executed. This is the second-stage merge. By repeating the merge stage by stage in this way until a single pair remains, the local item value number array VNo and the local item value work array wVL merged within the block are finally obtained.

FIGS. 13A to 13G are explanatory diagrams of the first-stage merge of the intra-block compilation process. FIG. 13A shows the first comparison for the first pair, that is, the comparison of School[PTR[0]] with School[PTR[1]]. In this example, the two values (that is, the pair) of the item value array School designated by the pointer array PTR are compared, and the subscript of the work pointer array wPTR is stored in the element of the conversion array TR located at the same position as the element of the pointer array PTR that designates the smaller value. More specifically, assuming that character strings are compared in alphabetical order, comparing
    School[PTR[0]] = School[0] = "West"
with
    School[PTR[1]] = School[1] = "South"
gives
    "West" > "South"
and therefore
    wPTR[0] = PTR[1] = 1
    TR[1] = 0
In general, with i and j being integers starting from 0, this can be written as
    IF School[PTR[i]] > School[PTR[j]]
    THEN wPTR[i] = PTR[j]
         TR[j] = i
    ELSE IF School[PTR[i]] < School[PTR[j]]
    THEN wPTR[j] = PTR[i]
         TR[i] = j
In this process, the smaller value passes the comparison, and the larger value is compared once more with the next partner.

FIG. 13B is an explanatory diagram of the second comparison for the first pair. The target of the second comparison for the first pair is School[PTR[0]], which was determined not to be the smaller value in the first comparison. Since no comparison partner remains, School[PTR[0]] is determined to be the smaller value, and
    wPTR[1] = PTR[0] = 0
    TR[0] = 1
are set.

Similarly, FIG. 13C shows the first comparison for the second pair, FIG. 13D shows the second comparison for the second pair, and FIG. 13E shows the first comparison for the third pair. The third pair does not actually form a pair and has only one comparison target, so School[PTR[4]] is determined to be the smaller value as it stands.

FIGS. 13F and 13G are explanatory diagrams of the post-processing of the first-stage merge of the intra-block compilation process. In the post-processing, the local item value number array VNo is updated first. Specifically, the local item value numbers are converted by
    VNo[i] = TR[VNo[i]]
This means that the local item value number corresponding to an item value determined to be smaller by the pair comparison is placed further forward in the local item value number array VNo. Next, the pointer array PTR is updated. Specifically, the pointer array PTR is updated by overwriting it with the work pointer array wPTR according to
    PTR[i] = wPTR[i]
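The post-processing can be traced for the first pair with a few lines of Python. The values of School, TR, and wPTR below are the ones given above for FIGS. 13A and 13B; restricting the arrays to the first pair, and the final consistency check, are assumptions made for illustration rather than statements from the specification.

```python
# First pair only: School[0] = "West", School[1] = "South".
school = ["West", "South"]
v_no = [0, 1]     # initialised as VNo[i] = i
tr = [1, 0]       # TR[0] = 1, TR[1] = 0 from the two comparisons
w_ptr = [1, 0]    # wPTR[0] = 1, wPTR[1] = 0 from the two comparisons

# Post-processing: VNo[i] = TR[VNo[i]], then PTR[i] = wPTR[i].
v_no = [tr[n] for n in v_no]
ptr = list(w_ptr)

# Observation for these values: each record still reaches its own item
# value through VNo and the updated PTR.
restored = [school[ptr[v_no[i]]] for i in range(len(school))]
print(v_no, ptr, restored)
```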

Next, the second-stage merge of the intra-block compilation process will be described. In the second-stage merge, the third pair and the second pair used in the first-stage merge form the first pair of the second stage, and the first pair of the first stage alone forms the second pair of the second stage. According to a preferred embodiment of the present invention, the second-stage pairing is thus performed in the reverse order of the first-stage pairing (that is, from the rear). The reason is to balance the processing load. Of course, pairing may instead be performed from the front, as in the first stage.

In the second-stage merge, for example, if the item values of one pair from the first-stage merge are a1 and a2 (a1 < a2) and the item values of the other pair are b1 and b2 (b1 < b2), a1 and b1 are compared first, and then, if a1 < b1, a2 and b1 are compared; in this way the ordering of a1, a2, b1, and b2 is determined. If identical values exist, such as b1 = a2, the duplicate is eliminated by selecting the one whose corresponding element of the pointer array PTR has the smaller value.
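A minimal Python sketch of such a merge with duplicate elimination follows. It represents each sorted pair as a list of pointers into the item value array, which is an assumed representation; the function name `merge_runs` and the sample block are hypothetical. The tie rule (keep the smaller pointer) follows the description above.

```python
def merge_runs(values, run_a, run_b):
    """Merge two pointer lists, each sorted by value and free of duplicates."""
    merged = []
    i = j = 0
    while i < len(run_a) and j < len(run_b):
        va, vb = values[run_a[i]], values[run_b[j]]
        if va < vb:
            merged.append(run_a[i])
            i += 1
        elif vb < va:
            merged.append(run_b[j])
            j += 1
        else:
            # Identical values: keep only the smaller pointer.
            merged.append(min(run_a[i], run_b[j]))
            i += 1
            j += 1
    merged.extend(run_a[i:])
    merged.extend(run_b[j:])
    return merged

# Hypothetical block; [1, 0], [3, 2] and [4] are the sorted first-stage pairs.
school = ["West", "South", "West", "South", "South"]
stage2 = merge_runs(school, [3, 2], [4])      # second and third pair
stage3 = merge_runs(school, [1, 0], stage2)   # first pair with the result
print(stage2, stage3)
```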

FIGS. 14A to 14F are explanatory diagrams of the second-stage merge of the intra-block compilation process. FIG. 14A shows the first comparison for the first pair of the second-stage merge. In the example shown, School[PTR[2]] = "South" from the second pair of the first stage is compared with School[PTR[4]] = "South" from the third pair. In this example the values are identical, so, with i and j (i < j) being integers starting from 0, in general,
    IF School[PTR[i]] = School[PTR[j]]
    THEN wPTR[i] = PTR[i]
         IF PTR[i] < PTR[j]
         THEN TR[i] = PTR[i]
              TR[j] = PTR[i]
         ELSE TR[i] = PTR[j]
              TR[j] = PTR[j]
and accordingly
    wPTR[2] = 3
    TR[2] = 2
    TR[4] = 2
are set.

FIG. 14B shows the second comparison for the first pair of the second-stage merge. Specifically, School[PTR[3]] = "West" is processed on its own. Since no comparison partner exists, School[PTR[3]] is determined to be the smaller value, and processing similar to that described with reference to FIG. 13B is performed.

FIG. 14C shows the first comparison for the second pair of the second-stage merge, and FIG. 14D shows the second comparison for the second pair of the second-stage merge. In neither comparison does a comparison partner exist, so School[PTR[1]] and School[PTR[0]], respectively, are determined to be the smaller value, and the same processing as described above is performed.

FIGS. 14E and 14F are explanatory diagrams of the post-processing of the second-stage merge of the intra-block compilation process. As in the post-processing of the first-stage merge, the local item value number array VNo is updated first. Specifically, the local item value numbers are converted by
    VNo[i] = TR[VNo[i]]
Next, the pointer array PTR is updated by overwriting it with the work pointer array wPTR according to
    PTR[i] = wPTR[i]

Next, the third-stage merge of the intra-block compilation process will be described. In the third-stage merge, the first pair and the second pair used in the second-stage merge form the first pair of the third stage. According to a preferred embodiment of the present invention, the third-stage pairing is thus performed in the reverse order of the second-stage pairing (that is, from the rear). In this example, only one pair remains, so the pairing order need not be considered.

FIGS. 15A to 15D are explanatory diagrams of the third-stage merge of the intra-block compilation process. The third-stage merge performs the same processing as the first-stage and second-stage merges. In general, however, the number of data items to be compared increases as the stage number increases. In this example, as shown in FIGS. 15A and 15B, identical values occur frequently, so the comparison finishes after two comparisons even in the third stage. FIGS. 15C and 15D show the post-processing of the third-stage merge. In this way, the final local item value number array VNo and the final pointer array PTR are obtained. In the pointer array, elements that hold no stored value are indicated by *.

Finally, the intra-block compilation process according to an embodiment of the present invention creates the final local item value work array wVL, that is, the value list. FIG. 16 is an explanatory diagram of the value list creation in the intra-block compilation process. The value list wVL is obtained by reading values from the item value array School, using the values of the elements of the pointer array PTR as pointers, and storing them in the value list wVL in the order of the elements of the pointer array PTR. Specifically, in this example,
    wVL[0] = School[PTR[0]] = School[1] = "South"
    wVL[1] = School[PTR[1]] = School[0] = "West"
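The value list creation amounts to a gather through the pointer array, as the following Python sketch shows. School[0], School[1], PTR[0], and PTR[1] are the values given above; the remaining rows of School are hypothetical, and `None` stands in for the elements marked * in the figures.

```python
# Rows 2 to 4 are illustrative; only School[0] and School[1] are read here.
school = ["West", "South", "West", "South", "South"]
ptr = [1, 0, None, None, None]   # final pointer array, * shown as None

# wVL[k] = School[PTR[k]] for every element of PTR that holds a value.
w_vl = [school[p] for p in ptr if p is not None]
print(w_vl)
```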

The intra-block compilation process described above yields the local item value number array VNo and the final local item value work array wVL.

Next, the inter-block compilation process according to an embodiment of the present invention will be described. The inter-block compilation process includes a merge process and a distribution process. In the merge process, the plurality of arithmetic units operate in parallel and hierarchically and, for each data item, repeat the merging of a pair of blocks until everything has been merged into one block, thereby generating the final block number work array, the final local item value work array, and the final local item value designation pointer work array. In the distribution process, an item value designation pointer array is generated from the final block number work array and the final local item value designation pointer work array generated by the merge process. The final local item value work array generated by the merge process coincides with the global item value array.

In the merge process, each arithmetic unit merges the information on a pair of blocks and generates information on one merged block of a higher layer. The merge process is therefore realized by the parallel operation of the plurality of arithmetic units. Each arithmetic unit also merges the information on a pair of merged blocks belonging to the same layer, and generates information on one merged block of a still higher layer. By repeating the merge process in parallel and hierarchically in this way, information on a single block of the uppermost layer is finally generated. The single block of the uppermost layer is the block that contains all of the records.

For example, assume that there are 2^n - 1 arithmetic units, and that each arithmetic unit receives information on two blocks, merges it, and outputs information on one block. Each arithmetic unit then executes the merge process once, which realizes an n-stage (n-layer) merge. In this case, communication between the arithmetic processors and the global memory accounts for 1/n of the total data communication performed by all arithmetic processors, and communication between arithmetic units accounts for (n - 1)/n of the total data communication.
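The 1/n and (n - 1)/n shares can be reproduced with a small Python calculation under one possible counting model, which is an assumption and not stated in the specification: every stage performs one input transfer and one output transfer of comparable volume, only the first-stage input and the last-stage output touch the global memory, and all other transfers go between arithmetic units. The function name `traffic_shares` is hypothetical.

```python
from fractions import Fraction

def traffic_shares(n):
    """Return (global share, inter-unit share) for an n-stage merge tree."""
    total = 2 * n              # one input and one output transfer per stage
    global_transfers = 2       # stage-1 input and stage-n output
    inter_unit = total - global_transfers
    return Fraction(global_transfers, total), Fraction(inter_unit, total)

# Seven units form a three-stage tree (2**3 - 1 = 7).
print(traffic_shares(3))
```

With n = 3 this model gives a global memory share of 1/3, which matches the reduction to 1/3 stated for Advantage 3.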

FIGS. 17A to 17C are schematic diagrams of the merge process in the inter-block compilation process. In the example shown, at least seven arithmetic units SPU-0, SPU-1, ..., SPU-6 execute the merge of 128 blocks, from Block-0 to Block-127.

The first pass, shown in FIG. 17A, merges Block-0 through Block-7, Block-8 through Block-15, and so on up to Block-120 through Block-127; that is, the process of merging eight blocks into one block is executed sequentially 16 times. For example, in the first stage, SPU-0 merges block 0 and block 1 and outputs block 0-1, and SPU-1 merges Block-2 and Block-3 and outputs block 2-3. Here, the notation block A-B, as in block 0-1, denotes the block obtained by merging block A through block B. Next, in the second stage, SPU-4 merges block 0-1 produced by SPU-0 with block 2-3 produced by SPU-1, and outputs block 0-3. Similarly, SPU-5 outputs block 4-7. In the third stage, SPU-6 merges block 0-3 output by SPU-4 with block 4-7 output by SPU-5, and outputs block 0-7. Repeating this three-stage merge yields 16 blocks: block 0-7, block 8-15, ..., block 120-127. In the figures, white arrows represent input and output between the global memory and the local memory of an arithmetic unit, and black arrows represent data transfer between the local memories of arithmetic units via the on-chip bus.

The second pass, shown in FIG. 17B, takes the 16 blocks output by the first pass, merges the eight blocks block 0-7, ..., block 56-63 to output block 0-63, and merges the eight blocks block 64-71, ..., block 120-127 to output block 64-127.

In the third pass, shown in FIG. 17C, SPU-0 merges block 0-63 and block 64-127, and outputs the final single block 0-127.
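The hierarchical pairing can be enumerated with a short Python sketch. It assumes that the number of blocks is a power of two and that adjacent block ranges are always paired; the function name `merge_schedule` is hypothetical, and the sketch lists which ranges are merged at each stage without assigning them to particular SPUs.

```python
def merge_schedule(first, last):
    """List the merged block ranges per stage (block count must be 2**k)."""
    ranges = [(b, b) for b in range(first, last + 1)]
    stages = []
    while len(ranges) > 1:
        # Pair neighbours: (A..B) with (B+1..C) gives (A..C).
        merged = [(ranges[k][0], ranges[k + 1][1])
                  for k in range(0, len(ranges), 2)]
        stages.append(merged)
        ranges = merged
    return stages

stages = merge_schedule(0, 7)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

For eight blocks this gives three stages with 4 + 2 + 1 = 7 merges, one per arithmetic unit. For 128 blocks the same function gives seven stages, which the description groups into passes of three, three, and one stage.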

Next, the merge process in the inter-block compilation according to an embodiment of the present invention will be described in more detail. FIGS. 18A and 18B are explanatory diagrams of the results of the intra-block compilation process to which the inter-block compilation process according to an embodiment of the present invention is applied. FIG. 18A shows the item information of the eight blocks from block 0 to block 7 generated as a result of the intra-block compilation process for the data item "School", and FIG. 18B shows the item information of the blocks generated for the data item "Age". The item information includes a local item value number array VNo, a block number work array BlkNo, a local item value designation pointer work array LVL, and a local item value work array wVL. This example corresponds to the merge process that generates the single block 0-7 from the eight blocks from block 0 to block 7 shown in FIG. 17A, using the seven arithmetic units SPU-0 to SPU-6.

First, an example in which SPU-0 merges block 0 and block 1 will be described. FIGS. 19A to 19C are explanatory diagrams of the first-stage merge in the inter-block compilation process. SPU-0 transfers the item information for block 0 and block 1 from the global memory to its local memory, and initializes a further block number work array BlkNo', a further local item value designation pointer work array LVL', and a further local item value work array wVL'. The read pointers into the local item value work arrays wVL are also initialized. Unless otherwise noted, the arithmetic unit SPU-0 is the subject of the operations below.

Next, the stored values of the local item value work arrays wVL of the two blocks are compared, and the smaller stored value is transferred to the further local item value work array wVL', filling it in order from the front. The contents of the block number work array BlkNo corresponding to the smaller stored value are transferred to the further block number work array BlkNo'. For example, if the same value is stored consecutively in the array LVL, the transfer from the block number work array BlkNo to the further block number work array BlkNo' is repeated as many times as that value is stored. Then, the sequence number (initial value = 0) is stored in the further local item value designation pointer work array LVL' as many times as values were written to the further block number work array BlkNo'.

Finally, the read pointer into the local item value work array wVL that held the smaller value is advanced by one position.

If the stored values of the local item value work arrays wVL of the two blocks are identical, one of the stored values is transferred to the further local item value work array wVL'. Next, of the contents of the block number work arrays BlkNo of the two blocks, the smaller block number is transferred to the further block number work array BlkNo'. In this case too, if the same value is stored consecutively in the array LVL, the transfer from the block number work array BlkNo to the further block number work array BlkNo' is repeated that number of times. Then, of the contents of the block number work arrays BlkNo of the two blocks, the larger block number is likewise transferred to the further block number work array BlkNo'. Here again, if the same value is stored consecutively in the local item value designation pointer work array, the transfer of block numbers is repeated that number of times. Then, the sequence number is stored in the further local item value designation pointer work array LVL' as many times as values were written to the further block number work array BlkNo'. Finally, the read pointers into the local item value work arrays wVL are advanced by one position; since the values are identical, both pointers advance, as in the example of FIG. 19B.

Returning to FIGS. 19A to 19C, in FIG. 19A, wVL[0] = "South" of block 0 is compared with wVL[0] = "North" of block 1. Since "North" is smaller, the item value is transferred so that wVL'[0] = wVL[0] = "North". The content BlkNo[0] = 1 of the block number work array of block 1 is then transferred to the further block number work array, so that BlkNo'[0] = 1. The sequence number 0 is stored in LVL'[0]. After that, the read pointer into wVL on the block 1 side is advanced.

Similarly, in FIG. 19B, wVL[0] of block 0 is compared with wVL[1] of block 1. Both values are "South" and therefore match. Accordingly, wVL'[1] = "South", BlkNo'[1] = 0, and BlkNo'[2] = 1. In addition, LVL'[1] = 1 and LVL'[2] = 1 are set. Finally, the read pointers into wVL on both the block 0 side and the block 1 side are advanced.

Similarly, in FIG. 19C, wVL[1] of block 0 is compared with wVL[2] of block 1. The two values match, so the same processing as in FIG. 19B is performed.

In this way, as shown in FIG. 19B, SPU-0 merges block 0 and block 1 and thereby generates a set consisting of the further block number array BlkNo', the further item value designation pointer array LVL', and the further item value work array wVL'.

It can be seen that the above processing converts the pair of sets of BlkNo, LVL, and wVL from the two blocks, block 0 and block 1, into a single set of BlkNo', LVL', and wVL'. It is clear that, by repeating this processing in parallel and hierarchically, many sets of BlkNo, LVL, and wVL can be merged into a single set of BlkNo', LVL', and wVL'. A point to note here is that the operations described with reference to FIGS. 19A to 19C can be realized using sequential access alone. Consequently, even if the block number array BlkNo, the local item value designation pointer work array LVL, and the local item value work array wVL' become large, they can be processed in the local memory of the arithmetic unit. It should also be noted that the further block number array BlkNo' is always in ascending order over any run in which the values stored in the further local item value designation pointer work array LVL' are identical.
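The two-block merge described with reference to FIGS. 19A to 19C can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each block is represented as a tuple (BlkNo, LVL, wVL), LVL is non-decreasing and indexes wVL, and every block number in the left input is smaller than every block number in the right input. The function name `merge_blocks` is hypothetical. Each input array is read strictly front to back, mirroring the sequential access noted above.

```python
def merge_blocks(left, right):
    """Merge two (BlkNo, LVL, wVL) sets into (BlkNo', LVL', wVL')."""
    blk_l, lvl_l, wvl_l = left
    blk_r, lvl_r, wvl_r = right
    blk_out, lvl_out, wvl_out = [], [], []
    i = j = 0      # read pointers into wvl_l and wvl_r
    pl = pr = 0    # read positions in the BlkNo/LVL arrays of each side
    while i < len(wvl_l) or j < len(wvl_r):
        # A side is taken when its current value is not larger than the
        # other side's; on equal values both sides are taken.
        take_l = j >= len(wvl_r) or (i < len(wvl_l) and wvl_l[i] <= wvl_r[j])
        take_r = i >= len(wvl_l) or (j < len(wvl_r) and wvl_r[j] <= wvl_l[i])
        seq = len(wvl_out)                    # sequence number, starts at 0
        wvl_out.append(wvl_l[i] if take_l else wvl_r[j])
        if take_l:                            # smaller block numbers first
            while pl < len(lvl_l) and lvl_l[pl] == i:
                blk_out.append(blk_l[pl])
                lvl_out.append(seq)
                pl += 1
            i += 1
        if take_r:
            while pr < len(lvl_r) and lvl_r[pr] == j:
                blk_out.append(blk_r[pr])
                lvl_out.append(seq)
                pr += 1
            j += 1
    return blk_out, lvl_out, wvl_out

block0 = ([0, 0], [0, 1], ["South", "West"])
block1 = ([1, 1, 1], [0, 1, 2], ["North", "South", "West"])
blk, lvl, wvl = merge_blocks(block0, block1)
print(blk, lvl, wvl)
```

Because the output has the same shape as the inputs, the result of one merge can be fed directly into the next stage of the hierarchy.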

Next, the second-stage merge process is described, in which arithmetic unit SPU-4 merges blocks 0 to 1 output from SPU-0 and blocks 2 to 3 output from SPU-11 into a single block. FIGS. 20A to 20D are explanatory diagrams of the second-stage merge process in the inter-block compilation according to an embodiment of the present invention. The second-stage merge process is the same as the first-stage merge process, except that the input information is transferred from the local memory of another arithmetic unit. Briefly, the wVL read pointers of the two blocks are first set to the head. Values are read from wVL and compared, and the value that is not larger is set in wVL'. The block number is then transferred from BlkNo to BlkNo', and the sequence number is written to LVL'. Finally, the read pointer of the wVL from which the not-larger value was read is advanced. FIGS. 20A, 20B, 20C, and 20D show the course of this processing and the resulting further block number array BlkNo', further local item value designation pointer work array LVL', and local item value work array wVL'.

When the first-stage and second-stage merge processes of the inter-block compilation process are finished, blocks 0 to 3, into which block 0 through block 3 have been merged, and blocks 4 to 7, into which block 4 through block 7 have been merged, are obtained. SPU-6 receives blocks 0 to 3 output by SPU-4 and blocks 4 to 7 output by SPU-5, and merges the two blocks in the same way. This yields the single final block, Block 0 to 7. FIGS. 21A and 21B illustrate the result of the third-stage merge process in the inter-block compilation process according to an embodiment of the present invention. FIG. 21A shows the per-block information before merging for the data item "School", and FIG. 21B shows the result of the merge process by inter-block compilation for the data item "School".
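The three-stage schedule for eight blocks can be illustrated with a small stage-by-stage reduction. The following Python sketch is an illustration only: `tree_merge` and `merge_value_lists` are hypothetical names, the per-block value lists are made-up data, and the stand-in merge simply forms a sorted union of value lists rather than the full BlkNo, LVL, and wVL merge.

```python
def tree_merge(blocks, merge):
    """Reduce a list of blocks to one by merging neighbouring pairs stage by stage.

    Returns (final_block, number_of_stages). The merges inside one stage are
    independent of each other, so they could run on different arithmetic units.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    current = list(blocks)
    stages = 0
    while len(current) > 1:
        merged = [merge(current[i], current[i + 1])
                  for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:  # an unpaired block is carried to the next stage
            merged.append(current[-1])
        current = merged
        stages += 1
    return current[0], stages


def merge_value_lists(a, b):
    """Stand-in merge: sorted union of two value lists (assumption for brevity)."""
    return sorted(set(a) | set(b))


# Hypothetical per-block value lists for eight blocks.
value_lists = [
    ["South", "West"], ["North", "South", "West"], ["East", "South"], ["North"],
    ["East"], ["East", "West"], ["West"], ["East", "North"],
]
final_values, num_stages = tree_merge(value_lists, merge_value_lists)
```

For eight input blocks the loop runs three stages (four merges, then two, then one), which corresponds to the first-stage, second-stage, and third-stage merge processes described above.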

Note that the final further local item value work array wVL' coincides with the global item value array. The relationship between the set of block number work array BlkNo', local item value designation pointer array LVL', and local item value work array wVL' in FIG. 21B and the per-block information in FIG. 21A is as follows. In FIG. 21A, the information for block 0, for example, shows the value list wVL of the item values contained in block 0 and the rank of each item value within block 0. That is, in block 0, the item value of rank 0 is "South" and the item value of rank 1 is "West". In FIG. 21B, on the other hand, BlkNo'[0] = 2, LVL'[0] = 0, and wVL'[LVL'[0]] = "East" show that the block with block number 2 contains the item value "East". Furthermore, since BlkNo'[0] is the first element of array BlkNo' in which block number 2 appears, the first item value in the value list of block 2 is "East". Likewise, scanning LVL' from the beginning gives LVL'[0] = LVL'[1] = LVL'[2] = LVL'[3] = 0, so for the four blocks given by BlkNo'[0] = 2, BlkNo'[1] = 4, BlkNo'[2] = 5, and BlkNo'[3] = 7, that is, blocks 2, 4, 5, and 7, the first item value in the value list is wVL'[0] = "East".
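How the merged arrays are read can be sketched in a few lines of Python. The helper names `blocks_containing` and `value_list_of_block` are hypothetical. Only the first four entries of the arrays and the values "East", "South", and "West" follow the text; the value "North" and the remaining entries are assumed sample data, since FIG. 21B is not reproduced here.

```python
def blocks_containing(blk_no, lvl, wvl, value):
    """Block numbers whose value list contains `value`."""
    return [b for b, p in zip(blk_no, lvl) if wvl[p] == value]


def value_list_of_block(blk_no, lvl, wvl, block):
    """Item values of one block, in rank order within that block."""
    return [wvl[p] for b, p in zip(blk_no, lvl) if b == block]


wvl_merged = ["East", "North", "South", "West"]  # "North" is assumed
# The first four entries follow the text; the rest is hypothetical.
blk_no_merged = [2, 4, 5, 7, 1, 3, 0, 1, 2, 0, 1, 6]
lvl_merged = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3]

east_blocks = blocks_containing(blk_no_merged, lvl_merged, wvl_merged, "East")
block0_values = value_list_of_block(blk_no_merged, lvl_merged, wvl_merged, 0)
block2_values = value_list_of_block(blk_no_merged, lvl_merged, wvl_merged, 2)
```

In this sample, "East" is found in blocks 2, 4, 5, and 7, and block 0 recovers its value list "South", "West", as in the description of FIGS. 21A and 21B.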

In the inter-block compilation process according to an embodiment of the present invention, a distribution process is executed after the merge process. In the distribution process, the plurality of arithmetic units operate in parallel and, for each data item, distribute the elements of the final local item value designation pointer work array by the block number designated by the corresponding element of the final block number work array and arrange them in a predetermined order. This creates an item value designation pointer array, which stores pointers designating the positions at which the item values represented by the local item value numbers are stored in the final local item value work array (that is, the global item value array), and the array is transferred to the global memory. In the example shown in FIG. 21A, for instance, an item value pointer array is obtained that specifies which element of the global item value array wVL' holds the item value wVL[0] = "South" designated by LVL[0] = 0 in block 0.

Accordingly, in the distribution step of the inter-block compilation process according to an embodiment of the present invention, once the data shown in FIG. 21B has been obtained, the local item value designation pointers are distributed by block number in accordance with, for example, the operation described by
/* one write position per block (blocks 0 to 7) */
for (b = 0; b < 8; b++) {
index[b] = 0;
}
/* scan the 19 elements of BlkNo' and LVL' from the head */
for (i = 0; i < 19; i++) {
LVL[BlkNo'[i]][index[BlkNo'[i]]] = LVL'[i];
index[BlkNo'[i]]++;
}
The array LVL thus obtained is precisely the item value designation pointer array. The distribution above corresponds to performing the operation on a single processor. According to an embodiment of the present invention, however, the array LVL is preferably generated using a plurality of arithmetic units. To that end, the block number work array BlkNo' and the local item value designation pointer work array LVL' are divided and assigned to the plurality of arithmetic units. Next, each arithmetic unit distributes the local item value designation pointers by block number for the portions of BlkNo' and LVL' it is in charge of. Finally, the local item value designation pointers held in distributed form across the arithmetic units are consolidated into one array per block number. This yields the item value designation pointer array LVL.

FIG. 22 is an explanatory diagram of the distribution process in the inter-block compilation process according to an embodiment of the present invention. As shown in the figure, array BlkNo' and array LVL' are assigned to a plurality of arithmetic units (in this example, the eight arithmetic units SPU-0 to SPU-7). For example, SPU-0 is in charge of processing BlkNo'[i] and LVL'[i] in the range 0 ≤ i ≤ 2. Since BlkNo'[0] = 2 and LVL'[0] = 0, SPU-0 sets LVL'[0] = 0 in wLVL-2[0] in its local memory. Similarly, SPU-1 is in charge of processing BlkNo'[i] and LVL'[i] in the range 3 ≤ i ≤ 4. Since BlkNo'[3] = 7 and LVL'[3] = 0, SPU-1 sets LVL'[3] = 0 in wLVL-7[0] in its local memory. All arithmetic units execute this processing in parallel. In this example, each arithmetic unit uses arrays wLVL-0 to wLVL-7, one for each of block 0 to block 7. In this case, a work area multiplied by the number of arithmetic units (= 8) is reserved in the local memory and the global memory. A linked list may be used to make this work area compact. The important point is that the work area is kept in the local memory for as long as it fits there, and that, once it is found that the work area no longer fits in the local memory, all or part of the work area in the local memory is transferred to the global memory in reasonably large batches, so that memory accesses to the global memory can be consolidated.
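The per-unit part of the distribution process can be sketched as follows. This Python fragment is a hedged illustration: `scatter_slice`, the four index ranges, and most of the sample data are assumptions (only the first four array entries and the ranges 0 to 2 and 3 to 4 follow the text), and the units are simulated one after another rather than run in parallel.

```python
def scatter_slice(blk_no, lvl, start, stop, num_blocks):
    """Distribute LVL'[start:stop] into one work list per block number (wLVL-b)."""
    work = [[] for _ in range(num_blocks)]
    for i in range(start, stop):
        work[blk_no[i]].append(lvl[i])  # append to the work array of that block
    return work


# The first four entries follow the text; the rest is hypothetical.
blk_no_merged = [2, 4, 5, 7, 1, 3, 0, 1, 2, 0, 1, 6]
lvl_merged = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3]

# Assumed assignment of index ranges to four simulated units (end exclusive).
ranges = [(0, 3), (3, 5), (5, 9), (9, 12)]
work_per_unit = [scatter_slice(blk_no_merged, lvl_merged, s, e, 8)
                 for s, e in ranges]
```

Here `work_per_unit[0][2]` plays the role of wLVL-2 in the first unit and receives LVL'[0] = 0, while `work_per_unit[1][7]` plays the role of wLVL-7 in the second unit and receives LVL'[3] = 0. Because each unit writes only to its own work lists, no coordination between units is needed in this step.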

FIG. 23 shows the result of the distribution process in the inter-block compilation process according to an embodiment of the present invention. For example, for block number 1, the value 1 is stored in wLVL-1 of SPU-1, the value 2 in wLVL-1 of SPU-4, and the value 3 in wLVL-1 of SPU-6. Combining these into one yields the item value designation pointer array LVL-1 for block 1. Specifically, the values stored in wLVL-1 are taken out in ascending order of pointer value, that is, in the order SPU-0 to SPU-7, and stored in order from the beginning of LVL-1. This operation can be executed by, for example, any one arithmetic unit or the control unit; alternatively, a plurality of arithmetic units may, in parallel and for each block number, take the values out of wLVL and write them to LVL.
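The combining step for one block number amounts to a concatenation in unit order. The sketch below is a minimal Python illustration; the name `combine_work_arrays` and the dictionary-per-unit layout are assumptions, while the three stored values for block 1 are the ones given in the text.

```python
def combine_work_arrays(work_per_unit, block):
    """Concatenate wLVL-<block> of every unit, in unit order, into LVL-<block>."""
    result = []
    for work in work_per_unit:  # SPU-0 first, SPU-7 last
        result.extend(work.get(block, []))
    return result


# One dictionary per unit, mapping block number -> work array (assumed layout).
work_per_unit = [{} for _ in range(8)]
work_per_unit[1][1] = [1]  # wLVL-1 of SPU-1
work_per_unit[4][1] = [2]  # wLVL-1 of SPU-4
work_per_unit[6][1] = [3]  # wLVL-1 of SPU-6

lvl_block1 = combine_work_arrays(work_per_unit, 1)
```

Since each unit handled a contiguous, ascending range of the merged arrays, visiting the units in order SPU-0 to SPU-7 already produces the pointer values in ascending order, so no sorting is required.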

In the example of FIGS. 22 and 23, the work area in the local memory grows as the number of blocks increases. In an alternative embodiment of the present invention, therefore, particularly when the number of blocks is large, the blocks are first grouped and the distribution process is then performed group by group to make processing more efficient. For example, dividing the block number by 4 separates (groups) the block numbers into upper block numbers and lower block numbers, and the distribution process is applied separately to the upper block numbers and the lower block numbers. Specifically, from the set of block number work array BlkNo' and local item value designation pointer work array LVL' obtained by the merge process of the inter-block compilation process, a set of block number work array BlkNo' and local item value designation pointer work array LVL' for the upper block numbers and a corresponding set for the lower block numbers are generated. This processing, too, can be executed by a plurality of arithmetic units operating in parallel.

FIG. 24 is an explanatory diagram of the block grouping process in the inter-block compilation process according to an alternative embodiment of the present invention. For example, BlkNo'[0] to BlkNo'[2] belong to the range handled by SPU-0. Since BlkNo'[0] = 2 is a value of 3 or less and therefore belongs to the lower block numbers, SPU-0 sets BlkNo[0] = BlkNo'[0] and LVL-0[0] = LVL'[0] in the BlkNo and LVL-0 for lower block numbers in its local memory. SPU-3, on the other hand, is in charge of BlkNo'[13] to BlkNo'[14]. Since BlkNo'[13] = 6 is a value of 4 or more and therefore belongs to the upper block numbers, SPU-3 sets BlkNo[0] = BlkNo'[13] and LVL-1[0] = LVL'[13] in the BlkNo and LVL-1 for upper block numbers in its local memory. Continuing this processing groups the blocks.

FIG. 25 shows the result of the block grouping process in the inter-block compilation process according to an alternative embodiment of the present invention. For example, extracting in order the elements of the BlkNo and LVL-0 for lower block numbers created by SPU-0 to SPU-7 and storing them in order from the beginning of BlkNo' and LVL' yields the BlkNo' and LVL' for lower block numbers. Similarly, extracting in order the elements of the BlkNo and LVL-1 for upper block numbers created by SPU-0 to SPU-7 and storing them in order from the beginning of BlkNo' and LVL' yields the BlkNo' and LVL' for upper block numbers. This combining process may be executed by, for example, the control unit or by an arithmetic unit. The distribution process of the inter-block compilation process according to an embodiment of the present invention, described with reference to FIG. 22, is then applied to the BlkNo' and LVL' for lower block numbers and the BlkNo' and LVL' for upper block numbers obtained in this way, yielding an item value designation pointer array for each block number.
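The grouping of FIGS. 24 and 25 can be sketched as a stable partition of the merged arrays by the quotient of the block number divided by 4. In this Python illustration, `split_into_groups` is a hypothetical name and the arrays are assumed sample data (only the first four entries follow the text). The per-unit grouping and the subsequent concatenation in unit order are folded into a single pass here, which gives the same element order because each unit handles a contiguous range.

```python
def split_into_groups(blk_no, lvl, group_size=4):
    """Partition (BlkNo', LVL') by block_number // group_size, keeping order."""
    groups = {}
    for b, p in zip(blk_no, lvl):
        g_blk, g_lvl = groups.setdefault(b // group_size, ([], []))
        g_blk.append(b)
        g_lvl.append(p)
    return groups


# The first four entries follow the text; the rest is hypothetical.
blk_no_merged = [2, 4, 5, 7, 1, 3, 0, 1, 2, 0, 1, 6]
lvl_merged = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3]

groups = split_into_groups(blk_no_merged, lvl_merged)
lower_blk, lower_lvl = groups[0]  # block numbers 0 to 3
upper_blk, upper_lvl = groups[1]  # block numbers 4 to 7
```

Each group can then be passed to the distribution process separately, so a unit needs work arrays for only four block numbers at a time instead of eight.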

The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the invention set forth in the claims, and it goes without saying that these modifications are also included within the scope of the present invention.

Claims

Multiple arithmetic units including dedicated local memory,
A global memory connected to the plurality of arithmetic units, and
A bus connecting the plurality of arithmetic units;
At least one control unit connected to the global memory and the plurality of arithmetic units;
In a multi-core processing apparatus comprising:
It is represented as an array of records including item values corresponding to data items, and is a method of constructing tabular data that is shared and operated by the plurality of arithmetic units in the global memory,
A block number array in which the control unit divides the record into blocks including records in charge of each arithmetic unit and stores block numbers corresponding to the records in the order of the original record position numbers in the tabular data. Creating and storing in the global memory;
The plurality of arithmetic units operate in parallel to create a record sequence number array that stores the original record position numbers of the assigned records in the order of record sequence numbers on the local memory in each arithmetic unit, Transferring to the global memory;
Item value access information arrays for storing item value access information for accessing the item values included in the record in charge in the order of the record sequence numbers when the plurality of operation units operate in parallel. Creating on the local memory and transferring to the global memory;
The item values are stored in each operation unit for each data item so that the plurality of operation units operate in parallel and the item values included in the assigned record are accessed using the item value access information. Expanding on the local memory and transferring the expanded item value to the global memory;
A method comprising:

The control unit refers to the block number array on the global memory to determine a block number of a block including a predetermined record and the arithmetic unit responsible for the predetermined record;
The control unit notifying the determined arithmetic unit of a record sequence number of the predetermined record;
The arithmetic unit notified of the record sequence number transfers the record sequence number array and the item value access information array related to the record in charge of the arithmetic unit from the global memory to the local memory of the arithmetic unit; ,
The arithmetic unit notified of the record sequence number specifies the position where the notified record sequence number is stored in the transferred record sequence number array; and
The arithmetic unit notified of the record sequence number specifies the item value access information specified by the specified position in the transferred item value access information array;
The arithmetic unit notified of the record sequence number acquires the item value specified by the specified item value access information from the global memory for each data item, and uses the acquired item value as the global value. Transferring to memory;
Further comprising
The method of claim 1.

The step of operating the plurality of arithmetic units in parallel, expanding the item value for each data item in the local memory in each arithmetic unit, and storing the expanded item value in the global memory,
The plurality of arithmetic units operate in parallel to transfer, for each data item, the item values included in a single block from the global memory to the local memory, and items included in the single block. Items included in the assigned record in the order of the local field value work array for storing unique values among the values in a predetermined order, and the original record position number of the assigned record included in the single block Creating a local item value number array in the local memory for storing a local item value number specifying a position where a value is stored in the local item value work array, and transferring the local item value number array to the global memory;
The plurality of arithmetic units operate in parallel and store, for each data item, the block number corresponding to a unique value among the item values included in the block, related to a pair of blocks. Block number work array, local item value work array, and local item value designation pointer work array for storing a pointer for designating a position where the item value included in the block is stored in the local item value work array A pair consisting of a further block number work array, a further local item value work array, and a further local item value specification pointer work array associated with the block into which the pair of blocks is merged. Performing a merge process to create
The plurality of arithmetic units operate in a parallel and hierarchical manner, and the above merge processing is repeated until each data item is merged into one final block. And transferring the final local field value work array and the final local field value specification pointer work array to the global memory,
The plurality of arithmetic units operate in parallel, and for each data item, an element in the final local item value designation pointer work array is designated by a corresponding element in the final block number work array. By distributing each block number and arranging in a predetermined order, the item value represented by the local item value number matches the global item value array storing the item values in a predetermined order. Creating an item value specification pointer array for storing a pointer for specifying a position stored in a local item value work array and transferring the pointer to the global memory;
including,
The method of claim 1.

Multiple arithmetic units including dedicated local memory,
A global memory connected to the plurality of arithmetic units, and
A bus connecting the plurality of arithmetic units;
At least one control unit connected to the global memory and the plurality of arithmetic units;
With
A multi-core processing device that is represented as an array of records including item values corresponding to data items and constructs tabular data that is shared and operated by the plurality of arithmetic units in the global memory,
A block number array in which the control unit divides the record into blocks including records in charge of each arithmetic unit and stores block numbers corresponding to the records in the order of the original record position numbers in the tabular data. Including means for creating and storing in the global memory;
Each arithmetic unit is
Operate in parallel with other arithmetic units, create a record sequence number array in the local memory in each arithmetic unit to store the original record position number of the record in charge in the order of the record sequence number, and Means for transferring to memory;
An item value access information array that operates in parallel with other arithmetic units and stores the item value access information for accessing the item values included in the record in charge in the order of the record sequence numbers. Means for creating on local memory and transferring to the global memory;
Operate in parallel with other arithmetic units, so that the item values included in the record in charge are accessed using the item value access information, the item values for each data item are Means for expanding on a local memory and transferring the expanded item value to the global memory;
including,
Multi-core processing equipment.

The control unit is
Means for determining a block number of a block including a predetermined record and the arithmetic unit in charge of the predetermined record with reference to the block number array on the global memory;
Means for notifying the determined arithmetic unit of the record sequence number of the predetermined record;
Further including
The arithmetic unit notified of the record sequence number is
Means for transferring the record sequence number array and the item value access information array relating to the record in charge of the arithmetic unit from the global memory to the local memory of the arithmetic unit;
Means for identifying the position where the notified record sequence number is stored in the transferred record sequence number array;
Means for specifying item value access information specified by the specified position in the transferred item value access information array;
Means for acquiring the item value specified by the specified item value access information from the global memory for each data item, and transferring the acquired item value to the global memory;
Further including
The multi-core type processing apparatus according to claim 4.

Each arithmetic unit is
Means for operating in parallel with other arithmetic units, expanding the item values for each data item in the local memory in each arithmetic unit, and storing the expanded item values in the global memory;
By operating in parallel with other arithmetic units, the item values included in a single block are transferred from the global memory to the local memory for each data item, and the item values included in the single block are transferred. The field values included in the assigned record are in the order of the local field value work array for storing the unique values of them in a predetermined order and the original record position number of the assigned record included in the single block. Means for creating a local item value number array for storing a local item value number for designating a position stored in the local item value work array on the local memory and transferring the local item value number array to the global memory;
A block number that operates in parallel with another arithmetic unit and stores the block number corresponding to a unique value among the item values included in the block, related to a pair of blocks, for each data item A work array, the local item value work array, and a local item value designation pointer work array for storing a pointer for designating a position where the item value included in the block is stored in the local item value work array. From this pair, a pair consisting of a further block number work array, a further local item value work array, and a further local item value specification pointer work array related to the block in which the pair of blocks is merged is created. Means for executing the merge process;
It operates in parallel and hierarchically with other arithmetic units, repeats the merging process until each data item is merged into one final block, and the final block number work array obtained, Means for transferring the final local field value work array and the final local field value specification pointer work array to the global memory;
A block that operates in parallel with another arithmetic unit and for each data item, the element in the final local item value designation pointer work array is designated by the corresponding element in the final block number work array. The final local, wherein the item values represented by the local item value numbers match the global item value array storing the item values in a predetermined order by distributing by number and arranging in a predetermined order Means for creating an item value designation pointer array for storing a pointer for designating a position stored in the item value work array and transferring it to the global memory;
Further including
The multi-core type processing apparatus according to claim 4.

Multiple arithmetic units including dedicated local memory,
A global memory connected to the plurality of arithmetic units, and
A bus connecting the plurality of arithmetic units;
At least one control unit connected to the global memory and the plurality of arithmetic units;
A code that is loaded into a computer comprising: A computer readable program to be executed by the computer,
A block number array in which the control unit divides the record into blocks including records in charge of each arithmetic unit and stores block numbers corresponding to the records in the order of the original record position numbers in the tabular data. Code to create and store in the global memory,
The plurality of arithmetic units operate in parallel to create a record sequence number array that stores the original record position numbers of the assigned records in the order of record sequence numbers on the local memory in each arithmetic unit, A code to be transferred to the global memory,
Item value access information arrays for storing item value access information for accessing the item values included in the record in charge in the order of the record sequence numbers when the plurality of operation units operate in parallel. Code created on the local memory and transferred to the global memory,
The item values are stored in each operation unit for each data item so that the plurality of operation units operate in parallel and the item values included in the assigned record are accessed using the item value access information. A code that expands on the local memory and transfers the expanded item value to the global memory;
Including programs.

The control unit refers to the block number array on the global memory and determines a block number of a block including a predetermined record and the arithmetic unit in charge of the predetermined record;
A code for the control unit to notify the determined arithmetic unit of a record sequence number of the predetermined record;
The arithmetic unit notified of the record sequence number has a code for transferring the record sequence number array and the item value access information array related to the record in charge of the arithmetic unit from the global memory to the local memory of the arithmetic unit. ,
The arithmetic unit that is notified of the record sequence number has a code that specifies the position where the notified record sequence number is stored in the transferred record sequence number array;
The calculation unit notified of the record sequence number specifies the item value access information specified by the specified position in the transferred item value access information array, and
The arithmetic unit notified of the record sequence number acquires the item value specified by the specified item value access information from the global memory for each data item, and uses the acquired item value as the global value. A code to transfer to the memory,
Further including
The program according to claim 7.

The plurality of arithmetic units operate in parallel, expand the item value for each data item in the local memory in each arithmetic unit, and store the expanded item value in the global memory,
A code for causing the plurality of arithmetic units to operate in parallel to, for each data item, transfer the item values included in a single block from the global memory to the local memory, create in the local memory a local item value work array storing, in a predetermined order, the unique values among the item values included in the single block, and a local item value number array storing, in the order of the original record position numbers of the assigned records included in the single block, local item value numbers each designating the position at which the item value included in an assigned record is stored in the local item value work array, and transfer the local item value number array to the global memory;
A code for causing the plurality of arithmetic units to operate in parallel to perform, for each data item, a merge process that takes the sets associated with a pair of blocks, each set consisting of a block number work array storing the block numbers corresponding to the unique values among the item values included in the block, a local item value work array, and a local item value designation pointer work array storing pointers each designating the position at which an item value included in the block is stored in the local item value work array, and that creates a set consisting of a further block number work array, a further local item value work array, and a further local item value designation pointer work array associated with the block into which the pair of blocks is merged;
A code for causing the plurality of arithmetic units to operate in parallel and hierarchically to repeat the merge process, for each data item, until the blocks are merged into one final block, and to transfer the final local item value work array and the final local item value designation pointer work array to the global memory;
A code for causing the plurality of arithmetic units to operate in parallel to, for each data item, distribute the elements of the final local item value designation pointer work array by the block numbers designated by the corresponding elements of the final block number work array and arrange them in a predetermined order, thereby creating an item value designation pointer array storing pointers each designating the position at which the item value represented by a local item value number is stored in the final local item value work array, which coincides with the global item value array storing the item values in a predetermined order, and to transfer the item value designation pointer array to the global memory;
including,
The program according to claim 7.
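
The block-wise expansion and hierarchical merge recited above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names `build_local`, `merge_pair`, and `build_global` are hypothetical, the "predetermined order" is assumed to be ascending sort order, at least one block is assumed, and the steps that the claim performs in parallel on separate arithmetic units run sequentially here. The merge uses a set union followed by a sort for brevity, where a linear merge of two sorted arrays would serve equally.

```python
def build_local(block):
    """Per block: sorted unique values, plus one local item value number per record."""
    values = sorted(set(block))
    position = {v: i for i, v in enumerate(values)}
    return values, [position[v] for v in block]


def merge_pair(a, b):
    """Merge two (block_numbers, values, pointers) sets into one set."""
    merged = sorted(set(a[1]) | set(b[1]))
    position = {v: i for i, v in enumerate(merged)}
    block_numbers = a[0] + b[0]
    # Re-point every entry of both inputs at its value's place in the merged array.
    pointers = [position[s[1][p]] for s in (a, b) for p in s[2]]
    return block_numbers, merged, pointers


def build_global(blocks):
    """Return (global values, per-block pointer arrays, per-block number arrays)."""
    local = [build_local(b) for b in blocks]
    sets = [([i] * len(v), v, list(range(len(v)))) for i, (v, _) in enumerate(local)]
    while len(sets) > 1:  # one level of the merge hierarchy per iteration
        merged = [merge_pair(sets[i], sets[i + 1]) for i in range(0, len(sets) - 1, 2)]
        if len(sets) % 2:
            merged.append(sets[-1])  # an unpaired set is carried to the next level
        sets = merged
    block_numbers, global_values, pointers = sets[0]
    per_block = [[] for _ in blocks]
    for b, p in zip(block_numbers, pointers):  # distribute pointers by block number
        per_block[b].append(p)
    return global_values, per_block, [numbers for _, numbers in local]


blocks = [["b", "a", "b"], ["c", "a"], ["d", "b"]]
global_values, per_block, numbers = build_global(blocks)
print(global_values)  # ['a', 'b', 'c', 'd']
print(per_block)      # [[0, 1], [0, 2], [1, 3]]
```

In this sketch, the item value of record `r` in block `b` is recovered as `global_values[per_block[b][numbers[b][r]]]`, so the per-record arrays written during the first step never need to be rewritten after the merge.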

A plurality of arithmetic units, each including a dedicated local memory;
A global memory connected to the plurality of arithmetic units;
A bus connecting the plurality of arithmetic units; and
At least one control unit connected to the global memory and the plurality of arithmetic units;
A computer program product that is loaded into a computer comprising the above components and causes the computer to execute the method according to any one of claims 1 to 3, whereby tabular data, represented as an array of records including item values corresponding to data items and operated on in a shared manner by the plurality of arithmetic units, is constructed in the global memory.

A plurality of arithmetic units, each including a dedicated local memory;
A global memory connected to the plurality of arithmetic units;
A bus connecting the plurality of arithmetic units; and
At least one control unit connected to the global memory and the plurality of arithmetic units;
A storage medium storing a computer program that is loaded into a computer comprising the above components and causes the computer to execute the method according to any one of claims 1 to 3, whereby tabular data, represented as an array of records including item values corresponding to data items and operated on in a shared manner by the plurality of arithmetic units, is constructed in the global memory.