JPWO2019181594A1

JPWO2019181594A1 - Parameter setting units, arithmetic units, their methods, programs, and recording media

Info

Publication number: JPWO2019181594A1
Application number: JP2020508217A
Authority: JP
Inventors: Dai Igarashi (五十嵐 大)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2018-03-19
Filing date: 2019-03-11
Publication date: 2021-03-11
Anticipated expiration: 2039-03-11
Also published as: EP3770889A1; CN111868805A; EP3770889A4; CN111868805B; EP3770889B1; US11907641B2; WO2019181594A1; US20210027009A1; JP7010365B2; AU2019238219B2; AU2019238219A1

Abstract

Arithmetic processing is performed efficiently on a text file that contains one or more records, where each record contains one or more cells of arbitrary length and each cell contains an arbitrary number of characters. A parameter setting device takes attribute information as input and sets the maximum value S_csv and the minimum value s_csv of the size of the character string of one record, the maximum value S_enc of the total size of the encoding information, the maximum value S_ss of the total size of the operation values obtained by performing a specific operation on the encoding information, and the total size S_ref of the reference information. It obtains a function value of C / (S_csv + S_enc + S_ref) as the number of records r that forms the processing unit for the encoding and the operation, and a function value of f_0 / (I·r·S_csv) as the degree of parallelism of the arithmetic processing. Here, C is the cache memory size, M is the main memory size, and f_0 is a function value of s_csv·M / (s_csv + S_enc + max(S_ref, S_ss)).

Description

The present invention relates to a technique for performing operations on character strings in a text file.

A text file format is known in which a file contains one or more records, each record contains one or more cells (sometimes called "fields") of arbitrary length, and each cell contains an arbitrary number of characters (see, for example, Non-Patent Document 1). When specific arithmetic processing (see, for example, Non-Patent Documents 2 and 3) is applied in parallel to the values written in the cells of such a text file, the number of records handled in one unit process and the degree of parallelism must be determined. To perform the arithmetic processing efficiently, it is necessary to consider not only the main memory size and the cache memory size of the arithmetic unit that performs the processing, but also the position and length of each record and each cell of the input text file.

Non-Patent Document 1: Y. Shafranovich, "RFC4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files," [online], October 2005, SolidMatrix Technologies, Inc., [retrieved January 6, 2018], Internet <http://www.ietf.org/rfc/rfc4180.txt>
Non-Patent Document 2: Dai Igarashi, Koji Chida, Hiroki Hamada, Katsumi Takahashi, "Secure Database Operations Using An Improved 3-party Verifiable Secure Function Evaluation," in SCIS2011, 2011.
Non-Patent Document 3: A. Shamir, "How to Share a Secret," Communications of the ACM, November 1979, Volume 22, Number 11, pp. 612-613.

However, the length of each cell in such a text file is arbitrary, and the text file often contains no information indicating the position and length of each cell. To determine the position and length of each cell, the character string of the input text file must therefore be read sequentially from the beginning. As a result, it is not easy to take the position and length of each record and each cell into account, determine the number of records handled in one unit process and the degree of parallelism of the arithmetic processing, and perform the arithmetic processing efficiently.

The present invention has been made in view of these points. Its object is to efficiently perform arithmetic processing on a text file that contains one or more records, where each record contains one or more cells of arbitrary length and each cell contains an arbitrary number of characters.

To solve the above problem, a parameter setting device for arithmetic processing on the character string of a text file is provided. The text file contains W records, each record contains G cells of arbitrary length, and each cell contains an arbitrary number of characters. W and G are integers of 1 or more, and the G cells correspond to attribute information. C is the cache memory size and M is the main memory size. The parameter setting device has a maximum size setting unit, a minimum size setting unit, an encoding size setting unit, an operation size setting unit, a reference size setting unit, a processing unit calculation unit, and a parallel number calculation unit. The maximum size setting unit takes the attribute information as input and sets the maximum value S_csv of the size of the character string of one record of the text file. The minimum size setting unit takes the attribute information as input and sets the minimum value s_csv of the size of the character string of one record. The encoding size setting unit sets the maximum value S_enc of the total size of the encoding information obtained by encoding the character string of one record into elements of a predetermined finite set. The operation size setting unit sets the maximum value S_ss of the total size of the operation values obtained by performing a specific operation on the encoding information of one record. The encoding and the operation are executed for each processing-unit character string, which is the character string of r records of the text file. The reference size setting unit sets the total size S_ref of the reference information representing the position and length of each cell of one record. The processing unit calculation unit obtains a function value of C / (S_csv + S_enc + S_ref) as the number of records r. The parallel number calculation unit obtains a function value of f_0 / (I·r·S_csv) as the degree of parallelism n_p of the arithmetic processing. Here, I is the maximum number of repetitions of the encoding and the operation executed for each processing-unit character string; max(S_ref, S_ss) = S_ref when S_ref ≥ S_ss, and max(S_ref, S_ss) = S_ss when S_ref < S_ss; and f_0 is a function value of s_csv·M / (s_csv + S_enc + max(S_ref, S_ss)).

As a result, arithmetic processing can be performed efficiently on a text file that contains one or more records, where each record contains one or more cells of arbitrary length and each cell contains an arbitrary number of characters.

FIG. 1 is a block diagram illustrating the arithmetic system of the embodiment.
FIG. 2 is a block diagram illustrating the functional configuration of the parameter setting device of the embodiment.
FIG. 3 is a block diagram illustrating the functional configuration of the server device of the embodiment.
FIG. 4 is a block diagram illustrating the functional configuration of the processing unit of the embodiment.
FIG. 5 is a flow chart illustrating the parameter setting process of the embodiment.
FIG. 6 is a flow chart illustrating the arithmetic processing of the embodiment.
FIG. 7 is a flow chart illustrating the processing of thread i of the embodiment.
FIG. 8 is a conceptual diagram illustrating the processing of each thread of the embodiment.
FIGS. 9 to 12 are conceptual diagrams illustrating text files of the embodiment.
FIG. 13 is a flow chart illustrating the processing of thread i of the embodiment.
FIG. 14 is a conceptual diagram illustrating the processing of each thread of the embodiment.
FIGS. 15 to 18 are conceptual diagrams illustrating the processing of thread i of the embodiment.

Hereinafter, embodiments of the present invention will be described.
[Overview]
First, an overview is given.
<Text file>
In each embodiment, arithmetic processing is performed on the character string of a text file. The text file contains W records, each record contains G cells of arbitrary length, and each cell contains an arbitrary number of characters. However, the length of each cell has an upper limit that depends on the attribute of the cell. W and G are integers of 1 or more. For example, at least one of W and G is an integer of 2 or more: W may be 2 or more, G may be 2 or more, or both may be 2 or more. When W is 2 or more, information that identifies the record boundary exists between adjacent records. For example, a line break exists between adjacent records, so that the records are separated from one another by line breaks. Likewise, when G is 2 or more, information that identifies the cell boundary exists between adjacent cells. For example, a delimiter or a line break exists between adjacent cells, so that the cells are separated from one another by delimiters or line breaks. An example of a delimiter is the comma ",". As other examples, a tab or a line break may exist between adjacent cells, or a half-width space or a line break may exist between adjacent cells. When W is 2 or more, every record contains the same number G of cells.

The G cells of each record correspond to attribute information (also called a "schema"). The attribute information indicates what kind of attribute each cell has, and includes at least information for determining or estimating the maximum and minimum size (data amount) of the character string represented by each cell. For example, the attribute information includes information indicating which finite set the cell's value belongs to. The attribute information may indicate that "the cell represents a residue modulo p (mod p), where p is a positive integer," that "the cell is a character string expressed by a predetermined number (for example, 10) of elements of a predetermined finite field (for example, the extension field GF(2^8))," or that "the cell is a character string representing an integer of a predetermined integer type (for example, a signed 32-bit integer)." Each of G pieces of attribute information may correspond one-to-one to each of the G cells of a record (that is, one piece of attribute information may represent the attribute of one cell), or one piece of attribute information may correspond to multiple (for example, G) cells of a record (that is, one piece of attribute information may represent the attribute of multiple cells). In the former case, the attributes of the cells belonging to one record may differ from one another or may be the same. When W is 2 or more, the "set of G attributes" corresponding to the G cells is the same for all records. That is, the attribute att(g) of the g-th cell (where g = 1, ..., G) is the same in every record. The attribute information may also express the type of information that the cell represents. The attribute information may be included in the text file (for example, the header of the text file may serve as the attribute information) or may not be included.

Examples of such text files are CSV (comma-separated values) files, TSV (tab-separated values) files, and SSV (space-separated values) files. These are collectively referred to as CSV (character-separated values) files or DSV (delimiter-separated values) files.

<Parameter setting device>
The parameter setting device sets and outputs parameters for the "arithmetic processing" performed on the character string of the text file. This "arithmetic processing" may be of any kind. Examples include secret sharing, secure computation (see, for example, Non-Patent Documents 1 and 2), encryption, and signature generation. The parameters set by the parameter setting device are the number of records handled in one unit process and the degree of parallelism of the arithmetic processing. Preferably, the parameter setting device also sets the file buffer size, that is, the amount of data read from the text file in one batch. In the following, the cache memory size (storage capacity of the cache memory) of the arithmetic unit that performs the arithmetic processing on the character string of the text file is denoted by C, and its main memory size (storage capacity of the main memory) is denoted by M.

The parameter setting device has a maximum size setting unit, a minimum size setting unit, an encoding size setting unit, an operation size setting unit, a reference size setting unit, a processing unit calculation unit, and a parallel number calculation unit. When the file buffer size is also set, the parameter setting device additionally has a buffer size calculation unit.

The maximum size setting unit takes the attribute information as input, and sets and outputs the maximum value S_csv of the size of the character string of one record of the text file. The maximum value S_csv is a record size that deliberately overestimates the size of the character string of each record. That is, S_csv is the sum, over one record, of the maximum size (or its estimate) of each cell as indicated by the attribute information. As described above, the attribute information includes information for determining or estimating the maximum size of the character string represented by each cell, and the maximum size setting unit uses this information to set S_csv. For example, when the attribute information indicates that "the cell is a character string representing a signed 32-bit integer," the maximum size of the cell is 11 bytes (1 byte for the sign plus 10 bytes for a 10-digit integer).

The minimum size setting unit takes the attribute information as input, and sets and outputs the minimum value s_csv of the size of the character string of one record. The minimum value s_csv is a record size that deliberately underestimates the size of the character string of each record. That is, s_csv is the sum, over one record, of the minimum size (or its estimate) of each cell as indicated by the attribute information. As described above, the attribute information includes information for determining or estimating the minimum size of the character string represented by each cell, and the minimum size setting unit uses this information to set s_csv. For example, when the attribute information indicates that "the cell is a character string representing a signed 32-bit integer," the minimum size of the cell is 1 byte.
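The following is a minimal sketch of how S_csv and s_csv could be derived from attribute information. The `CellAttr` class, the two attribute kinds, and the decision to count only cell contents (no delimiters or line breaks) are assumptions made for illustration; the description does not prescribe a schema representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellAttr:
    """Hypothetical attribute (schema) entry for one cell."""
    kind: str          # "int32" or "mod_p" (illustrative kinds only)
    modulus: int = 0   # used only when kind == "mod_p"


def cell_size_bounds(attr: CellAttr) -> tuple[int, int]:
    """Return (min_bytes, max_bytes) of the decimal text of one cell."""
    if attr.kind == "int32":
        # Shortest text is one digit; longest is "-2147483648"
        # (1 sign byte + 10 digit bytes).
        return 1, 11
    if attr.kind == "mod_p":
        # Values range over 0 .. p-1, written in decimal without a sign.
        return 1, len(str(attr.modulus - 1))
    raise ValueError(f"unknown attribute kind: {attr.kind}")


def record_size_bounds(schema: list[CellAttr]) -> tuple[int, int]:
    """Return (s_csv, S_csv): per-record sums of the cell minima and maxima."""
    bounds = [cell_size_bounds(a) for a in schema]
    s_csv = sum(lo for lo, _ in bounds)
    S_csv = sum(hi for _, hi in bounds)
    return s_csv, S_csv


# Example schema: two signed 32-bit integers and one residue modulo 2^61.
schema = [CellAttr("int32"), CellAttr("int32"), CellAttr("mod_p", 2**61)]
s_csv, S_csv = record_size_bounds(schema)
```

An implementation that stores delimiters in the same buffer would need to add them to both bounds; that choice is left open here.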

The encoding size setting unit sets and outputs the maximum value S_enc of the total size of the encoding information obtained by encoding (converting) the character string of one record into elements of a predetermined finite set. The maximum value S_enc is a record size that deliberately overestimates the total size of the encoding information of one record. Examples of the "predetermined finite set" to which the encoding information belongs are the finite set of residues modulo p (mod p), the finite set of values expressed in a predetermined number of bits, and the finite set of integers of a predetermined integer type with a predetermined number of bits. This "predetermined finite set" is determined in advance according to the content of the "operation" described below. The maximum value S_enc is determined, for example, from the predetermined finite set to which the encoding information belongs and from the attribute information. For example, when the attribute information indicates that "the cell represents a residue modulo 2^61 (mod 2^61)" and the character string of this cell is encoded into encoding information expressed as a residue modulo 2^61, the maximum size of the encoding information corresponding to the cell is 8 bytes. The predetermined finite set to which the encoding information belongs is, for example, fixed in advance.

The operation size setting unit sets and outputs the maximum value S_ss of the total size of the operation values obtained by performing a specific "operation" on the encoding information of one record. The maximum value S_ss is a record size that deliberately overestimates the total size of the operation values of one record. Examples of this "operation" are secret sharing, secure computation, encryption, and signature generation. The "operation" may take each cell as an operand or may take multiple cells as operands. The maximum value S_ss is determined, for example, from the predetermined finite set to which the encoding information belongs, the content of the "operation," and the attribute information. For example, suppose that the attribute information indicates that "the cell represents a residue modulo 2^61 (mod 2^61)," that the character string of this cell is encoded into encoding information expressed as a residue modulo 2^61, and that the "operation" secret-shares the value of each cell among N parties (where N is a positive integer) using the Shamir secret sharing scheme (for example, Non-Patent Document 3). In that case, the maximum size of the operation value corresponding to the cell is 8N bytes.
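The two worked figures in the text (8 bytes per encoded residue modulo 2^61, and 8N bytes after sharing among N parties) can be reproduced with a short sketch. It assumes that each residue is stored in the smallest whole number of bytes and that the operation produces one share per party, each the same size as the encoded value; the function names are hypothetical.

```python
def encoded_bytes(modulus: int) -> int:
    """Bytes needed to store one residue in the range 0 .. modulus-1."""
    bits = (modulus - 1).bit_length()
    return max(1, (bits + 7) // 8)


def record_encoding_sizes(moduli: list[int], n_parties: int) -> tuple[int, int]:
    """Return (S_enc, S_ss) for one record.

    moduli    -- one modulus per cell of the record (assumed schema form)
    n_parties -- N, the number of parties that each receive one share
    """
    S_enc = sum(encoded_bytes(p) for p in moduli)
    # Assumption: every share is as large as the encoded value it shares.
    S_ss = n_parties * S_enc
    return S_enc, S_ss


# Three cells, each a residue modulo 2^61, shared among N = 3 parties.
S_enc, S_ss = record_encoding_sizes([2**61] * 3, n_parties=3)
```

Other operations (encryption, signatures) would have different expansion factors, so the multiplication by N applies only to this secret-sharing example.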

The reference size setting unit sets and outputs the total size S_ref of the reference information representing the position and length of each cell of one record in the text file. The "position of a cell" may be, for example, the position of the first character of the cell, the position of the last character of the cell, or the position of some other character in the cell. The "information representing the position of a cell" may be, for example, the number of characters from the first character of the text file's character string to the character at the "position of the cell," or a function value of this number. The "information representing the length of a cell" may be, for example, the number of characters in the cell or a function value of this number. The total size of the reference information is determined, for example, from the attribute information, because the attribute information identifies the number of cells belonging to one record. The data size required to represent the position and length of each cell depends on the format used to express them. For example, when the position and the length of each cell are each represented by an unsigned 64-bit integer, the reference information of each cell occupies 16 bytes.
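As a rough illustration of reference information, the sketch below scans a delimited byte string from the beginning and records a (position, length) pair per cell, then packs each pair as two unsigned 64-bit integers (16 bytes per cell, as in the example above). It assumes single-byte delimiters and no quoted fields, so it is not a full RFC 4180 parser; the function names and the little-endian layout are illustrative choices.

```python
import struct

# Position and length, each stored as an unsigned 64-bit integer (16 bytes).
REF_ENTRY = struct.Struct("<QQ")


def build_reference(data: bytes, delimiter: bytes = b",",
                    newline: bytes = b"\n") -> list[tuple[int, int]]:
    """Return one (position, length) pair per cell, in file order."""
    refs = []
    start = 0
    for i in range(len(data)):
        ch = data[i:i + 1]
        if ch == delimiter or ch == newline:
            refs.append((start, i - start))
            start = i + 1
    # A final cell that is not terminated by a line break still counts.
    if data and not data.endswith(newline):
        refs.append((start, len(data) - start))
    return refs


def reference_size(cells_per_record: int) -> int:
    """S_ref: bytes of reference information for one record of G cells."""
    return REF_ENTRY.size * cells_per_record


refs = build_reference(b"1,-22,333\n4,5,6")
packed = b"".join(REF_ENTRY.pack(pos, length) for pos, length in refs)
```

Here position is the byte offset of the first character of the cell, which is one of the options the text allows.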

The "encoding" and "operation" described above are executed for each processing-unit character string, which is the character string of r records of the text file. The processing executed for each processing-unit character string is called a "unit process." The processing unit calculation unit obtains and outputs a "function value of C / (S_csv + S_enc + S_ref)," which represents the number of records r processed in one unit process. The cache memory size C may be predetermined or may be given as input. The maximum value S_csv is the one obtained by the maximum size setting unit, the maximum value S_enc is the one obtained by the encoding size setting unit, and the total size S_ref is the one obtained by the reference size setting unit.

A "function value of α" may be α itself or another value that corresponds to α. Examples are the smallest integer greater than or equal to α, the largest integer less than or equal to α, and the integer closest to α. For example, r may be any of the following:

r = C / (S_csv + S_enc + S_ref)
r = ROUNDUP(C / (S_csv + S_enc + S_ref))
r = ROUNDDOWN(C / (S_csv + S_enc + S_ref))
r = ROUND(C / (S_csv + S_enc + S_ref))

Here, ROUNDUP(α) is a function that rounds α up to an integer, ROUNDDOWN(α) is a function that rounds α down to an integer, and ROUND(α) is a function that rounds α to the nearest integer.

S_csv + S_enc + S_ref represents the memory size required for the processing unit to read the character string of one record from the text file, encode it into encoding information while consulting the reference information, and perform an "operation" such as secret sharing (hereinafter, "the series of processes for one record"). If this memory size does not exceed the cache memory size, the series of processes for one record can be executed at high speed without reading data from the main memory partway through. C / (S_csv + S_enc + S_ref) indicates how many instances of the memory size (S_csv + S_enc + S_ref) required for the series of processes for one record can be reserved in the cache memory. By using the character string of the number of records r corresponding to C / (S_csv + S_enc + S_ref) as the processing-unit character string, the number of main memory accesses during the processing of r records is reduced and the operation can be performed at high speed.
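The calculation of r can be sketched as follows. The rounding mode is passed in as a parameter because the text permits rounding up, down, or to nearest; the default of rounding down and the lower bound of one record are assumptions added here so that r records never exceed the cache estimate and a unit process is never empty. All numeric values are illustrative.

```python
import math
from typing import Callable


def records_per_unit(C: int, S_csv: int, S_enc: int, S_ref: int,
                     rounding: Callable[[float], int] = math.floor) -> int:
    """Number of records r handled in one unit process.

    C        -- cache memory size in bytes
    S_csv    -- maximum text size of one record
    S_enc    -- maximum encoded size of one record
    S_ref    -- reference-information size of one record
    rounding -- math.floor (ROUNDDOWN), math.ceil (ROUNDUP) or round (ROUND)
    """
    alpha = C / (S_csv + S_enc + S_ref)
    # Assumption: always process at least one record per unit process.
    return max(1, rounding(alpha))


# Illustrative values: a 32 KiB cache and a 113-byte per-record footprint.
r = records_per_unit(C=32 * 1024, S_csv=41, S_enc=24, S_ref=48)
```

With these values the quotient is a little under 290, so rounding down gives 289 and rounding up gives 290.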

The parallel number calculation unit obtains and outputs a "function value of f_0/(I·r·S_csv)" representing the parallel number n_p in the arithmetic processing (that is, it obtains a function value of f_0/(I·r·S_csv) as the parallel number n_p). For example, n_p = f_0/(I·r·S_csv), n_p = ROUNDUP(f_0/(I·r·S_csv)), n_p = ROUNDDOWN(f_0/(I·r·S_csv)), or n_p = ROUND(f_0/(I·r·S_csv)). f_0 is a function value of s_csv·M/(s_csv + S_enc + max(S_ref, S_ss)). For example, f_0 = s_csv·M/(s_csv + S_enc + max(S_ref, S_ss)), f_0 = ROUNDUP(s_csv·M/(s_csv + S_enc + max(S_ref, S_ss))), f_0 = ROUNDDOWN(s_csv·M/(s_csv + S_enc + max(S_ref, S_ss))), or f_0 = ROUND(s_csv·M/(s_csv + S_enc + max(S_ref, S_ss))). Here, max(S_ref, S_ss) = S_ref when S_ref ≥ S_ss, and max(S_ref, S_ss) = S_ss when S_ref < S_ss. I is the maximum number of repetitions of "encoding" and "operation" executed for each processing unit character string. For example, I is the number of repetitions of "encoding" and "operation" executed for each processing unit character string. The main memory size M may be predetermined or may be given as input. The minimum value s_csv is set by the minimum size setting unit, the maximum value S_enc by the encoding size setting unit, the maximum value S_ss by the operation size setting unit, and the total size S_ref by the reference size setting unit; the maximum number of repetitions I is predetermined. Preferably, I is determined such that the ratio of the total amount of preprocessing (total number of operations) to the total amount of processing for performing "encoding" and "operation" on the processing unit character string, which is the character string for r records, is no greater than a predetermined value. r may be the value obtained by the processing unit calculation unit, or may be obtained from C, S_csv, S_enc, and S_ref. That is, as long as a function value of f_0/(I·r·S_csv) is obtained, the r obtained by the processing unit calculation unit need not be used to generate it.

Here, X = (s_csv + S_enc + max(S_ref, S_ss))/s_csv represents the maximum ratio of the memory size required for a series of processing for one record to the memory size of the one-record character string read from the text file. Therefore, S_csv·X represents the maximum memory size required for a series of processing for one record, and I·r·S_csv·X represents the maximum memory size required to repeat the series of processing for r records I times. Because f_0 is a function value of s_csv·M/(s_csv + S_enc + max(S_ref, S_ss)) = M/X, f_0/(I·r·S_csv) represents how many times larger the main memory size M is than the memory size required to repeat "a series of processing for r records" (that is, "a series of processing for one record" performed for r records) I times. Therefore, setting the parallel number n_p to a value corresponding to f_0/(I·r·S_csv) suppresses buffer overflow of main memory. Note that reference information is required during encoding, so an area of size S_ref must be reserved in main memory. On the other hand, reference information is not required during the "operation" such as secret sharing after encoding, but an area of size S_ss for storing the resulting operation values must be reserved in main memory. That is, the S_ref and S_ss areas are never needed at the same time; it is sufficient to reserve an area of size max(S_ref, S_ss) in main memory.
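The two quantities above can be sketched in Python as follows. This is a minimal illustration using the ROUNDDOWN variants and integer arithmetic; the function names, the toy values, and the clamp of n_p to at least 1 are assumptions, not part of the source.

```python
def base_buffer_size(M, s_csv, S_enc, S_ref, S_ss):
    """f_0 = ROUNDDOWN(s_csv * M / (s_csv + S_enc + max(S_ref, S_ss))).

    Only max(S_ref, S_ss) appears because the reference-information area
    and the operation-value area are never needed at the same time.
    """
    return (s_csv * M) // (s_csv + S_enc + max(S_ref, S_ss))

def parallel_number(f0, I, r, S_csv):
    """n_p = ROUNDDOWN(f_0 / (I * r * S_csv)).

    The clamp to 1 is an assumption so that at least one thread can run.
    """
    return max(1, f0 // (I * r * S_csv))

# Toy values chosen so that the arithmetic is easy to follow.
f0 = base_buffer_size(M=4100, s_csv=50, S_enc=120, S_ref=64, S_ss=240)
n_p = parallel_number(f0, I=2, r=1, S_csv=100)
```

With these toy values X = 410/50 = 8.2, so f_0 = 4100/8.2 = 500, and two buffers of I·r·S_csv = 200 fit within f_0.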

The buffer size calculation unit obtains and outputs a "function value of f_0/n_p" representing the file buffer size f of the data read in one batch from the character string of the text file during arithmetic processing (that is, it obtains a function value of f_0/n_p as the file buffer size f). For example, f = f_0/n_p, f = ROUNDUP(f_0/n_p), f = ROUNDDOWN(f_0/n_p), or f = ROUND(f_0/n_p). f_0 is as described above, and n_p may be the value obtained by the parallel number calculation unit or may be obtained from f_0, I, r, and S_csv. That is, as long as a function value of f_0/n_p is obtained, the n_p obtained by the parallel number calculation unit need not be used to generate it. If n_p = f_0/(I·r·S_csv), then f_0/n_p = I·r·S_csv. This corresponds to the file buffer size f of the character string read from the text file in order to repeat the processing for r records I times. A larger file buffer size f brings the reads closer to sequential access and is therefore faster, but it requires more main memory capacity. The file buffer size f obtained as described above achieves high-speed processing within the constraint of the predetermined main memory size M.
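For reference, the relationships among f_0, n_p, and f can be written out as a short derivation. This is a sketch for the unrounded case only (rounding makes the equalities approximate); X is the per-record expansion factor defined from s_csv, S_enc, S_ref, and S_ss.

```latex
\begin{align*}
X   &= \frac{s_{csv} + S_{enc} + \max(S_{ref}, S_{ss})}{s_{csv}}, &
f_0 &= \frac{M}{X}, \\
n_p &= \frac{f_0}{I \cdot r \cdot S_{csv}}, &
f   &= \frac{f_0}{n_p} = I \cdot r \cdot S_{csv}, \\
n_p \cdot f \cdot X &= f_0 \cdot X = M.
\end{align*}
```

Read this way, n_p threads that each hold a file buffer of size f, with working memory of at most f·X per thread, together use at most the main memory size M.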

<Arithmetic unit (general version)>
The arithmetic unit performs arithmetic processing on the character string of the input text file. In this arithmetic processing, the arithmetic unit uses the number of records r and the parallel number n_p obtained by the parameter setting device. If the parameter setting device also obtains the file buffer size f, the arithmetic unit additionally uses that file buffer size f. If the parameter setting device does not obtain the file buffer size f, the arithmetic unit may use a predetermined value as f, or another file buffer size f corresponding to the attribute information may be used.

A text file may contain characters that are not targets of an "operation" such as secret sharing. For example, a comma in a CSV file is a character that delimits cells and is not a target of the "operation". In a CSV file, the character string of a cell may be enclosed in double quotation marks ("); these enclosing double quotation marks are not part of the cell and are not targets of the "operation" either. Characters representing a line break (for example, \n) are likewise not targets of the "operation". Such characters that are not targets of the "operation" are called "special characters". Some formats allow a special character to be treated as a target of the "operation" by placing an escape character before it inside a cell. In such a format, it may be impossible to tell from a single character in a cell whether that character is a target of the "operation", and therefore impossible to determine the boundaries between cells.

For example, there is a format in which, when a double quotation mark (") is to be used as a target of the "operation", another " is added as an escape character. When the cell value 123"456 is a target of the "operation", the cell may be written as "123""456". In this case, the boundaries between cells cannot be determined unless the cell "123""456" is read in order from its beginning. For example, if only the latter part 456" is read, it cannot be determined whether it is the end of a single cell representing 456 or part of a cell representing a value that contains "456. There is also a format in which, when the line-break character \n is to be used as a target of the "operation", a \ is added as an escape character. When the cell value 123\n456 is a target of the "operation", the cell may be written as "123\\n456". Here too, the boundaries between cells cannot be determined unless the cell "123\\n456" is read in order from its beginning. In such cases, the process of identifying the position and length of each cell of the text file cannot be performed in parallel and must be performed sequentially from the beginning of the text file.
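The ambiguity described above can be reproduced with a small sequential scanner. The sketch below assumes the doubled-quote escaping convention and treats comma and \n as cell boundaries; the function name `split_cells` and the convention of reporting raw positions (enclosing quotes included) are assumptions made for illustration.

```python
def split_cells(text):
    """Return (start, length) of each cell's raw text, scanning left to right.

    A quote inside a quoted cell that is followed by another quote is an
    escaped quote; commas and newlines inside quotes are ordinary data.
    """
    cells, start, in_quotes, i = [], 0, False, 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            if in_quotes and i + 1 < len(text) and text[i + 1] == '"':
                i += 1                      # skip the second quote of ""
            else:
                in_quotes = not in_quotes   # opening or closing quote
        elif ch in ",\n" and not in_quotes:
            cells.append((start, i - start))
            start = i + 1
        i += 1
    if start < len(text):                   # unterminated trailing text
        cells.append((start, len(text) - start))
    return cells

line = '"123""456",789\n'
from_start = split_cells(line)       # scan from the beginning of the cell
from_middle = split_cells(line[6:])  # scan starting at 456",789
```

Scanning from the beginning yields two cells, while scanning from the middle mistakes the closing quote for an opening one and swallows the following comma and line break, which is why cell boundaries in such a format have to be determined sequentially.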

An arithmetic unit that can handle a text file in such a format has a main memory, a cache memory, and a plurality of processing units. Each processing unit has a reading unit, a file read lock release unit, a parsing unit, a buffer boundary lock release unit, an encoding unit, a calculation unit, and a parallelism lock release unit. Each of these processing units is assigned to the processing of one of the threads. The processing unit that handles thread i performs the following processing. Here, i denotes a thread, i ∈ {0, …, T-1}, T is a positive integer representing the number of threads corresponding to the size TS of the character string of the text file, and 1 ≤ n_p ≤ T. For example, the sizes TS_i of the character strings read from the text file by the threads i satisfy TS = TS_0 + … + TS_{T-1} or TS ≤ TS_0 + … + TS_{T-1}, or f·T ≥ TS holds. In the initial state, the file read lock and the buffer boundary lock of thread 0 and the parallelism locks of threads 0, …, n_p-1 are assumed to be released.

After the file read lock and the parallelism lock of thread i are released, the reading unit reads from the character string of the text file a character string S_i that can be stored in an area of the file buffer size f, and stores it in main memory. When i = 0, the character string S_0 is a "character string of an amount of data that can be stored in an area of the file buffer size f" beginning with the first character of the text file. When i ≥ 1, the character string S_i is such a character string beginning with the character immediately after the last character of the character string S_{i-1} read by thread i-1. The "character string that can be stored in an area of the file buffer size f" may be, for example, the longest character string that can be stored in an area of size f, or the longest character string that can be stored in an area whose size is f minus a constant.
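The way consecutive character strings S_0, S_1, … are cut from the file can be sketched as below. This is a single-threaded simplification that ignores the locks; the name `read_chunks` is hypothetical, and measuring f in characters of an in-memory stream rather than bytes of a file is an assumption.

```python
import io

def read_chunks(stream, f):
    """Yield S_0, S_1, ...: consecutive strings of at most f characters.

    Each S_i starts right after the last character of S_{i-1}, and chunk
    boundaries fall wherever f dictates, not at cell or record boundaries.
    """
    while True:
        s = stream.read(f)
        if not s:          # end of the text file
            break
        yield s

chunks = list(read_chunks(io.StringIO("a,b\nc,d\n"), 3))
```

Here the second chunk starts with the line break that ends the first record, which illustrates why a chunk's first or last character need not coincide with a cell boundary.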

After the character string S_i is stored in main memory, the file read lock release unit releases the file read lock of thread i+1. This prevents accesses to main memory by multiple threads from conflicting with one another. However, no thread corresponding to i+1 > T exists, and the file read lock of a nonexistent thread is not released.

After the buffer boundary lock of thread i is released, the parsing unit calculates reference information representing the position and length of each cell contained in the character string S_i and stores it in main memory. For example, the parsing unit identifies each cell of S_i, calculates the reference information of each identified cell, and stores it in main memory. For example, the parsing unit identifies each cell based on information located at the cell boundaries (for example, a delimiter or a line break). When i = 0, the last character of S_0 may or may not be the end of a cell. If it is not, thread 0 cannot identify the cell containing that last character and cannot calculate its reference information. When i ≥ 1, the first character of S_i may or may not be the start of a cell, and the last character of S_i may or may not be the end of a cell. If the first character of S_i is not the start of a cell, the cell containing it cannot be identified from S_i alone. In this case, the parsing unit identifies the cell containing the first character of S_i by using S_i together with the characters of S_{i-1} that do not belong to any cell identified in thread i-1. If the last character of S_i is not the end of a cell, thread i cannot identify the cell containing that last character and cannot calculate its reference information. Note that the last character of the final character string S_{T-1} of the text data is the end of a cell.

By using the reference information corresponding to an identified cell, the record to which the cell belongs and the attribute corresponding to the cell (for example, information indicating which attribute it is, counted from the beginning of the record) can be identified. When the area for storing reference information in main memory runs out, the parsing unit reserves in main memory, in a single allocation, a buffer area for storing the reference information for r records. Reserving a buffer area requires a certain amount of processing (overhead). By reserving the buffer area once per r records, which corresponds to the unit of processing, rather than once per record, variable-length records can be processed while suppressing overhead.
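The carry-over of unparsed characters from one chunk to the next can be sketched as follows. This is a single-threaded illustration under the doubled-quote escaping assumption; `parse_chunk` and `parse_stream` are hypothetical names, and the reference information is modelled simply as (position, length) pairs over the whole text.

```python
def parse_chunk(carry, chunk, offset):
    """Parse carry + chunk, where `offset` is the global position of carry[0].

    Returns (refs, leftover): refs holds (position, length) of every complete
    cell, leftover is the trailing text that does not yet end a cell.
    """
    text = carry + chunk
    refs, start, in_quotes, i = [], 0, False, 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            if in_quotes and i + 1 < len(text) and text[i + 1] == '"':
                i += 1                      # escaped quote inside a cell
            else:
                in_quotes = not in_quotes
        elif ch in ",\n" and not in_quotes:
            refs.append((offset + start, i - start))
            start = i + 1
        i += 1
    return refs, text[start:]

def parse_stream(chunks):
    """Parse S_0, S_1, ... in order, handing leftovers to the next chunk."""
    refs, carry, offset = [], "", 0
    for chunk in chunks:
        new_refs, leftover = parse_chunk(carry, chunk, offset)
        refs += new_refs
        offset += len(carry) + len(chunk) - len(leftover)
        carry = leftover
    return refs

text = 'a,"b,""c""",d\ne,f\n'
whole = parse_stream([text])
```

Because the leftover always begins at the start of the unfinished cell, the quote state can safely be reset for each call, and the result does not depend on where the chunk boundaries fall.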

After the reference information representing the position and length of each cell contained in the character string S_i has been calculated, the buffer boundary lock release unit releases the buffer boundary lock of thread i+1. No thread corresponding to i+1 > T exists, and the buffer boundary lock of a nonexistent thread is not released.

After the buffer boundary lock of thread i+1 is released, the encoding unit, using the cache memory, selects from the text data, based on the information identified by the reference information, a processing unit character string PS_{i,j}, which is the character string for r records to be processed contained in the combined character string CS_i, and encodes the selected PS_{i,j} into encoded information E_{i,j}, which is an element of a predetermined finite set. The start of PS_{i,j} is the start of some record, and the end of PS_{i,j} is the end of some record. The combined character string CS_0 for i = 0 is S_0, and the combined character string CS_i for i ≥ 1 is CS_{i-1} immediately followed by S_i. J is a positive integer, and j = 0, …, J-1. For example, if the number of characters in S_i is at least the number of characters for r records, the encoding unit selects PS_{i,j} from S_i, or from the character string obtained by combining S_{i-1} and S_i. If the number of characters in S_i is less than the number of characters for r records, the encoding unit selects PS_{i,j} from the character string obtained by combining S_{i'} through S_i, where 0 ≤ i' ≤ i-1. When i = 0, the encoding unit selects the processing unit character strings PS_{0,0}, …, PS_{0,J-1} starting from the beginning of S_0. When i ≥ 1, the encoding unit selects the processing unit character strings PS_{i,0}, …, PS_{i,J-1} starting from the first character of S_{i-1} that has not been selected as part of a processing unit character string. When J ≥ 2, PS_{i,j} immediately follows PS_{i,j-1}.

The encoding unit performs encoding for each processing unit character string PS_{i,j}. In text data, the data is arranged in a record-oriented manner (in the order record 1, record 2, …, record W), and all records correspond to the same "set of attributes". In general, processing data of the same kind consecutively is faster than processing data of different kinds consecutively. It is therefore desirable that the encoding unit consecutively encode the r cells, one from each of the r records in PS_{i,j}, that correspond to the same attribute information. The encoding unit selects PS_{i,j} based on the information identified by the reference information and encodes the selected PS_{i,j} into the encoded information E_{i,j}. High-speed processing is possible by holding the reference information for r records, the processing unit character string PS_{i,j}, and the encoded information E_{i,j} required during this processing in cache memory while the operations are performed. As described above, using a number of records r corresponding to C/(S_csv + S_enc + S_ref) makes such processing possible.
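The attribute-by-attribute encoding order can be illustrated with a small sketch. It assumes the records of a processing unit have already been split into cells; the helper names and the per-attribute encoders (`int` and `ord`) are purely illustrative stand-ins for whatever mapping into a finite set an implementation would use.

```python
def processing_units(records, r):
    """Yield consecutive groups of up to r records (the unit of processing)."""
    for k in range(0, len(records), r):
        yield records[k:k + r]

def encode_unit(unit, encoders):
    """Encode one unit column by column.

    For each attribute a, the cells of all records in the unit are encoded
    back to back, so data of the same kind is processed consecutively.
    """
    return [[enc(rec[a]) for rec in unit] for a, enc in enumerate(encoders)]

records = [["1", "x"], ["2", "y"], ["3", "z"]]
encoders = [int, ord]                      # hypothetical per-attribute encoders
units = list(processing_units(records, r=2))
encoded = [encode_unit(u, encoders) for u in units]
```

The result is stored per attribute rather than per record, which mirrors the idea of running the same kind of work over r cells before moving on to the next attribute.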

The calculation unit, using the cache memory, performs a specific "operation" on the encoded information E_{i,j} to obtain an operation value SS_{i,j} and stores it in main memory. It is desirable that the calculation unit consecutively perform the "operations" for the r cells, one from each of the r records, that correspond to the same attribute information. In this processing as well, high-speed processing is possible by holding the reference information for r records, the processing unit character string PS_{i,j}, and the encoded information E_{i,j} in cache memory while the operations are performed.
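As one concrete example of such an "operation", the sketch below applies additive secret sharing to encoded values. The source names secret sharing as an example but does not fix a scheme in this passage, so the additive scheme, the modulus P, and the function names are all assumptions made for illustration.

```python
import secrets

P = 2**61 - 1  # a Mersenne prime used as the modulus (assumed, not from the source)

def share(value, n):
    """Split `value` (0 <= value < P) into n additive shares modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)   # last share fixes the sum
    return shares

def reconstruct(shares):
    """Recover the original value from all n shares."""
    return sum(shares) % P

def share_column(encoded_column, n):
    """Share the encoded cells of one attribute consecutively."""
    return [share(v, n) for v in encoded_column]

column_shares = share_column([1, 2, 120], n=3)
```

Each inner list holds one share per server; any proper subset of the shares of a value is uniformly random, and only the full set sums back to the encoded value.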

After the operation value SS_{i,j} is obtained, the parallelism lock release unit releases the parallelism lock of thread i+n_p. However, no thread corresponding to i+n_p > T exists, and the parallelism lock of a nonexistent thread is not released. The processing unit that was handling thread i is then freed and becomes available to handle another thread.
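The interplay of the three locks can be sketched with `threading.Event` objects, one per thread and lock kind, where "releasing a lock" corresponds to setting the event. This is a simplified model, not the source's implementation: it starts one Python thread per chunk instead of reusing a pool of processing units, it replaces reading, parsing, encoding, and the operation with a log entry and a caller-supplied `work` function, and all names are hypothetical.

```python
import threading

def run_pipeline(chunks, n_p, work):
    T = len(chunks)
    read_ok = [threading.Event() for _ in range(T)]      # file read locks
    boundary_ok = [threading.Event() for _ in range(T)]  # buffer boundary locks
    parallel_ok = [threading.Event() for _ in range(T)]  # parallelism locks
    read_ok[0].set()
    boundary_ok[0].set()
    for i in range(min(n_p, T)):
        parallel_ok[i].set()               # threads 0..n_p-1 may start at once
    log, results, log_lock = [], [None] * T, threading.Lock()

    def body(i):
        parallel_ok[i].wait()
        read_ok[i].wait()
        with log_lock:
            log.append(("read", i))        # stands in for reading S_i
        if i + 1 < T:
            read_ok[i + 1].set()           # next thread may read
        boundary_ok[i].wait()
        with log_lock:
            log.append(("parse", i))       # stands in for parsing S_i
        if i + 1 < T:
            boundary_ok[i + 1].set()       # next thread may parse
        results[i] = work(chunks[i])       # encoding and operation, in parallel
        if i + n_p < T:
            parallel_ok[i + n_p].set()     # let thread i + n_p begin

    threads = [threading.Thread(target=body, args=(i,)) for i in range(T)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, log

results, log = run_pipeline(["a", "b", "c", "d", "e"], n_p=2, work=str.upper)
```

In this model reads and parses happen strictly in thread order, while the `work` step of up to n_p threads may overlap, which is the behaviour the lock chain is meant to produce.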

<Arithmetic unit (high-speed version)>
When each cell of the text file contains only characters for which it can be determined from the character alone whether it is a target of the "operation", the process of identifying the position and length of each cell of the text file can be performed in parallel, enabling even faster operation. For example, such parallel processing is possible for a text file that does not use escape characters. An arithmetic unit that can handle such a text file has a main memory, a cache memory, and a plurality of processing units. Each processing unit has a reading unit, a file read lock release unit, a parsing unit, a cell identification unit, a buffer boundary lock release unit, an encoding unit, a calculation unit, and a parallelism lock release unit. Each of these processing units is assigned to the processing of one of the threads. The processing unit that handles thread i performs the following processing. In the initial state, the file read lock and the buffer boundary lock of thread 0 and the parallelism locks of threads 0, …, n_p-1 are assumed to be released.

After the file read lock and the parallelism lock of thread i are released, the reading unit reads from the character string of the text file a character string S_i that can be stored in an area of the file buffer size f, and stores it in main memory. The details are the same as for the arithmetic unit (general version).

After the character string S_i is stored in main memory, the file read lock release unit releases the file read lock of thread i+1. The details are the same as for the arithmetic unit (general version).

The parsing unit calculates reference information representing the position and length of each cell contained in the character string S_i and stores it in main memory. The parsing unit can start this processing before the buffer boundary lock of thread i is released. That is, for i ≥ 1, the parsing unit can start calculating the reference information for the cells contained in S_i before the calculation of the reference information for the cells contained in S_{i-1} has finished. For example, the parsing unit identifies the cells of S_i, calculates the reference information of each identified cell, and stores it in main memory. For example, the parsing unit identifies the cells based on information located at the cell boundaries (for example, a delimiter or a line break). When i = 0, the last character of S_0 may or may not be the end of a cell. If it is not, the parsing unit cannot identify the cell containing that last character and cannot calculate its reference information. When i ≥ 1, the first character of S_i may or may not be the start of a cell, and the last character of S_i may or may not be the end of a cell. If the first character of S_i is not the start of a cell, the parsing unit cannot identify the cell containing that first character and cannot calculate its reference information. If the last character of S_i is not the end of a cell, the parsing unit cannot identify the cell containing that last character and cannot calculate its reference information. Note that the last character of the final character string S_{T-1} is the end of a cell. When the area for storing reference information in main memory runs out, the parsing unit reserves in main memory, in a single allocation, a buffer area for storing the reference information for r records. This makes it possible to process variable-length records while suppressing overhead.

When i ≥ 1, the character string S_i is the character string that immediately follows the character string S_{i-1}. When i ≥ 1, after the buffer boundary lock of thread i is released, the cell identification unit uses the reference information, S_{i-1}, and S_i to obtain information A_i corresponding to the position of the cell that immediately follows the last cell contained in S_{i-1}, and stores A_i in main memory. The information A_i may be, for example, information representing the record to which that cell belongs and the attribute corresponding to that cell (for example, which attribute it corresponds to, counted from the beginning of the record), or information representing the position and length of that cell. If the end of S_{i-1} is the end of a cell, the first cell of S_i is "the cell that immediately follows the last cell contained in S_{i-1}". In this case, storing A_i in main memory may be omitted. If, on the other hand, the end of S_{i-1} is not the end of a cell, the cell identification unit uses S_{i-1} and S_i to reconstruct "the cell that immediately follows the last cell contained in S_{i-1}" and obtains A_i. In this way, information corresponding to the position of the cell that the parsing unit could not identify is obtained. The reference information and the information A_i together make it possible to identify, for each cell of the text file, the record to which it belongs and the attribute corresponding to it (for example, which attribute it is, counted from the beginning of the record). When i = 0, the cell identification unit does nothing.
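One way to picture the split between independent parsing and the later boundary fix-up is the following sketch, which assumes a file without quoting or escape characters. Delimiter positions are found per chunk in a thread pool, and a short sequential pass then stitches them into global (position, length) pairs, including cells that straddle a chunk boundary. The names and the use of `ThreadPoolExecutor` are assumptions, and the stitching pass only loosely plays the role of the cell identification unit.

```python
from concurrent.futures import ThreadPoolExecutor

def delimiter_positions(chunk):
    """Independent step: local positions of cell boundaries in one chunk."""
    return [k for k, ch in enumerate(chunk) if ch in ",\n"]

def cell_refs(chunks):
    """Return global (position, length) of every cell across all chunks."""
    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(delimiter_positions, chunks))  # no ordering needed
    refs, start, offset = [], 0, 0
    for chunk, cuts in zip(chunks, per_chunk):   # sequential stitching pass
        for k in cuts:
            refs.append((start, offset + k - start))
            start = offset + k + 1               # next cell begins after the delimiter
        offset += len(chunk)
    return refs

text = "ab,cd\nef,gh\n"
chunks = [text[k:k + 5] for k in range(0, len(text), 5)]
refs = cell_refs(chunks)
```

With a chunk size of 5, the cell gh is split between the second and third chunks, and only the stitching pass, which knows where the previous cell ended, can report its full position and length.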

The buffer boundary lock release unit releases the buffer boundary lock of thread i+1 after the information A_i is obtained. The details are the same as for the arithmetic unit (general version). The processing unit that was handling thread i is then freed and becomes available to handle another thread.

The encoding unit then, using the cache memory, selects, based on the information identified by the reference information and the information A_i, a processing unit character string PS_{i,j}, which is the character string for r records to be processed contained in the combined character string CS_i, and encodes PS_{i,j} into encoded information E_{i,j}, which is an element of a predetermined finite set. The details are the same as for the arithmetic unit (general version), except that the information A_i is used in addition to the reference information.

Using the cache memory, the calculation unit performs a specific "calculation" on the encoding information E_{i,j} to obtain the calculated value SS_{i,j} and stores it in the main memory. The details are the same as for the arithmetic unit (general version).

After the calculated value SS_{i,j} is obtained, the parallelism unlocking unit releases the parallelism lock of thread i+n_p. The details are the same as for the arithmetic unit (general version).

[First Embodiment]
The first embodiment will be described with reference to the drawings. In the first embodiment, the parameter setting device sets the number of records r handled in one unit of processing, the degree of parallelism n_p, and the file buffer size f, and the arithmetic unit (general version) performs secret sharing (the calculation) on a CSV (Comma-Separated Values) file (a text file) in which escape characters may be used. The description below focuses on the differences from the matters explained so far, and matters already explained may be omitted.

<Configuration>
As illustrated in FIG. 1, the arithmetic system 1 of the present embodiment includes a parameter setting device 11, an arithmetic unit 12, and N server devices 13-1 to 13-N, where N is an integer of 2 or more. Information can be transmitted from the parameter setting device 11 to the arithmetic unit 12, and from the arithmetic unit 12 to the server devices 13-1 to 13-N. The information may be transmitted over a network, by other communication means, or via a portable recording medium.

As illustrated in FIG. 2, the parameter setting device 11 includes an input unit 111a, an output unit 111b, a storage unit 112, a control unit 113, a maximum size setting unit 114a, a minimum size setting unit 114b, an encoding size setting unit 114c, a calculation size setting unit 114d, a reference size setting unit 114e, a processing unit calculation unit 114f, a parallel number calculation unit 114g, and a buffer size calculation unit 114h. The parameter setting device 11 executes each process under the control of the control unit 113. Each value obtained by the parameter setting device 11 is stored in the storage unit 112 and is read from the storage unit 112 as needed for use in other processes.

As illustrated in FIG. 3, the arithmetic unit 12 includes an input unit 121a, an output unit 121b, an auxiliary storage unit 122, a main memory 123, a control unit 125, and processing units 126-1 to 126-Q, where Q is an integer of 2 or more. The arithmetic unit 12 executes each process under the control of the control unit 125.

As illustrated in FIG. 4, the processing unit 126-q (where q = 1, ..., Q) includes a cache memory 1260-q, a reading unit 1261-q, a parsing unit 1262-q, an encoding unit 1265-q, a calculation unit 1266-q, a file read unlocking unit 1267-q, a buffer boundary unlocking unit 1268-q, and a parallelism unlocking unit 1269-q.

<Parameter setting process>
The parameter setting process of the parameter setting device 11 will be described with reference to FIG. 5.
The attribute information of the text data to be processed is input to the input unit 111a of the parameter setting device 11 (FIG. 2) and stored in the storage unit 112. The attribute information may be read from the text data or may be supplied from a source other than the text data (step S111a).

The maximum size setting unit 114a takes the attribute information read from the storage unit 112 as input, and sets and outputs the maximum value S_csv of the size of the character string for one record of the text file (step S114a).

The minimum size setting unit 114b takes the attribute information read from the storage unit 112 as input, and sets and outputs the minimum value s_csv of the size of the character string for one record (step S114b).

The encoding size setting unit 114c takes the attribute information read from the storage unit 112 as input and, based on information representing the "predetermined finite set" to which the encoding information belongs, sets and outputs the maximum value S_enc of the total size of the encoding information obtained by encoding (converting) the character string for one record into elements of the predetermined finite set. In the present embodiment, the "predetermined finite set" to which the encoding information belongs is the finite set over which secret sharing is performed, and it is determined in advance (step S114c).

The calculation size setting unit 114d takes the attribute information read from the storage unit 112 as input and, based on the predetermined finite set to which the encoding information belongs and on the secret sharing scheme, sets and outputs the maximum value S_ss of the total size of the secret sharing values (calculated values) obtained by secret sharing (the calculation) of the encoding information for one record. The secret sharing scheme of the present embodiment is determined in advance (step S114d).

The reference size setting unit 114e takes the attribute information read from the storage unit 112 as input, and sets and outputs the total size S_ref of the reference information representing the position and length of each cell for one record in the text file (step S114e).

The processing unit calculation unit 114f takes S_csv, S_enc, and S_ref as inputs, and obtains and outputs a function value r of C/(S_csv + S_enc + S_ref) (the number of records r processed in one unit of processing, that is, the number of records contained in a processing unit character string). The cache memory size C may be predetermined or may be supplied as input (step S114f).

The parallel number calculation unit 114g takes S_csv, s_csv, S_ref, S_enc, I, and r as inputs, and obtains and outputs a function value n_p of f_0/I·r·S_csv (the degree of parallelism n_p in the arithmetic processing). Here f_0 is a function value of s_csv·M/(s_csv + S_enc + max(S_ref, S_ss)). The main memory size M may be predetermined or may be supplied as input (step S114g).

The buffer size calculation unit 114h takes f_0 and n_p as inputs, and obtains and outputs a function value f of f_0/n_p (the file buffer size f of the data read in one batch from the character string of the text file during the arithmetic processing) (step S114h).

The output unit 111b outputs r, n_p, and f obtained as described above (step S111b).
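As a rough illustration of steps S114f to S114h, the following Python sketch computes r, n_p, and f from the size bounds. It rests on assumptions that this section does not state: each "function value" is taken to be the floor of the quotient (clamped to at least 1), the expression f_0/I·r·S_csv is read as f_0/(I·r·S_csv), I is treated as a given positive integer, and the function name `set_parameters` and the sample numbers are hypothetical.

```python
def set_parameters(C, M, S_csv, s_csv, S_enc, S_ss, S_ref, I):
    """Return (r, n_p, f); all sizes are in bytes.

    Assumption: every "function value" is floor division, clamped to >= 1.
    """
    # Records per processing unit, derived from the cache memory size C.
    r = max(1, C // (S_csv + S_enc + S_ref))
    # f_0 as defined in the text, derived from the main memory size M.
    f0 = (s_csv * M) // (s_csv + S_enc + max(S_ref, S_ss))
    # Degree of parallelism. Reading f_0/I·r·S_csv as f_0/(I*r*S_csv)
    # is an assumption of this sketch.
    n_p = max(1, f0 // (I * r * S_csv))
    # File buffer size per thread.
    f = f0 // n_p
    return r, n_p, f


# Hypothetical sample values: 1 MB cache, 1 GB main memory.
r, n_p, f = set_parameters(C=1_000_000, M=1_000_000_000,
                           S_csv=200, s_csv=100,
                           S_enc=300, S_ss=600, S_ref=100, I=10)
```

With these sample values, r depends only on the cache size and the per-record sizes, while n_p and f depend on how the main memory budget f_0 is divided among the threads.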

<Arithmetic processing>
The arithmetic processing of the arithmetic unit 12 will be described with reference to FIGS. 6 to 12.
As illustrated in FIG. 6, r, n_p, and f output from the parameter setting device 11 are input to the input unit 121a of the arithmetic unit 12 (FIG. 3) and stored in the auxiliary storage unit 122 (step S111aa). The text data to be processed is also input to the input unit 121a and stored in the auxiliary storage unit 122. FIGS. 9 to 12 show examples of the text data. The text data illustrated in FIG. 9 is a CSV file in which each cell is enclosed in double quotation marks. When the double quotation mark character (“) is used as part of a cell value, another “ is added in front of it as an escape character. For example, “4selddks““k304kdkk400-03d” represents the secret sharing target value 4selddks“k304kdkk400-03d. As illustrated in FIG. 10, this text file contains W records rec(1), ..., rec(W); each record rec(w) (where w = 1, ..., W) contains G cells cell(w,g) of arbitrary length (where g = 1, ..., G); and each cell cell(w,g) contains an arbitrary number of characters. The cell cell(w,g) corresponds to the g-th attribute att(g) from the beginning of the record rec(w) (step S111ab).
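The escaping rule described above (a quotation mark inside a quoted cell is written twice) is the same convention that Python's standard `csv` module handles with `doublequote=True`. The short sketch below is only an illustration: it uses the ASCII character `"` in place of the typeset quotation mark shown in the text, which is an assumption about the actual file contents.

```python
import csv

# One record whose first cell contains an escaped quotation mark.
line = '"4selddks""k304kdkk400-03d","佐藤"'

# doublequote=True treats two consecutive quote characters inside a
# quoted cell as one literal quote character.
cells = next(csv.reader([line], doublequote=True))
```

Here `cells[0]` is the unescaped value, with a single quotation mark between `4selddks` and `k304kdkk400-03d`.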

After that, r, n_p, and f are read from the auxiliary storage unit 122 into the main memory 123, and the arithmetic processing of threads i = 0, ..., T-1 is executed. The arithmetic processing starts from thread i = 0. In the initial state, the file read lock and the buffer boundary lock of thread 0 and the parallelism locks of threads 0, ..., n_p-1 are released. The control unit 125 assigns an unused processing unit 126-q among the processing units 126-1 to 126-Q to thread i, and as many threads as possible are processed in parallel (step S126). The secret sharing values obtained in this way are output from the output unit 121b, sent to the respective server devices 13-1 to 13-N, and stored there (step S111b). The processing of thread i is described in detail below.

≪Processing of thread i≫
As illustrated in FIGS. 7 and 8, the reading unit 1261-q of the processing unit 126-q that processes thread i determines whether both the file read lock and the parallelism lock of thread i have been released. The file read lock and the parallelism lock of thread 0 are released in the initial state (step S1261a-q). If either the file read lock or the parallelism lock of thread i has not yet been released, the determination in step S1261a-q is repeated.

On the other hand, when both the file read lock and the parallelism lock of thread i have been released, the reading unit 1261-q reads the file buffer size f from the main memory 123 and allocates an area of size f in the main memory 123. The reading unit 1261-q then reads, from the character string of the text file stored in the auxiliary storage unit 122, a character string S_i that fits in the area of file buffer size f. In the example of FIG. 11, the following is read as the character string S_0.
“石田”,“太郎”,“1990/2/8”,“100-0002”,“sjeifdfgjrrf”,“45dkfjkejdf5”
“石田”,“次郎”,“1985/5/2”,“111-0112”,“25df4d4ed”,“1s4dlccclseed”
“石田”,“花子”,“2001/4/8”,“111-2222”,“5d4e4d4ffg”,“skekdjjfaae”
“佐藤”,“太郎”,“1992/7/11”,“111-0345”,“dlekd4f3e”,“4selddks“
In the example of FIG. 12, the following is read as the character string S_1.
“k304kdkk400-03d”
“佐藤”,“次郎”,“1989/8/21”,“123-0434”,“dkesopd445e”,“4ssjdejdoseae3230dds”
“佐藤”,“花子”,“1995/2/3”,“145-0234”,“skdeofl4s3d3”,“skek94kdskd4dc”
“田中”,“太郎”,“1992/3/23”,“134-0134”,“dj394949495kf”,“47s52\n5412485d”
“田中”,“次郎”,“1979/4/21”,“11
The reading unit 1261-q stores the character string S_i in the area of file buffer size f allocated in the main memory 123 (step S1261b-q in FIG. 7, R_i in FIG. 8).

After the character string S_i is stored in the main memory 123, the file read unlocking unit 1267-q releases the file read lock of thread i+1 (step S1267-q in FIG. 7, UR_{i+1} in FIG. 8).

The parsing unit 1262-q determines whether the buffer boundary lock of thread i has been released. The buffer boundary lock of thread 0 is released in the initial state (step S1262a-q). If the buffer boundary lock of thread i has not been released, the determination in step S1262a-q is repeated.

On the other hand, when the buffer boundary lock of thread i has been released, the parsing unit 1262-q determines whether i ≥ 1 (step S1262b-q). When i ≥ 1 does not hold (that is, when i = 0), the parsing unit 1262-q parses the character string S_i read from the main memory 123, calculates reference information representing the position and length of each cell contained in S_i, and stores it in the main memory 123. For example, for the character string S_0 illustrated in FIG. 11, the parsing unit 1262-q parses S_0, identifies the cells "石田", "太郎", "1990/2/8", "100-0002", "sjeifdfgjrrf", "45dkfjkejdf5", "石田", "次郎", "1985/5/2", "111-0112", "25df4d4ed", "1s4dlccclseed", "石田", "花子", "2001/4/8", "111-2222", "5d4e4d4ffg", "skekdjjfaae", "佐藤", "太郎", "1992/7/11", "111-0345", and "dlekd4f3e", and calculates their reference information. Because the end of the final “4selddks“ is not the end of a cell, thread 0 does not calculate reference information for “4selddks“. When the area for storing reference information in the main memory 123 runs short, the parsing unit 1262-q reads r from the main memory 123 and allocates, in one batch, a buffer area in the main memory 123 for storing the reference information for r records. The processing then proceeds to step S1268-q (step S1262c-q in FIG. 7, P_i in FIG. 8).
When i ≥ 1, the parsing unit 1262-q reads the parse result of thread i-1 (the reference information of each identified cell and the information identifying the characters not contained in any cell) from the main memory 123, and identifies the characters of the character string S_{i-1} that are not contained in any cell identified by thread i-1. When the end of S_{i-1} is the end of a cell, there are no such characters (step S1262d-q). Next, the parsing unit 1262-q reads the character string S_i from the main memory 123, parses the character string obtained by joining the characters not contained in any cell identified by thread i-1 with S_i, calculates reference information representing the position and length of each cell contained in the joined character string, and stores it in the main memory 123. When the end of S_{i-1} is the end of a cell, the parsing unit 1262-q simply parses S_i, calculates reference information representing the position and length of each cell contained in S_i, and stores it in the main memory 123. For example, for the character strings S_0 and S_1 illustrated in FIGS. 11 and 12 (i = 1), the parsing unit 1262-q joins the characters “4selddks“ of S_{i-1}, which are not contained in any cell identified by thread i-1, with the character string S_i, and parses the resulting character string
“4selddks““k304kdkk400-03d”
“佐藤”,“次郎”,“1989/8/21”,“123-0434”,“dkesopd445e”,“4ssjdejdoseae3230dds”
“佐藤”,“花子”,“1995/2/3”,“145-0234”,“skdeofl4s3d3”,“skek94kdskd4dc”
“田中”,“太郎”,“1992/3/23”,“134-0134”,“dj394949495kf”,“47s52\n5412485d”
“田中”,“次郎”,“1979/4/21”,“11
It then calculates reference information representing the position and length of each cell contained in this character string, namely "4selddks““k304kdkk400-03d", "佐藤", "次郎", "1989/8/21", "123-0434", "dkesopd445e", "4ssjdejdoseae3230dds", "佐藤", "花子", "1995/2/3", "145-0234", "skdeofl4s3d3", "skek94kdskd4dc", "田中", "太郎", "1992/3/23", "134-0134", "dj394949495kf", "47s52\n5412485d", "田中", "次郎", and "1979/4/21", and stores it in the main memory 123. Because the end of the final “11 is not the end of a cell, thread 1 does not calculate reference information for “11. When the area for storing reference information in the main memory 123 runs short, the parsing unit 1262-q reads r from the main memory 123 and allocates, in one batch, a buffer area in the main memory 123 for storing the reference information for r records. The processing then proceeds to step S1268-q (step S1262e-q in FIG. 7, P_i in FIG. 8).
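The carry-over of an unfinished cell from one file buffer to the next can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes ASCII `"` as the quotation mark, assumes every cell is quoted, reports positions relative to the string being parsed rather than to the text file, and decides that a closing quote ends a cell only when the following character is visible in the buffer. The name `parse_quoted` and the shortened sample strings are hypothetical.

```python
def parse_quoted(text, final=False):
    """Return ([(start, length), ...], tail_start) for fully quoted CSV text.

    Each (start, length) covers one complete cell including its quotes.
    text[tail_start:] is the unfinished remainder for the next buffer.
    """
    refs, pos, n = [], 0, len(text)
    while pos < n:
        if text[pos] != '"':
            raise ValueError("every cell is assumed to start with a quote")
        k, end = pos + 1, None
        while k < n:
            if text[k] == '"':
                if k + 1 < n and text[k + 1] == '"':
                    k += 2          # doubled quote: escaped character
                    continue
                if k + 1 < n or final:
                    end = k         # closing quote confirmed
                break               # quote is the last character: undecidable
            k += 1
        if end is None:
            return refs, pos        # cell continues in the next buffer
        refs.append((pos, end - pos + 1))
        pos = end + 1
        if pos < n:
            if text[pos] not in ",\n":
                raise ValueError("expected a cell or record separator")
            pos += 1
    return refs, pos


# Shortened stand-ins for S_0 and S_1 (hypothetical sample data).
S0 = '"佐藤","太郎","4selddks"'
S1 = '"k304kdkk400-03d"\n"佐藤","次郎","11'

refs0, tail0 = parse_quoted(S0)          # thread 0 parses S_0 alone
joined = S0[tail0:] + S1                 # thread 1 prepends the leftover
refs1, tail1 = parse_quoted(joined)
cells1 = [joined[s:s + length] for s, length in refs1]
```

In this run, thread 0 leaves the trailing `"4selddks"` unresolved because its last quote could be either a closing quote or the first half of an escaped quote, and thread 1 resolves it once the start of the next buffer is attached.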

In step S1268-q, the buffer boundary unlocking unit 1268-q releases the buffer boundary lock of thread i+1. However, no thread corresponds to i+1 > T, and the buffer boundary lock of a nonexistent thread is not released (step S1268-q in FIG. 7, UB_{i+1} in FIG. 8).

After that, based on the information identified by the reference information, the encoding unit 1265-q selects from the text data the processing unit character strings PS_{i,j} (where j = 0, ..., J-1), each of which is a character string of r records to be processed contained in the combined character string CS_i, and stores each PS_{i,j} together with the reference information for the corresponding r records in the cache memory 1260-q. In the example of FIGS. 11 and 12 with r = 2, the processing unit character string PS_{0,0} is selected from the combined character string CS_0 = S_0, and the processing unit character strings PS_{1,0} and PS_{1,1} are selected from the combined character string CS_1 = S_0 + S_1 (step S1263-q). Using the processing unit character string PS_{i,j} and the reference information in the cache memory 1260-q, the encoding unit 1265-q encodes PS_{i,j} into encoding information E_{i,j}, an element of the predetermined finite set, and stores E_{i,j} in the cache memory 1260-q (step S1265-q in FIG. 7, E_i in FIG. 8).
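The selection of processing units in the r = 2 example can be sketched as below. This is an interpretation rather than a statement from the text: it assumes that only full groups of r complete records form a processing unit and that any remaining complete records are carried over to the next thread, which is consistent with PS_{0,0} coming from CS_0 and PS_{1,0}, PS_{1,1} coming from CS_1. The name `split_units` and the labels rec1 to rec7 are hypothetical.

```python
def split_units(records, r):
    """Split complete records into full units of r records plus a remainder."""
    full = len(records) - len(records) % r
    units = [records[k:k + r] for k in range(0, full, r)]
    return units, records[full:]


# Thread 0: S_0 contains three complete records (the fourth is cut off).
units0, rest0 = split_units(["rec1", "rec2", "rec3"], 2)

# Thread 1: the leftover record plus the records completed or found in S_1.
units1, rest1 = split_units(rest0 + ["rec4", "rec5", "rec6", "rec7"], 2)
```

Under these assumptions thread 0 yields one unit and thread 1 yields two, matching the counts given in the example.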

The calculation unit 1266-q performs secret sharing on the encoding information E_{i,j} read from the cache memory 1260-q to obtain the secret sharing value (calculated value) SS_{i,j}, and stores it in the main memory 123. At this point the reference information for the r records corresponding to the processing unit character string PS_{i,j} no longer needs to be kept in the main memory 123, so the secret sharing values SS_{i,0}, ..., SS_{i,J-1} may overwrite the area in which that reference information was stored (step S1266-q in FIG. 7, SS_i in FIG. 8).
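To make the encode-then-share step concrete, the following Python sketch encodes a cell into elements of a finite set and splits each element into shares. Everything specific here is an assumption: the text only says that the finite set and the secret sharing scheme are predetermined, so the prime modulus `P`, the 7-byte block encoding, the additive sharing scheme, and the names `encode`, `share`, and `reconstruct` are hypothetical stand-ins.

```python
import secrets

P = 2**61 - 1  # hypothetical prime modulus defining the finite set Z_P


def encode(cell, block=7):
    """Encode a cell as integers below P (7-byte blocks are below 2**56)."""
    data = cell.encode("utf-8")
    return [int.from_bytes(data[k:k + block], "big")
            for k in range(0, len(data), block)]


def share(value, n):
    """Split value into n additive shares modulo P (assumed scheme)."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts


def reconstruct(parts):
    """Recombine additive shares modulo P."""
    return sum(parts) % P


encoded = encode("100-0002")                # elements of the finite set
shares = [share(v, 3) for v in encoded]     # one share per server, N = 3
```

Each inner list in `shares` would be distributed across the N server devices, one share per device. A threshold scheme such as Shamir's could be substituted without changing the surrounding flow.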

After that, the parallelism unlocking unit 1269-q reads n_p from the main memory 123 and releases the parallelism lock of thread i+n_p. However, no thread corresponds to i+n_p > T, and the parallelism lock of a nonexistent thread is not released. The control unit 125 then releases the assignment of the processing unit 126-q to thread i, which makes it possible to assign the processing unit 126-q to another thread (step S1269-q in FIG. 7, UP_{i+n_p} in FIG. 8).
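The interplay of the three locks in the thread processing above can be simulated with Python's `threading.Event`. This is a simplified sketch under stated assumptions: each lock is modelled as an event that is "released" when set, threads block on `wait()` instead of repeating the check, all T threads are started at once rather than being assigned to Q processing units, and the actual read, parse, encode, and calculate work is replaced by log entries. The name `run_pipeline` is hypothetical.

```python
import threading


def run_pipeline(T, n_p):
    """Simulate the lock hand-offs of threads 0..T-1 and return an event log."""
    read_ok = [threading.Event() for _ in range(T)]    # file read locks
    bound_ok = [threading.Event() for _ in range(T)]   # buffer boundary locks
    par_ok = [threading.Event() for _ in range(T)]     # parallelism locks
    # Initial state: thread 0 may read and parse; threads 0..n_p-1 may run.
    read_ok[0].set()
    bound_ok[0].set()
    for i in range(min(n_p, T)):
        par_ok[i].set()

    log, log_lock = [], threading.Lock()

    def note(kind, i):
        with log_lock:
            log.append((kind, i))

    def worker(i):
        read_ok[i].wait()
        par_ok[i].wait()
        note("read", i)                  # R_i
        if i + 1 < T:
            read_ok[i + 1].set()         # UR_{i+1}
        bound_ok[i].wait()
        note("parse", i)                 # P_i
        if i + 1 < T:
            bound_ok[i + 1].set()        # UB_{i+1}
        note("compute", i)               # E_i and SS_i
        if i + n_p < T:
            par_ok[i + n_p].set()        # UP_{i+n_p}

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(T)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_pipeline(T=8, n_p=3)
reads = [i for kind, i in log if kind == "read"]
parses = [i for kind, i in log if kind == "parse"]
```

In the resulting log, reads and parses each occur in thread order, and thread i+n_p never starts reading before thread i has finished its calculation, which bounds the number of file buffers in use at once.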

[Second Embodiment]
In the second embodiment, the parameter setting device sets the number of records r handled in one unit of processing, the degree of parallelism n_p, and the file buffer size f, and the arithmetic unit (high-speed version) performs secret sharing (the calculation) on a CSV (Comma-Separated Values) file (a text file) in which the use of escape characters is prohibited.

<Configuration>
As illustrated in FIG. 1, the arithmetic system 2 of the present embodiment includes a parameter setting device 11, an arithmetic unit 22, and N server devices 13-1 to 13-N. Information can be transmitted from the parameter setting device 11 to the arithmetic unit 22, and from the arithmetic unit 22 to the server devices 13-1 to 13-N.

As illustrated in FIG. 3, the arithmetic unit 22 includes an input unit 121a, an output unit 121b, an auxiliary storage unit 122, a main memory 123, a control unit 125, and processing units 226-1 to 226-Q, where Q is an integer of 2 or more. The arithmetic unit 22 executes each process under the control of the control unit 125.

<Parameter setting process>
This is the same as in the first embodiment.

<Arithmetic processing>
The arithmetic processing of the arithmetic unit 22 will be described with reference to FIG. 6 and FIGS. 13 to 18.
As illustrated in FIG. 6, r, n_p, and f output from the parameter setting device 11 are input to the input unit 121a of the arithmetic unit 22 (FIG. 3) and stored in the auxiliary storage unit 122 (step S111aa). The text data to be processed is also input to the input unit 121a and stored in the auxiliary storage unit 122. FIGS. 15 to 18 show examples of the text data. The text data illustrated in FIG. 15 is a CSV file in which the cells are not enclosed in double quotation marks. Escape characters are not permitted in the text data of the present embodiment, and each cell contains only characters for which it can be determined, from the character alone, whether or not it represents a target of secret sharing (the calculation). As illustrated in FIG. 16, this text file contains W records rec(1), ..., rec(W); each record rec(w) (where w = 1, ..., W) contains G cells cell(w,g) of arbitrary length (where g = 1, ..., G); and each cell cell(w,g) contains an arbitrary number of characters. The cell cell(w,g) corresponds to the g-th attribute att(g) from the beginning of the record rec(w) (step S211ab).

After that, r, n_p, and f are read from the auxiliary storage unit 122 into the main memory 123, and the arithmetic processing of threads i = 0, ..., T-1 is executed. The arithmetic processing starts from thread i = 0. In the initial state, the file read lock of thread 0 and the parallelism locks of threads 0, ..., n_p-1 are released. The control unit 125 assigns an unused processing unit 226-q among the processing units 226-1 to 226-Q to thread i, and as many threads as possible are processed in parallel (step S226). The secret sharing values obtained in this way are output from the output unit 121b, sent to the respective server devices 13-1 to 13-N, and stored there (step S111b). The processing of thread i is described in detail below.

≪Processing of thread i≫
As illustrated in FIGS. 13 and 14, the reading unit 1261-q of the processing unit 226-q that processes thread i determines whether both the file read lock and the parallelism lock of thread i have been released. The file read lock and the parallelism lock of thread 0 are released in the initial state (step S1261a-q). If either the file read lock or the parallelism lock of thread i has not yet been released, the determination in step S1261a-q is repeated.

On the other hand, when both the file read lock and the parallelism lock of thread i have been released, the reading unit 1261-q reads the file buffer size f from the main memory 123 and secures an area of the file buffer size f in the main memory 123. Furthermore, the reading unit 1261-q reads, from the character string of the text file stored in the auxiliary storage unit 122, a character string S_i that fits in the area of the file buffer size f. In the example of FIG. 17, the following is read as the character string S_0.
石田,太郎,1990/2/8,100-0002,東京都渋谷区〇〇〇,03-3234-5678
石田,次郎,2000/4/2,274-16,神奈川県藤沢市江の島〇〇〇,03-9999-9999
石田,花子,1985/6/2,352-725,東京都港区区〇〇〇,03-1111-9999
佐藤,太郎,2001/5/1,100-0002,東京都千代田区〇〇〇,03-3234-5678
佐藤,次
In the example of FIG. 18, the following is read as the character string S_1.
郎,2001/6/2,274-16,神奈川県藤沢市江の島〇〇〇,03-9999-9999
佐藤,花子,2002/7/2,352-725,東京都新宿区新宿〇〇〇,03-1111-9999
田中,太郎,2001/1/1,100-0002,東京都千代田区〇〇〇,03-1234-5678
田中,次郎,2001/1/2,251-0036,神奈川県藤沢市江の島〇〇〇
The reading unit 1261-q stores the character string S_i in the area of the file buffer size f secured in the main memory 123 (step S1261b-q in FIG. 13, R_i in FIG. 14).
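The fixed-size read described above can be sketched as follows. This is only an illustration under stated assumptions: the text is already held in memory as a Python string, the file buffer size f is counted in characters, and the function name split_into_buffers is hypothetical and does not appear in the specification.

```python
from typing import List


def split_into_buffers(text: str, f: int) -> List[str]:
    """Cut text into consecutive strings S_0, S_1, ... of at most f characters.

    The cut ignores cell and record boundaries, so a cell may be split
    across two consecutive strings (assumption: f is counted in characters).
    """
    if f <= 0:
        raise ValueError("f must be positive")
    return [text[pos:pos + f] for pos in range(0, len(text), f)]


# Two short records; with f = 19 the cut falls inside the cell "次郎".
text = "佐藤,太郎,2001/5/1\n佐藤,次郎,2001/6/2\n"
chunks = split_into_buffers(text, 19)
```

As in FIGS. 17 and 18, the first string ends with the fragment "次" and the second begins with "郎", which is why the later steps must treat the buffer boundary specially.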

After the character string S_i has been stored in the main memory 123, the file read lock release unit 1267-q releases the file read lock of thread i+1 (step S1267-q in FIG. 13, UR_{i+1} in FIG. 14).

The parsing unit 2262-q parses the character string S_i read from the main memory 123, calculates reference information representing the position and length of each cell contained in the character string S_i, and stores the reference information in the main memory 123.

For example, in the case of the character string S_0 illustrated in FIG. 17, the parsing unit 2262-q parses the character string S_0, identifies the cells "石田", "太郎", "1990/2/8", "100-0002", "東京都渋谷区〇〇〇", "03-3234-5678", "石田", "次郎", "2000/4/2", "274-16", "神奈川県藤沢市江の島〇〇〇", "03-9999-9999", "石田", "花子", "1985/6/2", "352-725", "東京都港区区〇〇〇", "03-1111-9999", "佐藤", "太郎", "2001/5/1", "100-0002", "東京都千代田区〇〇〇", "03-3234-5678", and "佐藤", and calculates their reference information. Because the end of the final "次" is not the end of a cell, the reference information of "次" is not calculated in thread 0.

Similarly, in the case of the character string S_1 illustrated in FIG. 18, the parsing unit 2262-q parses the character string S_1, identifies the cells "2001/6/2", "274-16", "神奈川県藤沢市江の島〇〇〇", "03-9999-9999", "佐藤", "花子", "2002/7/2", "352-725", "東京都新宿区新宿〇〇〇", "03-1111-9999", "田中", "太郎", "2001/1/1", "100-0002", "東京都千代田区〇〇〇", "03-1234-5678", "田中", "次郎", "2001/1/2", and "251-0036", and calculates their reference information. Because the beginning of the first "郎" is not the beginning of a cell and the end of the final "神奈川県藤沢市江の島〇〇〇" is not the end of a cell, the reference information of "郎" and "神奈川県藤沢市江の島〇〇〇" is not calculated in thread 1.

When the area for storing reference information in the main memory 123 runs short, the parsing unit 2262-q reads r from the main memory 123 and secures, in one batch, a buffer area in the main memory 123 for storing the reference information of r records. The parsing unit 2262-q can start this processing before the buffer boundary lock of thread i is released. That is, for i ≥ 1, the parsing unit 2262-q can start calculating the reference information of the cells contained in the character string S_i before the calculation of the reference information of the cells contained in the character string S_{i-1} has finished (step S2262-q in FIG. 13, P_i in FIG. 14).
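The per-buffer parsing above can be sketched as follows. This is a minimal illustration, not the implementation in the specification. It assumes cells are delimited only by commas and line breaks (no quoting, as in FIGS. 17 and 18), that positions are character offsets within the buffer, and that every thread other than thread 0 leaves the leading fragment of its buffer to the later cell identification step. The name parse_complete_cells is hypothetical.

```python
from typing import List, Optional, Tuple

DELIMITERS = ",\n"  # assumed cell delimiter and record delimiter


def parse_complete_cells(chunk: str, is_first_thread: bool) -> List[Tuple[int, int]]:
    """Return (position, length) for every cell that lies wholly inside chunk.

    A trailing fragment with no closing delimiter is skipped. For threads
    other than thread 0 the leading fragment is skipped as well, because the
    thread cannot yet know whether it continues a cell of the previous buffer.
    """
    refs: List[Tuple[int, int]] = []
    start: Optional[int] = 0 if is_first_thread else None
    for pos, ch in enumerate(chunk):
        if ch in DELIMITERS:
            if start is not None:
                refs.append((start, pos - start))
            start = pos + 1  # the next cell begins right after the delimiter
    return refs


s0 = "佐藤,太郎,2001/5/1\n佐藤,次"
s1 = "郎,2001/6/2\n"
refs0 = parse_complete_cells(s0, is_first_thread=True)
refs1 = parse_complete_cells(s1, is_first_thread=False)
```

Because the function only looks at its own buffer, thread i can run it without waiting for thread i-1, which matches the statement that parsing may start before the buffer boundary lock is released.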

Thereafter, the cell identification unit 2264-q determines whether the buffer boundary lock of thread i has been released (step S2264a-q). The buffer boundary lock of thread 0 is released in the initial state. If the buffer boundary lock of thread i has not been released, the determination in step S2264a-q is repeated.

On the other hand, when the buffer boundary lock of thread i has been released and i ≥ 1, the cell identification unit 2264-q uses the reference information, the character string S_{i-1}, and the character string S_i to obtain information A_i corresponding to the position of the cell that immediately follows the last cell contained in the character string S_{i-1}, and stores the information A_i in the main memory. When the buffer boundary lock of thread i has been released and i = 0, the cell identification unit 2264-q does nothing (step S2264b-q).
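The boundary step can be sketched as follows: the cell that straddles two buffers is rebuilt from the tail of S_{i-1} and the head of S_i. This is an illustrative sketch only. It assumes comma and line-break delimiters without quoting, and the name boundary_cell is hypothetical. It also assumes the straddling cell ends within S_i; a cell longer than one buffer is not handled.

```python
DELIMITERS = ",\n"  # assumed cell delimiter and record delimiter


def boundary_cell(prev_chunk: str, chunk: str) -> str:
    """Rebuild the cell that immediately follows the last complete cell of prev_chunk."""
    # Tail: everything after the last delimiter of the previous buffer
    # (empty when the previous buffer ends exactly on a delimiter).
    last_delim = max(prev_chunk.rfind(d) for d in DELIMITERS)
    tail = prev_chunk[last_delim + 1:]
    # Head: everything before the first delimiter of the current buffer.
    found = [p for p in (chunk.find(d) for d in DELIMITERS) if p != -1]
    head = chunk[:min(found)] if found else chunk
    return tail + head


cell = boundary_cell("佐藤,太郎,2001/5/1\n佐藤,次", "郎,2001/6/2\n")
```

When S_{i-1} ends on a cell boundary the tail is empty and the function simply returns the first cell of S_i, which corresponds to the case in which the first cell of S_i is itself the cell that follows the last cell of S_{i-1}.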

Thereafter, the buffer boundary lock release unit 1268-q releases the buffer boundary lock of thread i+1. However, no thread corresponding to i+1 > T exists, and the buffer boundary lock of a nonexistent thread is not released (step S1268-q in FIG. 13, UB_{i+1} in FIG. 14).

Thereafter, the encoding unit 1265-q and the arithmetic unit 1266-q of the processing unit 226-q, in place of those of the processing unit 126-q, execute the processing of steps S1265-q, S1266-q, and S1269-q described in the first embodiment (steps S1265-q, S1266-q, and S1269-q in FIG. 13; E_i, SS_i, and UP_{i+n_p} in FIG. 14).
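The ordering imposed by the three kinds of locks can be sketched as follows. This is a hedged illustration, not the implementation in the specification. Each lock is modeled as a threading.Event that is "released" by set(), and blocking waits stand in for the repeated determinations of steps S1261a-q and S2264a-q. The sketch also assumes that the parallelism locks of the first n_p threads are released initially; the passage above states this only for thread 0. The function run_pipeline and the log labels are hypothetical.

```python
import threading
from typing import List


def run_pipeline(num_threads: int, n_p: int) -> List[str]:
    """Run num_threads workers chained by read, boundary and parallelism locks."""
    read_ok = [threading.Event() for _ in range(num_threads)]
    boundary_ok = [threading.Event() for _ in range(num_threads)]
    parallel_ok = [threading.Event() for _ in range(num_threads)]
    read_ok[0].set()       # thread 0 starts with its file read lock released
    boundary_ok[0].set()   # ... and its buffer boundary lock released
    for i in range(min(n_p, num_threads)):
        parallel_ok[i].set()  # assumption: first n_p threads may run at once

    log: List[str] = []
    log_guard = threading.Lock()

    def note(label: str) -> None:
        with log_guard:
            log.append(label)

    def worker(i: int) -> None:
        read_ok[i].wait()
        parallel_ok[i].wait()
        note(f"R{i}")                    # read S_i
        if i + 1 < num_threads:
            read_ok[i + 1].set()         # UR_{i+1}
        note(f"P{i}")                    # parse S_i (no boundary lock needed)
        boundary_ok[i].wait()
        note(f"B{i}")                    # identify the boundary cell
        if i + 1 < num_threads:
            boundary_ok[i + 1].set()     # UB_{i+1}
        note(f"E{i}")                    # encode and compute
        if i + n_p < num_threads:
            parallel_ok[i + n_p].set()   # UP_{i+n_p}

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_pipeline(num_threads=5, n_p=2)
```

The exact interleaving varies from run to run, but reads always occur in thread order, boundary steps always occur in thread order, and thread i+n_p never reads before thread i has finished its computation.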

[Other variations]

The present invention is not limited to the embodiments described above. For example, in the first and second embodiments the parameter setting device sets the number of records r handled in one unit process, the parallel number n_p, and the file buffer size f, but an embodiment in which the parameter setting device does not set the file buffer size f is also possible. In addition, the first and second embodiments use a CSV (Comma-Separated Values) file as the example text file, but the processing may be performed on any of the other text files mentioned above. Furthermore, the first and second embodiments describe an example in which secret sharing is performed as the "operation", but another operation may be performed as the "operation".

The various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as otherwise required. It goes without saying that other modifications can be made as appropriate without departing from the spirit of the present invention.

Each of the devices described above is configured, for example, by a general-purpose or dedicated computer executing a predetermined program, the computer including a processor (hardware processor) such as a CPU (central processing unit) and memory such as RAM (random-access memory) and ROM (read-only memory). The computer may include a single processor and memory or a plurality of processors and memories. The program may be installed on the computer or recorded in advance in the ROM or the like. Some or all of the processing units may be configured using electronic circuitry that realizes the processing functions without using a program, rather than electronic circuitry that, like a CPU, realizes a functional configuration by reading a program. The electronic circuitry constituting one device may include a plurality of CPUs.

When the above configuration is realized by a computer, the processing content of the functions that each device should have is described by a program. The processing functions are realized on the computer by executing this program on the computer. The program describing the processing content can be recorded on a computer-readable recording medium. An example of a computer-readable recording medium is a non-transitory recording medium. Examples of such a recording medium include a magnetic recording device, an optical disc, a magneto-optical recording medium, and a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or CD-ROM, on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing processing, the computer reads the program stored in its own storage device and executes processing in accordance with the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing in accordance with it, or the computer may execute processing in accordance with the received program each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and acquisition of results.

Rather than realizing the processing functions of the device by executing a predetermined program on a computer, at least some of these processing functions may be realized by hardware.

1, 2 Arithmetic system
11 Parameter setting device
12, 22 Arithmetic device
13-1 to 13-N Server device

Y. Shafranovich, "RFC4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files," [online], October 2005, SolidMatrix Technologies, Inc., [retrieved January 6, 2018], Internet <http://www.ietf.org/rfc/rfc4180.txt>
五十嵐大, 千田浩司, 濱田浩気, 高橋克巳, "軽量検証可能３パーティ秘匿関数計算の効率化及びこれを用いたセキュアなデータベース処理 (Secure Database Operations Using An Improved 3-party Verifiable Secure Function Evaluation)," In SCIS 2011, 2011.
A. Shamir, "How to Share a Secret," Communications of the ACM, November 1979, Volume 22, Number 11, pp. 612-613.

FIG. 1 is a block diagram illustrating the arithmetic system of the embodiments.
FIG. 2 is a block diagram illustrating the functional configuration of the parameter setting device of the embodiments.
FIG. 3 is a block diagram illustrating the functional configuration of the arithmetic device of the embodiments.
FIG. 4 is a block diagram illustrating the functional configuration of the processing unit of the embodiments.
FIG. 5 is a flow diagram illustrating the parameter setting processing of the embodiments.
FIG. 6 is a flow diagram illustrating the arithmetic processing of the embodiments.
FIG. 7 is a flow diagram illustrating the processing of thread i of an embodiment.
FIG. 8 is a conceptual diagram illustrating the processing of each thread of an embodiment.
FIG. 9 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 10 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 11 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 12 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 13 is a flow diagram illustrating the processing of thread i of an embodiment.
FIG. 14 is a conceptual diagram illustrating the processing of each thread of an embodiment.
FIG. 15 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 16 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 17 is a conceptual diagram illustrating a text file of an embodiment.
FIG. 18 is a conceptual diagram illustrating a text file of an embodiment.

Hereinafter, embodiments of the present invention will be described.
[Overview]
First, an overview will be given.
<Text file>
In each embodiment, arithmetic processing is performed on the character string of a text file. The text file contains W records, each record contains G cells of arbitrary length, and each cell contains an arbitrary number of characters. However, the length of each cell has an upper limit that depends on the attribute of the cell. W and G are integers of 1 or more. For example, at least one of W and G is an integer of 2 or more: W may be an integer of 2 or more, G may be an integer of 2 or more, or both W and G may be integers of 2 or more.

When W is an integer of 2 or more, information for identifying the record boundary exists between adjacent records. For example, a line break exists between adjacent records, and the records are separated from one another by line breaks. Likewise, when G is an integer of 2 or more, information for identifying the cell boundary exists between adjacent cells. For example, a delimiter character or a line break exists between adjacent cells, and the cells are separated from one another by delimiter characters or line breaks. An example of the delimiter character is the comma ",". As other examples, a tab or a line break may exist between adjacent cells, or a half-width space or a line break may exist between adjacent cells. When W is an integer of 2 or more, the number G of cells contained in each record is the same for all records.

The G cells of each record correspond to attribute information (also called a "schema"). The attribute information indicates what kind of attribute each cell has, and includes at least information for identifying or estimating the maximum and minimum values of the size (data amount) of the character string represented by each cell. For example, the attribute information includes information indicating which finite set a cell represents an element of. For example, the attribute information may indicate that "the cell represents a residue modulo p (mod p), where p is a positive integer", that "the cell is a character string represented by a predetermined number (for example, 10) of elements of a predetermined finite field (for example, the extension field GF(2^8))", or that "the cell is a character string representing an integer of a predetermined integer type (for example, a signed 32-bit integer)".

Each of G pieces of attribute information may correspond one-to-one to each of the G cells of each record (that is, one piece of attribute information may represent the attribute of one cell), or one piece of attribute information may correspond to a plurality of cells (for example, G cells) of each record (that is, one piece of attribute information may represent the attribute of a plurality of cells). In the former case, the attributes of the cells belonging to one record may differ from one another or may be identical. When W is an integer of 2 or more, the "set of G attributes" corresponding to the G cells is the same for all records. That is, the attribute att(g) of the g-th cell (where g = 1, ..., G) is the same in every record. The attribute information may also represent the type of information that the cell represents. The attribute information may be included in the text file (for example, the header of the text file may be the attribute information) or may not be included in it.

Examples of the text file are a CSV (Comma-Separated Values) file, a TSV (tab-separated values) file, and an SSV (space-separated values) file. These are collectively referred to as CSV (character-separated values) files or DSV (delimiter-separated values) files.
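How attribute information can bound the size of one record may be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: each attribute is reduced to a hypothetical pair (minimum length, maximum length) counted in characters, and each record is assumed to carry G-1 cell delimiters plus one record delimiter. The names record_size_bounds, INT32_TEXT, and schema are hypothetical.

```python
from typing import List, Tuple


def record_size_bounds(cell_bounds: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (s_csv, S_csv): minimum and maximum size of one record's string.

    cell_bounds holds one (min_length, max_length) pair per cell attribute.
    """
    g = len(cell_bounds)
    separators = g  # G-1 cell delimiters plus one record delimiter (assumed)
    s_min = sum(lo for lo, _ in cell_bounds) + separators
    s_max = sum(hi for _, hi in cell_bounds) + separators
    return s_min, s_max


# A signed 32-bit integer written in decimal takes 1 character ("0")
# up to 11 characters ("-2147483648").
INT32_TEXT = (1, 11)
schema = [INT32_TEXT, INT32_TEXT, (0, 20)]  # third attribute: free text, at most 20 characters
s_csv, S_csv = record_size_bounds(schema)
```

Because every record shares the same set of G attributes, one pair of bounds computed from the schema applies to every record of the file.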

The "encoding" and "operation" described above are executed for each processing unit character string, which is a character string of r records of the text file. The processing executed for each processing unit character string is called "unit processing". The processing unit calculation unit obtains and outputs "a function value of C/(S_csv+S_enc+S_ref)", which represents the number of records r processed in one unit process (that is, it obtains a function value of C/(S_csv+S_enc+S_ref) as the number of records r). The cache memory size C may be predetermined or may be input. The maximum value S_csv is obtained by the maximum size setting unit, the maximum value S_enc is obtained by the encode size setting unit, and the total size S_ref is obtained by the reference size setting unit.

"A function value of α" may be α itself or another value corresponding to α. Examples of "a function value of α" are the smallest integer not less than α, the largest integer not greater than α, and the integer closest to α. For example, r=C/(S_csv+S_enc+S_ref), r=ROUNDUP(C/(S_csv+S_enc+S_ref)), r=ROUNDDOWN(C/(S_csv+S_enc+S_ref)), or r=ROUND(C/(S_csv+S_enc+S_ref)) may be used. Here, ROUNDUP(α) is a function that rounds α up to an integer, ROUNDDOWN(α) is a function that rounds α down to an integer, and ROUND(α) is a function that rounds α to the nearest integer.

S_csv+S_enc+S_ref represents the memory size required for the processing in which the processing unit reads the character string of one record from the text file, encodes it into encoded information while referring to the reference information, and performs an "operation" such as secret sharing (hereinafter referred to as "a series of processes for one record"). If this memory size is no larger than the cache memory size, the series of processes for one record can be executed at high speed without reading data from the main memory along the way. C/(S_csv+S_enc+S_ref) represents how many "series of processes for one record", each requiring the memory size (S_csv+S_enc+S_ref), can be accommodated in the cache memory. By using a character string of the number of records r corresponding to C/(S_csv+S_enc+S_ref) as the processing unit character string, the number of accesses to the main memory during the processing of r records is reduced and the operation can be performed at high speed.
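The choice of r can be sketched as follows. The formula C/(S_csv+S_enc+S_ref) and the three rounding variants come from the passage above. The function name records_per_unit, the rounding keyword, and the numeric sizes are assumptions made for illustration.

```python
import math


def records_per_unit(C: int, S_csv: int, S_enc: int, S_ref: int,
                     rounding: str = "down") -> int:
    """Return r as a function value of C / (S_csv + S_enc + S_ref)."""
    alpha = C / (S_csv + S_enc + S_ref)
    if rounding == "down":      # ROUNDDOWN
        return math.floor(alpha)
    if rounding == "up":        # ROUNDUP
        return math.ceil(alpha)
    if rounding == "nearest":   # ROUND (Python rounds exact halves to even)
        return round(alpha)
    raise ValueError(f"unknown rounding mode: {rounding}")


# Hypothetical sizes in bytes: a 32 KiB cache and 45 + 48 + 24 = 117 bytes per record.
r_down = records_per_unit(32768, 45, 48, 24, "down")
r_up = records_per_unit(32768, 45, 48, 24, "up")
r_nearest = records_per_unit(32768, 45, 48, 24, "nearest")
```

Of the three variants, only rounding down guarantees that r·(S_csv+S_enc+S_ref) does not exceed C; the other two can overshoot the cache by less than one record's worth of memory.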

When i ≥ 1, the character string S_i is the character string that immediately follows the character string S_{i-1}. When i ≥ 1, after the buffer boundary lock of thread i has been released, the cell identification unit uses the reference information, the character string S_{i-1}, and the character string S_i to obtain information A_i corresponding to the position of the cell that immediately follows the last cell contained in the character string S_{i-1}, and stores the information A_i in the main memory. The information A_i may be, for example, information representing the record to which that cell belongs together with information representing the attribute corresponding to that cell (for example, information representing which attribute, counted from the beginning of the record, the cell corresponds to), or it may be information representing the position and length of that cell.

When the end of the character string S_{i-1} is the end of a cell, the first cell of the character string S_i is "the cell that immediately follows the last cell contained in the character string S_{i-1}". In this case, storing the information A_i in the main memory may be omitted. On the other hand, when the end of the character string S_{i-1} is not the end of a cell, the cell identification unit uses the character string S_{i-1} and the character string S_i to generate "the cell that immediately follows the last cell contained in the character string S_{i-1}" and obtains the information A_i. In this way, information corresponding to the position of a cell that the parsing unit could not identify is obtained. The reference information and the information A_i together make it possible to identify, for each cell of the text file, the record to which the cell belongs and the attribute corresponding to the cell (for example, information representing which attribute, counted from the beginning of the record, the cell corresponds to). When i = 0, the cell identification unit does nothing.
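One form of the information A_i, the record and attribute of the boundary cell, can be sketched as follows. This is an assumption-laden illustration: it supposes that the number of cells preceding the boundary cell in the whole file is known once earlier threads have finished their boundary steps, and it relies on every record having the same number G of cells. The name locate_boundary_cell and the zero-based indexing are hypothetical.

```python
from typing import Tuple


def locate_boundary_cell(cells_before: int, g: int) -> Tuple[int, int]:
    """Return (record_index, attribute_index), both zero-based.

    cells_before is the number of cells that precede the boundary cell in
    the file; g is the number of cells per record.
    """
    if g <= 0 or cells_before < 0:
        raise ValueError("g must be positive and cells_before non-negative")
    return divmod(cells_before, g)


# In FIG. 17, S_0 holds 4 full records of 6 cells plus the cell "佐藤",
# so 25 cells precede the boundary cell "次郎".
record_index, attribute_index = locate_boundary_cell(25, 6)
```

With zero-based counting the boundary cell "次郎" is therefore the second cell (index 1) of the fifth record (index 4).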

一方、スレッドｉのバッファ境界ロックが解除されている場合、パース部１２６２−ｑはｉ≧１であるか否かを判定する（ステップＳ１２６２ｂ−ｑ）。ｉ≧１でない場合（すなわち、ｉ＝０の場合）、パース部１２６２−ｑは、メインメモリ１２３から読み出した文字列Ｓ_ｉをパースし、文字列Ｓ_ｉに含まれる各セルの位置および長さを表す参照情報を計算してメインメモリ１２３に格納する。例えば、図１１に例示した文字列Ｓ_０の場合、パース部１２６２−ｑは、文字列Ｓ_０をパースしてセル「石田」「太郎」「1990/2/8」「100-0002」「sjeifdfgjrrf」「45dkfjkejdf5」「石田」「次郎」「1985/5/2」「111-0112」「25df4d4ed」「1s4dlccclseed」「石田」「花子」「2001/4/8」「111-2222」「5d4e4d4ffg」「skekdjjfaae」「佐藤」「太郎」「1992/7/11」「111-0345」「dlekd4f3e」を特定し、それらの参照情報を計算する。最後の「“4selddks“」の終端はセルの終端ではないため、スレッド０では「“4selddks“」の参照情報は計算されない。パース部１２６２−ｑは、メインメモリ１２３に参照情報を格納する領域が足りなくなったときに、メインメモリ１２３からｒを読み込み、ｒレコード分の参照情報を格納するためのバッファ領域をメインメモリ１２３にまとめて確保する。その後、処理がステップＳ１２６８−ｑに進む（図７のステップＳ１２６２ｃ−ｑ、図８のＰ_ｉ）。一方、ｉ≧１である場合、パース部１２６２−ｑは、スレッドｉ−１でのパース結果（特定された各セルの参照情報およびセルに含まれない文字を特定する情報）をメインメモリ１２３から読み込み、文字列Ｓ_ｉ−１のうちスレッドｉ−１で特定されたセルに含まれない文字を特定する。文字列Ｓ_ｉ−１の終端がセルの終端である場合には、文字列Ｓ_ｉ−１のうちスレッドｉ−１で特定されたセルに含まれない文字は存在しない（ステップＳ１２６２ｄ−ｑ）。次にパース部１２６２−ｑは、メインメモリ１２３から文字列Ｓ_ｉを読み出し、スレッドｉ−１で特定されたセルに含まれない文字と文字列Ｓ_ｉとを結合した文字列をパースし、この文字列に含まれる各セルの位置および長さを表す参照情報を計算してメインメモリ１２３に格納する。文字列Ｓ_ｉ−１の終端がセルの終端である場合には、パース部１２６２−ｑは文字列Ｓ_ｉをパースし、文字列Ｓ_ｉに含まれる各セルの位置および長さを表す参照情報を計算してメインメモリ１２３に格納する。例えば、図１１および図１２に例示した文字列Ｓ_０およびＳ_１の場合、パース部１２６２−ｑは、文字列Ｓ_ｉ−１のうちスレッドｉ−１で特定されたセルに含まれない文字「“4selddks“」と文字列Ｓ_ｉ−１とを結合した文字列
“4selddks““k304kdkk400-03d”
“佐藤”,“次郎”,“1989/8/21”,“123-0434”,“dkesopd445e”,“4ssjdejdoseae3230dds”
“佐藤”,“花子”,“1995/2/3”,“145-0234”,“skdeofl4s3d3”,“skek94kdskd4dc”
“田中”,“太郎”,“1992/3/23”,“134-0134”,“dj394949495kf”,“47s52\n5412485d”
“田中”,“次郎”,“1979/4/21”,“11
をパースし、この文字列に含まれる各セル「4selddks““k304kdkk400-03d」「佐藤」「次郎」「1989/8/21」「123-0434」「dkesopd445e」「4ssjdejdoseae3230dds」「佐藤」「花子」「1995/2/3」「145-0234」「skdeofl4s3d3」「skek94kdskd4dc」「田中」「太郎」「1992/3/23」「134-0134」「dj394949495kf」「47s52\n5412485d」「田中」「次郎」「1979/4/21」の位置および長さを表す参照情報を計算してメインメモリ１２３に格納する。最後の“11の終端はセルの終端ではないため、スレッド１では“11の参照情報は計算されない。パース部１２６２−ｑは、メインメモリ１２３に参照情報を格納する領域が足りなくなったときに、メインメモリ１２３からｒを読み込み、ｒレコード分の参照情報を格納するためのバッファ領域をメインメモリ１２３にまとめて確保する。その後、処理がステップＳ１２６８−ｑに進む（図７のステップＳ１２６２ｅ−ｑ、図８のＰ_ｉ）。 On the other hand, when the buffer boundary lock of the thread i is released, the parsing unit 1262-q determines whether or not i ≧ 1 (step S1262b−q). If not i ≧ 1 (i.e., the case of i = 0), parser 1262-q parses the string _{S i} read from the main memory 123, the position and length of each cell included in the string _{S i} The reference information representing the above is calculated and stored in the main memory 123. _{For example, in the case of the character string S 0} illustrated in FIG. 11, the parsing unit 1262-q parses the character string S ₀ and cells "Ishida", "Taro", "1990/2/8", "100-0002", and "sjeifdfgjrrf". "45dkfjkejdf5""Ishida""Jiro""1985/5/2""111-0112""25df4d4ed""1s4dlccclseed""Ishida""Hanako""2001/4/8""111-2222""5d4e4d4ffg"" Identify "skekdjjfaae", "Sato", "Taro", "1992/7/11", "111-0345", and "dlekd4f3e", and calculate their reference information. Since the end of the last "4selddks" is not the end of the cell, thread 0 does not calculate the reference information for "4selddks". The parsing unit 1262-q reads r from the main memory 123 when the area for storing the reference information in the main memory 123 becomes insufficient, and sets the buffer area for storing the reference information for the r records in the main memory 123. Secure all together. Thereafter, the processing proceeds to step S1268-q (Step S1262c-q in FIG. 7, _P i in FIG. 8). 
On the other hand, when i ≧ 1, the parsing unit 1262-q reads the parsing result of thread i-1 (the reference information of each identified cell and the information identifying the characters not included in any cell) from the main memory 123 and identifies the characters of the character string S_{i-1} that are not included in the cells identified by thread i-1. When the end of S_{i-1} is the end of a cell, S_{i-1} contains no such characters (step S1262d-q). The parsing unit 1262-q then reads the character string S_i from the main memory 123, parses the character string obtained by concatenating those leftover characters and S_i, calculates reference information representing the position and length of each cell included in the concatenated character string, and stores it in the main memory 123. When the end of S_{i-1} is the end of a cell, the parsing unit 1262-q parses S_i alone, calculates reference information representing the position and length of each cell included in S_i, and stores it in the main memory 123. For example, for the character strings S_0 and S_1 illustrated in FIGS. 11 and 12, the parsing unit 1262-q parses the following character string, obtained by concatenating the characters "4selddks" of S_{i-1} that are not included in the cells identified by thread i-1 with the character string S_i:
"4selddks""k304kdkk400-03d"
"Sato","Jiro","1989/8/21","123-0434","dkesopd445e","4ssjdejdoseae3230dds"
"Sato","Hanako","1995/2/3","145-0234","skdeofl4s3d3","skek94kdskd4dc"
"Tanaka","Taro","1992/3/23","134-0134","dj394949495kf","47s52\n5412485d"
"Tanaka","Jiro","1979/4/21","11
It then calculates reference information representing the position and length of each cell included in this character string, namely "4selddks""k304kdkk400-03d", "Sato", "Jiro", "1989/8/21", "123-0434", "dkesopd445e", "4ssjdejdoseae3230dds", "Sato", "Hanako", "1995/2/3", "145-0234", "skdeofl4s3d3", "skek94kdskd4dc", "Tanaka", "Taro", "1992/3/23", "134-0134", "dj394949495kf", "47s52\n5412485d", "Tanaka", "Jiro", and "1979/4/21", and stores it in the main memory 123. Because the end of the trailing "11 (an opening quotation mark followed by 11) is not the end of a cell, thread 1 does not calculate reference information for "11. When the area of the main memory 123 for storing reference information runs short, the parsing unit 1262-q reads r from the main memory 123 and allocates, in one batch, a buffer area in the main memory 123 for storing the reference information of r records. The processing then proceeds to step S1268-q (step S1262e-q in FIG. 7, P_i in FIG. 8).
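
The carry-over of unfinished characters from thread i-1 to thread i can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the parsing unit 1262-q: the function name `parse_cells`, the use of comma and newline as cell delimiters, and the doubled-quotation-mark escaping rule are all assumptions, and positions are relative to the concatenated string rather than to the file.

```python
def parse_cells(text):
    """Return ([(position, length), ...], leftover) for one buffer.

    A cell counts as complete only once its terminating delimiter (comma or
    newline outside quotation marks) has been seen, so a buffer that ends
    with "4selddks" leaves that fragment as leftover for the next thread.
    """
    refs = []
    start = 0
    in_quotes = False
    for pos, ch in enumerate(text):
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif ch in ",\n" and not in_quotes:
            refs.append((start, pos - start))  # span includes the quotes
            start = pos + 1
    return refs, text[start:]


# Shortened stand-ins for S_0 and S_1 (hypothetical data modeled on the example).
s0 = '"Sato","Taro","4selddks"'
s1 = '"k304kdkk400-03d"\n"Sato","Jiro","11'

refs0, leftover0 = parse_cells(s0)       # thread 0 parses S_0 alone
joined = leftover0 + s1                  # thread 1 prepends the leftover
refs1, leftover1 = parse_cells(joined)
cells1 = [joined[p:p + n] for p, n in refs1]
```

In this sketch `leftover1` again holds the unterminated fragment, which would be handed to thread 2 in the same way.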

The calculation unit 1266-q performs secret sharing of the encoding information E_{i,j} read from the cache memory 1260-q, obtains the secret sharing value (operation value) SS_{i,j}, and stores it in the main memory 123. At this point the reference information for the r records corresponding to the processing unit character string PS_{i,j} no longer needs to be kept in the main memory 123, so the secret sharing values SS_{i,0}, ..., SS_{i,J-1} may overwrite the area in which that reference information was stored (step S1265-q in FIG. 7, SS_i in FIG. 8).
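
As a rough illustration of encoding followed by secret sharing, the sketch below encodes a cell as elements of a finite set and splits each element into additive shares. The passage above does not say which encoding or which secret sharing scheme is used, so everything here is an assumption made for illustration: the modulus `P`, the byte-wise encoding, the additive scheme, and the names `encode`, `share`, and `reconstruct`.

```python
import secrets

P = 2**61 - 1  # hypothetical prime modulus; the finite set is assumed to be Z_P


def encode(text):
    """Encode a string as a list of elements of Z_P (one per UTF-8 byte)."""
    return list(text.encode("utf-8"))


def share(value, n_shares=3):
    """Split value into n_shares additive shares modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % P


encoded = encode("Sato")                  # stand-in for E_{i,j}
shared = [share(v) for v in encoded]      # stand-in for SS_{i,j}
restored = [reconstruct(s) for s in shared]
```

Each share on its own is a uniformly random element of Z_P; only the sum of all shares of an element recovers it.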

As illustrated in FIG. 3, the arithmetic unit 22 has an input unit 121a, an output unit 121b, an auxiliary storage unit 122, a main memory 123, a control unit 125, and processing units 226-1 to 226-Q, where Q is an integer of 2 or more. The arithmetic unit 12 executes each process under the control of the control unit 125.

As illustrated in FIG. 4, each processing unit 226-q (where q = 1, ..., Q) has a cache memory 1260-q, a reading unit 1261-q, a parsing unit 2262-q, a cell identification unit 2264-q, an encoding unit 1265-q, a calculation unit 1266-q, a file read unlocking unit 1267-q, a buffer boundary unlocking unit 1268-q, and a parallelism unlocking unit 1269-q.

On the other hand, when both the file read lock and the parallelism lock of thread i have been released, the reading unit 1261-q reads the file buffer size f from the main memory 123 and allocates an area of the file buffer size f in the main memory 123. The reading unit 1261-q then reads, from the character string of the text file stored in the auxiliary storage unit 122, a character string S_i that can be stored in the area of the file buffer size f. In the example of FIG. 17, the following is read as the character string S_0:
石田,太郎,1990/2/8,100-0002,東京都渋谷区〇〇〇,03-3234-5678
石田,次郎,2000/4/2,274-16,神奈川県藤沢市江の島〇〇〇,03-9999-9999
石田,花子,1985/6/2,352-725,東京都港区〇〇〇,03-1111-9999
佐藤,太郎,2001/5/1,100-0002,東京都千代田区〇〇〇,03-3234-5678
佐藤,次
In the example of FIG. 18, the following is read as the character string S_1:
郎,2001/6/2,274-16,神奈川県藤沢市江の島〇〇〇,03-9999-9999
佐藤,花子,2002/7/2,352-725,東京都新宿区新宿〇〇〇,03-1111-9999
田中,太郎,2001/1/1,100-0002,東京都千代田区〇〇〇,03-1234-5678
田中,次郎,2001/1/2,251-0036,神奈川県藤沢市江の島〇〇〇
The reading unit 1261-q stores the character string S_i in the area of the file buffer size f allocated in the main memory 123 (step S1261b-q in FIG. 13, R_i in FIG. 14).
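
The way a fixed file buffer size f cuts the text into character strings S_0, S_1, ... regardless of cell boundaries can be sketched as follows. This is a simplified illustration: the helper name `read_chunks` is hypothetical, the text is held in memory instead of being read from auxiliary storage, and f is counted in characters rather than bytes.

```python
def read_chunks(text, f):
    """Yield successive strings S_0, S_1, ... of at most f characters each."""
    for offset in range(0, len(text), f):
        yield text[offset:offset + f]


# Shortened stand-in for the text file (hypothetical data modeled on FIG. 17).
text = "佐藤,太郎\n佐藤,次郎\n"
chunks = list(read_chunks(text, f=10))
# chunks[0] ends with "次" and chunks[1] starts with "郎": the buffer
# boundary falls inside a cell, as in the example above.
```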

The parsing unit 2262-q parses the character string S_i read from the main memory 123, calculates reference information representing the position and length of each cell included in S_i, and stores it in the main memory 123. For example, for the character string S_0 illustrated in FIG. 17, the parsing unit 2262-q parses S_0, identifies the cells "石田", "太郎", "1990/2/8", "100-0002", "東京都渋谷区〇〇〇", "03-3234-5678", "石田", "次郎", "2000/4/2", "274-16", "神奈川県藤沢市江の島〇〇〇", "03-9999-9999", "石田", "花子", "1985/6/2", "352-725", "東京都港区〇〇〇", "03-1111-9999", "佐藤", "太郎", "2001/5/1", "100-0002", "東京都千代田区〇〇〇", "03-3234-5678", and "佐藤", and calculates their reference information. Because the end of the trailing "次" is not the end of a cell, thread 0 does not calculate reference information for "次". Likewise, for the character string S_1 illustrated in FIG. 18, the parsing unit 2262-q parses S_1, identifies the cells "2001/6/2", "274-16", "神奈川県藤沢市江の島〇〇〇", "03-9999-9999", "佐藤", "花子", "2002/7/2", "352-725", "東京都新宿区新宿〇〇〇", "03-1111-9999", "田中", "太郎", "2001/1/1", "100-0002", "東京都千代田区〇〇〇", "03-1234-5678", "田中", "次郎", "2001/1/2", "251-0036", and "神奈川県藤沢市江の島〇〇〇", and calculates their reference information. Because the start of the leading "郎" is not the start of a cell, thread 1 does not calculate reference information for "郎". When the area of the main memory 123 for storing reference information runs short, the parsing unit 2262-q reads r from the main memory 123 and allocates, in one batch, a buffer area in the main memory 123 for storing the reference information of r records. The parsing unit 2262-q can start this processing before the buffer boundary lock of thread i is released. That is, for i ≧ 1, the parsing unit 2262-q can start calculating the reference information of the cells included in the character string S_i before the calculation of the reference information of the cells included in the character string S_{i-1} has finished (step S2262-q in FIG. 13, P_i in FIG. 14).
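
Because each buffer is parsed without waiting for its predecessor, the parsing step can be sketched as an independent function applied to every S_i in parallel. This is a minimal sketch under stated assumptions: the name `parse_chunk` and its return shape are hypothetical, cells are assumed to be unquoted and delimited only by commas and newlines, and joining the tail of S_0 to the head of S_1 is only one possible way to recover the cell that straddles the buffer boundary.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(chunk, is_first):
    """Return (refs, head, tail) for one buffer, parsed on its own.

    refs holds (position, length) for every cell whose start and end are
    both visible inside the chunk. head is the text before the first
    delimiter (not a known cell start unless this is thread 0), and tail
    is the text after the last delimiter.
    """
    refs = []
    start = 0 if is_first else None        # None: still inside the unknown head
    head_end = 0 if is_first else len(chunk)
    for pos, ch in enumerate(chunk):
        if ch not in ",\n":
            continue
        if start is None:
            head_end = pos                 # first delimiter closes the head
        else:
            refs.append((start, pos - start))
        start = pos + 1
    tail_start = len(chunk) if start is None else start
    return refs, chunk[:head_end], chunk[tail_start:]


# Shortened stand-ins for S_0 and S_1 (hypothetical data modeled on the example).
chunks = ["佐藤,太郎,2001/5/1\n佐藤,次", "郎,2001/6/2,274-16\n田中,次郎"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(parse_chunk, chunks, [True, False]))

(refs0, head0, tail0), (refs1, head1, tail1) = results
cells0 = [chunks[0][p:p + n] for p, n in refs0]
cells1 = [chunks[1][p:p + n] for p, n in refs1]
boundary_cell = tail0 + head1              # the cell split across the buffers
```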

After that, the cell identification unit 2264-q determines whether or not the buffer boundary lock of thread i has been released. The buffer boundary lock of thread 0 is released in the initial state (step S2264a-q). If the buffer boundary lock of thread i has not been released, the determination in step S2264a-q is repeated.

After that, the encoding unit 1265-q and the calculation unit 1266-q of the processing unit 226-q, in place of those of the processing unit 126-q, execute the processing of steps S1263-q, S1265-q, and S1269-q described in the first embodiment (steps S1263-q, S1265-q, and S1269-q in FIG. 13; E_i, SS_i, and UP_{i+n_p} in FIG. 14).

Claims

A parameter setting device for arithmetic processing on a character string of a text file, wherein
the text file contains W records, each of the records contains G cells of arbitrary length, each of the cells contains an arbitrary number of characters, W and G are integers of 1 or more, and the G cells correspond to attribute information,
C is the cache memory size and M is the main memory size, and
the parameter setting device has:
a maximum size setting unit that takes the attribute information as input and sets the maximum value S_csv of the size of the character string for one record of the text file;
a minimum size setting unit that takes the attribute information as input and sets the minimum value s_csv of the size of the character string for one record;
an encoding size setting unit that sets the maximum value S_enc of the total size of the encoding information obtained by encoding the character string for one record into elements of a predetermined finite set;
an operation size setting unit that sets the maximum value S_ss of the total size of the operation values obtained by performing a specific operation on the encoding information for one record;
a reference size setting unit that sets the total size S_ref of the reference information representing the position and length of each of the cells for one record;
a processing unit calculation unit that, the encoding and the operation being processes executed for each processing unit character string, which is a character string for r records of the text file, obtains a function value of C/(S_csv + S_enc + S_ref) as the number of records r; and
a parallel number calculation unit that, where I is the maximum number of iterations of the encoding and the operation performed for each processing unit character string, max(S_ref, S_ss) = S_ref when S_ref ≧ S_ss, max(S_ref, S_ss) = S_ss when S_ref < S_ss, and f_0 is a function value of s_csv·M/(s_csv + S_enc + max(S_ref, S_ss)), obtains a function value of f_0/(I·r·S_csv) as the number of parallels n_p in the arithmetic processing.

The parameter setting device according to claim 1,
further having a buffer size calculation unit that obtains a function value of f_0/n_p as the file buffer size f of the data read in one batch from the character string of the text file during the arithmetic processing.
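
As a worked illustration of the quantities recited in claims 1 and 2, the sketch below evaluates r, f_0, n_p, and f for one set of sizes. The claims only say "a function value of" each expression, so the use of floor division, the clamp of n_p to at least 1, the function name `set_parameters`, and all numeric inputs are assumptions chosen for illustration; the sketch also assumes the inputs give r ≧ 1.

```python
def set_parameters(C, M, S_csv, s_csv, S_enc, S_ss, S_ref, I):
    """Return (r, f_0, n_p, f); all sizes are in bytes."""
    # Records per processing unit: input text, encoding, and reference
    # information for r records should fit in the cache together.
    r = C // (S_csv + S_enc + S_ref)
    # Main-memory budget available for file buffers.
    f_0 = (s_csv * M) // (s_csv + S_enc + max(S_ref, S_ss))
    # Number of parallels: each thread handles up to I units of r records.
    n_p = max(1, f_0 // (I * r * S_csv))
    # File buffer size per thread.
    f = f_0 // n_p
    return r, f_0, n_p, f


# Hypothetical sizes: 1 MiB cache, 1 GiB main memory.
r, f_0, n_p, f = set_parameters(
    C=2**20, M=2**30,
    S_csv=200, s_csv=50, S_enc=800, S_ss=1600, S_ref=96, I=16,
)
```

With these inputs the r records of one processing unit occupy at most C bytes of cache, and the n_p buffers of size f together stay within f_0.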

The parameter setting device according to claim 1 or 2, wherein
the maximum number of iterations I is set so that the ratio of the total processing amount of preprocessing to the total processing amount for performing the encoding and the operation on the processing unit character string is equal to or less than a predetermined value.

An arithmetic unit that performs arithmetic processing on a character string of a text file, wherein
the text file contains W records, each of the records contains G cells of arbitrary length, each of the cells contains an arbitrary number of characters, W and G are integers of 1 or more, f is a predetermined file buffer size, n_p is the number of parallels, r is a positive integer representing a number of records, i denotes a thread with i ∈ {0, ..., T-1}, T is a positive integer representing the number of threads corresponding to the size of the character string of the text file, 1 ≤ n_p ≤ T, and in the initial state the file read lock and the buffer boundary lock of thread 0 and the parallelism locks of threads 0, ..., n_p-1 are released,
the arithmetic unit has a main memory, a cache memory, and a plurality of processing units, and
of the plurality of processing units, the processing unit that performs the processing of thread i has:
a reading unit that, after the file read lock and the parallelism lock of thread i are released, reads from the character string of the text file a character string S_i that can be stored in an area of the file buffer size f and stores it in the main memory;
a file read unlocking unit that releases the file read lock of thread i+1 after the character string S_i is stored in the main memory;
a parsing unit that, after the buffer boundary lock of thread i is released, calculates reference information representing the position and length of each cell included in the character string S_i and stores it in the main memory;
a buffer boundary unlocking unit that releases the buffer boundary lock of thread i+1 after the reference information representing the position and length of each cell included in the character string S_i is calculated;
an encoding unit that, where the concatenated character string CS_0 for i = 0 is S_0, the concatenated character string CS_i for i ≧ 1 is the character string obtained by concatenating the character string S_i immediately after the concatenated character string CS_{i-1}, J is a positive integer, and j = 0, ..., J-1, uses the cache memory to perform a process of selecting, based on the information identified by the reference information, a processing unit character string PS_{i,j}, which is a character string for r records to be processed included in the concatenated character string CS_i, and encoding the processing unit character string PS_{i,j} into encoding information E_{i,j} consisting of elements of a predetermined finite set;
a calculation unit that uses the cache memory to perform a process of performing a specific operation on the encoding information E_{i,j} to obtain an operation value SS_{i,j} and storing the operation value SS_{i,j} in the main memory; and
a parallelism unlocking unit that releases the parallelism lock of thread i+n_p after the operation values SS_{i,j} are obtained.

An arithmetic unit that performs arithmetic processing on a character string of a text file, wherein
the text file contains W records, each of the records contains G cells of arbitrary length, each of the cells contains an arbitrary number of characters, W and G are integers of 1 or more, f is a predetermined file buffer size, n_p is the number of parallels, r is a positive integer representing a number of records, i denotes a thread with i ∈ {0, ..., T-1}, T is a positive integer representing the number of threads corresponding to the size of the character string of the text file, 1 ≤ n_p ≤ T, and in the initial state the file read lock of thread 0 and the parallelism locks of threads 0, ..., n_p-1 are released,
the arithmetic unit has a main memory, a cache memory, and a plurality of processing units, and
of the plurality of processing units, the processing unit that performs the processing of thread i has:
a reading unit that, after the file read lock and the parallelism lock of thread i are released, reads from the character string of the text file a character string S_i that can be stored in an area of the file buffer size f and stores it in the main memory;
a file read unlocking unit that releases the file read lock of thread i+1 after the character string S_i is stored in the main memory;
a parsing unit that calculates reference information representing the position and length of each cell included in the character string S_i and stores it in the main memory;
a cell identification unit that, where the character string S_i for i ≧ 1 is the character string immediately following the character string S_{i-1}, obtains, after the buffer boundary lock of thread i is released and by using the reference information, the character string S_{i-1}, and the character string S_i, information A_i corresponding to the position of the cell immediately after the last cell included in the character string S_{i-1};
a buffer boundary unlocking unit that releases the buffer boundary lock of thread i+1 after the information A_i is obtained;
an encoding unit that, where the concatenated character string CS_0 for i = 0 is S_0, the concatenated character string CS_i for i ≧ 1 is the character string obtained by concatenating the character string S_i immediately after the concatenated character string CS_{i-1}, J is a positive integer, and j = 0, ..., J-1, uses the cache memory to perform a process of selecting, based on the information identified by the reference information and the information A_i, a processing unit character string PS_{i,j}, which is a character string for r records to be processed included in the concatenated character string CS_i, and encoding the processing unit character string PS_{i,j} into encoding information E_{i,j} consisting of elements of a predetermined finite set;
a calculation unit that uses the cache memory to perform a process of performing a specific operation on the encoding information E_{i,j} to obtain an operation value SS_{i,j} and storing the operation value SS_{i,j} in the main memory; and
a parallelism unlocking unit that releases the parallelism lock of thread i+n_p after the operation values SS_{i,j} are obtained.

The arithmetic unit according to claim 5, wherein
each of the cells contains only characters for which it can be determined, from the character by itself, whether or not the character represents a target of the operation, and
for i ≧ 1, the parsing unit starts calculating the reference information representing the position and length of each cell included in the character string S_i before the calculation of the reference information representing the position and length of each cell included in the character string S_{i-1} is completed.

The arithmetic unit according to any one of claims 4 to 6, wherein
C is the cache memory size of the cache memory, M is the main memory size of the main memory,
S_csv is the maximum value of the size of the character string for one record of the text file,
s_csv is the minimum value of the size of the character string for one record,
S_enc is the maximum value of the total size of the encoding information obtained by encoding the character string for one record into elements of the finite set,
S_ss is the maximum value of the total size of the operation values obtained by performing the operation on the encoding information for one record,
S_ref is the total size of the reference information representing the position and length of each of the cells for one record,
the number of records r is a function value of C/(S_csv + S_enc + S_ref),
I is the maximum number of iterations of the encoding and the operation performed for each processing unit character string PS_{i,j}, max(S_ref, S_ss) = S_ref when S_ref ≧ S_ss, max(S_ref, S_ss) = S_ss when S_ref < S_ss, f_0 is a function value of s_csv·M/(s_csv + S_enc + max(S_ref, S_ss)), the number of parallels n_p is a function value of f_0/(I·r·S_csv), and
the file buffer size f is a function value of f_0/n_p.

The arithmetic unit according to any one of claims 4 to 7, wherein
the parsing unit allocates, in one batch, a buffer area in the main memory for storing the reference information for r records when the area for storing the reference information in the main memory runs short.

A method of executing the processing of each unit of the device according to any one of claims 1 to 8.

A program for operating a computer as the device according to any one of claims 1 to 8.

A computer-readable recording medium storing a program for operating a computer as the device according to any one of claims 1 to 8.