JP2024042360A

JP2024042360A - Data processing program, data processing method, and data processing device

Info

Publication number: JP2024042360A
Application number: JP2022147028A
Authority: JP
Inventors: 芙夕楓山田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2022-09-15
Filing date: 2022-09-15
Publication date: 2024-03-28

Abstract

An object of the present invention is to reduce variations in processing amount of a plurality of calculation units.
A storage unit 11 stores 2N pieces of data (N is an integer greater than or equal to 2), each of which is used in operations in combination with each of a plurality of partner data. From the 2N pieces of data, a processing unit 12 identifies the top N first data and the bottom N second data in the result of sorting by the number of partner data to be operated on. The processing unit 12 assigns the top N first data to N calculation units, from the first calculation unit to the N-th calculation unit, in descending order of the number of partner data to be operated on. The processing unit 12 assigns the bottom N second data to the same N calculation units, from the first calculation unit to the N-th calculation unit, in ascending order of the number of partner data to be operated on. Based on the result of assigning the 2N data to the N calculation units, the processing unit 12 uses the N calculation units to execute operations on N of the 2N data in parallel.
[Selection diagram] Figure 1

Description

The present invention relates to a data processing program, a data processing method, and a data processing device.

A technique called pattern mining is used for data analysis. In pattern mining, combinations of data that satisfy a given condition may be extracted from a data set. Operations on each of a plurality of combinations of data can be executed in parallel by a plurality of calculation units, such as CPUs (Central Processing Units) included in a computer. Methods for making such parallel processing by a computer more efficient have been studied.

For example, a schedule control method has been proposed for a multiprocessor system in which the OS (Operating System) scheduler assigns a suspended thread to the CPU with the lightest load, so that load distribution is performed at the same time as the suspended thread is assigned.

A scheduling method has also been proposed in which, when threads are dispatched in a batch, the dispatch timing is adjusted so that each thread is dispatched to the same CPU as the previous time, increasing the likelihood that data in the cache provided for each CPU is reused.

A method has also been proposed in which, in a computer system with a multi-core processor, the OS scheduler dynamically assigns each thread to a core using the number of cycles per instruction, called the CPI (Cycles Per Instruction) rate.

Furthermore, a multithreaded processing system has been proposed that collects statistical data, such as the number of object, memory, or register locks issued while a task is executing, and adjusts the number of threads in subsequent processing cycles based on that data.

International Publication No. 2007/017932
Japanese Patent Application Laid-Open No. 7-302246
US Patent Application Publication No. 2008/0059712
US Patent Application Publication No. 2017/0031708

When operations on all combinations of elements of a first data set and elements of a second data set are performed using a plurality of calculation units, one conceivable approach is to assign, to each element of the first data set, the calculation unit responsible for the operations involving that element. In this case, operations on duplicate combinations can be omitted, so the number of elements of the second data set used as combination partners can differ from one element of the first data set to another. Consequently, if, for example, the elements of the first data set are assigned to the calculation units cyclically in descending order of the number of partner elements in the second data set, the amount of processing performed by each calculation unit varies. This variation delays the completion of the operations on all combinations.

In one aspect, an object of the present invention is to reduce variation in the processing amounts of a plurality of calculation units.

In one embodiment, a data processing program is provided. The data processing program causes a computer to execute the following processing. From 2N pieces of data (N is an integer of 2 or more) used in operations in combination with each of a plurality of partner data, the computer identifies the top N first data and the bottom N second data in the result of sorting by the number of partner data to be operated on. The computer assigns the top N first data to N calculation units, from a first calculation unit to an N-th calculation unit, in descending order of the number of partner data to be operated on. The computer assigns the bottom N second data to the N calculation units, from the first calculation unit to the N-th calculation unit, in ascending order of the number of partner data to be operated on. Based on the result of assigning the 2N data to the N calculation units, the computer executes operations on N of the 2N data in parallel using the N calculation units.

In another embodiment, a data processing method executed by a computer is provided. In yet another embodiment, a data processing device having a storage unit and a processing unit is provided.

In one aspect, variation in the processing amounts of a plurality of calculation units can be reduced.

Diagram illustrating the data processing device of the first embodiment.
Diagram showing a hardware example of the data processing device of the second embodiment.
Diagram showing an example of extracting product combinations.
Diagram showing a calculation example of the number of people who purchased all three types of products.
Diagram showing an operation example for combinations of two types of products.
Diagram showing an operation example for combinations of three types of products.
Diagram showing a hardware example of the CPU.
Diagram showing an example of data allocation to a plurality of cores.
Diagram showing a functional example of the data processing device.
Flowchart showing an example of combination calculation.
Diagram showing a comparative example of data allocation to a plurality of cores.
Diagram comparing the number of combinations calculated by each core.
Diagram showing a cache usage example (part 1) of the second embodiment.
Diagram showing a cache usage example (part 2) of the second embodiment.
Diagram showing an example of the execution order of combination calculations in the third embodiment.
Diagram showing an example of the execution order of combination calculations when the number of products is 10.
Diagram showing a cache usage example of the third embodiment.
Diagram showing a cache usage example of the third embodiment (continued).
Diagram showing a cache usage example of the third embodiment (continued).
Flowchart showing an example of combination calculation.
Diagram showing an example of the sector cache of the fourth embodiment.
Diagram showing a cache usage example of the fourth embodiment.
Diagram showing a cache usage example of the fourth embodiment (continued).
Diagram showing a cache usage example of the fourth embodiment (continued).
Flowchart showing an example of combination calculation.
Flowchart generalizing the combination calculation of the second embodiment.
Flowchart generalizing the combination calculation of the third embodiment.
Flowchart generalizing the combination calculation of the fourth embodiment.

The present embodiments are described below with reference to the drawings.
[First embodiment]
A first embodiment will be described.

FIG. 1 is a diagram illustrating a data processing device according to the first embodiment.
The data processing device 10 performs operations on combinations of elements of a first data set and elements of a second data set using a plurality of calculation units. The data processing device 10 has a storage unit 11 and a processing unit 12.

The storage unit 11 may be a volatile semiconductor memory such as a RAM (Random Access Memory), or non-volatile storage such as an HDD (Hard Disk Drive) or flash memory. The processing unit 12 is, for example, a processor such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a DSP (Digital Signal Processor). The processing unit 12 may, however, include an application-specific electronic circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The processor executes a program stored in a memory such as a RAM (which may be the storage unit 11).

Here, a set of multiple processors may be referred to as a "multiprocessor," and a processor having multiple processor cores may be referred to as a "multi-core processor." The processing unit 12 is, for example, a multiprocessor or a multi-core processor.

The processing unit 12 has N calculation units, where N is an integer of 2 or more. FIG. 1 illustrates the case of N = 2, in which the processing unit 12 has calculation units 12a and 12b. If the processing unit 12 is a multiprocessor, for example, the calculation units 12a and 12b may be processors included in the processing unit 12. If the processing unit 12 is a multi-core processor, the calculation units 12a and 12b may be processor cores included in the processing unit 12. The calculation units 12a and 12b are used for the operations on combinations.

Here, the combinations of interest are combinations of an element of the first data set and an element of the second data set. The first data set used in these operations contains 2N pieces of data. In the N = 2 example, the first data set contains four pieces of data, d1, d2, d3, and d4, as its elements. An element of the second data set that is to be combined with an element of the first data set is referred to as "partner data." In the combination operations, each element of the first data set is assigned a calculation unit that is responsible for the operations involving that element. This is because executing the operations for all combinations that correspond to one element of the first data set in a single thread requires less swapping of the data used in the operations, which makes the operations more efficient.

In addition, operations on duplicate combinations are omitted, which avoids unnecessary computation. For example, if the combination of data d1 with one piece of partner data and the combination of data d2 with another piece of partner data amount to the same combination, it suffices to perform the operation only for data d1 with its partner data, and the operation for data d2 with the other partner data is omitted. As a result, data d1 to d4 differ in the number of partner data with which they are to be operated on. The processing unit 12 therefore assigns data d1 to d4 to the calculation units as follows.
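
The way omitted duplicates lead to unequal partner counts can be sketched as follows. This is only an illustration under an assumption not stated in the text above: both data sets are taken to be the same list of items, and each unordered pair is evaluated once, on behalf of its earlier item. The function name `partner_counts` and the item names are hypothetical.

```python
from itertools import combinations


def partner_counts(items):
    """Count the partners each item is paired with when every unordered
    pair is evaluated exactly once, on behalf of its earlier item."""
    counts = {item: 0 for item in items}
    for first, _second in combinations(items, 2):
        # The pair (first, second) is charged to `first`; the mirrored
        # pair (second, first) is the duplicate that gets skipped.
        counts[first] += 1
    return counts


counts = partner_counts(["d1", "d2", "d3", "d4", "d5"])
print(counts)  # {'d1': 4, 'd2': 3, 'd3': 2, 'd4': 1, 'd5': 0}
```

Under this assumption the counts decrease strictly from d1 to d4, and the last item has no remaining partners, so it needs no calculation unit of its own.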

From the 2N pieces of data, the processing unit 12 identifies the top N first data and the bottom N second data in the result of sorting by the number of partner data to be operated on (the number of combinations with partner data). Table 11a illustrates, for the N = 2 example, the number of combinations of each of data d1 to d4 with partner data. The number of partner data to be operated on for each of data d1 to d4 can also be described as the number of partner data to be combined with each of them. The table 11a may, for example, be stored in the storage unit 11. Assume that the numbers of combinations of data d1 to d4 with partner data are as follows.

The number of combinations of data d1 with partner data is m1. The number for data d2 is m2, where m2 < m1. The number for data d3 is m3, where m3 < m2. The number for data d4 is m4, where m4 < m3. Here, m1 to m4 are all positive integers. In this case, the processing unit 12 identifies data d1 and d2 as the top two in the number of combinations and data d3 and d4 as the bottom two.

The processing unit 12 assigns the top N first data to the N calculation units, from the first calculation unit to the N-th calculation unit, in descending order of the number of combinations. The processing unit 12 assigns the bottom N second data to the N calculation units, from the first calculation unit to the N-th calculation unit, in ascending order of the number of combinations.

In the N = 2 example above, the calculation unit 12a may be the first calculation unit and the calculation unit 12b the second calculation unit. The processing unit 12 assigns the top two data d1 and d2 to the calculation units 12a and 12b, respectively, in descending order of the number of combinations. That is, the processing unit 12 assigns data d1 to the calculation unit 12a and data d2 to the calculation unit 12b. The processing unit 12 also assigns the bottom two data d3 and d4 to the calculation units 12a and 12b in ascending order of the number of combinations. That is, the processing unit 12 assigns data d4 to the calculation unit 12a and data d3 to the calculation unit 12b.
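
The assignment rule described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `assign_to_units` is hypothetical, and the counts 4, 3, 2, 1 are made-up stand-ins for m1 to m4 that merely satisfy m4 < m3 < m2 < m1.

```python
def assign_to_units(partner_counts, n_units):
    """Assign 2N data items to N calculation units.

    partner_counts maps a data name to its number of partner data.
    Returns a list of N lists; entry i holds the data for unit i + 1.
    """
    if len(partner_counts) != 2 * n_units:
        raise ValueError("expected exactly 2N data items")
    # Sort by partner count, largest first.
    ranked = sorted(partner_counts, key=partner_counts.get, reverse=True)
    top, bottom = ranked[:n_units], ranked[n_units:]
    # The top N go to units 1..N in descending order (already sorted).
    # The bottom N go to units 1..N in ascending order, so reverse them.
    bottom_ascending = bottom[::-1]
    return [[top[i], bottom_ascending[i]] for i in range(n_units)]


plan = assign_to_units({"d1": 4, "d2": 3, "d3": 2, "d4": 1}, n_units=2)
print(plan)  # [['d1', 'd4'], ['d2', 'd3']]
```

With these values the first unit (12a in the example) receives d1 and d4, and the second unit (12b) receives d2 and d3, which matches the assignment described for N = 2.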

Viewed in the order of the calculation units 12b and 12a, the same assignment can be described as assigning data d2 and d1 to the calculation units 12b and 12a, respectively, in ascending order of the number of combinations, and likewise assigning data d3 and d4 to the calculation units 12b and 12a, respectively, in descending order of the number of combinations.

The assignment processing by the processing unit 12 described above may be executed by either of the calculation units 12a and 12b, or by another calculation unit (a control calculation unit) included in the processing unit 12. This other calculation unit is not shown in FIG. 1.

Based on the result of assigning the 2N data to the N calculation units, the processing unit 12 executes operations on N of the 2N data in parallel using the N calculation units. In the N = 2 example above, the processing unit 12 executes operations on two pieces of data in parallel using the calculation units 12a and 12b, based on the result of assigning data d1 to d4 to the calculation units 12a and 12b.

FIG. 1 shows an execution example 20 of the operations by the calculation units 12a and 12b. The execution example 20 shows an example of the execution time of the operations by each of the calculation units 12a and 12b. Time advances from left to right along the horizontal axis. For example, the calculation unit 12a executes the operations for data d1 and then the operations for data d4. The calculation unit 12b executes the operations for data d2 and then the operations for data d3. In this way, the processing unit 12 uses the calculation units 12a and 12b to execute operations on two pieces of data at a time in parallel.

According to the data processing device 10 of the first embodiment, the top N first data and the bottom N second data in the result of sorting by the number of partner data to be operated on are identified from 2N pieces of data used in operations in combination with each of a plurality of partner data. The top N first data are assigned to the N calculation units, from the first calculation unit to the N-th calculation unit, in descending order of the number of partner data to be operated on. The bottom N second data are assigned to the N calculation units, from the first calculation unit to the N-th calculation unit, in ascending order of the number of partner data to be operated on. Based on the result of assigning the 2N data to the N calculation units, operations on N of the 2N data are executed in parallel by the N calculation units. This reduces variation in the processing amounts of the plurality of calculation units.

By way of comparison, data d1, d2, d3, and d4 could instead be assigned to the calculation units 12a and 12b cyclically, in descending order of the number of combinations with partner data (the number of partner data to be operated on). With cyclic assignment, data d1 and d3 are assigned to the calculation unit 12a, and data d2 and d4 to the calculation unit 12b. In this case, however, the difference between the processing amount for data d1 and d3 handled by the calculation unit 12a and that for data d2 and d4 handled by the calculation unit 12b is relatively large. As a result, the total computation time of the calculation unit 12a for data d1 and d3 becomes relatively long, delaying the completion of the overall computation.

In contrast, with the data processing device 10, as shown in the execution example 20, the difference between the processing amount for data d1 and d4 handled by the calculation unit 12a and that for data d2 and d3 handled by the calculation unit 12b can be reduced. As a result, the difference between the computation times of the calculation units 12a and 12b is reduced, and the delay in completing the overall computation is reduced.
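
The difference between the two assignments can be checked numerically. The sketch below assumes that the processing amount of a data item is proportional to its number of partner data, and uses the made-up counts 4, 3, 2, 1 for m1 to m4; the names `unit_loads`, `cyclic`, and `paired` are hypothetical.

```python
def unit_loads(assignment, partner_counts):
    """Total number of combinations each calculation unit must process."""
    return [sum(partner_counts[d] for d in unit) for unit in assignment]


counts = {"d1": 4, "d2": 3, "d3": 2, "d4": 1}

# Cyclic assignment in descending order: 12a gets d1, d3; 12b gets d2, d4.
cyclic = [["d1", "d3"], ["d2", "d4"]]
# Assignment of the first embodiment: 12a gets d1, d4; 12b gets d2, d3.
paired = [["d1", "d4"], ["d2", "d3"]]

cyclic_loads = unit_loads(cyclic, counts)
paired_loads = unit_loads(paired, counts)
print(cyclic_loads, max(cyclic_loads))  # [6, 4] 6
print(paired_loads, max(paired_loads))  # [5, 5] 5
```

In general, for N = 2 the load difference is (m1 - m2) + (m3 - m4) under cyclic assignment and (m1 - m2) - (m3 - m4) under the assignment of the first embodiment. Because both bracketed terms are positive, the second difference is always smaller in magnitude.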

[Second embodiment]
Next, a second embodiment will be described.
FIG. 2 is a diagram illustrating a hardware example of a data processing device according to the second embodiment.

The data processing device 100 has a CPU 101, a RAM 102, an HDD 103, a GPU 104, an input interface 105, a media reader 106, and a communication interface 107. These units are connected to a bus inside the data processing device 100. The CPU 101 corresponds to the processing unit 12 of the first embodiment. The RAM 102 or the HDD 103 corresponds to the storage unit 11 of the first embodiment.

The CPU 101 is a processor that executes program instructions. The CPU 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The CPU 101 is a multi-core processor that includes a plurality of processor cores. A processor core may also be called a CPU core. Hereinafter, a processor core is referred to simply as a core.

The RAM 102 is a volatile semiconductor memory that temporarily stores programs executed by the CPU 101 and data used by the CPU 101 for computation. The data processing device 100 may include a type of memory other than RAM, and may include a plurality of memories.

The HDD 103 is a non-volatile storage device that stores data and software programs such as an OS (Operating System), middleware, and application software. The data processing device 100 may include other types of storage devices, such as flash memory or an SSD (Solid State Drive), and may include a plurality of non-volatile storage devices.

The GPU 104 outputs images to a display 111 connected to the data processing device 100 in accordance with instructions from the CPU 101. Any type of display can be used as the display 111, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), a plasma display, or an organic EL (OEL: Organic Electro-Luminescence) display.

The input interface 105 acquires an input signal from an input device 112 connected to the data processing device 100 and outputs it to the CPU 101. A pointing device such as a mouse, touch panel, touch pad, or trackball, a keyboard, a remote controller, a button switch, or the like can be used as the input device 112. A plurality of types of input devices may be connected to the data processing device 100.

The media reader 106 is a reading device that reads programs and data recorded on a recording medium 113. For example, a magnetic disk, an optical disc, a magneto-optical disk (MO), or a semiconductor memory can be used as the recording medium 113. Magnetic disks include flexible disks (FDs) and HDDs. Optical discs include CDs (Compact Discs) and DVDs (Digital Versatile Discs).

The media reader 106 copies, for example, programs and data read from the recording medium 113 to another recording medium such as the RAM 102 or the HDD 103. The read programs are executed by, for example, the CPU 101. The recording medium 113 may be a portable recording medium and may be used to distribute programs and data. The recording medium 113 and the HDD 103 may each be referred to as a computer-readable recording medium.

The communication interface 107 is connected to a network 114 and communicates with other information processing devices via the network 114. The communication interface 107 may be a wired communication interface connected to a wired communication device such as a switch or a router, or a wireless communication interface connected to a wireless communication device such as a base station or an access point.

Note that the data processing device 100 may be an HPC (High-Performance Computing) system that has a plurality of CPUs 101 and is used for large-scale computation.
The data processing device 100 is used for pattern mining. In pattern mining, the data processing device 100 finds combinations of conditions that are satisfied by at least a predetermined number of samples. Below, pattern mining performed in the marketing field to plan purchase forecasts, advertising, web placement, and the like is used as an example. The data processing device 100 can, however, also be applied to pattern mining in other fields, such as politics (for example, predicting election turnout) and medicine (for example, identifying the causes of diseases).

FIG. 3 is a diagram showing an example of extracting product combinations.
For example, in the marketing field, pattern mining may be performed in cases such as the following. In the first case, the goal is to find combinations of products for which the number of people who purchased every product in the combination is at least a certain number, such as 100. In the second case, the goal is to check whether the number of people who purchased both of two specific products is at least a certain number, such as 100. Pattern mining may also be performed for purposes other than these two cases.

第２の実施の形態の例では、データ処理装置１００により、任意のｋ個の商品の組み合わせを選択し、それら全てを購入した顧客の人数がｓ＿ｍｉｎ以上である商品の組み合わせを抽出する場合を例示する。ｋは２以上の整数である。ｓ＿ｍｉｎは１以上の整数である。 In the example of the second embodiment, the data processing device 100 selects arbitrary combinations of k products and extracts those combinations for which the number of customers who purchased all k products is s_min or more. k is an integer of 2 or more. s_min is an integer of 1 or more.

商品購入履歴データ２００は、顧客ＩＤに対する「商品１」、「商品２」、「商品３」、…の購入履歴を示す。商品の列の「０」は購入しなかったことを示し、「１」は購入したことを示す。出力リスト２１０は、商品購入履歴データ２００に基づいてデータ処理装置１００により出力される。出力リスト２１０は、あるｋ個の商品の組み合わせを全て購入した人数がｓ＿ｍｉｎ以上である当該ｋ個の商品の組み合わせを示す。次に、ｋ＝３の場合の計算例を説明する。 The product purchase history data 200 shows the purchase history of "product 1", "product 2", "product 3", etc. for the customer ID. "0" in the product column indicates that the product was not purchased, and "1" indicates that it was purchased. The output list 210 is output by the data processing device 100 based on the product purchase history data 200. The output list 210 shows combinations of k products for which the number of people who purchased all of the combinations of k products is equal to or greater than s_min. Next, a calculation example when k=3 will be explained.

図４は、３種類の商品を全て購入した人数の計算例を示す図である。
例えば、商品購入履歴データ２００に基づいて「商品１」、「商品２」、「商品３」の全てを購入した人数を次のように計算する。まず、データ処理装置１００は、商品購入履歴データ２００の「商品１」の列と「商品２」の列との論理積（ＡＮＤ）を計算する（ステップＳＴ１）。各商品の列に相当するデータは、０，１が並ぶベクトルとなる。Ａｎｄ_{（１，２）}は、「商品１」の列と「商品２」の列との論理積の計算結果である。例えば、「商品１」の列が「１１１１」であり、「商品２」の列が「０１１０」の場合、Ａｎｄ_{（１，２）}は、「０１１０」となる。 FIG. 4 is a diagram showing an example of calculating the number of people who purchased all three types of products.
For example, based on the product purchase history data 200, the number of people who purchased all of "Product 1", "Product 2", and "Product 3" is calculated as follows. First, the data processing device 100 calculates the logical product (AND) of the "product 1" column and the "product 2" column of the product purchase history data 200 (step ST1). The data corresponding to the column of each product is a vector of 0's and 1's. And _{(1, 2)} is the result of calculating the logical product of the "Product 1" column and the "Product 2" column. For example, if the column for "Product 1" is "1111" and the column for "Product 2" is "0110", And _{(1, 2)} becomes "0110".

次に、データ処理装置１００は、Ａｎｄ_{（１，２）}と「商品３」の列との論理積Ａｎｄ_{（１，２，３）}を計算する（ステップＳＴ２）。「商品３」の列が「０１００」の場合、Ａｎｄ_{（１，２，３）}は「０１００」となる。 Next, the data processing device 100 calculates the logical product And _{(1, 2, 3)} of And _{(1, 2)} and the column of "product 3" (step ST2). If the column of "Product 3" is "0100", And _{(1, 2, 3)} becomes "0100".

そして、データ処理装置１００は、ステップＳＴ２の結果、すなわち、Ａｎｄ_{（１，２，３）}の要素の総和Ｓｕｍ_{（１，２，３）}を計算する（ステップＳＴ３）。Ａｎｄ_{（１，２，３）}に対して、Ｓｕｍ_{（１，２，３）}＝０＋１＋０＋０＝１となる。データ処理装置１００は、例えば、Ｓｕｍ_{（１，２，３）}がｓ＿ｍｉｎ以上の場合、「商品１」、「商品２」、「商品３」の組み合わせを出力リスト２１０に追加する。 Then, the data processing device 100 calculates the sum Sum _{(1, 2, 3)} of the elements of the result of step ST2, that is, And _{(1, 2, 3)} (step ST3). For And _{(1, 2, 3)} = "0100", Sum _{(1, 2, 3)} = 0 + 1 + 0 + 0 = 1. For example, if Sum _{(1, 2, 3)} is greater than or equal to s_min, the data processing device 100 adds the combination of “product 1,” “product 2,” and “product 3” to the output list 210.
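
Steps ST1 to ST3 can be traced with the four-customer example values given above. The following is a minimal Python sketch, not the implementation described in this publication; the names `and_vectors`, `product1`, `product2`, and `product3` are illustrative assumptions.

```python
# Purchase columns from the example above: one entry per customer, 1 = purchased.
product1 = [1, 1, 1, 1]
product2 = [0, 1, 1, 0]
product3 = [0, 1, 0, 0]


def and_vectors(a, b):
    """Element-wise logical AND of two equal-length 0/1 vectors."""
    return [p & q for p, q in zip(a, b)]


and_12 = and_vectors(product1, product2)   # step ST1: And_(1,2)
and_123 = and_vectors(and_12, product3)    # step ST2: And_(1,2,3)
sum_123 = sum(and_123)                     # step ST3: Sum_(1,2,3)

print(and_12, and_123, sum_123)  # [0, 1, 1, 0] [0, 1, 0, 0] 1
```

Because And _{(1, 2)} is kept as an intermediate result, it can be combined with any further product column without recomputing the first AND.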

次に、ステップＳＴ１に例示される２種類の商品の組み合わせに対する演算の例を説明する。
図５は、２種類の商品の組み合わせに対する演算例を示す図である。 Next, an example of calculation for the combination of two types of products exemplified in step ST1 will be described.
FIG. 5 is a diagram showing an example of a calculation for a combination of two types of products.

一例として、商品数ｄ＿ｘ＝１０とする。マトリクス２０１は、２つの商品ｘ，ｙの組み合わせに対応する論理積Ａｎｄ_{（ｘ，ｙ）}の各演算を例示する。列ｘは、商品購入履歴データ２００の商品ｘに対応する。行ｙは、商品購入履歴データ２００の商品ｙに対応する。すなわち、「列１」、「列２」、…、「列１０」は、それぞれ「商品１」、「商品２」、…、「商品１０」に対応する。また、「行１」、「行２」、…、「行１０」は、それぞれ「商品１」、「商品２」、…、「商品１０」に対応する。 As an example, let the number of products be d_x=10. The matrix 201 illustrates each operation of the logical product And _{(x, y)} corresponding to a combination of two products x and y. Column x corresponds to product x in the product purchase history data 200. Row y corresponds to product y in the product purchase history data 200. That is, "column 1", "column 2", ..., "column 10" correspond to "product 1", "product 2", ..., "product 10", respectively. Also, "row 1", "row 2", ..., "row 10" correspond to "product 1", "product 2", ..., "product 10", respectively.

マトリクス２０１の１つのマスが、商品ｘ，ｙそれぞれの列の組み合わせに対する１つの論理積Ａｎｄ_{（ｘ，ｙ）}の演算を示す。例えば、マトリクス２０１の７行８列目のマスは、商品購入履歴データ２００の「商品７」の列と同「商品８」の列との論理積Ａｎｄ_{（８，７）}を示す。商品数ｄ＿ｘ＝１０の場合、２つの商品の列の組み合わせ総数は、_１０Ｃ_２＝４５通りとなる。なお、マトリクス２０１の斜線が記載されたマスは、当該マスに対応する演算が実行されないことを示す。 One cell of the matrix 201 represents one operation of the logical product And _{(x, y)} for a combination of the columns of products x and y. For example, the cell in the 7th row and 8th column of the matrix 201 indicates the logical product And _{(8, 7)} of the "Product 7" column and the "Product 8" column of the product purchase history data 200. When the number of products d_x=10, the total number of combinations of two product columns is ₁₀ C ₂ =45. Note that a hatched cell in the matrix 201 indicates that the operation corresponding to that cell is not executed.

データ処理装置１００は、論理積Ａｎｄ_{（ｘ，ｙ）}に基づいて、３種類の商品の組み合わせに対する演算を行う。
図６は、３種類の商品の組み合わせに対する演算例を示す図である。 The data processing device 100 performs calculations for combinations of three types of products based on the logical product And _{(x, y)} .
FIG. 6 is a diagram showing an example of calculation for a combination of three types of products.

マトリクス２０２は、Ａｎｄ_{（ｘ，ｙ）}のデータと、商品ｘ，ｙとは異なる商品ｚのデータ（相手データ）との組み合わせに対応する論理積Ａｎｄ_{（ｘ，ｙ，ｚ）}の各演算を例示する。マトリクス２０２における列ｚは、商品購入履歴データ２００の商品ｚに対応する。マトリクス２０２の行は、Ａｎｄ_{（ｘ，ｙ）}に相当する。なお、図中、マトリクス２０２の行は、Ａｎｄ_{（ｘ，ｙ）}に対応するラベル（ｘ，ｙ）により識別される。マトリクス２０２の横軸におけるｚの値の数はｄ＿ｘ個である。マトリクス２０２の縦軸における（ｘ，ｙ）の数は_ｄ＿ｘＣ_２である。 The matrix 202 illustrates each operation of the logical product And _{(x, y, z)} corresponding to a combination of the And _{(x, y)} data and the data (partner data) of a product z different from products x and y. Column z in matrix 202 corresponds to product z in product purchase history data 200. A row of matrix 202 corresponds to And _{(x, y)} . Note that in the figure, the rows of the matrix 202 are identified by the labels (x, y) corresponding to And _{(x, y)} . The number of z values on the horizontal axis of matrix 202 is d_x. The number of (x, y) on the vertical axis of matrix 202 is _{d_x} C ₂ .

マトリクス２０２の行は、（９，１０），（８，１０），（８，９），（７，１０），（７，９），…，（１，２）の順に並べられる。列は、ｚの１０～１の順に並べられる。Ａｎｄ_{（ｘ，ｙ）}ごとに計算する組み合わせ数、すなわち、演算対象の相手データの数は、（ｘ，ｙ）に対して組み合わせるｚの数として、組み合わせの重複がないように予め特定される。マトリクス２０２の行に対応する（ｘ，ｙ）の並び順は、当該行に対応するＡｎｄ_{（ｘ，ｙ）}のデータを、当該Ａｎｄ_{（ｘ，ｙ）}に対して計算する組み合わせ数（ｚの数）の降順にソートした結果に相当する。 The rows of the matrix 202 are arranged in the order of (9, 10), (8, 10), (8, 9), (7, 10), (7, 9), ..., (1, 2). The columns are arranged in the order of z from 10 to 1. The number of combinations to be calculated for each And _{(x, y)} , that is, the number of partner data to be operated on, is specified in advance as the number of z values to be combined with (x, y) so that no combination is duplicated. The order of (x, y) across the rows of the matrix 202 corresponds to the result of sorting the And _{(x, y)} data of the rows in descending order of the number of combinations (the number of z values) calculated for each And _{(x, y)} .

マトリクス２０２の１つのマスが、Ａｎｄ_{（ｘ，ｙ）}と商品ｚの列との組み合わせに対する１つの論理積Ａｎｄ_{（ｘ，ｙ，ｚ）}の演算を示す。商品数ｄ＿ｘ＝１０の場合、Ａｎｄ_{（ｘ，ｙ）}と商品ｚの列との組み合わせ総数は、_１０Ｃ_３＝１２０通りとなる。なお、マトリクス２０２の斜線が記載されたマスは、当該マスに対応する演算が実行されないことを示す。 One cell of the matrix 202 represents one operation of the logical product And _{(x, y, z)} for a combination of And _{(x, y)} and the column of product z. When the number of products d_x=10, the total number of combinations of And _{(x, y)} and the column of product z is ₁₀ C ₃ =120. Note that a hatched cell in the matrix 202 indicates that the operation corresponding to that cell is not executed.
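
The row order of the matrix 202 and the per-row number of partner data can be reproduced with a short Python sketch. It assumes that a row (x, y) with x < y is combined only with products z < x, which is the rule suggested by the examples in this description (for instance, And _{(9, 10)} is combined with z=8 to 1); the names `rows` and `partner_count` are illustrative.

```python
from itertools import combinations
from math import comb

d_x = 10  # number of products, as in the example

# All pairs (x, y) with x < y, in the row order of matrix 202:
# (9, 10), (8, 10), (8, 9), (7, 10), (7, 9), ..., (1, 2).
rows = sorted(combinations(range(1, d_x + 1), 2), reverse=True)


def partner_count(pair):
    """Number of partner columns z for row (x, y), assuming z < x < y."""
    x, _y = pair
    return x - 1


counts = [partner_count(pair) for pair in rows]

print(rows[:5])     # [(9, 10), (8, 10), (8, 9), (7, 10), (7, 9)]
print(counts[:5])   # [8, 7, 7, 6, 6]
print(len(rows), sum(counts), comb(d_x, 3))  # 45 120 120
```

Under this assumption every three-product combination is counted exactly once, so the counts add up to ₁₀ C ₃ =120, and the row order is already sorted in descending order of the count.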

マトリクス２０１やマトリクス２０２で示される論理積の演算は、１行に対応する演算を１スレッドとしてＣＰＵ１０１の各コアに割り振ることで、マルチスレッドで並列に実行可能である。 The logical AND operations shown in the matrix 201 and the matrix 202 can be executed in parallel in multiple threads by allocating the operation corresponding to one row to each core of the CPU 101 as one thread.

図７は、ＣＰＵのハードウェア例を示す図である。
ＣＰＵ１０１は、コア１２１，１２２，１２３，１２４およびキャッシュメモリ１２５を有する。コア１２１～１２４は、それぞれが並列に演算を実行するプロセッサコアである。キャッシュメモリ１２５は、コア１２１～１２４それぞれの演算に使用されるデータが格納される。キャッシュメモリ１２５は、コア１２１～１２４から、ＲＡＭ１０２よりも高速にアクセス可能である。キャッシュメモリ１２５には演算に用いられるデータがＲＡＭ１０２からロードされる。また、キャッシュメモリ１２５に格納されたデータがＲＡＭ１０２に書き込まれることもある。キャッシュメモリ１２５は、コア１２１～１２４により共有される。例えば、あるコアの演算のためにキャッシュメモリ１２５にロードされたデータは、他のコアの演算にも再利用できる。 FIG. 7 is a diagram showing an example of CPU hardware.
The CPU 101 has cores 121, 122, 123, and 124 and a cache memory 125. The cores 121 to 124 are processor cores that execute operations in parallel. The cache memory 125 stores data used for the calculations of each of the cores 121 to 124. The cores 121 to 124 can access the cache memory 125 faster than the RAM 102. Data used for calculations is loaded into the cache memory 125 from the RAM 102. Data stored in the cache memory 125 may also be written to the RAM 102. The cache memory 125 is shared by the cores 121 to 124. For example, data loaded into the cache memory 125 for the operation of one core can be reused for the operations of other cores.

例えば、マトリクス２０２に対して、コア１２１～１２４には、次のように演算に用いられるデータが割り当てられる。
図８は、複数のコアに対するデータ割り当て例を示す図である。 For example, with respect to the matrix 202, data used for calculations is allocated to the cores 121 to 124 as follows.
FIG. 8 is a diagram illustrating an example of data allocation to multiple cores.

マトリクス３００は、マトリクス２０２の一部を記載したものである。第２の実施の形態の例では、４つのコア１２１～１２４に対し、ソート順位が隣接する８つ（４×２＝８）のＡｎｄ_{（ｘ，ｙ）}のデータを１セットとし、次のように各データが各コアに割り当てられる。 Matrix 300 shows a portion of matrix 202. In the example of the second embodiment, for the four cores 121 to 124, eight (4×2=8) pieces of And _{(x, y)} data with adjacent sort ranks are treated as one set, and each piece of data is assigned to a core as follows.

具体的には、組み合わせ数が上位の４個のデータそれぞれが、組み合わせ数の降順となるように、１番目のコア１２１から４番目のコア１２４までの４個のコアそれぞれに割り当てられる。また、組み合わせ数が下位の４個のデータそれぞれが、組み合わせ数の昇順となるように、１番目のコア１２１から４番目のコア１２４までの４個のコアそれぞれに割り当てられる。なお、「組み合わせ数」は、演算対象の相手データの数に相当する。 Specifically, each of the four pieces of data with the highest number of combinations is allocated to each of the four cores from the first core 121 to the fourth core 124 in descending order of the number of combinations. Further, each of the four pieces of data with the lowest number of combinations is allocated to each of the four cores from the first core 121 to the fourth core 124 in ascending order of the number of combinations. Note that the "number of combinations" corresponds to the number of partner data to be calculated.

すると、マトリクス３００の例では、コア１２１～１２４に対して、次のように入れ子状にデータが割り当てられる。組み合わせ数「８」であるＡｎｄ_{（９，１０）}がコア１２１に割り当てられる。組み合わせ数「７」であるＡｎｄ_{（８，１０）}がコア１２２に割り当てられる。組み合わせ数「７」であるＡｎｄ_{（８，９）}がコア１２３に割り当てられる。組み合わせ数「６」であるＡｎｄ_{（７，１０）}がコア１２４に割り当てられる。また、組み合わせ数「６」であるＡｎｄ_{（７，９）}がコア１２４に割り当てられる。組み合わせ数「６」であるＡｎｄ_{（７，８）}がコア１２３に割り当てられる。組み合わせ数「５」であるＡｎｄ_{（６，１０）}がコア１２２に割り当てられる。組み合わせ数「５」であるＡｎｄ_{（６，９）}がコア１２１に割り当てられる。 Then, in the example of the matrix 300, data is assigned to the cores 121 to 124 in a nested manner as follows. And _{(9, 10)} , which has a combination number of “8”, is assigned to the core 121. And _{(8, 10)} , which has a combination number of “7”, is assigned to the core 122. And _{(8, 9)} , which has a combination number of “7”, is assigned to the core 123. And _{(7, 10)} , which has a combination number of “6”, is assigned to the core 124. Further, And _{(7, 9)} , which has a combination number of “6”, is assigned to the core 124. And _{(7, 8)} , which has a combination number of “6”, is assigned to the core 123. And _{(6, 10)} , which has a combination number of “5”, is assigned to the core 122. And _{(6, 9)} , which has a combination number of “5”, is assigned to the core 121.
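
The nested assignment of one set of eight rows can be sketched in Python as follows. The function name `nested_core_index` and the use of 0-based ranks are assumptions made for illustration; only the resulting row-to-core mapping is taken from the text above.

```python
def nested_core_index(rank, n_cores):
    """Core index (0-based) for the row at 0-based sort position `rank`.

    Rows are handled in sets of 2 * n_cores: the first n_cores rows go to
    cores 0, 1, ..., n_cores - 1, and the next n_cores rows go to
    cores n_cores - 1, ..., 1, 0.
    """
    pos = rank % (2 * n_cores)
    return pos if pos < n_cores else 2 * n_cores - 1 - pos


# First set of eight rows of matrix 300, in sort order.
rows = [(9, 10), (8, 10), (8, 9), (7, 10), (7, 9), (7, 8), (6, 10), (6, 9)]
core_ids = [121, 122, 123, 124]  # reference numerals of the four cores

assignment = {core: [] for core in core_ids}
for rank, pair in enumerate(rows):
    core = core_ids[nested_core_index(rank, len(core_ids))]
    assignment[core].append(pair)

for core, pairs in assignment.items():
    print(core, pairs)
# 121 [(9, 10), (6, 9)]
# 122 [(8, 10), (6, 10)]
# 123 [(8, 9), (7, 8)]
# 124 [(7, 10), (7, 9)]
```

Pairing the largest row of a set with the smallest, the second largest with the second smallest, and so on keeps the per-core totals within a set close to each other.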

マトリクス３００における行に対応する他のデータについても、同様にコア１２１～１２４が入れ子状に割り当てられる。
すると、コア１２１～１２４により４スレッドでマトリクス３００の行方向の演算を並列に実行することができる。例えば、コア１２１は、Ａｎｄ_{（９，１０）}と商品ｚ＝８～１の列それぞれとを組み合わせた８通り分の論理積の演算を１スレッドで実行する。コア１２２は、Ａｎｄ_{（８，１０）}と商品ｚ＝７～１の列それぞれとを組み合わせた７通り分の論理積の演算を１スレッドで実行する。コア１２３は、Ａｎｄ_{（８，９）}と商品ｚ＝７～１の列それぞれとを組み合わせた７通り分の論理積の演算を１スレッドで実行する。コア１２４は、Ａｎｄ_{（７，１０）}と商品ｚ＝６～１の列それぞれとを組み合わせた６通り分の論理積の演算を１スレッドで実行する。 For other data corresponding to rows in matrix 300, cores 121 to 124 are similarly assigned in a nested manner.
Then, the cores 121 to 124 can execute operations in the row direction of the matrix 300 in parallel using four threads. For example, the core 121 uses one thread to perform eight logical product operations combining And _{(9, 10)} and each of the columns of products z=8 to 1. The core 122 executes, in one thread, seven logical product operations combining And _{(8, 10)} and each of the columns of products z=7 to 1. The core 123 executes, in one thread, seven logical product operations combining And _{(8, 9)} and each of the columns of products z=7 to 1. The core 124 executes, in one thread, six logical product operations combining And _{(7, 10)} and each of the columns of products z=6 to 1.

コア１２４は、当該６通りの分の論理積の演算が終了すると、Ａｎｄ_{（７，９）}と商品ｚ＝６～１の列それぞれとを組み合わせた６通り分の論理積の演算を１スレッドで実行する。コア１２１～１２３も同様に、割り当てられたデータに対する演算を順次実行する。 When the core 124 finishes these six logical product operations, it executes, in one thread, the six logical product operations combining And _{(7, 9)} with each of the columns of products z=6 to 1. Similarly, the cores 121 to 123 sequentially execute operations on the data assigned to them.

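なお、データ処理装置１００は、マトリクス２０１に関しても、図８で例示した方法と同様に、コア１２１～１２４へマトリクス２０１の行ｙに対応するデータ、すなわち、商品購入履歴データ２００の商品ｙの列を割り当てることができる。 Note that, for the matrix 201 as well, the data processing device 100 can assign the data corresponding to row y of the matrix 201, that is, the column of product y in the product purchase history data 200, to the cores 121 to 124 in the same manner as the method illustrated in FIG. 8.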

図９は、データ処理装置の機能例を示す図である。
データ処理装置１００は、データ記憶部１３０、キャッシュ記憶部１４０、割り当て部１５０および演算制御部１６０を有する。データ記憶部１３０は、ＲＡＭ１０２やＨＤＤ１０３の記憶領域により実現される。キャッシュ記憶部１４０はキャッシュメモリ１２５の記憶領域により実現される。割り当て部１５０および演算制御部１６０は、ＲＡＭ１０２に記憶されたプログラムをＣＰＵ１０１が実行することで実現される。 FIG. 9 is a diagram illustrating a functional example of the data processing device.
The data processing device 100 includes a data storage unit 130, a cache storage unit 140, an allocation unit 150, and a calculation control unit 160. The data storage unit 130 is realized by a storage area of the RAM 102 or the HDD 103. The cache storage unit 140 is realized by a storage area of the cache memory 125. The allocation unit 150 and the calculation control unit 160 are realized by the CPU 101 executing a program stored in the RAM 102.

データ記憶部１３０は、マトリクス２０１やマトリクス２０２で使用されるデータの全体を記憶する。例えば、データ記憶部１３０は、商品購入履歴データ２００を記憶する。
キャッシュ記憶部１４０は、データ記憶部１３０に記憶されたデータのうちの一部を記憶する。キャッシュ記憶部１４０には、データ記憶部１３０に記憶されたデータのうちの一部が、演算制御部１６０による演算実行に応じてロードされる。キャッシュ記憶部１４０は一定のサイズを有する。 The data storage unit 130 stores all data used in the matrix 201 and the matrix 202. For example, the data storage unit 130 stores product purchase history data 200.
The cache storage unit 140 stores part of the data stored in the data storage unit 130. Part of the data stored in the data storage unit 130 is loaded into the cache storage unit 140 as the calculation control unit 160 executes calculations. The cache storage unit 140 has a fixed size.

割り当て部１５０は、コア１２１～１２４それぞれに対して、マトリクス２０１やマトリクス２０２の行に対応するデータを割り当てる。データの割り当て方法には、図８で例示した方法が用いられる。 The allocation unit 150 allocates data corresponding to the rows of the matrix 201 and the matrix 202 to each of the cores 121 to 124. The data allocation method used is the method illustrated in FIG. 8.

演算制御部１６０は、割り当て部１５０によるデータの割り当て結果に基づき、コア１２１～１２４を用いて並列に各データに関する演算を実行する。演算制御部１６０は、演算結果をデータ記憶部１３０に格納する。 The calculation control unit 160 uses the cores 121 to 124 to execute calculations on each data in parallel based on the data allocation result by the allocation unit 150. The calculation control unit 160 stores the calculation results in the data storage unit 130.

次に、データ処理装置１００による処理手順を説明する。
図１０は、組み合わせ計算例を示すフローチャートである。
（Ｓ１０）演算制御部１６０は、コア１２１～１２４を用いてｋ＝２の組み合わせ計算を行い、各組み合わせの論理積Ａｎｄ_{（ｘ，ｙ）}をリストＡとして出力する。 Next, a processing procedure by the data processing device 100 will be explained.
FIG. 10 is a flowchart showing an example of combination calculation.
(S10) The calculation control unit 160 performs k=2 combination calculations using the cores 121 to 124, and outputs the logical product And _{(x, y)} of each combination as a list A.

（Ｓ１１）割り当て部１５０は、リストＡの各データ、すなわち、論理積Ａｎｄ_{（ｘ，ｙ）}を各コアに入れ子状に割り当てる。ステップＳ１１の割り当てでは、図８で例示した方法が用いられる。 (S11) The allocation unit 150 allocates each data in list A, that is, the logical product And _{(x, y)} , to each core in a nested manner. In the assignment in step S11, the method illustrated in FIG. 8 is used.

（Ｓ１２）演算制御部１６０は、下記ステップＳ１３～Ｓ１６で示される任意の３個の商品の組み合わせ計算をコア１２１～１２４を用いて繰り返し実行する。組み合わせ計算の総回数は、_ｄ＿ｘＣ_３回である。 (S12) The calculation control unit 160 uses the cores 121 to 124 to repeatedly execute the combination calculation for three arbitrary products shown in steps S13 to S16 below. The total number of combination calculations is _{d_x} C ₃ .

（Ｓ１３）コア１２１～１２４それぞれは、当該コアにて担当する組み合わせＡｎｄ_{（ｘ，ｙ）}を選択し、未選択の組み合わせ（ｘ，ｙ，ｚ）となる商品ｚを選択する。
（Ｓ１４）コア１２１～１２４それぞれは、３個の商品ｘ，ｙ，ｚの論理積Ａｎｄ_{（ｘ，ｙ，ｚ）}を計算する。 (S13) Each of the cores 121 to 124 selects the combination And _{(x, y)} for which it is responsible, and selects a product z that is part of an unselected combination (x, y, z).
(S14) Each of the cores 121 to 124 calculates the logical product And _{(x, y, z)} of the three products x, y, and z.

（Ｓ１５）コア１２１～１２４それぞれは、論理積Ａｎｄ_{（ｘ，ｙ，ｚ）}から３つの商品ｘ，ｙ，ｚを購入した人数Ｓｕｍ_{（ｘ，ｙ，ｚ）}を計数する。
（Ｓ１６）コア１２１～１２４それぞれは、Ｓｕｍ_{（ｘ，ｙ，ｚ）}をリストＢに追加する。 (S15) Each of the cores 121 to 124 counts, from the logical product And _{(x, y, z)} , the number of people Sum _{(x, y, z)} who purchased the three products x, y, and z.
(S16) Each of the cores 121 to 124 adds Sum _{(x, y, z)} to list B.

（Ｓ１７）演算制御部１６０は、ステップＳ１３～Ｓ１６で示される任意の３個の商品の全ての組み合わせに対する計算を終了すると、ステップＳ１８に処理を進める。
（Ｓ１８）演算制御部１６０は、リストＢを出力する。リストＢは、データ記憶部１３０に格納される。そして、組み合わせ計算の処理が終了する。 (S17) When the calculation control unit 160 completes the calculations for all combinations of three arbitrary products shown in steps S13 to S16, it advances the process to step S18.
(S18) The calculation control unit 160 outputs list B. List B is stored in the data storage section 130. Then, the combination calculation process ends.
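
The flow of steps S10 to S18 can be summarized in a small Python sketch. Everything concrete here is an assumption for illustration: the purchase data in `columns`, the worker count `N_CORES`, the threshold `s_min`, the rule that row (x, y) is combined with z < x, and the use of a thread pool as a stand-in for the cores. Each worker returns its own partial results, which are merged into list B afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical purchase history: product number -> 0/1 vector over 5 customers.
columns = {
    1: [1, 1, 1, 1, 0],
    2: [0, 1, 1, 0, 1],
    3: [0, 1, 1, 0, 0],
    4: [1, 1, 0, 1, 1],
}
N_CORES = 2  # hypothetical number of workers standing in for the cores


def and_vectors(a, b):
    return [p & q for p, q in zip(a, b)]


# S10: k = 2 combination calculation -> list A, in the row order of matrix 202.
list_a = {
    (x, y): and_vectors(columns[x], columns[y])
    for x, y in sorted(combinations(columns, 2), reverse=True)
}

# S11: nested assignment of the rows of list A to the workers.
per_worker = [[] for _ in range(N_CORES)]
for rank, pair in enumerate(list_a):
    pos = rank % (2 * N_CORES)
    per_worker[pos if pos < N_CORES else 2 * N_CORES - 1 - pos].append(pair)


def run_worker(pairs):
    """S13 to S16 for one worker: AND with each partner z < x, then count."""
    results = []
    for x, y in pairs:
        for z in range(x - 1, 0, -1):
            and_xyz = and_vectors(list_a[(x, y)], columns[z])  # S14
            results.append(((z, x, y), sum(and_xyz)))          # S15, S16
    return results


# S12, S17: run all workers and wait until every combination is done.
with ThreadPoolExecutor(max_workers=N_CORES) as pool:
    list_b = dict(r for part in pool.map(run_worker, per_worker) for r in part)

# S18, followed by the filter against a hypothetical threshold s_min.
s_min = 2
output_list = sorted(combo for combo, count in list_b.items() if count >= s_min)
print(list_b)
print(output_list)  # [(1, 2, 3)]
```

With four products the sketch evaluates ₄ C ₃ =4 three-product combinations, and only the combination of products 1, 2, and 3 reaches the threshold.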

例えば、ステップＳ１８で出力されたリストＢの中から、Ｓｕｍ_{（ｘ，ｙ，ｚ）}がｓ＿ｍｉｎ以上である商品ｘ，ｙ，ｚの組み合わせが、出力リスト２１０に追加される。
次に、図８のデータ割り当てに対する比較例を説明する。 For example, from the list B output in step S18, a combination of products x, y, z for which Sum _{(x, y, z)} is greater than or equal to s_min is added to the output list 210.
Next, a comparative example for the data allocation shown in FIG. 8 will be described.

図１１は、複数のコアに対するデータ割り当ての比較例を示す図である。
マトリクス４００は、マトリクス２０２の一部を記載したものである。例えば、マトリクス４００の各行に対応するデータを、組み合わせ数の多い順に、４つのコアＣ１，Ｃ２，Ｃ３，Ｃ４にサイクリックに割り当てることも考えられる。この場合、コアＣ１にＡｎｄ_{（９，１０）}、コアＣ２にＡｎｄ_{（８，１０）}、コアＣ３にＡｎｄ_{（８，９）}、コアＣ４にＡｎｄ_{（７，１０）}、コアＣ１にＡｎｄ_{（７，９）}、…というような割り当てとなる。しかし、比較例の割り当て方法では、コア数が増えるほど、各コアの処理量の差は大きくなる。コア間の処理量の差が大きくなると、処理が特定のコアに偏り、全体の処理時間が長くなる。 FIG. 11 is a diagram illustrating a comparative example of data allocation to a plurality of cores.
The matrix 400 shows a part of the matrix 202. For example, it is conceivable to cyclically assign the data corresponding to each row of the matrix 400 to four cores C1, C2, C3, and C4 in descending order of the number of combinations. In this case, And _{(9, 10)} is assigned to the core C1, And _{(8, 10)} to the core C2, And _{(8, 9)} to the core C3, And _{(7, 10)} to the core C4, And _{(7, 9)} to the core C1, and so on. However, with the assignment method of the comparative example, the difference in processing amount between cores grows as the number of cores increases. When the difference in processing amount between cores becomes large, processing is concentrated on specific cores and the overall processing time becomes longer.

また、他の比較例の方法として、マトリクス４００の一部を４分割してコアＣ１～Ｃ４に割り当てる例も考えられる。例えば、マトリクス４００の上から１行目～５行目かつ左から１列目～５列目の第１領域、１行目～５行目かつ６列目～１０列目の第２領域、６行目～１０行目かつ１列目～５列目の第３領域、６行目～１０行目かつ６列目～１０列目の第４領域のように分割され得る。そして、分割した領域に含まれる各論理積の演算を、領域ごとにコアＣ１～Ｃ４に割り振る。しかし、このような割り当て方法でも、コアＣ１～Ｃ４の処理量の差は大きくなる。例えば、コアＣ１に第１領域、コアＣ２に第２領域、コアＣ３に第３領域、コアＣ４に第４領域を割り当てる場合、コアＣ２，Ｃ４が担当する組み合わせ数が２５となり、コアＣ１が担当する組み合わせ数が９となり、コアＣ３が担当する組み合わせ数が１となる。このため、マトリクス４００の１行目～１０行目かつ１列目～１０列目の部分だけでもコアＣ１，Ｃ３間の処理量の差が非常に大きくなってしまい、当該差を埋めることが難しくなる。また、この方法では、４つのコアのうちの高々２つのコアでしか、同じ商品ｚのデータを使用できない。 As another comparative method, a part of the matrix 400 may be divided into four regions that are assigned to the cores C1 to C4. For example, the part can be divided into a first region covering the 1st to 5th rows from the top and the 1st to 5th columns from the left of the matrix 400, a second region covering the 1st to 5th rows and the 6th to 10th columns, a third region covering the 6th to 10th rows and the 1st to 5th columns, and a fourth region covering the 6th to 10th rows and the 6th to 10th columns. Then, the logical product operations included in each region are allocated to the cores C1 to C4 region by region. However, even with this allocation method, the difference in processing amount between the cores C1 to C4 becomes large. For example, when the first region is assigned to the core C1, the second region to the core C2, the third region to the core C3, and the fourth region to the core C4, the cores C2 and C4 are each responsible for 25 combinations, the core C1 for 9 combinations, and the core C3 for 1 combination. For this reason, even within the 1st to 10th rows and 1st to 10th columns of the matrix 400 alone, the difference in processing amount between the cores C1 and C3 becomes very large, and it is difficult to make up for this difference.
Furthermore, in this method, data for the same product z can only be used in at most two of the four cores.
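
The per-region counts of this comparative example can be checked with a short Python sketch. It assumes that the rows of the matrix 400 are the first ten rows of the matrix 202, that the columns run from z=10 to z=1, and that a cell is executed only when z < x; the names `executed` and `region_count` are illustrative.

```python
# First ten rows of matrix 400 (assumed to match the row order of matrix 202).
rows = [(9, 10), (8, 10), (8, 9), (7, 10), (7, 9),
        (7, 8), (6, 10), (6, 9), (6, 8), (6, 7)]
cols = list(range(10, 0, -1))  # columns from left to right: z = 10, 9, ..., 1


def executed(pair, z):
    """A cell is executed when z < x (assumed rule for row (x, y), x < y)."""
    return z < pair[0]


def region_count(row_slice, col_slice):
    """Number of executed cells inside one rectangular region."""
    return sum(executed(p, z) for p in rows[row_slice] for z in cols[col_slice])


top, bottom = slice(0, 5), slice(5, 10)
left, right = slice(0, 5), slice(5, 10)

counts = {
    "C1": region_count(top, left),      # first region
    "C2": region_count(top, right),     # second region
    "C3": region_count(bottom, left),   # third region
    "C4": region_count(bottom, right),  # fourth region
}
print(counts)  # {'C1': 9, 'C2': 25, 'C3': 1, 'C4': 25}
```

The left-hand regions contain mostly cells that are not executed, which is why the cores C1 and C3 receive far less work than the cores C2 and C4.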

図１２は、各コアが計算する組み合わせ数の比較を示す図である。
テーブル５００は、図１１で示した比較例の方法を用いた場合と、図８で例示したデータ処理装置１００の方法を用いた場合とにおける、コア１２１～１２４それぞれが計算する組み合わせ数を示す。コア数＝４である。商品数ｄ＿ｘ＝１０である。あるコアが計算する組み合わせ数は、当該コアに割り当てられる（ｘ，ｙ）の各データに対応する組み合わせ数の総和である。 FIG. 12 is a diagram showing a comparison of the number of combinations calculated by each core.
Table 500 shows the number of combinations calculated by each of cores 121 to 124 when using the method of the comparative example shown in FIG. 11 and when using the method of data processing apparatus 100 illustrated in FIG. 8. The number of cores is 4. The number of products d_x=10. The number of combinations calculated by a certain core is the sum of the number of combinations corresponding to each piece of data (x, y) assigned to the core.

図１１の比較例の方法では、コア１２１，１２２，１２３，１２４それぞれが計算する組み合わせ数は、３３，３１，２９，２７である。比較例の方法では、コア間の処理量の差の最大値は６となる。 In the method of the comparative example shown in FIG. 11, the number of combinations calculated by each of the cores 121, 122, 123, and 124 is 33, 31, 29, and 27. In the method of the comparative example, the maximum value of the difference in processing amount between cores is 6.

一方、図８のデータ処理装置１００の方法では、コア１２１，１２２，１２３，１２４それぞれが計算する組み合わせ数は、３０，３０，３０，３０である。データ処理装置１００では、コア間の処理量の差の最大値は０となる。このため、データ処理装置１００では、比較例の方法に比べ、全体として３組み合わせ分（＝３３－３０）の計算時間を削減可能になる。 On the other hand, in the method of the data processing apparatus 100 of FIG. 8, the number of combinations calculated by each of the cores 121, 122, 123, and 124 is 30, 30, 30, and 30. In the data processing device 100, the maximum value of the difference in processing amount between cores is zero. Therefore, in the data processing apparatus 100, the calculation time can be reduced by three combinations (=33-30) as compared to the method of the comparative example.
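
The per-core totals of the table 500 can be recomputed with the following Python sketch for d_x=10 and four cores. The partner count x − 1 for a row (x, y) with x < y is an assumption inferred from the examples in this description, and the function names are illustrative.

```python
from itertools import combinations

d_x, n_cores = 10, 4

# Rows of matrix 202 in sort order and their assumed partner counts.
rows = sorted(combinations(range(1, d_x + 1), 2), reverse=True)
counts = [x - 1 for x, _y in rows]


def cyclic_core(rank):
    """Comparative example: rows go to cores 0, 1, 2, 3, 0, 1, ..."""
    return rank % n_cores


def nested_core(rank):
    """Nested assignment: cores 0..3, then 3..0, repeated per set of 8 rows."""
    pos = rank % (2 * n_cores)
    return pos if pos < n_cores else 2 * n_cores - 1 - pos


def loads(core_of):
    """Total number of combinations each core calculates."""
    total = [0] * n_cores
    for rank, count in enumerate(counts):
        total[core_of(rank)] += count
    return total


cyclic = loads(cyclic_core)
nested = loads(nested_core)
print(cyclic, max(cyclic) - min(cyclic))  # [33, 31, 29, 27] 6
print(nested, max(nested) - min(nested))  # [30, 30, 30, 30] 0
```

Since the cores run in parallel, the busiest core determines the completion time, so the improvement corresponds to 33 − 30 = 3 combinations.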

以上説明したように、第２の実施の形態のデータ処理装置１００によれば、複数のコアの処理量のばらつきを低減できる。その結果、全体の組み合わせに対する演算の終了の遅延を低減できる。 As described above, according to the data processing device 100 of the second embodiment, variations in the processing amount of the plurality of cores can be reduced. As a result, it is possible to reduce the delay in completing calculations for all combinations.

ところで、コア１２１～１２４は、上記の演算において例えば次のようにキャッシュ記憶部１４０を共用する。
図１３は、第２の実施の形態のキャッシュ使用例（その１）を示す図である。 By the way, the cores 121 to 124 share the cache storage unit 140 in the above calculation, for example, as follows.
FIG. 13 is a diagram showing an example (part 1) of cache use according to the second embodiment.

ここで、Ａｎｄ_{（ｘ，ｙ）}とｚとの１つの組み合わせに対する演算を１ステップとする。テーブル６００は、各ステップにおけるコア１２１～１２４の演算対象の組み合わせ（ｚ，（ｘ，ｙ））を示す。ステップはステップ１，２，…というようにステップ番号の昇順に進む。また、図中、コア１２１を「コア１」、コア１２２を「コア２」、コア１２３を「コア３」、コア１２４を「コア４」と略記する。１つの組み合わせの計算にかかる時間は、コア１２１～１２４で同じであるとする。 Here, the calculation for one combination of And _{(x, y)} and z is defined as one step. Table 600 shows the combinations (z, (x, y)) that the cores 121 to 124 calculate in each step. The steps proceed in ascending order of step number, such as steps 1, 2, .... In addition, in the figure, the core 121 is abbreviated as "core 1," the core 122 as "core 2," the core 123 as "core 3," and the core 124 as "core 4." It is assumed that the time required to calculate one combination is the same for the cores 121 to 124.

更に、各ステップにおける各コアのキャッシュ記憶部１４０へのデータのロード順は、コア１２１が最も早く、２番目にコア１２２が早く、３番目にコア１２３が早く、コア１２４が最も遅いものとする。キャッシュ記憶部１４０に空きがなくなると、キャッシュ記憶部１４０のデータは、最後に使用されてからの経過時間が長いデータが優先的に削除される。 Furthermore, in each step, the cores are assumed to load data into the cache storage unit 140 in the following order: the core 121 first, the core 122 second, the core 123 third, and the core 124 last. When the cache storage unit 140 has no free space, the data for which the longest time has elapsed since its last use is preferentially deleted from the cache storage unit 140.

例えば、ステップ１では、コア１２１～１２４は次の組み合わせ（ｚ，（ｘ，ｙ））の演算を実行する。コア１２１は（８，（９，１０））の演算を実行する。コア１２２は（７，（８，１０））の演算を実行する。コア１２３は（７，（８，９））の演算を実行する。コア１２４は（６，（７，１０））の演算を実行する。 For example, in step 1, the cores 121 to 124 execute the following combination (z, (x, y)) operation. The core 121 executes the operation (8, (9, 10)). Core 122 executes the operation (7, (8, 10)). The core 123 executes the operation (7, (8, 9)). Core 124 executes the operation (6, (7, 10)).

ステップ１終了直後、キャッシュ記憶部１４０に保持されるデータは古い方から順に、Ａｎｄ_{（９，１０）}、「商品８」、Ａｎｄ_{（８，１０）}、Ａｎｄ_{（８，９）}、「商品７」、Ａｎｄ_{（７，１０）}、「商品６」となる。ここで、「商品ｚ」は商品購入履歴データ２００の商品ｚの列に相当する。なお、図中では、キャッシュ記憶部１４０に格納されているデータは、図の左側へ向かうほど最後に使用されてからの時間が長く（使用履歴が古く）、図の右側へ向かうほど最後に使用されてからの時間が短い（使用履歴が新しい）。 Immediately after step 1 is completed, the data held in the cache storage unit 140 is, in order from oldest to newest: And _{(9, 10)} , "Product 8", And _{(8, 10)} , And _{(8, 9)} , "Product 7", And _{(7, 10)} , "Product 6". Here, “product z” corresponds to the column of product z in the product purchase history data 200. In the figure, data stored in the cache storage unit 140 that appears further to the left was last used longer ago (older usage history), and data that appears further to the right was last used more recently (newer usage history).
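
The cache state after step 1 can be reproduced with a small least-recently-used model in Python. This is a sketch under stated assumptions: the class `LRUCache`, the capacity of 10 entries, and the rule that each core touches its And _{(x, y)} data before the product column are all illustrative choices, not details given for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Shared cache model: entries are kept oldest first, newest last."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.loads = 0  # number of loads from main memory

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)         # reuse: mark as newest
            return
        self.loads += 1                           # miss: load the data
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict the oldest entry
        self.entries[key] = True


cache = LRUCache(capacity=10)  # hypothetical capacity

# Step 1 of table 600, in core order 121, 122, 123, 124: (z, (x, y)).
step1 = [(8, (9, 10)), (7, (8, 10)), (7, (8, 9)), (6, (7, 10))]
for z, pair in step1:
    cache.access(("And", pair))    # assumed: the And data is touched first
    cache.access(("product", z))   # then the partner column of product z

print(list(cache.entries))
print(cache.loads)  # 7
```

The column of "Product 7" is loaded once for the core 122 and then reused by the core 123, which is why only seven loads are needed for eight accesses.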

次に、ステップ２では、コア１２１～１２４は次の組み合わせ（ｚ，（ｘ，ｙ））の演算を実行する。コア１２１は（７，（９，１０））の演算を実行する。コア１２２は（６，（８，１０））の演算を実行する。コア１２３は（６，（８，９））の演算を実行する。コア１２４は（５，（７，１０））の演算を実行する。 Next, in step 2, cores 121 to 124 perform the calculations for the following combinations (z, (x, y)). Core 121 performs the calculation (7, (9, 10)). Core 122 performs the calculation (6, (8, 10)). Core 123 performs the calculation (6, (8, 9)). Core 124 performs the calculation (5, (7, 10)).

ステップ２終了直後、キャッシュ記憶部１４０に保持されるデータは古い方から順に、「商品８」、Ａｎｄ_{（９，１０）}、「商品７」、Ａｎｄ_{（８，１０）}、「商品６」、Ａｎｄ_{（７，１０）}、「商品５」となる。 Immediately after step 2 is completed, the data held in the cache storage unit 140 is, in order from oldest to newest: "Product 8", And _{(9, 10)} , "Product 7", And _{(8, 10)} , "Product 6", And _{(7, 10)} , "Product 5".

このように、第２の実施の形態では、コア１２１～１２４はマトリクス３００の行方向について、ｚの大きい方から小さい方へ向かう順で演算を行う。図１３のマトリクス３００の各マスには、当該マスの演算が実行されるステップ番号が記載されている。 In this way, in the second embodiment, the cores 121 to 124 perform calculations in the row direction of the matrix 300 in the order from the larger z to the smaller z. Each cell of the matrix 300 in FIG. 13 has a step number written in which the calculation for that cell is executed.

図１４は、第２の実施の形態のキャッシュ使用例（その２）を示す図である。
例えば、ステップｘの実行中に、キャッシュ記憶部１４０に新たにデータＤが追加される際、キャッシュ記憶部１４０に空き容量がない場合、キャッシュ記憶部１４０において、使用履歴が最も古いデータ（例えば、「商品８」のデータ）が削除される。 FIG. 14 is a diagram showing an example (part 2) of cache use according to the second embodiment.
For example, when new data D is to be added to the cache storage unit 140 during the execution of step x and the cache storage unit 140 has no free space, the data with the oldest usage history in the cache storage unit 140 (for example, the "Product 8" data) is deleted.

「商品８」のデータは、キャッシュ記憶部１４０から削除されると、他の演算でキャッシュ記憶部１４０のデータを再利用できなくなる。このため、他の演算で「商品８」のデータを使用する場合、「商品８」のデータは、キャッシュ記憶部１４０に再ロードされる。例えば、コア１２１～１２４が担当するマトリクス３００の各行を、４行ずつブロックに区切る。この場合、各コアが、マトリクス３００の第１ブロックの演算後に、次の第２ブロックの演算に移る際、キャッシュ上に「商品ｚ」のデータが無い場合がある。その場合、「商品ｚ」のデータが再ロードされる。 When the data for "product 8" is deleted from the cache storage unit 140, the data in the cache storage unit 140 cannot be reused in other calculations. For this reason, when the data for "product 8" is used in other calculations, the data for "product 8" is reloaded into the cache storage unit 140. For example, each row of the matrix 300 handled by the cores 121 to 124 is divided into blocks of four rows. In this case, when each core moves on to the calculation of the next block, after calculating the first block of the matrix 300, there may be no data for "product z" in the cache. In that case, the data for "product z" is reloaded.

［第３の実施の形態］
次に、第３の実施の形態を説明する。前述の第２の実施の形態と相違する事項を主に説明し、共通する事項の説明を省略する。 [Third embodiment]
Next, a third embodiment will be described. The points that are different from the second embodiment described above will be mainly explained, and the explanations of the common points will be omitted.

第３の実施の形態のデータ処理装置１００は、第２の実施の形態よりもキャッシュメモリ１２５へのデータのロード回数を低減する機能を提供する。
図１５は、第３の実施の形態の組み合わせ計算の実行順序の例を示す図である。 The data processing device 100 according to the third embodiment provides a function that reduces the number of times data is loaded into the cache memory 125 compared to the second embodiment.
FIG. 15 is a diagram illustrating an example of the execution order of combination calculations according to the third embodiment.

マトリクス７００は、マトリクス２０２の一部を記載したものである。
第３の実施の形態では、コア１２１～１２４は、マトリクス７００における、ある行の演算を、ｚを第１の順序で用いて実行する。そして、コア１２１～１２４は、次の行の演算を、ｚを第１の順序とは逆の順序で用いて実行する。第１の順序は、例えばｚの降順である。第１の順序はｚの昇順でもよい。 Matrix 700 describes a portion of matrix 202.
In the third embodiment, cores 121 to 124 perform operations on a certain row in matrix 700 using z in a first order. Then, the cores 121 to 124 execute the operation in the next row using z in the reverse order from the first order. The first order is, for example, descending order of z. The first order may be ascending order of z.

マトリクス７００の各マスには、行に対応するデータＡｎｄ_{（ｘ，ｙ）}に対する組み合わせ計算の実行順序を示すステップ数が記載されている。コア１２１～１２４には、第２の実施の形態と同様の方法で、各行に対応するデータＡｎｄ_{（ｘ，ｙ）}が割り当てられる。例えば、コア１２１～１２４は、自身が担当するデータＡｎｄ_{（ｘ，ｙ）}について、相手データをｚの降順に用いて演算を実行すると、次に自身が担当するデータＡｎｄ_{（ｘ’，ｙ’）}について、相手データをｚの昇順に用いて演算を実行する。 Each cell of the matrix 700 contains a step number indicating the execution order of the combination calculation for the data And _{(x, y)} corresponding to the row. The data And _{(x, y)} corresponding to each row is assigned to the cores 121 to 124 in the same manner as in the second embodiment. For example, after each of the cores 121 to 124 executes the calculations for the data And _{(x, y)} it is responsible for using the partner data in descending order of z, it executes the calculations for the next data And _{(x', y')} it is responsible for using the partner data in ascending order of z.

より具体的には、Ａｎｄ_{（９，１０）}，Ａｎｄ_{（６，９）}がコア１２１に割り当てられる例では、コア１２１はＡｎｄ_{（９，１０）}に対しｚ＝…，３，２，１の順で演算を実行した後、Ａｎｄ_{（６，９）}に対しｚ＝１，２，３，…の順で演算を実行する。このように、第３の実施の形態では、コア１２１～１２４は、組み合わせの相手データを折り返し順で使用して演算を実行する。これにより、ある組合せの演算の際に、他の組み合わせの演算のためにキャッシュメモリ１２５にロード済の相手データが利用され易くなる。 More specifically, in an example in which And _{(9, 10)} and And _{(6, 9)} are assigned to the core 121, the core 121 executes the calculations for And _{(9, 10)} in the order z=..., 3, 2, 1, and then executes the calculations for And _{(6, 9)} in the order z=1, 2, 3, .... In this manner, in the third embodiment, the cores 121 to 124 execute calculations using the partner data of the combinations in a back-and-forth order. As a result, when one combination is calculated, partner data that was already loaded into the cache memory 125 for the calculation of another combination is more likely to be reused.
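
The back-and-forth use of partner data by one core can be sketched in Python as follows. The names `z_order` and `folded_schedule` are illustrative, and the partner range z < x for a row (x, y) is an assumption inferred from the examples in this description.

```python
def z_order(pair, descending):
    """Partner products z for row (x, y), assuming z runs over 1 .. x - 1."""
    x, _y = pair
    zs = list(range(1, x))
    return zs[::-1] if descending else zs


def folded_schedule(pairs):
    """Visit the rows of one core, reversing the z direction on every row."""
    schedule = []
    descending = True  # the first order is descending z in this example
    for pair in pairs:
        schedule.extend((z, pair) for z in z_order(pair, descending))
        descending = not descending
    return schedule


core_121_rows = [(9, 10), (6, 9)]  # rows assigned to the core 121
schedule = folded_schedule(core_121_rows)
print([z for z, _pair in schedule])
# [8, 7, 6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5]
```

When the core moves from And _{(9, 10)} to And _{(6, 9)}, the first columns it needs (z=1, 2, 3, ...) are the ones it used most recently, so they are the least likely to have been evicted.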

FIG. 16 is a diagram showing an example of the execution order of combination calculations when the number of products is 10.
Matrix 700a shows the execution order of all combination calculations by cores 121 to 124 when the number of products d_x = 10. It is assumed that core 121 is assigned to And _{(9, 10)}, core 122 is assigned to And _{(8, 10)}, core 123 is assigned to And _{(8, 9)}, and core 124 is assigned to And _{(7, 10)}. Thereafter, data is assigned to each core in a nested manner using the method of the second embodiment illustrated in FIG. 8.

FIG. 17 is a diagram showing an example of cache use according to the third embodiment.
Table 800 illustrates the storage state of data in the cache storage unit 140 for each step number (step) of matrix 700a. The (x, y) item in the table 800 indicates the data of And _{(x, y)}. The z item in the table 800 indicates the data corresponding to the "product z" column of the product purchase history data 200.

Cells surrounded by thick frames in the table 800 indicate data stored in the cache storage unit 140, that is, data on the cache. Note that the cache storage unit 140 is assumed to be able to store up to 10 pieces of data in each step. In other words, if the total number of pieces of data on the cache, counting the And _{(x, y)} data and the data corresponding to the "product z" columns, exceeds 10, the data for which the longest time has elapsed since its last use is deleted from the cache storage unit 140.
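
The replacement rule described here is a least-recently-used (LRU) policy. The following Python sketch models it under stated assumptions: the class name `LRUCache`, its `access` method, and the `loads` counter are hypothetical names introduced for illustration, and the model treats every piece of data as one slot, as the tables do.

```python
from collections import OrderedDict

class LRUCache:
    """Toy model of a cache that holds a fixed number of data items."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used
        self.loads = 0              # number of loads from main memory

    def access(self, key):
        """Use `key`. On a miss, load it and return the evicted key, if any."""
        if key in self.items:
            self.items.move_to_end(key)  # refresh the usage history
            return None
        self.loads += 1
        evicted = None
        if len(self.items) >= self.capacity:
            evicted, _ = self.items.popitem(last=False)  # drop the oldest item
        self.items[key] = True
        return evicted

cache = LRUCache(capacity=3)
evictions = [cache.access(k) for k in ["a", "b", "c", "a", "d"]]
print(evictions)    # [None, None, None, None, 'b']
print(cache.loads)  # 4
```

In the usage example, "a" is touched again before "d" arrives, so "b" is the item with the oldest usage history and is the one removed.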

Furthermore, cells hatched with light dots in the table 800 indicate data loaded from the data storage unit 130 to the cache storage unit 140, that is, data loaded from the RAM 102 to the cache memory 125.

Cells hatched with diagonal lines in the table 800 indicate data deleted from the cache storage unit 140 due to cache overflow. The deleted data is the data with the oldest usage history in the cache storage unit 140.

Cells hatched with dark dots in the table 800 indicate data used in the calculation of each step. The usage history of data used in a calculation is updated to the most recent in the cache storage unit 140.

Table 800 exemplifies the steps with step numbers 1 to 18 (step = 1 to 18) among the steps in matrix 700a.
FIG. 18 is a diagram showing an example of cache use (continued) according to the third embodiment.

Table 800a exemplifies the steps with step numbers 19 to 26 (step = 19 to 26) among the steps in matrix 700a.
FIG. 19 is a diagram showing an example of cache use (continued) according to the third embodiment.

Table 800b exemplifies the steps with step numbers 27 to 30 (step = 27 to 30) among the steps in matrix 700a. Note that the table 800b also exemplifies the step with step number 31 (step = 31), in which all data is finally deleted from the cache storage unit 140.

In the example of tables 800 to 800b, the number of loads from the RAM 102 to the cache memory 125 is 53 in total.
FIG. 20 is a flowchart showing an example of combination calculation.

The third embodiment differs from the second embodiment in that step S13a is executed in place of step S13 of the procedure in FIG. 10. Therefore, step S13a will be mainly explained below, and explanations of the other steps will be omitted. Step S13a is executed after step S12.

(S13a) Each of the cores 121 to 124 selects the combination And _{(x, y)} that it is responsible for, and selects the products z that form unselected combinations (x, y, z) in a folded order. Here, the method exemplified by the matrix 700 in FIG. 15 and the matrix 700a in FIG. 16 is used to select in the folded order. The process then proceeds to step S14.

In this way, the data processing device 100 executes the combination calculations using the partner data of the combinations in a folded order of z, which makes data loaded in the cache memory 125 more likely to be reused and reduces the number of loads into the cache memory 125. As a result, the data processing device 100 can reduce the overhead associated with loading into the cache memory 125, and can speed up the combination calculations.
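
The effect on the number of loads can be checked on a toy trace. The sketch below is not the scenario of tables 800 to 800b: it assumes a single core, two hypothetical rows "A" and "B" that each use partner data z = 1 to 6, an LRU cache of only 4 items, and the helper names `count_loads` and `access_sequence`, all of which are introduced here for illustration.

```python
from collections import OrderedDict

def count_loads(accesses, capacity):
    """Count misses of an LRU cache with `capacity` slots over an access trace."""
    cache = OrderedDict()
    loads = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)
        else:
            loads += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used item
            cache[key] = True
    return loads

def access_sequence(rows, folded):
    """Each calculation touches the row data and then one partner data z."""
    seq = []
    for i, (row, partners) in enumerate(rows):
        descending = (i % 2 == 0) if folded else True
        for z in sorted(partners, reverse=descending):
            seq.append(("row", row))
            seq.append(("z", z))
    return seq

rows = [("A", range(1, 7)), ("B", range(1, 7))]
monotone = count_loads(access_sequence(rows, folded=False), capacity=4)
folded = count_loads(access_sequence(rows, folded=True), capacity=4)
print(monotone, folded)  # 14 12
```

In this toy trace the folded order saves two loads, because row "B" begins with the partner data z = 1 and z = 2 that row "A" left in the cache.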

[Fourth embodiment]
Next, a fourth embodiment will be described. Items that differ from the second and third embodiments described above will be mainly described, and descriptions of common items will be omitted.

In the fourth embodiment, the data processing device 100 provides a function that further reduces the number of times data is loaded into the cache memory 125. The fourth embodiment is applied when k ≥ 3.

FIG. 21 is a diagram showing an example of a sector cache according to the fourth embodiment.
The cache storage unit 140 of the fourth embodiment has sectors 141 and 142. The sectors 141 and 142 are storage areas obtained by further dividing the entire storage area of the cache storage unit 140. The identification number of the sector 141 is "#0". The identification number of the sector 142 is "#1". A sector cache is a software-controllable cache mechanism. With a sector cache, the storage area of the cache memory 125 can be divided into a plurality of sectors in this way, and reusable data and non-reusable data can be kept in separate sectors.

Here, in the example of tables 800 to 800b illustrated in FIGS. 17 to 19, the "product z" data is more reusable than the And _{(x, y)} data. This is because, when k ≥ 3, And _{(x, y)} is never referenced again after the processing of its row of the matrix 700a is completed.

Therefore, the cores 121 to 124 store And _{(x, y)} and the "product z" data in different sectors. For example, the cores 121 to 124 store the And _{(x, y)} data in the sector 141. The cores 121 to 124 store the "product z" data in the sector 142. Which data is placed in which sector, for example placing the array of data a in the sector 141, can be specified in the source code of the program executed by the cores 121 to 124. The capacity ratio of the sectors 141 and 142 can be specified, for example, in units called WAY. The value obtained by multiplying the unit size s per WAY by the WAY value w of a sector is the size M (= w × s) of that sector. s and w are positive real numbers.

For example, the size of the sector 141 is set by the WAY value w1 corresponding to the smallest size M1 (= w1 × s) among the sizes (= w × s) given by WAY values w that exceed the data size for the number of cores (for example, the size of 4 pieces of data for 4 cores). w1 is a positive real number. The allocation unit 150 may determine, based on the data size, the WAY value to be allocated to the sector 141 in the storage area of the cache storage unit 140, and allocate the remaining WAY value to the sector 142.
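
The sizing rule for the sector 141 can be sketched in Python as follows. This is a sketch under assumptions, not the device's actual interface: the function name `sector0_ways` and all numeric values are hypothetical, and the WAY value is assumed to take whole-number values so that a smallest qualifying value exists, whereas the text only states that w is a positive real number.

```python
def sector0_ways(num_cores, data_size, way_size, total_ways):
    """Return (w1, remaining) where w1 is the smallest whole WAY count such that
    w1 * way_size is strictly larger than num_cores * data_size."""
    needed = num_cores * data_size     # data size for the number of cores
    w1 = needed // way_size + 1        # smallest w with w * way_size > needed
    if w1 >= total_ways:
        raise ValueError("no WAY would remain for the second sector")
    return w1, total_ways - w1

# Hypothetical figures: 4 cores, 1000-byte data, 1024 bytes per WAY, 10 WAYs.
w1, w2 = sector0_ways(num_cores=4, data_size=1000, way_size=1024, total_ways=10)
print(w1, w2)  # 4 6
```

Keeping the first sector as small as the rule allows leaves the largest possible share for the sector that holds the reusable "product z" data.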

FIG. 22 is a diagram showing an example of cache use according to the fourth embodiment.
Table 900 illustrates the storage state of data in the cache storage unit 140 for each step number (step) of the matrix 700a. The (x, y) item in the table 900 indicates the data of And _{(x, y)}. The z item in the table 900 indicates the data corresponding to the "product z" column of the product purchase history data 200.

The number of cores is 4. The number of products is d_x = 10. The storage area of the cache storage unit 140 is assumed to hold a maximum of 10 pieces of data. Furthermore, this storage area is assumed to be divided in a ratio of sector #0 : sector #1 = 4 : 6.

The fourth embodiment differs from the third embodiment in that the And _{(x, y)} data is placed in the sector 141 (sector #0) and the "product z" data is placed in the sector 142 (sector #1).

Cells surrounded by thick frames in the table 900 indicate data stored in the cache storage unit 140, that is, data on the cache. Note that the cache storage unit 140 is assumed to be able to store up to 10 pieces of data in each step.

Cells hatched with light dots in the table 900 indicate data loaded from the data storage unit 130 to the cache storage unit 140, that is, data loaded from the RAM 102 to the cache memory 125.

Cells hatched with diagonal lines in the table 900 indicate data deleted from the cache storage unit 140 due to cache overflow. Data is deleted sector by sector. The deleted data is the data with the oldest usage history within the relevant sector.

Cells hatched with dark dots in the table 900 indicate data used in the calculation of each step. The usage history of data used in a calculation is updated to the most recent within the relevant sector.

Table 900 exemplifies the steps with step numbers 1 to 19 (step = 1 to 19) among the steps in matrix 700a.
FIG. 23 is a diagram showing an example of cache use (continued) according to the fourth embodiment.

Table 900a exemplifies the steps with step numbers 20 to 28 (step = 20 to 28) among the steps in matrix 700a.
FIG. 24 is a diagram showing an example of cache use (continued) according to the fourth embodiment.

Table 900b exemplifies the steps with step numbers 29 and 30 (step = 29, 30) among the steps in matrix 700a.
FIG. 25 is a flowchart showing an example of combination calculation.

The fourth embodiment differs from the third embodiment in that step S10a is executed before step S10 of the procedure in FIG. 20. Therefore, step S10a will be mainly explained below, and explanations of the other steps will be omitted.

(S10a) The allocation unit 150 specifies the sector cache ratio, that is, the ratio of the sectors 141 and 142, and specifies that the data z be allocated to the sector 142 (sector #1). As described above, the allocation unit 150 determines the size of the sector 141 (sector #0) in the cache storage unit 140 based on the number of cores and the data size per column of the product purchase history data 200, and uses the remainder as the size of the sector 142 (sector #1). The process then proceeds to step S10.

In this way, by using the sector cache, the data processing device 100 can further reduce the number of loads into the cache memory 125. For example, in the tables 900 to 900b in FIGS. 22 to 24, the number of loads from the RAM 102 to the cache memory 125 is 44 in total.

In the example of the third embodiment shown in FIGS. 17 to 19, the number of loads is 53 in total. In the third embodiment there are cases, such as step = 22 and 23 in FIG. 18, in which highly reusable data (for example, the data of z = 1) is deleted from the cache and then loaded again. Such cases increase the number of loads.

On the other hand, by using the sector cache, the data processing device 100 can keep the "product z" data from being evicted from the cache when And _{(x, y)} data is loaded. For example, when And _{(x, y)} data is loaded, the And _{(x, y)} data for which the longest time has elapsed since its last use is evicted from the sector 141, and the new And _{(x, y)} data is stored in the sector 141. As a result, the "product z" data is retained longer in the sector 142 of the cache memory 125, and the number of loads from the RAM 102 to the cache memory 125 can be further reduced.
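
Per-sector eviction can be modeled with one LRU list per sector. The following Python sketch is a simplified software model, not the hardware mechanism: the class name `SectorCache`, its `access` method, and the access pattern in the usage example are assumptions made for illustration, while the 4 : 6 split follows the example given for table 900.

```python
from collections import OrderedDict

class SectorCache:
    """Toy cache whose sectors each apply LRU replacement independently."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)  # sector id -> number of slots
        self.sectors = {sid: OrderedDict() for sid in self.capacities}
        self.loads = 0

    def access(self, sector_id, key):
        """Use `key` in the given sector; return the key evicted from it, if any."""
        sector = self.sectors[sector_id]
        if key in sector:
            sector.move_to_end(key)
            return None
        self.loads += 1
        evicted = None
        if len(sector) >= self.capacities[sector_id]:
            evicted, _ = sector.popitem(last=False)  # oldest item in this sector only
        sector[key] = True
        return evicted

cache = SectorCache({0: 4, 1: 6})  # sector #0 : sector #1 = 4 : 6
for z in range(1, 7):
    cache.access(1, z)             # "product z" data goes to sector #1
pairs = [(9, 10), (8, 10), (8, 9), (7, 10), (7, 9)]
evicted = [cache.access(0, p) for p in pairs]  # And data goes to sector #0
print(evicted[-1])                 # (9, 10)
print(sorted(cache.sectors[1]))    # [1, 2, 3, 4, 5, 6]
```

Loading a fifth And item displaces only the oldest And item, so all six "product z" items stay resident. With a single shared LRU list of 10 slots, the same load would instead have displaced z = 1.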

In this way, by using the sector cache, the fourth embodiment can reduce the number of loads below that of the third embodiment. As a result, the data processing device 100 can further reduce the overhead associated with loading into the cache memory 125, and can further speed up the combination calculations.

Incidentally, when pattern mining processing is executed with multiple threads, the processing time may become long because of variations in processing load among threads (cores) and inefficient use of the cache memory 125.

For example, as a countermeasure against variations in processing load among cores, the OS scheduler could constantly monitor the processing load of the cores and allocate processing to lightly loaded cores. However, such a method requires a program for constant monitoring. Furthermore, the OS scheduler merely distributes processing, such as programs other than the pattern mining processing, among the cores.

In contrast, the data processing device 100 does not use a scheduler to allocate threads to cores, but determines the allocation before executing the combination calculation processing. This makes implementation easy. Further, the method of the data processing device 100 is more useful for leveling the processing amount of each core the closer the matrix data to be handled is to a triangular matrix. That is, the closer the matrix data is to a triangular matrix, the more nearly equal the loads of the cores become. Furthermore, by making data loaded into the cache memory 125 more likely to be reused, the data processing device 100 can reduce the number of data loads from the RAM 102 to the cache memory 125 and shorten the overall processing time of the combination calculations.

Next, an example in which the processing procedures of the second to fourth embodiments are generalized will be described.
FIG. 26 is a flowchart generalizing the combination calculation of the second embodiment.
(S20) The calculation control unit 160 determines whether the number n of products per combination to be calculated is n = 2. Here, n is an integer of 2 or more. If n ≠ 2, the process proceeds to step S21. If n = 2, the process proceeds to step S22.

(S21) The calculation control unit 160 performs the combination calculation for k = n − 1 using the cores 121 to 124, and outputs the logical product And _{(x, ...)} of each combination as a list A. The process then proceeds to step S23.

(S22) The calculation control unit 160 sets the partner data d as the list A. The list A is then a list of the partner data d. The process then proceeds to step S23.
(S23) The allocation unit 150 allocates each piece of data in the list A, that is, each logical product And _{(x, ...)}, to the cores in a nested manner. The allocation in step S23 uses the method illustrated in FIG. 8. (x, ...) denotes a combination of n − 1 products.

(S24) The calculation control unit 160 repeatedly executes, using the cores 121 to 124, the combination calculation for any n products shown in steps S25 to S27 below. The total number of combination calculations is _{d_x}C_n, that is, the number of ways to choose n of the d_x products.

(S25) Each of the cores 121 to 124 selects the combination And _{(x, ...)} that the core is responsible for, and selects the data of a product d (that is, partner data d) that forms an unselected combination (x, ..., d). (x, ..., d) denotes a combination of n products. Each of the cores 121 to 124 calculates the logical product And _{(x, ..., d)} of the n products x, ..., d.

(S26) Each of the cores 121 to 124 counts, from the logical product And _{(x, ..., d)}, the number of people Sum _{(x, ..., d)} who purchased the n products x, ..., d.
(S27) Each of the cores 121 to 124 adds Sum _{(x, ..., d)} to a list B.

(S28) When the calculation control unit 160 finishes the calculations shown in steps S25 to S27 for all combinations of any n products, it advances the process to step S29.
(S29) The calculation control unit 160 outputs the list B. The list B is stored in the data storage unit 130. The list B is a list of the total number of purchasers for each product combination (x, ..., d). Then, the combination calculation process ends.

FIG. 27 is a flowchart generalizing the combination calculation of the third embodiment.
The procedure in FIG. 27 differs from the procedure in FIG. 26 in that step S24a is executed after step S24 in FIG. 26. Therefore, step S24a will be mainly explained below, and explanations of the other steps will be omitted.

(S24a) Each of the cores 121 to 124 selects the combination And _{(x, ...)} that it is responsible for, and selects the products d that form unselected combinations (x, ..., d) in a folded order. Here, the method exemplified by the matrix 700 in FIG. 15 and the matrix 700a in FIG. 16 is used to select in the folded order. The process then proceeds to step S25.

FIG. 28 is a flowchart generalizing the combination calculation of the fourth embodiment.
The procedure in FIG. 28 applies when n ≥ 3. It differs from the procedure in FIG. 27 in that step S20a is executed in place of step S20 and in that step S22 is not executed. Therefore, step S20a will be mainly explained below, and explanations of the other steps will be omitted. Note that since n ≥ 3 in the procedure of FIG. 28, k in step S21 satisfies k ≥ 2.

(S20a) The allocation unit 150 specifies the sector cache ratio, that is, the ratio of the sectors 141 and 142, and specifies that the data z be allocated to the sector 142 (sector #1). As described above, the allocation unit 150 determines the size of the sector 141 (sector #0) in the cache storage unit 140 based on the number of cores and the data size per column of the product purchase history data 200, and uses the remainder as the size of the sector 142 (sector #1). The process then proceeds to step S21.

In this way, the procedures of the second to fourth embodiments can be generalized.
As described in the second to fourth embodiments, the data processing device 100 executes the following processing.

From 2N (N is an integer of 2 or more) pieces of data used for calculations in combination with each of a plurality of partner data, the allocation unit 150 identifies the top N first data and the bottom N second data in the result of sorting by the number of partner data to be calculated. The allocation unit 150 allocates the top N first data, in descending order of the number of partner data to be calculated, to the N calculation units from the first calculation unit to the N-th calculation unit, one to each. The allocation unit 150 allocates the bottom N second data, in ascending order of the number of partner data to be calculated, to the N calculation units from the first calculation unit to the N-th calculation unit, one to each. Based on the result of allocating the 2N pieces of data to the N calculation units, the calculation control unit 160 causes the N calculation units to execute the calculations on N of the 2N pieces of data in parallel.
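
The allocation rule can be sketched in Python as follows. The function name `allocate_nested`, the data identifiers, and the partner counts are hypothetical and chosen only to make the pairing visible; calculation units are numbered from 0 in the sketch.

```python
def allocate_nested(partner_counts, num_units):
    """partner_counts maps a data id to its number of partner data.

    Returns a dict: unit index -> [first data, second data] for that unit.
    """
    if len(partner_counts) != 2 * num_units:
        raise ValueError("exactly 2N pieces of data are expected")
    ranked = sorted(partner_counts, key=partner_counts.get, reverse=True)
    top = ranked[:num_units]                  # top N, in descending order
    bottom_ascending = ranked[num_units:][::-1]  # bottom N, in ascending order
    return {u: [top[u], bottom_ascending[u]] for u in range(num_units)}

# Hypothetical triangular workload: 8 pieces of data with 8, 7, ..., 1 partners.
counts = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4, "f": 3, "g": 2, "h": 1}
plan = allocate_nested(counts, num_units=4)
loads = {u: sum(counts[d] for d in data) for u, data in plan.items()}
print(plan[0])  # ['a', 'h']
print(loads)    # {0: 9, 1: 9, 2: 9, 3: 9}
```

Each unit receives the i-th largest and the i-th smallest item, so for a triangular workload like this one every unit ends up with the same total number of calculations.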

Thereby, the data processing device 100 can reduce variations in the processing amount of the plurality of calculation units (N calculation units). The cores 121 to 124 are an example of four calculation units. Note that, as illustrated in FIGS. 5 and 6, the closer the data to be handled is to a triangular matrix, the more the data processing device 100 can reduce the variation in the processing amount of the plurality of calculation units. In the example of the matrix 201 in FIG. 5, the x data corresponding to a column is an example of partner data for the y data corresponding to a row. In the example of the matrix 202 in FIG. 6, the z data corresponding to a column is an example of partner data for the And _{(x, y)} data corresponding to a row.

Furthermore, in executing the calculation on each of the top N first data, the N calculation units may select the partner data to be used for the calculation from among the plurality of partner data in a first order. In executing the calculation on each of the bottom N second data, the N calculation units may select the partner data to be used for the calculation from among the plurality of partner data in the order opposite to the first order.

Thereby, the data processing device 100 can reduce the number of times partner data is loaded into the cache memory 125 shared by the N calculation units. As a result, the data processing device 100 can reduce the overhead associated with loading into the cache memory 125, and can speed up the calculations.

Furthermore, the allocation unit 150 may divide the storage area of the cache memory 125 accessed by the N calculation units into a first storage area and a second storage area. The N calculation units may then load the data to be calculated among the 2N pieces of data into the first storage area, and load the partner data to be calculated among the plurality of partner data into the second storage area.

Thereby, the data processing device 100 can keep highly reusable partner data from being evicted from the cache memory 125, and can further reduce the number of times partner data is loaded into the cache memory 125. The sector 141 is an example of the first storage area. The sector 142 is an example of the second storage area.

For example, the allocation unit 150 determines the size of the first storage area based on the value obtained by multiplying a first size, which is the size of each of the 2N pieces of data, by N. Thereby, the data processing device 100 can appropriately determine the size of the first storage area, and can efficiently reduce the number of times partner data is loaded into the cache memory 125. For example, for the number N of calculation units, the data processing device 100 sets the size of the first storage area to the minimum necessary within the usable size of the storage area of the cache memory 125, and uses the remainder as the size of the second storage area. In this way, the second storage area can be made relatively large, which increases the amount of partner data that can be held. Therefore, the data processing device 100 can further reduce the number of times partner data is loaded into the cache memory 125.

Note that the information processing of the first embodiment can be realized by causing the processing unit 12 to execute a program. The information processing of the second embodiment can be realized by causing the CPU 101 to execute a program. The program can be recorded on a computer-readable recording medium 113.

For example, the program can be distributed by distributing the recording medium 113 on which the program is recorded. Alternatively, the program may be stored in another computer and distributed via a network. The computer may, for example, store (install) the program recorded on the recording medium 113 or a program received from another computer in a storage device such as the RAM 102 or the HDD 103, and read and execute the program from that storage device.

10 Data processing device
11 Storage unit
11a Table
12 Processing unit
12a, 12b Calculation unit
20 Execution example

Claims

1. A data processing program that causes a computer to execute processing comprising:
identifying, from 2N (N is an integer of 2 or more) pieces of data used for calculations in combination with each of a plurality of partner data, the top N first data and the bottom N second data in a result of sorting by the number of the partner data to be calculated;
allocating the top N first data, in descending order of the number of the partner data to be calculated, to N calculation units from a first calculation unit to an N-th calculation unit, one to each, and allocating the bottom N second data, in ascending order of the number of the partner data to be calculated, to the N calculation units from the first calculation unit to the N-th calculation unit, one to each; and
executing, based on the result of allocating the 2N pieces of data to the N calculation units, the calculations on N of the 2N pieces of data in parallel by the N calculation units.

2. The data processing program according to claim 1, wherein
in executing the calculation on each of the top N pieces of first data, the partner data to be used for the calculation is selected from among the plurality of pieces of partner data in a first order, and
in executing the calculation on each of the bottom N pieces of second data, the partner data to be used for the calculation is selected from among the plurality of pieces of partner data in an order opposite to the first order.

3. The data processing program according to claim 2, wherein the storage area of a cache memory accessed by the N calculation units is divided into a first storage area and a second storage area, the data on which the calculation is to be executed, among the 2N pieces of data, is loaded into the first storage area, and the partner data that is the target of the calculation, among the plurality of pieces of partner data, is loaded into the second storage area.

4. The data processing program according to claim 3, wherein the size of the first storage area is determined based on a value obtained by multiplying a first size, which is the size of each of the 2N pieces of data, by N.

5. A data processing method in which a computer:
identifies, from 2N (N is an integer of 2 or more) pieces of data used for calculations in combination with each of a plurality of pieces of partner data, the top N pieces of first data and the bottom N pieces of second data in a result of sorting by the number of pieces of partner data to be calculated;
allocates each of the top N pieces of first data to each of N calculation units, from the first calculation unit to the N-th calculation unit, in descending order of the number of pieces of partner data to be calculated, and allocates each of the bottom N pieces of second data to each of the N calculation units, from the first calculation unit to the N-th calculation unit, in ascending order of the number of pieces of partner data to be calculated; and
executes, by the N calculation units in parallel and based on the result of allocating the 2N pieces of data to the N calculation units, the calculation on N pieces of data out of the 2N pieces of data.

6. A data processing device comprising:
a storage unit that stores 2N (N is an integer of 2 or more) pieces of data used for calculations in combination with each of a plurality of pieces of partner data; and
a processing unit that identifies, from the 2N pieces of data, the top N pieces of first data and the bottom N pieces of second data in a result of sorting by the number of pieces of partner data to be calculated, allocates each of the top N pieces of first data to each of N calculation units, from the first calculation unit to the N-th calculation unit, in descending order of the number of pieces of partner data to be calculated, allocates each of the bottom N pieces of second data to each of the N calculation units, from the first calculation unit to the N-th calculation unit, in ascending order of the number of pieces of partner data to be calculated, and executes, by the N calculation units in parallel and based on the result of allocating the 2N pieces of data to the N calculation units, the calculation on N pieces of data out of the 2N pieces of data.
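
For illustration only, and not as part of the claims, the allocation recited in the independent claims above can be sketched as follows. This is a minimal sketch under stated assumptions: the data are identified by string keys, the number of pieces of partner data for each piece of data is already known, and ties are left in input order. The function name `allocate` and the sample counts are hypothetical.

```python
def allocate(partner_counts: dict[str, int], n_units: int) -> list[tuple[str, str]]:
    """Pair 2N pieces of data onto N calculation units.

    Element i of the result is the (first data, second data) pair
    assigned to calculation unit i + 1.
    """
    if len(partner_counts) != 2 * n_units:
        raise ValueError("expected exactly 2N pieces of data")
    # Sort by the number of pieces of partner data, largest first.
    ranked = sorted(partner_counts, key=partner_counts.get, reverse=True)
    top = ranked[:n_units]              # top N, already in descending order
    bottom = ranked[n_units:][::-1]     # bottom N, reversed into ascending order
    # Unit 1 gets the heaviest and the lightest piece of data, and so on.
    return list(zip(top, bottom))


# Hypothetical partner-data counts for 2N = 4 pieces of data and N = 2 units.
counts = {"a": 7, "b": 5, "c": 3, "d": 1}
pairs = allocate(counts, 2)
loads = [counts[x] + counts[y] for x, y in pairs]
```

With these sample counts, each calculation unit ends up with the same total number of partner-data combinations, which is the kind of balancing the allocation is intended to achieve.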