CN110554988A - high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation - Google Patents
- Publication number
- CN110554988A (application CN201810560114.XA)
- Authority
- CN
- China
- Prior art keywords
- cpu
- gpu
- target
- data
- domination
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multi Processors (AREA)
Abstract
The invention discloses a high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation, belonging to the fields of operations research optimization, multi-objective optimization, and high-performance computing. The method is designed around three aspects: the task-partitioning scheme, the memory type used for data, and the amount of data exchanged between the CPU and the GPU, improving both the computational efficiency and the practicality of the dominance algorithm. The method comprises five core parts: optimal solution decision, unbounded solution processing, basis variable processing, normalization, and elimination. Compared with the serial high-dimensional multi-target LWM dominance method, the proposed method overcomes the serial method's low computational efficiency while improving the efficiency and practicality of the dominance algorithm.
Description
Technical Field
The invention relates to a high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation, belonging to the fields of operations research optimization, multi-objective optimization, and high-performance computing.
Background
For high-dimensional multi-objective optimization problems, as the number of objectives increases, most of the solution set becomes non-dominated under the Pareto dominance relation, leading to a loss of selection pressure. Determining whether a solution is LWM non-dominated amounts to solving a linear programming problem: the solution is LWM non-dominated if that linear program has an unbounded solution, or an optimal value greater than some small positive number. However, as the number of objectives and the size of the candidate solution set grow, solving these linear programs becomes very time-consuming. To make the LWM dominance relation practical for high-dimensional multi-objective optimization, a parallel LWM dominance algorithm based on CPU + GPU heterogeneous computation is proposed, which fully exploits the parallel processing capability of the GPU to improve computational efficiency.
Disclosure of Invention
The invention aims to provide a high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation. The method is designed around three aspects: the task-partitioning scheme, the memory type used for data, and the amount of data exchanged between the CPU and the GPU, improving both the computational efficiency and the practicality of the dominance algorithm.
To achieve this aim, the scheme adopted by the invention is a high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation, applying heterogeneous CPU + GPU computation to solving high-dimensional multi-objective optimization problems. The method comprises five core parts: optimal solution decision, unbounded solution processing, basis variable processing, normalization, and elimination:
First, optimal solution decision: an optimal-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy;
Second, unbounded solution processing: an unbounded-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy;
Third, basis variable processing: the smallest index associated with the basis variable is obtained with a reduction strategy;
Fourth, normalization: the row of data corresponding to the basis variable is normalized;
Fifth, elimination: Gaussian elimination is performed so that the algorithm approaches the optimal solution.
Each of these five parts is described in detail below.
First, optimal solution decision: an optimal-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy.
In terms of task partitioning, the decision logic is converted into a maximum-value computation and carried out with a reduction strategy. Thread allocation per Block is one-dimensional, with 256 threads. In terms of data transfer and storage, the input data volume is n × (m + n), where m is the number of objectives and n is the size of the solution set. Because the transferred data volume is large, the data is stored in global memory.
Second, unbounded solution processing: an unbounded-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy.
In terms of task partitioning, the serial time complexity of this stage is O(n + m); it can be converted into a maximum-value computation and carried out with a reduction strategy. Thread allocation per Block is one-dimensional, with 256 threads. In terms of data transfer and storage, the input data volume is n × (m + n), where m is the number of objectives and n is the size of the solution set. Because the transferred data volume is large, the data is stored in global memory.
Third, basis variable processing: the smallest index associated with the basis variable is obtained with a reduction strategy.
In terms of task partitioning, the serial time complexity of this stage is O(n + m), and the computation is carried out with a reduction strategy. Exploiting the weak correlation between the matrix elements involved, thread allocation per Block is one-dimensional with 256 threads, and the theoretical time complexity of the parallel computation is O(1). In terms of data transfer and storage, the input data volume is n × (m + n), where m is the number of objectives and n is the size of the solution set. Because the transferred data volume is large, the data is stored in global memory.
Fourth, normalization: the row of data corresponding to the basis variable is normalized.
The serial time complexity of the normalization stage is O(n + m). Exploiting the weak correlation between the matrix elements involved, each Thread in a Block processes one element, the optimal number of threads per Block is 256, and the theoretical time complexity of the parallel computation is O(1). Since the data volume to be processed is n + m, shared memory is used for storage to ensure efficiency.
Fifth, elimination: Gaussian elimination is performed so that the algorithm approaches the optimal solution.
The serial time complexity of the elimination stage is O(n × (n + m)). No data sharing is needed between Blocks; within a Block a two-dimensional (16, 16) thread allocation is reasonable, and the theoretical time complexity of the parallel computation is O(n + m). The intermediate data volume required by each Block is (m + n), which is stored in shared memory to ensure efficiency.
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention.
Detailed Description
The invention aims to provide a high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation. The method is designed around three aspects: the task-partitioning scheme, the memory type used for data, and the amount of data exchanged between the CPU and the GPU, improving both the computational efficiency and the practicality of the dominance algorithm. The method comprises five core parts: optimal solution decision, unbounded solution processing, basis variable processing, normalization, and elimination:
First, optimal solution decision: an optimal-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy;
Second, unbounded solution processing: an unbounded-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy;
Third, basis variable processing: the smallest index associated with the basis variable is obtained with a reduction strategy;
Fourth, normalization: the row of data corresponding to the basis variable is normalized;
Fifth, elimination: Gaussian elimination is performed so that the algorithm approaches the optimal solution.
Specifically, the implementation follows the flowchart of FIG. 1, in which the left side shows task allocation on the CPU side and the right side shows task allocation on the GPU side. The overall task is divided according to the five steps above: the logic-decision parts are assigned to the CPU, and the compute-intensive parts execute on the GPU. To avoid transferring large amounts of data, the computation tasks of the optimal solution decision, unbounded solution processing, and basis variable processing stages are placed on the GPU, while the corresponding logic decisions remain on the CPU. Normalization and elimination are the two most computation-heavy stages of the method, and such compute-intensive tasks are well suited to parallel execution on the GPU, so they are also assigned to the GPU. The data transfers and workload partitioning between the CPU and the GPU are detailed in FIG. 1. Only a single flag value is used to decide whether the linear program has an optimal or an unbounded solution; the full data is never transferred for this decision.
First, optimal solution decision: if every coefficient of the non-basic variables is no greater than 0, an optimal solution must exist. In the parallel algorithm, the maximum coefficient among the non-basic variables is computed on the GPU, and the CPU only checks whether that maximum exceeds 0. Likewise, only the index of the basis variable is transferred, so the whole data set never moves between CPU and GPU. The maximum is computed on the GPU using the reduction strategy of CUDA programming.
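The optimality test above can be sketched serially as follows. This is an illustrative reference only: the tableau layout (objective row last, right-hand side in the last column) and the function name are assumptions, not details given in the patent, and the `max` here stands in for the GPU-side parallel reduction whose scalar result is shipped back to the CPU.

```python
import numpy as np

def is_optimal(tableau, basis):
    """Optimality test for a maximization simplex tableau: an optimum is
    reached when no non-basic variable has a positive coefficient in the
    objective row. The max is what the patent computes on the GPU with a
    reduction; only that scalar crosses back to the CPU."""
    n_cols = tableau.shape[1] - 1                 # last column holds the RHS
    nonbasic = [j for j in range(n_cols) if j not in basis]
    return tableau[-1, nonbasic].max() <= 0
```

If the returned flag is true, the CPU stops iterating; otherwise the algorithm proceeds to the unbounded-solution check.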
Second, unbounded solution processing: if the current iteration has no optimal solution, the index of the entering variable is obtained on the CPU and transferred to the GPU. If the column corresponding to that index contains no value greater than 0, an unbounded solution exists. Performing this check on the CPU would require transferring the GPU's intermediate results to the CPU, which is time-consuming. Instead, as with the optimal-solution test, the check is done on the GPU: the maximum of the column values corresponding to the entering variable is computed, transferred to the CPU, and compared against 0 there. This both avoids the time cost of transferring the data and lets the maximum be computed across multiple threads with the CUDA Reduce strategy. GPU-side thread allocation is one-dimensional; experiments showed 256 threads per Block to be optimal.
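The unboundedness criterion can likewise be sketched serially. As before, the tableau layout and the function name are illustrative assumptions; the column maximum is the single scalar that, in the patent's design, a GPU reduction produces and sends to the CPU for comparison.

```python
import numpy as np

def is_unbounded(tableau, entering):
    """Unboundedness test: after the entering variable is chosen, the LP
    is unbounded when its column has no positive entry in the constraint
    rows, i.e. nothing limits the increase of the entering variable."""
    return tableau[:-1, entering].max() <= 0  # exclude the objective row
```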
Third, basis variable processing: to obtain the basis variable, the smallest index associated with it must be found. Doing this purely on the CPU would require transferring the GPU's intermediate data, reducing the efficiency of the whole algorithm. Instead, the minimum is obtained on the GPU with the Reduce strategy and only the resulting index is transferred to the CPU. GPU-side thread allocation is again one-dimensional, with 256 threads per Block.
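In the standard simplex method this selection is the minimum-ratio test with smallest-index tie-breaking; the patent does not spell out the criterion, so the sketch below assumes that standard rule. The serial loop stands in for the GPU Reduce that returns only the winning row index.

```python
import numpy as np

def leaving_row(tableau, entering):
    """Minimum-ratio test sketch: among constraint rows whose entry in
    the entering column is positive, choose the row minimizing
    RHS / entry, breaking ties by the smallest row index. Returns -1
    when no row qualifies (the unbounded case)."""
    col = tableau[:-1, entering]
    rhs = tableau[:-1, -1]
    best_ratio, best_row = np.inf, -1
    for i in range(len(col)):
        if col[i] > 0:
            ratio = rhs[i] / col[i]
            if ratio < best_ratio:   # strict '<' keeps the smallest index on ties
                best_ratio, best_row = ratio, i
    return best_row
```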
Fourth, normalization: before the algorithm starts to iterate, Gaussian elimination is applied across the entire matrix. In our implementation we first normalize the row of data corresponding to the basis variable. To speed up data access, that row is stored in GPU shared memory, and each thread processes one element. GPU-side thread allocation is one-dimensional, with 256 threads per Block.
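The normalization step amounts to dividing the pivot row by the pivot element. A serial sketch (function name illustrative) is shown below; the single vector operation stands in for the patent's thread-per-element mapping over the n + m row entries held in shared memory.

```python
import numpy as np

def normalize_pivot_row(tableau, row, entering):
    """Scale the pivot row in place so the entering variable's
    coefficient becomes 1; every other entry of the row is divided
    by the same pivot element."""
    tableau[row, :] /= tableau[row, entering]
    return tableau
```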
Fifth, elimination: after normalization, the row of data corresponding to the basis variable is kept in shared memory and the threads are reallocated. Matching the two-dimensional structure of the matrix, and to make full use of the GPU's computing efficiency, each Block is given a two-dimensional (16, 16) thread layout so that each thread handles one element of its corresponding sub-matrix, effectively utilizing GPU computing resources.
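The elimination stage can be sketched as the classic pivot update: subtract the right multiple of the (already normalized) pivot row from every other row so the entering column becomes a unit vector. Each updated element depends only on its own row, its column's pivot-row entry, and the row's entering-column entry, which is why the patent can map elements one-to-one onto a (16, 16) thread grid. The serial loop below is an illustrative stand-in for that grid.

```python
import numpy as np

def eliminate(tableau, pivot_row, entering):
    """Gaussian elimination of the pivot column: zero the entering
    column in every row except the pivot row. Assumes the pivot row
    was normalized first, so its entering-column entry is 1."""
    for i in range(tableau.shape[0]):
        if i != pivot_row:
            tableau[i, :] -= tableau[i, entering] * tableau[pivot_row, :]
    return tableau
```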
Claims (6)
1. A high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation, characterized in that: for high-dimensional multi-objective optimization problems, as the number of objectives increases, most of the solution set becomes non-dominated under the Pareto dominance algorithm, causing a loss of selection pressure. A high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation can address both the selection-pressure problem and the time efficiency of the algorithm, and plays an important role in multi-objective planning; the heterogeneous-computation dominance method comprises five core parts: optimal solution decision, unbounded solution processing, basis variable processing, normalization, and elimination:
First, optimal solution decision: an optimal-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy;
Second, unbounded solution processing: an unbounded-solution test is performed on the linear programming problem, converting the CPU-side decision logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy;
Third, basis variable processing: the smallest index associated with the basis variable is obtained with a reduction strategy;
Fourth, normalization: the row of data corresponding to the basis variable is normalized;
Fifth, elimination: Gaussian elimination is performed so that the algorithm approaches the optimal solution.
2. The high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the decision logic is converted into a maximum-value computation and carried out with a reduction strategy. Thread allocation per Block is one-dimensional with 256 threads, and the data is stored in global memory.
3. The high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the decision logic is converted into a maximum-value computation and carried out with a reduction strategy. Thread allocation per Block is one-dimensional with 256 threads, and the theoretical time complexity of the parallel computation is O(1). The input data volume is n × (m + n), where m is the number of objectives and n is the size of the solution set. The data is stored in global memory.
4. The high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the time complexity of the basis variable stage is O(n + m), and the GPU performs the computation with a reduction strategy. Exploiting the weak correlation between the matrix elements involved, each Thread in a Block processes one element, the optimal number of threads per Block is 256, and the parallel computation complexity is O(1). Since the input data is large, it is stored in the GPU's global memory.
5. The high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the serial time complexity of the normalization stage is O(n + m). Exploiting the weak correlation between the matrix elements involved, each Thread in a Block processes one element, the optimal number of threads per Block is 256, and the theoretical time complexity of the parallel computation is O(1). Since the data volume to be processed is n + m, shared memory is used for storage to ensure efficiency.
6. The high-dimensional multi-target dominance method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the serial time complexity of the elimination stage is O(n × (n + m)). No data sharing is needed between Blocks; within a Block a two-dimensional (16, 16) thread allocation is reasonable, and the theoretical time complexity of the parallel computation is O(n + m). The intermediate data volume required by each Block is (m + n), which is stored in shared memory to ensure efficiency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810560114.XA CN110554988A (en) | 2018-06-03 | 2018-06-03 | high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810560114.XA CN110554988A (en) | 2018-06-03 | 2018-06-03 | high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110554988A true CN110554988A (en) | 2019-12-10 |
Family
ID=68735373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810560114.XA Pending CN110554988A (en) | 2018-06-03 | 2018-06-03 | high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110554988A (en) |
- 2018-06-03: CN application CN201810560114.XA published as CN110554988A; status: active, Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111651273A (en) * | 2020-05-29 | 2020-09-11 | 中国人民解放军国防科技大学 | GPU-based large-capacity short burst signal receiver design |
CN111651273B (en) * | 2020-05-29 | 2023-05-05 | 中国人民解放军国防科技大学 | High-capacity short burst signal receiver design based on GPU |
CN111970112A (en) * | 2020-08-10 | 2020-11-20 | 山东大学 | Ethereum deployment method and system based on ZYNQ heterogeneous computing platform |
CN111970112B (en) * | 2020-08-10 | 2022-01-21 | 山东大学 | Ethereum deployment method and system based on ZYNQ heterogeneous computing platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106970896B (en) | Vector processor-oriented vectorization implementation method for two-dimensional matrix convolution | |
CN109543830B (en) | Splitting accumulator for convolutional neural network accelerator | |
CN103984527B (en) | Optimization Sparse Matrix-Vector multiplies the method for lifting incompressible pipe flow field simulation efficiency | |
US9886418B2 (en) | Matrix operands for linear algebra operations | |
CN108805266A (en) | A kind of restructural CNN high concurrents convolution accelerator | |
KR20190091858A (en) | Heterogenous Processor Architecture to Integrate CNN and RNN Neural Networks on a Single Chip | |
CN111898733B (en) | Deep separable convolutional neural network accelerator architecture | |
CN110390383A (en) | A kind of deep neural network hardware accelerator based on power exponent quantization | |
CN109840154A (en) | A kind of computation migration method that task based access control relies under mobile cloud environment | |
CN111858465B (en) | Large-scale matrix QR decomposition parallel computing system | |
CN110554988A (en) | high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation | |
CN112257844B (en) | Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof | |
CN102110079A (en) | Tuning calculation method of distributed conjugate gradient method based on MPI | |
CN105373845A (en) | Hybrid intelligent scheduling optimization method of manufacturing enterprise workshop | |
US20240119114A1 (en) | Matrix Multiplier and Matrix Multiplier Control Method | |
CN109993293A (en) | A kind of deep learning accelerator suitable for stack hourglass network | |
CN105229608A (en) | Based on the database processing towards array of coprocessor | |
CN109740619B (en) | Neural network terminal operation method and device for target recognition | |
CN113313252B (en) | Depth separable convolution implementation method based on pulse array | |
CN107133332A (en) | The distribution method and device of a kind of query task | |
CN104933110B (en) | A kind of data prefetching method based on MapReduce | |
CN106648901A (en) | Multichannel signal correlation analyzing method and system | |
US11886347B2 (en) | Large-scale data processing computer architecture | |
CN108984470A (en) | A kind of FPGA mine machine calculates the lifting system and method for power | |
CN114546652A (en) | Parameter estimation method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20191210 |