CN110554988A - High-dimensional multi-target domination method based on CPU + GPU heterogeneous computation

High-dimensional multi-target domination method based on CPU + GPU heterogeneous computation

Info

Publication number
CN110554988A
CN110554988A (application CN201810560114.XA)
Authority
CN
China
Prior art keywords
cpu
gpu
target
data
domination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810560114.XA
Other languages
Chinese (zh)
Inventor
李征 (Li Zheng)
雒文启 (Luo Wenqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN201810560114.XA
Publication of CN110554988A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation, belonging to the fields of operations research optimization, multi-objective optimization, and high-performance computing. The method is designed around three aspects: the task partitioning scheme, the memory type used for data, and the amount of data exchanged between the CPU and the GPU, so as to improve the computational efficiency and practicality of the domination algorithm. The method comprises five core parts: optimal solution decision, unbounded solution processing, basic variable processing, normalization, and elimination. Compared with the serial high-dimensional multi-target LWM domination method, this method overcomes the serial method's low computational efficiency while improving both the efficiency and the practicality of the domination algorithm.

Description

High-dimensional multi-target domination method based on CPU + GPU heterogeneous computation
Technical Field
The invention relates to a high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation, belonging to the fields of operations research optimization, multi-objective optimization, and high-performance computing.
Background
For the high-dimensional multi-objective optimization problem, as the number of objectives increases, most of the solution set becomes non-dominated under the Pareto dominance relation, which leads to a loss of selection pressure. Solving for an LWM non-dominated solution is a process of solving a linear programming problem: a solution is LWM non-dominated if the linear programming problem has an unbounded solution, or has an optimal solution whose value is greater than a small positive threshold. However, as the number of optimization objectives and the size of the candidate solution set increase, solving these linear programming problems becomes very time-consuming. To make the LWM domination relation usable in practical high-dimensional multi-objective optimization problems, a parallel LWM domination algorithm based on CPU + GPU heterogeneous computation is proposed, which fully exploits the parallel processing capability of the GPU to improve computational efficiency.
Disclosure of Invention
The invention aims to provide a high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation. The method is designed around three aspects: the task partitioning scheme, the memory type used for data, and the amount of data exchanged between the CPU and the GPU, so as to improve the computational efficiency and practicality of the domination algorithm.
To achieve this aim, the scheme adopted by the invention is a high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation, in which CPU + GPU heterogeneous computation is applied to solving the high-dimensional multi-objective optimization problem. The method comprises five core parts: optimal solution decision, unbounded solution processing, basic variable processing, normalization, and elimination:
First, optimal solution decision: judge whether the linear programming problem has an optimal solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy;
Second, unbounded solution processing: judge whether the linear programming problem has an unbounded solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy;
Third, basic variable processing: obtain the minimum index corresponding to the pivot basic variable using a reduction strategy;
Fourth, normalization: normalize the row of data corresponding to the pivot basic variable;
Fifth, elimination: perform Gaussian elimination so that the algorithm approaches the optimal solution.
These five sections are described in detail below, respectively.
First, optimal solution decision: judge whether the linear programming problem has an optimal solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy.
In terms of task partitioning, the judgment logic is converted into a process of computing a maximum value, calculated using a reduction strategy. Each Block's thread allocation is one-dimensional, with 256 threads per Block. In terms of data transfer and storage, the amount of input data is n × (m + n), where m is the number of objectives and n is the size of the solution set. Because the amount of transferred data is large, it is stored in global memory.
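As a hedged illustration (this code is not from the patent; the function names are ours), the optimality test that the method maps onto a GPU max-reduction can be sketched serially in Python. `tree_max` mimics the pairwise reduction a CUDA block of threads would perform, and `is_optimal` is the sign check that the CPU side performs on the single returned scalar:

```python
def tree_max(values):
    """Pairwise tree reduction, mirroring how a block of 256 GPU threads
    would reduce its partial values to one maximum in log2(256) = 8 steps."""
    vals = list(values)
    while len(vals) > 1:
        half = (len(vals) + 1) // 2
        # Each "thread" i combines element i with element i + half.
        vals = [max(vals[i], vals[i + half]) if i + half < len(vals) else vals[i]
                for i in range(half)]
    return vals[0]

def is_optimal(reduced_costs):
    """Simplex optimality test: if no coefficient of a non-basic variable
    exceeds 0, the current basic solution is optimal. Only this single
    maximum (or a flag) needs to cross the GPU-to-CPU boundary."""
    return tree_max(reduced_costs) <= 0.0
```

The point of the sketch is the data-movement pattern: the whole coefficient row stays on the GPU, and one scalar is compared against 0 on the CPU.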
Second, unbounded solution processing: judge whether the linear programming problem has an unbounded solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy.
In terms of task partitioning, the time complexity of this step is O(n + m); it can be converted into a process of computing a maximum value, calculated using a reduction strategy. Each Block's thread allocation is one-dimensional, with 256 threads per Block. In terms of data transfer and storage, the amount of input data is n × (m + n), where m is the number of objectives and n is the size of the solution set. Because the amount of transferred data is large, it is stored in global memory.
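As a hedged serial sketch of this stage (illustrative names, not the patent's code): once the entering column is known, unboundedness reduces to another column maximum, so again only one scalar must be returned to the CPU for the sign check:

```python
def is_unbounded(entering_column):
    """Unboundedness test of the simplex method: if no entry of the
    entering variable's column is positive, the ratio test has no valid
    pivot row and the objective can grow without bound. On the GPU this
    maximum would come from a reduction; max() is the serial stand-in."""
    return max(entering_column) <= 0.0
```

Judging on the CPU against the GPU-computed maximum avoids shipping the whole intermediate column back across the PCIe bus.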
Third, basic variable processing: obtain the minimum index corresponding to the pivot basic variable using a reduction strategy.
In terms of task partitioning, the time complexity of this step is O(n + m), and the computation uses a reduction strategy. Exploiting the weak correlation among the matrix elements involved, each Block's thread allocation is one-dimensional with 256 threads, and the theoretical time complexity of the parallel computation is O(1). In terms of data transfer and storage, the amount of input data is n × (m + n), where m is the number of objectives and n is the size of the solution set. Because the amount of transferred data is large, it is stored in global memory.
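As a hedged sketch of what this stage computes (our naming, serial stand-in for the GPU argmin-reduction): the standard simplex ratio test selects the pivot row, breaking ties by the smallest row index, which is the "minimum index" the reduction produces:

```python
def pivot_row_index(rhs, entering_column, eps=1e-9):
    """Ratio test of the simplex method: among rows whose entering-column
    entry is positive, choose the one minimising rhs[i] / column[i]; ties
    are broken by the smallest row index. Returns None if no row
    qualifies (the unbounded case handled by the previous stage)."""
    best_ratio, best_row = None, None
    for i, (b, a) in enumerate(zip(rhs, entering_column)):
        if a > eps:
            ratio = b / a
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_row = ratio, i
    return best_row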
Fourth, normalization: normalize the row of data corresponding to the pivot basic variable.
The time complexity of the normalization step is O(n + m). Exploiting the weak correlation among the matrix elements involved, each Thread in a Block processes one element; the optimal thread count in a single Block is 256, and the theoretical time complexity of the parallel computation is O(1). Because the amount of data to process is n + m, shared memory is used for storage to ensure efficiency.
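A hedged serial sketch of the normalization step (illustrative code, not from the patent): the pivot row is divided by the pivot element so the pivot becomes 1; on the GPU each of the n + m elements, held in shared memory, is handled by one thread:

```python
def normalize_pivot_row(tableau, r, c):
    """Divide pivot row r of the simplex tableau by the pivot element
    tableau[r][c] so that the pivot becomes 1. The list comprehension is
    the serial equivalent of one-thread-per-element in shared memory."""
    pivot = tableau[r][c]
    tableau[r] = [x / pivot for x in tableau[r]]
    return tableau
```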
Fifth, elimination: perform Gaussian elimination so that the algorithm approaches the optimal solution.
The time complexity of the elimination step is O(n × (n + m)). No data sharing is needed between Blocks; within a Block a two-dimensional thread allocation of (16, 16) is reasonable, and the theoretical time complexity of the parallel computation is O(n + m). The amount of intermediate data each Block needs is (m + n), which can be stored in shared memory to ensure efficiency.
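A hedged serial sketch of the elimination step (our code, assuming the pivot row has already been normalized by the previous stage): every other row subtracts a multiple of the pivot row so the pivot column becomes zero elsewhere; on the GPU each (row, element) pair maps onto one thread of a (16, 16) block:

```python
def eliminate_column(tableau, r, c):
    """Gaussian elimination step: assuming tableau[r][c] == 1 after
    normalization, subtract factor * pivot_row from every other row so
    that column c is zeroed outside row r."""
    pivot_row = tableau[r]
    for i in range(len(tableau)):
        if i == r:
            continue
        factor = tableau[i][c]
        tableau[i] = [x - factor * p for x, p in zip(tableau[i], pivot_row)]
    return tableau
```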
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention.
Detailed Description
The invention aims to provide a high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation. The method is designed around three aspects: the task partitioning scheme, the memory type used for data, and the amount of data exchanged between the CPU and the GPU, so as to improve the computational efficiency and practicality of the domination algorithm. The method comprises five core parts: optimal solution decision, unbounded solution processing, basic variable processing, normalization, and elimination:
First, optimal solution decision: judge whether the linear programming problem has an optimal solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy;
Second, unbounded solution processing: judge whether the linear programming problem has an unbounded solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy;
Third, basic variable processing: obtain the minimum index corresponding to the pivot basic variable using a reduction strategy;
Fourth, normalization: normalize the row of data corresponding to the pivot basic variable;
Fifth, elimination: perform Gaussian elimination so that the algorithm approaches the optimal solution.
Specifically, the implementation is shown in the flowchart of FIG. 1, where the left side of the flowchart is the task allocation on the CPU side and the right side is the task allocation on the GPU side. The whole task of the method is divided according to the five steps above: the logic-judgment parts are assigned to the CPU side, and the compute-intensive parts are executed on the GPU side. To avoid transferring large amounts of data, the computation tasks of the three stages of optimal solution decision, unbounded solution processing, and basic variable processing are assigned to the GPU, while the logic-judgment tasks are assigned to the CPU. Normalization and elimination are the two most computation-heavy stages of the method, and such compute-intensive tasks are very well suited to parallel execution on the GPU, so both are assigned to the GPU. The details of data transfer and workload partitioning between the CPU and GPU can be observed in FIG. 1. Only a single flag value is used to judge whether the linear program has an optimal solution or an unbounded solution; the entire data set is not transferred for the judgment.
First, optimal solution decision: if all coefficients of the non-basic variables are not greater than 0, an optimal solution must exist. In the parallel algorithm, the maximum coefficient among the non-basic variables is computed on the GPU, and whether that maximum is greater than 0 is judged on the CPU. Meanwhile, only the index of the corresponding basic variable is transmitted; this strategy avoids transferring the entire data set between the CPU and the GPU. The maximum value is computed on the GPU using the reduction strategy of CUDA programming.
Second, unbounded solution processing: if the current iteration has no optimal solution, the index of the entering variable is obtained on the CPU and transmitted to the GPU. If no entry in the column corresponding to that index is positive, an unbounded solution exists. Performing the unbounded-solution judgment on the CPU would require transferring the GPU's intermediate results to the CPU, which is time-consuming. Therefore, as with the optimal solution decision, the maximum of the column values corresponding to the basic variable is computed on the GPU and transmitted to the CPU, which then judges whether it is greater than 0. In finding the maximum, the Reduce strategy in CUDA allows the value to be computed quickly across multiple threads, so the time cost of transferring data is avoided as well. The GPU-side thread allocation is one-dimensional, and experiments show that 256 threads per Block is optimal.
Third, basic variable processing: to determine the pivot basic variable, the smallest index corresponding to it must be found. Executing this on the CPU alone would require transferring the GPU's intermediate data, reducing the efficiency of the whole algorithm. Instead, the minimum is obtained on the GPU using the Reduce strategy, and finally only the index value is transmitted to the CPU. The GPU-side thread allocation is again one-dimensional, with 256 threads per Block.
Fourth, normalization: Gaussian elimination is applied across the whole matrix before the algorithm starts iterating. In our experiments, the row of data corresponding to the pivot basic variable is normalized first. To increase the data access rate, that row is stored in GPU shared memory, and each thread then processes one element. The GPU-side thread allocation is one-dimensional, with 256 threads per Block.
Fifth, elimination: after normalization, the row of data corresponding to the basic variable is stored in shared memory and the threads are reallocated. Given the two-dimensional structure of the matrix, each Block is allocated a two-dimensional (16, 16) thread layout to fully exploit GPU computing efficiency, so that each thread handles one element of the corresponding matrix and GPU computing resources are used effectively.
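To show how the five stages compose, the following is a hedged, fully serial, self-contained sketch of a simplex loop for maximizing c·x subject to A·x ≤ b, x ≥ 0 (illustrative code only; in the patent the marked stages run on the GPU and only scalars cross back to the CPU for the comparisons):

```python
def solve_lp_max(c, A, b, eps=1e-9, max_iter=100):
    """Serial sketch of the simplex loop whose five stages the patent
    distributes between CPU and GPU. Returns ('optimal', value),
    ('unbounded', None), or ('iteration_limit', None)."""
    m, n = len(A), len(c)
    # Build the tableau [A | I | b] with the objective row last.
    tab = [A[i][:] + [1.0 if j == i else 0.0 for j in range(m)] + [b[i]]
           for i in range(m)]
    tab.append([-ci for ci in c] + [0.0] * (m + 1))
    for _ in range(max_iter):
        obj = tab[-1]
        # Stage 1: optimality test (a max-reduction on the GPU).
        col = max(range(n + m), key=lambda j: -obj[j])
        if -obj[col] <= eps:
            return ('optimal', tab[-1][-1])
        # Stage 2: unboundedness test on the entering column.
        if max(tab[i][col] for i in range(m)) <= eps:
            return ('unbounded', None)
        # Stage 3: ratio test, minimum index on ties (an argmin-reduction).
        row = min((tab[i][-1] / tab[i][col], i)
                  for i in range(m) if tab[i][col] > eps)[1]
        # Stage 4: normalization of the pivot row (one thread per element).
        p = tab[row][col]
        tab[row] = [x / p for x in tab[row]]
        # Stage 5: Gaussian elimination of the pivot column ((16,16) blocks).
        for i in range(m + 1):
            if i != row:
                f = tab[i][col]
                tab[i] = [x - f * y for x, y in zip(tab[i], tab[row])]
    return ('iteration_limit', None)
```

For example, maximizing 3x + 2y subject to x + y ≤ 4 and x ≤ 2 yields the optimal value 10 at x = 2, y = 2.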

Claims (6)

1. A high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation, characterized in that: for the high-dimensional multi-objective optimization problem, as the number of objectives increases, most of the solution set becomes non-dominated under the Pareto dominance relation, which leads to a loss of selection pressure. A high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation can address both the selection-pressure problem and the time efficiency of the algorithm, and plays an important role in multi-objective programming; the heterogeneous computation domination method comprises five core parts: optimal solution decision, unbounded solution processing, basic variable processing, normalization, and elimination:
First, optimal solution decision: judge whether the linear programming problem has an optimal solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy;
Second, unbounded solution processing: judge whether the linear programming problem has an unbounded solution, converting the CPU-side judgment logic into a GPU-side parallel computing strategy and accelerating it with a reduction strategy;
Third, basic variable processing: obtain the minimum index corresponding to the pivot basic variable using a reduction strategy;
Fourth, normalization: normalize the row of data corresponding to the pivot basic variable;
Fifth, elimination: perform Gaussian elimination so that the algorithm approaches the optimal solution.
2. The high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the judgment logic is converted into a process of computing a maximum value, calculated using a reduction strategy. Each Block's thread allocation is one-dimensional with 256 threads, and the data is stored in global memory.
3. The high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the judgment logic is converted into a process of computing a maximum value, calculated using a reduction strategy. Each Block's thread allocation is one-dimensional with 256 threads, and the theoretical time complexity of the parallel computation is O(1). The amount of input data is n × (m + n), where m is the number of objectives and n is the size of the solution set. The data is stored in global memory.
4. The high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the time complexity of the basic variable processing stage is O(n + m), and the GPU computes it using a reduction strategy. Exploiting the weak correlation among the matrix elements involved, each Thread in a Block processes one element; the optimal thread count in a single Block is 256, and the parallel computation complexity is O(1). Since the input data is large, it is stored in the GPU's global memory.
5. The high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the time complexity of the normalization step is O(n + m). Exploiting the weak correlation among the matrix elements involved, each Thread in a Block processes one element; the optimal thread count in a single Block is 256, and the theoretical time complexity of the parallel computation is O(1). Because the amount of data to process is n + m, shared memory is used for storage to ensure efficiency.
6. The high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation of claim 1, characterized in that: the time complexity of the elimination step is O(n × (n + m)). No data sharing is needed between Blocks; within a Block a two-dimensional thread allocation of (16, 16) is reasonable, and the theoretical time complexity of the parallel computation is O(n + m). The amount of intermediate data each Block needs is (m + n), which can be stored in shared memory to ensure efficiency.
CN201810560114.XA 2018-06-03 2018-06-03 High-dimensional multi-target domination method based on CPU + GPU heterogeneous computation Pending CN110554988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810560114.XA CN110554988A (en) 2018-06-03 2018-06-03 high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810560114.XA CN110554988A (en) 2018-06-03 2018-06-03 high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation

Publications (1)

Publication Number Publication Date
CN110554988A true CN110554988A (en) 2019-12-10

Family

ID=68735373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810560114.XA Pending CN110554988A (en) 2018-06-03 2018-06-03 high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation

Country Status (1)

Country Link
CN (1) CN110554988A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651273A * 2020-05-29 2020-09-11 National University of Defense Technology GPU-based large-capacity short-burst signal receiver design
CN111651273B * 2020-05-29 2023-05-05 National University of Defense Technology GPU-based large-capacity short-burst signal receiver design
CN111970112A * 2020-08-10 2020-11-20 Shandong University Ethereum deployment method and system based on ZYNQ heterogeneous computing platform
CN111970112B * 2020-08-10 2022-01-21 Shandong University Ethereum deployment method and system based on ZYNQ heterogeneous computing platform

Similar Documents

Publication Publication Date Title
CN106970896B (en) Vector processor-oriented vectorization implementation method for two-dimensional matrix convolution
CN109543830B (en) Splitting accumulator for convolutional neural network accelerator
CN103984527B Method for improving the efficiency of incompressible pipe flow field simulation by optimizing sparse matrix-vector multiplication
US9886418B2 (en) Matrix operands for linear algebra operations
CN108805266A A reconfigurable high-concurrency CNN convolution accelerator
KR20190091858A (en) Heterogenous Processor Architecture to Integrate CNN and RNN Neural Networks on a Single Chip
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN110390383A A deep neural network hardware accelerator based on power-exponent quantization
CN109840154A A task-dependency-based computation offloading method in mobile cloud environments
CN111858465B (en) Large-scale matrix QR decomposition parallel computing system
CN110554988A (en) high-dimensional multi-target domination method based on CPU + GPU heterogeneous computation
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN102110079A (en) Tuning calculation method of distributed conjugate gradient method based on MPI
CN105373845A A hybrid intelligent scheduling optimization method for manufacturing enterprise workshops
US20240119114A1 (en) Matrix Multiplier and Matrix Multiplier Control Method
CN109993293A A deep learning accelerator suitable for stacked hourglass networks
CN105229608A Coprocessor-based array-oriented database processing
CN109740619B (en) Neural network terminal operation method and device for target recognition
CN113313252B (en) Depth separable convolution implementation method based on pulse array
CN107133332A A query task allocation method and device
CN104933110B A MapReduce-based data prefetching method
CN106648901A (en) Multichannel signal correlation analyzing method and system
US11886347B2 (en) Large-scale data processing computer architecture
CN108984470A A system and method for improving the hash rate of FPGA mining machines
CN114546652A (en) Parameter estimation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191210