CN110389784A - A kind of compiled query processing method in image processor environment - Google Patents
- Publication number
- CN110389784A (publication) · CN201910678918.4A (application)
- Authority
- CN
- China
- Prior art keywords
- gpu
- kernel
- image processor
- logic
- query processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention provides a compiled query processing method in a graphics processor environment, belonging to the technical field of processors. The invention first constructs compound kernels on the GPU; it then constructs multiple channels, each channel being composed of several of the compound kernels built in step one, each compound kernel being able to execute complex logic; the channels are then set to process data in parallel; when the GPU receives a compiled-query instruction, one channel is allocated to that instruction, each compound kernel executes one logic step, and the intermediate data are stored in the GPU's own device memory. The invention solves the problem that the query processing speed of existing GPU-style graphics processors is limited by memory bandwidth, and is used for query processing on graphics processors.
Description
Technical field
The present invention relates to a compiled query processing method and belongs to the technical field of processors.
Background technique
At present, query processing on GPU (Graphics Processing Unit) style graphics processors is severely limited by data movement. With compute throughput on the order of teraflops (trillions of floating-point operations per second) within a single device, even high-bandwidth memory cannot supply enough data to utilize it fully, and compiled query processing is an advanced technique for improving memory efficiency. GPUs are commonly used as powerful accelerators for query processing. Since the arithmetic throughput of the coprocessor reaches such peak ranges, supplying it with enough data is a challenge: even hardware with high-bandwidth memory offers read/write rates of only a few hundred GB/s. For these reasons, memory-intensive applications remain dominated by the cost of data movement. The conventional method stores intermediate data in main memory, reads it back from main memory when it is needed, and then writes the next step's intermediate data back to main memory again; this incurs many I/O round trips, and bandwidth naturally becomes the bottleneck of query processing speed.
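To make the bandwidth argument above concrete, the round trips can be counted with a small model (an illustrative sketch added for exposition, not part of the original disclosure; the function names and row counts are invented): under operator-at-a-time materialization every operator reads its input from main memory and writes its output back, whereas a fused pipeline touches main memory only at its two ends.

```python
# Illustrative cost model (not from the patent): main-memory traffic of a
# pipeline of N operators under operator-at-a-time materialization versus
# a fused pipeline that touches memory only at the input and the output.

def materialized_io(num_operators: int, rows: int) -> int:
    """Each operator performs one read and one write of `rows` values."""
    return num_operators * 2 * rows

def fused_io(rows: int) -> int:
    """A fused pipeline reads the input once and writes the result once."""
    return 2 * rows

# A 4-operator query over 1,000,000 rows:
print(materialized_io(4, 1_000_000))  # 8000000 memory accesses
print(fused_io(1_000_000))            # 2000000 memory accesses
```

Under these assumptions the materialized plan moves four times as much data as the fused one, which is the bottleneck the method targets.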
Summary of the invention
To solve the problem that the query processing speed of existing GPU-style graphics processors is limited by memory bandwidth, the present invention provides a compiled query processing method in a graphics processor environment.
The compiled query processing method in a graphics processor environment according to the present invention is realized by the following technical scheme:
Step 1: construct compound kernels on the GPU; "kernel" here denotes a core;
Step 2: construct multiple channels, each channel being composed of several of the compound kernels built in step 1, each compound kernel being able to execute complex logic;
Step 3: set the channels to process data in parallel;
Step 4: when the GPU receives a compiled-query instruction, allocate one channel to that instruction; each compound kernel executes one logic step, while the intermediate data are stored in the GPU's device memory.
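The four steps above can be sketched as a toy simulation (illustrative only, not the patented GPU implementation; every name here, such as `make_compound_kernel` and `gpu_mem`, is invented for exposition): a compound kernel executes one logic step, a channel chains compound kernels, and intermediates stay in a simulated GPU device memory rather than host main memory.

```python
# Toy simulation of steps 1-4 (illustrative names, not the patent's code).

gpu_mem = {}  # simulated GPU device memory holding intermediate results

def make_compound_kernel(name, step):
    """Step 1: a compound kernel wraps one (possibly complex) logic step."""
    def kernel(key_in, key_out):
        gpu_mem[key_out] = step(gpu_mem[key_in])  # intermediate stays "on GPU"
        return key_out
    kernel.__name__ = name
    return kernel

def make_channel(kernels):
    """Step 2: a channel is a chain of compound kernels."""
    def channel(key_in):
        key = key_in
        for i, k in enumerate(kernels):
            key = k(key, f"{key_in}.step{i}")
        return gpu_mem[key]
    return channel

# Steps 3-4: several channels exist side by side; an arriving compiled
# query is assigned one whole channel, each kernel running one logic step.
channels = [make_channel([
    make_compound_kernel("select", lambda rows: [r for r in rows if r > 2]),
    make_compound_kernel("project", lambda rows: [r * 10 for r in rows]),
]) for _ in range(2)]  # two channels available for parallel queries

gpu_mem["q0.in"] = [1, 2, 3, 4]
print(channels[0]("q0.in"))  # [30, 40]
```

Each intermediate key lands in `gpu_mem` rather than being shipped back to the host, mirroring step 4's placement of intermediates in GPU device memory.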
The most prominent and significant beneficial effects of the present invention are:
The compiled query processing method in a graphics processor environment according to the present invention changes the inherent processing mode of query compilation by integrating compiled query processing into a GPU-accelerated DBMS (database management system), so that it suits the massively parallel execution model of GPU-style graphics processors. The present invention parallelizes compiled queries in the GPU style and can be applied in different situations. The present invention shows that fusing multiple compiled operators into a single GPU kernel reduces bandwidth demand; data can be processed effectively in a single channel, which greatly improves query efficiency and shortens both the memory access time and the kernel execution time. Compared with operator-at-a-time execution, the method of the present invention reduces the memory access volume by a factor of 7.5, shortening the memory access time, and shortens the kernel execution time by a factor of 9.5.
Description of the drawings
Fig. 1 is a schematic diagram of data movement during compiled query processing;
Fig. 2 is a schematic diagram of the composition of a compound kernel in the present invention;
wherein, MEM: main memory; GPU MEM: GPU device memory; SCRATCHPAD MEM/REGISTERS/CACHE: scratchpad memory/registers/cache; CORES: cores; Compound kernel: compound kernel; Input: input; Probe: probe; Count: count; Prefix sum: prefix sum (intermediate data of the computation); Write: write; PCIe Transfers: PCIe bus transfers; GPU Global Memory: GPU global memory; On-Chip Memory: on-chip memory; Select/hash: selection/hashing; Aligned write: aligned write-back; Project/join probe: projection/join-probe operations; Fusion operator: fused operator.
Specific embodiment
Specific embodiment 1: This embodiment is described with reference to Fig. 1. The compiled query processing method in a graphics processor environment provided by this embodiment specifically includes the following steps:
Step 1: construct compound kernels on the GPU. "Kernel" denotes a core; the GPU contains many cores, and the cores process data, performing logical operations and the like.
Step 2: construct multiple channels, each channel being composed of several of the compound kernels built in step 1. Each compound kernel can execute complex logic (for example, writing back to memory after a logical operation), so that almost all processing takes place inside the cores, which greatly reduces the read/write volume compared with the conventional method. In the conventional method each core carries only a single piece of logic: after the logic computation it writes its output to main memory, and if a write-back operation is to be executed the data must be read from main memory again, so the amount of I/O (input/output) is large and a bottleneck arises easily.
Step 3: set the channels so that data can be processed in parallel between them; constructing multiple channels on the GPU in this way is equivalent to making it process data in parallel.
Step 4: when the GPU receives a compiled-query instruction, allocate one channel to that instruction, each compound kernel executing one logic step. Integrating compiled queries into the GPU in this way greatly accelerates compiled query processing and also raises parallelism, because the GPU has far more cores than a CPU, high computing power, and fast execution. Storing the intermediate data in the GPU's device memory at the same time reduces the number of I/O reads and writes and removes many of the factors that limit performance, such as the bandwidth bottleneck. Fig. 1 is a schematic diagram of data movement during compiled query processing with the method of the present invention; it can be seen that the invention greatly reduces the number of data movements during a compiled query.
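The channel parallelism described in steps 3 and 4 can be imitated on a CPU with a thread pool (a hypothetical sketch, not the patent's GPU mechanism; the partitioning and the two logic steps are invented): the data is partitioned across channels, each channel runs its logic steps on its own partition, and the channels execute concurrently.

```python
# Illustrative CPU stand-in for parallel channels (not the patented GPU code).
from concurrent.futures import ThreadPoolExecutor

def channel(partition):
    # Each compound kernel performs one logic step; here: filter, then sum.
    filtered = [x for x in partition if x % 2 == 0]  # logic step 1: select
    return sum(filtered)                             # logic step 2: aggregate

partitions = [list(range(i, i + 4)) for i in (0, 4, 8)]  # three data partitions
with ThreadPoolExecutor(max_workers=3) as pool:          # three parallel channels
    results = list(pool.map(channel, partitions))
print(results)  # [2, 10, 18]
```

On the actual hardware each "channel" would be a chain of GPU kernels rather than a thread, but the structure — one partition per channel, all channels active at once — is the same.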
An analysis of the macro execution models used by various systems in the past shows that, to evaluate relational query operators, the most advanced existing systems select multiple primitives and execute the corresponding kernel sequence on the GPU. To supply the kernels with data, the macro execution model defines how data transfers interleave with kernel execution. Compared with legacy systems, this kernel-to-kernel data movement may cause additional bandwidth demand. To understand this influence, the effect of existing macro execution models on the use of bandwidth at multiple levels (PCIe, GPU global memory, etc.) was studied, and the execution of the Star Schema Benchmark (SSB) queries was analyzed; the queries were executed with CoGaDB at scale factor 10 on an NVIDIA GTX 970 GPU.
A straightforward way to execute a kernel sequence is to transfer all inputs first, execute the kernels, and finally transfer all outputs; the intermediate data are kept in GPU global memory and no unnecessary transfers are needed. The disadvantage, however, is that GPU memory alone must then hold the inputs, the outputs, and the intermediates. To process data larger than the coprocessor's memory, each kernel can instead be executed on blocks of data, and batching executes multiple kernels on each block shipped over PCIe. The present invention alleviates the PCIe bandwidth limitation by rearranging the operations of a kernel so that the transfer of intermediate results to the host is short-circuited, instead of processing a whole column per kernel run. Batching achieves this by reusing the output of a previous operation (op1) as the input of the next (op2) rather than transferring it to the host. This works as long as the intermediate batch results fit in GPU global memory. Transferring data in blocks and executing multiple operators on each block achieves scalability and improves efficiency compared with a single kernel. The batched macro execution model is used by GPUDB and Hetero-DB for co-processing.
To use GPU global memory bandwidth more effectively, additional micro-level optimizations based on a micro execution model are needed, combined with the macro execution model (batching) to achieve both scalability and performance. Existing micro-level optimizations (such as vectorized processing and query compilation) trade memory bandwidth against processor caches more efficiently. To reconcile the interpretation overhead of the Volcano model with the materialization overhead of operator-at-a-time execution, vectorized processing batches vectors sized to fit the processor cache; to use query compilation, fine-grained data parallelism must also be integrated into compiled queries on the GPU.
Specific embodiment 2: This embodiment differs from embodiment 1 in that each compound kernel in step 1 contains computation logic, logic for fetching the data to be computed, and write-back logic. As shown in Fig. 2, a compound kernel links many steps together and executes them in sequence, so its logic is complex.
The other steps and parameters are the same as in specific embodiment 1.
Specific embodiment 3: This embodiment differs from embodiments 1 and 2 in that each core executing one logic step in step 4 specifically means:
Compound kernel A executes the first logic step (for example, a join of two database tables m and n) and stores the intermediate result in the GPU's device memory; then kernel B, connected to kernel A, performs the next operation (the next operation may be, for example, a projection over the intermediate result of the previous step), and so on.
The other steps and parameters are the same as in specific embodiment 1 or 2.
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and variations in accordance with the present invention, but all such corresponding changes and variations shall fall within the protection scope of the claims appended to the present invention.
Claims (3)
1. A compiled query processing method in a graphics processor environment, characterized in that it specifically includes the following steps:
Step 1: constructing compound kernels on the GPU, where "kernel" denotes a core;
Step 2: constructing multiple channels, each channel being composed of several of the compound kernels constructed in step 1, each compound kernel being able to execute complex logic;
Step 3: setting the channels to process data in parallel;
Step 4: when the GPU receives a compiled-query instruction, allocating one channel to that instruction, each compound kernel executing one logic step while the intermediate data are stored in the GPU's device memory.
2. The compiled query processing method in a graphics processor environment according to claim 1, characterized in that each compound kernel in step 1 contains computation logic, logic for fetching the data to be computed, and write-back logic.
3. The compiled query processing method in a graphics processor environment according to claim 1 or 2, characterized in that each core executing one logic step in step 4 specifically means: compound kernel A executes the first logic step and stores the intermediate result in the GPU's device memory; then kernel B, connected to kernel A, performs the next operation, and so on.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910678918.4A CN110389784A (en) | 2019-07-23 | 2019-07-23 | A kind of compiled query processing method in image processor environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110389784A true CN110389784A (en) | 2019-10-29 |
Family
ID=68287530
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910678918.4A Pending CN110389784A (en) | 2019-07-23 | 2019-07-23 | A kind of compiled query processing method in image processor environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110389784A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101526934A (en) * | 2009-04-21 | 2009-09-09 | 浪潮电子信息产业股份有限公司 | Construction method of GPU and CPU combined processor |
CN102201008A (en) * | 2011-06-17 | 2011-09-28 | 中国科学院软件研究所 | GPU (graphics processing unit)-based quick star catalogue retrieving method |
WO2012142186A2 (en) * | 2011-04-11 | 2012-10-18 | Child Timothy | Database acceleration using gpu and multicore cpu systems and methods |
CN102938653A (en) * | 2012-11-13 | 2013-02-20 | 航天恒星科技有限公司 | Parallel RS decoding method achieved through graphics processing unit (GPU) |
CN103336959A (en) * | 2013-07-19 | 2013-10-02 | 西安电子科技大学 | Vehicle detection method based on GPU (ground power unit) multi-core parallel acceleration |
CN104615576A (en) * | 2015-03-02 | 2015-05-13 | 中国人民解放军国防科学技术大学 | CPU+GPU processor-oriented hybrid granularity consistency maintenance method |
CN105069015A (en) * | 2015-07-13 | 2015-11-18 | 山东超越数控电子有限公司 | Web acceleration technology implementation method of domestic platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Funke et al. | Pipelined query processing in coprocessor environments | |
Koliousis et al. | Saber: Window-based hybrid stream processing for heterogeneous architectures | |
Heimel et al. | Hardware-oblivious parallelism for in-memory column-stores | |
Yuan et al. | Spark-GPU: An accelerated in-memory data processing engine on clusters | |
Lee et al. | Ysmart: Yet another sql-to-mapreduce translator | |
US9298768B2 (en) | System and method for the parallel execution of database queries over CPUs and multi core processors | |
Wu et al. | Optimizing data warehousing applications for GPUs using kernel fusion/fission | |
CN103309958B (en) | The star-like Connection inquiring optimization method of OLAP under GPU and CPU mixed architecture | |
EP2585950B1 (en) | Apparatus and method for data stream processing using massively parallel processors | |
Furst et al. | Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries | |
US20180150515A1 (en) | Query Planning and Execution With Source and Sink Operators | |
US20100293135A1 (en) | Highconcurrency query operator and method | |
CN102981807B (en) | Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment | |
Rosenfeld et al. | Query processing on heterogeneous CPU/GPU systems | |
CN106446134B (en) | Local multi-query optimization method based on predicate specification and cost estimation | |
US20220342712A1 (en) | Method for Processing Task, Processor, Device and Readable Storage Medium | |
CN104361118A (en) | Mixed OLAP (on-line analytical processing) inquiring treating method adapting coprocessor | |
CN101556534A (en) | Large-scale data parallel computation method with many-core structure | |
CN104731729B (en) | A kind of table connection optimization method, CPU and accelerator based on heterogeneous system | |
Cheng et al. | SCANRAW: A database meta-operator for parallel in-situ processing and loading | |
Yang et al. | Efficient FPGA-based graph processing with hybrid pull-push computational model | |
Kumaigorodski et al. | Fast CSV loading using GPUs and RDMA for in-memory data processing | |
Shehab et al. | Accelerating relational database operations using both CPU and GPU co-processor | |
Breß et al. | Exploring the design space of a GPU-aware database architecture | |
CN110389784A (en) | A kind of compiled query processing method in image processor environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20191029 |