CN110389784A - A kind of compiled query processing method in image processor environment - Google Patents

A kind of compiled query processing method in image processor environment Download PDF

Info

Publication number
CN110389784A
CN110389784A (application CN201910678918.4A)
Authority
CN
China
Prior art keywords
gpu
kernel
image processor
logic
query processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910678918.4A
Other languages
Chinese (zh)
Inventor
赵志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Huituo Investment Center (limited Partnership)
Original Assignee
Harbin Huituo Investment Center (limited Partnership)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Huituo Investment Center (limited Partnership) filed Critical Harbin Huituo Investment Center (limited Partnership)
Priority to CN201910678918.4A priority Critical patent/CN110389784A/en
Publication of CN110389784A publication Critical patent/CN110389784A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30032Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention provides a compiled query processing method in an image processor environment, belonging to the field of processor technology. The invention first constructs compound kernels in the GPU; it then constructs multiple channels, each channel consisting of multiple compound kernels constructed in step 1, each compound kernel being able to execute complex logic; the channels are then set to process data in parallel; when the GPU receives a compiled-query instruction, one channel is allocated to that instruction, each compound kernel executes one logic step, and intermediate data is stored in the GPU's main memory. The invention solves the problem that the query processing speed of existing GPU-style image processors is limited by memory bandwidth, and is applied to query processing in image processors.

Description

Compiled query processing method in an image processor environment
Technical field
The present invention relates to a compiled query processing method and belongs to the field of processor technology.
Background technique
Currently, query processing on GPU (Graphics Processing Unit) style image processors is severely limited by data movement. With a computational throughput of teraflops (trillions of floating-point operations per second) within a single device, even high-bandwidth memory cannot supply data fast enough to be utilized effectively. Compiled query processing is an advanced technique for improving memory efficiency. GPUs are commonly used as powerful accelerators for query processing; because the arithmetic throughput of the coprocessor reaches peak ranges, supplying it with enough data is a challenge, since even hardware with high-bandwidth memory achieves read/write rates of only a few hundred GB/s. For these reasons, memory-intensive applications still suffer under the cost of data movement. The conventional method stores intermediate data in main memory, reads it back from main memory when needed, and then writes the intermediate data of the next step back to main memory again; the number of I/O operations is large, and bandwidth naturally becomes the bottleneck of query processing speed.
Summary of the invention
To solve the problem that the query processing speed of existing GPU-style image processors is limited by memory bandwidth, the present invention provides a compiled query processing method in an image processor environment.
The compiled query processing method in an image processor environment of the present invention is realized by the following technical scheme:
Step 1: constructing compound kernels in the GPU; a kernel denotes a core;
Step 2: constructing multiple channels, each channel consisting of multiple compound kernels constructed in step 1, each compound kernel being able to execute complex logic;
Step 3: setting the channels to process data in parallel;
Step 4: when the GPU receives a compiled-query instruction, allocating one channel to that instruction, each compound kernel executing one logic step, while intermediate data is stored in the GPU's main memory.
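The four steps above can be sketched as a CPU-side simulation. This is an illustrative assumption, not the patent's implementation: names such as `make_compound_kernel`, `make_channel`, and the `gpu_main_memory` dictionary (which stands in for GPU main memory) are invented for the sketch.

```python
# Illustrative CPU-side sketch of steps 1-4: compound kernels grouped into
# channels, with intermediate data kept in (simulated) GPU main memory.
# All names here are assumptions for illustration, not from the patent.

gpu_main_memory = {}  # stands in for the GPU's main memory


def make_compound_kernel(name, *logic_steps):
    """Step 1: a compound kernel bundles several logic steps into one unit."""
    def kernel(data):
        for step in logic_steps:
            data = step(data)
        return data
    kernel.__name__ = name
    return kernel


def make_channel(*kernels):
    """Step 2: a channel is a sequence of compound kernels."""
    def channel(query_id, data):
        for k in kernels:
            data = k(data)
            # Step 4: store each intermediate result in GPU main memory.
            gpu_main_memory[(query_id, k.__name__)] = data
        return data
    return channel


# Two compound kernels, each carrying complex logic internally.
select_hash = make_compound_kernel(
    "select_hash",
    lambda rows: [r for r in rows if r % 2 == 0],  # selection
    lambda rows: [hash(r) % 100 for r in rows],    # hashing
)
count_write = make_compound_kernel(
    "count_write",
    lambda rows: len(rows),                        # count / write-back
)

# Step 3: several channels exist so that queries can be processed in parallel.
channels = [make_channel(select_hash, count_write) for _ in range(4)]

# Step 4: a compiled-query instruction is assigned one channel.
result = channels[0](query_id=7, data=list(range(10)))
print(result)  # number of even values in 0..9 -> 5
```

The sketch only models the data flow; on a real GPU the channels would run concurrently and the intermediates would live in device memory rather than a Python dictionary.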
The most prominent and significant beneficial effects of the present invention are:
The compiled query processing method in an image processor environment according to the present invention changes the inherent processing mode of compiled queries by integrating compiled query processing into an image-processor-accelerated DBMS (database management system), making it fit the massively parallel execution model of GPU-style image processors. The invention parallelizes compiled queries in GPU style and can be applied in different situations. The invention shows that fusing multiple operators into a single GPU kernel reduces bandwidth demand; data can be processed effectively in a single pass, which greatly improves query efficiency and shortens both the memory access time and the kernel execution time. Compared with one-operator-at-a-time execution, the method of the present invention reduces memory traffic by a factor of 7.5, shortening the memory access time, and shortens the kernel execution time by a factor of 9.5.
Detailed description of the invention
Fig. 1 is a schematic diagram of data movement during compiled query execution;
Fig. 2 is a schematic diagram of the composition of a compound kernel in the present invention;
In the figures: MEM: main memory; GPU MEM: GPU main memory; SCRATCHPAD MEM/REGISTERS/CACHE: scratchpad memory/registers/cache; CORES: cores; Compound kernel: compound kernel; Input: input; Probe: probe; Count: count; Prefix sum: prefix sum (intermediate data in the computation); Write: write; PCIe Transfers: bus transfers over PCIe; GPU Global Memory: GPU global memory; On-Chip Memory: on-chip memory; Select/hash: selection/hash step; Aligned write: aligned write-back; Project/join probe: projection/join probe (a join operation may be present); Fusion operator: fused operator.
Specific embodiment
Embodiment 1: this embodiment is described with reference to Fig. 1. The compiled query processing method in an image processor environment provided by this embodiment specifically includes the following steps:
Step 1: constructing compound kernels in the GPU. A kernel denotes a core; the GPU contains many cores, and the cores process data, including logical operations and the like;
Step 2: constructing multiple channels, each channel consisting of multiple compound kernels constructed in step 1. Each compound kernel can execute complex logic (for example, writing back to memory after a logical operation), so nearly all processing takes place inside the cores, which reduces the read/write volume greatly compared with the conventional method. In the conventional method a core carries only a single piece of logic: after the logic computation it writes the data out to main memory, and if a write-back operation is to be performed the data must be read from main memory again; the amount of I/O (input/output) is large and easily becomes a bottleneck;
Step 3: setting the channels so that they can process data in parallel; this is equivalent to constructing multiple channels inside the GPU and having them process data in parallel;
Step 4: when the GPU receives a compiled-query instruction, allocating one channel to that instruction, with each compound kernel executing one logic step. Integrating compiled queries into the GPU in this way greatly accelerates compiled query execution and also raises parallelism, because the GPU has far more cores than a CPU, high compute power, and fast execution. At the same time, storing intermediate data in the GPU's main memory reduces the number of I/O reads and writes and removes many factors that limit performance, such as the bandwidth bottleneck. Fig. 1 shows a schematic diagram of data movement during compiled query execution with the method of the present invention; it can be seen that the invention greatly reduces the number of data movements during compiled queries.
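The I/O argument in step 2 can be made concrete with a minimal sketch. This is an illustrative assumption (the operators, data, and I/O accounting are invented, not taken from the patent): it contrasts the conventional one-operator-per-kernel flow, which writes each intermediate to main memory and reads it back, with a fused compound-kernel flow that touches main memory only twice.

```python
# Sketch (assumed, not the patent's implementation) contrasting the
# conventional one-operator-per-kernel flow with the compound-kernel flow
# by counting trips to main memory.

def run_conventional(data, operators):
    """Each operator writes its result to main memory; the next reads it back."""
    io_ops = 1  # initial read of the input
    for op in operators:
        data = op(data)
        io_ops += 2  # write intermediate to main memory + read it back
    return data, io_ops


def run_compound(data, operators):
    """All operators fused into one compound kernel: one read, one write."""
    io_ops = 2  # read input once, write final result once
    for op in operators:
        data = op(data)  # intermediates stay on-chip (registers/scratchpad)
    return data, io_ops


ops = [
    lambda xs: [x for x in xs if x > 0],  # select
    lambda xs: [x * x for x in xs],       # project
    lambda xs: sum(xs),                   # aggregate
]

res_a, io_a = run_conventional([-2, -1, 1, 2, 3], ops)
res_b, io_b = run_compound([-2, -1, 1, 2, 3], ops)
print(res_a, io_a)  # 14 7
print(res_b, io_b)  # 14 2
```

Both flows compute the same answer, but the fused flow performs a constant two main-memory transactions regardless of pipeline depth, which is the bandwidth saving the embodiment describes.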
Analyzing the macro execution models used by various past systems shows that, to evaluate relational query operators, existing state-of-the-art systems select multiple primitives and execute the corresponding kernel sequence on the GPU; to supply data to the kernels, the macro execution model defines how data transfers are interleaved with kernel execution. Compared with legacy systems, this kernel-to-kernel data movement may cause additional bandwidth demand. To understand this influence, the effect of existing macro execution models on the use of bandwidth at multiple levels (PCIe, GPU global memory, etc.) was studied, and the execution of the Star Schema Benchmark (SSB) queries was analyzed; the queries were executed with CoGaDB at scale factor 10 on an NVIDIA GTX 970 GPU.
A straightforward way to execute a kernel sequence is to transfer all inputs first, execute the kernels, and finally transfer all outputs; keeping intermediate data in GPU global memory and avoiding meaningless transfers is essential. The drawback, however, is that this only works when the inputs, outputs, and intermediates all fit in GPU memory. To process data larger than the coprocessor's memory, each kernel can be executed on data blocks, with batching executing multiple kernels on each block transferred over PCIe. The present invention alleviates the PCIe bandwidth limitation by rearranging the operations of a kernel so that the transfer of intermediate results to the host is cut short, rather than running one kernel per processed column. Batching achieves this by reusing the output of the previous operator (op1) as the input of the next (op2) instead of transferring it to the host; this works as long as the intermediate batch results can be stored in GPU global memory. Transferring data in blocks and executing multiple operators on each block achieves scalability and improves efficiency compared with a single kernel. The batching macro execution model is used by GPUDB and Hetero-DB for coprocessing.
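The batching model described above can be sketched as follows. This is an illustrative assumption (block size, operators, and the list slicing that stands in for PCIe transfers are invented for the sketch): each block runs through several operators before the next block is moved, so op1's output feeds op2 without a host round-trip.

```python
# Sketch (illustrative assumptions only) of the batching macro execution
# model: data moves in blocks, and each block runs through several operators
# before the next block is transferred.

def batched_execute(table, block_size, op1, op2):
    results = []
    for start in range(0, len(table), block_size):
        block = table[start:start + block_size]  # one PCIe transfer (simulated)
        intermediate = op1(block)                # stays in GPU global memory
        results.extend(op2(intermediate))        # reused as input, no host trip
    return results


op1 = lambda block: [x + 1 for x in block]  # e.g. a selection/transform kernel
op2 = lambda block: [x * 2 for x in block]  # e.g. a projection kernel

out = batched_execute(list(range(6)), block_size=2, op1=op1, op2=op2)
print(out)  # [2, 4, 6, 8, 10, 12]
```

The intermediate list never leaves the loop body, mirroring how the batch result stays in GPU global memory instead of being shipped back to the host between operators.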
To use GPU global memory bandwidth more effectively, additional micro-level optimization with a micro execution model is needed, combined with the macro execution model (batching) to achieve both scalability and performance. Existing micro-level optimizations (such as vectorized processing and query compilation) balance memory bandwidth more efficiently by exploiting processor caches. To reconcile the interpretation overhead of the Volcano model with the materialization overhead of operator-at-a-time execution, vectorized processing uses batches sized to fit the processor cache; to use query compilation, fine-grained data parallelism must also be integrated into compiled queries on the GPU.
Embodiment 2: this embodiment differs from Embodiment 1 in that, in step 1, each compound kernel contains computation logic, data-loading logic, and write-back logic; as shown in Fig. 2, a compound kernel links many steps together for sequential execution, so its logic is complex.
Other steps and parameters are the same as in Embodiment 1.
Embodiment 3: this embodiment differs from Embodiments 1 and 2 in that each core executing one logic step in step 4 specifically means:
Compound kernel A executes the first logic step (for example, an operation joining two database tables m and n) and stores the intermediate result in the GPU's main memory; then kernel B, connected to kernel A, performs the next operation (which may be, for example, a projection over the intermediate result of the previous step), and so on.
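Embodiment 3 can be sketched as follows. This is an illustrative assumption (the tables, the `id` join key, and the `gpu_main_memory` dictionary standing in for GPU main memory are all invented for the sketch): kernel A joins m and n and materializes the intermediate result in simulated GPU main memory, and kernel B then projects over that result.

```python
# Illustrative sketch of Embodiment 3 (names and data are assumptions):
# compound kernel A joins tables m and n, storing the intermediate result in
# (simulated) GPU main memory; kernel B then projects over that result.

gpu_main_memory = {}


def kernel_a_join(m, n):
    """First logic step: join m and n on their 'id' field."""
    joined = [{**row_m, **row_n}
              for row_m in m for row_n in n
              if row_m["id"] == row_n["id"]]
    gpu_main_memory["A_out"] = joined  # intermediate kept in GPU main memory
    return joined


def kernel_b_project(columns):
    """Next logic step: projection over kernel A's intermediate result."""
    rows = gpu_main_memory["A_out"]
    return [{c: r[c] for c in columns} for r in rows]


m = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
n = [{"id": 1, "price": 10}, {"id": 3, "price": 30}]

kernel_a_join(m, n)
print(kernel_b_project(["name", "price"]))  # [{'name': 'a', 'price': 10}]
```

Kernel B never touches the host: it reads its input directly from where kernel A left it, which is the chaining the embodiment describes.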
Other steps and parameter are the same as one or two specific embodiments.
The present invention may also have various other embodiments. Without departing from the spirit and substance of the invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, but all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (3)

1. A compiled query processing method in an image processor environment, characterized in that it specifically includes the following steps:
Step 1: constructing compound kernels in the GPU; a kernel denotes a core;
Step 2: constructing multiple channels, each channel consisting of multiple compound kernels constructed in step 1, each compound kernel being able to execute complex logic;
Step 3: setting the channels to process data in parallel;
Step 4: when the GPU receives a compiled-query instruction, allocating one channel to that instruction, each compound kernel executing one logic step, while intermediate data is stored in the GPU's main memory.
2. The compiled query processing method in an image processor environment according to claim 1, characterized in that, in step 1, each compound kernel contains computation logic, data-loading logic, and write-back logic.
3. The compiled query processing method in an image processor environment according to claim 1 or claim 2, characterized in that each core executing one logic step in step 4 specifically means:
Compound kernel A executes the first logic step and stores the intermediate result in the GPU's main memory; then kernel B, connected to kernel A, performs the next operation, and so on.
CN201910678918.4A 2019-07-23 2019-07-23 A kind of compiled query processing method in image processor environment Pending CN110389784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910678918.4A CN110389784A (en) 2019-07-23 2019-07-23 A kind of compiled query processing method in image processor environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910678918.4A CN110389784A (en) 2019-07-23 2019-07-23 A kind of compiled query processing method in image processor environment

Publications (1)

Publication Number Publication Date
CN110389784A true CN110389784A (en) 2019-10-29

Family

ID=68287530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910678918.4A Pending CN110389784A (en) 2019-07-23 2019-07-23 A kind of compiled query processing method in image processor environment

Country Status (1)

Country Link
CN (1) CN110389784A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526934A (en) * 2009-04-21 2009-09-09 浪潮电子信息产业股份有限公司 Construction method of GPU and CPU combined processor
CN102201008A (en) * 2011-06-17 2011-09-28 中国科学院软件研究所 GPU (graphics processing unit)-based quick star catalogue retrieving method
WO2012142186A2 (en) * 2011-04-11 2012-10-18 Child Timothy Database acceleration using gpu and multicore cpu systems and methods
CN102938653A (en) * 2012-11-13 2013-02-20 航天恒星科技有限公司 Parallel RS decoding method achieved through graphics processing unit (GPU)
CN103336959A (en) * 2013-07-19 2013-10-02 西安电子科技大学 Vehicle detection method based on GPU (ground power unit) multi-core parallel acceleration
CN104615576A (en) * 2015-03-02 2015-05-13 中国人民解放军国防科学技术大学 CPU+GPU processor-oriented hybrid granularity consistency maintenance method
CN105069015A (en) * 2015-07-13 2015-11-18 山东超越数控电子有限公司 Web acceleration technology implementation method of domestic platform


Similar Documents

Publication Publication Date Title
Funke et al. Pipelined query processing in coprocessor environments
Koliousis et al. Saber: Window-based hybrid stream processing for heterogeneous architectures
Heimel et al. Hardware-oblivious parallelism for in-memory column-stores
Yuan et al. Spark-GPU: An accelerated in-memory data processing engine on clusters
Lee et al. Ysmart: Yet another sql-to-mapreduce translator
US9298768B2 (en) System and method for the parallel execution of database queries over CPUs and multi core processors
Wu et al. Optimizing data warehousing applications for GPUs using kernel fusion/fission
CN103309958B (en) The star-like Connection inquiring optimization method of OLAP under GPU and CPU mixed architecture
EP2585950B1 (en) Apparatus and method for data stream processing using massively parallel processors
Furst et al. Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries
US20180150515A1 (en) Query Planning and Execution With Source and Sink Operators
US20100293135A1 (en) Highconcurrency query operator and method
CN102981807B (en) Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment
Rosenfeld et al. Query processing on heterogeneous CPU/GPU systems
CN106446134B (en) Local multi-query optimization method based on predicate specification and cost estimation
US20220342712A1 (en) Method for Processing Task, Processor, Device and Readable Storage Medium
CN104361118A (en) Mixed OLAP (on-line analytical processing) inquiring treating method adapting coprocessor
CN101556534A (en) Large-scale data parallel computation method with many-core structure
CN104731729B (en) A kind of table connection optimization method, CPU and accelerator based on heterogeneous system
Cheng et al. SCANRAW: A database meta-operator for parallel in-situ processing and loading
Yang et al. Efficient FPGA-based graph processing with hybrid pull-push computational model
Kumaigorodski et al. Fast CSV loading using GPUs and RDMA for in-memory data processing
Shehab et al. Accelerating relational database operations using both CPU and GPU co-processor
Breß et al. Exploring the design space of a GPU-aware database architecture
CN110389784A (en) A kind of compiled query processing method in image processor environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191029