WO2016041126A1 - Method and device for processing data stream based on gpu - Google Patents

Method and device for processing data stream based on gpu Download PDF

Info

Publication number
WO2016041126A1
WO2016041126A1 (PCT/CN2014/086523)
Authority
WO
WIPO (PCT)
Prior art keywords
operation operator
operator
subtask
data
group
Prior art date
Application number
PCT/CN2014/086523
Other languages
French (fr)
Chinese (zh)
Inventor
邓利群
朱俊华
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2014/086523 priority Critical patent/WO2016041126A1/en
Priority to CN201480038261.0A priority patent/CN105637482A/en
Publication of WO2016041126A1 publication Critical patent/WO2016041126A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements

Definitions

  • the embodiments of the present invention relate to computer technologies, and in particular, to a data processing method and apparatus based on a graphics processing unit (GPU).
  • GPU: graphics processing unit
  • GPUs serve as coprocessors or accelerators in general-purpose computing fields (such as databases, data compression, etc.)
  • CPU: central processing unit
  • Compared with the CPU, the GPU offers more concurrent threads and higher memory bandwidth, making it better suited to large-scale data-parallel or compute-parallel tasks.
  • Stream processing tasks arrive continuously and concurrently, but the computational complexity of a single stream processing task is small. Therefore, when the GPU is used to accelerate data stream processing, it must be scheduled frequently, and the GPU scheduling overhead is large.
  • The embodiments of the invention provide a GPU-based data stream processing method and device that reduce the scheduling overhead of the GPU and improve the throughput of the stream data processing system.
  • a first aspect of the embodiments of the present invention provides a GPU-based data stream processing method, including:
  • receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic;
  • merging the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
  • scheduling the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • Before the merging of the first subtask and the at least one second subtask into one merge task, the method further includes:
  • the data stream processed by the first operation operator is a first data stream
  • The merging of the first subtask and the at least one second subtask into one merge task includes:
  • if the kernel file includes a second operation operator that is the same as the first operation operator, merging the first subtask and the at least one second subtask into one merge task.
  • the adding the first operation operator to the kernel file includes:
  • the first candidate operation operator group is not included in the kernel file, the first candidate operation operator group is created, and the first operation operator is added to the first candidate operation operator group;
  • the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  • The adding of the first operation operator to one of the first candidate operation operator groups includes:
  • the first preset rule is any one of the following rules:
  • The method further includes: if the first operation operator cannot be added to the at least one first candidate operation operator group in the kernel file, regrouping the first operation operator and each operation operator in the at least one first candidate operation operator group according to a second preset rule.
  • The regrouping of the first operation operator and each operation operator in the at least one first candidate operation operator group according to the second preset rule includes:
  • storing operation operators whose differences in execution cost per unit of data fall within a preset range in the same first candidate operation operator group.
  • The merging of the first subtask and the at least one second subtask into one merge task includes:
  • merging the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merging the operation data of the subtasks corresponding to each operation operator in the first operation operator group into the same data structure;
  • generating metadata information from the number of subtasks, the number of data records in each subtask, and the length of each data record.
  • a second aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
  • a receiving module configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and an operation logic of the first operation operator is a first operation logic;
  • a merging module configured to merge the first subtask and the at least one second subtask into one merge task, where an operation logic of the second subtask is the same as the first operation logic
  • a processing module configured to schedule the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • the merging module is further configured to determine that there is no result dependency between the data records of the first operation data in the first subtask;
  • the data stream processed by the first operation operator is a first data stream
  • the merge module includes:
  • a determining unit configured to determine whether the second operation operator is the same as the first operation operator, wherein the operation logic of the second operation operator is the same as the first operation logic,
  • the data stream processed by the second operation operator is the first data stream;
  • a first merging unit configured to, if the kernel file does not include a second operation operator identical to the first operation operator, add the first operation operator to the kernel file, and merge the first subtask and the at least one second subtask into one merge task;
  • a second merging unit configured to merge the first subtask and the at least one second subtask into one merge task if the kernel includes a second operation operator that is the same as the first operation operator.
  • the first merging unit is specifically configured to: if there is at least one first candidate operation operator group in the kernel file, Adding the first operation operator to one of the first candidate operation operator groups; if the first candidate operation operator group is not in the kernel file, creating the first candidate operation operator group, Adding the first operation operator to the first candidate operation operator group; wherein operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  • the first merging unit is specifically configured to: if there are at least two first candidate operation operator groups in the kernel file, select a first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
  • the first preset rule is any one of the following: selecting the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group with the smallest average amount of data per operation operator as the first operation operator group.
  • the first merging unit is further configured to: if the first operation operator cannot be added to the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and each operation operator in the at least one first candidate operation operator group according to the second preset rule.
  • the first merging unit is specifically configured to calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator, and to store operation operators whose differences in execution cost per unit of data fall within a preset range in the same first candidate operation operator group.
  • the merging module is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to each operation operator in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to each operation operator in the group, the number of data records in each subtask, and the length of each data record.
  • a third aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
  • a processor, a memory, and a system bus, where the processor and the memory are connected by the system bus and communicate with each other;
  • the memory is configured to store computer-executable instructions;
  • the processor is configured to run the computer-executable instructions to cause the GPU-based data stream processing apparatus to perform the method in any of the possible implementations of the first aspect.
  • The GPU-based data stream processing method and apparatus provided by the embodiments of the present invention merge multiple subtasks with the same operation logic into one merge task and invoke the GPU to perform data stream processing on the merge task, thereby reducing the scheduling frequency of the GPU, reducing the scheduling overhead of the GPU, and improving the throughput of the streaming data processing system.
  • FIG. 1 is a schematic flowchart of Embodiment 1 of a GPU-based data stream processing method according to the present invention;
  • FIG. 2 is a schematic flowchart of Embodiment 2 of a GPU-based data stream processing method according to the present invention;
  • FIG. 3 is a schematic diagram of data combining results of the present invention;
  • FIG. 4 is a schematic flowchart of Embodiment 3 of a GPU-based data stream processing method according to the present invention;
  • FIG. 5 is a schematic structural diagram of Embodiment 1 of a GPU-based data stream processing apparatus according to the present invention;
  • FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention.
  • The present invention merges multiple subtasks that can be merged into one merge task and schedules the GPU to perform data stream processing on the merged task, thereby reducing the frequency of scheduling the GPU and reducing the GPU scheduling overhead. Further, because the amount of data to be processed by the merge task becomes large and is processed uniformly, the large-scale parallel processing capability of the GPU can be fully utilized, thereby improving the processing throughput of the system.
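  • The overall idea above — batch subtasks that share operation logic, then launch the GPU once per batch — can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the names `Subtask`, `merge_and_dispatch`, and `dispatch_to_gpu` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Subtask:
    stream_id: str   # data stream the subtask belongs to
    logic: str       # operation logic, e.g. "selection" or "mapping"
    records: list    # operation data (the records to process)

def merge_and_dispatch(subtasks, dispatch_to_gpu):
    """Group subtasks by operation logic and dispatch each group once.

    Returns the number of GPU invocations, which is the number of distinct
    operation logics rather than the number of subtasks.
    """
    groups = defaultdict(list)
    for t in subtasks:
        groups[t.logic].append(t)
    calls = 0
    for logic, group in groups.items():
        merged_records = [r for t in group for r in t.records]
        dispatch_to_gpu(logic, merged_records)  # one kernel launch per merge task
        calls += 1
    return calls
```

With three subtasks spanning two operation logics, only two dispatches occur instead of three, which is the scheduling-frequency reduction the passage describes.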
  • First subtask: the task currently received by the CPU.
  • First operation data: the data to be processed included in the first subtask.
  • First operation operator: the operation operator of the first subtask.
  • First operation logic: the operation logic of the first operation operator.
  • The operation logic refers to what kind of processing is performed. That is, an operation operator has two attributes: 1. the data stream it processes; 2. its processing mode. The operation logic refers to the processing mode; therefore, different operation operators may well have the same operation logic.
  • The operation operator group in which the first operation operator in the GPU kernel file is located is referred to as the "first operation operator group", and a group in the kernel file whose operation operators have the same operation logic as the first operation operator is referred to as a "first candidate operation operator group". Here, a kernel file refers to a collection for storing kernel information.
  • a task having the same operational logic as the first operational logic among the tasks to be processed in the system is referred to as a "second subtask.”
  • An operation operator whose operation logic is the same as that of the first operation operator, and whose processed data belongs to the same data stream as the data processed by the first operation operator, is referred to as a "second operation operator".
  • FIG. 1 is a schematic flowchart of a GPU-based data stream processing method according to Embodiment 1 of the present invention.
  • the execution body of the embodiment is a CPU, and the method in this embodiment is as follows:
  • each data stream has a memory buffer, and data from the same data stream is stored in the same memory buffer.
  • the stream processing system sends a task request (ie, the first subtask) to the CPU, requesting the CPU to process the data in the buffer.
  • The sizes of the first preset threshold and the second preset threshold are determined according to the actual application; the present invention does not limit this.
  • the first subtask includes first operational data and a first operational operator, and the operational logic of the first operational operator is the first operational logic.
  • The first operation data is the data of the data stream that needs to be processed in the first subtask, and the first operation operator specifies what operation is performed on the first operation data, such as a selection operation or a mapping operation.
  • S102 Combine the first subtask and the at least one second subtask into one merge task.
  • the operation logic of the second subtask is the same as the first operation logic.
  • S103: Schedule the GPU to perform data stream processing on the foregoing merge task.
  • In this embodiment, the GPU is scheduled once to perform data stream processing on the merge task, rather than once for each subtask as in the prior art, thereby reducing the frequency of scheduling the GPU and reducing the GPU scheduling overhead; and because the amount of data to be processed by the merge task becomes larger, the large-scale parallel processing capability of the GPU can be fully utilized, thereby improving the processing throughput of the system.
  • Optionally, before S102 is executed, the method further includes: determining that there is no result dependency between the data records of the first operation data in the first subtask.
  • Having no result dependency means that the data records in the first subtask can be processed in parallel and the processing result of each data record does not affect the processing results of the other data records, for example, in a selection operation or a mapping operation. That is, only if the processing of the data records of the first operation data of the first subtask has no result dependency can the first subtask be merged with other subtasks having the same operation logic. If there is a result dependency between the data records of the first operation data in the first subtask, the GPU is directly scheduled to process the first subtask, which is not merged with other subtasks.
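  • The mergeability condition above reduces to a per-operator classification: record-wise independent operators (such as selection and mapping) qualify for merging, while operators whose output depends on all records do not. A minimal sketch, with the classification set `INDEPENDENT_OPS` assumed for illustration and not taken from the patent:

```python
# Operators whose per-record results are mutually independent (assumed set);
# the patent names selection and mapping as examples of such operators.
INDEPENDENT_OPS = {"selection", "mapping"}

def can_merge(operator_kind: str) -> bool:
    """Return True if subtasks of this operator kind may be merged.

    An aggregation-style operator, whose result depends on every record,
    would fail this check and be dispatched to the GPU on its own.
    """
    return operator_kind in INDEPENDENT_OPS
```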
  • Optionally, a specific implementation of S102 is: determining whether the kernel file includes a second operation operator that is the same as the first operation operator.
  • The operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream. That is, two operation operators are the same only if they satisfy two conditions: 1. their operation logic is the same; 2. the data they process belongs to the same data stream.
  • If the GPU kernel (Kernel) file does not include a second operation operator that is the same as the first operation operator, this indicates that the system has not previously executed a subtask of the first data stream; the first operation operator is added to the kernel file, and the first subtask and the at least one second subtask are merged into one merge task. If the kernel file does include such a second operation operator, this indicates that the system has previously processed a subtask of the first data stream, and the process of merging the first subtask and the at least one second subtask into one merge task is performed directly.
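  • The two-condition sameness check and the resulting register-or-skip decision can be sketched as follows. This is an illustrative Python sketch; the kernel file is modeled as a plain list of operator records, and the names `find_matching_operator` and `register_operator` are hypothetical.

```python
def find_matching_operator(kernel_file, logic, stream_id):
    """Return an existing operator matching BOTH conditions, else None.

    An operator counts as "the same" only if its operation logic matches
    and it processes the same data stream.
    """
    for op in kernel_file:  # kernel_file: list of {"logic": ..., "stream": ...}
        if op["logic"] == logic and op["stream"] == stream_id:
            return op
    return None

def register_operator(kernel_file, logic, stream_id):
    """Add the first operation operator only when no identical one exists."""
    if find_matching_operator(kernel_file, logic, stream_id) is None:
        kernel_file.append({"logic": logic, "stream": stream_id})
```

Note that the same logic on a different stream is treated as a distinct operator, matching condition 2 above.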
  • FIG. 2 is a schematic flowchart of Embodiment 2 of the GPU-based data stream processing method according to the present invention.
  • the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  • The method for determining whether the first operation operator can be added to a first candidate operation operator group may depend on the specific application. For example, for an application with strict processing-delay requirements, it must be considered whether adding the first operation operator would impose a large delay cost on subsequent merge tasks of that first candidate operation operator group. That is, if, after the first operation operator is added to a first candidate operation operator group, the estimated delay of a merge task based on that group exceeds the maximum delay requirement of some operation operator in the group, the first operation operator cannot be added to that group.
  • If the first operation operator can be added to some first candidate operation operator group, the at least one first candidate operation operator group can accept the first operation operator. If the first operation operator cannot be added to any first candidate operation operator group in the kernel file, it is determined that the at least one first candidate operation operator group cannot accept the first operation operator.
  • The first preset rule may be: selecting, from the at least two first candidate operation operator groups, the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting, from the at least two first candidate operation operator groups, the first candidate operation operator group with the smallest average amount of data per operation operator as the first operation operator group.
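  • The two candidate selection rules above amount to two different `min` criteria over the candidate groups. A minimal sketch, with the group record layout (`"operators"`, `"total_data"` fields) assumed for illustration:

```python
def pick_group_fewest_operators(groups):
    """Rule 1: choose the candidate group containing the fewest operators."""
    return min(groups, key=lambda g: len(g["operators"]))

def pick_group_smallest_avg_data(groups):
    """Rule 2: choose the group with the smallest average data per operator."""
    return min(groups, key=lambda g: g["total_data"] / len(g["operators"]))
```

The two rules can disagree: a group with fewer operators may still carry more data per operator, so which rule applies is a configuration choice.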
  • S204: Regroup each operation operator in the at least one first candidate operation operator group together with the first operation operator.
  • the first operation operator and each operation operator in the at least one first candidate operation operator group are regrouped according to the second preset rule.
  • Regrouping according to the second preset rule includes the following steps: 1) calculating the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator; 2) storing operation operators whose differences in execution cost per unit of data are within a preset range in the same operation operator group.
  • Storing operation operators whose differences in execution cost per unit of data are within the preset range in the same first candidate operation operator group keeps the GPU threads as load balanced as possible during execution.
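  • The cost-based regrouping step can be sketched as a one-pass clustering over operators sorted by their per-unit-data execution cost. This is an illustrative Python sketch under assumed inputs (a cost table and a tolerance); the grouping strategy shown (cluster relative to each group's cheapest member) is one plausible reading of the "difference within a preset range" rule, not the patent's exact algorithm.

```python
def regroup_by_cost(costs, tolerance):
    """Group operators whose unit-data execution costs are close.

    costs: {operator_name: execution cost per unit of data}
    tolerance: preset range; operators join a group while their cost stays
    within `tolerance` of that group's cheapest member.
    Returns a list of operator-name groups.
    """
    groups = []
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if groups and cost - groups[-1]["base"] <= tolerance:
            groups[-1]["members"].append(name)   # similar cost: same group
        else:
            groups.append({"base": cost, "members": [name]})  # start new group
    return [g["members"] for g in groups]
```

Grouping similar-cost operators means the GPU threads executing a merged task finish at roughly the same time, which is the load-balancing motivation stated above.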
  • S205 Create a first candidate operation operator group, and add the first operation operator to the first candidate operation operator group.
  • If the kernel file does not include a first candidate operation operator group, the first candidate operation operator group is created and the first operation operator is added to it; that is, the first operation operator is the first operator in the first candidate operation operator group.
  • Merging the first subtask and the at least one second subtask into one merge task specifically includes merging the operation logic and merging the corresponding data to be processed.
  • The operation logic of all the operation operators in the first operation operator group is merged into one merged operation logic, and the operation data to be processed corresponding to each operation operator in the group is merged into the same data structure.
  • Metadata information is generated from the number of subtasks, the number of data records in each subtask, and the length of each data record.
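  • The merged data structure with its per-stream metadata (the "position", "count", and "length" fields described for FIG. 3 below) can be sketched as follows. The field names follow the patent's description; the concrete layout and the `build_merged_task` helper are assumptions for illustration.

```python
def build_merged_task(stream_records, record_length):
    """Concatenate per-stream records and record per-stream metadata.

    stream_records: {stream_id: [record, ...]}
    record_length:  {stream_id: fixed length of each record in that stream}
    """
    data, position, count, length = [], [], [], []
    offset = 0
    for sid, records in stream_records.items():
        position.append(offset)            # start offset of this stream's records
        count.append(len(records))         # number of records from this stream
        length.append(record_length[sid])  # length of each record in this stream
        data.extend(records)
        offset += len(records)
    return {"data": data, "position": position, "count": count, "length": length}
```

The metadata arrays have one entry per merged data stream, so, as noted below, they occupy very little space compared with the data itself.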
  • CUDA: Compute Unified Device Architecture
  • The general interface of the merged selection operation operator can be defined as follows:
  • the "mergedData” data structure includes the data records to be processed by all the merged operation operators and the corresponding metadata
  • "n” is the total number of all data records in mergedData
  • “result” is still used to save Select the result of the operation
  • filters is an array of functions, which in turn records the filter function operations corresponding to each data stream.
  • the schema of data stream A is defined as follows:
  • the data field "data” stores the data stream data records of each input in the form of a byte stream and its storage in the GPU memory is as shown in FIG. 3 is a schematic diagram of data combining results of the present invention,
  • the number of data records of data stream A is nA
  • the number of data records of data stream B is nB
  • the number of data records of data stream C is nC.
  • the dimensions of the "position”, “count”, and “length” fields are equal to the number of merged data streams, so they take up very little space.
  • the "MergedSelection” generic interface can be implemented in the following ways:
  • The data record to be processed is determined by the thread ID: according to the thread ID, the data stream to which the record belongs and its corresponding metadata information, for example its start address in the data stream, are determined so that the data record can be read correctly, and then the corresponding filter function is called for processing.
  • The "MergedSelection" interface (other merge interfaces, such as the "MergedProjection" interface for the "Projection" operation, are similar) can be pre-compiled, with only the specific parameters passed at runtime when it is invoked dynamically. This allows the original multiple subtasks to be merged into a single batch task.
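  • The thread-ID dispatch described above can be simulated on the CPU: each simulated thread uses its ID to locate its record, finds which stream the record belongs to via the "position" metadata, and applies that stream's filter. This is an illustrative Python sketch of the described behavior, not the patent's CUDA code; the interface shape of `merged_selection` is an assumption.

```python
import bisect

def merged_selection(merged, filters):
    """Apply each stream's filter to its records in one merged pass.

    merged:  dict with "data" (all records, concatenated per stream) and
             "position" (start offset of each stream's records, ascending).
    filters: one predicate per merged data stream, in stream order.
    """
    n = len(merged["data"])
    result = [False] * n
    for tid in range(n):  # tid plays the role of the GPU thread ID
        # bisect on "position" finds which stream this record belongs to
        stream_idx = bisect.bisect_right(merged["position"], tid) - 1
        result[tid] = filters[stream_idx](merged["data"][tid])
    return result
```

On a GPU, the loop body would run once per thread in a single kernel launch, which is how one launch services every merged subtask at once.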
  • FIG. 4 is a schematic flowchart of a method for processing a GPU-based data stream according to a third embodiment of the present invention.
  • FIG. 4 is an example of a selection operation. As shown in FIG. 4, the method in this embodiment includes:
  • S401 Collect input data of the data stream corresponding to each operation operator to be merged and corresponding metadata information.
  • The metadata information includes the number of records, the length of each record, and the like.
  • the apparatus in this embodiment includes a receiving module 501, a merging module 502, and a processing module 503, where the receiving module 501 is configured to receive the first subtask.
  • the first subtask includes the first operation data and the first operation operator, and the operation logic of the first operation operator is the first operation logic;
  • the merging module 502 is configured to merge the first subtask and the at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic;
  • the processing module 503 is configured to schedule the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • The device of this embodiment correspondingly implements the technical solution of the method embodiment shown in FIG. 1; the implementation principle and technical effects are similar and are not described here again.
  • the merging module 502 is further configured to determine that there is no result dependency between the data records of the first operation data in the first subtask.
  • the data stream processed by the first operation operator is a first data stream;
  • the merging module 502 further includes: a determining unit, a first merging unit, and a second merging unit, wherein the determining unit is configured to determine Whether the kernel operation file includes the same second operation operator as the first operation operator, wherein the operation logic of the second operation operator is the same as the first operation logic, and the data flow processed by the second operation operator is The first data stream;
  • the first merging unit is configured to add the first operation operator to the kernel file if the kernel file does not include the second operation operator that is the same as the first operation operator, The first subtask and the at least one second subtask are combined into one merge task;
  • the second merging unit is configured to: if the kernel file includes the second operation operator that is the same as the first operation operator, merge the first subtask and the at least one second subtask into one merge task.
  • the first merging unit is specifically configured to add the first operation operator to one of the first candidate operation operator groups if at least one first candidate operation operator group is included in the kernel file.
  • the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic; if the first candidate operation operator group is not included in the kernel file, the first candidate operation operator group is created. And adding the first operation operator to the first candidate operation operator group.
  • the first merging unit is specifically configured to: if there are at least two first candidate operation operator groups in the kernel file, from the at least two first candidate operation operator groups according to the first preset rule. Selecting a first operation operator group; adding the first operation operator to the first operation operator group.
  • The first preset rule is any one of the following rules: selecting the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group with the smallest average amount of data per operation operator as the first operation operator group.
  • the first merging unit is further configured to: if the first operation operator cannot be added to the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and each operation operator in the at least one first candidate operation operator group according to the second preset rule.
  • the first merging unit is specifically configured to calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator, and to store operation operators whose differences in execution cost per unit of data are within the preset range in the same first candidate operation operator group.
  • the merging module 502 is specifically configured to: when the first subtask is triggered, merge the operation logic of the operation operators in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to each operation operator in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to each operation operator in the group, the number of data records in each subtask, and the length of each data record.
  • the device in this embodiment is correspondingly used to implement the technical solution of the method embodiment shown in FIG. 2, and the implementation principle and the technical effect are similar, and details are not described herein again.
  • FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention.
  • the GPU-based data stream processing apparatus 600 of the present embodiment includes: a processor 601, a memory 602, and a system bus 603.
  • the processor 601 and the memory 602 are connected through the system bus 603 and communicate with each other;
  • the memory 602 is configured to store computer-executable instructions 6021;
  • the processor 601 is configured to run the computer-executable instructions 6021 to cause the GPU-based data stream processing apparatus to perform the following method:
  • receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic; merging the first subtask and the at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and scheduling the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • the device of this embodiment is correspondingly used to implement the technical solution of the method embodiment shown in FIG. 1 , and the implementation principle and technical effects thereof are similar, and details are not described herein again.
  • the processor 601 is specifically configured to determine that there is no result dependency between the data records of the first operation data in the first subtask.
  • the data stream processed by the first operation operator is a first data stream;
  • the processor 601 is specifically configured to determine whether the kernel file includes a second operation operator that is the same as the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
  • the kernel file does not include the second operation operator that is the same as the first operation operator, the first operation operator is added to the kernel file, and the first subtask and the at least one second subtask are merged. For a combined task;
  • the kernel includes the second operation operator that is the same as the first operation operator, the first subtask and the at least one second subtask are combined into one merge task.
  • the processor 601 is specifically configured to: if the kernel file has at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups, where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic;
  • if the kernel file does not include a first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to it.
  • the processor 601 is specifically configured to: if the kernel file has at least two first candidate operation operator groups, select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
  • the first preset rule is either of the following rules: select the first candidate operation operator group containing the fewest operation operators as the first operation operator group; or select the first candidate operation operator group whose operation operators correspond to the smallest average data amount as the first operation operator group.
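The two preset selection rules above can be sketched as follows. This is an illustrative sketch only: the dictionary layout and the rule names (`"fewest_operators"`, `"min_avg_data"`) are assumptions, not part of the embodiment.

```python
def select_group(candidate_groups, rule="fewest_operators"):
    """Pick a first operation operator group from the candidate groups
    using one of the two preset rules (hypothetical representation)."""
    if rule == "fewest_operators":
        # Rule 1: the group containing the fewest operation operators.
        return min(candidate_groups, key=lambda g: len(g["operators"]))
    if rule == "min_avg_data":
        # Rule 2: the group whose operators have the smallest average data amount.
        return min(
            candidate_groups,
            key=lambda g: sum(op["data_amount"] for op in g["operators"])
            / len(g["operators"]),
        )
    raise ValueError("unknown rule: %s" % rule)
```

Note that the two rules can disagree: a group with one large operator wins rule 1 but may lose rule 2.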
  • the processor 601 is specifically configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
  • the processor 601 is specifically configured to calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator, and to store operation operators whose execution-cost difference is within a preset range in the same first candidate operation operator group.
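As a rough illustration of the cost-based regrouping, the pass below places operators whose per-unit-data execution costs differ by at most a preset range into the same candidate group. The greedy sort-and-sweep strategy and the field names are assumptions for illustration; the embodiment does not prescribe a specific partitioning algorithm.

```python
def regroup_by_cost(operators, cost_range):
    """Re-partition operators so that operators whose per-unit-data
    execution costs differ by at most cost_range share a group (sketch)."""
    groups = []
    for op in sorted(operators, key=lambda o: o["unit_cost"]):
        # Greedily join the current group while the cost spread
        # (relative to the group's cheapest operator) stays in range.
        if groups and op["unit_cost"] - groups[-1][0]["unit_cost"] <= cost_range:
            groups[-1].append(op)
        else:
            groups.append([op])
    return groups
```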
  • the processor 601 is specifically configured to: when the first subtask is triggered, combine the operation logic of the operation operators in the first operation operator group into one merged operation logic, and combine the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
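The data merging and metadata generation described above can be sketched as follows. Fixed-length records and the specific metadata fields (`offset`, `num_records`, `record_len`) are simplifying assumptions about the layout; the embodiment only requires that the metadata let the GPU locate each subtask's data in the shared structure.

```python
def build_merged_task(subtasks):
    """Concatenate each subtask's operation data into one flat buffer and
    generate metadata describing where each subtask's slice lives (sketch)."""
    flat = []
    metadata = []
    for t in subtasks:
        offset = len(flat)
        for rec in t["records"]:
            flat.extend(rec)
        metadata.append({
            "offset": offset,                    # storage location in the shared structure
            "num_records": len(t["records"]),    # data record count of this subtask
            "record_len": len(t["records"][0]),  # length of each (fixed-size) record
        })
    return flat, metadata
```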
  • the device of this embodiment is correspondingly configured to implement the technical solution of the method embodiment shown in FIG. 2; its implementation principle and technical effects are similar, and details are not described herein again.
  • the embodiment of the present invention further provides a computer-readable medium including computer-executable instructions that cause the GPU-based data stream processing apparatus to perform the GPU-based data stream processing methods according to Embodiments 1 to 3 of the present invention.
  • the aforementioned program can be stored in a computer-readable storage medium.
  • when executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Abstract

A method and device for processing a data stream based on a GPU merge multiple subtasks with the same operation logic into one merge task and schedule the GPU to perform data stream processing on the merge task, so as to reduce the GPU's scheduling frequency and scheduling overhead and improve the throughput of the stream data processing system.

Description

GPU-based data stream processing method and device

Technical field
The embodiments of the present invention relate to computer technologies, and in particular, to a data stream processing method and apparatus based on a graphics processing unit (GPU).
Background
At present, applying GPUs as coprocessors or accelerators to general-purpose computing fields (such as databases and data compression) has become a major trend in the industry. Compared with the central processing unit (CPU), the GPU offers far more concurrent threads and higher memory bandwidth, making it better suited to large-scale data-parallel or compute-parallel tasks.
However, in application scenarios with many data streams and high data generation frequency, stream processing tasks are continuous and highly concurrent, yet each individual stream processing task involves little computation. Therefore, when a GPU is used to accelerate data stream processing, it must be scheduled frequently, which incurs a large GPU scheduling overhead.
Summary of the invention
The embodiments of the present invention provide a GPU-based data stream processing method and device to reduce GPU scheduling overhead and improve the throughput of a stream data processing system.
A first aspect of the embodiments of the present invention provides a GPU-based data stream processing method, including:
receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic;
merging the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
scheduling a graphics processor (GPU) to perform data stream processing on the merge task.
With reference to the first aspect, in a first possible implementation, before the merging of the first subtask and the at least one second subtask into one merge task, the method further includes:
determining that there is no result dependency among the data records of the first operation data in the first subtask.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the data stream processed by the first operation operator is a first data stream;
the merging of the first subtask and the at least one second subtask into one merge task includes:
determining whether the kernel file includes a second operation operator that is the same as the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream;
if the kernel file does not include a second operation operator that is the same as the first operation operator, adding the first operation operator to the kernel file and merging the first subtask and the at least one second subtask into one merge task; and
if the kernel file includes a second operation operator that is the same as the first operation operator, merging the first subtask and the at least one second subtask into one merge task.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the adding of the first operation operator to the kernel file includes:
if the kernel file has at least one first candidate operation operator group, adding the first operation operator to one of the first candidate operation operator groups; or
if the kernel file has no first candidate operation operator group, creating a first candidate operation operator group and adding the first operation operator to it;
where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, if the kernel file has at least two first candidate operation operator groups, the adding of the first operation operator to one of the first candidate operation operator groups includes:
selecting one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule; and
adding the first operation operator to the first operation operator group.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the first preset rule is either of the following rules:
selecting the first candidate operation operator group containing the fewest operation operators as the first operation operator group; or
selecting the first candidate operation operator group whose operation operators correspond to the smallest average data amount as the first operation operator group.
With reference to the third possible implementation of the first aspect, in a sixth possible implementation, the method further includes: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regrouping the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, the regrouping of the first operation operator and the operation operators in the at least one first candidate operation operator group according to the second preset rule includes:
calculating the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator; and
storing operation operators whose execution-cost difference is within a preset range in the same first candidate operation operator group.
With reference to any one of the fourth to seventh possible implementations of the first aspect, in an eighth possible implementation, the merging of the first subtask and the at least one second subtask into one merge task includes:
when the first subtask is triggered, combining the operation logic of the operation operators in the first operation operator group into one merged operation logic, and combining the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and
generating metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
A second aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
a receiving module, configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic;
a merging module, configured to merge the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
a processing module, configured to schedule a graphics processor (GPU) to perform data stream processing on the merge task.
With reference to the second aspect, in a first possible implementation, the merging module is further configured to determine that there is no result dependency among the data records of the first operation data in the first subtask.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the data stream processed by the first operation operator is a first data stream;
the merging module includes:
a determining unit, configured to determine whether the kernel file includes a second operation operator that is the same as the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream;
a first merging unit, configured to: if the kernel file does not include a second operation operator that is the same as the first operation operator, add the first operation operator to the kernel file and merge the first subtask and the at least one second subtask into one merge task; and
a second merging unit, configured to: if the kernel file includes a second operation operator that is the same as the first operation operator, merge the first subtask and the at least one second subtask into one merge task.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the first merging unit is specifically configured to: if the kernel file has at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups; or, if the kernel file has no first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to it; where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the first merging unit is specifically configured to: if the kernel file has at least two first candidate operation operator groups, select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the first preset rule is either of the following rules: selecting the first candidate operation operator group containing the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group whose operation operators correspond to the smallest average data amount as the first operation operator group.
With reference to the third possible implementation of the second aspect, in a sixth possible implementation, the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
With reference to the sixth possible implementation of the second aspect, in a seventh possible implementation, the first merging unit is specifically configured to: calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator; and store operation operators whose execution-cost difference is within a preset range in the same first candidate operation operator group.
With reference to any one of the fourth to seventh possible implementations of the second aspect, in an eighth possible implementation, the merging module is specifically configured to: when the first subtask is triggered, combine the operation logic of the operation operators in the first operation operator group into one merged operation logic, and combine the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
A third aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
a processor, a memory, and a system bus, where the processor and the memory are connected through the system bus and communicate with each other;
the memory is configured to store a computer-executable instruction; and
the processor is configured to run the computer-executable instruction, so that the GPU-based data stream processing apparatus performs the method according to any one of the possible implementations of the first aspect.
The GPU-based data stream processing method and apparatus provided by the embodiments of the present invention merge multiple subtasks with the same operation logic into one merge task and invoke the GPU to perform data stream processing on the merge task, thereby reducing the GPU's scheduling frequency and scheduling overhead and improving the throughput of the stream data processing system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic flowchart of Embodiment 1 of a GPU-based data stream processing method according to the present invention;
FIG. 2 is a schematic flowchart of Embodiment 2 of a GPU-based data stream processing method according to the present invention;
FIG. 3 is a schematic diagram of a data merging result according to the present invention;
FIG. 4 is a schematic flowchart of Embodiment 3 of a GPU-based data stream processing method according to the present invention;
FIG. 5 is a schematic structural diagram of Embodiment 1 of a GPU-based data stream processing apparatus according to the present invention; and
FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, the claims, and the foregoing accompanying drawings are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present invention described herein can, for example, be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.
To reduce GPU scheduling overhead and improve the throughput of a stream processing system, the present invention merges multiple subtasks that can be merged into one merge task and schedules the GPU to perform data stream processing on the merge task. This reduces the frequency at which the GPU is scheduled and therefore the GPU scheduling overhead; further, because the amount of data to be processed by the merge task becomes larger and is processed in a unified manner, the GPU's large-scale parallel processing capability can be fully utilized, which in turn improves the processing throughput of the system.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
For ease of description, in the following embodiments of the present invention, the task currently received by the CPU is referred to as the "first subtask", the to-be-processed data contained in the first subtask is referred to as the "first operation data", the operation operator of the first subtask is referred to as the "first operation operator", and the operation logic of the first operation operator is referred to as the "first operation logic". An operation operator specifies what processing is performed on a particular data stream, while operation logic specifies only the kind of processing; that is, an operation operator has two attributes: (1) the data stream it processes and (2) the processing mode, whereas operation logic refers only to the processing mode. Therefore, different operation operators may have the same operation logic. The operation operator group in which the first operation operator resides in the GPU kernel file (Kernel Profile) is referred to as the "first operation operator group", and a group in the kernel file whose operation operators have the same operation logic as the first operation operator is referred to as a "first candidate operation operator group", where the kernel file is a collection used to store kernel information. Among the to-be-processed tasks in the system, a task whose operation logic is the same as the first operation logic is referred to as a "second subtask". An operation operator that has the same operation logic as the first operation operator and whose processed data belongs to the same data stream as the data processed by the first operation operator is referred to as a "second operation operator".
FIG. 1 is a schematic flowchart of Embodiment 1 of a GPU-based data stream processing method according to the present invention. As shown in FIG. 1, the execution body of this embodiment is a CPU, and the method of this embodiment is as follows:
S101: Receive a first subtask.
In the system memory, a memory buffer is created for each data stream, and data from the same data stream is stored in the same memory buffer. When the data volume of a memory buffer exceeds a first preset threshold, or when the buffering time exceeds a second preset threshold, the stream processing system sends a task request (that is, the first subtask) to the CPU, requesting the CPU to process the data in that buffer. The sizes of the first preset threshold and the second preset threshold depend on the actual application, and the present invention does not limit them.
The first subtask contains first operation data and a first operation operator, and the operation logic of the first operation operator is the first operation logic. The first operation data is the data that needs to be processed in the first subtask of the data stream, and the first operation operator specifies what operation is performed on the first operation data, such as a selection operation or a projection operation.
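The per-stream buffer with its two triggering thresholds can be sketched as follows. This is a minimal illustration: the class name, the record representation (byte strings), and attaching the operation operator in the caller are all assumptions, not details fixed by the embodiment.

```python
import time

class StreamBuffer:
    """Per-stream memory buffer that emits a subtask (task request) when
    the data volume or the buffering time exceeds its preset threshold."""

    def __init__(self, stream_id, max_bytes, max_wait_s, clock=time.monotonic):
        self.stream_id = stream_id
        self.max_bytes = max_bytes    # first preset threshold (data volume)
        self.max_wait_s = max_wait_s  # second preset threshold (buffering time)
        self.clock = clock
        self.records = []
        self.size = 0
        self.first_arrival = None

    def append(self, record):
        """Buffer one record; return a subtask if a threshold is now exceeded."""
        if self.first_arrival is None:
            self.first_arrival = self.clock()
        self.records.append(record)
        self.size += len(record)
        if (self.size >= self.max_bytes
                or self.clock() - self.first_arrival >= self.max_wait_s):
            return self.flush()
        return None

    def flush(self):
        # The emitted subtask carries the operation data; the operation
        # operator (e.g. selection) is attached by the stream processing system.
        subtask = {"stream": self.stream_id, "data": self.records}
        self.records, self.size, self.first_arrival = [], 0, None
        return subtask
```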
S102: Merge the first subtask and at least one second subtask into one merge task.
The operation logic of the second subtask is the same as the first operation logic.
That is, subtasks with the same operation logic are merged into one merge task.
S103: Schedule the GPU to perform data stream processing on the merge task.
In this embodiment, because multiple subtasks with the same operation logic are merged into one merge task and the GPU is scheduled to perform data stream processing on the merge task, compared with the prior art in which the GPU is scheduled once for each subtask, the frequency of scheduling the GPU is lowered and the GPU scheduling overhead is reduced. Moreover, because the amount of data to be processed by the merge task becomes larger, the GPU's large-scale parallel processing capability can be fully utilized, which improves the processing throughput of the system.
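The core effect of steps S101 to S103 can be illustrated with a minimal sketch: group pending subtasks by operation logic and launch the GPU once per merged task rather than once per subtask. The `gpu_launch` callback stands in for the actual kernel launch and is an assumption of this sketch.

```python
from collections import defaultdict

def merge_and_dispatch(subtasks, gpu_launch):
    """Group subtasks by operation logic, then launch the GPU once per
    merged task; returns the results and the number of GPU launches."""
    groups = defaultdict(list)
    for t in subtasks:
        groups[t["logic"]].append(t)

    results = {}
    launches = 0
    for logic, group in groups.items():
        # One merge task: concatenated operation data under a shared logic.
        merged_data = [rec for t in group for rec in t["data"]]
        results[logic] = gpu_launch(logic, merged_data)
        launches += 1
    return results, launches
```

With three subtasks sharing one of two logics, this yields two GPU launches instead of three; with many small subtasks per logic, the reduction in scheduling calls is correspondingly larger.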
In the foregoing embodiment, before S102 is performed, the method further includes: determining that there is no result dependency among the data records of the first operation data in the first subtask. "No result dependency" means that the data records in the first subtask can be processed in parallel and the processing result of each data record does not affect the processing results of the other data records, as is the case for a selection operation or a projection operation. That is, only when there is no result dependency among the data records of the first operation data of the first subtask can the first subtask be merged with other subtasks that have the same operation logic. If there are dependencies among the processing results of the data records of the first operation data in the first subtask, the GPU is directly scheduled to process the first subtask, which is not merged with other subtasks.
In the foregoing embodiment, assume that the data stream processed by the first operation operator is a first data stream. S102 may be implemented as follows: determine whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream. That is, two operation operators are identical only if both of the following conditions hold: (1) their operation logic is the same; and (2) the data they process belongs to the same data stream. If the GPU kernel (Kernel) archive does not contain a second operation operator identical to the first operation operator, the system has not yet executed a subtask of the first data stream; in this case, the first operation operator is added to the kernel archive, and the first subtask and at least one second subtask are merged into one merged task. If the kernel archive does contain a second operation operator identical to the first operation operator, the system has previously processed a subtask of the first data stream, and the first subtask and the at least one second subtask are merged into one merged task directly.
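The two-condition identity test above (same operation logic and same data stream) amounts to a lookup keyed on both attributes. The following C++ sketch is illustrative only; the names `KernelArchive`, `OperatorKey`, and `containsOrAdd` are hypothetical and do not come from the patent:

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical key: an operator is "identical" only if both the
// operation logic and the data stream it processes match.
using OperatorKey = std::pair<std::string /*logic*/, int /*streamId*/>;

struct KernelArchive {
    std::map<OperatorKey, int> operators;  // key -> registered operator id

    // Returns true if an identical operator (same logic, same stream)
    // is already in the archive; otherwise registers the new operator
    // and returns false.
    bool containsOrAdd(const std::string& logic, int streamId, int opId) {
        OperatorKey key{logic, streamId};
        if (operators.find(key) != operators.end())
            return true;               // a subtask of this stream was seen before
        operators.emplace(key, opId);  // first occurrence: add to the archive
        return false;
    }
};
```

Either way, the subtask then proceeds to merging; the lookup only decides whether the archive must first be extended.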
Further, a specific implementation of adding the first operation operator to the kernel archive is shown in FIG. 2. FIG. 2 is a schematic flowchart of Embodiment 2 of the GPU-based data stream processing method according to the present invention.
S201: Determine whether the kernel archive contains at least one first candidate operation operator group. If yes, execute S202; if no, execute S205.
The operation logic of each operation operator in a first candidate operation operator group is the same as the first operation logic.
S202: Determine whether the first operation operator can be added to at least one first candidate operation operator group. If yes, execute S203; if no, execute S204.
Specifically, the method for determining whether the first operation operator can be added to a first candidate operation operator group may depend on the specific application. For example, for an application with strict processing-latency requirements, it must be considered whether adding the first operation operator to a first candidate operation operator group would impose a greater latency cost on subsequent merged tasks of that group. That is, if, after the first operation operator is added to a first candidate operation operator group, the estimated latency of a merged task based on that group would exceed the maximum latency requirement of some operation operator in the group, the first operation operator cannot be added to that first candidate operation operator group; otherwise, the first operation operator can be added to that group, and it is determined that the first operation operator can be added to at least one first candidate operation operator group. If the first operation operator cannot be added to any first candidate operation operator group in the kernel archive, it is determined that the first operation operator cannot be added to the at least one first candidate operation operator group.
S203: Add the first operation operator to the first operation operator group.
When only one first candidate operation operator group can accept the first operation operator, that first candidate operation operator group is determined to be the first operation operator group, and the first operation operator is added to it. When at least two first candidate operation operator groups can accept the first operation operator, one of them is selected as the first operation operator group according to a first preset rule, and the first operation operator is added to the selected group. The first preset rule may be: from the at least two first candidate operation operator groups, select the one containing the fewest operation operators as the first operation operator group; or select the one whose operation operators have the smallest average data volume as the first operation operator group.
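Both variants of the first preset rule are simple arg-min selections over the candidate groups. A minimal C++ sketch, representing each candidate group only by the per-operator data volumes it already contains (a simplifying assumption, not the patent's data model):

```cpp
#include <cstddef>
#include <vector>

// Each candidate group is modeled as the list of data volumes of the
// operation operators it currently holds.
using Group = std::vector<double>;

// Variant 1 of the first preset rule: pick the candidate group that
// currently contains the fewest operation operators.
std::size_t pickFewestOperators(const std::vector<Group>& candidates) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i)
        if (candidates[i].size() < candidates[best].size()) best = i;
    return best;
}

// Variant 2: pick the group whose operators have the smallest average
// data volume.
std::size_t pickSmallestAvgData(const std::vector<Group>& candidates) {
    std::size_t best = 0;
    double bestAvg = -1.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double sum = 0.0;
        for (double v : candidates[i]) sum += v;
        double avg = candidates[i].empty() ? 0.0
                                           : sum / candidates[i].size();
        if (bestAvg < 0.0 || avg < bestAvg) { best = i; bestAvg = avg; }
    }
    return best;
}
```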
S204: Regroup the operation operators in the at least one first candidate operation operator group together with the first operation operator.
If the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel archive, the first operation operator and the operation operators in the at least one first candidate operation operator group are regrouped according to a second preset rule. Specifically, regrouping the operation operators in the at least one first candidate operation operator group according to the second preset rule includes the following steps: (1) compute the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; (2) store operation operators whose per-unit-data execution costs differ within a preset range in the same operation operator group.
By storing operation operators whose per-unit-data execution costs differ within the preset range in the same first candidate operation operator group, the GPU threads are kept as load-balanced as possible during execution.
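One way to realize step (2) — the patent does not fix a particular grouping algorithm, so this is only an assumed realization — is to sort the operators by per-unit-data cost and cut a new group whenever the next cost falls outside the preset range of the current group's first member:

```cpp
#include <algorithm>
#include <vector>

// Regroup operators so that the per-unit-data execution costs within one
// group differ by at most `range`: sort by cost, then start a new group
// whenever the next cost is too far from the current group's first cost.
std::vector<std::vector<double>> regroupByUnitCost(std::vector<double> costs,
                                                   double range) {
    std::sort(costs.begin(), costs.end());
    std::vector<std::vector<double>> groups;
    for (double c : costs) {
        if (groups.empty() || c - groups.back().front() > range)
            groups.push_back({c});       // cost too far apart: new group
        else
            groups.back().push_back(c);  // within the preset range: same group
    }
    return groups;
}
```

Grouping similar-cost operators this way means each GPU thread in a merged kernel does roughly the same amount of work per record, which is the load-balancing goal stated above.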
S205: Create a first candidate operation operator group, and add the first operation operator to it.
If the kernel archive contains no first candidate operation operator group, a first candidate operation operator group is created and the first operation operator is added to it; that is, the first operation operator becomes the first operation operator in the newly created first candidate operation operator group.
In the embodiment shown in FIG. 1 or FIG. 2, merging the first subtask and at least one second subtask into one merged task specifically comprises two parts: merging the operation logic, and merging the corresponding data to be processed.
When the first subtask is triggered, the operation logic of all operation operators in the first operation operator group is merged into one merged operation logic, and the pending operation data corresponding to each operation operator in the group is merged into a single data structure. Metadata information is then generated from: the storage location, within that single data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group; the number of subtasks corresponding to the operation operators in the group; the number of data records in each subtask; and the length of each data record.
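The metadata generation described here can be sketched on the CPU as a prefix-offset computation over the subtasks' record counts and record lengths. The struct and function names below are illustrative assumptions, not the patent's own identifiers:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical metadata for subtasks laid out back to back in one byte
// buffer: position[i] is the byte offset at which subtask i's records
// begin, count[i] the number of records, length[i] the record length.
struct MergeMetadata {
    std::vector<int> position;
    std::vector<int> count;
    std::vector<int> length;
};

MergeMetadata buildMetadata(const std::vector<int>& recordCounts,
                            const std::vector<int>& recordLengths) {
    MergeMetadata m;
    int offset = 0;
    for (std::size_t i = 0; i < recordCounts.size(); ++i) {
        m.position.push_back(offset);
        m.count.push_back(recordCounts[i]);
        m.length.push_back(recordLengths[i]);
        // the next subtask's data starts right after this one's records
        offset += recordCounts[i] * recordLengths[i];
    }
    return m;
}
```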
As for merging the operation logic of the operation operators in the first operation operator group into one merged operation logic: because the operation logic of the operators in the group is the same, their interface definitions are largely identical. Taking a selection operation operator as an example, its Compute Unified Device Architecture (CUDA) interface definition might be as follows:
"__global__ void selection(data, n, result, filter)"
Here, "data" is the data to be processed; "n" is the number of data records in "data", where "n" is an integer greater than or equal to 1; "result" is an n-dimensional array used to store the results of the selection operation; and "filter" is the filter-function interface of the selection operation operator, whose code is described as follows:
Figure PCTCN2014086523-appb-000001
Thus, if the i-th data record in "data" (where i is a positive integer less than or equal to n) satisfies the "filter" condition, that is, filter(data[i]) evaluates to true, then "result[i]" is true; otherwise, it is false.
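These per-record semantics — one boolean per input record, set by applying the filter — can be stated as a plain C++ analogue of the CUDA interface above (a sequential sketch of what each GPU thread does for its own index; the concrete types are illustrative):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// CPU analogue of the selection kernel: result[i] is true exactly when
// data[i] satisfies the filter condition, i.e. filter(data[i]) == true.
void selection(const std::vector<int>& data,
               std::vector<bool>& result,
               const std::function<bool(int)>& filter) {
    result.resize(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        result[i] = filter(data[i]);
}
```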
The selection operation operators of different data streams process different data types and have different filter-function definitions. The result of operator merging is a unified interface for these different selection operation operators, so that, to the GPU, they appear as a single undifferentiated operation operator. For example, the generic interface of the merged selection operation operator may be defined as follows:
"__global__ void MergedSelection(mergedData, n, result, filters);"
Here, the "mergedData" data structure contains the data records to be processed by all merged operation operators together with the corresponding metadata; "n" is the total number of data records in mergedData; "result" is still used to store the results of the selection operation; and "filters" is an array of functions that records, in order, the filter-function operation corresponding to each data stream.
Because different input data streams may have different schema definitions, take three data streams as an example; their schema definitions are as follows. The schema of data stream A is defined as follows:
Figure PCTCN2014086523-appb-000002
Therefore, a suitable data structure is needed to store the merged operation data so that it can be processed uniformly by the "MergedSelection" interface described above.
For example, suppose three data streams (data stream A, data stream B, and data stream C) are to be merged, with their schemas defined as above; that is, their numbers of data attributes, attribute types, data record lengths, and record counts may all differ. So that the individual data records of these data streams can be accessed uniformly after merging, some necessary metadata information must also be stored. The metadata information is generated from the storage location of each data stream's operation data within the single data structure, the number of subtasks of each data stream, the number of data records in each subtask, and the length of each data record. Accordingly, the data structure used to store the result of merging the data streams, "MergedData", may be defined as follows:
Figure PCTCN2014086523-appb-000004
Here, the data field "data" stores the data-stream records of each input as a byte stream, and its layout in GPU memory is shown in FIG. 3. FIG. 3 is a schematic diagram of a data merging result according to the present invention: the number of data records of data stream A is nA, that of data stream B is nB, and that of data stream C is nC. The dimensions of the "position", "count", and "length" fields equal the number of merged data streams, so these fields occupy very little space.
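A CPU model of this layout, and of how a global record index (the role a GPU thread ID plays) is resolved to its owning stream and byte offset, can be sketched as follows. The definition of "MergedData" in the patent is given only in an unreproduced figure, so the field types here are assumptions consistent with the description:

```cpp
#include <vector>

// Illustrative model of the MergedData layout of FIG. 3: all records of
// stream A, then B, then C, stored back to back in one byte array, plus
// per-stream position/count/length metadata.
struct MergedData {
    std::vector<unsigned char> data;  // concatenated records of all streams
    std::vector<int> position;        // byte offset where each stream starts
    std::vector<int> count;           // records per stream (nA, nB, nC, ...)
    std::vector<int> length;          // fixed record length per stream
};

// Resolve a global record index to (stream, byte offset of the record).
void locateRecord(const MergedData& md, int globalIdx,
                  int& streamOut, int& offsetOut) {
    int stream = 0;
    while (globalIdx >= md.count[stream]) {  // skip past earlier streams
        globalIdx -= md.count[stream];
        ++stream;
    }
    streamOut = stream;
    offsetOut = md.position[stream] + globalIdx * md.length[stream];
}
```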
Based on the "MergedData" data structure, the "MergedSelection" generic interface can be implemented as follows:
Figure PCTCN2014086523-appb-000005
Figure PCTCN2014086523-appb-000006
That is, for each GPU thread, the data record to be processed is determined by the thread ID. The thread ID is also used to determine the data stream to which the pending data record belongs and the corresponding metadata information, such as the record's starting address within the data stream, so that the data record can be read out correctly and the corresponding filter function invoked to process it. The "MergedSelection" interface (other merged interfaces are similar; for example, the generic interface for the "Projection" operation is "MergedProjection") can be compiled in advance; at run time, only the concrete parameters need to be passed to it, and it is invoked dynamically. In this way, what were originally multiple subtasks are merged into a single batch task.
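The per-thread dispatch described here can be emulated sequentially on the CPU: each loop iteration plays the role of one GPU thread, resolves its record's stream, and applies that stream's filter. The patent's actual kernel code is in the unreproduced figures, so the sketch below is an assumed emulation, not the patented implementation:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// CPU emulation of the MergedSelection kernel. The implicit loop index
// plays the role of the GPU thread ID: indices covering stream 0 come
// first, then stream 1, and so on, and each record is processed with
// the filter registered for its own stream.
std::vector<bool> mergedSelection(
        const std::vector<std::vector<int>>& streams,            // records per stream
        const std::vector<std::function<bool(int)>>& filters) {  // one filter per stream
    std::vector<bool> result;
    for (std::size_t s = 0; s < streams.size(); ++s)
        for (int rec : streams[s])
            result.push_back(filters[s](rec));  // "thread" -> stream s, record rec
    return result;
}
```

The effect is exactly the batching the text describes: several per-stream selection subtasks are answered by one pass over the merged data, with one result slot per merged record.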
FIG. 4 is a schematic flowchart of Embodiment 3 of the GPU-based data stream processing method according to the present invention, taking the selection operation as an example. As shown in FIG. 4, the method of this embodiment includes:
S401: Collect the input data of the data stream corresponding to each operation operator to be merged, together with the corresponding metadata information.
The metadata information includes the number of records, the record length, and the like.
S402: Create a new "MergedData" object, and assign the collected data to the data fields of "MergedData".
S403: Pass the "MergedData" object and the filter function of each operation operator as parameters to the "MergedSelection" kernel.
S404: Schedule "MergedSelection" for execution on the GPU.
FIG. 5 is a schematic structural diagram of Embodiment 1 of a GPU-based data stream processing apparatus according to the present invention. The apparatus of this embodiment includes a receiving module 501, a merging module 502, and a processing module 503. The receiving module 501 is configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic. The merging module 502 is configured to merge the first subtask and at least one second subtask into one merged task, where the operation logic of the second subtask is the same as the first operation logic. The processing module 503 is configured to schedule a graphics processing unit (GPU) to perform data stream processing on the merged task.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 1; its implementation principles and technical effects are similar and are not repeated here.
In the foregoing embodiment, the merging module 502 is further configured to determine that there is no result dependency among the data records of the first operation data in the first subtask.
In the foregoing embodiment, the data stream processed by the first operation operator is a first data stream, and the merging module 502 further includes a judging unit, a first merging unit, and a second merging unit. The judging unit is configured to determine whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream. The first merging unit is configured to: if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive, and merge the first subtask and at least one second subtask into one merged task. The second merging unit is configured to: if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merged task.
In the foregoing embodiment, the first merging unit is specifically configured to: if the kernel archive contains at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups, where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic; and if the kernel archive contains no first candidate operation operator group, create the first candidate operation operator group and add the first operation operator to it.
In the foregoing embodiment, if the kernel archive contains at least two first candidate operation operator groups, the first merging unit is specifically configured to select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
In the foregoing embodiment, the first preset rule is either of the following rules: select the first candidate operation operator group with the fewest operation operators as the first operation operator group; or select the first candidate operation operator group whose operation operators have the smallest average data volume as the first operation operator group.
In the foregoing embodiment, the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel archive, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
In the foregoing embodiment, the first merging unit is specifically configured to: compute the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group.
In the foregoing embodiment, the merging module 502 is specifically configured to: when the first subtask is triggered, merge the operation logic of the operation operators in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into a single data structure; and generate metadata information from the storage location, within that single data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to the operation operators in the group, the number of data records in each subtask, and the length of each data record.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 2; its implementation principles and technical effects are similar and are not repeated here.
FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention. As shown in FIG. 6, the GPU-based data stream processing apparatus 600 of this embodiment includes a processor 601, a memory 602, and a system bus 603, where the processor 601 and the memory 602 are connected through the system bus 603 and communicate with each other. The memory 602 is configured to store computer-executable instructions 6021. The processor 601 is configured to run the computer-executable instructions 6021, causing the GPU-based data stream processing apparatus to perform the following method:
receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic; merging the first subtask and at least one second subtask into one merged task, where the operation logic of the second subtask is the same as the first operation logic; and scheduling a graphics processing unit (GPU) to perform data stream processing on the merged task.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 1; its implementation principles and technical effects are similar and are not repeated here.
Further, the processor 601 is specifically configured to determine that there is no result dependency among the data records of the first operation data in the first subtask.
Further, the data stream processed by the first operation operator is a first data stream, and the processor 601 is specifically configured to determine whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive, and merge the first subtask and at least one second subtask into one merged task; and
if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merged task.
Further, the processor 601 is specifically configured to: if the kernel archive contains at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups, where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic; and
if the kernel archive contains no first candidate operation operator group, create the first candidate operation operator group and add the first operation operator to it.
Further, if the kernel archive contains at least two first candidate operation operator groups, the processor 601 is specifically configured to select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
Further, the first preset rule is either of the following rules: select the first candidate operation operator group with the fewest operation operators as the first operation operator group; or select the first candidate operation operator group whose operation operators have the smallest average data volume as the first operation operator group.
Further, if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel archive, the processor 601 is specifically configured to regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
Further, the processor 601 is specifically configured to: compute the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group.
Further, the processor 601 is specifically configured to: when the first subtask is triggered, merge the operation logic of the operation operators in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into a single data structure; and generate metadata information from the storage location, within that single data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to the operation operators in the group, the number of data records in each subtask, and the length of each data record.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 2; its implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present invention further provides a computer-readable medium containing computer-executable instructions, where the computer-executable instructions cause a GPU-based data stream processing apparatus to execute the methods described in Embodiments 1 to 3 of the GPU-based data stream processing method of the present invention.
Persons of ordinary skill in the art will understand that all or part of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of the present invention rather than limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art will understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some or all of the technical features thereof; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (19)

  1. A GPU-based data stream processing method, characterized by comprising:
    receiving a first subtask, the first subtask comprising first operation data and a first operation operator, wherein an operation logic of the first operation operator is a first operation logic;
    merging the first subtask and at least one second subtask into one merged task, wherein an operation logic of the second subtask is the same as the first operation logic; and
    scheduling a graphics processor (GPU) to perform data stream processing on the merged task.
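By way of illustration only, and not as part of the claims, the merging step of claim 1 can be sketched as follows; the `SubTask` class and the names `op_logic` and `merge_subtasks` are illustrative assumptions, not structures recited by the patent.

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    data: list     # operation data, one entry per data record
    op_logic: str  # identifier of the operator's operation logic

def merge_subtasks(first, others):
    """Merge the first subtask with every subtask sharing its operation logic."""
    merged_data = list(first.data)
    for task in others:
        if task.op_logic == first.op_logic:  # same operation logic required
            merged_data.extend(task.data)
    return SubTask(data=merged_data, op_logic=first.op_logic)
```

The merged task would then be handed to the GPU in one batch, amortizing kernel-launch overhead across the merged records.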
  2. The method according to claim 1, characterized in that, before the merging of the first subtask and the at least one second subtask into one merged task, the method further comprises:
    determining that there is no result dependency in the processing among the data records of the first operation data in the first subtask.
  3. The method according to claim 1 or 2, characterized in that the data stream processed by the first operation operator is a first data stream; and
    the merging of the first subtask and the at least one second subtask into one merged task comprises:
    determining whether a kernel file contains a second operation operator identical to the first operation operator, wherein an operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
    if the kernel file does not contain a second operation operator identical to the first operation operator, adding the first operation operator to the kernel file, and merging the first subtask and the at least one second subtask into one merged task; and
    if the kernel file contains a second operation operator identical to the first operation operator, merging the first subtask and the at least one second subtask into one merged task.
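The kernel-file check of claim 3 can be illustrated with a minimal sketch in which the kernel file is modeled as a dictionary keyed by (operation logic, data stream); this keying is an assumption made for illustration, not the patent's concrete storage format.

```python
def ensure_operator_registered(kernel_file, op_logic, stream_id, operator):
    """Return True if an identical operator is already in the kernel file;
    otherwise register the new operator and return False."""
    key = (op_logic, stream_id)
    if key not in kernel_file:       # no identical second operation operator
        kernel_file[key] = operator  # add the first operator to the file
        return False
    return True
```

In either branch the subtasks are still merged; the check only decides whether a new operator entry must be added first.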
  4. The method according to claim 3, characterized in that the adding of the first operation operator to the kernel file comprises:
    if the kernel file has at least one first candidate operation operator group, adding the first operation operator to one of the first candidate operation operator groups; and
    if the kernel file has no first candidate operation operator group, creating a first candidate operation operator group and adding the first operation operator to the created first candidate operation operator group;
    wherein an operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
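A minimal sketch of the lookup-or-create logic of claim 4, assuming each candidate group is a plain dictionary with `op_logic` and `operators` fields (illustrative names, not recited in the claim):

```python
def add_to_candidate_group(groups, op_logic, operator):
    """Add the operator to an existing candidate group with the same
    operation logic; create a new group if none exists."""
    for group in groups:
        if group["op_logic"] == op_logic:
            group["operators"].append(operator)
            return group
    new_group = {"op_logic": op_logic, "operators": [operator]}
    groups.append(new_group)
    return new_group
```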
  5. The method according to claim 4, characterized in that, if the kernel file has at least two first candidate operation operator groups, the adding of the first operation operator to one of the first candidate operation operator groups comprises:
    selecting one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule; and
    adding the first operation operator to the selected first operation operator group.
  6. The method according to claim 5, characterized in that the first preset rule is either of the following rules:
    selecting, as the first operation operator group, the first candidate operation operator group containing the fewest operation operators; or
    selecting, as the first operation operator group, the first candidate operation operator group with the smallest average data amount per operation operator.
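The two alternative preset rules of claim 6 can be sketched as simple selection functions; the `data_size` field is an illustrative stand-in for the per-operator data amount:

```python
def pick_group_fewest_operators(groups):
    """First rule: the candidate group containing the fewest operators."""
    return min(groups, key=lambda g: len(g["operators"]))

def pick_group_smallest_avg_data(groups):
    """Second rule: the candidate group with the smallest average data amount."""
    def avg_data(group):
        sizes = [op["data_size"] for op in group["operators"]]
        return sum(sizes) / len(sizes)
    return min(groups, key=avg_data)
```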
  7. The method according to claim 4, characterized by further comprising: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regrouping the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
  8. The method according to claim 7, characterized in that the regrouping of the first operation operator and the operation operators in the at least one first candidate operation operator group according to the second preset rule comprises:
    calculating an execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and an execution cost per unit of data of the first operation operator; and
    storing, in the same first candidate operation operator group, those operation operators whose execution costs per unit of data differ from one another within a preset range.
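The cost-based regrouping of claim 8 can be sketched as a single pass over the operators sorted by per-unit execution cost. Comparing each operator against the cheapest member of the current group is one possible reading of "within a preset range", assumed here for illustration:

```python
def regroup_by_unit_cost(operators, cost_fn, max_spread):
    """Group operators so that per-unit execution costs inside a group
    differ by at most max_spread."""
    groups = []
    for op in sorted(operators, key=cost_fn):
        if groups and cost_fn(op) - cost_fn(groups[-1][0]) <= max_spread:
            groups[-1].append(op)  # within the preset range of the group
        else:
            groups.append([op])    # start a new candidate group
    return groups
```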
  9. The method according to any one of claims 5 to 8, characterized in that the merging of the first subtask and the at least one second subtask into one merged task comprises:
    when the first subtask is triggered, merging the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merging the operation data of the subtasks corresponding to the operation operators in the first operation operator group into one data structure; and
    generating metadata information according to the storage positions, in the one data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
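The data-structure merge and metadata generation of claim 9 can be sketched as flattening all records into one buffer while recording, per subtask, its offset, record count, and record length; the field names are illustrative assumptions:

```python
def build_merged_buffer(subtasks):
    """Concatenate the operation data of all subtasks into one flat buffer
    and generate per-subtask metadata (offset, record count, record length)."""
    buffer, metadata = [], []
    for task in subtasks:
        records = task["records"]  # fixed-length records assumed
        metadata.append({
            "offset": len(buffer),
            "num_records": len(records),
            "record_len": len(records[0]) if records else 0,
        })
        for record in records:
            buffer.extend(record)
    return buffer, metadata
```

With such metadata, each GPU thread can locate its own records inside the single merged buffer.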
  10. A GPU-based data stream processing apparatus, characterized by comprising:
    a receiving module, configured to receive a first subtask, the first subtask comprising first operation data and a first operation operator, wherein an operation logic of the first operation operator is a first operation logic;
    a merging module, configured to merge the first subtask and at least one second subtask into one merged task, wherein an operation logic of the second subtask is the same as the first operation logic; and
    a processing module, configured to schedule a graphics processor (GPU) to perform data stream processing on the merged task.
  11. The apparatus according to claim 10, characterized in that the merging module is further configured to determine that there is no result dependency in the processing among the data records of the first operation data in the first subtask.
  12. The apparatus according to claim 10 or 11, characterized in that the data stream processed by the first operation operator is a first data stream; and
    the merging module comprises:
    a determining unit, configured to determine whether a kernel file contains a second operation operator identical to the first operation operator, wherein an operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
    a first merging unit, configured to: if the kernel file does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel file, and merge the first subtask and the at least one second subtask into one merged task; and
    a second merging unit, configured to: if the kernel file contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merged task.
  13. The apparatus according to claim 12, characterized in that the first merging unit is specifically configured to: if the kernel file has at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups; and if the kernel file has no first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to the created first candidate operation operator group; wherein an operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  14. The apparatus according to claim 13, characterized in that the first merging unit is specifically configured to: if the kernel file has at least two first candidate operation operator groups, select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the selected first operation operator group.
  15. The apparatus according to claim 14, characterized in that the first preset rule is either of the following rules: selecting, as the first operation operator group, the first candidate operation operator group containing the fewest operation operators; or selecting, as the first operation operator group, the first candidate operation operator group with the smallest average data amount per operation operator.
  16. The apparatus according to claim 13, characterized in that the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
  17. The apparatus according to claim 16, characterized in that the first merging unit is specifically configured to: calculate an execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and an execution cost per unit of data of the first operation operator; and store, in the same first candidate operation operator group, those operation operators whose execution costs per unit of data differ from one another within a preset range.
  18. The apparatus according to any one of claims 14 to 17, characterized in that the merging module is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into one data structure; and generate metadata information according to the storage positions, in the one data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
  19. A GPU-based data stream processing apparatus, characterized by comprising:
    a processor, a memory, and a system bus;
    wherein the processor and the memory are connected by the system bus and communicate with each other;
    the memory is configured to store computer-executable instructions; and
    the processor is configured to run the computer-executable instructions to cause the GPU-based data stream processing apparatus to perform the method according to any one of claims 1 to 9.
PCT/CN2014/086523 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu WO2016041126A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/086523 WO2016041126A1 (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu
CN201480038261.0A CN105637482A (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu


Publications (1)

Publication Number Publication Date
WO2016041126A1 true WO2016041126A1 (en) 2016-03-24

Family

ID=55532418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086523 WO2016041126A1 (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu

Country Status (2)

Country Link
CN (1) CN105637482A (en)
WO (1) WO2016041126A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694675B (en) 2019-03-15 2022-03-08 上海商汤智能科技有限公司 Task scheduling method and device and storage medium
CN111899149A (en) * 2020-07-09 2020-11-06 浙江大华技术股份有限公司 Image processing method and device based on operator fusion and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060098017A1 (en) * 2004-11-05 2006-05-11 Microsoft Corporation Interpreter for simplified programming of graphics processor units in general purpose programming languages
US20070294666A1 (en) * 2006-06-20 2007-12-20 Papakipos Matthew N Systems and methods for determining compute kernels for an application in a parallel-processing computer system
CN102609978A (en) * 2012-01-13 2012-07-25 中国人民解放军信息工程大学 Method for accelerating cone-beam CT (computerized tomography) image reconstruction by using GPU (graphics processing unit) based on CUDA (compute unified device architecture) architecture
CN102708009A (en) * 2012-04-19 2012-10-03 华为技术有限公司 Method for sharing GPU (graphics processing unit) by multiple tasks based on CUDA (compute unified device architecture)
EP2620873A1 (en) * 2012-01-27 2013-07-31 Samsung Electronics Co., Ltd Resource allocation method and apparatus of GPU

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309889A (en) * 2012-03-15 2013-09-18 华北计算机系统工程研究所 Method for realizing of real-time data parallel compression by utilizing GPU (Graphic processing unit) cooperative computing


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686352A (en) * 2016-12-23 2017-05-17 北京大学 Real-time processing method of multiple video data on multi-GPU (multiple graphics processing unit) platform
CN112395234A (en) * 2019-08-16 2021-02-23 阿里巴巴集团控股有限公司 Request processing method and device
CN112463158A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium
CN112463158B (en) * 2020-11-25 2023-05-23 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105637482A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
US8656396B2 (en) Performance optimization based on threshold performance measure by resuming suspended threads if present or by creating threads within elastic and data parallel operators
Ben-Nun et al. Groute: An asynchronous multi-GPU programming model for irregular computations
US9152601B2 (en) Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units
US10733019B2 (en) Apparatus and method for data processing
EP2161685B1 (en) Pipelined image processing engine
US9197703B2 (en) System and method to maximize server resource utilization and performance of metadata operations
US8601458B2 (en) Profile-driven data stream processing
US9158795B2 (en) Compile-time grouping of tuples in a streaming application
US9996394B2 (en) Scheduling accelerator tasks on accelerators using graphs
WO2016041126A1 (en) Method and device for processing data stream based on gpu
CN108475212B (en) Method, system, and computer readable medium for processing data using dynamic partitioning
US9262223B2 (en) Lazy initialization of operator graph in a stream computing application
US9405349B2 (en) Multi-core apparatus and job scheduling method thereof
TWI564807B (en) Scheduling method and processing device using the same
CN105354089B (en) Support the stream data processing unit and system of iterative calculation
CN106055311A (en) Multi-threading Map Reduce task parallelizing method based on assembly line
US9471387B2 (en) Scheduling in job execution
US10083066B2 (en) Processing data by using simultaneous multithreading
Liu et al. Optimizing shuffle in wide-area data analytics
US10203988B2 (en) Adaptive parallelism of task execution on machines with accelerators
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
KR20220045026A (en) Hardware circuitry for accelerating neural network computations
WO2019000435A1 (en) Task processing method and device, medium, and device thereof
CN107329813B (en) Global sensing data active prefetching method and system for many-core processor
CN113806044B (en) Heterogeneous platform task bottleneck eliminating method for computer vision application

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14902108; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14902108; Country of ref document: EP; Kind code of ref document: A1)