WO2016041126A1 - Method and device for processing data stream based on gpu - Google Patents

Method and device for processing data stream based on gpu Download PDF

Info

Publication number
WO2016041126A1
WO2016041126A1 (PCT/CN2014/086523)
Authority
WO
WIPO (PCT)
Prior art keywords
operation operator
operator
subtask
data
group
Prior art date
Application number
PCT/CN2014/086523
Other languages
French (fr)
Chinese (zh)
Inventor
邓利群
朱俊华
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2014/086523 priority Critical patent/WO2016041126A1/en
Priority to CN201480038261.0A priority patent/CN105637482A/en
Publication of WO2016041126A1 publication Critical patent/WO2016041126A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements

Definitions

  • the embodiments of the present invention relate to computer technologies, and in particular, to a data processing method and apparatus based on a graphics processing unit (GPU).
  • GPU: graphics processing unit
  • GPUs serve as coprocessors or accelerators in general-purpose computing fields (such as databases, data compression, etc.)
  • CPU: central processing unit
  • Compared with the CPU, the GPU offers more concurrent threads and higher memory bandwidth, making it better suited to large-scale data-parallel or compute-parallel tasks.
  • Stream processing tasks arrive continuously and concurrently, but the computational complexity of a single stream processing task is small. Therefore, when the GPU is used to accelerate data stream processing, it must be scheduled frequently, and the GPU scheduling overhead is large.
  • The embodiments of the invention provide a GPU-based data stream processing method and device that reduce the scheduling overhead of the GPU and improve the throughput of the stream data processing system.
  • a first aspect of the embodiments of the present invention provides a GPU-based data stream processing method, including:
  • receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic;
  • merging the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
  • scheduling the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • Before the merging of the first subtask and the at least one second subtask into one merge task, the method further includes:
  • the data stream processed by the first operation operator is a first data stream
  • The merging of the first subtask and the at least one second subtask into one merge task includes:
  • if the kernel file includes a second operation operator that is the same as the first operation operator, merging the first subtask and the at least one second subtask into one merge task.
  • the adding the first operation operator to the kernel file includes:
  • the first candidate operation operator group is not included in the kernel file, the first candidate operation operator group is created, and the first operation operator is added to the first candidate operation operator group;
  • the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  • The adding of the first operation operator to one of the first candidate operation operator groups includes:
  • the first preset rule is any one of the following rules:
  • The method further includes: if the first operation operator cannot be added to the at least one first candidate operation operator group in the kernel file, regrouping the first operation operator and each operation operator in the at least one first candidate operation operator group according to a second preset rule.
  • The regrouping of the first operation operator and each operation operator in the at least one first candidate operation operator group according to the second preset rule includes:
  • storing operation operators whose differences in execution cost per unit of data fall within a preset range in the same first candidate operation operator group.
  • The merging of the first subtask and the at least one second subtask into one merge task includes:
  • merging the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merging the operation data of the subtasks corresponding to each operation operator in the first operation operator group into the same data structure;
  • generating metadata information from the number of subtasks, the number of data records in each subtask, and the length of each data record.
  • a second aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
  • a receiving module configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and an operation logic of the first operation operator is a first operation logic;
  • a merging module configured to merge the first subtask and the at least one second subtask into one merge task, where an operation logic of the second subtask is the same as the first operation logic
  • a processing module configured to schedule the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • the merging module is further configured to determine that there is no result dependency between the data records of the first operation data in the first subtask;
  • the data stream processed by the first operation operator is a first data stream
  • the merge module includes:
  • a determining unit configured to determine whether the second operation operator is the same as the first operation operator, wherein the operation logic of the second operation operator is the same as the first operation logic,
  • the data stream processed by the second operation operator is the first data stream;
  • a first merging unit configured to, if the kernel file does not include a second operation operator identical to the first operation operator, add the first operation operator to the kernel file, and merge the first subtask and the at least one second subtask into one merge task;
  • a second merging unit configured to merge the first subtask and the at least one second subtask into one merge task if the kernel includes a second operation operator that is the same as the first operation operator.
  • the first merging unit is specifically configured to: if there is at least one first candidate operation operator group in the kernel file, Adding the first operation operator to one of the first candidate operation operator groups; if the first candidate operation operator group is not in the kernel file, creating the first candidate operation operator group, Adding the first operation operator to the first candidate operation operator group; wherein operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  • the first merging unit is specifically configured to: if there are at least two first candidate operation operator groups in the kernel file, select a first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
  • the first preset rule is any one of the following: selecting the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group with the smallest average amount of data per operation operator as the first operation operator group.
  • the first merging unit is further configured to: if the first operation operator cannot be added to the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and each operation operator in the at least one first candidate operation operator group according to the second preset rule.
  • the first merging unit is specifically configured to calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator, and to store operation operators whose differences in execution cost per unit of data fall within a preset range in the same first candidate operation operator group.
  • the merging module is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to each operation operator in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to each operation operator in the group, the number of data records in each subtask, and the length of each data record.
  • a third aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
  • a processor, a memory, and a system bus, where the processor and the memory are connected by the system bus and communicate with each other;
  • the memory is configured to store computer-executable instructions;
  • the processor is configured to run the computer-executable instructions to cause the GPU-based data stream processing apparatus to perform the method in any of the possible implementations of the first aspect.
  • The GPU-based data stream processing method and apparatus provided by the embodiments of the present invention merge multiple subtasks with the same operation logic into one merge task and invoke the GPU to perform data stream processing on the merge task, thereby reducing the scheduling frequency of the GPU, reducing the scheduling overhead of the GPU, and improving the throughput of the streaming data processing system.
  • FIG. 1 is a schematic flowchart of Embodiment 1 of a GPU-based data stream processing method according to the present invention;
  • FIG. 2 is a schematic flowchart of Embodiment 2 of a GPU-based data stream processing method according to the present invention;
  • FIG. 3 is a schematic diagram of data combining results of the present invention;
  • FIG. 4 is a schematic flowchart of Embodiment 3 of a GPU-based data stream processing method according to the present invention;
  • FIG. 5 is a schematic structural diagram of Embodiment 1 of a GPU-based data stream processing apparatus according to the present invention;
  • FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention.
  • The present invention merges multiple subtasks that can be merged into one merge task and schedules the GPU to perform data stream processing on the merged task, thereby reducing the frequency of scheduling the GPU and reducing the GPU scheduling overhead. Further, because the amount of data to be processed by the merge task becomes large and is processed uniformly, the large-scale parallel processing capability of the GPU can be fully utilized, thereby improving the processing throughput of the system.
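  • The overall idea above — batch subtasks that share operation logic, then launch the GPU once per batch — can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the names `Subtask`, `merge_and_dispatch`, and `dispatch_to_gpu` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Subtask:
    stream_id: str   # data stream the subtask belongs to
    logic: str       # operation logic, e.g. "selection" or "mapping"
    records: list    # operation data (the records to process)

def merge_and_dispatch(subtasks, dispatch_to_gpu):
    """Group subtasks by operation logic and dispatch each group once.

    Returns the number of GPU invocations, which is the number of distinct
    operation logics rather than the number of subtasks.
    """
    groups = defaultdict(list)
    for t in subtasks:
        groups[t.logic].append(t)
    calls = 0
    for logic, group in groups.items():
        merged_records = [r for t in group for r in t.records]
        dispatch_to_gpu(logic, merged_records)  # one kernel launch per merge task
        calls += 1
    return calls
```

With three subtasks spanning two operation logics, only two dispatches occur instead of three, which is the scheduling-frequency reduction the passage describes.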
  • First subtask: the task currently received by the CPU.
  • First operation data: the data to be processed included in the first subtask.
  • First operation operator: the operation operator of the first subtask.
  • First operation logic: the operation logic of the first operation operator.
  • The operation logic refers to what kind of processing is performed. That is, an operation operator has two attributes: 1. the data stream it processes; 2. its processing mode. The operation logic refers to the processing mode; therefore, different operation operators may well have the same operation logic.
  • The operation operator group in which the first operation operator in the GPU kernel file is located is referred to as the "first operation operator group", and a group in the kernel file whose operation operators have the same operation logic as the first operation operator is referred to as a "first candidate operation operator group". Here, a kernel file refers to a collection for storing kernel information.
  • a task having the same operational logic as the first operational logic among the tasks to be processed in the system is referred to as a "second subtask.”
  • An operation operator whose operation logic is the same as that of the first operation operator, and whose processed data belongs to the same data stream as the data processed by the first operation operator, is referred to as a "second operation operator".
  • FIG. 1 is a schematic flowchart of a GPU-based data stream processing method according to Embodiment 1 of the present invention.
  • the execution body of the embodiment is a CPU, and the method in this embodiment is as follows:
  • each data stream has a memory buffer, and data from the same data stream is stored in the same memory buffer.
  • the stream processing system sends a task request (ie, the first subtask) to the CPU, requesting the CPU to process the data in the buffer.
  • The sizes of the first preset threshold and the second preset threshold are determined according to the actual application; the present invention does not limit this.
  • the first subtask includes first operational data and a first operational operator, and the operational logic of the first operational operator is the first operational logic.
  • The first operation data is the data of the data stream that needs to be processed in the first subtask, and the first operation operator specifies what operation is performed on the first operation data, such as a selection operation or a mapping operation.
  • S102 Combine the first subtask and the at least one second subtask into one merge task.
  • the operation logic of the second subtask is the same as the first operation logic.
  • S103: Schedule the GPU to perform data stream processing on the foregoing merge task.
  • In this embodiment, the GPU is scheduled once to perform data stream processing on the merge task, rather than once for each subtask as in the prior art, thereby reducing the frequency of scheduling the GPU and reducing the GPU scheduling overhead; and because the amount of data to be processed by the merge task becomes larger, the large-scale parallel processing capability of the GPU can be fully utilized, thereby improving the processing throughput of the system.
  • Optionally, before S102 is executed, the method further includes: determining that there is no result dependency between the data records of the first operation data in the first subtask.
  • Having no result dependency means that the data records in the first subtask can be processed in parallel and the processing result of each data record does not affect the processing results of the other data records, for example, in a selection operation or a mapping operation. That is, only if the processing of the data records of the first operation data of the first subtask has no result dependency can the first subtask be merged with other subtasks having the same operation logic. If there is a result dependency between the data records of the first operation data in the first subtask, the GPU is directly scheduled to process the first subtask, which is not merged with other subtasks.
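  • The mergeability condition above reduces to a per-operator classification: record-wise independent operators (such as selection and mapping) qualify for merging, while operators whose output depends on all records do not. A minimal sketch, with the classification set `INDEPENDENT_OPS` assumed for illustration and not taken from the patent:

```python
# Operators whose per-record results are mutually independent (assumed set);
# the patent names selection and mapping as examples of such operators.
INDEPENDENT_OPS = {"selection", "mapping"}

def can_merge(operator_kind: str) -> bool:
    """Return True if subtasks of this operator kind may be merged.

    An aggregation-style operator, whose result depends on every record,
    would fail this check and be dispatched to the GPU on its own.
    """
    return operator_kind in INDEPENDENT_OPS
```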
  • Optionally, a specific implementation of S102 is: determining whether the kernel file includes a second operation operator that is the same as the first operation operator.
  • The operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream. That is, two operation operators are the same only if they satisfy two conditions: 1. their operation logic is the same; 2. the data they process belongs to the same data stream.
  • If the GPU kernel (Kernel) file does not include a second operation operator that is the same as the first operation operator, this indicates that the system has not previously executed a subtask of the first data stream; the first operation operator is added to the kernel file, and the first subtask and the at least one second subtask are merged into one merge task. If the kernel file does include such a second operation operator, this indicates that the system has previously processed a subtask of the first data stream, and the process of merging the first subtask and the at least one second subtask into one merge task is performed directly.
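  • The two-condition sameness check and the resulting register-or-skip decision can be sketched as follows. This is an illustrative Python sketch; the kernel file is modeled as a plain list of operator records, and the names `find_matching_operator` and `register_operator` are hypothetical.

```python
def find_matching_operator(kernel_file, logic, stream_id):
    """Return an existing operator matching BOTH conditions, else None.

    An operator counts as "the same" only if its operation logic matches
    and it processes the same data stream.
    """
    for op in kernel_file:  # kernel_file: list of {"logic": ..., "stream": ...}
        if op["logic"] == logic and op["stream"] == stream_id:
            return op
    return None

def register_operator(kernel_file, logic, stream_id):
    """Add the first operation operator only when no identical one exists."""
    if find_matching_operator(kernel_file, logic, stream_id) is None:
        kernel_file.append({"logic": logic, "stream": stream_id})
```

Note that the same logic on a different stream is treated as a distinct operator, matching condition 2 above.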
  • FIG. 2 is a schematic flowchart of Embodiment 2 of the GPU-based data stream processing method according to the present invention.
  • the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  • The method for determining whether the first operation operator can be added to a first candidate operation operator group may depend on the specific application. For example, for an application with strict processing-delay requirements, it must be considered whether adding the first operation operator would impose a large delay cost on subsequent merge tasks of that first candidate operation operator group. That is, if, after the first operation operator is added to a first candidate operation operator group, the estimated delay of a merge task based on that group exceeds the maximum delay requirement of some operation operator in the group, the first operation operator cannot be added to that group.
  • If the first operation operator can be added to some first candidate operation operator group, the at least one first candidate operation operator group can accept the first operation operator. If the first operation operator cannot be added to any first candidate operation operator group in the kernel file, it is determined that the at least one first candidate operation operator group cannot accept the first operation operator.
  • The first preset rule may be: selecting, from the at least two first candidate operation operator groups, the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting, from the at least two first candidate operation operator groups, the first candidate operation operator group with the smallest average amount of data per operation operator as the first operation operator group.
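  • The two candidate selection rules above amount to two different `min` criteria over the candidate groups. A minimal sketch, with the group record layout (`"operators"`, `"total_data"` fields) assumed for illustration:

```python
def pick_group_fewest_operators(groups):
    """Rule 1: choose the candidate group containing the fewest operators."""
    return min(groups, key=lambda g: len(g["operators"]))

def pick_group_smallest_avg_data(groups):
    """Rule 2: choose the group with the smallest average data per operator."""
    return min(groups, key=lambda g: g["total_data"] / len(g["operators"]))
```

The two rules can disagree: a group with fewer operators may still carry more data per operator, so which rule applies is a configuration choice.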
  • S204: Regroup each operation operator in the at least one first candidate operation operator group together with the first operation operator.
  • the first operation operator and each operation operator in the at least one first candidate operation operator group are regrouped according to the second preset rule.
  • Regrouping according to the second preset rule includes the following steps: 1) calculating the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator; 2) storing operation operators whose differences in execution cost per unit of data are within a preset range in the same operation operator group.
  • Storing operation operators whose differences in execution cost per unit of data are within the preset range in the same first candidate operation operator group keeps the GPU threads as load balanced as possible during execution.
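  • The cost-based regrouping step can be sketched as a one-pass clustering over operators sorted by their per-unit-data execution cost. This is an illustrative Python sketch under assumed inputs (a cost table and a tolerance); the grouping strategy shown (cluster relative to each group's cheapest member) is one plausible reading of the "difference within a preset range" rule, not the patent's exact algorithm.

```python
def regroup_by_cost(costs, tolerance):
    """Group operators whose unit-data execution costs are close.

    costs: {operator_name: execution cost per unit of data}
    tolerance: preset range; operators join a group while their cost stays
    within `tolerance` of that group's cheapest member.
    Returns a list of operator-name groups.
    """
    groups = []
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if groups and cost - groups[-1]["base"] <= tolerance:
            groups[-1]["members"].append(name)   # similar cost: same group
        else:
            groups.append({"base": cost, "members": [name]})  # start new group
    return [g["members"] for g in groups]
```

Grouping similar-cost operators means the GPU threads executing a merged task finish at roughly the same time, which is the load-balancing motivation stated above.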
  • S205 Create a first candidate operation operator group, and add the first operation operator to the first candidate operation operator group.
  • If the kernel file does not include a first candidate operation operator group, the first candidate operation operator group is created and the first operation operator is added to it; that is, the first operation operator is the first operator in the first candidate operation operator group.
  • Merging the first subtask and the at least one second subtask into one merge task specifically includes merging the operation logic and merging the corresponding data to be processed.
  • The operation logic of all the operation operators in the first operation operator group is merged into one merged operation logic, and the operation data to be processed corresponding to each operation operator in the group is merged into the same data structure.
  • Metadata information is generated from the number of subtasks, the number of data records in each subtask, and the length of each data record.
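  • The merged data structure with its per-stream metadata (the "position", "count", and "length" fields described for FIG. 3 below) can be sketched as follows. The field names follow the patent's description; the concrete layout and the `build_merged_task` helper are assumptions for illustration.

```python
def build_merged_task(stream_records, record_length):
    """Concatenate per-stream records and record per-stream metadata.

    stream_records: {stream_id: [record, ...]}
    record_length:  {stream_id: fixed length of each record in that stream}
    """
    data, position, count, length = [], [], [], []
    offset = 0
    for sid, records in stream_records.items():
        position.append(offset)            # start offset of this stream's records
        count.append(len(records))         # number of records from this stream
        length.append(record_length[sid])  # length of each record in this stream
        data.extend(records)
        offset += len(records)
    return {"data": data, "position": position, "count": count, "length": length}
```

The metadata arrays have one entry per merged data stream, so, as noted below, they occupy very little space compared with the data itself.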
  • CUDA: Compute Unified Device Architecture
  • The general interface of the merged selection operation operator can be defined as follows:
  • the "mergedData” data structure includes the data records to be processed by all the merged operation operators and the corresponding metadata
  • "n” is the total number of all data records in mergedData
  • “result” is still used to save Select the result of the operation
  • filters is an array of functions, which in turn records the filter function operations corresponding to each data stream.
  • the schema of data stream A is defined as follows:
  • the data field "data” stores the data stream data records of each input in the form of a byte stream and its storage in the GPU memory is as shown in FIG. 3 is a schematic diagram of data combining results of the present invention,
  • the number of data records of data stream A is nA
  • the number of data records of data stream B is nB
  • the number of data records of data stream C is nC.
  • the dimensions of the "position”, “count”, and “length” fields are equal to the number of merged data streams, so they take up very little space.
  • the "MergedSelection” generic interface can be implemented in the following ways:
  • The data record to be processed is determined by the thread ID: according to the thread ID, the data stream to which the record belongs and its corresponding metadata information, for example its start address in the data stream, are determined so that the data record can be read correctly, and then the corresponding filter function is called for processing.
  • The "MergedSelection" interface (other merge interfaces, such as the "MergedProjection" interface for the "Projection" operation, are similar) can be pre-compiled, with only the specific parameters passed at runtime when it is invoked dynamically. This allows the original multiple subtasks to be merged into a single batch task.
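  • The thread-ID dispatch described above can be simulated on the CPU: each simulated thread uses its ID to locate its record, finds which stream the record belongs to via the "position" metadata, and applies that stream's filter. This is an illustrative Python sketch of the described behavior, not the patent's CUDA code; the interface shape of `merged_selection` is an assumption.

```python
import bisect

def merged_selection(merged, filters):
    """Apply each stream's filter to its records in one merged pass.

    merged:  dict with "data" (all records, concatenated per stream) and
             "position" (start offset of each stream's records, ascending).
    filters: one predicate per merged data stream, in stream order.
    """
    n = len(merged["data"])
    result = [False] * n
    for tid in range(n):  # tid plays the role of the GPU thread ID
        # bisect on "position" finds which stream this record belongs to
        stream_idx = bisect.bisect_right(merged["position"], tid) - 1
        result[tid] = filters[stream_idx](merged["data"][tid])
    return result
```

On a GPU, the loop body would run once per thread in a single kernel launch, which is how one launch services every merged subtask at once.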
  • FIG. 4 is a schematic flowchart of a method for processing a GPU-based data stream according to a third embodiment of the present invention.
  • FIG. 4 is an example of a selection operation. As shown in FIG. 4, the method in this embodiment includes:
  • S401 Collect input data of the data stream corresponding to each operation operator to be merged and corresponding metadata information.
  • The metadata information includes the number of records, the length of each record, and the like.
  • the apparatus in this embodiment includes a receiving module 501, a merging module 502, and a processing module 503, where the receiving module 501 is configured to receive the first subtask.
  • the first subtask includes the first operation data and the first operation operator, and the operation logic of the first operation operator is the first operation logic;
  • the merging module 502 is configured to merge the first subtask and the at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic;
  • the processing module 503 is configured to schedule the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • The device of this embodiment correspondingly implements the technical solution of the method embodiment shown in FIG. 1; the implementation principle and technical effects are similar and are not described here again.
  • the merging module 502 is further configured to determine that there is no result dependency between the data records of the first operation data in the first subtask.
  • the data stream processed by the first operation operator is a first data stream;
  • the merging module 502 further includes: a determining unit, a first merging unit, and a second merging unit, wherein the determining unit is configured to determine Whether the kernel operation file includes the same second operation operator as the first operation operator, wherein the operation logic of the second operation operator is the same as the first operation logic, and the data flow processed by the second operation operator is The first data stream;
  • the first merging unit is configured to add the first operation operator to the kernel file if the kernel file does not include the second operation operator that is the same as the first operation operator, The first subtask and the at least one second subtask are combined into one merge task;
  • the second merging unit is configured to: if the kernel file includes the second operation operator that is the same as the first operation operator, merge the first subtask and the at least one second subtask into one merge task.
  • the first merging unit is specifically configured to add the first operation operator to one of the first candidate operation operator groups if at least one first candidate operation operator group is included in the kernel file.
  • the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic; if the first candidate operation operator group is not included in the kernel file, the first candidate operation operator group is created. And adding the first operation operator to the first candidate operation operator group.
  • the first merging unit is specifically configured to: if there are at least two first candidate operation operator groups in the kernel file, from the at least two first candidate operation operator groups according to the first preset rule. Selecting a first operation operator group; adding the first operation operator to the first operation operator group.
  • The first preset rule is any one of the following rules: selecting the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group with the smallest average amount of data per operation operator as the first operation operator group.
  • the first merging unit is further configured to: if the first operation operator cannot be added to the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and each operation operator in the at least one first candidate operation operator group according to the second preset rule.
  • the first merging unit is specifically configured to calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator, and to store operation operators whose differences in execution cost per unit of data are within the preset range in the same first candidate operation operator group.
  • the merging module 502 is specifically configured to: when the first subtask is triggered, merge the operation logic of the operation operators in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to each operation operator in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to each operation operator in the group, the number of data records in each subtask, and the length of each data record.
  • the device in this embodiment is correspondingly used to implement the technical solution of the method embodiment shown in FIG. 2, and the implementation principle and the technical effect are similar, and details are not described herein again.
  • FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention.
  • the GPU-based data stream processing apparatus 600 of the present embodiment includes: a processor 601, a memory 602, and a system bus 603.
  • the processor 601 and the memory 602 are connected through the system bus 603 and communicate with each other;
  • the memory 602 is configured to store computer-executable instructions 6021;
  • the processor 601 is configured to run the computer-executable instructions 6021 to cause the GPU-based data stream processing apparatus to perform the following method:
  • receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic; merging the first subtask and the at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and scheduling the graphics processing unit (GPU) to perform data stream processing on the merge task.
  • the device of this embodiment is correspondingly used to implement the technical solution of the method embodiment shown in FIG. 1 , and the implementation principle and technical effects thereof are similar, and details are not described herein again.
  • the processor 601 is specifically configured to determine that there is no result dependency between the data records of the first operation data in the first subtask.
  • the data stream processed by the first operation operator is a first data stream;
  • the processor 601 is specifically configured to determine whether the kernel file includes a second operation operator that is the same as the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
  • the kernel file does not include the second operation operator that is the same as the first operation operator, the first operation operator is added to the kernel file, and the first subtask and the at least one second subtask are merged. For a combined task;
  • the kernel includes the second operation operator that is the same as the first operation operator, the first subtask and the at least one second subtask are combined into one merge task.
  • the processor 601 is specifically configured to: if the kernel file has at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups, where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic;
  • if the kernel file does not include a first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to it.
  • the processor 601 is specifically configured to: if the kernel file has at least two first candidate operation operator groups, select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
  • the first preset rule is either of the following rules: select the first candidate operation operator group containing the fewest operation operators as the first operation operator group; or select the first candidate operation operator group whose operation operators correspond to the smallest average data amount as the first operation operator group.
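The two preset selection rules above can be sketched as follows. This is an illustrative sketch only: the dictionary layout and the rule names (`"fewest_operators"`, `"min_avg_data"`) are assumptions, not part of the embodiment.

```python
def select_group(candidate_groups, rule="fewest_operators"):
    """Pick a first operation operator group from the candidate groups
    using one of the two preset rules (hypothetical representation)."""
    if rule == "fewest_operators":
        # Rule 1: the group containing the fewest operation operators.
        return min(candidate_groups, key=lambda g: len(g["operators"]))
    if rule == "min_avg_data":
        # Rule 2: the group whose operators have the smallest average data amount.
        return min(
            candidate_groups,
            key=lambda g: sum(op["data_amount"] for op in g["operators"])
            / len(g["operators"]),
        )
    raise ValueError("unknown rule: %s" % rule)
```

Note that the two rules can disagree: a group with one large operator wins rule 1 but may lose rule 2.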
  • the processor 601 is specifically configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
  • the processor 601 is specifically configured to calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator, and to store operation operators whose execution-cost difference is within a preset range in the same first candidate operation operator group.
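As a rough illustration of the cost-based regrouping, the pass below places operators whose per-unit-data execution costs differ by at most a preset range into the same candidate group. The greedy sort-and-sweep strategy and the field names are assumptions for illustration; the embodiment does not prescribe a specific partitioning algorithm.

```python
def regroup_by_cost(operators, cost_range):
    """Re-partition operators so that operators whose per-unit-data
    execution costs differ by at most cost_range share a group (sketch)."""
    groups = []
    for op in sorted(operators, key=lambda o: o["unit_cost"]):
        # Greedily join the current group while the cost spread
        # (relative to the group's cheapest operator) stays in range.
        if groups and op["unit_cost"] - groups[-1][0]["unit_cost"] <= cost_range:
            groups[-1].append(op)
        else:
            groups.append([op])
    return groups
```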
  • the processor 601 is specifically configured to: when the first subtask is triggered, combine the operation logic of the operation operators in the first operation operator group into one merged operation logic, and combine the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
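The data merging and metadata generation described above can be sketched as follows. Fixed-length records and the specific metadata fields (`offset`, `num_records`, `record_len`) are simplifying assumptions about the layout; the embodiment only requires that the metadata let the GPU locate each subtask's data in the shared structure.

```python
def build_merged_task(subtasks):
    """Concatenate each subtask's operation data into one flat buffer and
    generate metadata describing where each subtask's slice lives (sketch)."""
    flat = []
    metadata = []
    for t in subtasks:
        offset = len(flat)
        for rec in t["records"]:
            flat.extend(rec)
        metadata.append({
            "offset": offset,                    # storage location in the shared structure
            "num_records": len(t["records"]),    # data record count of this subtask
            "record_len": len(t["records"][0]),  # length of each (fixed-size) record
        })
    return flat, metadata
```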
  • the device of this embodiment is correspondingly configured to implement the technical solution of the method embodiment shown in FIG. 2; its implementation principle and technical effects are similar, and details are not described herein again.
  • the embodiment of the present invention further provides a computer-readable medium including computer-executable instructions that cause the GPU-based data stream processing apparatus to perform the GPU-based data stream processing methods according to Embodiments 1 to 3 of the present invention.
  • the aforementioned program can be stored in a computer-readable storage medium.
  • when executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Abstract

A method and device for processing a data stream based on a GPU merge multiple subtasks with the same operation logic into one merge task and schedule the GPU to perform data stream processing on the merge task, so as to reduce the GPU's scheduling frequency and scheduling overhead and improve the throughput of the stream data processing system.

Description

GPU-based data stream processing method and device

Technical field
The embodiments of the present invention relate to computer technologies, and in particular, to a data stream processing method and apparatus based on a graphics processing unit (GPU).
Background
At present, applying GPUs as coprocessors or accelerators to general-purpose computing fields (such as databases and data compression) has become a major trend in the industry. Compared with the central processing unit (CPU), the GPU offers far more concurrent threads and higher memory bandwidth, making it better suited to large-scale data-parallel or compute-parallel tasks.
However, in application scenarios with many data streams and high data generation frequency, stream processing tasks are continuous and highly concurrent, yet each individual stream processing task involves little computation. Therefore, when a GPU is used to accelerate data stream processing, it must be scheduled frequently, which incurs a large GPU scheduling overhead.
Summary of the invention
The embodiments of the present invention provide a GPU-based data stream processing method and device to reduce GPU scheduling overhead and improve the throughput of a stream data processing system.
A first aspect of the embodiments of the present invention provides a GPU-based data stream processing method, including:
receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic;
merging the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
scheduling a graphics processor (GPU) to perform data stream processing on the merge task.
With reference to the first aspect, in a first possible implementation, before the merging of the first subtask and the at least one second subtask into one merge task, the method further includes:
determining that there is no result dependency among the data records of the first operation data in the first subtask.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the data stream processed by the first operation operator is a first data stream;
the merging of the first subtask and the at least one second subtask into one merge task includes:
determining whether the kernel file includes a second operation operator that is the same as the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream;
if the kernel file does not include a second operation operator that is the same as the first operation operator, adding the first operation operator to the kernel file and merging the first subtask and the at least one second subtask into one merge task; and
if the kernel file includes a second operation operator that is the same as the first operation operator, merging the first subtask and the at least one second subtask into one merge task.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the adding of the first operation operator to the kernel file includes:
if the kernel file has at least one first candidate operation operator group, adding the first operation operator to one of the first candidate operation operator groups; or
if the kernel file has no first candidate operation operator group, creating a first candidate operation operator group and adding the first operation operator to it;
where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, if the kernel file has at least two first candidate operation operator groups, the adding of the first operation operator to one of the first candidate operation operator groups includes:
selecting one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule; and
adding the first operation operator to the first operation operator group.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the first preset rule is either of the following rules:
selecting the first candidate operation operator group containing the fewest operation operators as the first operation operator group; or
selecting the first candidate operation operator group whose operation operators correspond to the smallest average data amount as the first operation operator group.
With reference to the third possible implementation of the first aspect, in a sixth possible implementation, the method further includes: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regrouping the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, the regrouping of the first operation operator and the operation operators in the at least one first candidate operation operator group according to the second preset rule includes:
calculating the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator; and
storing operation operators whose execution-cost difference is within a preset range in the same first candidate operation operator group.
With reference to any one of the fourth to seventh possible implementations of the first aspect, in an eighth possible implementation, the merging of the first subtask and the at least one second subtask into one merge task includes:
when the first subtask is triggered, combining the operation logic of the operation operators in the first operation operator group into one merged operation logic, and combining the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and
generating metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
A second aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
a receiving module, configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic;
a merging module, configured to merge the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
a processing module, configured to schedule a graphics processor (GPU) to perform data stream processing on the merge task.
With reference to the second aspect, in a first possible implementation, the merging module is further configured to determine that there is no result dependency among the data records of the first operation data in the first subtask.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the data stream processed by the first operation operator is a first data stream;
the merging module includes:
a determining unit, configured to determine whether the kernel file includes a second operation operator that is the same as the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream;
a first merging unit, configured to: if the kernel file does not include a second operation operator that is the same as the first operation operator, add the first operation operator to the kernel file and merge the first subtask and the at least one second subtask into one merge task; and
a second merging unit, configured to: if the kernel file includes a second operation operator that is the same as the first operation operator, merge the first subtask and the at least one second subtask into one merge task.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the first merging unit is specifically configured to: if the kernel file has at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups; or, if the kernel file has no first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to it; where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the first merging unit is specifically configured to: if the kernel file has at least two first candidate operation operator groups, select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the first preset rule is either of the following rules: selecting the first candidate operation operator group containing the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group whose operation operators correspond to the smallest average data amount as the first operation operator group.
With reference to the third possible implementation of the second aspect, in a sixth possible implementation, the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
With reference to the sixth possible implementation of the second aspect, in a seventh possible implementation, the first merging unit is specifically configured to: calculate the execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and the execution cost per unit of data of the first operation operator; and store operation operators whose execution-cost difference is within a preset range in the same first candidate operation operator group.
With reference to any one of the fourth to seventh possible implementations of the second aspect, in an eighth possible implementation, the merging module is specifically configured to: when the first subtask is triggered, combine the operation logic of the operation operators in the first operation operator group into one merged operation logic, and combine the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and generate metadata information according to the storage locations, in the same data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
A third aspect of the embodiments of the present invention provides a GPU-based data stream processing apparatus, including:
a processor, a memory, and a system bus, where the processor and the memory are connected through the system bus and communicate with each other;
the memory is configured to store a computer-executable instruction; and
the processor is configured to run the computer-executable instruction, so that the GPU-based data stream processing apparatus performs the method according to any one of the possible implementations of the first aspect.
The GPU-based data stream processing method and apparatus provided by the embodiments of the present invention merge multiple subtasks with the same operation logic into one merge task and invoke the GPU to perform data stream processing on the merge task, thereby reducing the GPU's scheduling frequency and scheduling overhead and improving the throughput of the stream data processing system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic flowchart of Embodiment 1 of a GPU-based data stream processing method according to the present invention;
FIG. 2 is a schematic flowchart of Embodiment 2 of a GPU-based data stream processing method according to the present invention;
FIG. 3 is a schematic diagram of a data merging result according to the present invention;
FIG. 4 is a schematic flowchart of Embodiment 3 of a GPU-based data stream processing method according to the present invention;
FIG. 5 is a schematic structural diagram of Embodiment 1 of a GPU-based data stream processing apparatus according to the present invention; and
FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, the claims, and the foregoing accompanying drawings are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present invention described herein can, for example, be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.
To reduce GPU scheduling overhead and improve the throughput of a stream processing system, the present invention merges multiple subtasks that can be merged into one merge task and schedules the GPU to perform data stream processing on the merge task. This reduces the frequency at which the GPU is scheduled and therefore the GPU scheduling overhead; further, because the amount of data to be processed by the merge task becomes larger and is processed in a unified manner, the GPU's large-scale parallel processing capability can be fully utilized, which in turn improves the processing throughput of the system.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
For ease of description, in the following embodiments of the present invention, the task currently received by the CPU is referred to as the "first subtask", the to-be-processed data contained in the first subtask is referred to as the "first operation data", the operation operator of the first subtask is referred to as the "first operation operator", and the operation logic of the first operation operator is referred to as the "first operation logic". An operation operator specifies what processing is performed on a particular data stream, while operation logic specifies only the kind of processing; that is, an operation operator has two attributes: (1) the data stream it processes and (2) the processing mode, whereas operation logic refers only to the processing mode. Therefore, different operation operators may have the same operation logic. The operation operator group in which the first operation operator resides in the GPU kernel file (Kernel Profile) is referred to as the "first operation operator group", and a group in the kernel file whose operation operators have the same operation logic as the first operation operator is referred to as a "first candidate operation operator group", where the kernel file is a collection used to store kernel information. Among the to-be-processed tasks in the system, a task whose operation logic is the same as the first operation logic is referred to as a "second subtask". An operation operator that has the same operation logic as the first operation operator and whose processed data belongs to the same data stream as the data processed by the first operation operator is referred to as a "second operation operator".
FIG. 1 is a schematic flowchart of Embodiment 1 of a GPU-based data stream processing method according to the present invention. As shown in FIG. 1, the execution body of this embodiment is a CPU, and the method of this embodiment is as follows:
S101: Receive a first subtask.
In the system memory, a memory buffer is created for each data stream, and data from the same data stream is stored in the same memory buffer. When the data volume of a memory buffer exceeds a first preset threshold, or when the buffering time exceeds a second preset threshold, the stream processing system sends a task request (that is, the first subtask) to the CPU, requesting the CPU to process the data in that buffer. The sizes of the first preset threshold and the second preset threshold depend on the actual application, and the present invention does not limit them.
The first subtask contains first operation data and a first operation operator, and the operation logic of the first operation operator is the first operation logic. The first operation data is the data that needs to be processed in the first subtask of the data stream, and the first operation operator specifies what operation is performed on the first operation data, such as a selection operation or a projection operation.
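The per-stream buffer with its two triggering thresholds can be sketched as follows. This is a minimal illustration: the class name, the record representation (byte strings), and attaching the operation operator in the caller are all assumptions, not details fixed by the embodiment.

```python
import time

class StreamBuffer:
    """Per-stream memory buffer that emits a subtask (task request) when
    the data volume or the buffering time exceeds its preset threshold."""

    def __init__(self, stream_id, max_bytes, max_wait_s, clock=time.monotonic):
        self.stream_id = stream_id
        self.max_bytes = max_bytes    # first preset threshold (data volume)
        self.max_wait_s = max_wait_s  # second preset threshold (buffering time)
        self.clock = clock
        self.records = []
        self.size = 0
        self.first_arrival = None

    def append(self, record):
        """Buffer one record; return a subtask if a threshold is now exceeded."""
        if self.first_arrival is None:
            self.first_arrival = self.clock()
        self.records.append(record)
        self.size += len(record)
        if (self.size >= self.max_bytes
                or self.clock() - self.first_arrival >= self.max_wait_s):
            return self.flush()
        return None

    def flush(self):
        # The emitted subtask carries the operation data; the operation
        # operator (e.g. selection) is attached by the stream processing system.
        subtask = {"stream": self.stream_id, "data": self.records}
        self.records, self.size, self.first_arrival = [], 0, None
        return subtask
```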
S102: Merge the first subtask and at least one second subtask into one merge task.
The operation logic of the second subtask is the same as the first operation logic.
That is, subtasks with the same operation logic are merged into one merge task.
S103: Schedule the GPU to perform data stream processing on the merge task.
In this embodiment, because multiple subtasks with the same operation logic are merged into one merge task and the GPU is scheduled to perform data stream processing on the merge task, compared with the prior art in which the GPU is scheduled once for each subtask, the frequency of scheduling the GPU is lowered and the GPU scheduling overhead is reduced. Moreover, because the amount of data to be processed by the merge task becomes larger, the GPU's large-scale parallel processing capability can be fully utilized, which improves the processing throughput of the system.
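The core effect of steps S101 to S103 can be illustrated with a minimal sketch: group pending subtasks by operation logic and launch the GPU once per merged task rather than once per subtask. The `gpu_launch` callback stands in for the actual kernel launch and is an assumption of this sketch.

```python
from collections import defaultdict

def merge_and_dispatch(subtasks, gpu_launch):
    """Group subtasks by operation logic, then launch the GPU once per
    merged task; returns the results and the number of GPU launches."""
    groups = defaultdict(list)
    for t in subtasks:
        groups[t["logic"]].append(t)

    results = {}
    launches = 0
    for logic, group in groups.items():
        # One merge task: concatenated operation data under a shared logic.
        merged_data = [rec for t in group for rec in t["data"]]
        results[logic] = gpu_launch(logic, merged_data)
        launches += 1
    return results, launches
```

With three subtasks sharing one of two logics, this yields two GPU launches instead of three; with many small subtasks per logic, the reduction in scheduling calls is correspondingly larger.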
In the foregoing embodiment, before S102 is performed, the method further includes: determining that there is no result dependency among the data records of the first operation data in the first subtask. "No result dependency" means that the data records in the first subtask can be processed in parallel and the processing result of each data record does not affect the processing results of the other data records, as is the case for a selection operation or a projection operation. That is, only when there is no result dependency among the data records of the first operation data of the first subtask can the first subtask be merged with other subtasks that have the same operation logic. If there are dependencies among the processing results of the data records of the first operation data in the first subtask, the GPU is directly scheduled to process the first subtask, which is not merged with other subtasks.
In the foregoing embodiment, assume that the data stream processed by the first operation operator is a first data stream. S102 may be implemented as follows: determine whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream. That is, two operation operators are identical only if both of the following conditions hold: (1) their operation logic is the same; and (2) the data they process belongs to the same data stream. If the GPU kernel (Kernel) archive does not contain a second operation operator identical to the first operation operator, the system has not yet executed a subtask of the first data stream; in this case, the first operation operator is added to the kernel archive, and the first subtask and at least one second subtask are merged into one merged task. If the kernel archive does contain a second operation operator identical to the first operation operator, the system has previously processed a subtask of the first data stream, and the first subtask and the at least one second subtask are merged into one merged task directly.
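The two-condition identity test above (same operation logic and same data stream) amounts to a lookup keyed on both attributes. The following C++ sketch is illustrative only; the names `KernelArchive`, `OperatorKey`, and `containsOrAdd` are hypothetical and do not come from the patent:

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical key: an operator is "identical" only if both the
// operation logic and the data stream it processes match.
using OperatorKey = std::pair<std::string /*logic*/, int /*streamId*/>;

struct KernelArchive {
    std::map<OperatorKey, int> operators;  // key -> registered operator id

    // Returns true if an identical operator (same logic, same stream)
    // is already in the archive; otherwise registers the new operator
    // and returns false.
    bool containsOrAdd(const std::string& logic, int streamId, int opId) {
        OperatorKey key{logic, streamId};
        if (operators.find(key) != operators.end())
            return true;               // a subtask of this stream was seen before
        operators.emplace(key, opId);  // first occurrence: add to the archive
        return false;
    }
};
```

Either way, the subtask then proceeds to merging; the lookup only decides whether the archive must first be extended.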
Further, a specific implementation of adding the first operation operator to the kernel archive is shown in FIG. 2. FIG. 2 is a schematic flowchart of Embodiment 2 of the GPU-based data stream processing method according to the present invention.
S201: Determine whether the kernel archive contains at least one first candidate operation operator group. If yes, execute S202; if no, execute S205.
The operation logic of each operation operator in a first candidate operation operator group is the same as the first operation logic.
S202: Determine whether the first operation operator can be added to at least one first candidate operation operator group. If yes, execute S203; if no, execute S204.
Specifically, the method for determining whether the first operation operator can be added to a first candidate operation operator group may depend on the specific application. For example, for an application with strict processing-latency requirements, it must be considered whether adding the first operation operator to a first candidate operation operator group would impose a greater latency cost on subsequent merged tasks of that group. That is, if, after the first operation operator is added to a first candidate operation operator group, the estimated latency of a merged task based on that group would exceed the maximum latency requirement of some operation operator in the group, the first operation operator cannot be added to that first candidate operation operator group; otherwise, the first operation operator can be added to that group, and it is determined that the first operation operator can be added to at least one first candidate operation operator group. If the first operation operator cannot be added to any first candidate operation operator group in the kernel archive, it is determined that the first operation operator cannot be added to the at least one first candidate operation operator group.
S203: Add the first operation operator to the first operation operator group.
When only one first candidate operation operator group can accept the first operation operator, that first candidate operation operator group is determined to be the first operation operator group, and the first operation operator is added to it. When at least two first candidate operation operator groups can accept the first operation operator, one of them is selected as the first operation operator group according to a first preset rule, and the first operation operator is added to the selected group. The first preset rule may be: from the at least two first candidate operation operator groups, select the one containing the fewest operation operators as the first operation operator group; or select the one whose operation operators have the smallest average data volume as the first operation operator group.
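Both variants of the first preset rule are simple arg-min selections over the candidate groups. A minimal C++ sketch, representing each candidate group only by the per-operator data volumes it already contains (a simplifying assumption, not the patent's data model):

```cpp
#include <cstddef>
#include <vector>

// Each candidate group is modeled as the list of data volumes of the
// operation operators it currently holds.
using Group = std::vector<double>;

// Variant 1 of the first preset rule: pick the candidate group that
// currently contains the fewest operation operators.
std::size_t pickFewestOperators(const std::vector<Group>& candidates) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i)
        if (candidates[i].size() < candidates[best].size()) best = i;
    return best;
}

// Variant 2: pick the group whose operators have the smallest average
// data volume.
std::size_t pickSmallestAvgData(const std::vector<Group>& candidates) {
    std::size_t best = 0;
    double bestAvg = -1.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double sum = 0.0;
        for (double v : candidates[i]) sum += v;
        double avg = candidates[i].empty() ? 0.0
                                           : sum / candidates[i].size();
        if (bestAvg < 0.0 || avg < bestAvg) { best = i; bestAvg = avg; }
    }
    return best;
}
```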
S204: Regroup the operation operators in the at least one first candidate operation operator group together with the first operation operator.
If the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel archive, the first operation operator and the operation operators in the at least one first candidate operation operator group are regrouped according to a second preset rule. Specifically, regrouping the operation operators in the at least one first candidate operation operator group according to the second preset rule includes the following steps: (1) compute the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; (2) store operation operators whose per-unit-data execution costs differ within a preset range in the same operation operator group.
By storing operation operators whose per-unit-data execution costs differ within the preset range in the same first candidate operation operator group, the GPU threads are kept as load-balanced as possible during execution.
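One way to realize step (2) — the patent does not fix a particular grouping algorithm, so this is only an assumed realization — is to sort the operators by per-unit-data cost and cut a new group whenever the next cost falls outside the preset range of the current group's first member:

```cpp
#include <algorithm>
#include <vector>

// Regroup operators so that the per-unit-data execution costs within one
// group differ by at most `range`: sort by cost, then start a new group
// whenever the next cost is too far from the current group's first cost.
std::vector<std::vector<double>> regroupByUnitCost(std::vector<double> costs,
                                                   double range) {
    std::sort(costs.begin(), costs.end());
    std::vector<std::vector<double>> groups;
    for (double c : costs) {
        if (groups.empty() || c - groups.back().front() > range)
            groups.push_back({c});       // cost too far apart: new group
        else
            groups.back().push_back(c);  // within the preset range: same group
    }
    return groups;
}
```

Grouping similar-cost operators this way means each GPU thread in a merged kernel does roughly the same amount of work per record, which is the load-balancing goal stated above.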
S205: Create a first candidate operation operator group, and add the first operation operator to it.
If the kernel archive contains no first candidate operation operator group, a first candidate operation operator group is created and the first operation operator is added to it; that is, the first operation operator becomes the first operation operator in the newly created first candidate operation operator group.
In the embodiment shown in FIG. 1 or FIG. 2, merging the first subtask and at least one second subtask into one merged task specifically comprises two parts: merging the operation logic, and merging the corresponding data to be processed.
When the first subtask is triggered, the operation logic of all operation operators in the first operation operator group is merged into one merged operation logic, and the pending operation data corresponding to each operation operator in the group is merged into a single data structure. Metadata information is then generated from: the storage location, within that single data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group; the number of subtasks corresponding to the operation operators in the group; the number of data records in each subtask; and the length of each data record.
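The metadata generation described here can be sketched on the CPU as a prefix-offset computation over the subtasks' record counts and record lengths. The struct and function names below are illustrative assumptions, not the patent's own identifiers:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical metadata for subtasks laid out back to back in one byte
// buffer: position[i] is the byte offset at which subtask i's records
// begin, count[i] the number of records, length[i] the record length.
struct MergeMetadata {
    std::vector<int> position;
    std::vector<int> count;
    std::vector<int> length;
};

MergeMetadata buildMetadata(const std::vector<int>& recordCounts,
                            const std::vector<int>& recordLengths) {
    MergeMetadata m;
    int offset = 0;
    for (std::size_t i = 0; i < recordCounts.size(); ++i) {
        m.position.push_back(offset);
        m.count.push_back(recordCounts[i]);
        m.length.push_back(recordLengths[i]);
        // the next subtask's data starts right after this one's records
        offset += recordCounts[i] * recordLengths[i];
    }
    return m;
}
```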
As for merging the operation logic of the operation operators in the first operation operator group into one merged operation logic: because the operation logic of the operators in the group is the same, their interface definitions are largely identical. Taking a selection operation operator as an example, its Compute Unified Device Architecture (CUDA) interface definition might be as follows:
"__global__ void selection(data, n, result, filter)"
Here, "data" is the data to be processed; "n" is the number of data records in "data", where "n" is an integer greater than or equal to 1; "result" is an n-dimensional array used to store the results of the selection operation; and "filter" is the filter-function interface of the selection operation operator, whose code is described as follows:
Figure PCTCN2014086523-appb-000001
Thus, if the i-th data record in "data" (where i is a positive integer less than or equal to n) satisfies the "filter" condition, that is, filter(data[i]) evaluates to true, then "result[i]" is true; otherwise, it is false.
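These per-record semantics — one boolean per input record, set by applying the filter — can be stated as a plain C++ analogue of the CUDA interface above (a sequential sketch of what each GPU thread does for its own index; the concrete types are illustrative):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// CPU analogue of the selection kernel: result[i] is true exactly when
// data[i] satisfies the filter condition, i.e. filter(data[i]) == true.
void selection(const std::vector<int>& data,
               std::vector<bool>& result,
               const std::function<bool(int)>& filter) {
    result.resize(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        result[i] = filter(data[i]);
}
```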
The selection operation operators of different data streams process different data types and have different filter-function definitions. The result of operator merging is a unified interface for these different selection operation operators, so that, to the GPU, they appear as a single undifferentiated operation operator. For example, the generic interface of the merged selection operation operator may be defined as follows:
"__global__ void MergedSelection(mergedData, n, result, filters);"
Here, the "mergedData" data structure contains the data records to be processed by all merged operation operators together with the corresponding metadata; "n" is the total number of data records in mergedData; "result" is still used to store the results of the selection operation; and "filters" is an array of functions that records, in order, the filter-function operation corresponding to each data stream.
Because different input data streams may have different schema definitions, take three data streams as an example; their schema definitions are as follows. The schema of data stream A is defined as follows:
Figure PCTCN2014086523-appb-000002
Therefore, a suitable data structure is needed to store the merged operation data so that it can be processed uniformly by the "MergedSelection" interface described above.
For example, suppose three data streams (data stream A, data stream B, and data stream C) are to be merged, with their schemas defined as above; that is, their numbers of data attributes, attribute types, data record lengths, and record counts may all differ. So that the individual data records of these data streams can be accessed uniformly after merging, some necessary metadata information must also be stored. The metadata information is generated from the storage location of each data stream's operation data within the single data structure, the number of subtasks of each data stream, the number of data records in each subtask, and the length of each data record. Accordingly, the data structure used to store the result of merging the data streams, "MergedData", may be defined as follows:
Figure PCTCN2014086523-appb-000004
Here, the data field "data" stores the data-stream records of each input as a byte stream, and its layout in GPU memory is shown in FIG. 3. FIG. 3 is a schematic diagram of a data merging result according to the present invention: the number of data records of data stream A is nA, that of data stream B is nB, and that of data stream C is nC. The dimensions of the "position", "count", and "length" fields equal the number of merged data streams, so these fields occupy very little space.
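A CPU model of this layout, and of how a global record index (the role a GPU thread ID plays) is resolved to its owning stream and byte offset, can be sketched as follows. The definition of "MergedData" in the patent is given only in an unreproduced figure, so the field types here are assumptions consistent with the description:

```cpp
#include <vector>

// Illustrative model of the MergedData layout of FIG. 3: all records of
// stream A, then B, then C, stored back to back in one byte array, plus
// per-stream position/count/length metadata.
struct MergedData {
    std::vector<unsigned char> data;  // concatenated records of all streams
    std::vector<int> position;        // byte offset where each stream starts
    std::vector<int> count;           // records per stream (nA, nB, nC, ...)
    std::vector<int> length;          // fixed record length per stream
};

// Resolve a global record index to (stream, byte offset of the record).
void locateRecord(const MergedData& md, int globalIdx,
                  int& streamOut, int& offsetOut) {
    int stream = 0;
    while (globalIdx >= md.count[stream]) {  // skip past earlier streams
        globalIdx -= md.count[stream];
        ++stream;
    }
    streamOut = stream;
    offsetOut = md.position[stream] + globalIdx * md.length[stream];
}
```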
Based on the "MergedData" data structure, the "MergedSelection" generic interface can be implemented as follows:
Figure PCTCN2014086523-appb-000005
Figure PCTCN2014086523-appb-000006
That is, for each GPU thread, the data record to be processed is determined by the thread ID. The thread ID is also used to determine the data stream to which the pending data record belongs and the corresponding metadata information, such as the record's starting address within the data stream, so that the data record can be read out correctly and the corresponding filter function invoked to process it. The "MergedSelection" interface (other merged interfaces are similar; for example, the generic interface for the "Projection" operation is "MergedProjection") can be compiled in advance; at run time, only the concrete parameters need to be passed to it, and it is invoked dynamically. In this way, what were originally multiple subtasks are merged into a single batch task.
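The per-thread dispatch described here can be emulated sequentially on the CPU: each loop iteration plays the role of one GPU thread, resolves its record's stream, and applies that stream's filter. The patent's actual kernel code is in the unreproduced figures, so the sketch below is an assumed emulation, not the patented implementation:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// CPU emulation of the MergedSelection kernel. The implicit loop index
// plays the role of the GPU thread ID: indices covering stream 0 come
// first, then stream 1, and so on, and each record is processed with
// the filter registered for its own stream.
std::vector<bool> mergedSelection(
        const std::vector<std::vector<int>>& streams,            // records per stream
        const std::vector<std::function<bool(int)>>& filters) {  // one filter per stream
    std::vector<bool> result;
    for (std::size_t s = 0; s < streams.size(); ++s)
        for (int rec : streams[s])
            result.push_back(filters[s](rec));  // "thread" -> stream s, record rec
    return result;
}
```

The effect is exactly the batching the text describes: several per-stream selection subtasks are answered by one pass over the merged data, with one result slot per merged record.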
FIG. 4 is a schematic flowchart of Embodiment 3 of the GPU-based data stream processing method according to the present invention, taking the selection operation as an example. As shown in FIG. 4, the method of this embodiment includes:
S401: Collect the input data of the data stream corresponding to each operation operator to be merged, together with the corresponding metadata information.
The metadata information includes the number of records, the record length, and the like.
S402: Create a new "MergedData" object, and assign the collected data to the data fields of "MergedData".
S403: Pass the "MergedData" object and the filter function of each operation operator as parameters to the "MergedSelection" kernel.
S404: Schedule "MergedSelection" for execution on the GPU.
FIG. 5 is a schematic structural diagram of Embodiment 1 of a GPU-based data stream processing apparatus according to the present invention. The apparatus of this embodiment includes a receiving module 501, a merging module 502, and a processing module 503. The receiving module 501 is configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic. The merging module 502 is configured to merge the first subtask and at least one second subtask into one merged task, where the operation logic of the second subtask is the same as the first operation logic. The processing module 503 is configured to schedule a graphics processing unit (GPU) to perform data stream processing on the merged task.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 1; its implementation principles and technical effects are similar and are not repeated here.
In the foregoing embodiment, the merging module 502 is further configured to determine that there is no result dependency among the data records of the first operation data in the first subtask.
In the foregoing embodiment, the data stream processed by the first operation operator is a first data stream, and the merging module 502 further includes a judging unit, a first merging unit, and a second merging unit. The judging unit is configured to determine whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream. The first merging unit is configured to: if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive, and merge the first subtask and at least one second subtask into one merged task. The second merging unit is configured to: if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merged task.
In the foregoing embodiment, the first merging unit is specifically configured to: if the kernel archive contains at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups, where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic; and if the kernel archive contains no first candidate operation operator group, create the first candidate operation operator group and add the first operation operator to it.
In the foregoing embodiment, if the kernel archive contains at least two first candidate operation operator groups, the first merging unit is specifically configured to select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
In the foregoing embodiment, the first preset rule is either of the following rules: select the first candidate operation operator group with the fewest operation operators as the first operation operator group; or select the first candidate operation operator group whose operation operators have the smallest average data volume as the first operation operator group.
In the foregoing embodiment, the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel archive, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
In the foregoing embodiment, the first merging unit is specifically configured to: compute the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group.
In the foregoing embodiment, the merging module 502 is specifically configured to: when the first subtask is triggered, merge the operation logic of the operation operators in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into a single data structure; and generate metadata information from the storage location, within that single data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to the operation operators in the group, the number of data records in each subtask, and the length of each data record.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 2; its implementation principles and technical effects are similar and are not repeated here.
FIG. 6 is a schematic structural diagram of Embodiment 2 of a GPU-based data stream processing apparatus according to the present invention. As shown in FIG. 6, the GPU-based data stream processing apparatus 600 of this embodiment includes a processor 601, a memory 602, and a system bus 603, where the processor 601 and the memory 602 are connected through the system bus 603 and communicate with each other. The memory 602 is configured to store computer-executable instructions 6021. The processor 601 is configured to run the computer-executable instructions 6021, causing the GPU-based data stream processing apparatus to perform the following method:
receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is first operation logic; merging the first subtask and at least one second subtask into one merged task, where the operation logic of the second subtask is the same as the first operation logic; and scheduling a graphics processing unit (GPU) to perform data stream processing on the merged task.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 1; its implementation principles and technical effects are similar and are not repeated here.
Further, the processor 601 is specifically configured to determine that there is no result dependency among the data records of the first operation data in the first subtask.
Further, the data stream processed by the first operation operator is a first data stream, and the processor 601 is specifically configured to determine whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive, and merge the first subtask and at least one second subtask into one merged task; and
if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merged task.
Further, the processor 601 is specifically configured to: if the kernel archive contains at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups, where the operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic; and
if the kernel archive contains no first candidate operation operator group, create the first candidate operation operator group and add the first operation operator to it.
Further, if the kernel archive contains at least two first candidate operation operator groups, the processor 601 is specifically configured to select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
Further, the first preset rule is either of the following rules: select the first candidate operation operator group with the fewest operation operators as the first operation operator group; or select the first candidate operation operator group whose operation operators have the smallest average data volume as the first operation operator group.
Further, if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel archive, the processor 601 is specifically configured to regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
Further, the processor 601 is specifically configured to: compute the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group.
Further, the processor 601 is specifically configured to: when the first subtask is triggered, merge the operation logic of the operation operators in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into a single data structure; and generate metadata information from the storage location, within that single data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to the operation operators in the group, the number of data records in each subtask, and the length of each data record.
The apparatus of this embodiment may correspondingly be used to execute the technical solution of the method embodiment shown in FIG. 2; its implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present invention further provides a computer-readable medium containing computer-executable instructions, where the computer-executable instructions cause a GPU-based data stream processing apparatus to execute the methods described in Embodiments 1 to 3 of the GPU-based data stream processing method of the present invention.
Persons of ordinary skill in the art will understand that all or part of the steps of the foregoing method embodiments may be implemented by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of the present invention rather than limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art will understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some or all of the technical features thereof; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (19)

  1. A GPU-based data stream processing method, characterized by comprising:
    receiving a first subtask, the first subtask comprising first operation data and a first operation operator, wherein an operation logic of the first operation operator is a first operation logic;
    merging the first subtask and at least one second subtask into one merged task, wherein an operation logic of the second subtask is the same as the first operation logic; and
    scheduling a graphics processor (GPU) to perform data stream processing on the merged task.
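By way of illustration only, and not as part of the claims, the merging step of claim 1 can be sketched as follows; the `SubTask` class and the names `op_logic` and `merge_subtasks` are illustrative assumptions, not structures recited by the patent.

```python
from dataclasses import dataclass

@dataclass
class SubTask:
    data: list     # operation data, one entry per data record
    op_logic: str  # identifier of the operator's operation logic

def merge_subtasks(first, others):
    """Merge the first subtask with every subtask sharing its operation logic."""
    merged_data = list(first.data)
    for task in others:
        if task.op_logic == first.op_logic:  # same operation logic required
            merged_data.extend(task.data)
    return SubTask(data=merged_data, op_logic=first.op_logic)
```

The merged task would then be handed to the GPU in one batch, amortizing kernel-launch overhead across the merged records.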
  2. The method according to claim 1, characterized in that, before the merging of the first subtask and the at least one second subtask into one merged task, the method further comprises:
    determining that there is no result dependency in the processing among the data records of the first operation data in the first subtask.
  3. The method according to claim 1 or 2, characterized in that the data stream processed by the first operation operator is a first data stream; and
    the merging of the first subtask and the at least one second subtask into one merged task comprises:
    determining whether a kernel file contains a second operation operator identical to the first operation operator, wherein an operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
    if the kernel file does not contain a second operation operator identical to the first operation operator, adding the first operation operator to the kernel file, and merging the first subtask and the at least one second subtask into one merged task; and
    if the kernel file contains a second operation operator identical to the first operation operator, merging the first subtask and the at least one second subtask into one merged task.
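The kernel-file check of claim 3 can be illustrated with a minimal sketch in which the kernel file is modeled as a dictionary keyed by (operation logic, data stream); this keying is an assumption made for illustration, not the patent's concrete storage format.

```python
def ensure_operator_registered(kernel_file, op_logic, stream_id, operator):
    """Return True if an identical operator is already in the kernel file;
    otherwise register the new operator and return False."""
    key = (op_logic, stream_id)
    if key not in kernel_file:       # no identical second operation operator
        kernel_file[key] = operator  # add the first operator to the file
        return False
    return True
```

In either branch the subtasks are still merged; the check only decides whether a new operator entry must be added first.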
  4. The method according to claim 3, characterized in that the adding of the first operation operator to the kernel file comprises:
    if the kernel file has at least one first candidate operation operator group, adding the first operation operator to one of the first candidate operation operator groups; and
    if the kernel file has no first candidate operation operator group, creating a first candidate operation operator group and adding the first operation operator to the created first candidate operation operator group;
    wherein an operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
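A minimal sketch of the lookup-or-create logic of claim 4, assuming each candidate group is a plain dictionary with `op_logic` and `operators` fields (illustrative names, not recited in the claim):

```python
def add_to_candidate_group(groups, op_logic, operator):
    """Add the operator to an existing candidate group with the same
    operation logic; create a new group if none exists."""
    for group in groups:
        if group["op_logic"] == op_logic:
            group["operators"].append(operator)
            return group
    new_group = {"op_logic": op_logic, "operators": [operator]}
    groups.append(new_group)
    return new_group
```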
  5. The method according to claim 4, characterized in that, if the kernel file has at least two first candidate operation operator groups, the adding of the first operation operator to one of the first candidate operation operator groups comprises:
    selecting one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule; and
    adding the first operation operator to the selected first operation operator group.
  6. The method according to claim 5, characterized in that the first preset rule is either of the following rules:
    selecting, as the first operation operator group, the first candidate operation operator group containing the fewest operation operators; or
    selecting, as the first operation operator group, the first candidate operation operator group with the smallest average data amount per operation operator.
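The two alternative preset rules of claim 6 can be sketched as simple selection functions; the `data_size` field is an illustrative stand-in for the per-operator data amount:

```python
def pick_group_fewest_operators(groups):
    """First rule: the candidate group containing the fewest operators."""
    return min(groups, key=lambda g: len(g["operators"]))

def pick_group_smallest_avg_data(groups):
    """Second rule: the candidate group with the smallest average data amount."""
    def avg_data(group):
        sizes = [op["data_size"] for op in group["operators"]]
        return sum(sizes) / len(sizes)
    return min(groups, key=avg_data)
```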
  7. The method according to claim 4, characterized by further comprising: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regrouping the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
  8. The method according to claim 7, characterized in that the regrouping of the first operation operator and the operation operators in the at least one first candidate operation operator group according to the second preset rule comprises:
    calculating an execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and an execution cost per unit of data of the first operation operator; and
    storing, in the same first candidate operation operator group, those operation operators whose execution costs per unit of data differ from one another within a preset range.
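The cost-based regrouping of claim 8 can be sketched as a single pass over the operators sorted by per-unit execution cost. Comparing each operator against the cheapest member of the current group is one possible reading of "within a preset range", assumed here for illustration:

```python
def regroup_by_unit_cost(operators, cost_fn, max_spread):
    """Group operators so that per-unit execution costs inside a group
    differ by at most max_spread."""
    groups = []
    for op in sorted(operators, key=cost_fn):
        if groups and cost_fn(op) - cost_fn(groups[-1][0]) <= max_spread:
            groups[-1].append(op)  # within the preset range of the group
        else:
            groups.append([op])    # start a new candidate group
    return groups
```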
  9. The method according to any one of claims 5 to 8, characterized in that the merging of the first subtask and the at least one second subtask into one merged task comprises:
    when the first subtask is triggered, merging the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merging the operation data of the subtasks corresponding to the operation operators in the first operation operator group into one data structure; and
    generating metadata information according to the storage positions, in the one data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
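The data-structure merge and metadata generation of claim 9 can be sketched as flattening all records into one buffer while recording, per subtask, its offset, record count, and record length; the field names are illustrative assumptions:

```python
def build_merged_buffer(subtasks):
    """Concatenate the operation data of all subtasks into one flat buffer
    and generate per-subtask metadata (offset, record count, record length)."""
    buffer, metadata = [], []
    for task in subtasks:
        records = task["records"]  # fixed-length records assumed
        metadata.append({
            "offset": len(buffer),
            "num_records": len(records),
            "record_len": len(records[0]) if records else 0,
        })
        for record in records:
            buffer.extend(record)
    return buffer, metadata
```

With such metadata, each GPU thread can locate its own records inside the single merged buffer.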
  10. A GPU-based data stream processing apparatus, characterized by comprising:
    a receiving module, configured to receive a first subtask, the first subtask comprising first operation data and a first operation operator, wherein an operation logic of the first operation operator is a first operation logic;
    a merging module, configured to merge the first subtask and at least one second subtask into one merged task, wherein an operation logic of the second subtask is the same as the first operation logic; and
    a processing module, configured to schedule a graphics processor (GPU) to perform data stream processing on the merged task.
  11. The apparatus according to claim 10, characterized in that the merging module is further configured to determine that there is no result dependency in the processing among the data records of the first operation data in the first subtask.
  12. The apparatus according to claim 10 or 11, characterized in that the data stream processed by the first operation operator is a first data stream; and
    the merging module comprises:
    a determining unit, configured to determine whether a kernel file contains a second operation operator identical to the first operation operator, wherein an operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
    a first merging unit, configured to: if the kernel file does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel file, and merge the first subtask and the at least one second subtask into one merged task; and
    a second merging unit, configured to: if the kernel file contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merged task.
  13. The apparatus according to claim 12, characterized in that the first merging unit is specifically configured to: if the kernel file has at least one first candidate operation operator group, add the first operation operator to one of the first candidate operation operator groups; and if the kernel file has no first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to the created first candidate operation operator group; wherein an operation logic of each operation operator in the first candidate operation operator group is the same as the first operation logic.
  14. The apparatus according to claim 13, characterized in that the first merging unit is specifically configured to: if the kernel file has at least two first candidate operation operator groups, select one first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the selected first operation operator group.
  15. The apparatus according to claim 14, characterized in that the first preset rule is either of the following rules: selecting, as the first operation operator group, the first candidate operation operator group containing the fewest operation operators; or selecting, as the first operation operator group, the first candidate operation operator group with the smallest average data amount per operation operator.
  16. The apparatus according to claim 13, characterized in that the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel file, regroup the first operation operator and the operation operators in the at least one first candidate operation operator group according to a second preset rule.
  17. The apparatus according to claim 16, characterized in that the first merging unit is specifically configured to: calculate an execution cost per unit of data of each operation operator in the at least one first candidate operation operator group and an execution cost per unit of data of the first operation operator; and store, in the same first candidate operation operator group, those operation operators whose execution costs per unit of data differ from one another within a preset range.
  18. The apparatus according to any one of claims 14 to 17, characterized in that the merging module is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one merged operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into one data structure; and generate metadata information according to the storage positions, in the one data structure, of the operation data of the subtasks corresponding to the operation operators in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
  19. A GPU-based data stream processing apparatus, characterized by comprising:
    a processor, a memory, and a system bus;
    wherein the processor and the memory are connected by the system bus and communicate with each other;
    the memory is configured to store computer-executable instructions; and
    the processor is configured to run the computer-executable instructions to cause the GPU-based data stream processing apparatus to perform the method according to any one of claims 1 to 9.
PCT/CN2014/086523 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu WO2016041126A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/086523 WO2016041126A1 (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu
CN201480038261.0A CN105637482A (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu


Publications (1)

Publication Number Publication Date
WO2016041126A1 true WO2016041126A1 (en) 2016-03-24

Family

ID=55532418

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086523 WO2016041126A1 (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu

Country Status (2)

Country Link
CN (1) CN105637482A (en)
WO (1) WO2016041126A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694675B (en) 2019-03-15 2022-03-08 上海商汤智能科技有限公司 Task scheduling method and device and storage medium
CN111899149A (en) * 2020-07-09 2020-11-06 浙江大华技术股份有限公司 Image processing method and device based on operator fusion and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060098017A1 (en) * 2004-11-05 2006-05-11 Microsoft Corporation Interpreter for simplified programming of graphics processor units in general purpose programming languages
US20070294666A1 (en) * 2006-06-20 2007-12-20 Papakipos Matthew N Systems and methods for determining compute kernels for an application in a parallel-processing computer system
CN102609978A (en) * 2012-01-13 2012-07-25 中国人民解放军信息工程大学 Method for accelerating cone-beam CT (computerized tomography) image reconstruction by using GPU (graphics processing unit) based on CUDA (compute unified device architecture) architecture
CN102708009A (en) * 2012-04-19 2012-10-03 华为技术有限公司 Method for sharing GPU (graphics processing unit) by multiple tasks based on CUDA (compute unified device architecture)
EP2620873A1 (en) * 2012-01-27 2013-07-31 Samsung Electronics Co., Ltd Resource allocation method and apparatus of GPU

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309889A (en) * 2012-03-15 2013-09-18 华北计算机系统工程研究所 Method for realizing of real-time data parallel compression by utilizing GPU (Graphic processing unit) cooperative computing


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686352A (en) * 2016-12-23 2017-05-17 北京大学 Real-time processing method of multiple video data on multi-GPU (multiple graphics processing unit) platform
CN112395234A (en) * 2019-08-16 2021-02-23 阿里巴巴集团控股有限公司 Request processing method and device
CN112463158A (en) * 2020-11-25 2021-03-09 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium
CN112463158B (en) * 2020-11-25 2023-05-23 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105637482A (en) 2016-06-01

Similar Documents

Publication Publication Date Title
US8656396B2 (en) Performance optimization based on threshold performance measure by resuming suspended threads if present or by creating threads within elastic and data parallel operators
Ben-Nun et al. Groute: An asynchronous multi-GPU programming model for irregular computations
US9152601B2 (en) Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units
US10733019B2 (en) Apparatus and method for data processing
EP2161685B1 (en) Pipelined image processing engine
US9197703B2 (en) System and method to maximize server resource utilization and performance of metadata operations
US8601458B2 (en) Profile-driven data stream processing
US9158795B2 (en) Compile-time grouping of tuples in a streaming application
US9996394B2 (en) Scheduling accelerator tasks on accelerators using graphs
WO2016041126A1 (en) Method and device for processing data stream based on gpu
CN108475212B (en) Method, system, and computer readable medium for processing data using dynamic partitioning
US9262223B2 (en) Lazy initialization of operator graph in a stream computing application
US9405349B2 (en) Multi-core apparatus and job scheduling method thereof
TWI564807B (en) Scheduling method and processing device using the same
CN105354089B (en) Support the stream data processing unit and system of iterative calculation
CN106055311A (en) Multi-threading Map Reduce task parallelizing method based on assembly line
US9471387B2 (en) Scheduling in job execution
US10083066B2 (en) Processing data by using simultaneous multithreading
Liu et al. Optimizing shuffle in wide-area data analytics
US10203988B2 (en) Adaptive parallelism of task execution on machines with accelerators
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
KR20220045026A (en) Hardware circuitry for accelerating neural network computations
WO2019000435A1 (en) Task processing method and device, medium, and device thereof
CN107329813B (en) Global sensing data active prefetching method and system for many-core processor
CN113806044B (en) Heterogeneous platform task bottleneck eliminating method for computer vision application

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14902108; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14902108; Country of ref document: EP; Kind code of ref document: A1)