CN108121792B - Method, device and equipment for processing data streams based on task parallel and storage medium

Method, device and equipment for processing data streams based on task parallel and storage medium


Publication number
CN108121792B
CN108121792B (application CN201711381582.2A)
Authority
CN
China
Prior art keywords
task
data
processed
task queue
thread
Prior art date
Legal status
Active
Application number
CN201711381582.2A
Other languages
Chinese (zh)
Other versions
CN108121792A (en)
Inventor
杨强
陈雨强
戴文渊
焦英翔
石光川
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202010584436.5A priority Critical patent/CN111752971B/en
Priority to CN201711381582.2A priority patent/CN108121792B/en
Publication of CN108121792A publication Critical patent/CN108121792A/en
Application granted granted Critical
Publication of CN108121792B publication Critical patent/CN108121792B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for task-based parallel processing of data streams. Each of a plurality of determined worker threads takes a task to be processed out of a task queue and processes it, where the task to be processed is formed by packaging a batch of data to be operated on in the data stream together with the corresponding operation step of the data stream processing. With this task-generating parallelization mechanism of the invention, the degree of parallelism for tasks packaged from different operation steps can be adjusted automatically according to the time they actually consume during execution.

Description

Method, device and equipment for processing data streams based on task parallel and storage medium
Technical Field
The present invention relates to the field of data science, and in particular, to a method, an apparatus, a device, and a storage medium for parallel processing of data streams based on tasks.
Background
When a data processing service involves a large amount of data, multi-threaded parallel execution is often needed to reduce the overall execution time of the service. A thread is the unit the operating system uses for scheduling, and every computation is ultimately executed by the operating system in the form of threads. Using as many of the machine's physical resources as possible reduces the overall time overhead of the task.
One approach to multithreading is to divide the data to be processed into batches, with each thread responsible for one batch. The problem with this scheme is that data processing generally mixes reading and writing, which are hard disk operations, with computation, which runs on the CPU, and a single thread incurs large overhead if it keeps switching between these different kinds of hardware. In addition, while a thread waits for data to be read, the corresponding CPU performs no computation, which amounts to wasting CPU resources for that period.
The prior art mainly relies on asynchronous IO to solve this problem. Assume the data processing flow comprises three steps: reading data, computing on the data, and storing the computation result. Three threads may then be set up: the first reads data from the hard disk and places it into a first buffer; the second takes data from the first buffer, performs the computation, and writes the result into a second buffer; the third writes the data in the second buffer into a new file (each thread keeps running until all data operations are completed). Generally, to keep the amount of data in memory bounded, each buffer has an upper limit on how much it can cache.
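As an illustration only, and not as part of the claimed scheme, the three-thread asynchronous-IO pipeline described above might be sketched as follows in Python, with bounded queues standing in for the two buffers; the file names, the squaring operation, and the buffer limit are all hypothetical.

```python
import queue
import threading

BUFFER_LIMIT = 16          # upper bound on items cached in each buffer
END = object()             # sentinel marking the end of the stream

read_buffer = queue.Queue(maxsize=BUFFER_LIMIT)    # first buffer
result_buffer = queue.Queue(maxsize=BUFFER_LIMIT)  # second buffer

def reader(path):
    # thread 1: read data from the hard disk into the first buffer
    with open(path) as f:
        for line in f:
            read_buffer.put(int(line))
    read_buffer.put(END)

def computer():
    # thread 2: take data from the first buffer, compute, write the result to the second buffer
    while True:
        item = read_buffer.get()
        if item is END:
            result_buffer.put(END)
            break
        result_buffer.put(item ** 2)

def writer(path):
    # thread 3: write results from the second buffer into a new file
    with open(path, "w") as f:
        while True:
            item = result_buffer.get()
            if item is END:
                break
            f.write(f"{item}\n")

threads = [threading.Thread(target=reader, args=("input.txt",)),
           threading.Thread(target=computer),
           threading.Thread(target=writer, args=("output.txt",))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bounded queues make each producer block once a buffer is full, which is exactly the capped-buffer behaviour described above.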
Assume that reading or writing one unit of data takes one time unit while the computation on it takes three. The difference in speed can be offset by increasing the number of computation threads: with five threads, two responsible for reading and writing respectively and three responsible for computation, data no longer accumulates in the buffers and the computing resources stay busy throughout the task.
However, the problem with asynchronous IO is that once the data processing becomes complicated, it is hard to configure the thread ratio optimally, for example when the processing involves many operation steps whose time and resource consumption differ. Over the whole asynchronous-IO computation, the number of threads opened is the sum of the thread counts required by all operation steps. Because the number of physical CPUs on a machine is fixed, too many threads increase the switching overhead of the system scheduler. This additive relationship ties the total number of threads to the specific processing flow of the data, so a general computing framework has to take the specific characteristics of each data processing service into account in order to control the number of threads.
Disclosure of Invention
An object of the present invention is to provide a parallelization processing scheme based on task generation, which can adaptively adjust the parallelism according to the complexity of a data processing flow.
According to an aspect of the present invention, there is provided a method for parallel processing of data streams based on tasks, comprising: determining a plurality of working threads; and taking out the tasks to be processed from the task queue through each of the plurality of working threads respectively so as to process the taken tasks to be processed, wherein the tasks to be processed are formed by packing batch data to be operated in the data stream and corresponding operation steps in the data stream processing.
Optionally, the method may further include: each worker thread packaging the operation result data obtained after processing, as a new batch of data to be operated on, together with the subsequent operation step of the corresponding operation step into a new task to be processed, which is placed into the task queue.
Optionally, each worker thread packages the operation result data obtained after processing as new batch data to be operated and subsequent operation steps of the branch whose branch condition is satisfied into a new task to be processed, so as to place the new task into the task queue.
Optionally, the method may further include: determining an individual source thread; and packaging, by the source thread in a dedicated loop, batches of data to be operated on in the data stream together with the initial operation step of the data stream processing into tasks to be processed, which are placed into the task queue.
Optionally, the method may further include: monitoring the memory use condition in the data stream parallel processing process; and under the condition that the currently used memory exceeds a preset threshold value, the source thread suspends the putting of a new task to be processed into the task queue.
Optionally, locking the task queue in the process of interaction between the work thread and the task queue; and after the interaction between the work thread and the task queue is finished, releasing the lock of the task queue.
Optionally, the task queue is divided into a computation task queue and an IO task queue, and the work thread is divided into a computation work thread and an IO work thread, the computation work thread takes out the task to be processed only from the computation task queue, the IO work thread takes out the task to be processed only from the IO task queue, and the work thread puts a new task to be processed into the computation task queue or the IO task queue according to whether a subsequent operation step is a computation operation step or an IO operation step.
Alternatively, in a case where the initial operation step in the data stream processing is a data input step, the subsequent operation steps of the initial operation include an operation step for performing successive processing on the read batch data to be operated and the data input step itself, wherein the data input step individually constitutes a task to be processed for reading the batch data to be operated.
Alternatively, the plurality of worker threads may be determined based on physical parameters of the machine that processes the data stream in parallel.
Alternatively, the data flow process can be characterized by a computation graph, which is a directed graph composed of at least two operation steps representing certain operations performed on the data and at least one data edge representing the flow direction of the data.
According to another aspect of the present invention, there is also provided an apparatus for parallel processing of data streams based on tasks, including: the working thread determining module is used for determining a plurality of working threads; and the task processing module is used for taking out the tasks to be processed from the task queue through each of the plurality of working threads respectively so as to process the taken out tasks to be processed, wherein the tasks to be processed are formed by packing batch data to be operated in the data stream and corresponding operation steps in the data stream processing.
Optionally, each worker thread packages the operation result data obtained after processing as new batch data to be operated and subsequent operation steps of the corresponding operation steps into a new task to be processed, so as to place the new task to be processed into a task queue.
Optionally, each worker thread packages the operation result data obtained after processing as new batch data to be operated and subsequent operation steps of the branch whose branch condition is satisfied into a new task to be processed, so as to place the new task into the task queue.
Optionally, the apparatus may further include: a source thread determining module for determining an individual source thread, wherein the task processing module packages, through the source thread in a dedicated loop, batches of data to be operated on in the data stream together with the initial operation step of the data stream processing into tasks to be processed, which are placed into the task queue.
Optionally, the apparatus may further include: and the memory supervision module is used for monitoring the memory use condition in the data stream parallel processing process, and controlling the source thread to suspend putting a new task to be processed into the task queue under the condition that the currently used memory exceeds a preset threshold value.
Optionally, locking the task queue in the process of interaction between the work thread and the task queue; and after the interaction between the work thread and the task queue is finished, releasing the lock of the task queue.
Optionally, the task queue is divided into a computation task queue and an IO task queue, and the work thread is divided into a computation work thread and an IO work thread, the computation work thread takes out the task to be processed only from the computation task queue, the IO work thread takes out the task to be processed only from the IO task queue, and the work thread puts a new task to be processed into the computation task queue or the IO task queue according to whether a subsequent operation step is a computation operation step or an IO operation step.
Alternatively, in a case where the initial operation step in the data stream processing is a data input step, the subsequent operation steps of the initial operation include an operation step for performing successive processing on the read batch data to be operated and the data input step itself, wherein the data input step individually constitutes a task to be processed for reading the batch data to be operated.
Optionally, the worker thread determining module determines the plurality of worker threads according to physical parameters of the machine that processes the data stream in parallel.
Alternatively, the data flow process can be characterized by a computation graph, which is a directed graph composed of at least two operation steps representing certain operations performed on the data and at least one data edge representing the flow direction of the data.
According to another aspect of the present invention, there is also provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the above-mentioned method of task-based parallel processing of data streams.
According to another aspect of the present invention, there is also provided a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the above-mentioned method of task-based parallel processing of data streams.
The method, apparatus, device, and storage medium for task-based parallel processing of data streams take the data and the operation steps together as the organizing thread: following the processing flow of the data, each operation step to be executed is packaged with the data it is to operate on into a task to be processed, and the packaged tasks are placed into a task queue. With this task-generating parallelization mechanism, the degree of parallelism for tasks packaged from different operation steps is adjusted automatically according to the time they actually consume during execution.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows a schematic flow diagram of a method for task-based parallel processing of data streams according to an embodiment of the invention.
FIG. 2 illustrates a schematic diagram of the interaction between a worker thread and a task queue.
Fig. 3 shows a schematic diagram of one implementation of performing the start-up operation steps.
FIG. 4 shows a parallelized processing diagram according to an embodiment of the invention.
FIG. 5 shows a parallelized processing diagram according to another embodiment of the invention.
Fig. 6 shows a schematic diagram of a computation graph.
Fig. 7 shows a schematic block diagram of the structure of a parallel processing apparatus according to an embodiment of the present invention.
FIG. 8 shows a schematic block diagram of the structure of a computing device according to one embodiment of the invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Unlike existing parallelization schemes, the invention provides a task-generating parallelization scheme. It considers the data and the corresponding operation steps as a whole: following the processing flow of the data, each operation step to be executed is packaged together with the data it is to operate on into a task to be processed, and the packaged task is placed into a task queue. A worker thread takes a task out of the queue and processes the data according to the operation step, then, again following the data processing flow, packages the resulting operation result data together with the downstream operation step into a new task to be processed and puts it back into the task queue.
With this task-generating parallelization mechanism, the degree of parallelism for tasks packaged from different operation steps is adjusted automatically according to the time they actually consume. For example, if an operation step A in the data processing flow is time-consuming, then as upstream tasks finish, the generated tasks corresponding to step A keep accumulating, and the accumulated tasks are executed by more and more worker threads, which is equivalent to raising the parallelism of that step.
Therefore, when setting the number of worker threads, neither the number of steps in the data processing flow nor the time consumed by the different steps needs to be considered. Further, as an optional approach, the specific number of worker threads may be set according to the physical parameters of the machine in use (such as the number of CPU cores, the size of the hard disk, the speed of the network card, the size of the memory, and so on). That is, when configuring parallelism with the parallelization scheme of the invention, the parameters to be tuned may depend only on the machine's hardware environment and not on the complexity of the data processing flow.
Fig. 1 shows a schematic flow diagram of a method for task-based parallel processing of data streams according to an embodiment of the invention.
The data stream referred to in the invention can be regarded as streaming data: instead of arriving all at once, the data flows in little by little, like through a pipeline, so the data can be processed continuously according to the operation steps as it arrives. In addition, the parallel processing of data streams in the invention targets data whose volume is too large to process in one pass, or whose one-pass processing would consume too much memory; the data therefore needs to be fetched in batches and processed by multiple threads working in parallel.
Referring to fig. 1, in step S110, a plurality of worker threads are determined.
Here, the number of worker threads may be specified manually, or may be determined in view of the hardware environment and/or network resources. As an example, the number of worker threads may be determined according to the physical parameters of the machine that processes the data stream in parallel, such as the number of CPU cores, the size of the hard disk, the speed of the network card, and the size of the memory; the specific determination mechanism is described further below and is not repeated here.
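A minimal sketch of deriving thread counts from machine parameters; the particular mapping (one computation thread per CPU core, one IO thread per disk or network card) is an assumption chosen for illustration, not a rule prescribed by the invention.

```python
import os

def decide_thread_counts(num_disks: int = 1, num_nics: int = 1):
    # assumed heuristic: one computation worker thread per CPU core,
    # one IO worker thread per hard disk / network card
    compute_threads = os.cpu_count() or 1
    io_threads = num_disks + num_nics
    return compute_threads, io_threads

print(decide_thread_counts())  # e.g. (8, 2) on an 8-core machine with one disk and one NIC
```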
In step S120, each of the plurality of worker threads takes a task to be processed out of the task queue and processes it.
A task to be processed is formed by packaging a batch of data to be operated on in the data stream together with the corresponding operation step of the data stream processing. The batch may be data on which the operation is performed a single time, or data on which the operation is performed repeatedly. For example, suppose the data to be processed is a file a in which every line represents an integer and the operation step is to square each integer. The batch of data to be operated on may be a single line of file a, in which case each task corresponds to one integer and squares it; alternatively, the batch may be multiple lines of file a, in which case each task corresponds to several integers and squares each of them.
FIG. 2 illustrates a schematic diagram of the interaction between a worker thread and a task queue.
As shown in FIG. 2, each worker thread may take a task from the task queue and perform the corresponding operation step on the batch of data to be operated on that it contains. Afterwards, the worker thread may also package the operation result data obtained from the processing, as a new batch of data to be operated on, together with the subsequent operation step of the corresponding operation step into a new task to be processed, which is placed into the task queue.
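The worker-thread loop just described might look roughly like the sketch below, in which a task is simply a pair of a batch and an operation step and each step knows its successor; the Task and Step structures, the number of threads, and the None sentinel are assumptions introduced for illustration, not the patent's prescribed data layout.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    run: Callable              # the operation applied to a batch of data
    next: Optional["Step"]     # the subsequent operation step, if any

@dataclass
class Task:
    batch: object              # batch of data to be operated on
    step: Step                 # corresponding operation step

task_queue: "queue.Queue" = queue.Queue()

def worker():
    while True:
        task = task_queue.get()            # take a task to be processed out of the queue
        if task is None:                   # sentinel used to stop the thread
            break
        result = task.step.run(task.batch)
        if task.step.next is not None:
            # package the result, as a new batch, with the subsequent operation step
            task_queue.put(Task(result, task.step.next))

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
```

Python's queue.Queue is internally locked, which mirrors the locking of the task queue during each interaction described later in this text.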
As an optional embodiment of the invention, when the data stream processing contains a branch structure, after a worker thread takes a task out of the task queue and finishes processing it, the thread may further package the operation result data obtained from the processing, as a new batch of data to be operated on, together with the subsequent operation step of the branch whose branch condition is satisfied into a new task to be processed, which is placed into the task queue. The invention thus also supports data stream processing with a branching structure.
It should be noted that, when the data stream processing contains a branch structure, the branch condition may be a condition evaluated on the operation result data (for example, a condition that the operation result data satisfies), or it may be a judgment about some other current state.
When the branch condition is a condition on the operation result data, the evaluation of the branch condition may be performed by a worker thread. Specifically, the branch condition evaluation may be packaged, as an independent operation step together with the corresponding operation result data, into a task to be processed that is placed into the task queue and executed by a worker thread. Alternatively, the evaluation of the branch condition may be packaged as an additional operation step inside the task corresponding to the preceding operation step and executed by the worker thread handling that task.
For example, after a worker thread takes a task out of the task queue and executes it, if the data processing flow requires a branch condition to be evaluated on the operation result data, the result data may be taken as a new batch of data to be operated on and packaged with the branch-condition evaluation operation into a new task to be processed, which is placed into the task queue. After that task is later taken out by a worker thread, the thread performs the evaluation and, according to the result, packages the operation result data together with the subsequent operation step of the branch whose branch condition is satisfied into a new task to be processed, which it places into the task queue.
As another example, the branch-condition evaluation may be packaged, together with the preceding operation step and the data to be operated on by that step, into a single task to be processed that is placed into the task queue. After a worker thread takes this task out of the queue, processes it, and obtains the operation result data, it can, according to the result of the branch condition evaluated inside the task, package the operation result data, as a new batch of data to be operated on, together with the subsequent operation step of the branch whose branch condition is satisfied into a new task to be processed, which is placed into the task queue.
As for how the initial operation step is executed, the following exemplary approaches may be employed.
Mode one
For the initial operation step of the data stream processing, the invention may use an additional thread dedicated to continuously generating tasks for that initial step, so that the remaining worker threads can complete the subsequent processing according to the flow. Accordingly, the method shown in FIG. 1 may further include: determining an individual source thread; and packaging, by the source thread in a dedicated loop, batches of data to be operated on in the data stream together with the initial operation step of the data stream processing into tasks to be processed, which are placed into the task queue.
As an example, the initial operation step referred to here is the first operation step performed apart from reading the data. That is, the source thread may be responsible for continuously reading batches of data to be operated on from the data input and packaging each batch with the initial operation step of the data stream processing (for example, the first operation step after the data is read) into a task to be processed that is placed into the task queue; this reading generally means fetching data from the hard disk or the network and involves no CPU computation.
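A possible sketch of the dedicated source-thread loop of mode one, reusing the hypothetical Task and Step structures from the earlier sketch; the batch size and the line-oriented input file are assumptions.

```python
def source_thread(path: str, first_step: "Step", task_queue: "queue.Queue", batch_size: int = 1000):
    # dedicated loop: keep reading batches of data to be operated on and
    # packaging each batch with the initial operation step into a task to be processed
    batch = []
    with open(path) as f:
        for line in f:
            batch.append(line)
            if len(batch) == batch_size:
                task_queue.put(Task(batch, first_step))
                batch = []
    if batch:                                   # flush the final partial batch
        task_queue.put(Task(batch, first_step))
```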
Mode two
In the case where the initial operation step of the data stream processing is itself a data input step, the initial operation step may have two subsequent operation steps: one is the operation step that performs the subsequent processing on the batch of data just read, and the other is the data input step itself, where the data input step alone constitutes a task to be processed for reading a batch of data to be operated on.
Thus, in this embodiment, the data input step itself can be packaged into a task to be processed at the initial stage, and executing this task reads the corresponding batch of data. In this case, after a worker thread (for example, an IO worker thread mentioned below) takes the task out of the task queue and finishes reading the data by executing it, the worker thread not only packages the read data, as a new batch of data to be operated on, together with the operation step that performs the subsequent processing into a new task, but also generates a new task to be processed for reading the next batch of data.
Fig. 3 shows a schematic diagram of one implementation of executing the initial operation step. The arrows in Fig. 3 represent the tasks newly generated after a task has been executed.
Referring to FIG. 3, assume the processing flow of the data stream includes N (N ≥ 2) steps, where the initial operation step is a data input step; the initial task corresponding to the data input step is denoted 1, and the tasks corresponding to the subsequent processing steps are denoted 2, 3, 4 … N. For the initial task 1 generated from the data input step, after a worker thread executes task 1 it packages and generates not only a task 2 but also a new initial task 1; after the new initial task 1 is executed by a worker thread, another task 2 and another new initial task 1 are generated, and so on. Therefore, in this case only one initial task 1 needs to be generated at the start, and all the data is read as this task keeps being executed and regenerated.
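The self-regenerating initial task can be pictured with tasks modeled as plain closures that a worker simply calls; everything in this sketch (the closure representation, the batch size, the processing step) is a hypothetical illustration rather than the structure prescribed by the invention.

```python
import queue

task_queue = queue.Queue()

def make_read_task(file_obj, batch_size, process_step):
    def initial_task():                       # "task 1": the data input step packaged alone
        batch = [line for line in (file_obj.readline() for _ in range(batch_size)) if line]
        if batch:
            # generate "task 2": subsequent processing of the batch just read
            task_queue.put(lambda: process_step(batch))
            # regenerate a new "task 1" for the next batch of data
            task_queue.put(make_read_task(file_obj, batch_size, process_step))
    return initial_task

# seed the queue with a single initial task; each worker calls whatever it dequeues
f = open("input.txt")
task_queue.put(make_read_task(f, batch_size=100, process_step=lambda b: print(len(b))))
```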
During the processing of the data there may be dependencies or other special relationships; for example, the computation on one piece of data may depend on the computation results of other data. The task queue therefore should not impose an upper limit on the number of tasks, so as to avoid deadlock; without such a limit, however, the memory occupied by queued tasks could grow without bound.
To solve this problem, the invention may introduce a memory supervision module to control the total memory overhead. The memory supervision module monitors the memory usage during the parallel processing of the data stream. Accordingly, the method shown in FIG. 1 may further include: monitoring the memory usage during the parallel processing of the data stream; and, when the currently used memory exceeds a predetermined threshold, having the source thread suspend placing new tasks to be processed into the task queue.
FIG. 4 shows a parallelized processing diagram according to an embodiment of the invention.
Referring to fig. 4, when applying for or releasing memory, the worker threads and the source thread may apply to or report to the memory supervision module, so that the module knows the current memory usage. When the amount of memory applied for or released is small, interacting with the memory supervision module would hurt efficiency; preferably, therefore, the worker threads and the source thread apply to or report to the module only for large allocations and releases (that is, small intermediate variables need not interact with the memory supervision module). When the currently used memory exceeds a predetermined threshold, the memory supervision module can control the source thread to suspend placing new tasks to be processed into the task queue, and the source thread resumes feeding new tasks into the queue once subsequent tasks have released enough memory.
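A possible sketch of such a memory supervision module: threads report only allocations and releases above a reporting granularity, and the source thread checks whether it may enqueue new tasks; the threshold, granularity, and method names are assumptions made for illustration.

```python
import threading

class MemorySupervisor:
    def __init__(self, limit_bytes, report_granularity=1 << 20):
        self.limit = limit_bytes
        self.granularity = report_granularity   # small intermediate variables are not reported
        self.used = 0
        self.lock = threading.Lock()

    def allocate(self, nbytes):
        if nbytes < self.granularity:
            return                               # skip interaction for small allocations
        with self.lock:
            self.used += nbytes

    def release(self, nbytes):
        if nbytes < self.granularity:
            return
        with self.lock:
            self.used -= nbytes

    def source_may_enqueue(self):
        # the source thread pauses new tasks while usage exceeds the threshold
        with self.lock:
            return self.used <= self.limit
```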
The generated tasks are stored in the task queue. Preferably, whenever a worker thread or the source thread interacts with the task queue, for example taking a task out of it or adding a new task to it, the task queue is locked, and the lock is released once the interaction is finished, so as to avoid conflicts such as different worker threads contending for the same task. When the computation per task is large, the time consumed by the lock operations is negligible.
Since pausing any intermediate step of the parallel processing could cause deadlock, the memory supervision module preferably only controls the source thread to suspend placing new tasks to be processed into the task queue when the currently used memory exceeds the predetermined threshold.
Take as an example a data processing task in which each line of a file represents an integer and the task is to square each integer in the file, and assume three worker threads and one source thread are started. First the source thread reads a line of data a from the file, packages the squaring operation with the data a into a task A1, and places it into the task queue; the source thread then sequentially queues tasks B1, C1, D1, and E1 corresponding to data lines b, c, d, and e. Worker thread 1 fetches A1 from the task queue, obtains a^2 after execution, packages the file-writing operation with a^2 into a task A2, and puts it back into the queue; in the same period the other worker threads likewise generate B2 and C2 and return them to the queue, and after worker thread 1 finishes it takes D1 from the queue and executes it.
Now assume the squaring operation is a very complicated computation that takes a long time while the source thread reads the file quickly, so squaring tasks keep accumulating in the task queue. If the configured threshold is that at most five data lines may exist in memory at the same time, then after the source thread has put E1 into the queue the memory usage has reached the upper limit; when the source thread tries to put F1 into the queue its operation is suspended, and the source thread's task input resumes only after the worker threads have released enough memory. The memory supervision module may send a pause signal to the source thread, or the source thread's interaction with the task queue may return failure, or the source thread may interact with the memory supervision module before reading F1 or before enqueuing F1; these are all implementation details that can be chosen according to the actual situation.
If reading and writing the file take little time while the square computation takes long, then the more squaring tasks there are in the queue, the higher the probability that a worker thread executes a squaring task. If the square computation instead takes very little time, the numbers of squaring tasks and file-writing tasks in the queue stay roughly equal and their probabilities of being executed are roughly the same, which means the two kinds of tasks share the computing resources about evenly.
Thus the scheme of the invention involves no thread-count configuration per operation step; in other words, the invention's mechanism for adaptively adjusting parallelism according to task complexity requires no manual configuration at all. It only needs to run with a certain allocation of resources, and it completes the data processing flow at the most appropriate ratio.
FIG. 5 is a diagram illustrating parallelized processing according to another embodiment of the present invention.
Referring to fig. 5, in order to eliminate the overhead caused by worker threads switching between IO hardware and CPU computing hardware, the task queue may be divided into a computation task queue (for example, a CPU task queue) and an IO task queue, and correspondingly the worker threads may be divided into computation worker threads (for example, CPU worker threads) and IO worker threads.
A computation worker thread takes tasks to be processed only from the computation task queue, and an IO worker thread takes tasks to be processed only from the IO task queue. Whether it is a computation worker thread or an IO worker thread, after taking out and processing a task, it places the newly packaged task to be processed into the computation task queue or the IO task queue according to whether the subsequent operation step is a computation operation step or an IO operation step. This guarantees that the work of each worker thread always corresponds to the same type of hardware and avoids the extra overhead caused by a worker thread switching between different hardware.
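The split into two queues, with each newly packaged task routed by the hardware type of its subsequent step, might look like the sketch below; representing the step type with an is_io flag, and the run/next interface on a step, are assumptions carried over from the earlier sketches.

```python
import queue

compute_queue = queue.Queue()   # tasks whose operation step runs on the CPU
io_queue = queue.Queue()        # tasks whose operation step reads or writes (disk / network)

def route(task, step_is_io):
    # a worker puts the newly packaged task into the queue matching
    # the hardware type of its subsequent operation step
    (io_queue if step_is_io else compute_queue).put(task)

def compute_worker():
    while True:
        batch, step = compute_queue.get()      # computation workers only read this queue
        result = step.run(batch)
        if step.next is not None:
            route((result, step.next), step.next.is_io)

def io_worker():
    while True:
        batch, step = io_queue.get()           # IO workers only read this queue
        result = step.run(batch)
        if step.next is not None:
            route((result, step.next), step.next.is_io)
```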
In this embodiment, the tasks of the source thread described above may be merged into the IO worker thread, that is, the IO worker thread may initialize the task queue.
Taking the case where the initial operation step is a data input step as an example, the data input operation step may have two downstream operation steps at the same time: one is the operation step that performs the subsequent processing on the batch of data just read, and the other is the data input step itself. In this case, the data input step alone constitutes a task to be processed for reading a batch of data to be operated on.
Specifically, the initial operation step may be placed into the IO task queue alone as the initial task. An IO worker thread takes the initial task out of the IO task queue and reads data by executing it. After the read is complete, the IO worker thread may package the read data with the downstream operation step of the data input step into a task (generally a computation task) and place it into the corresponding task queue (generally the computation task queue). Also after the read is complete, the IO worker thread may form a new initial task (an IO task) from the initial operation step alone, for the next batch of data to be read, and place it into the IO task queue. In this way, read tasks (i.e., initial tasks) are generated continuously until all the data has been read.
To avoid deadlock, when the memory supervision module observes that the currently used memory exceeds the threshold, it may likewise pause only the placing of data-reading tasks into the task queue, i.e., the source of the data input.
It can be seen that in this embodiment the only external configuration items are the number of computation worker threads, the number of IO worker threads, and the memory upper limit supervised by the memory supervision module. The number of computation worker threads is generally related to the number of CPU cores of the machine, the number of IO worker threads to its hard disk and network card, and the memory supervision threshold to the size of its memory; all of these are determined by the hardware and are unrelated to the actual task. Compared with the traditional asynchronous-IO approach, this greatly reduces the configuration items, and exactly the same parameters can be used in the same physical environment without affecting the speed at which tasks complete.
According to an exemplary embodiment of the invention, the data stream processing may be characterized as a computation graph. Computation graphs are widely used on data processing platforms because of their strong expressive power and their simplicity and intelligibility. A computation graph is a directed graph formed of operation steps (each of which can be regarded as a node of the graph) and data edges, where an operation step represents a certain operation performed on the data and a data edge indicates the flow direction of the data along the directed edge; the next operation starts after the previous one has been executed.
For example, suppose a file contains one integer per line, and the current computation graph is intended to store the square of each number in a new file. The whole computation is divided into three operation steps: reading, computing, and storing. The computation graph is shown in fig. 6.
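The three-step graph of fig. 6 (read → compute → store) could be written down as a small directed structure like the one below; the Node class and the way the data edges are expressed are illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    name: str
    op: Callable                      # the operation this step performs
    downstream: List["Node"] = field(default_factory=list)   # outgoing data edges

# the three operation steps of fig. 6, linked by data edges
store = Node("store", op=lambda batch: open("out.txt", "a").writelines(f"{x}\n" for x in batch))
compute = Node("compute", op=lambda batch: [x * x for x in batch], downstream=[store])
read = Node("read", op=lambda path: [int(line) for line in open(path)], downstream=[compute])
```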
The parallelization scheme of the invention can be implemented in combination with a computation graph, for example as a computation-graph running framework that dynamically schedules tasks. Specifically, following the step execution order shown by the computation graph, the framework packages an operation step to be processed on the graph together with the relevant data to generate a task and places it into the task queue; it then starts several threads as worker threads, each of which takes a task out of the queue and executes it, packages the execution result data with the downstream operation step according to the computation graph to generate a new task, and puts the new task back into the queue.
For the specific implementation of this scheme, reference may be made to the relevant description above, which is not repeated here. When implemented as a computation-graph running framework, the parameters that need to be tuned when configuring parallelism for the computation graph are greatly reduced. By way of example, the parameters may relate only to the machine's hardware environment and not to the actual execution of the computation graph, and exactly the same parameters can be used in the same physical environment without affecting the speed at which tasks complete.
The above method for task-based parallel processing of data streams can also be implemented as an apparatus for task-based parallel processing of data streams. Fig. 7 is a schematic block diagram of the structure of a parallel processing apparatus according to an embodiment of the invention. The functional modules of the apparatus may be implemented by hardware, by software, or by a combination of the two that realizes the principles of the invention. Those skilled in the art will appreciate that the functional modules described in fig. 7 may be combined or divided into sub-modules to implement the principles described above; the description herein may therefore support any possible combination, division, or further definition of these functional modules. Below, only the functional modules the apparatus may have and the operations each functional module may perform are briefly described; for the details, reference may be made to the description above, which is not repeated here.
Referring to fig. 7, the parallel processing apparatus 700 includes a worker thread determination module 710 and a task processing module 720.
The worker thread determining module 710 is configured to determine a plurality of worker threads; for example, it may determine the number of worker threads according to the physical parameters of the machine in use. For instance, when the worker threads are divided into computation worker threads and IO worker threads, the worker thread determining module 710 may determine the number of computation worker threads according to the number of CPU cores of the machine and the number of IO worker threads according to its hard disk and network card.
The task processing module 720 is configured to take a task to be processed out of the task queue through each of the plurality of worker threads and process it, where the task to be processed is formed by packaging a batch of data to be operated on in the data stream together with the corresponding operation step of the data stream processing.
A worker thread may lock the task queue while interacting with it and release the lock of the task queue once the interaction is finished. For other details of the interaction between the worker threads and the task queue, see the description above, which is not repeated here.
Preferably, the task queue is divided into a computation task queue and an IO task queue, the work thread is divided into a computation work thread and an IO work thread, the computation work thread takes out the task to be processed only from the computation task queue, the IO work thread takes out the task to be processed only from the IO task queue, and the work thread puts a new task to be processed into the computation task queue or the IO task queue according to whether the subsequent operation step is a computation operation step or an IO operation step.
Preferably, the data stream processing can be characterized by a computation graph, which is a directed graph composed of at least two operation steps, each representing a certain operation performed on the data, and at least one data edge representing the flow direction of the data.
As shown in fig. 7, the parallel processing apparatus 700 may further optionally include a source thread determining module 730 shown by a dashed line box.
The source thread determining module 730 may be used to determine an individual source thread. Through the source thread, in a dedicated loop, the task processing module 720 may package batches of data to be operated on in the data stream together with the initial operation step of the data stream processing into tasks to be processed, which are placed into the task queue.
Optionally, when the initial operation step of the data stream processing is a data input step, the initial operation step may have two subsequent operation steps: one is the operation step that performs the subsequent processing on the batch of data just read, and the other is the data input step itself, where the data input step alone constitutes a task to be processed for reading a batch of data to be operated on.
As shown in fig. 7, the parallel processing apparatus 700 may further optionally include a memory supervision module 740, shown by a dashed box.
The memory supervision module 740 may monitor the memory usage during the parallel processing of the data stream, and when the currently used memory exceeds a predetermined threshold, the memory supervision module 740 may control the source thread to suspend placing new tasks to be processed into the task queue.
FIG. 8 is a schematic block diagram illustrating the structure of a computing device 800 according to one embodiment of the invention. The computing device 800 may be implemented as various types of computing apparatus, such as a desktop computer, a portable computer, a tablet, a smartphone, a personal digital assistant (PDA), or another type of computing apparatus, and is not limited to any particular form.
As shown in fig. 8, a computing device 800 of the present invention may include a processor 810 and a memory 820. The processor 810 may be a multi-core processor or may include multiple processors. In some embodiments, processor 810 may include a general-purpose host processor and one or more special purpose coprocessors such as a Graphics Processor (GPU), Digital Signal Processor (DSP), or the like. In some embodiments, the processor 810 may be implemented using custom circuitry, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 820 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 810 or other modules of the computer. The permanent storage may be a readable and writable storage device, and may be a non-volatile device that does not lose the stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (such as a magnetic or optical disk, or flash memory) is employed as the permanent storage; in other embodiments, the permanent storage may be a removable storage device (such as a floppy disk or an optical drive). The system memory may be a readable and writable volatile memory device, such as dynamic random access memory, and may store the instructions and data that some or all of the processors need at runtime. In addition, the memory 820 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) as well as magnetic and/or optical disks. In some embodiments, the memory 820 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, and so on), or a magnetic floppy disk. The computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted wirelessly or over wires.
In embodiments of the invention, the memory 820 stores executable code, and the processor 810 may execute the executable code stored in the memory 820. The executable code, when executed by the processor 810, may cause the processor 810 to perform the method of task-based parallel processing of data streams of the invention. In addition to the executable code, the memory 820 may also store some or all of the data required by the processor 810 in carrying out the invention.
The method, apparatus and computing device for task-based parallel processing of data streams according to the present invention have been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method for parallel processing of a data stream based on a task, wherein the data stream processing can be characterized by a computation graph, the computation graph is a directed graph formed by at least two operation steps and at least one data edge, the operation steps represent a certain operation performed on the data, and the data edge represents a flow direction of the data, the method comprises:
determining an individual source thread;
determining a plurality of working threads according to physical parameters of a machine for parallel processing of data streams;
packaging batch data to be operated in a data stream and initial operation steps in data stream processing into tasks to be processed in a special cycle mode according to the step execution sequence shown by the calculation diagram through a source thread, and putting the tasks into a task queue;
respectively taking out the tasks to be processed from the task queue through each of the plurality of working threads so as to process the taken tasks to be processed, wherein the tasks to be processed are formed by packing batch data to be operated in a data stream and corresponding operation steps in data stream processing;
each working thread packages operation result data obtained after processing the to-be-processed tasks taken out of the task queue as new to-be-operated batch data and subsequent operation steps of the corresponding operation steps into new to-be-processed tasks according to the step execution sequence shown by the calculation diagram, and puts the new to-be-processed tasks into the task queue,
the task queue is divided into a calculation task queue and an IO task queue, the work thread is divided into a calculation work thread and an IO work thread, the calculation work thread takes out the task to be processed only from the calculation task queue, the IO work thread takes out the task to be processed only from the IO task queue, the source thread puts the task to be processed into the calculation task queue or the IO task queue according to whether the initial operation step is a calculation operation step or an IO operation step, and the work thread puts the new task to be processed into the calculation task queue or the IO task queue according to whether the subsequent operation step is a calculation operation step or an IO operation step.
2. The method according to claim 1, wherein each worker thread packages the operation result data obtained after processing as new batch data to be operated and subsequent operation steps of the branch whose branch condition is satisfied into a new task to be processed to be placed in the task queue.
3. The method of claim 1, further comprising:
monitoring memory usage during the parallel processing of the data stream; and
when the currently used memory exceeds a preset threshold, suspending, by the source thread, the putting of new tasks to be processed into the task queue.
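Claim 3 leaves the accounting mechanism open; the sketch below assumes a simple byte counter shared between the source thread and the workers, with invented names (MEM_THRESHOLD, source_put, task_finished), to show how the source thread could pause while usage is above the threshold.

```python
import threading

MEM_THRESHOLD = 64 * 1024 * 1024   # illustrative 64 MiB cap
_in_flight = 0                     # bytes held by queued or running tasks
_cond = threading.Condition()

def source_put(task_bytes, put_fn):
    """Source-thread side: block new tasks while the pipeline holds too much data."""
    global _in_flight
    with _cond:
        while _in_flight > MEM_THRESHOLD:
            _cond.wait()           # suspend putting new tasks into the queue
        _in_flight += task_bytes
    put_fn()                       # actually enqueue the task

def task_finished(task_bytes):
    """Worker side: release the accounted memory and wake the source thread."""
    global _in_flight
    with _cond:
        _in_flight -= task_bytes
        _cond.notify_all()
```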
4. The method of claim 1, wherein a working thread locks the task queue while interacting with the task queue, and releases the lock on the task queue after the interaction is finished.
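As a sketch of the locking discipline in claim 4 (the class and method names are invented), each interaction with the task queue holds a lock for its duration and releases it as soon as the interaction ends:

```python
import threading
from collections import deque

class LockedTaskQueue:
    """Task queue whose every interaction is guarded by a lock."""

    def __init__(self):
        self._tasks = deque()
        self._lock = threading.Lock()

    def put(self, task):
        with self._lock:            # lock acquired for the interaction...
            self._tasks.append(task)
        # ...and released here, once the interaction with the queue is finished

    def try_get(self):
        with self._lock:
            return self._tasks.popleft() if self._tasks else None
```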
5. The method according to claim 1, wherein, in a case where the initial operation step of the data stream processing is a data input step, the subsequent operation steps of the initial operation step include both an operation step for subsequently processing the read batch data to be operated on and the data input step itself,
wherein the data input step by itself constitutes a task to be processed for reading batch data to be operated on.
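One way to picture claim 5 (purely illustrative; `put_task`, the step names, and the assumption that `reader` is an iterator over batches are not from the patent text): the data input step alone forms a task that reads one batch, and its successors are both the processing of that batch and the input step itself, re-enqueued to read the next batch.

```python
def input_step_task(reader, put_task):
    """The data input step by itself is a task: read one batch, then fan out."""
    batch = next(reader, None)
    if batch is None:
        return                           # stream exhausted; no further tasks
    put_task(("process", batch))         # successor 1: process the batch just read
    put_task(("input", reader))          # successor 2: the data input step itself,
                                         # packaged again to read the next batch
```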
6. An apparatus for parallel processing of a data stream based on task parallelism, wherein the data stream processing can be characterized by a computation graph, the computation graph being a directed graph formed by at least two operation steps and at least one data edge, each operation step representing an operation performed on the data and each data edge representing a flow direction of the data, the apparatus comprising:
a source thread determination module for determining a single source thread;
a working thread determination module for determining a plurality of working threads according to physical parameters of the machine performing the parallel data stream processing;
a task processing module for packaging, via the source thread in a dedicated loop and according to the step execution order indicated by the computation graph, batch data to be operated on in the data stream together with the initial operation step of the data stream processing into tasks to be processed, and putting the tasks into a task queue,
wherein the task processing module is further configured to take out, via each of the plurality of working threads, a task to be processed from the task queue and process it, each task to be processed being formed by packaging batch data to be operated on in the data stream together with a corresponding operation step of the data stream processing;
each working thread packages, according to the step execution order indicated by the computation graph, the operation result data obtained from processing a task taken out of the task queue, as new batch data to be operated on, together with the subsequent operation step of the corresponding operation step, into a new task to be processed, and puts the new task into the task queue,
and wherein the task queue is divided into a computation task queue and an IO task queue, the working threads are divided into computation working threads and IO working threads, a computation working thread takes tasks to be processed only from the computation task queue, an IO working thread takes tasks to be processed only from the IO task queue, the source thread puts a task to be processed into the computation task queue or the IO task queue according to whether the initial operation step is a computation operation step or an IO operation step, and a working thread puts a new task to be processed into the computation task queue or the IO task queue according to whether the subsequent operation step is a computation operation step or an IO operation step.
7. The apparatus of claim 6, wherein each working thread packages the operation result data obtained after processing, as new batch data to be operated on, together with the subsequent operation step of the branch whose branch condition is satisfied, into a new task to be processed and puts it into the task queue.
8. The apparatus of claim 6, further comprising:
a memory monitoring module for monitoring memory usage during the parallel processing of the data stream,
wherein, when the currently used memory exceeds a preset threshold, the memory monitoring module controls the source thread to suspend putting new tasks to be processed into the task queue.
9. The apparatus of claim 6, wherein a working thread locks the task queue while interacting with the task queue, and releases the lock on the task queue after the interaction is finished.
10. The apparatus according to claim 6, wherein, in a case where the initial operation step of the data stream processing is a data input step, the subsequent operation steps of the initial operation step include both an operation step for subsequently processing the read batch data to be operated on and the data input step itself,
wherein the data input step by itself constitutes a task to be processed for reading batch data to be operated on.
11. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which, when executed by the processor, causes the processor to perform the method of any one of claims 1 to 5.
12. A non-transitory machine-readable storage medium having executable code stored thereon, which, when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1 to 5.
CN201711381582.2A 2017-12-20 2017-12-20 Method, device and equipment for processing data streams based on task parallel and storage medium Active CN108121792B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010584436.5A CN111752971B (en) 2017-12-20 2017-12-20 Method, device, equipment and storage medium for processing data stream based on task parallel
CN201711381582.2A CN108121792B (en) 2017-12-20 2017-12-20 Method, device and equipment for processing data streams based on task parallel and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711381582.2A CN108121792B (en) 2017-12-20 2017-12-20 Method, device and equipment for processing data streams based on task parallel and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202010584436.5A Division CN111752971B (en) 2017-12-20 2017-12-20 Method, device, equipment and storage medium for processing data stream based on task parallel

Publications (2)

Publication Number Publication Date
CN108121792A CN108121792A (en) 2018-06-05
CN108121792B true CN108121792B (en) 2020-06-26

Family

ID=62229397

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201711381582.2A Active CN108121792B (en) 2017-12-20 2017-12-20 Method, device and equipment for processing data streams based on task parallel and storage medium
CN202010584436.5A Active CN111752971B (en) 2017-12-20 2017-12-20 Method, device, equipment and storage medium for processing data stream based on task parallel

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010584436.5A Active CN111752971B (en) 2017-12-20 2017-12-20 Method, device, equipment and storage medium for processing data stream based on task parallel

Country Status (1)

Country Link
CN (2) CN108121792B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109240831A (en) * 2018-09-21 2019-01-18 郑州云海信息技术有限公司 Operation request processing method, apparatus, device and readable storage medium
CN110716813A (en) * 2019-09-17 2020-01-21 贝壳技术有限公司 Data stream processing method and device, readable storage medium and processor
CN111475300B (en) * 2020-04-09 2023-06-23 江苏盛海智能科技有限公司 Multithreading and multitasking management method and terminal
CN112035523B (en) * 2020-08-25 2024-07-09 上海达梦数据库有限公司 Determination method, device, equipment and storage medium for parallelism
CN113283991B (en) * 2021-06-08 2024-08-02 中国人民银行数字货币研究所 Processing method and device for transaction data on blockchain
CN114422498A (en) * 2021-12-14 2022-04-29 杭州安恒信息技术股份有限公司 Big data real-time processing method and system, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902512A (en) * 2012-08-31 2013-01-30 浪潮电子信息产业股份有限公司 Multi-thread parallel processing method based on multi-thread programming and message queue
CN106325980A (en) * 2015-06-30 2017-01-11 中国石油化工股份有限公司 Multi-thread concurrent system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101655828B (en) * 2008-08-18 2011-09-07 中国人民解放军信息工程大学 Design method for high efficiency super computing system based on task data flow drive
WO2010092483A1 (en) * 2009-02-13 2010-08-19 Alexey Raevsky Devices and methods for optimizing data-parallel processing in multi-core computing systems
CN101719929A (en) * 2009-11-20 2010-06-02 山东中创软件商用中间件股份有限公司 Method for realizing real-time data transmission under Web Service
CN102722355A (en) * 2012-06-04 2012-10-10 南京中兴软创科技股份有限公司 Workflow mechanism-based concurrent ETL (Extract, Transform and Load) conversion method
CN103795596B (en) * 2014-03-03 2017-05-24 北京邮电大学 Programmable control SDN measuring system and method
US10191768B2 (en) * 2015-09-16 2019-01-29 Salesforce.Com, Inc. Providing strong ordering in multi-stage streaming processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902512A (en) * 2012-08-31 2013-01-30 浪潮电子信息产业股份有限公司 Multi-thread parallel processing method based on multi-thread programming and message queue
CN106325980A (en) * 2015-06-30 2017-01-11 中国石油化工股份有限公司 Multi-thread concurrent system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of Thread Pool Technology; Shen Yang et al.; 《软件导刊》 (Software Guide); 2007-06-30; paragraphs 46-48 *

Also Published As

Publication number Publication date
CN111752971A (en) 2020-10-09
CN111752971B (en) 2024-06-28
CN108121792A (en) 2018-06-05

Similar Documents

Publication Publication Date Title
CN108121792B (en) Method, device and equipment for processing data streams based on task parallel and storage medium
US11138048B2 (en) Work stealing in heterogeneous computing systems
US9135077B2 (en) GPU compute optimization via wavefront reforming
US10242420B2 (en) Preemptive context switching of processes on an accelerated processing device (APD) based on time quanta
US8963933B2 (en) Method for urgency-based preemption of a process
KR102466984B1 (en) Improved function callback mechanism between a central processing unit (cpu) and an auxiliary processor
US9996386B2 (en) Mid-thread pre-emption with software assisted context switch
US10402223B1 (en) Scheduling hardware resources for offloading functions in a heterogeneous computing system
US9354892B2 (en) Creating SIMD efficient code by transferring register state through common memory
CN110308982B (en) Shared memory multiplexing method and device
KR102024283B1 (en) Multithreaded computing
US9632559B2 (en) Delaying execution in a processor to increase power savings
US20120284720A1 (en) Hardware assisted scheduling in computer system
JP2010079622A (en) Multi-core processor system and task control method thereof
US9471387B2 (en) Scheduling in job execution
US8959319B2 (en) Executing first instructions for smaller set of SIMD threads diverging upon conditional branch instruction
US9122522B2 (en) Software mechanisms for managing task scheduling on an accelerated processing device (APD)
US10289418B2 (en) Cooperative thread array granularity context switch during trap handling
WO2018206793A1 (en) Multicore processing system
JP2018538632A (en) Method and device for processing data after node restart
CN103714511A (en) GPU-based branch processing method and device
US20240303113A1 (en) Compiler-directed graph-based command dispatch for accelerators
US20240134441A1 (en) Reconfiguration of a computing system using a circuit switch
Guzman et al. Towards the inclusion of FPGAs on commodity heterogeneous systems
US9158580B2 (en) Data flows and their interaction with control flows

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant