CN108351783A - Method and apparatus for processing tasks in a multicore digital information processing system - Google Patents

Method and apparatus for processing tasks in a multicore digital information processing system

Info

Publication number
CN108351783A
CN108351783A (application CN201580083942.3A)
Authority
CN
China
Prior art keywords
task
arithmetic element
ready
memory
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201580083942.3A
Other languages
Chinese (zh)
Inventor
范冰
周卫荣
李海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN108351783A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A method and apparatus for processing tasks in a multicore digital information processing system, which reduce the waiting overhead of data loading, increase the parallelism between tasks, and reduce the system scheduling overhead during task processing. The method includes: determining a ready task in a task queue (S110); determining a target arithmetic element for executing the ready task (S120); and executing the ready task by the target arithmetic element while, at the same time, preparing data for a pending task by the target arithmetic element (S130).

Description

Method and apparatus for processing tasks in a multicore digital information processing system

Technical Field
The embodiments of the present invention relate to the field of digital signal processors, and more particularly, to a method and apparatus for processing tasks in a multicore digital information processing system.
Background
With the development of mobile Internet technology, the amount of data to be processed grows rapidly, and DSP chips are moving toward multicore processing of large data volumes. A digital signal processor usually performs digital operations in software, and a growing number of cores brings many difficulties to software development, hardware resource utilization, and debugging. When requirements change, the software architecture must repartition the function mapping across the different cores, and some hardware resources, such as memory, data channels, and message resources, are underused and therefore wasted.
In a static task scheduling method in the related art, the software designer obtains the basic performance of each functional module from the software task graph and a performance simulation of each algorithm module, matches it against the capability of the target hardware resources, and deploys different software functions onto different hardware resources according to function granularity and resource consumption. However, static task scheduling is applicable only to limited scenarios, its scheduling complexity is high, and its memory utilization is low.
A dynamic task scheduling scheme in the related art uses a resource pool with master-slave distributed scheduling: each processor carries a cut-down operating system (Operating System, "OS") that can create tasks of different priorities, respond to external interrupts, and so on. The master core divides the work into tasks of appropriate granularity and puts them into a task buffer pool; when a slave core is idle, it fetches a task from the master core and executes it. However, in this scheme every slave core must carry an operating system, and task switching and data loading occupy much of the slave core's capacity, so the utilization of computing and memory resources is low.
Summary of the invention
The embodiments of the present invention provide a method and apparatus for processing tasks in a multicore digital information processing system, which can determine the scheduling of operations at run time, dynamically allocate computing resources, improve the utilization of computing resources, and reduce system scheduling overhead.
In a first aspect, a method for processing tasks in a multicore digital information processing system is provided, comprising: determining a ready task in a task queue; determining a target arithmetic element for executing the ready task; and executing the ready task by the target arithmetic element while, at the same time, preparing data for a pending task by the target arithmetic element.
In the method for processing tasks in a multicore digital information processing system according to the embodiments of the present invention, while a task is executed by an arithmetic element, data is prepared for other tasks by the same arithmetic element. Data loading can thus run in parallel with the algorithm work, which reduces the waiting overhead of data loading, increases the parallelism between tasks, and reduces system scheduling overhead.
With reference to the first aspect, in an implementation of the first aspect, when the arithmetic element that executed the dependence task of the ready task is determined to be idle, that arithmetic element is determined as the target arithmetic element.
In this case, the ready task and its dependence task are executed by the same arithmetic element, so the data does not need to be loaded again when the ready task is executed, which relieves congestion on the load path.
With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, before the ready task is executed by the target arithmetic element, the method further includes: determining, in the near memory of the target arithmetic element, a memory block for storing the input data corresponding to the ready task; and moving the input data corresponding to the ready task into the memory block.
With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, determining the memory block in the near memory of the target arithmetic element for storing the input data corresponding to the ready task includes: determining the memory block according to a fixed resource pool algorithm, wherein data stored in the near memory of the target arithmetic element is allowed to stay resident until the user releases it, or is replaced toward far memory when the near memory is insufficient.
With reference to the first aspect and the foregoing implementations, in another implementation of the first aspect, the method further includes: when the ready task has been executed by the target arithmetic element, storing the output data of the ready task in the near memory.
In this way, the memory read and written during task execution is near memory, so no time is wasted waiting for data to arrive while a task runs; and allocating memory with a fixed resource pool algorithm reduces memory fragmentation, improves memory turnover efficiency, and saves memory.
With reference to the first aspect and the foregoing implementations, in another possible implementation of the first aspect, determining the memory block according to the fixed resource pool algorithm includes: determining the number of memory blocks from the ratio of the total size of the data blocks corresponding to all parameters needed by the ready task to the size of a single memory block in the near memory.
By packing the in-memory data of a task together in this way, memory usage efficiency can be further improved and memory waste reduced.
With reference to the first aspect and the foregoing implementations, in another possible implementation of the first aspect, before the ready task in the task queue is determined, the method further includes: performing abstraction processing on the service that includes the ready task to obtain abstraction information, the abstraction information including at least one of the following: task dependence information, data dependence information, and task execution priority information.
With reference to the first aspect and the foregoing implementations, in another possible implementation of the first aspect, the task queue is multiple parallel task queues, and determining the ready task in the task queue includes: polling the multiple parallel task queues in priority order to determine the ready task.
With reference to the first aspect and the foregoing implementations, in another possible implementation of the first aspect, performing abstraction processing on the service that includes the ready task includes: creating caches (buffers) according to the needs of the service; and determining the data dependence information according to the cache identifiers (IDs) of the caches.
In a second aspect, an apparatus for processing tasks in a multicore digital information processing system is provided, configured to execute the method in the first aspect or any possible implementation of the first aspect; specifically, the apparatus includes modules for executing the method in the first aspect or any possible implementation of the first aspect.
In a third aspect, a computer-readable medium is provided for storing a computer program, the computer program including instructions for executing the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer program product is provided, the computer program product including computer program code which, when run by the task-processing apparatus of a multicore digital information processing system, causes the apparatus to execute the method in the first aspect or any possible implementation of the first aspect.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments are briefly described below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic architecture diagram of an application system using an embodiment of the present invention;
Fig. 2 is a schematic diagram of the management modules included in the scheduler of the application system shown in Fig. 1 and their interrelations;
Fig. 3 is a schematic diagram of the data dependences in an application system using an embodiment of the present invention;
Fig. 4 is a schematic diagram of the scheduling result when only one core can be scheduled in an application system using an embodiment of the present invention;
Fig. 5 is a schematic diagram of the scheduling result when three cores can be scheduled in an application system using an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a method for processing tasks in a multicore digital information processing system according to an embodiment of the present invention;
Fig. 7 is another schematic flowchart of a method for processing tasks in a multicore digital information processing system according to an embodiment of the present invention;
Fig. 8 is another schematic flowchart of a method for processing tasks in a multicore digital information processing system according to an embodiment of the present invention;
Fig. 9 is a schematic flowchart of a method for performing abstraction processing on a service according to an embodiment of the present invention;
Fig. 10 is a schematic flowchart of a method for processing tasks in a specific case according to an embodiment of the present invention;
Fig. 11 is a schematic flowchart of a method for determining a ready task and an idle arithmetic element according to an embodiment of the present invention;
Fig. 12 is a schematic block diagram of an apparatus for processing tasks in a multicore digital information processing system according to an embodiment of the present invention;
Fig. 13 is another schematic block diagram of an apparatus for processing tasks in a multicore digital information processing system according to an embodiment of the present invention;
Fig. 14 is another schematic block diagram of an apparatus for processing tasks in a multicore digital information processing system according to an embodiment of the present invention;
Fig. 15 is a schematic block diagram of an apparatus for processing tasks in a multicore digital information processing system according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be understood that the technical solutions of the embodiments of the present invention are mainly applied to digital information processing systems that require multicore processing, that is, scenarios with a large amount of parallel computation, such as macro base station baseband chips and terminal chips. The "multicore" characteristic refers to the number of computing modules integrated on a single chip, including but not limited to multiple general-purpose processors, multiple IP cores, multiple application-specific processors, and so on. If the number of computing modules is greater than one, the chip is multicore.
Fig. 1 shows a schematic architecture diagram of an application system (a multicore digital information processing system) using an embodiment of the present invention. The application system consists of three parts: a master control layer, an execution layer, and an operation layer. The master control layer carries the user software and completes functions such as high-level information exchange, flow control, task decomposition, and definition of task dependences. The execution layer consists of three parts: the master core execution layer, the scheduler, and the slave core execution layer. The master core execution layer provides programming interfaces for submitting commands to the scheduler and receiving command feedback or callback notifications. The scheduler is a hardware component responsible for scheduling the tasks to be processed; its functions include handling the dependences between tasks, memory management, task distribution, and data moving. As shown in Fig. 2, the scheduler is managed internally by several management modules: command management, command queue management, event management, cache (buffer) descriptor management, shared memory management, operation memory management, computing resource status management, and a scheduling main control module. The slave core execution layer is a software component, mainly responsible for receiving task messages, calling the algorithm function library on the execution core, and sending task feedback messages after an operation finishes. The operation layer may be hardware or software and is mainly responsible for processing tasks.
A scenario to which the method of the embodiments of the present invention applies is illustrated below. Suppose there are known channel processing fragments of three priorities, where kernels (Kernel) 0 to 2 are high-priority processing, Kernel 3 to 5 are medium-priority processing, and Kernel 7 to 9 are low-priority processing. As shown in Fig. 3, to represent the data flows of the processing (the arrows show the direction of data flow), the data flows between the host (Host) and the device (Device) are marked manually as input/output caches (Buff_In/Out), and the data flows between kernels are denoted as cache _M (Buff_M). The data dependences between the different kernels can be described as follows:
The inputs of Kernel_0/3/7 are Buff_In0/1/2 respectively, prepared by the Host (in practice, the output from an external interface or from hardware accelerate control (Hardware Accelerate Control, "HAC")).
After Kernel_2 finishes processing, the data is output to the Host (in practice, the Host usually sends the data processed by digital signal processing (Digital Signal Processing, "DSP") off-chip, or passes it to the HAC for further processing).
The input of Kernel_2 depends on the output of Kernel_1 and also on the output of Kernel_5.
The input of Kernel_4 depends on the output of Kernel_9 and also on the output of Kernel_3.
The input of Kernel_8 depends on the output of Kernel_7 and also on the output of Kernel_3.
Besides being used by Kernel_4, the output of Kernel_3 must also be supplied to Kernel_8 (Kernel_8 may use only a part of it).
The scheduler schedules according to priority and the number of cores actually available for execution, while guaranteeing that the data dependences are respected. With the kernel dependences outlined above, in the initial stage the Host submits all kernels to the command queue (CommandQueue), and the input data Buff_In0/1/2 is all ready. When only one core can be scheduled, the scheduling result is as shown in Fig. 4; when three cores are available, the scheduling result is as shown in Fig. 5. In Fig. 5, whether the dashed Kernel_2 and Kernel_4 are scheduled on DSP2 depends on the sizes of the data volumes they take as input. For example, when the input data Buff_M8 of Kernel_4 (the output of Kernel_9) is larger than or equal to Buff_M2 (the output of Kernel_3), Kernel_4 should be scheduled on DSP2; otherwise it should be scheduled on the core holding the output of Kernel_3.
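The kernel dependences stated above can be sketched as a small prerequisite table; a scheduler only needs to know which kernels' prerequisites have all finished. The following Python sketch is purely illustrative (it is not the patent's implementation): it encodes only the dependences spelled out above and deliberately omits the unstated intermediate buffer wiring rather than guessing it.

```python
# Prerequisite kernels for each kernel, as stated in the scenario above.
# Kernel_0/3/7 are fed directly by Host-prepared Buff_In0/1/2.
DEPS = {
    "Kernel_0": set(),
    "Kernel_3": set(),
    "Kernel_7": set(),
    "Kernel_2": {"Kernel_1", "Kernel_5"},
    "Kernel_4": {"Kernel_9", "Kernel_3"},
    "Kernel_8": {"Kernel_7", "Kernel_3"},
}

def ready_kernels(deps, done):
    """Kernels not yet run whose prerequisites have all completed."""
    return sorted(k for k, pre in deps.items() if pre <= done and k not in done)

print(ready_kernels(DEPS, set()))
# initially only the Buff_In-fed kernels: ['Kernel_0', 'Kernel_3', 'Kernel_7']
print(ready_kernels(DEPS, {"Kernel_3", "Kernel_7"}))
# Kernel_8 becomes ready (Kernel_0 is simply still unrun)
```

Once Kernel_3 and Kernel_7 complete, Kernel_8 satisfies its prerequisites, matching the Fig. 4 and Fig. 5 schedules where ready kernels are then placed by priority and data volume.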
It should be understood that, in the embodiments of the present invention, a service refers to a program that processes data independently of the hardware; it is a concept distinct from the operating system and drivers. A service may, for example, be operations on data such as channel estimation, fast Fourier transformation (Fast Fourier Transformation, "FFT"), or decoding. A task refers to a software task, i.e., a piece of program that implements a certain function and usually needs to run on a core processor.
Fig. 6 is a schematic flowchart of a method 100 for processing tasks according to an embodiment of the present invention. The method 100 may be executed by the multicore digital information processing system shown in Fig. 1. As shown in Fig. 6, the method 100 includes:
S110: determine a ready task in a task queue;
S120: determine a target arithmetic element for executing the ready task;
S130: execute the ready task by the target arithmetic element while, at the same time, preparing data for a pending task by the target arithmetic element.
Specifically, after determining the ready task in the task queue, the multicore digital information processing system determines the target arithmetic element to execute the ready task, then executes the ready task by the determined target arithmetic element while preparing data for a pending task by the same target arithmetic element.
Therefore, in the method for processing tasks of this embodiment of the present invention, while a ready task is executed by the target arithmetic element, data is prepared for a pending task by the same target arithmetic element. Data loading can thus proceed in parallel with computation, which reduces the waiting overhead of data loading, increases the parallelism between tasks, and reduces system scheduling overhead.
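One pass of the S110 to S130 loop can be sketched as follows. All structures and names here are hypothetical stand-ins, not the patent's API: "executing" a task is reduced to flagging it done, and the parallel data preparation is reduced to flagging the next pending task as prefetching.

```python
def schedule_step(tasks, units):
    """One illustrative pass of S110-S130: pick a ready task (S110), pick an
    idle arithmetic element (S120), run the task while marking the next
    pending task's data as being prepared in parallel (S130)."""
    ready = next((t for t in tasks if t["ready"] and not t["done"]), None)  # S110
    if ready is None:
        return None
    unit = next((u for u in units if u["idle"]), None)                      # S120
    if unit is None:
        return None
    ready["done"] = True                       # S130: execute on the unit...
    pending = next((t for t in tasks if not t["done"]), None)
    if pending is not None:
        pending["prefetching"] = True          # ...while its data loads in parallel
    return ready["name"], unit["name"]

tasks = [{"name": "T1", "ready": True, "done": False},
         {"name": "T2", "ready": False, "done": False}]
units = [{"name": "DSP0", "idle": True}]
print(schedule_step(tasks, units))   # -> ('T1', 'DSP0'); T2's data now loading
```

In the real scheme the prefetch would be a DMA transfer into the element's near memory, overlapping with the active computation.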
It should be noted that a ready task is a task whose dependences have all completed, so that it can start running; a pending task is a task that needs to be executed after the ready task. One arithmetic element can be understood as one core.
Optionally, in S110, the task queue is multiple parallel task queues.
Correspondingly, S110 is specifically: polling the multiple parallel task queues in priority order to determine the ready task.
That is, the multicore digital information processing system can create parallel task queues which are concurrent with respect to one another but have different priorities. After tasks are dispatched into the task queues, the tasks within each queue execute serially, following the first-in-first-out ordering principle.
Specifically, when determining the ready task, the multiple parallel task queues can be polled in order from high priority to low priority: if there is no ready task in a higher-priority queue, polling continues with the lower-priority queues, and polling ends when a ready task is found or the lowest-priority queue has been polled.
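The polling rule can be sketched as below. This is a hypothetical illustration: each queue is modelled as a FIFO list of (task, is_ready) pairs, and the queues are passed in highest-priority first; since tasks within one queue run serially in FIFO order, only the head of each queue is examined.

```python
def poll_ready(queues_high_to_low):
    """Return the first ready task found by polling queues in priority order,
    or None if the lowest-priority queue has been polled without a hit."""
    for queue in queues_high_to_low:
        if queue:                       # skip empty queues
            name, is_ready = queue[0]   # FIFO: only the head can run next
            if is_ready:
                return name
    return None

high = [("H1", False)]                  # head of the high-priority queue not ready
mid  = [("M1", True), ("M2", True)]
low  = [("L1", True)]
print(poll_ready([high, mid, low]))     # -> 'M1'
```

Even though the low-priority queue also holds a ready task, the higher-priority hit M1 is taken first, as the polling order requires.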
Optionally, in S130, executing the ready task by the target arithmetic element while preparing data for a pending task by the same element can be understood as virtualizing one arithmetic element into two "ping-pong" logical resources. After a task is assigned to one logical resource, another task can be assigned to the other logical resource. This guarantees that while one logical resource is computing, the data of the other logical resource is being prepared, which reduces data waiting time and improves the utilization of computing resources.
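A toy model of this ping-pong virtualization (illustrative only, under the assumption that one task occupies each logical slot at a time):

```python
class PingPongUnit:
    """One arithmetic element seen by the scheduler as two logical slots:
    while the active slot computes, the idle slot's task data can load."""
    def __init__(self):
        self.slots = [None, None]
        self.active = 0                # index of the slot currently executing

    def stage(self, task):
        """Place a task in the idle slot so its data preparation can begin."""
        idle = 1 - self.active
        if self.slots[idle] is not None:
            return False               # idle slot already occupied
        self.slots[idle] = task
        return True

    def finish_active(self):
        """Active task completes: swap slots; the staged task can start
        immediately instead of waiting for its data to load."""
        self.slots[self.active] = None
        self.active = 1 - self.active
        return self.slots[self.active]

u = PingPongUnit()
u.slots[0] = "task_A"                  # task_A executing in slot 0
u.stage("task_B")                      # task_B's data loads meanwhile
print(u.finish_active())               # -> 'task_B'
```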
Optionally, S120 is specifically: when the arithmetic element that executed the dependence task of the ready task is determined to be idle, that arithmetic element is determined as the target arithmetic element.
Specifically, the dependence task of the ready task is the task whose output data is the input data of the ready task. The multicore digital information processing system can select an idle resource to execute the ready task according to the location of the data. Preferably, the system records which arithmetic element processed the dependence task of the ready task, and when it determines that this arithmetic element is idle, it assigns the ready task to that same element, that is, it determines the arithmetic element that executed the dependence task as the target arithmetic element. Since the ready task and its dependence task are processed by the same arithmetic element, the data does not need to be loaded again for the ready task, which relieves congestion on the load path.
Optionally, when the multicore digital information processing system determines that the arithmetic element that executed the dependence task is not idle, it can randomly select an idle arithmetic element from the other idle arithmetic elements as the target arithmetic element.
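The locality-preferring selection with a random fallback can be sketched as follows (a hypothetical illustration; the data structures and names are made up for this example):

```python
import random

def pick_target(ready_task, units_idle, dep_unit_of):
    """Prefer the element that ran the ready task's dependence task, since
    that task's output is already in the element's near memory; otherwise
    fall back to a random idle element, or None if all are busy."""
    preferred = dep_unit_of.get(ready_task)
    if preferred is not None and units_idle.get(preferred, False):
        return preferred                      # data already local: no reload
    idle = [u for u, is_idle in units_idle.items() if is_idle]
    return random.choice(idle) if idle else None

units_idle = {"DSP0": True, "DSP1": True}
dep_unit_of = {"Kernel_4": "DSP1"}            # Kernel_4's dependence ran on DSP1
print(pick_target("Kernel_4", units_idle, dep_unit_of))   # -> 'DSP1'
```

If DSP1 were busy, the fallback would pick randomly among the remaining idle elements, at the cost of reloading the input data there.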
Optionally, as shown in Fig. 7, before the ready task is executed by the target arithmetic element, the method 100 further includes:
S140: determine, in the near memory of the target arithmetic element, a memory block for storing the input data corresponding to the ready task;
S150: move the input data corresponding to the ready task into the memory block.
Specifically, the data required to execute the ready task may reside on other arithmetic elements or in external memory (for example double data rate synchronous dynamic RAM (Double Data Rate, "DDR"), L3 cache, etc.). Before the task is executed, this data needs to be moved from those memories to the near memory (for example the L1 or L2 cache) of the arithmetic element that will run the task. Before moving the data, a memory block for storing it must first be determined, or the memory for storing it must first be applied for, and the data is then moved into the determined or applied-for memory.
Optionally, in S140, the memory block is determined according to a fixed resource pool algorithm, wherein data stored in the near memory of the target arithmetic element is allowed to stay resident until the user releases it, or is replaced toward far memory when the near memory is insufficient.
That is, memory space can be classified by its distance from the core, and then handled according to its level. A fixed resource pool algorithm is used as the allocation algorithm for near memory, which reduces memory fragmentation and improves allocation and release efficiency.
It should be understood that, in the embodiments of the present invention, the near memory of the target arithmetic element can also be applied for according to other algorithms, for example a linked-list allocation algorithm, a buddy algorithm, a pool-based memory buddy algorithm, a working-set algorithm, and so on, but the present invention is not limited thereto.
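A minimal fixed resource pool allocator, as one possible reading of the scheme above (block size, block count, and structure are arbitrary here): all blocks have the same size, so allocation and release are constant time and there is no external fragmentation.

```python
class FixedPool:
    """Near-memory allocator that hands out fixed-size blocks from a free
    list: constant-time allocate/release and no external fragmentation."""
    def __init__(self, block_size, n_blocks):
        self.block_size = block_size
        self.free = list(range(n_blocks))      # indices of free blocks

    def alloc(self, n):
        """Take n blocks, or None if near memory is insufficient (the caller
        would then replace data toward far memory, as described above)."""
        if len(self.free) < n:
            return None
        taken, self.free = self.free[:n], self.free[n:]
        return taken

    def release(self, blocks):
        """Return blocks to the pool when the user releases the data."""
        self.free.extend(blocks)

pool = FixedPool(block_size=4096, n_blocks=8)
held = pool.alloc(3)
print(held, pool.alloc(6))      # -> [0, 1, 2] None  (only 5 blocks remain)
pool.release(held)
print(len(pool.alloc(6)))       # -> 6
```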
Optionally, S140 is specifically: determining the number of memory blocks to apply for in the near memory from the ratio of the total size of the data blocks corresponding to all parameters needed by the ready task to the size of a single memory block in the near memory.
Specifically, a task is equivalent to a function with parameters; each parameter may be a block of data or a numeric value, and the data can be packed together and placed into one or more memory blocks. As an example, suppose task A has 10 parameters, each of data-block type, and the total size of the data blocks corresponding to these 10 parameters is 31 KB, while a single memory block in the near memory is 4 KB; then the number of memory blocks to apply for is 8. Memory usage efficiency can thereby be improved and memory waste reduced.
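The worked example reduces to a ceiling division. In the sketch below, the split of the 31 KB total across the ten parameters is made up for illustration (the text gives only the total):

```python
import math

def blocks_needed(param_block_sizes_kb, single_block_kb):
    """Number of near-memory blocks to apply for: the summed parameter
    data-block sizes divided by the single block size, rounded up."""
    return math.ceil(sum(param_block_sizes_kb) / single_block_kb)

# Ten parameters totalling 31 KB, packed into 4 KB near-memory blocks:
print(blocks_needed([4, 4, 4, 4, 4, 4, 4, 1, 1, 1], 4))   # -> 8
```

Packing the parameters together before dividing is what saves memory: applying for blocks per parameter would need ten blocks here instead of eight.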
Optionally, as shown in Fig. 8, before S110, the method 100 further includes:
S160: perform abstraction processing on the service that includes the ready task to obtain abstraction information, the abstraction information including at least one of the following: task dependence information, data dependence information, and task execution priority information.
Specifically, a service can be split into multiple tasks, and abstraction processing is performed on the service. In the process of abstracting the service that includes the ready task, caches (buffers) can be created according to the needs of the service, and the data dependence information can be determined according to the cache identifiers (IDs) of the caches. A buffer is a block of data space: data is loaded into it before a task starts, and it is destroyed when no task needs it any more. Each buffer has an ID through which the data relationships between tasks are associated. Suppose the output data of task A is the input data of task B; then the output buffer of task A is, say, buffer2, and the input buffer of task B is also buffer2.
It should be understood that the creation of buffers can be determined by the programmer according to the needs of the service, and the number of buffers actually created is determined dynamically by the actual task execution process.
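The buffer-ID association can be sketched as deriving producer-to-consumer links from shared IDs (illustrative structures only):

```python
def deps_from_buffers(tasks):
    """Derive data dependences from shared buffer IDs: if a task consumes a
    buffer ID that another task produces, it depends on that producer.
    `tasks` maps task name -> (input buffer IDs, output buffer IDs)."""
    producer = {b: name for name, (_, outs) in tasks.items() for b in outs}
    return {name: sorted({producer[b] for b in ins if b in producer})
            for name, (ins, _) in tasks.items()}

# Task A writes buffer 2, task B reads buffer 2 (the example above).
print(deps_from_buffers({"A": ([1], [2]), "B": ([2], [3])}))
# -> {'A': [], 'B': ['A']}
```

Buffer 1 has no producer among the tasks (it is Host-prepared input), so task A ends up with no dependences, while B correctly depends on A through buffer 2.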
In S160, the task dependence information indicates the dependences between tasks, which are associated through events. For example, when task A completes, it can choose to publish an event ID; task B, which needs to wait for task A to complete, fills the event ID of A into its wait event list. The description of a wait includes the number of events waited for and the list of their IDs.
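The event wait-list mechanism can be modelled as follows (a hypothetical sketch; the event IDs are arbitrary example values):

```python
class EventWaitList:
    """A task's wait description: the IDs of the events it still waits on.
    A finishing task publishes its event ID to all waiting tasks."""
    def __init__(self, event_ids):
        self.pending = set(event_ids)

    def on_published(self, event_id):
        """Record a published event; True once the task may start running."""
        self.pending.discard(event_id)
        return not self.pending

# Task B waits on task A's event (ID 7 here):
b = EventWaitList([7])
print(b.on_published(3))   # -> False (unrelated event, B keeps waiting)
print(b.on_published(7))   # -> True: task B is now ready
```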
In the embodiments of the present invention, optionally, the input/output data characteristics of a task can be described. A limited number of input/output parameters is supported, and parameters support different characteristics: input cache, output cache, external input pointer, passed-in value, global pointer, and so on.
In the embodiments of the present invention, since services can be abstracted in this way, software for a different multicore chip does not need to be restructured, and software does not need to be redeployed when the specification changes; only a new resource set needs to be created. This simplifies the design work that programmers would otherwise have to do around the constraints of different operations, loads, and hardware.
In the embodiments of the present invention, optionally, when the ready task has been executed by the target arithmetic element, its output data can be stored in the near memory. Thus, when the next task is loaded onto the same arithmetic element, the data does not need to be loaded again, which relieves congestion on the load path.
S160 is described in detail below with reference to Fig. 9. S160 can be executed by the master core execution layer in the architecture diagram shown in Fig. 1, which can be a set of programming interfaces; these software interfaces generate commands that are submitted to the scheduler for execution. As shown in Fig. 9, in the embodiments of the present invention, optionally, performing abstraction processing on the service that includes the ready task to obtain the abstraction information may include the following steps:
S161 creates task execution function, executes function library by calling from core execution level, main core execution level registers the pointer of function or index in function list;
S162, creation use functional resources set;
The functional resources set includes the resource of execution task, i.e., runs task in which arithmetic element, uses which direct memory access (Direct Memory Access, referred to as " DMA ") etc..
S163 creates parallel queue;
Queue has different priority, is concurrency relation between queue, mission dispatching enters queue, serially executes in queue, using the order-preserving principle of first in, first out.
S164, creation caching buffer;
Buffer is one piece of data space, and task starts preceding loading data, the destruction when task does not need this buffer data.Each buffer has an ID, and by the association of data relationship between this ID carry out task, i.e. the data of the output of task A are the input data of task B, then the output buffer of task A is buffer2, the input buffer of task B is also buffer2.
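The buffer-ID association can be sketched as follows. This is a minimal illustration, not the patent's implementation; the `Buffer` and `Task` classes and their field names are hypothetical:

```python
# Minimal sketch of associating task data through buffer IDs: task A writes
# buffer 2 and task B reads buffer 2, so B consumes A's output.

class Buffer:
    def __init__(self, buf_id, size):
        self.buf_id = buf_id
        self.data = bytearray(size)  # one block of data space

class Task:
    def __init__(self, name, in_buf=None, out_buf=None):
        self.name = name
        self.in_buf = in_buf    # ID of the input buffer (or None)
        self.out_buf = out_buf  # ID of the output buffer (or None)

buffers = {2: Buffer(2, 64)}   # "buffer2" from the example above
task_a = Task("A", out_buf=2)  # task A's output buffer is buffer2
task_b = Task("B", in_buf=2)   # task B's input buffer is also buffer2

# The scheduler can infer the data relationship by matching buffer IDs.
depends = task_b.in_buf == task_a.out_buf
```

Matching IDs rather than pointers keeps the description portable: the scheduler is free to place the actual data space wherever it likes.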
S165: describe the dependencies between tasks;
Dependencies between tasks are associated through events. That is, when task A completes, it may choose to publish an event ID; task B, which must wait for task A to complete, fills the event ID of event A into its event list. The description of a waited-for event includes the number of events waited for and the list of their IDs.
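The event wait-list mechanism can be sketched as follows. This is an illustrative sketch only; the names `TaskDesc`, `published`, and the event ID 101 are assumptions, not from the patent:

```python
# Sketch of event-based dependencies: a completing task may publish an event
# ID; a waiting task records the number of waited-for events and their ID
# list, and becomes ready once all of them have been published.

published = set()

class TaskDesc:
    def __init__(self, name, wait_events=()):
        self.name = name
        self.wait_count = len(wait_events)  # number of waited-for events
        self.wait_ids = list(wait_events)   # ID list of waited-for events

    def is_ready(self):
        return all(e in published for e in self.wait_ids)

def complete(task, publish_id=None):
    # On completion, a task may choose to publish one event ID.
    if publish_id is not None:
        published.add(publish_id)

task_a = TaskDesc("A")
task_b = TaskDesc("B", wait_events=[101])  # event 101 stands for "A done"

ready_before = task_b.is_ready()  # not ready: event 101 not yet published
complete(task_a, publish_id=101)
ready_after = task_b.is_ready()   # ready: B's wait list is satisfied
```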
S166: describe the input/output data features of the task;
The multi-core digital signal processing system of this embodiment of the present invention supports a limited number of input/output arguments, and a parameter may have different features: input buffer, output buffer, external input pointer, pass-by-value, global pointer, and so on.
S167: set the algorithm service parameters by filling the actual parameter values into the corresponding parameter list;
S168: submit the algorithm service execution request;
S169: wait for the pending callback, or receive external tasks.
A schematic flowchart of the method for processing tasks in a specific case is described in detail below with reference to Fig. 10, which is explained together with Fig. 11. As shown in Fig. 10, the method 200 for processing tasks according to an embodiment of the present invention includes:
S201: scheduler initialization;
The resources to be used (queues, events, buffers, commands) are created and placed into shared queues, from which the master-core execution layer requests and uses them.
S202: the scheduler creates the response handling for the function resource set;
Creating the response handling for the function resource set includes initializing the arithmetic elements, initializing the storage manager, and initializing the shared memory in the arithmetic elements.
S203: the scheduler waits for commands issued by the master-core execution layer and processes them;
S204: the scheduler polls the parallel queues from high priority to low priority to find a ready task;
S205: if there is a ready task, the scheduler selects an idle arithmetic element and prepares, in that arithmetic element, the data the ready task needs;
An idle arithmetic element can be virtualized into two ping-pong logical resources: after a task has been assigned to one logical resource, another task can be assigned to the other logical resource. In this way, while one logical resource is computing, the data for the other logical resource is being prepared, which reduces data waiting time and improves the utilization of computing resources.
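The split of one arithmetic element into two logical resources can be sketched as follows. This is a minimal sketch under assumed names (`ArithmeticElement`, `assign`, the slot states); the patent does not prescribe this interface:

```python
# Sketch of virtualizing one arithmetic element into two ping-pong logical
# resources: while one slot computes, the other slot's data can be loading.

class ArithmeticElement:
    def __init__(self):
        # two logical resources ("ping" and "pong"), both initially idle
        self.slots = [{"state": "idle", "task": None} for _ in range(2)]

    def assign(self, task):
        # Place the task on any idle logical resource and start loading data.
        for slot in self.slots:
            if slot["state"] == "idle":
                slot["state"], slot["task"] = "loading", task
                return True
        return False  # both logical resources are occupied

ae = ArithmeticElement()
ok1 = ae.assign("task1")          # slot 0 starts loading task1's data
ae.slots[0]["state"] = "running"  # slot 0 now computes ...
ok2 = ae.assign("task2")          # ... while slot 1 loads task2's data
ok3 = ae.assign("task3")          # refused: no idle logical resource left
```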
Before the data is prepared, the proximal end memory of the arithmetic element must first be requested; the DMA is then configured to move the data into the proximal end memory.
Memory can be handled differently according to its level. For the proximal end memory, a fixed resource pool allocation algorithm is used, which reduces memory fragmentation and improves allocation and release efficiency. Data in the proximal end memory can stay resident until it is released by the user, or until memory runs short and the data is displaced to remote memory (according to the displacement level set by the user). Because allocations are of a fixed size, memory waste is reduced and memory usage efficiency is improved.
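A fixed resource pool of equally sized blocks, as described above, can be sketched as follows. The class and method names are illustrative, and displacement to remote memory is only indicated by the `None` return:

```python
# Sketch of a fixed resource pool for proximal memory: every block has the
# same size, so allocation and release are O(1) and fragmentation-free.

class FixedPool:
    def __init__(self, block_size, num_blocks):
        self.block_size = block_size
        self.free_list = list(range(num_blocks))  # indices of free blocks

    def alloc(self):
        # None signals exhaustion; a real scheduler would then displace
        # resident data to remote memory according to the displacement level.
        return self.free_list.pop() if self.free_list else None

    def release(self, block_idx):
        self.free_list.append(block_idx)

pool = FixedPool(block_size=4096, num_blocks=4)
held = [pool.alloc() for _ in range(4)]  # pool is exhausted after this
overflow = pool.alloc()                  # no free block remains
pool.release(held[0])
reused = pool.alloc()                    # the released block is handed out again
```

Because every block is the same size, the free list never fragments and release is a single append, which is why this scheme suits latency-sensitive proximal memory.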
In S205, if there is no ready task, the completion feedback of issued tasks is processed: the event list is updated, the ready task table is updated, the corresponding memory is released, and the flow then returns to S203.
In this embodiment of the present invention, optionally, a memory lock may be set to guarantee the consistency of data reads and writes, thereby automatically solving the consistency problem of simultaneous reads and writes by multiple cores (arithmetic elements). Specifically, it may be set that while a file's data is being written, no task may read the data in that file; or, while a file's data is being read by a task, the file's data may not be overwritten. Multiple tasks are, however, allowed to read the data in the same file simultaneously.
The method for determining the ready task and the idle arithmetic element in steps S204 and S205 is described below with reference to Fig. 11. The method in Fig. 11 is executed by the scheduler in the digital signal processing system.
As shown in Fig. 11, in S301, the parallel queues are polled, and it is judged whether the lowest-priority queue has been reached;
S302: when it is determined that the lowest-priority queue has not been reached, obtain the ready queue;
S303: determine whether there is a ready task in the ready queue and whether there is an idle arithmetic element in the system;
Within a queue, tasks are executed serially after being dispatched to the queue, following the first-in-first-out order-preserving principle.
S304: when it is determined that there is a ready task and an idle arithmetic element, obtain the ready task and the idle arithmetic element;
S305: prepare data for the obtained ready task in the obtained idle arithmetic element;
After S305, continue searching for ready tasks in the queue of the current priority and determine whether there is an idle arithmetic element in the system, that is, execute S303 and its subsequent steps.
Optionally, in S303, if it is determined that there is no ready task in the ready queue of the current priority, query whether there is a ready queue at the next priority, that is, re-execute S301 and its subsequent steps.
Optionally, in S301, if the lowest-priority queue has been reached (that is, all queues have been polled), the following steps are executed:
S306: judge whether there is an idle computing resource;
S307: find the idle computing resource among the available ones;
An idle computing resource is understood here as one logical resource of an arithmetic element.
S308: obtain the fellow resource of the idle computing resource;
The fellow resource of the idle computing resource refers to the other logical resource of the arithmetic element in S307.
S309: prepare data for a task in the found idle computing resource.
Specifically, when a task is deployed on the fellow resource obtained in S308, data is prepared in the found idle computing resource for a task that has a dependency on the task deployed on the fellow resource. After data has been prepared for that dependent task, S306 and its subsequent steps can continue to be executed to prepare data for further tasks.
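The polling flow of Fig. 11 can be sketched as a simplified single pass. Function and variable names are hypothetical, and the fellow-resource handling of S306 to S309 is omitted for brevity:

```python
# Simplified single pass over the Fig. 11 flow: poll queues from high to low
# priority, take tasks first-in-first-out, and pair each ready task with an
# idle arithmetic element so its data can be prepared.

from collections import deque

def poll_once(queues_by_priority, idle_elements):
    """queues_by_priority: list of deques, index 0 = highest priority."""
    prepared = []
    for queue in queues_by_priority:          # S301: high -> low priority
        while queue and idle_elements:        # S303: ready task AND idle element
            task = queue.popleft()            # FIFO order-preserving (S304)
            element = idle_elements.pop()
            prepared.append((task, element))  # S305: prepare data there
    return prepared

queues = [deque(["hi1", "hi2"]), deque(["lo1"])]
idle = ["ae0", "ae1"]
pairs = poll_once(queues, idle)  # both elements go to the high-priority queue
```

Note how the higher-priority queue consumes all idle elements before the lower one is considered, so "lo1" stays queued until capacity frees up.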
Therefore, in the task processing method of this embodiment of the present invention, while the target arithmetic element executes the ready task, the target arithmetic element simultaneously prepares data for a pending task. Data loading can thus proceed in parallel with computation, which reduces the waiting overhead of data loading, increases the parallelism between tasks, and reduces system scheduling overhead.
The apparatus 10 for processing tasks in a multi-core digital signal processing system according to an embodiment of the present invention is described in detail below with reference to Fig. 12. As shown in Fig. 12, the apparatus 10 includes:
a determining module 11, configured to determine the ready task in a task queue;
wherein the determining module 11 is further configured to determine the target arithmetic element for executing the ready task; and
a task execution module 12, configured to execute the ready task by the target arithmetic element and, simultaneously, prepare data for a pending task by the target arithmetic element.
Therefore, in the apparatus for processing tasks in the multi-core digital signal processing system of this embodiment of the present invention, while the target arithmetic element executes the ready task, the target arithmetic element simultaneously prepares data for a pending task. Data loading can thus proceed in parallel with computation, which reduces the waiting overhead of data loading, increases the parallelism between tasks, and reduces system scheduling overhead.
In this embodiment of the present invention, optionally, in the aspect of determining the target arithmetic element for executing the ready task, the determining module 11 is specifically configured to: when it is determined that the arithmetic element executing the dependency task of the ready task is idle, determine the arithmetic element executing the dependency task of the ready task as the target arithmetic element.
In this embodiment of the present invention, optionally, as shown in Fig. 13, the apparatus 10 further includes a memory application module 13;
wherein, before the task execution module 12 executes the ready task by the target arithmetic element, the memory application module 13 is specifically configured to: determine, in the proximal end memory of the target arithmetic element, a memory block for storing the input data corresponding to the ready task; and move the input data corresponding to the ready task into the memory block.
In this embodiment of the present invention, optionally, in the aspect of determining, in the proximal end memory of the target arithmetic element, the memory block for storing the input data corresponding to the ready task, the memory application module 13 is specifically configured to: determine the memory block according to a fixed resource pool algorithm, wherein the data stored in the proximal end memory of the target arithmetic element can stay resident until it is released by the user or, when the proximal end memory is insufficient, displaced to remote memory.
In this embodiment of the present invention, optionally, in the aspect of determining the memory block according to the fixed resource pool algorithm, the memory application module 13 is specifically configured to: determine the number of memory blocks to be requested in the proximal end memory according to the ratio of the total size of the data blocks corresponding to all the parameters needed by the ready task to the size of a single memory block in the proximal end memory.
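The block-count computation can be sketched as follows. The upward rounding is an assumption; the text only states that the ratio of the total parameter data size to the single-block size is used:

```python
# Sketch of sizing the proximal-memory request: total size of the data
# blocks for all parameters of the ready task, divided by the single-block
# size, rounded up (assumed rounding).

import math

def blocks_needed(param_block_sizes, single_block_size):
    total = sum(param_block_sizes)  # summation over all parameter data blocks
    return math.ceil(total / single_block_size)

n = blocks_needed([1000, 3000, 500], single_block_size=2048)  # 4500/2048 -> 3
```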
In this embodiment of the present invention, optionally, as shown in Fig. 14, the apparatus further includes:
a service abstraction module 14, configured to: before the determining module 11 determines the ready task in the task queue, abstract the service that includes the ready task to obtain abstract processing information, where the abstract processing information includes at least one of the following: task dependency information, data dependency information, and task execution priority information.
In this embodiment of the present invention, optionally, the task queue is a plurality of parallel task queues;
wherein, in the aspect of determining the ready task in the task queue, the determining module 11 is specifically configured to:
poll the plurality of parallel task queues in priority order, and determine the ready task.
In this embodiment of the present invention, optionally, in the aspect of abstracting the service that includes the ready task, the service abstraction module 14 is specifically configured to: create buffers according to the needs of the service; and determine the data dependency information according to the buffer identifiers (IDs).
In this embodiment of the present invention, optionally, the task execution module 12 is further configured to: when the target arithmetic element has finished executing the ready task, store the output data of the ready task in the proximal end memory.
It should be understood that the above and other operations and/or functions of the apparatus 10 according to this embodiment of the present invention respectively implement the corresponding methods in Fig. 6 to Fig. 8; for brevity, details are not repeated here.
Therefore, in the apparatus for processing tasks in the multi-core digital signal processing system of this embodiment of the present invention, while the target arithmetic element executes the ready task, the target arithmetic element simultaneously prepares data for a pending task. Data loading can thus proceed in parallel with computation, which reduces the waiting overhead of data loading, increases the parallelism between tasks, and reduces system scheduling overhead.
Fig. 15 shows a schematic block diagram of an apparatus 100 for processing tasks in a multi-core digital signal processing system according to another embodiment of the present invention.
As shown in Fig. 15, the hardware structure of the apparatus 100 for processing tasks in the multi-core digital signal processing system may include three parts: a transceiving device 101, a software component 102, and a hardware device 103.
The transceiving device 101 is a hardware circuit for receiving and sending packets.
The hardware device 103 may also be called a "hardware processing module" or, more simply, "hardware". It mainly comprises hardware circuits that implement certain specific functions based on an FPGA, an ASIC, or the like (possibly together with other supporting devices, such as memory). Its processing speed is often much faster than that of a general-purpose processor, but once a function is customized it is difficult to change; the implementation is therefore inflexible and is usually used for fixed functions. It should be noted that, in practical applications, the hardware device 103 may also include processors such as an MCU (a microprocessor such as a single-chip microcomputer) or a CPU, but the main function of these processors is not to process large volumes of data; they are mainly used for control. In this application scenario, a system assembled from these devices is a hardware device.
The software component 102 (or simply "software") mainly includes a general-purpose processor (such as a CPU) and some matching devices (such as memory and hard-disk storage devices). The processor can be given the corresponding processing functions through programming; when implemented in software, the functions can be flexibly configured according to service demands, although the speed is often slower than that of a hardware device. After the software has finished processing, the processed data can be sent through the transceiving device 101 by the hardware device 103, or sent to the transceiving device 101 through an interface connected to it.
Optionally, in an embodiment, the hardware device 103 is configured to: determine the ready task in a task queue; determine the target arithmetic element for executing the ready task; and execute the ready task by the target arithmetic element while simultaneously preparing data for a pending task by the target arithmetic element.
Optionally, in an embodiment, in the aspect of determining the target arithmetic element for executing the ready task, the hardware device 103 is specifically configured to: when it is determined that the arithmetic element executing the dependency task of the ready task is idle, determine the arithmetic element executing the dependency task of the ready task as the target arithmetic element.
Optionally, in an embodiment, before the ready task is executed by the target arithmetic element, the hardware device 103 is specifically configured to: determine, in the proximal end memory of the target arithmetic element, a memory block for storing the input data corresponding to the ready task; and move the input data corresponding to the ready task into the memory block.
Optionally, in an embodiment, in the aspect of determining, in the proximal end memory of the target arithmetic element, the memory block for storing the input data corresponding to the ready task, the hardware device 103 is specifically configured to: determine the memory block according to a fixed resource pool algorithm, wherein the data stored in the proximal end memory of the target arithmetic element can stay resident until it is released by the user or, when the proximal end memory is insufficient, displaced to remote memory.
Optionally, in an embodiment, in the aspect of determining the memory block according to the fixed resource pool algorithm, the hardware device 103 is specifically configured to: determine the number of memory blocks according to the ratio of the total size of the data blocks corresponding to all the parameters needed by the ready task to the size of a single memory block in the proximal end memory.
Optionally, in an embodiment, the hardware device 103 is further configured to: before determining the ready task in the task queue, abstract the service that includes the ready task to obtain abstract processing information, where the abstract processing information includes at least one of the following: task dependency information, data dependency information, and task execution priority information.
Optionally, in an embodiment, the task queue is a plurality of parallel task queues; wherein, in the aspect of abstracting the service that includes the ready task, the hardware device 103 is specifically configured to: create buffers according to the needs of the service; and determine the data dependency information according to the buffer identifiers (IDs).
Optionally, in an embodiment, the hardware device 103 is further configured to: when the target arithmetic element has finished executing the ready task, store the output data of the ready task in the proximal end memory.
Through the software-hardware combination of this embodiment, while the target arithmetic element executes the ready task, the target arithmetic element can simultaneously prepare data for a pending task. Data loading can thus proceed in parallel with computation, which reduces the waiting overhead of data loading, increases the parallelism between tasks, and reduces system scheduling overhead.
Those of ordinary skill in the art may appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be readily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

  1. A method for processing tasks in a multi-core digital signal processing system, characterized in that the method comprises:
    determining a ready task in a task queue;
    determining a target arithmetic element for executing the ready task; and
    executing the ready task by the target arithmetic element and, simultaneously, preparing data for a pending task by the target arithmetic element.
  2. The method according to claim 1, characterized in that the determining a target arithmetic element for executing the ready task comprises:
    when it is determined that an arithmetic element executing a dependency task of the ready task is idle, determining the arithmetic element executing the dependency task of the ready task as the target arithmetic element.
  3. The method according to claim 1 or 2, characterized in that, before the executing the ready task by the target arithmetic element, the method further comprises:
    determining, in a proximal end memory of the target arithmetic element, a memory block for storing input data corresponding to the ready task; and
    moving the input data corresponding to the ready task into the memory block.
  4. The method according to claim 3, characterized in that the determining, in a proximal end memory of the target arithmetic element, a memory block for storing input data corresponding to the ready task comprises:
    determining the memory block according to a fixed resource pool algorithm, wherein data stored in the proximal end memory of the target arithmetic element can stay resident until it is released by a user or, when the proximal end memory is insufficient, displaced to remote memory.
  5. The method according to claim 4, characterized in that the determining the memory block according to a fixed resource pool algorithm comprises:
    determining a quantity of the memory blocks according to a ratio of a total size of data blocks corresponding to all parameters needed by the ready task to a size of a single memory block in the proximal end memory.
  6. The method according to any one of claims 1 to 5, characterized in that, before the determining a ready task in a task queue, the method further comprises:
    abstracting a service that includes the ready task to obtain abstract processing information, wherein the abstract processing information comprises at least one of the following: task dependency information, data dependency information, and task execution priority information.
  7. The method according to any one of claims 1 to 6, characterized in that the task queue is a plurality of parallel task queues;
    wherein the determining a ready task in a task queue comprises:
    polling the plurality of parallel task queues in priority order, and determining the ready task.
  8. The method according to claim 6, characterized in that the abstracting a service that includes the ready task comprises:
    creating buffers according to needs of the service; and
    determining the data dependency information according to buffer identifiers (IDs) of the buffers.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    when the target arithmetic element has finished executing the ready task, storing output data of the ready task in the proximal end memory.
  10. An apparatus for processing tasks in a multi-core digital signal processing system, characterized in that the apparatus comprises:
    a determining module, configured to determine a ready task in a task queue;
    wherein the determining module is further configured to determine a target arithmetic element for executing the ready task; and
    a task execution module, configured to execute the ready task by the target arithmetic element and, simultaneously, prepare data for a pending task by the target arithmetic element.
  11. The apparatus according to claim 10, characterized in that, in the aspect of determining the target arithmetic element for executing the ready task, the determining module is specifically configured to:
    when it is determined that an arithmetic element executing a dependency task of the ready task is idle, determine the arithmetic element executing the dependency task of the ready task as the target arithmetic element.
  12. The apparatus according to claim 10 or 11, characterized in that the apparatus further comprises a memory application module;
    wherein, before the task execution module executes the ready task by the target arithmetic element, the memory application module is specifically configured to:
    determine, in a proximal end memory of the target arithmetic element, a memory block for storing input data corresponding to the ready task; and
    move the input data corresponding to the ready task into the memory block.
  13. The apparatus according to claim 12, characterized in that, in the aspect of determining, in the proximal end memory of the target arithmetic element, the memory block for storing input data corresponding to the ready task, the memory application module is specifically configured to:
    determine the memory block according to a fixed resource pool algorithm, wherein data stored in the proximal end memory of the target arithmetic element can stay resident until it is released by a user or, when the proximal end memory is insufficient, displaced to remote memory.
  14. The apparatus according to claim 13, characterized in that, in the aspect of determining the memory block according to the fixed resource pool algorithm, the memory application module is specifically configured to:
    determine a quantity of the memory blocks according to a ratio of a total size of data blocks corresponding to all parameters needed by the ready task to a size of a single memory block in the proximal end memory.
  15. The apparatus according to any one of claims 10 to 14, characterized in that the apparatus further comprises:
    a service abstraction module, configured to: before the determining module determines the ready task in the task queue, abstract a service that includes the ready task to obtain abstract processing information, wherein the abstract processing information comprises at least one of the following: task dependency information, data dependency information, and task execution priority information.
  16. The apparatus according to any one of claims 10 to 15, characterized in that the task queue is a plurality of parallel task queues;
    wherein, in the aspect of determining the ready task in the task queue, the determining module is specifically configured to:
    poll the plurality of parallel task queues in priority order, and determine the ready task.
  17. The apparatus according to claim 15, characterized in that, in the aspect of abstracting the service that includes the ready task, the service abstraction module is specifically configured to:
    create buffers according to needs of the service; and
    determine the data dependency information according to buffer identifiers (IDs) of the buffers.
  18. The apparatus according to any one of claims 10 to 17, characterized in that the task execution module is further configured to:
    when the target arithmetic element has finished executing the ready task, store output data of the ready task in the proximal end memory.
CN201580083942.3A 2015-10-29 2015-10-29 Method and apparatus for processing tasks in a multi-core digital signal processing system Pending CN108351783A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/093248 WO2017070900A1 (en) 2015-10-29 2015-10-29 Method and apparatus for processing task in a multi-core digital signal processing system

Publications (1)

Publication Number Publication Date
CN108351783A true CN108351783A (en) 2018-07-31

Family

ID=58629684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201580083942.3A Pending CN108351783A (en) 2015-10-29 2015-10-29 The method and apparatus that task is handled in multinuclear digital information processing system

Country Status (2)

Country Link
CN (1) CN108351783A (en)
WO (1) WO2017070900A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825461A (en) * 2018-08-10 2020-02-21 北京百度网讯科技有限公司 Data processing method and device
CN110968418A (en) * 2018-09-30 2020-04-07 北京忆恒创源科技有限公司 Signal-slot-based large-scale constrained concurrent task scheduling method and device
CN111026539A (en) * 2018-10-10 2020-04-17 上海寒武纪信息科技有限公司 Communication task processing method, task cache device and storage medium
CN111324427A (en) * 2018-12-14 2020-06-23 深圳云天励飞技术有限公司 Task scheduling method and device based on DSP
CN111767121A (en) * 2019-04-02 2020-10-13 上海寒武纪信息科技有限公司 Operation method, device and related product
CN114900486A (en) * 2022-05-09 2022-08-12 江苏新质信息科技有限公司 Multi-algorithm kernel calling method and system based on FPGA
CN116107724A (en) * 2023-04-04 2023-05-12 山东浪潮科学研究院有限公司 AI (advanced technology attachment) acceleration core scheduling management method, device, equipment and storage medium

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697122B (en) * 2017-10-20 2024-03-15 华为技术有限公司 Task processing method, device and computer storage medium
CN109725994B (en) * 2018-06-15 2024-02-06 中国平安人寿保险股份有限公司 Method and device for executing data extraction task, terminal and readable storage medium
CN111104167B (en) * 2018-10-25 2023-07-21 上海嘉楠捷思信息技术有限公司 Calculation result submitting method and device
CN111104168B (en) * 2018-10-25 2023-05-12 上海嘉楠捷思信息技术有限公司 Calculation result submitting method and device
CN111309482B (en) * 2020-02-20 2023-08-15 浙江亿邦通信科技有限公司 Hash algorithm-based blockchain task allocation system, device and storage medium
WO2021179218A1 (en) * 2020-03-11 2021-09-16 深圳市大疆创新科技有限公司 Direct memory access unit, processor, device, processing method, and storage medium
CN112148454A (en) * 2020-09-29 2020-12-29 行星算力(深圳)科技有限公司 Edge computing method supporting serial and parallel execution, and electronic device
CN112365002A (en) * 2020-11-11 2021-02-12 深圳力维智联技术有限公司 Spark-based model construction method, device, system and storage medium
CN112667386A (en) * 2021-01-18 2021-04-16 青岛海尔科技有限公司 Task management method and device, storage medium and electronic equipment
CN113138812A (en) * 2021-04-23 2021-07-20 中国人民解放军63920部队 Spacecraft task scheduling method and device
CN115658325B (en) * 2022-11-18 2024-01-23 北京市大数据中心 Data processing method, device, multi-core processor, electronic equipment and medium
CN117633914B (en) * 2024-01-25 2024-05-10 深圳市纽创信安科技开发有限公司 Chip-based cryptographic resource scheduling method, device and storage medium

Citations (10)

Publication number Priority date Publication date Assignee Title
US20050240930A1 (en) * 2004-03-30 2005-10-27 Kyushu University Parallel processing computer
CN1758223A (en) * 2004-10-08 2006-04-12 华为技术有限公司 Multi-task processing method based on digital signal processor
CN1808386A (en) * 2005-01-18 2006-07-26 华为技术有限公司 Method for processing multi-thread, multi-task and multi-processor
CN101610399A (en) * 2009-07-22 2009-12-23 杭州华三通信技术有限公司 Scheduled service scheduling system and method for implementing scheduled service scheduling
CN102096857A (en) * 2010-12-27 2011-06-15 大唐软件技术股份有限公司 Collaboration method and device for data processing process
CN102542379A (en) * 2010-12-20 2012-07-04 中国移动通信集团公司 Method, system and device for processing scheduled tasks
US20130212591A1 (en) * 2006-03-15 2013-08-15 Mihai-Daniel Fecioru Task scheduling method and apparatus
CN103440173A (en) * 2013-08-23 2013-12-11 华为技术有限公司 Scheduling method and related devices of multi-core processors
CN104598426A (en) * 2013-10-30 2015-05-06 联发科技股份有限公司 Task scheduling method applied to a heterogeneous multi-core processor system
CN104714785A (en) * 2015-03-31 2015-06-17 中芯睿智(北京)微电子科技有限公司 Task scheduling device, task scheduling method and data parallel processing device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102955685B (en) * 2011-08-17 2016-10-05 上海贝尔股份有限公司 Multi-core DSP and system thereof and scheduler
US9201693B2 (en) * 2012-09-04 2015-12-01 Microsoft Technology Licensing, Llc Quota-based resource management

Non-Patent Citations (1)

Title
Zhang Qiongsheng et al.: "Analysis and Performance Evaluation of Operating System Kernel Memory Allocation Algorithms", Computer Systems & Applications *

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN110825461A (en) * 2018-08-10 2020-02-21 北京百度网讯科技有限公司 Data processing method and device
CN110825461B (en) * 2018-08-10 2024-01-05 北京百度网讯科技有限公司 Data processing method and device
CN110968418A (en) * 2018-09-30 2020-04-07 北京忆恒创源科技有限公司 Signal-slot-based large-scale constrained concurrent task scheduling method and device
CN111026539A (en) * 2018-10-10 2020-04-17 上海寒武纪信息科技有限公司 Communication task processing method, task cache device and storage medium
CN111026539B (en) * 2018-10-10 2022-12-02 上海寒武纪信息科技有限公司 Communication task processing method, task cache device and storage medium
CN111324427A (en) * 2018-12-14 2020-06-23 深圳云天励飞技术有限公司 Task scheduling method and device based on DSP
CN111324427B (en) * 2018-12-14 2023-07-28 深圳云天励飞技术有限公司 Task scheduling method and device based on DSP
CN111767121A (en) * 2019-04-02 2020-10-13 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111767121B (en) * 2019-04-02 2022-11-01 上海寒武纪信息科技有限公司 Operation method, device and related product
CN114900486A (en) * 2022-05-09 2022-08-12 江苏新质信息科技有限公司 Multi-algorithm kernel calling method and system based on FPGA
CN114900486B (en) * 2022-05-09 2023-08-08 江苏新质信息科技有限公司 Multi-algorithm core calling method and system based on FPGA
CN116107724A (en) * 2023-04-04 2023-05-12 山东浪潮科学研究院有限公司 AI acceleration core scheduling management method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2017070900A1 (en) 2017-05-04

Similar Documents

Publication Publication Date Title
CN108351783A (en) Method and apparatus for processing tasks in a multi-core digital signal processing system
US10467725B2 (en) Managing access to a resource pool of graphics processing units under fine grain control
US10884799B2 (en) Multi-core processor in storage system executing dynamic thread for increased core availability
EP3239852B1 (en) Method and device for task scheduling on heterogeneous multi-core reconfigurable computing platform
US10733019B2 (en) Apparatus and method for data processing
US11290392B2 (en) Technologies for pooling accelerator over fabric
US20140095769A1 (en) Flash memory dual in-line memory module management
US9378047B1 (en) Efficient communication of interrupts from kernel space to user space using event queues
US9965412B2 (en) Method for application-aware interrupts management
CN109074281B (en) Method and device for distributing graphics processor tasks
US10275558B2 (en) Technologies for providing FPGA infrastructure-as-a-service computing capabilities
US20160127382A1 (en) Determining variable wait time in an asynchronous call-back system based on calculated average sub-queue wait time
US10983833B2 (en) Virtualized and synchronous access to hardware accelerators
US9471387B2 (en) Scheduling in job execution
US20120144146A1 (en) Memory management using both full hardware compression and hardware-assisted software compression
KR20140096587A (en) Apparatus and method for sharing functional logic between functional units, and reconfigurable processor
US9015719B2 (en) Scheduling of tasks to be performed by a non-coherent device
US8706923B2 (en) Methods and systems for direct memory access (DMA) in-flight status
US9176910B2 (en) Sending a next request to a resource before a completion interrupt for a previous request
CN113439260A (en) I/O completion polling for low latency storage devices
US11941722B2 (en) Kernel optimization and delayed execution
EP3343370A1 (en) Method of processing opencl kernel and computing device therefor
Park et al. BTS: Exploring Effects of Background Task-Aware Scheduling for Key-Value CSDs
CN118159947A (en) Method and computing device for processing task requests
US8656375B2 (en) Cross-logical entity accelerators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180731
