CN112835718A - Method and device for task processing, many-core system and computer-readable medium


Info

Publication number
CN112835718A (application CN202110184918.6A)
Authority
CN (China)
Prior art keywords
task, layer, tasks, processing, graph
Legal status
Pending
Application number
CN202110184918.6A
Other languages
Chinese (zh)
Inventor
施路平
张伟豪
林俊峰
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN202110184918.6A (priority date 2021-02-10)
Publication of CN112835718A (2021-05-25)
Priority to PCT/CN2022/074490


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals


Abstract

The present disclosure provides a method of task processing, the method including: acquiring a computation graph of a problem to be processed, where the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers; dividing each layer of the computation graph into a plurality of task blocks, where each task block includes at least one task; and determining a mapping relationship between the task blocks and a plurality of processing cores of a many-core system, where, according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores. The disclosure also provides a device for task processing, a many-core system and a computer-readable medium.

Description

Method and device for task processing, many-core system and computer-readable medium
Technical Field
The present disclosure relates to the field of many-core technologies, and in particular, to a method and an apparatus for task processing, a many-core system, and a computer-readable medium.
Background
Solving a problem by electronic computation essentially amounts to processing the plurality of tasks (or operations) that correspond to that problem.
The above process may be performed by a many-core system. A many-core system includes a plurality of processing cores (also called cores or processing engines) that can interact with one another, so the plurality of tasks corresponding to the problem to be processed can be mapped (distributed) to different processing cores and processed by the respective cores.
Processing cores of a many-core system may inevitably become invalid (e.g., due to a fault), so it is important to ensure that a usable processing result can still be obtained when some of the processing cores are invalid.
Disclosure of Invention
The embodiment of the disclosure provides a task processing method and device, a many-core system and a computer readable medium.
In a first aspect, an embodiment of the present disclosure provides a method for task processing, including:
acquiring a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
dividing each layer of the computation graph into a plurality of task blocks; each task block comprises at least one task;
determining a mapping relationship between the task blocks and a plurality of processing cores of a many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
In some embodiments, between the acquiring a computation graph of the problem to be processed and the dividing each layer of the computation graph into a plurality of task blocks, the method further includes:
training the computational graph to improve the redundancy performance of the computational graph.
In some embodiments, the training the computational graph comprises at least one of:
invalidating a portion of the tasks in the computation graph to train the computation graph;
invalidating a region of the computation graph to train the computation graph; the region includes a plurality of tasks;
training the computational graph by adversarial-sample defense.
In some embodiments, said dividing each layer of said computational graph into a plurality of task blocks comprises:
expanding the computation graph, and dividing each layer of the expanded computation graph into a plurality of task blocks; the expanding includes adding redundant tasks in at least some layers of the computation graph.
In some embodiments, the redundant tasks include at least one of:
a backup task; the backup task is the same as the task in the corresponding layer;
an empty task;
an invalid task.
In some embodiments, the dividing each layer of the computational graph into a plurality of task blocks comprises any one of:
randomly dividing each layer of the computation graph into a plurality of task blocks;
uniformly dividing each layer of the computation graph into a plurality of task blocks;
dividing each layer of the computation graph into a plurality of pre-task blocks, and merging all pre-task blocks which are mapped to one processing core according to the mapping relation into one task block;
each layer of the computational graph is divided into a plurality of task blocks based at least on hardware resources of the processing cores.
In some embodiments, between the dividing each layer of the computation graph into a plurality of task blocks and the determining the mapping relationship between each task block and a plurality of processing cores of a many-core system, the method further includes:
and invalid part of task blocks so as to train each task block and improve the redundancy performance of the calculation graph.
In some embodiments, the invalidating the partial task blocks to train each task block includes at least one of:
randomly invalidating a portion of the task blocks to train each task block;
determining key task blocks comprising key tasks, and invalidating the key task blocks to train each task block.
In some embodiments, any two task blocks of any one layer are mapped into two different processing cores according to the mapping relationship.
In some embodiments, after the determining the mapping relationship between each task block and the plurality of processing cores of the many-core system, the method further includes:
and processing all the task blocks mapped in the core by an invalid part so as to train each task block and improve the redundancy performance of the calculation graph.
In some embodiments, the invalidating all the task blocks mapped into some of the processing cores to train the task blocks includes at least one of:
randomly invalidating all the task blocks mapped into some of the processing cores, to train the task blocks;
invalidating, in turn, all the task blocks mapped into each processing core, to train the task blocks;
determining a key task block including a key task, and invalidating all the task blocks mapped into the processing core to which the key task block is mapped, to train the task blocks.
In some embodiments, the computational graph is a trainable computational graph; the trainable computational graph can solve the same problem to be processed in cases where at least some of the tasks are different.
In some embodiments, the computational graph is a neural network.
In some embodiments, after the determining the mapping relationship between each task block and the plurality of processing cores of the many-core system, the method further includes:
mapping the task blocks into the plurality of processing cores according to the mapping relationship;
each processing core processes tasks in the task block mapped thereto.
In a second aspect, an embodiment of the present disclosure provides an apparatus for task processing, including:
the acquisition module is configured to acquire a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
a partitioning module configured to partition each layer of the computational graph into a plurality of task blocks; each task block comprises at least one task;
the mapping module is configured to determine a mapping relationship between the task blocks and a plurality of processing cores of the many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
In a third aspect, an embodiment of the present disclosure provides a many-core system, including:
a plurality of processing cores; and
a network on chip configured to exchange data among the plurality of processing cores and with the outside;
one or more of the processing cores store one or more instructions, and the instructions are executed by the one or more processing cores to enable the one or more processing cores to perform any of the above methods of task processing.
In a fourth aspect, the present disclosure provides a computer readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processing core, implements any one of the above-mentioned task processing methods.
In the embodiments of the present disclosure, all the tasks in each layer of the computation graph are mapped to at least two different processing cores for processing. Thus, when any processing core becomes invalid (e.g., due to a fault), each layer of the computation graph loses at most a part of its tasks, and no layer has all of its tasks invalidated; the computation graph as a whole can therefore still produce a processing result that is usable to some extent, which greatly improves its robustness.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
fig. 1 is a flowchart of a method for task processing according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method of task processing provided by embodiments of the present disclosure;
fig. 3 is a schematic process diagram of a computation graph in a method for task processing according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a task processing apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a many-core system according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a computer-readable medium according to an embodiment of the disclosure.
Detailed Description
To facilitate a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The actual work to be done for many problems (e.g., image processing, speech recognition) can be expressed in the form of a computation graph (also called a task graph or logic graph). That is, all the operations to be performed to solve the problem are divided into a plurality of "tasks" (or nodes), each task includes certain operations, and a certain order exists between different tasks. For example, if the operation result of one task is used in the operation of another task, the latter is said to be performed based on the result of the former; the latter is a subsequent task of the former, and the former is a previous task of the latter.
Because of the above relationships between tasks, referring to fig. 3, the computation graph may be divided into multiple "layers", each layer including multiple tasks; no task in any layer is performed based on tasks in the same layer or in subsequent layers, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers. That is, if a task in a previous layer is not completed, tasks in subsequent layers may not be performed, because the operation result of the previous-layer task may be used in the operations of the subsequent-layer tasks; conversely, if a task in a subsequent layer cannot be performed, the previous layers are not affected, because the operations of previous-layer tasks never use the operation results of subsequent-layer tasks. Tasks in the same layer have no such relationship with each other; if such a relationship existed between two tasks, they would by definition belong to two different layers.
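Purely as an informal illustration of this layer rule (not part of the patent text), the structure can be captured in a short Python sketch; the names Task, ComputationGraph and check_layering are hypothetical and introduced only for this example:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """One operation (node) of the computation graph."""
    name: str
    # Names of tasks whose results this task consumes; by the layer
    # rule these must all lie in previous layers.
    inputs: List[str] = field(default_factory=list)

@dataclass
class ComputationGraph:
    # layers[i] holds the tasks of layer i, in sequence.
    layers: List[List[Task]]

def check_layering(graph: ComputationGraph) -> bool:
    """True iff no task depends on its own layer or a later layer."""
    done = set()  # names of tasks in layers already passed
    for layer in graph.layers:
        current = {t.name for t in layer}
        for t in layer:
            for dep in t.inputs:
                if dep in current or dep not in done:
                    return False  # same-layer or forward dependency
        done |= current
    return True
```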
For example, in fig. 3, the tasks of different layers (layer 0 to layer 3) are represented by boxes with different fillings, the horizontal size of each filled box represents the number of tasks it contains, and the different processing cores (processing core 0 to processing core 3) are represented by blank boxes.
For example, a "Neural Network (NN)" is one form of computation graph. A neural network is divided into a plurality of layers, each layer includes a plurality of nodes, each node performs certain operations, and nodes of different layers are connected in certain relationships (for example, the output of one node serves as the input of a node in the next layer). Thus, each layer of the neural network may be regarded as one layer of the computation graph, and each node of the neural network may be regarded as one task of the computation graph.
Illustratively, the neural network in the embodiments of the present disclosure may be used for image processing, speech recognition, and the like, and may specifically take the form of a Convolutional Neural Network (CNN), a Spiking Neural Network (SNN), a Recurrent Neural Network (RNN), or the like.
For example, some problems may correspond to a plurality of different computation graphs. That is, the number of tasks in the computation graph, the layers in which the tasks are located, the relationships between the tasks, the specific operation of each task, and the like may differ, yet each of these different computation graphs can solve the problem (though not necessarily equally well).
A computation graph that can take many such forms is referred to herein as a "trainable computation graph". That is, the tasks of a trainable computation graph can be adjusted by training, and the trained computation graph solves the problem with a different effect.
For example, a neural network is a form of trainable computation graph. A neural network that handles a problem (e.g., image classification) is usually trained by adjusting its nodes (e.g., adjusting node weights) according to how well the current neural network solves the problem (e.g., image-classification accuracy), thereby changing the neural network (the computation graph) and improving its effect on the problem (e.g., improving classification accuracy).
In some related techniques, when a problem is to be handled by a many-core system, the tasks of each layer of its corresponding computation graph may be mapped (allocated) to one processing core, with different layers mapped to different processing cores.
However, in this manner, once a certain processing core of the many-core system becomes invalid (e.g., due to a fault), none of the tasks of the corresponding layer of the computation graph can be processed, so none of the tasks after that layer can actually be performed. The problem as a whole then cannot be solved at all (no processing result can be obtained), and the robustness of the system is poor.
In a first aspect, referring to fig. 1 to 3, an embodiment of the present disclosure provides a method for task processing.
The method of task processing of the embodiments of the present disclosure is based on a many-core system and covers how to map the tasks of a computation graph into the processing cores of the many-core system.
Referring to fig. 1, a method for task processing according to an embodiment of the present disclosure includes:
and S101, acquiring a calculation chart of the problem to be processed.
The calculation graph comprises a plurality of layers which are arranged in sequence, each layer comprises a plurality of tasks, the tasks in any layer are not performed based on the results of the tasks in the layer or the subsequent layer, and at least part of the tasks in at least part of the layers are performed based on the results of the tasks in the previous layer.
When a many-core system is used for processing a problem to be processed (such as image processing, voice recognition and the like), a corresponding calculation map is obtained. Wherein, a preset calculation chart can be obtained; the calculation graph can also be generated according to a preset rule according to a specific problem to be processed.
S103, dividing each layer of the computation graph into a plurality of task blocks.
Wherein each task block comprises at least one task.
Referring to FIG. 3, the tasks in each layer of the computational graph are divided into multiple "groups," i.e., each layer is divided into multiple "task blocks," each task block including one or more tasks in that layer.
The number of task blocks into which each layer is divided may be preset; for example, each layer may be preset to be divided into a task blocks (for a preset number a).
Alternatively, the division may be performed in a certain manner, with the number of task blocks being whatever that manner actually yields.
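For concreteness, here is a minimal sketch of dividing one layer into a preset number of task blocks (the preset count a above); divide_layer is a hypothetical helper, and it assumes the layer has at least as many tasks as blocks:

```python
def divide_layer(tasks, num_blocks):
    """Divide one layer's task list into num_blocks task blocks of
    (nearly) equal size; every block holds at least one task."""
    if not 1 <= num_blocks <= len(tasks):
        raise ValueError("need 1 <= num_blocks <= number of tasks")
    q, r = divmod(len(tasks), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = q + (1 if i < r else 0)  # spread the remainder
        blocks.append(tasks[start:start + size])
        start += size
    return blocks

# e.g. divide_layer(list("abcdefg"), 3) -> [['a','b','c'], ['d','e'], ['f','g']]
```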
S105, determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system.
According to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
Referring to fig. 3, after the plurality of task blocks are obtained, it may be determined into which processing core of the many-core system each task block should be mapped; that is, the mapping relationship is determined (the task blocks are not necessarily actually mapped at this point).
The mapping relationship of the embodiments of the present disclosure is a "dispersed" mapping: all the task blocks in the same layer should be mapped into different processing cores as far as possible, and at the very least are guaranteed not to be all mapped into the same processing core. Since each task block contains tasks of a single layer, and the task blocks of the same layer are spread across different processing cores, the above mapping relationship ensures that all the tasks of one layer are mapped into at least two different processing cores.
In the embodiments of the present disclosure, all the tasks in each layer of the computation graph are mapped to at least two different processing cores for processing. Thus, when any processing core becomes invalid (e.g., due to a fault), each layer of the computation graph loses at most a part of its tasks, and no layer has all of its tasks invalidated; the computation graph as a whole can therefore still produce a processing result that is usable to some extent, which greatly improves its robustness.
In some embodiments, the computational graph is a trainable computational graph.
Wherein the trainable computational graph is capable of solving the same problem to be processed in situations where at least some of the tasks are different.
In some embodiments, the computational graph is a Neural Network (NN).
In one mode of the embodiments of the present disclosure, the above computation graph is a trainable computation graph, and further a neural network, such as a Convolutional Neural Network (CNN), a Spiking Neural Network (SNN), or a Recurrent Neural Network (RNN); the redundancy performance of such a computation graph can be further improved by training it.
Of course, embodiments of the present disclosure are not limited to trainable computational graphs (e.g., neural networks) and may be used with other computational graphs.
In some embodiments, any two task blocks of any one layer are mapped into two different processing cores, respectively, according to the mapping relationship.
Further, referring to FIG. 3, a maximally dispersed ("mutually exclusive") mapping may be performed, i.e., it is guaranteed that no two task blocks from the same layer are mapped to the same processing core.
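One simple scheme satisfying this "mutually exclusive" constraint is sketched below, under the assumption that no layer has more task blocks than there are cores; the staggered round-robin resembles the pattern of fig. 3, and map_blocks_to_cores is a hypothetical name:

```python
def map_blocks_to_cores(blocks_per_layer, num_cores):
    """Return mapping[(layer, block_index)] = core_index such that no
    two blocks of the same layer share a core, by staggered
    round-robin: block j of layer i goes to core (i + j) % num_cores."""
    mapping = {}
    for i, blocks in enumerate(blocks_per_layer):
        if len(blocks) > num_cores:
            raise ValueError("layer %d has more blocks than cores" % i)
        for j in range(len(blocks)):
            mapping[(i, j)] = (i + j) % num_cores
    return mapping
```

The stagger spreads each layer across different cores while still giving every core blocks from several layers, matching the requirement that a plurality of task blocks be mapped into each processing core.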
In some embodiments, referring to fig. 2 and 3, between the acquiring a computation graph of the problem to be processed (S101) and the dividing each layer of the computation graph into a plurality of task blocks (S103), the method further includes:
S102, training the computation graph to improve the redundancy performance of the computation graph.
Before the computation graph is divided into blocks, it may be trained to improve its redundancy performance.
In some embodiments, the training of the computation graph (S102) includes at least one of the following:
(1) invalidating some of the tasks in the computation graph to train the computation graph.
In one mode of the embodiments of the present disclosure, a Dropout-style manner may be adopted: some tasks in the computation graph are invalidated (e.g., the weights of some nodes of a neural network are set to 0) and the other tasks are adjusted, so that the computation graph can still produce results that are usable to some extent when those tasks are invalid, thereby improving its robustness (a toy sketch of such invalidation follows the notes below).
(2) invalidating a region of the computation graph to train the computation graph.
The region includes a plurality of tasks.
In one mode of the embodiments of the present disclosure, a DropBlock-style manner may be used: all the tasks in a region of the computation graph (which may cover a part of one layer, or corresponding parts of several layers) are invalidated, and the other tasks are adjusted, so that the computation graph can still produce results that are usable to some extent when the tasks in that region are invalid, thereby improving its robustness.
A "region" in this manner typically contains a relatively large number of tasks mapped to the same processing core, so region-level invalidation improves the redundancy performance of the computation graph more effectively.
(3) training the computation graph by adversarial-sample defense.
In one mode of the embodiments of the present disclosure, the computation graph may be trained in an "adversarial-sample defense" manner.
In the embodiment of the present disclosure, all computation graphs to be trained are necessarily trainable computation graphs (e.g., neural networks).
In the embodiment of the present disclosure, each training may be performed only once, or may be performed repeatedly.
In the embodiment of the present disclosure, when performing multiple training, the specific manner adopted by each training may be the same or different.
In the embodiments of the present disclosure, when training is performed multiple times, the training may end when a preset end criterion is reached; the end criterion may include convergence of the computation graph, reaching a predetermined number of training iterations, reaching a predetermined redundancy performance, and the like.
All training in the embodiments of the present disclosure may meet the above requirements and will not be described in detail later.
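As the toy sketch promised above — not the disclosure's prescribed algorithm; in a real neural network this role is played by activation-level dropout such as torch.nn.Dropout — one training step could mask out a random fraction of tasks, with sample_invalid_tasks, graph, num_steps and train_step all hypothetical names:

```python
import random

def sample_invalid_tasks(layers, fraction, rng):
    """Pick a random subset of task names to invalidate for one
    training step; the surviving tasks are then adjusted so the
    graph still yields a usable result without the masked ones."""
    all_tasks = [t.name for layer in layers for t in layer]
    k = int(fraction * len(all_tasks))
    return set(rng.sample(all_tasks, k))

rng = random.Random(0)
# e.g. invalidate 10% of the tasks each step (using the Task sketch
# above), stopping on a preset end criterion:
# for step in range(num_steps):
#     masked = sample_invalid_tasks(graph.layers, 0.1, rng)
#     train_step(masked)   # hypothetical: adjust the unmasked tasks
```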
In some embodiments, referring to fig. 2 and 3, the dividing each layer of the computation graph into a plurality of task blocks (S103) includes:
S1031, expanding the computation graph, and dividing each layer of the expanded computation graph into a plurality of task blocks.
The expanding includes adding redundant tasks in at least some layers of the computation graph.
In one mode of the embodiments of the present disclosure, the computation graph may be "expanded" ("anti-compressed"): tasks that are not originally present (redundant tasks) are added to the layers of the computation graph, and the expanded computation graph is then divided into blocks, so that at least some of the resulting task blocks contain the above "redundant tasks", improving the redundancy performance.
How much each layer is expanded may be predetermined. For example, a redundancy coefficient b may be set, denoting the ratio of the computation amount of the added tasks to that of the original tasks: if b = 0, no expansion is performed; if b = 1, the computation amount of the added tasks equals that of the original tasks. In general, b may be greater than 0 and less than or equal to 1 (b greater than 1 is also possible, e.g., a task may be backed up in "multiple copies").
Alternatively, each layer may be expanded according to a certain method, with the actual amount of tasks that the method yields as the criterion.
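A sketch of expansion under the redundancy coefficient b, assuming each task represents one unit of computation so that "computation amount" reduces to task count; expand_layer and make_redundant are hypothetical names:

```python
def expand_layer(tasks, b, make_redundant):
    """Add redundant tasks worth roughly b times the original
    computation amount (here: task count). make_redundant(task)
    builds the redundant counterpart -- a backup copy, an empty
    task, or an invalid task. b = 0 adds nothing; b > 1 yields
    multiple backup copies of some tasks."""
    if b < 0:
        raise ValueError("redundancy coefficient b must be >= 0")
    n_extra = round(b * len(tasks))
    # Cycle through the layer to pick donors for the extra tasks.
    donors = [tasks[i % len(tasks)] for i in range(n_extra)]
    return list(tasks) + [make_redundant(t) for t in donors]
```

For example, with the Task sketch above, expand_layer(layer, 0.5, lambda t: Task(t.name + "_bak", list(t.inputs))) backs up half of the layer's computation.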
In some embodiments, the redundant tasks include at least one of the following:
(1) a backup task.
The backup task is identical to a task in the corresponding layer.
In one mode of the embodiments of the present disclosure, each added task may be a copy of an original task in the corresponding layer, so that the task actually exists in "multiple copies" that back each other up. When one copy of the task is not completed (e.g., a processing core fails), the subsequent tasks can use the operation result of a backup copy, improving robustness.
Of course, a "backup task" and its "original task" should typically be divided into different task blocks, and those task blocks should be mapped into different processing cores.
(2) an empty task.
In one mode of the embodiments of the present disclosure, empty tasks that perform no actual operations (no-ops) may be added during expansion.
(3) an invalid task.
In one mode of the embodiments of the present disclosure, tasks that do perform operations but are not needed by the original computation graph, i.e., invalid tasks, may be added during expansion.
The invalid tasks may be generated randomly, or by other specific anti-compression techniques.
The expansion modes of different layers may be the same or different.
In some embodiments, the dividing each layer of the computation graph into a plurality of task blocks (S103) includes any one of the following:
(1) randomly dividing each layer of the computation graph into a plurality of task blocks.
In one mode of the embodiments of the present disclosure, the tasks in each layer may be divided into blocks at random, i.e., both the number of tasks in each task block and the specific tasks it contains are random (a minimal sketch of this mode appears after this list).
(2) uniformly dividing each layer of the computation graph into a plurality of task blocks.
In one mode of the embodiments of the present disclosure, each layer may be divided uniformly into a plurality of task blocks, i.e., the numbers of tasks in the task blocks of the same layer are equal or substantially equal (for example, taking the task count of the smallest task block in a layer as 100%, the task count of the largest task block in that layer does not exceed 110%).
(3) dividing each layer of the computation graph into a plurality of pre-task blocks, and merging all the pre-task blocks that are mapped into one processing core according to the mapping relationship into a single task block.
In one mode of the embodiments of the present disclosure, the subsequent "mapping" may be taken into account when dividing: each layer is first divided into a plurality of "pre-task blocks", and if several pre-task blocks are mapped into the same processing core, they can simply be merged into one task block.
(4) dividing each layer of the computation graph into a plurality of task blocks based at least on the hardware resources of the processing cores.
In one mode of the embodiments of the present disclosure, the actual hardware resources of the processing cores (such as caches) and the subsequent "mapping" may also be considered when dividing, so that which tasks go into each task block is decided according to the hardware resources of the processing core into which that task block is to be mapped.
The hardware resources of different processing cores may be the same or different.
The partitioning modes of different layers may be the same or different.
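The random-division sketch promised above — divide_layer_randomly is a hypothetical name, and it again assumes at least as many tasks as blocks so that no block is empty:

```python
import random

def divide_layer_randomly(tasks, num_blocks, seed=0):
    """Mode (1): random division -- both the block sizes and the
    block membership are random, but every block is non-empty."""
    rng = random.Random(seed)
    tasks = list(tasks)
    rng.shuffle(tasks)
    # num_blocks - 1 distinct cut points guarantee non-empty blocks.
    cuts = sorted(rng.sample(range(1, len(tasks)), num_blocks - 1))
    bounds = [0, *cuts, len(tasks)]
    return [tasks[a:b] for a, b in zip(bounds, bounds[1:])]
```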
In some embodiments, referring to fig. 2 and 3, between the dividing each layer of the computation graph into a plurality of task blocks (S103) and the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system (S105), the method further includes:
S104, invalidating some of the task blocks, to train the task blocks and improve the redundancy performance of the computation graph.
After the division into blocks, one or more (but not all) task blocks may be invalidated (i.e., all the tasks in those blocks are invalidated), and the tasks in the other task blocks adjusted, so that the remaining task blocks can still produce results that are usable to some extent (i.e., the task blocks are trained), improving robustness.
Of course, the above training is also essentially the training of the "computation graph".
This training is performed after the division into blocks (and hence after any expansion, i.e., anti-compression), which is equivalent to improving robustness at the "block level".
In some embodiments, the invalidating some of the task blocks to train the task blocks (S104) includes at least one of the following (see the sketch after this list):
(1) randomly invalidating some of the task blocks to train the task blocks.
In one mode of the embodiments of the present disclosure, some task blocks may be invalidated at random to train the remaining task blocks.
(2) determining a key task block including a key task, and invalidating the key task block to train the task blocks.
In one mode of the embodiments of the present disclosure, a "key task" that plays a key role may be determined from the structural features of the computation graph, and the key task block containing that key task invalidated, to train the task blocks.
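The disclosure does not fix how a "key task" is identified; purely as one hypothetical heuristic, blocks could be ranked by the largest fan-out (number of consumers) of any task they contain, using the Task sketch above:

```python
def key_task_blocks(blocks, layers, top_k=1):
    """Hypothetical heuristic: a task is 'key' if many later tasks
    consume its result, so score each block by its largest fan-out."""
    fanout = {}
    for layer in layers:
        for t in layer:
            for dep in t.inputs:
                fanout[dep] = fanout.get(dep, 0) + 1
    score = lambda block: max((fanout.get(t.name, 0) for t in block), default=0)
    return sorted(blocks, key=score, reverse=True)[:top_k]
```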
In some embodiments, referring to fig. 2 and 3, after the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system (S105), the method further includes:
S106, invalidating all the task blocks mapped into some of the processing cores, to train the task blocks and improve the redundancy performance of the computation graph.
After the mapping relationship is determined (including after the actual mapping is performed), all the tasks mapped into some of the processing cores can be invalidated (which is equivalent to invalidating those processing cores), and the other tasks adjusted, so that the remaining tasks can still produce results that are usable to some extent (i.e., the task blocks are trained), improving robustness.
Of course, the above training is also essentially the training of the "computation graph".
This training is equivalent to simulating the situation in which some processing cores become invalid due to faults and the like, and thus improves robustness at the "core level" from the perspective of final practical application.
In some embodiments, the invalidating all the task blocks mapped into some of the processing cores to train the task blocks (S106) includes at least one of the following (a sketch of the second mode appears after this list):
(1) randomly invalidating all the task blocks mapped into some of the processing cores, to train the task blocks.
In one mode of the embodiments of the present disclosure, all the tasks mapped into one or more (but not all) processing cores may be invalidated (i.e., one or more processing cores are invalidated) to train the task blocks.
(2) invalidating, in turn, all the task blocks mapped into each processing core, to train the task blocks.
In one mode of the embodiments of the present disclosure, the processing cores may be invalidated in sequence: only one processing core is invalidated at a time, but every processing core is invalidated once in turn.
(3) determining a key task block including a key task, and invalidating all the task blocks mapped into the processing core to which the key task block is mapped, to train the task blocks.
In one mode of the embodiments of the present disclosure, a key task may be determined from the structure of the computation graph, the key task block containing it identified, and the processing core corresponding to that key task block invalidated, to train the task blocks.
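Mode (2) can be sketched on top of the earlier mapping helper; train_with_core_faults and train_step are hypothetical names, with train_step standing in for whatever adjusts the surviving tasks:

```python
def train_with_core_faults(mapping, num_cores, train_step):
    """Invalidate one processing core at a time, so that every core
    is simulated as failed exactly once. For core c, all task blocks
    mapped to c are disabled for that training step."""
    for c in range(num_cores):
        disabled = {lb for lb, core in mapping.items() if core == c}
        train_step(disabled)  # adjust tasks outside the disabled blocks
```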
In some embodiments, referring to fig. 2, after determining the mapping relationship between each task block and the plurality of processing cores of the many-core system (S105), the method further includes:
and S107, mapping each task block to a plurality of processing cores according to the mapping relation.
And S108, each processing core processes the tasks in the task block mapped to the processing core.
After the mapping relationship is determined, the task blocks can be mapped (distributed) into the processing cores according to it, and the processing cores perform the processing, realizing the actual function of the computation graph and solving the above problem to be processed.
Of course, the step of determining the mapping relationship (S105) and the step of mapping according to it (S107) may in practice be integrated, i.e., the mapping may be performed directly.
Of course, when the training of step S106 is to be performed, referring to fig. 2, step S106 may be performed before step S107; that is, the tasks corresponding to a processing core may be invalidated for training purely according to the mapping relationship, without the tasks actually having been mapped into the processing cores.
Alternatively, step S106 may be performed after step S107, i.e., the tasks are actually mapped into the processing cores, and processing cores are actually disabled for training.
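Tying the hypothetical helpers from the earlier sketches together, here is a toy walk through S101 to S105 (the actual dispatch of S107/S108 is hardware-specific and omitted):

```python
t = lambda name, *deps: Task(name, list(deps))

# S101: acquire a (tiny) layered computation graph.
graph = ComputationGraph(layers=[
    [t("a0"), t("a1"), t("a2"), t("a3")],
    [t("b0", "a0", "a1"), t("b1", "a2", "a3")],
])
assert check_layering(graph)

# S103: divide every layer into task blocks.
blocks_per_layer = [divide_layer(layer, 2) for layer in graph.layers]

# S105: determine the mapping relationship.
mapping = map_blocks_to_cores(blocks_per_layer, num_cores=4)

# Blocks of the same layer land on different cores:
assert mapping[(0, 0)] != mapping[(0, 1)]
```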
In a second aspect, referring to fig. 4, an embodiment of the present disclosure provides a task processing apparatus 600, including:
an obtaining module 601 configured to obtain a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
a partitioning module 602 configured to partition each layer of the computational graph into a plurality of task blocks; each task block comprises at least one task;
a mapping module 603 configured to determine a mapping relationship between the task blocks and a plurality of processing cores of the many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
The task processing device 600 of the embodiment of the present disclosure may implement the method of task processing described above.
It should be understood that when other steps are included in the above-mentioned method for task processing, other modules for implementing the corresponding steps may also be included in the device 600 for task processing.
In a third aspect, referring to fig. 5, an embodiment of the disclosure provides a many-core system 700, comprising:
a plurality of processing cores 701; and
a network on chip 702 configured to exchange data among the plurality of processing cores 701 and with the outside;
one or more instructions are stored in one or more of the processing cores 701, and the instructions are executed by the one or more processing cores 701 to enable the one or more processing cores 701 to perform any of the above methods of task processing.
The many-core system 700 of the embodiment of the present disclosure may implement the above-mentioned task processing method, including actually executing the tasks in the computation graph to obtain the processing result of the problem to be processed.
In a fourth aspect, referring to fig. 6, the present disclosure provides a computer readable medium 800 on which a computer program is stored, wherein the computer program, when executed by a processing core, implements any one of the above-mentioned methods of task processing.
The computer-readable medium 800 of the embodiments of the present disclosure stores a computer program that, when executed by a processing core, implements the above method of task processing.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (10)

1. A method of task processing, comprising:
acquiring a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
dividing each layer of the computation graph into a plurality of task blocks; each task block comprises at least one task;
determining a mapping relationship between the task blocks and a plurality of processing cores of a many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
2. The method of task processing according to claim 1, wherein at least one of the following is also satisfied:
between the acquiring a computation graph of the problem to be processed and the dividing each layer of the computation graph into a plurality of task blocks, the method further comprises: training the computation graph to improve the redundancy performance of the computation graph;
between the dividing each layer of the computation graph into a plurality of task blocks and the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system, the method further comprises: invalidating some of the task blocks, to train the task blocks and improve the redundancy performance of the computation graph;
after the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system, the method further comprises: invalidating all the task blocks mapped into some of the processing cores, to train the task blocks and improve the redundancy performance of the computation graph.
3. The method of task processing according to claim 1, wherein said dividing each layer of the computation graph into a plurality of task blocks comprises:
expanding the computation graph, and dividing each layer of the expanded computation graph into a plurality of task blocks; the expanding includes adding redundant tasks in at least some layers of the computation graph.
4. The method of task processing according to claim 1,
wherein any two task blocks of any one layer are mapped into two different processing cores according to the mapping relationship.
5. The method of task processing according to any one of claims 1 to 4,
the computation graph is a trainable computation graph; the trainable computational graph can solve the same problem to be processed in cases where at least some of the tasks are different.
6. The method of task processing according to any one of claims 1 to 4,
the computational graph is a neural network.
7. The method for task processing according to any one of claims 1 to 4, wherein after the determining the mapping relationship between each task block and the plurality of processing cores of the many-core system, the method further comprises:
mapping the task blocks into the plurality of processing cores according to the mapping relationship;
each processing core processes tasks in the task block mapped thereto.
8. An apparatus for task processing, comprising:
the acquisition module is configured to acquire a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
a partitioning module configured to partition each layer of the computational graph into a plurality of task blocks; each task block comprises at least one task;
the mapping module is configured to determine a mapping relationship between the task blocks and a plurality of processing cores of the many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
9. A many-core system, comprising:
a plurality of processing cores; and
a network on chip configured to exchange data among the plurality of processing cores and with the outside;
one or more of the processing cores store one or more instructions, and the one or more instructions are executed by the one or more processing cores to enable the one or more processing cores to perform the method of task processing according to any one of claims 1 to 7.
10. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processing core, carries out the method of task processing according to any one of claims 1 to 7.
CN202110184918.6A 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium Pending CN112835718A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110184918.6A CN112835718A (en) 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium
PCT/CN2022/074490 WO2022171002A1 (en) 2021-02-10 2022-01-28 Task processing method and apparatus, many-core system, and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110184918.6A CN112835718A (en) 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium

Publications (1)

Publication Number Publication Date
CN112835718A true CN112835718A (en) 2021-05-25

Family

ID=75933596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110184918.6A Pending CN112835718A (en) 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium

Country Status (1)

Country Link
CN (1) CN112835718A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160182A1 (en) * 2018-05-31 2020-05-21 Neuralmagic Inc. System and method of executing neural networks
CN111124626A (en) * 2018-11-01 2020-05-08 北京灵汐科技有限公司 Many-core system and data processing method and processing device thereof
CN111723900A (en) * 2019-03-18 2020-09-29 北京灵汐科技有限公司 Mapping method of neural network based on many-core processor and computing device
CN112348828A (en) * 2020-10-27 2021-02-09 浙江大华技术股份有限公司 Example segmentation method and device based on neural network and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马晓慧 (MA Xiaohui); 陈娟 (CHEN Juan): "A parallel scheduling algorithm based on data partitioning and task mapping", Modern Computer (Professional Edition) (现代计算机(专业版)), no. 14 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022171002A1 (en) * 2021-02-10 2022-08-18 北京灵汐科技有限公司 Task processing method and apparatus, many-core system, and computer-readable medium
CN115098262A (en) * 2022-06-27 2022-09-23 清华大学 Multi-neural-network task processing method and device
CN115098262B (en) * 2022-06-27 2024-04-23 清华大学 Multi-neural network task processing method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination