CN112835718A - Method and device for task processing, many-core system and computer-readable medium


Info

Publication number
CN112835718A (application CN202110184918.6A)
Authority
CN (China)
Prior art keywords
task, layer, tasks, processing, graph
Legal status
Pending
Application number
CN202110184918.6A
Other languages
Chinese (zh)
Inventor
施路平
张伟豪
林俊峰
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Application filed by Beijing Lynxi Technology Co Ltd
Priority to CN202110184918.6A (priority date 2021-02-10)
Publication of CN112835718A (2021-05-25)
Priority to PCT/CN2022/074490


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals


Abstract

The present disclosure provides a method of task processing, the method including: acquiring a computation graph of a problem to be processed, where the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers; dividing each layer of the computation graph into a plurality of task blocks, where each task block includes at least one task; and determining a mapping relationship between the task blocks and a plurality of processing cores of a many-core system, where, according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores. The disclosure also provides a device for task processing, a many-core system and a computer-readable medium.

Description

Method and device for task processing, many-core system and computer-readable medium
Technical Field
The present disclosure relates to the field of many-core technologies, and in particular, to a method and an apparatus for task processing, a many-core system, and a computer-readable medium.
Background
Solving a problem by electronic computation essentially amounts to processing the plurality of tasks (or operations) that correspond to that problem.
The above process may be performed by a many-core system. A many-core system includes a plurality of processing cores (also called cores or processing engines) that can interact with one another, so the plurality of tasks corresponding to the problem to be processed can be mapped (distributed) to different processing cores and processed by the respective cores.
Processing cores of a many-core system may inevitably become invalid (e.g., due to a fault), so it is important to ensure that a usable processing result can still be obtained when some of the processing cores are invalid.
Disclosure of Invention
The embodiment of the disclosure provides a task processing method and device, a many-core system and a computer readable medium.
In a first aspect, an embodiment of the present disclosure provides a method for task processing, including:
acquiring a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
dividing each layer of the computation graph into a plurality of task blocks; each task block comprises at least one task;
determining a mapping relationship between the task blocks and a plurality of processing cores of a many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
In some embodiments, between the acquiring a computation graph of the problem to be processed and the dividing each layer of the computation graph into a plurality of task blocks, the method further includes:
training the computational graph to improve the redundancy performance of the computational graph.
In some embodiments, the training the computational graph comprises at least one of:
invalidating a portion of the tasks in the computation graph to train the computation graph;
invalidating a region of the computation graph to train the computation graph; the region includes a plurality of tasks;
training the computational graph by adversarial-sample defense.
In some embodiments, said dividing each layer of said computational graph into a plurality of task blocks comprises:
expanding the computation graph, and dividing each layer of the expanded computation graph into a plurality of task blocks; the expanding includes adding redundant tasks in at least some layers of the computation graph.
In some embodiments, the redundant tasks include at least one of:
a backup task; the backup task is the same as the task in the corresponding layer;
an empty task;
an invalid task.
In some embodiments, the dividing each layer of the computational graph into a plurality of task blocks comprises any one of:
randomly dividing each layer of the computation graph into a plurality of task blocks;
uniformly dividing each layer of the computation graph into a plurality of task blocks;
dividing each layer of the computation graph into a plurality of pre-task blocks, and merging all pre-task blocks which are mapped to one processing core according to the mapping relation into one task block;
each layer of the computational graph is divided into a plurality of task blocks based at least on hardware resources of the processing cores.
In some embodiments, between the dividing each layer of the computation graph into a plurality of task blocks and the determining the mapping relationship between each task block and a plurality of processing cores of a many-core system, the method further includes:
and invalid part of task blocks so as to train each task block and improve the redundancy performance of the calculation graph.
In some embodiments, the invalidating the partial task blocks to train each task block includes at least one of:
randomly invalidating a portion of the task blocks to train each task block;
determining key task blocks comprising key tasks, and invalidating the key task blocks to train each task block.
In some embodiments, any two task blocks of any one layer are mapped into two different processing cores according to the mapping relationship.
In some embodiments, after the determining the mapping relationship between each task block and the plurality of processing cores of the many-core system, the method further includes:
and processing all the task blocks mapped in the core by an invalid part so as to train each task block and improve the redundancy performance of the calculation graph.
In some embodiments, the invalidating all the task blocks mapped into some of the processing cores to train the task blocks includes at least one of:
randomly invalidating all the task blocks mapped into some of the processing cores, to train the task blocks;
invalidating, in turn, all the task blocks mapped into each processing core, to train the task blocks;
determining a key task block including a key task, and invalidating all the task blocks mapped into the processing core to which the key task block is mapped, to train the task blocks.
In some embodiments, the computational graph is a trainable computational graph; the trainable computational graph can solve the same problem to be processed in cases where at least some of the tasks are different.
In some embodiments, the computational graph is a neural network.
In some embodiments, after the determining the mapping relationship between each task block and the plurality of processing cores of the many-core system, the method further includes:
mapping the task blocks into the plurality of processing cores according to the mapping relationship;
each processing core processes tasks in the task block mapped thereto.
In a second aspect, an embodiment of the present disclosure provides an apparatus for task processing, including:
the acquisition module is configured to acquire a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
a partitioning module configured to partition each layer of the computational graph into a plurality of task blocks; each task block comprises at least one task;
the mapping module is configured to determine a mapping relationship between the task blocks and a plurality of processing cores of the many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
In a third aspect, an embodiment of the present disclosure provides a many-core system, including:
a plurality of processing cores; and
a network on chip configured to exchange data among the plurality of processing cores and with the outside;
one or more of the processing cores store one or more instructions, and the instructions are executed by the one or more processing cores to enable the one or more processing cores to perform any of the above methods of task processing.
In a fourth aspect, the present disclosure provides a computer readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processing core, implements any one of the above-mentioned task processing methods.
In the embodiments of the present disclosure, all the tasks in each layer of the computation graph are mapped to at least two different processing cores for processing. Thus, when any processing core becomes invalid (e.g., due to a fault), each layer of the computation graph loses at most a part of its tasks, and no layer has all of its tasks invalidated; the computation graph as a whole can therefore still produce a processing result that is usable to some extent, which greatly improves its robustness.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
fig. 1 is a flowchart of a method for task processing according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method of task processing provided by embodiments of the present disclosure;
fig. 3 is a schematic process diagram of a computation graph in a method for task processing according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a task processing apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a many-core system according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a computer-readable medium according to an embodiment of the disclosure.
Detailed Description
To facilitate a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The actual work to be done for many problems (e.g., image processing, speech recognition) can be expressed in the form of a computation graph (also called a task graph or logic graph). That is, all the operations to be performed to solve the problem are divided into a plurality of "tasks" (or nodes), each task includes certain operations, and a certain order exists between different tasks. For example, if the operation result of one task is used in the operation of another task, the latter is said to be performed based on the result of the former; the latter is a subsequent task of the former, and the former is a previous task of the latter.
Because of the above relationships between tasks, referring to fig. 3, the computation graph may be divided into multiple "layers", each layer including multiple tasks; no task in any layer is performed based on tasks in the same layer or in subsequent layers, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers. That is, if a task in a previous layer is not completed, tasks in subsequent layers may not be performed, because the operation result of the previous-layer task may be used in the operations of the subsequent-layer tasks; conversely, if a task in a subsequent layer cannot be performed, the previous layers are not affected, because the operations of previous-layer tasks never use the operation results of subsequent-layer tasks. Tasks in the same layer have no such relationship with each other; if such a relationship existed between two tasks, they would by definition belong to two different layers.
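Purely as an informal illustration of this layer rule (not part of the patent text), the structure can be captured in a short Python sketch; the names Task, ComputationGraph and check_layering are hypothetical and introduced only for this example:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """One operation (node) of the computation graph."""
    name: str
    # Names of tasks whose results this task consumes; by the layer
    # rule these must all lie in previous layers.
    inputs: List[str] = field(default_factory=list)

@dataclass
class ComputationGraph:
    # layers[i] holds the tasks of layer i, in sequence.
    layers: List[List[Task]]

def check_layering(graph: ComputationGraph) -> bool:
    """True iff no task depends on its own layer or a later layer."""
    done = set()  # names of tasks in layers already passed
    for layer in graph.layers:
        current = {t.name for t in layer}
        for t in layer:
            for dep in t.inputs:
                if dep in current or dep not in done:
                    return False  # same-layer or forward dependency
        done |= current
    return True
```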
For example, in fig. 3, the tasks of different layers (layer 0 to layer 3) are represented by boxes with different fillings, the horizontal size of each filled box represents the number of tasks it contains, and the different processing cores (processing core 0 to processing core 3) are represented by blank boxes.
For example, a "Neural Network (NN)" is one form of computation graph. A neural network is divided into a plurality of layers, each layer includes a plurality of nodes, each node performs certain operations, and nodes of different layers are connected in certain relationships (for example, the output of one node serves as the input of a node in the next layer). Thus, each layer of the neural network may be regarded as one layer of the computation graph, and each node of the neural network may be regarded as one task of the computation graph.
Illustratively, the neural network in the embodiments of the present disclosure may be used for image processing, speech recognition, and the like, and may specifically take the form of a Convolutional Neural Network (CNN), a Spiking Neural Network (SNN), a Recurrent Neural Network (RNN), or the like.
For example, some problems may correspond to a plurality of different computation graphs. That is, the number of tasks in the computation graph, the layers in which the tasks are located, the relationships between the tasks, the specific operation of each task, and the like may differ, yet each of these different computation graphs can solve the problem (though not necessarily equally well).
A computation graph that can take many such forms is referred to herein as a "trainable computation graph". That is, the tasks of a trainable computation graph can be adjusted by training, and the trained computation graph solves the problem with a different effect.
For example, a neural network is a form of trainable computation graph. A neural network that handles a problem (e.g., image classification) is usually trained by adjusting its nodes (e.g., adjusting node weights) according to how well the current neural network solves the problem (e.g., image-classification accuracy), thereby changing the neural network (the computation graph) and improving its effect on the problem (e.g., improving classification accuracy).
In some related techniques, when a problem is to be handled by a many-core system, the tasks of each layer of its corresponding computation graph may be mapped (allocated) to one processing core, with different layers mapped to different processing cores.
However, in this manner, once a certain processing core of the many-core system becomes invalid (e.g., due to a fault), none of the tasks of the corresponding layer of the computation graph can be processed, so none of the tasks after that layer can actually be performed. The problem as a whole then cannot be solved at all (no processing result can be obtained), and the robustness of the system is poor.
In a first aspect, referring to fig. 1 to 3, an embodiment of the present disclosure provides a method for task processing.
The method of task processing of the embodiments of the present disclosure is based on a many-core system and covers how to map the tasks of a computation graph into the processing cores of the many-core system.
Referring to fig. 1, a method for task processing according to an embodiment of the present disclosure includes:
and S101, acquiring a calculation chart of the problem to be processed.
The calculation graph comprises a plurality of layers which are arranged in sequence, each layer comprises a plurality of tasks, the tasks in any layer are not performed based on the results of the tasks in the layer or the subsequent layer, and at least part of the tasks in at least part of the layers are performed based on the results of the tasks in the previous layer.
When a many-core system is used for processing a problem to be processed (such as image processing, voice recognition and the like), a corresponding calculation map is obtained. Wherein, a preset calculation chart can be obtained; the calculation graph can also be generated according to a preset rule according to a specific problem to be processed.
S103, dividing each layer of the computation graph into a plurality of task blocks.
Wherein each task block comprises at least one task.
Referring to FIG. 3, the tasks in each layer of the computational graph are divided into multiple "groups," i.e., each layer is divided into multiple "task blocks," each task block including one or more tasks in that layer.
The number of task blocks into which each layer is divided may be preset; for example, each layer may be preset to be divided into a task blocks (for a preset number a).
Alternatively, the division may be performed in a certain manner, with the number of task blocks being whatever that manner actually yields.
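For concreteness, here is a minimal sketch of dividing one layer into a preset number of task blocks (the preset count a above); divide_layer is a hypothetical helper, and it assumes the layer has at least as many tasks as blocks:

```python
def divide_layer(tasks, num_blocks):
    """Divide one layer's task list into num_blocks task blocks of
    (nearly) equal size; every block holds at least one task."""
    if not 1 <= num_blocks <= len(tasks):
        raise ValueError("need 1 <= num_blocks <= number of tasks")
    q, r = divmod(len(tasks), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = q + (1 if i < r else 0)  # spread the remainder
        blocks.append(tasks[start:start + size])
        start += size
    return blocks

# e.g. divide_layer(list("abcdefg"), 3) -> [['a','b','c'], ['d','e'], ['f','g']]
```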
S105, determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system.
According to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
Referring to fig. 3, after the plurality of task blocks are obtained, it may be determined into which processing core of the many-core system each task block should be mapped; that is, the mapping relationship is determined (the task blocks are not necessarily actually mapped at this point).
The mapping relationship of the embodiments of the present disclosure is a "dispersed" mapping: all the task blocks in the same layer should be mapped into different processing cores as far as possible, and at the very least are guaranteed not to be all mapped into the same processing core. Since each task block contains tasks of a single layer, and the task blocks of the same layer are spread across different processing cores, the above mapping relationship ensures that all the tasks of one layer are mapped into at least two different processing cores.
In the embodiments of the present disclosure, all the tasks in each layer of the computation graph are mapped to at least two different processing cores for processing. Thus, when any processing core becomes invalid (e.g., due to a fault), each layer of the computation graph loses at most a part of its tasks, and no layer has all of its tasks invalidated; the computation graph as a whole can therefore still produce a processing result that is usable to some extent, which greatly improves its robustness.
In some embodiments, the computational graph is a trainable computational graph.
Wherein the trainable computational graph is capable of solving the same problem to be processed in situations where at least some of the tasks are different.
In some embodiments, the computational graph is a Neural Network (NN).
In one mode of the embodiments of the present disclosure, the above computation graph is a trainable computation graph, and further a neural network, such as a Convolutional Neural Network (CNN), a Spiking Neural Network (SNN), or a Recurrent Neural Network (RNN); the redundancy performance of such a computation graph can be further improved by training it.
Of course, embodiments of the present disclosure are not limited to trainable computational graphs (e.g., neural networks) and may be used with other computational graphs.
In some embodiments, any two task blocks of any one layer are mapped into two different processing cores, respectively, according to the mapping relationship.
Further, referring to FIG. 3, a maximally dispersed ("mutually exclusive") mapping may be performed, i.e., it is guaranteed that no two task blocks from the same layer are mapped to the same processing core.
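One simple scheme satisfying this "mutually exclusive" constraint is sketched below, under the assumption that no layer has more task blocks than there are cores; the staggered round-robin resembles the pattern of fig. 3, and map_blocks_to_cores is a hypothetical name:

```python
def map_blocks_to_cores(blocks_per_layer, num_cores):
    """Return mapping[(layer, block_index)] = core_index such that no
    two blocks of the same layer share a core, by staggered
    round-robin: block j of layer i goes to core (i + j) % num_cores."""
    mapping = {}
    for i, blocks in enumerate(blocks_per_layer):
        if len(blocks) > num_cores:
            raise ValueError("layer %d has more blocks than cores" % i)
        for j in range(len(blocks)):
            mapping[(i, j)] = (i + j) % num_cores
    return mapping
```

The stagger spreads each layer across different cores while still giving every core blocks from several layers, matching the requirement that a plurality of task blocks be mapped into each processing core.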
In some embodiments, referring to fig. 2 and 3, between the acquiring a computation graph of the problem to be processed (S101) and the dividing each layer of the computation graph into a plurality of task blocks (S103), the method further includes:
S102, training the computation graph to improve the redundancy performance of the computation graph.
Before the computation graph is divided into blocks, it may be trained to improve its redundancy performance.
In some embodiments, the training of the computation graph (S102) includes at least one of the following:
(1) invalidating some of the tasks in the computation graph to train the computation graph.
In one mode of the embodiments of the present disclosure, a Dropout-style manner may be adopted: some tasks in the computation graph are invalidated (e.g., the weights of some nodes of a neural network are set to 0) and the other tasks are adjusted, so that the computation graph can still produce results that are usable to some extent when those tasks are invalid, thereby improving its robustness (a toy sketch of such invalidation follows the notes below).
(2) invalidating a region of the computation graph to train the computation graph.
The region includes a plurality of tasks.
In one mode of the embodiments of the present disclosure, a DropBlock-style manner may be used: all the tasks in a region of the computation graph (which may cover a part of one layer, or corresponding parts of several layers) are invalidated, and the other tasks are adjusted, so that the computation graph can still produce results that are usable to some extent when the tasks in that region are invalid, thereby improving its robustness.
A "region" in this manner typically contains a relatively large number of tasks mapped to the same processing core, so region-level invalidation improves the redundancy performance of the computation graph more effectively.
(3) training the computation graph by adversarial-sample defense.
In one mode of the embodiments of the present disclosure, the computation graph may be trained in an "adversarial-sample defense" manner.
In the embodiment of the present disclosure, all computation graphs to be trained are necessarily trainable computation graphs (e.g., neural networks).
In the embodiment of the present disclosure, each training may be performed only once, or may be performed repeatedly.
In the embodiment of the present disclosure, when performing multiple training, the specific manner adopted by each training may be the same or different.
In the embodiments of the present disclosure, when training is performed multiple times, the training may end when a preset end criterion is reached; the end criterion may include convergence of the computation graph, reaching a predetermined number of training iterations, reaching a predetermined redundancy performance, and the like.
All training in the embodiments of the present disclosure may meet the above requirements and will not be described in detail later.
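As the toy sketch promised above — not the disclosure's prescribed algorithm; in a real neural network this role is played by activation-level dropout such as torch.nn.Dropout — one training step could mask out a random fraction of tasks, with sample_invalid_tasks, graph, num_steps and train_step all hypothetical names:

```python
import random

def sample_invalid_tasks(layers, fraction, rng):
    """Pick a random subset of task names to invalidate for one
    training step; the surviving tasks are then adjusted so the
    graph still yields a usable result without the masked ones."""
    all_tasks = [t.name for layer in layers for t in layer]
    k = int(fraction * len(all_tasks))
    return set(rng.sample(all_tasks, k))

rng = random.Random(0)
# e.g. invalidate 10% of the tasks each step (using the Task sketch
# above), stopping on a preset end criterion:
# for step in range(num_steps):
#     masked = sample_invalid_tasks(graph.layers, 0.1, rng)
#     train_step(masked)   # hypothetical: adjust the unmasked tasks
```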
In some embodiments, referring to fig. 2 and 3, the dividing each layer of the computation graph into a plurality of task blocks (S103) includes:
S1031, expanding the computation graph, and dividing each layer of the expanded computation graph into a plurality of task blocks.
The expanding includes adding redundant tasks in at least some layers of the computation graph.
In one mode of the embodiments of the present disclosure, the computation graph may be "expanded" ("anti-compressed"): tasks that are not originally present (redundant tasks) are added to the layers of the computation graph, and the expanded computation graph is then divided into blocks, so that at least some of the resulting task blocks contain the above "redundant tasks", improving the redundancy performance.
How much each layer is expanded may be predetermined. For example, a redundancy coefficient b may be set, denoting the ratio of the computation amount of the added tasks to that of the original tasks: if b = 0, no expansion is performed; if b = 1, the computation amount of the added tasks equals that of the original tasks. In general, b may be greater than 0 and less than or equal to 1 (b greater than 1 is also possible, e.g., a task may be backed up in "multiple copies").
Alternatively, each layer may be expanded according to a certain method, with the actual amount of tasks that the method yields as the criterion.
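A sketch of expansion under the redundancy coefficient b, assuming each task represents one unit of computation so that "computation amount" reduces to task count; expand_layer and make_redundant are hypothetical names:

```python
def expand_layer(tasks, b, make_redundant):
    """Add redundant tasks worth roughly b times the original
    computation amount (here: task count). make_redundant(task)
    builds the redundant counterpart -- a backup copy, an empty
    task, or an invalid task. b = 0 adds nothing; b > 1 yields
    multiple backup copies of some tasks."""
    if b < 0:
        raise ValueError("redundancy coefficient b must be >= 0")
    n_extra = round(b * len(tasks))
    # Cycle through the layer to pick donors for the extra tasks.
    donors = [tasks[i % len(tasks)] for i in range(n_extra)]
    return list(tasks) + [make_redundant(t) for t in donors]
```

For example, with the Task sketch above, expand_layer(layer, 0.5, lambda t: Task(t.name + "_bak", list(t.inputs))) backs up half of the layer's computation.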
In some embodiments, the redundant tasks include at least one of the following:
(1) a backup task.
The backup task is identical to a task in the corresponding layer.
In one mode of the embodiments of the present disclosure, each added task may be a copy of an original task in the corresponding layer, so that the task actually exists in "multiple copies" that back each other up. When one copy of the task is not completed (e.g., a processing core fails), the subsequent tasks can use the operation result of a backup copy, improving robustness.
Of course, a "backup task" and its "original task" should typically be divided into different task blocks, and those task blocks should be mapped into different processing cores.
(2) an empty task.
In one mode of the embodiments of the present disclosure, empty tasks that perform no actual operations (no-ops) may be added during expansion.
(3) an invalid task.
In one mode of the embodiments of the present disclosure, tasks that do perform operations but are not needed by the original computation graph, i.e., invalid tasks, may be added during expansion.
The invalid tasks may be generated randomly, or by other specific anti-compression techniques.
The expansion modes of different layers may be the same or different.
In some embodiments, the dividing each layer of the computation graph into a plurality of task blocks (S103) includes any one of the following:
(1) randomly dividing each layer of the computation graph into a plurality of task blocks.
In one mode of the embodiments of the present disclosure, the tasks in each layer may be divided into blocks at random, i.e., both the number of tasks in each task block and the specific tasks it contains are random (a minimal sketch of this mode appears after this list).
(2) uniformly dividing each layer of the computation graph into a plurality of task blocks.
In one mode of the embodiments of the present disclosure, each layer may be divided uniformly into a plurality of task blocks, i.e., the numbers of tasks in the task blocks of the same layer are equal or substantially equal (for example, taking the task count of the smallest task block in a layer as 100%, the task count of the largest task block in that layer does not exceed 110%).
(3) dividing each layer of the computation graph into a plurality of pre-task blocks, and merging all the pre-task blocks that are mapped into one processing core according to the mapping relationship into a single task block.
In one mode of the embodiments of the present disclosure, the subsequent "mapping" may be taken into account when dividing: each layer is first divided into a plurality of "pre-task blocks", and if several pre-task blocks are mapped into the same processing core, they can simply be merged into one task block.
(4) dividing each layer of the computation graph into a plurality of task blocks based at least on the hardware resources of the processing cores.
In one mode of the embodiments of the present disclosure, the actual hardware resources of the processing cores (such as caches) and the subsequent "mapping" may also be considered when dividing, so that which tasks go into each task block is decided according to the hardware resources of the processing core into which that task block is to be mapped.
The hardware resources of different processing cores may be the same or different.
The partitioning modes of different layers may be the same or different.
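The random-division sketch promised above — divide_layer_randomly is a hypothetical name, and it again assumes at least as many tasks as blocks so that no block is empty:

```python
import random

def divide_layer_randomly(tasks, num_blocks, seed=0):
    """Mode (1): random division -- both the block sizes and the
    block membership are random, but every block is non-empty."""
    rng = random.Random(seed)
    tasks = list(tasks)
    rng.shuffle(tasks)
    # num_blocks - 1 distinct cut points guarantee non-empty blocks.
    cuts = sorted(rng.sample(range(1, len(tasks)), num_blocks - 1))
    bounds = [0, *cuts, len(tasks)]
    return [tasks[a:b] for a, b in zip(bounds, bounds[1:])]
```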
In some embodiments, referring to fig. 2 and 3, between the dividing each layer of the computation graph into a plurality of task blocks (S103) and the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system (S105), the method further includes:
S104, invalidating some of the task blocks, to train the task blocks and improve the redundancy performance of the computation graph.
After the division into blocks, one or more (but not all) task blocks may be invalidated (i.e., all the tasks in those blocks are invalidated), and the tasks in the other task blocks adjusted, so that the remaining task blocks can still produce results that are usable to some extent (i.e., the task blocks are trained), improving robustness.
Of course, the above training is also essentially the training of the "computation graph".
This training is performed after the division into blocks (and hence after any expansion, i.e., anti-compression), which is equivalent to improving robustness at the "block level".
In some embodiments, the invalidating some of the task blocks to train the task blocks (S104) includes at least one of the following (see the sketch after this list):
(1) randomly invalidating some of the task blocks to train the task blocks.
In one mode of the embodiments of the present disclosure, some task blocks may be invalidated at random to train the remaining task blocks.
(2) determining a key task block including a key task, and invalidating the key task block to train the task blocks.
In one mode of the embodiments of the present disclosure, a "key task" that plays a key role may be determined from the structural features of the computation graph, and the key task block containing that key task invalidated, to train the task blocks.
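The disclosure does not fix how a "key task" is identified; purely as one hypothetical heuristic, blocks could be ranked by the largest fan-out (number of consumers) of any task they contain, using the Task sketch above:

```python
def key_task_blocks(blocks, layers, top_k=1):
    """Hypothetical heuristic: a task is 'key' if many later tasks
    consume its result, so score each block by its largest fan-out."""
    fanout = {}
    for layer in layers:
        for t in layer:
            for dep in t.inputs:
                fanout[dep] = fanout.get(dep, 0) + 1
    score = lambda block: max((fanout.get(t.name, 0) for t in block), default=0)
    return sorted(blocks, key=score, reverse=True)[:top_k]
```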
In some embodiments, referring to fig. 2 and 3, after the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system (S105), the method further includes:
S106, invalidating all the task blocks mapped into some of the processing cores, to train the task blocks and improve the redundancy performance of the computation graph.
After the mapping relationship is determined (including after the actual mapping is performed), all the tasks mapped into some of the processing cores can be invalidated (which is equivalent to invalidating those processing cores), and the other tasks adjusted, so that the remaining tasks can still produce results that are usable to some extent (i.e., the task blocks are trained), improving robustness.
Of course, the above training is also essentially the training of the "computation graph".
This training is equivalent to simulating the situation in which some processing cores become invalid due to faults and the like, and thus improves robustness at the "core level" from the perspective of final practical application.
In some embodiments, the invalidating all the task blocks mapped into some of the processing cores to train the task blocks (S106) includes at least one of the following (a sketch of the second mode appears after this list):
(1) randomly invalidating all the task blocks mapped into some of the processing cores, to train the task blocks.
In one mode of the embodiments of the present disclosure, all the tasks mapped into one or more (but not all) processing cores may be invalidated (i.e., one or more processing cores are invalidated) to train the task blocks.
(2) invalidating, in turn, all the task blocks mapped into each processing core, to train the task blocks.
In one mode of the embodiments of the present disclosure, the processing cores may be invalidated in sequence: only one processing core is invalidated at a time, but every processing core is invalidated once in turn.
(3) determining a key task block including a key task, and invalidating all the task blocks mapped into the processing core to which the key task block is mapped, to train the task blocks.
In one mode of the embodiments of the present disclosure, a key task may be determined from the structure of the computation graph, the key task block containing it identified, and the processing core corresponding to that key task block invalidated, to train the task blocks.
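Mode (2) can be sketched on top of the earlier mapping helper; train_with_core_faults and train_step are hypothetical names, with train_step standing in for whatever adjusts the surviving tasks:

```python
def train_with_core_faults(mapping, num_cores, train_step):
    """Invalidate one processing core at a time, so that every core
    is simulated as failed exactly once. For core c, all task blocks
    mapped to c are disabled for that training step."""
    for c in range(num_cores):
        disabled = {lb for lb, core in mapping.items() if core == c}
        train_step(disabled)  # adjust tasks outside the disabled blocks
```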
In some embodiments, referring to fig. 2, after determining the mapping relationship between each task block and the plurality of processing cores of the many-core system (S105), the method further includes:
and S107, mapping each task block to a plurality of processing cores according to the mapping relation.
And S108, each processing core processes the tasks in the task block mapped to the processing core.
After the mapping relationship is determined, the task blocks can be mapped (distributed) into the processing cores according to it, and the processing cores perform the processing, realizing the actual function of the computation graph and solving the above problem to be processed.
Of course, the step of determining the mapping relationship (S105) and the step of mapping according to it (S107) may in practice be integrated, i.e., the mapping may be performed directly.
Of course, when the training of step S106 is to be performed, referring to fig. 2, step S106 may be performed before step S107; that is, the tasks corresponding to a processing core may be invalidated for training purely according to the mapping relationship, without the tasks actually having been mapped into the processing cores.
Alternatively, step S106 may be performed after step S107, i.e., the tasks are actually mapped into the processing cores, and processing cores are actually disabled for training.
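Tying the hypothetical helpers from the earlier sketches together, here is a toy walk through S101 to S105 (the actual dispatch of S107/S108 is hardware-specific and omitted):

```python
t = lambda name, *deps: Task(name, list(deps))

# S101: acquire a (tiny) layered computation graph.
graph = ComputationGraph(layers=[
    [t("a0"), t("a1"), t("a2"), t("a3")],
    [t("b0", "a0", "a1"), t("b1", "a2", "a3")],
])
assert check_layering(graph)

# S103: divide every layer into task blocks.
blocks_per_layer = [divide_layer(layer, 2) for layer in graph.layers]

# S105: determine the mapping relationship.
mapping = map_blocks_to_cores(blocks_per_layer, num_cores=4)

# Blocks of the same layer land on different cores:
assert mapping[(0, 0)] != mapping[(0, 1)]
```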
In a second aspect, referring to fig. 4, an embodiment of the present disclosure provides a task processing apparatus 600, including:
an obtaining module 601 configured to obtain a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
a partitioning module 602 configured to partition each layer of the computational graph into a plurality of task blocks; each task block comprises at least one task;
a mapping module 603 configured to determine a mapping relationship between the task blocks and a plurality of processing cores of the many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
The task processing device 600 of the embodiment of the present disclosure may implement the method of task processing described above.
It should be understood that when other steps are included in the above-mentioned method for task processing, other modules for implementing the corresponding steps may also be included in the device 600 for task processing.
In a third aspect, referring to fig. 5, an embodiment of the disclosure provides a many-core system 700, comprising:
a plurality of processing cores 701; and
a network on chip 702 configured to exchange data among the plurality of processing cores 701 and with the outside;
one or more instructions are stored in one or more of the processing cores 701, and the instructions are executed by the one or more processing cores 701 to enable the one or more processing cores 701 to perform any of the above methods of task processing.
The many-core system 700 of the embodiment of the present disclosure may implement the above-mentioned task processing method, including actually executing the tasks in the computation graph to obtain the processing result of the problem to be processed.
In a fourth aspect, referring to fig. 6, the present disclosure provides a computer readable medium 800 on which a computer program is stored, wherein the computer program, when executed by a processing core, implements any one of the above-mentioned methods of task processing.
The computer-readable medium 800 of the embodiments of the present disclosure stores a computer program that, when executed by a processing core, implements the above method of task processing.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (10)

1. A method of task processing, comprising:
acquiring a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
dividing each layer of the computation graph into a plurality of task blocks; each task block comprises at least one task;
determining a mapping relationship between the task blocks and a plurality of processing cores of a many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
2. The method of task processing according to claim 1, wherein at least one of the following is also satisfied:
between the acquiring a computation graph of the problem to be processed and the dividing each layer of the computation graph into a plurality of task blocks, the method further comprises: training the computation graph to improve the redundancy performance of the computation graph;
between the dividing each layer of the computation graph into a plurality of task blocks and the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system, the method further comprises: invalidating some of the task blocks, to train the task blocks and improve the redundancy performance of the computation graph;
after the determining a mapping relationship between the task blocks and a plurality of processing cores of the many-core system, the method further comprises: invalidating all the task blocks mapped into some of the processing cores, to train the task blocks and improve the redundancy performance of the computation graph.
3. The method of task processing according to claim 1, wherein said dividing each layer of the computation graph into a plurality of task blocks comprises:
expanding the computation graph, and dividing each layer of the expanded computation graph into a plurality of task blocks; the expanding includes adding redundant tasks in at least some layers of the computation graph.
4. The method of task processing according to claim 1,
wherein any two task blocks of any one layer are mapped into two different processing cores according to the mapping relationship.
5. The method of task processing according to any one of claims 1 to 4,
the computation graph is a trainable computation graph; the trainable computational graph can solve the same problem to be processed in cases where at least some of the tasks are different.
6. The method of task processing according to any one of claims 1 to 4,
the computational graph is a neural network.
7. The method for task processing according to any one of claims 1 to 4, wherein after the determining the mapping relationship between each task block and the plurality of processing cores of the many-core system, the method further comprises:
mapping the task blocks into the plurality of processing cores according to the mapping relationship;
each processing core processes tasks in the task block mapped thereto.
8. An apparatus for task processing, comprising:
the acquisition module is configured to acquire a computation graph of a problem to be processed; the computation graph includes a plurality of layers arranged in sequence, each layer includes a plurality of tasks, no task in any layer is performed based on the result of a task in the same layer or in a subsequent layer, and at least some tasks in at least some layers are performed based on the results of tasks in previous layers;
a partitioning module configured to partition each layer of the computational graph into a plurality of task blocks; each task block comprises at least one task;
the mapping module is configured to determine a mapping relationship between the task blocks and a plurality of processing cores of the many-core system; according to the mapping relationship, each task block is mapped into one processing core, each processing core has a plurality of task blocks mapped into it, and all the task blocks of any one layer are mapped into at least two different processing cores.
9. A many-core system, comprising:
a plurality of processing cores; and
a network on chip configured to exchange data among the plurality of processing cores and with the outside;
one or more of the processing cores store one or more instructions, and the one or more instructions are executed by the one or more processing cores to enable the one or more processing cores to perform the method of task processing according to any one of claims 1 to 7.
10. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processing core, carries out the method of task processing according to any one of claims 1 to 7.
CN202110184918.6A 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium Pending CN112835718A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110184918.6A CN112835718A (en) 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium
PCT/CN2022/074490 WO2022171002A1 (en) 2021-02-10 2022-01-28 Task processing method and apparatus, many-core system, and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110184918.6A CN112835718A (en) 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium

Publications (1)

Publication Number Publication Date
CN112835718A true CN112835718A (en) 2021-05-25

Family

ID=75933596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110184918.6A Pending CN112835718A (en) 2021-02-10 2021-02-10 Method and device for processing task, many-core system and computer readable medium

Country Status (1)

Country Link
CN (1) CN112835718A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160182A1 (en) * 2018-05-31 2020-05-21 Neuralmagic Inc. System and method of executing neural networks
CN111124626A (en) * 2018-11-01 2020-05-08 北京灵汐科技有限公司 Many-core system and data processing method and processing device thereof
CN111723900A (en) * 2019-03-18 2020-09-29 北京灵汐科技有限公司 Mapping method of neural network based on many-core processor and computing device
CN112348828A (en) * 2020-10-27 2021-02-09 浙江大华技术股份有限公司 Example segmentation method and device based on neural network and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马晓慧 (MA Xiaohui); 陈娟 (CHEN Juan): "A parallel scheduling algorithm based on data partitioning and task mapping", Modern Computer (Professional Edition) (现代计算机(专业版)), no. 14 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022171002A1 (en) * 2021-02-10 2022-08-18 北京灵汐科技有限公司 Task processing method and apparatus, many-core system, and computer-readable medium
CN115098262A (en) * 2022-06-27 2022-09-23 清华大学 Multi-neural-network task processing method and device
CN115098262B (en) * 2022-06-27 2024-04-23 清华大学 Multi-neural network task processing method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination