CN114968594B - Task processing method, device, electronic equipment and storage medium


Publication number
CN114968594B
Authority
CN
China
Prior art keywords
processing
target
component
target processing
mapping
Prior art date
Legal status
Active
Application number
CN202210667202.6A
Other languages
Chinese (zh)
Other versions
CN114968594A (en)
Inventor
赵蓉
张伟豪
马松辰
施路平
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210667202.6A
Publication of CN114968594A
Application granted
Publication of CN114968594B

Classifications

    • G06F 9/5016: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being hardware other than CPUs, servers and terminals, namely the memory
    • G06F 9/5061: Allocation of resources; partitioning or combining of resources
    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to a task processing method and device, electronic equipment, and a storage medium. The method includes: acquiring an intermediate representation model of a task to be processed; grouping the processing nodes to obtain processing node groups; determining target processing components and a first mapping mode according to the performance parameters of the processing components, the resource requirements of the processing node groups, and the connection relations among the processing nodes; determining target processing cores and a second mapping mode according to the performance parameters of the processing cores, the resource requirements of the processing nodes, and the connection relations among the processing nodes; and processing the task to be processed according to the first mapping mode, the second mapping mode, and the target processing cores to obtain a processing result. According to the task processing method of the embodiments of the disclosure, the nodes of a task to be processed can be grouped and the mapping relations and processing timing determined, realizing mapping at every level of processing resources and improving task processing efficiency.

Description

Task processing method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a task processing method, a task processing device, electronic equipment and a storage medium.
Background
A many-core chip is a chip that includes multiple processing cores, each of which can independently execute different instructions and process different data. Because of their high parallelism and high scalability, many-core chips have great potential for accelerating neural network model processing tasks, and an increasing number of dedicated many-core neural network acceleration chips are being designed. This flexibility, however, also means that for the same task there are many more ways to deploy the task's nodes across the processing cores, i.e., many more possible deployment schemes. This creates a huge optimization space for deploying neural networks on many-core chips, and it also makes deployment more difficult.
Mapping is a key step in the deployment process: it specifies what processing each processing core performs at each moment. More specifically, it specifies which instruction each core executes at each time, which data the core stores, sends, operates on, or receives, and which operation it performs.
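As a minimal illustration of this notion (a sketch with assumed core numbers and action names, not a data structure taken from the patent), a mapping can be written as a table from (core, time step) to the action that core performs:

```python
from typing import Dict, Tuple

Action = str  # e.g. "store", "send", "multiply-accumulate"

# Hypothetical mapping: what each core does at each time step.
mapping: Dict[Tuple[int, int], Action] = {
    (0, 0): "receive input data",
    (0, 1): "multiply-accumulate layer 1",
    (1, 1): "store weights for layer 2",
    (0, 2): "send layer-1 output to core 1",
    (1, 3): "multiply-accumulate layer 2",
}

# Print the schedule in time order.
for (core, t), action in sorted(mapping.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"time {t}, core {core}: {action}")
```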
The neural network mapping schemes in the related art are mostly aimed at neural network accelerators built around a PE (processing element) array or a MAC (multiply-accumulate unit) array. They cover the execution order of operators on the PE array (temporal mapping), the parallel unrolling of operators across the PE array (spatial mapping), and the transfer and buffering of inputs, weights, intermediate data, and the like on the PE array. A typical technique in this area is for-loop-based scheduling optimization.
However, the solutions in the related art are not fully applicable to many-core chips. A many-core chip that performs neural network model processing tasks is composed of multiple processing cores, and each processing core contains a massively parallel computing unit built around a PE array (or MAC array). Such chips can further form chip arrays, yielding a hierarchy of processing resources: chip array, chip, processing core, PE array. The related-art approaches are applicable to mapping within a processing core but not to mapping tasks onto higher-level processing resources, so algorithm flows designed for a specific hardware structure or a specific mapping scheme generalize poorly and do not iterate across levels.
Disclosure of Invention
The disclosure provides a task processing method, a task processing device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a task processing method, including: obtaining an intermediate representation model of a task to be processed, where the intermediate representation model includes processing nodes of the task to be processed and connection relations between the processing nodes; grouping the processing nodes in the intermediate representation model to obtain processing node groups; determining target processing components for processing the processing node groups according to performance parameters of each processing component in the processing resources, resource requirements of the processing node groups, and the connection relations between the processing nodes, and determining a first mapping mode of the target processing components, where the first mapping mode represents the specific correspondence between the target processing components and the processing node groups and the working timing of the target processing components; determining target processing cores in the target processing components according to performance parameters of the processing cores in the target processing components, resource requirements of the processing nodes in the processing node groups, and the connection relations between the processing nodes, and determining a second mapping mode of the target processing cores, where the target processing cores are used to process the processing nodes in the processing node groups, and the second mapping mode represents the specific correspondence between the target processing cores and the processing nodes and the working timing of the target processing cores; and processing the task to be processed according to the first mapping mode, the second mapping mode, and the target processing cores to obtain a processing result.
In one possible implementation, grouping the processing nodes in the intermediate representation model to obtain processing node groups includes: grouping according to at least one of the data transmission volume in the connection relations, the processing resource requirement of each processing node, the storage resource requirement of each processing node, the type of each processing node, and the number of communications between processing node groups, to obtain the processing node groups.
In one possible implementation, the task to be processed includes a task processed through at least one neural network model, and grouping the processing nodes in the intermediate representation model to obtain processing node groups includes: grouping according to the neural network model to which each processing node belongs to obtain the processing node groups, or placing each processing node in its own processing node group.
In one possible implementation, the performance parameters of a processing component include the functions and the computing power of the processing component, and determining the target processing components for processing the processing node groups according to the performance parameters of each processing component in the processing resources, the resource requirements of the processing node groups, and the connection relations between the processing nodes includes at least one of the following: determining the target processing components according to the type of each processing node in the processing node group and the functions of each processing component; determining the target processing components according to the resource requirements of the processing node group and the computing power of each processing component; determining the target processing components according to the resource requirements of the processing node group and the resource utilization of each processing component; determining the target processing components according to the resource requirements of the processing node group and the energy consumption of each processing component; and determining the target processing components according to the resource requirements of the processing node group and the processing time of each processing component.
In one possible implementation, determining the first mapping mode of the target processing components includes: determining data transmission paths between the target processing components according to the connection relations between the processing nodes and at least one of the storage resources of each processing component, the resource utilization of each processing component, and whether the processing of each processing component produces a deadlock; and determining the first mapping mode according to the data transmission paths of the target processing components.
In one possible implementation, the method further includes: adjusting at least one of the target processing components, the first mapping mode, the target processing cores, and the second mapping mode according to process parameters collected while the task to be processed is being processed, where the process parameters include at least one of the resource utilization of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time between the target processing components, the resource utilization of the target processing cores, the processing time of the target processing cores, and the data transmission time between the target processing cores.
In one possible implementation, the method further includes: adjusting the processing node groups according to process parameters collected while the task to be processed is being processed, where the process parameters include at least one of the resource utilization of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time between the target processing components, the resource utilization of the target processing cores, the processing time of the target processing cores, and the data transmission time between the target processing cores.
In one possible implementation, the method further includes: receiving a mapping instruction, wherein the mapping instruction comprises an instruction for determining the target processing component, the first mapping mode, the target processing core and the second mapping mode; and processing the task to be processed according to the mapping instruction to obtain a processing result.
According to an aspect of the present disclosure, there is provided a task processing device, including: an intermediate representation model obtaining module, configured to obtain an intermediate representation model of a task to be processed, where the intermediate representation model includes processing nodes of the task to be processed and connection relations between the processing nodes; a grouping module, configured to group the processing nodes in the intermediate representation model to obtain processing node groups; a first mapping module, configured to determine target processing components for processing the processing node groups according to performance parameters of each processing component in the processing resources, resource requirements of the processing node groups, and the connection relations between the processing nodes, and to determine a first mapping mode of the target processing components, where the first mapping mode represents the specific correspondence between the target processing components and the processing node groups and the working timing of the target processing components; a second mapping module, configured to determine target processing cores in the target processing components according to performance parameters of the processing cores in the target processing components, resource requirements of the processing nodes in the processing node groups, and the connection relations between the processing nodes, and to determine a second mapping mode of the target processing cores, where the target processing cores are configured to process the processing nodes in the processing node groups, and the second mapping mode represents the specific correspondence between the target processing cores and the processing nodes and the working timing of the target processing cores; and a processing module, configured to process the task to be processed according to the first mapping mode, the second mapping mode, and the target processing cores to obtain a processing result.
In one possible implementation, the grouping module is further configured to: group according to at least one of the data transmission volume in the connection relations, the processing resource requirement of each processing node, the storage resource requirement of each processing node, the type of each processing node, and the number of communications between processing node groups, to obtain the processing node groups.
In one possible implementation, the task to be processed includes a task processed through at least one neural network model, and the grouping module is further configured to: group according to the neural network model to which each processing node belongs to obtain the processing node groups, or place each processing node in its own processing node group.
In one possible implementation, the performance parameters of a processing component include the functions and the computing power of the processing component, and the first mapping module is further configured to perform at least one of: determining the target processing components according to the type of each processing node in the processing node group and the functions of each processing component; determining the target processing components according to the resource requirements of the processing node group and the computing power of each processing component; determining the target processing components according to the resource requirements of the processing node group and the resource utilization of each processing component; determining the target processing components according to the resource requirements of the processing node group and the energy consumption of each processing component; and determining the target processing components according to the resource requirements of the processing node group and the processing time of each processing component.
In one possible implementation, the first mapping module is further configured to: determine data transmission paths between the target processing components according to the connection relations between the processing nodes and at least one of the storage resources of each processing component, the resource utilization of each processing component, and whether the processing of each processing component produces a deadlock; and determine the first mapping mode according to the data transmission paths of the target processing components.
In one possible implementation, the device further includes: a first adjusting module, configured to adjust at least one of the target processing components, the first mapping mode, the target processing cores, and the second mapping mode according to process parameters collected while the task to be processed is being processed, where the process parameters include at least one of the resource utilization of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time between the target processing components, the resource utilization of the target processing cores, the processing time of the target processing cores, and the data transmission time between the target processing cores.
In one possible implementation, the device further includes: a second adjusting module, configured to adjust the processing node groups according to process parameters collected while the task to be processed is being processed, where the process parameters include at least one of the resource utilization of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time between the target processing components, the resource utilization of the target processing cores, the processing time of the target processing cores, and the data transmission time between the target processing cores.
In one possible implementation, the device further includes: an input module, configured to receive mapping instructions, where the mapping instructions include instructions for determining the target processing components, the first mapping mode, the target processing cores, and the second mapping mode, and the task to be processed is processed according to the mapping instructions to obtain a processing result.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
According to the task processing method of the embodiments of the present disclosure, the nodes of a task to be processed can be grouped, and the target processing components and first mapping mode of the processing node groups can be determined, so that the mapping of the task to be processed extends to processing resources at levels above the processing cores; the target processing cores and second mapping mode of the processing nodes can also be determined, realizing mapping at every level of the processing resources, making the task processing flow more general and its execution more efficient.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 illustrates a flow chart of a task processing method according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a processing resource according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of an intermediate representation model according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a grouping according to an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of space-time units according to an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a timing sequence according to an embodiment of the present disclosure;
FIG. 7 illustrates an application schematic of a task processing method according to an embodiment of the present disclosure;
FIG. 8 illustrates a block diagram of a task processing device according to an embodiment of the present disclosure;
FIG. 9 illustrates a block diagram of an electronic device, according to an embodiment of the present disclosure;
Fig. 10 shows a block diagram of an electronic device, according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Fig. 1 shows a flowchart of a task processing method according to an embodiment of the present disclosure, as shown in fig. 1, the method including:
In step S11, an intermediate representation model of a task to be processed is obtained, where the intermediate representation model includes processing nodes of the task to be processed and connection relationships between the processing nodes;
In step S12, the processing nodes in the intermediate representation model are grouped to obtain processing node groups;
In step S13, target processing components for processing the processing node groups are determined according to the performance parameters of each processing component in the processing resources, the resource requirements of the processing node groups, and the connection relations between the processing nodes, and a first mapping mode of the target processing components is determined, where the first mapping mode represents the specific correspondence between the target processing components and the processing node groups and the working timing of the target processing components;
In step S14, target processing cores in the target processing components are determined according to the performance parameters of the processing cores in the target processing components, the resource requirements of the processing nodes in the processing node groups, and the connection relations between the processing nodes, and a second mapping mode of the target processing cores is determined, where the target processing cores are used to process the processing nodes in the processing node groups, and the second mapping mode represents the specific correspondence between the target processing cores and the processing nodes and the working timing of the target processing cores;
In step S15, the task to be processed is processed according to the first mapping manner, the second mapping manner and the target processing core, so as to obtain a processing result.
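To make the flow of steps S11 to S15 concrete, the following is a minimal structural sketch; the grouping and mapping heuristics shown (one group per node, round-robin placement) are placeholder assumptions, not the strategies the disclosure actually describes:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class IRModel:                    # step S11: intermediate representation model
    nodes: List[str]              # processing nodes (e.g. network layers)
    edges: List[Tuple[str, str]]  # directed connection relations

def group_nodes(ir: IRModel) -> List[List[str]]:
    # Step S12 placeholder: one group per node (a default the text allows).
    return [[n] for n in ir.nodes]

def map_groups_to_components(groups: List[List[str]], n_components: int = 2):
    # Step S13 placeholder: round-robin node groups onto processing components.
    return {g: g % n_components for g in range(len(groups))}

def map_nodes_to_cores(groups, comp_of_group, cores_per_component: int = 4):
    # Step S14 placeholder: spread each group's nodes over its component's cores.
    core_of_node = {}
    for g, nodes in enumerate(groups):
        for j, node in enumerate(nodes):
            core_of_node[node] = (comp_of_group[g], j % cores_per_component)
    return core_of_node

ir = IRModel(nodes=["n1", "n2", "n3", "n4"],
             edges=[("n1", "n2"), ("n2", "n3"), ("n3", "n4")])
groups = group_nodes(ir)
comp_of_group = map_groups_to_components(groups)
core_of_node = map_nodes_to_cores(groups, comp_of_group)
print(core_of_node)  # step S15 would execute the task following these mappings
```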
According to the task processing method of the embodiments of the present disclosure, the nodes of a task to be processed can be grouped, and the target processing components and first mapping mode of the processing node groups can be determined, so that the mapping of the task to be processed extends to processing resources at levels above the processing cores; the target processing cores and second mapping mode of the processing nodes can also be determined, realizing mapping at every level of the processing resources, making the task processing flow more general and its execution more efficient.
In one possible implementation, the task processing method is aimed at many-core chips, including but not limited to many-core neural network acceleration chips, neuromorphic chips, brain-inspired many-core chips, many-core graphics processing units (GPUs), and many-core CPU chips with vector or matrix acceleration units. The neural network models it targets include but are not limited to artificial neural networks (ANNs), spiking neural networks (SNNs), hybrid neural networks (HNNs), dynamic neural networks, and combinations of multiple neural network models. The tasks to be processed include neural network model processing tasks, such as the forward inference process, the backward training process, and the neural architecture search process of a neural network. The method can also be applied to tasks similar to neural networks, such as multi-agent simulation, chip simulation, brain simulation, graphics, scientific computing, and parallel search algorithms.
Fig. 2 shows a schematic diagram of processing resources according to an embodiment of the present disclosure. As shown in Fig. 2, processing resources composed of many-core chips may include a chip array for performing tasks to be processed (e.g., neural network model processing tasks), that is, an array of multiple many-core chips; each many-core chip may include multiple processing cores, for example an array of processing cores, and each processing core may include a MAC array or PE array that executes specific processing instructions. When a task to be processed is executed on these processing resources, especially when a more complex neural network model is used, the mapping relationship between each processing node of the task and the processing resources at each level (such as the many-core chips or the processing cores) can be determined, so that the task is executed in a more flexible and efficient manner. In an example, the processing resources may be mapped level by level: the task nodes of the task to be processed may first be grouped, each processing node group may then be mapped to a many-core chip, and within each many-core chip the processing nodes of the group may be mapped to the chip's processing cores, so that each processing node is processed by a processing core, improving processing flexibility and parallelism and thus processing efficiency.
In one possible implementation, when a task is processed by processing resources with multiple levels as described above, the mapping relationship between the processing resources and the processing nodes of the task may be determined level by level, for example, which step of the task is executed by which processing resource. In an example, the task to be processed is a neural network model processing task in which one or more neural network models may run and data interaction may exist between the models, so the mapping between the task and the processing resources is complex.
In one possible implementation, in step S11, an intermediate representation model of the task to be processed may be obtained, where the intermediate representation model includes the processing nodes of the task and the connection relations between them. A processing node may represent a step of the task, an operator of an operation to be performed, and so on; for example, in a neural network model processing task, each network layer of the neural network model may serve as a processing node. The connection relations connect the processing nodes in the intermediate representation model and are directed: they may represent data transmission paths between processing nodes, or data dependencies. For example, in a neural network model processing task, the processing node of one network layer points to the processing node of the next layer, which indicates that the data output by that layer is transmitted to the next layer; in other words, the output data of the first node is transmitted to the second node, and the second node has a data dependency on the first.
Fig. 3 illustrates a schematic diagram of an intermediate representation model according to an embodiment of the present disclosure. As illustrated in Fig. 3, a task to be processed may include processing tasks for a neural network model 1 and a neural network model 2, where neural network model 1 may include 5 processing nodes, neural network model 2 may include 4 processing nodes, and the connection relations between the processing nodes are as shown in Fig. 3. Further, since there is data interaction between the 2nd processing node of neural network model 1 and the 2nd processing node of neural network model 2, a data dependency exists between these two nodes in addition to the data dependencies determined by the connection relations within each neural network model.
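As a sketch of how such an intermediate representation might be encoded (the chain topologies and the "model:index" node names are assumptions made for illustration; Fig. 3 defines the actual connections), the graph can be stored as a list of directed edges, with the cross-model edge carrying the dependency between the two models:

```python
# Directed edges (u, v): node v depends on data produced by node u.
edges = [
    ("m1:1", "m1:2"), ("m1:2", "m1:3"), ("m1:3", "m1:4"), ("m1:4", "m1:5"),
    ("m2:1", "m2:2"), ("m2:2", "m2:3"), ("m2:3", "m2:4"),
    ("m1:2", "m2:2"),  # data interaction between the two models
]

# Collect each node's data dependencies from the edge list.
deps = {}
for u, v in edges:
    deps.setdefault(v, []).append(u)

print(deps["m2:2"])  # ['m2:1', 'm1:2'] -- depends on nodes of both models
```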
In one possible implementation, in step S12, the processing nodes may be grouped to make the relationships between them clearer and to facilitate mapping the task to be processed onto the processing resources, for example, to make it easier to allocate a processing component (e.g., a many-core chip) to each processing node group. Various grouping criteria may be considered: closely connected processing nodes may be grouped together; processing nodes whose processing instructions are identical or similar (e.g., identical or similar algorithms) may be grouped together; processing nodes belonging to the same neural network model may be grouped together; and so on. The present disclosure does not limit the grouping criteria.
In one possible implementation, step S12 may include: grouping according to at least one of the data transmission volume in the connection relations, the processing resource requirement of each processing node, the storage resource requirement of each processing node, the type of each processing node, and the number of communications between processing node groups, to obtain the processing node groups.
In one possible implementation, the data transmission volume in the connection relations may be considered in grouping. A connection relation may represent a data dependency between processing nodes, and data is transferred between nodes that have such a dependency. After grouping, different processing node groups may be mapped to different processing components (e.g., many-core chips); if two processing nodes with a large data transmission volume between them are placed in different groups, the data transmission volume between the processing components grows, putting greater pressure on transmission bandwidth. Therefore, two or more processing nodes with a large data transmission volume in their connection relation can be placed in the same group. In an example, a data transmission threshold may be set, and two or more processing nodes whose data transmission volume is greater than or equal to the threshold may be placed in the same group.
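A minimal sketch of this threshold rule (one possible heuristic under assumed node names and traffic numbers, not the disclosure's actual algorithm) merges any nodes joined by an edge whose traffic meets the threshold, using a small union-find:

```python
def group_by_traffic(nodes, traffic, threshold):
    # Union-find parents: every node starts in its own group.
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), volume in traffic.items():
        if volume >= threshold:
            parent[find(u)] = find(v)  # keep heavy-traffic pairs together

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

nodes = ["n1", "n2", "n3", "n4"]
traffic = {("n1", "n2"): 900, ("n2", "n3"): 40, ("n3", "n4"): 700}
print(group_by_traffic(nodes, traffic, threshold=500))
# [['n1', 'n2'], ['n3', 'n4']]
```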
In one possible implementation, the processing resource requirement of each processing node may be considered in grouping. The processing resources of each processing component are limited, so the processing resources required by the nodes in each group should not exceed the processing resources the component can provide. Alternatively, to increase the processing efficiency of each component, the total processing resources required by the nodes in each group may be kept no greater than a preset threshold, which may, for example, be less than the total processing resources the component can provide. The present disclosure does not limit how the preset threshold is set.
In one possible implementation, the storage resource requirement of each processing node may be considered in grouping. During task processing, each processing node may generate data (e.g., each layer of a neural network model may produce output results) or require data (e.g., each layer may require weight data and input data), so during task execution a processing component may store such data for use during processing, or retrieve it from memory and send it to other components when needed. The storage resources of each processing component are limited, so the storage resources required by the nodes in each group should not exceed the storage resources the component can provide. The total storage resources required by the nodes in each group may likewise be kept no greater than a preset threshold, which may, for example, be less than the total storage resources the component can provide. The present disclosure does not limit how the preset threshold is set.
In one possible implementation, the type of each processing node may be considered in grouping. In an example, processing nodes of the same or similar type require the same or similar processing, so they may be placed in the same group, improving the efficiency of the processing component that processes that group.
In one possible implementation, factors such as the number of communications between processing node groups, or between processing components, may be considered in grouping. In an example, the neural network model 2 above may include 4 processing nodes, namely processing nodes 1, 2, 3, and 4, connected in a chain: node 1 points to node 2, node 2 points to node 3, and node 3 points to node 4. If nodes 1 and 2 form one group and nodes 3 and 4 form another, only node 2 needs to send data to node 3 between the two groups, so the component processing nodes 1 and 2 sends data to the component processing nodes 3 and 4 exactly once; the number of communications between components is 1. If instead nodes 1 and 4 form one group and nodes 2 and 3 form another, then node 1 sends data to node 2 and node 3 sends data to node 4 between the groups, so the component processing nodes 1 and 4 must send data to the component processing nodes 2 and 3 (node 1 to node 2), and the latter must also send data back (node 3 to node 4); the number of communications between components is 2. Thus, to reduce the communication pressure between processing components, nodes 1 and 2 may be grouped together and nodes 3 and 4 grouped together. This grouping is merely an example, and the present disclosure does not limit the specific grouping.
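The following sketch checks the counts in this example by counting inter-group edges for the two candidate partitions (node names assumed):

```python
def inter_group_transfers(edges, groups):
    # Map each node to its group index, then count edges that cross groups.
    group_of = {n: i for i, g in enumerate(groups) for n in g}
    return sum(1 for u, v in edges if group_of[u] != group_of[v])

edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]
print(inter_group_transfers(edges, [["n1", "n2"], ["n3", "n4"]]))  # 1
print(inter_group_transfers(edges, [["n1", "n4"], ["n2", "n3"]]))  # 2
```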
In one possible implementation, the factors above are merely examples; one or more of them, or other relevant factors, may be considered in grouping, and the present disclosure does not limit the factors considered.
In one possible implementation, the grouping step above is optional; it may be skipped and a default grouping used directly. In an example, the task to be processed includes a task processed through at least one neural network model, and step S12 may include: grouping according to the neural network model to which each processing node belongs to obtain the processing node groups, or placing each processing node in its own processing node group. That is, each processing node may correspond to a network layer or an operator of a neural network model, and under a default grouping, processing nodes belonging to the same neural network may simply be placed in the same processing node group. Alternatively, each processing node may form its own group. The present disclosure does not limit the default grouping.
Fig. 4 shows a schematic diagram of a grouping according to an embodiment of the present disclosure. As shown in Fig. 4, processing nodes 1 and 2 of neural network model 1 form one group, and processing nodes 3, 4, and 5 of neural network model 1 form another. Processing nodes 1 and 2 of neural network model 2 form a group, and data interaction exists between this group and the group formed by processing nodes 1 and 2 of neural network model 1; processing nodes 3 and 4 of neural network model 2 form a further group.
In one possible implementation, after the processing nodes are grouped, the mapping relationship between processing components and processing node groups may be determined. In step S13, the target processing component for processing each processing node group may be determined according to the performance parameters of each processing component in the processing resources and the resource requirements of the processing node group. That is, each processing node group is assigned a processing component, e.g., a many-core chip, realizing a mapping relationship at the level of the many-core chip.
In an example, processing node groups may be split, copied, or merged during allocation. For example, if a processing node group contains a large number of processing nodes or requires large amounts of computing, storage, bandwidth, or other resources, it may be split into two groups that are mapped to two processing components. Conversely, if two or more groups contain few processing nodes or require few such resources, so that they can be handled by one processing component, they may be merged into a single group and mapped to one component. As another example, if a group's output data volume is large and must be sent to several other processing components, the group can be duplicated and the copies mapped to two or more components, so that the data output by those components can be transmitted to the several other components simultaneously, reducing waiting time during data transmission and improving transmission efficiency.
In one possible implementation, a number of factors may be considered in determining the mapping relationship between processing node groups and processing components. The performance parameters of a processing component include the functions and computing power of the component. Step S13 may include at least one of the following: determining the target processing component according to the type of each processing node in the processing node group and the functions of each processing component; determining the target processing component according to the resource requirements of the processing node group and the computing power of each processing component; determining the target processing component according to the resource requirements of the processing node group and the resource utilization of each processing component; determining the target processing component according to the resource requirements of the processing node group and the energy consumption of each processing component; and determining the target processing component according to the resource requirements of the processing node group and the processing time of each processing component. That is, the target processing component, in other words the mapping relationship between processing components and processing node groups, may be determined based on any one or more of the factors above.
In one possible implementation, the type of each processing node in the processing node group and the functions of each processing component may be considered in determining the mapping relationship. In an example, a processing node group may contain processing nodes of several kinds, and the algorithms required to execute their processing steps may likewise be of several kinds; therefore, a processing component whose functions can process all the processing nodes in the group may be selected among the processing components.
In one possible implementation, the resource requirements of the processing node group and the computing power (i.e., processing resources) of each processing component may be considered in determining the mapping relationship. The processing resources of each component are limited, so the processing resources required by the nodes in the group should not exceed those the component provides. If the components provide different amounts of processing resources, the mapping must consider whether the processing resources a given component can provide exceed those required by the processing nodes in the group.
In one possible implementation, the resource requirements of the processing node group and the resource utilization of each processing component may be considered in determining the mapping relationship. The resource requirements may include storage resource requirements and processing resource requirements, and the resource utilization may include processing resource utilization and storage resource utilization. In an example, the mapping relationship may be determined so that the components have relatively uniform resource utilization when processing the nodes of the processing node groups and short idle times; that is, situations in which some components compute while others sit idle, or in which components wait for data transmission and/or data storage, are reduced, improving the overall resource utilization and processing efficiency of the processing resources.
In one possible implementation, the resource requirements of the processing node group and the energy consumption of each processing component may be considered in determining the mapping relationship. The resource requirements of a processing node may include energy consumption requirements; e.g., the more computing steps required and the more complex the computation, the more computing resources must be invoked and the higher the energy consumption. In an example, the processing complexity of the components may be kept relatively balanced, so that the energy consumption of the processing resources is balanced overall and the total energy consumption is reduced.
In one possible implementation, the resource requirements of the processing node group and the processing time of each processing component may be considered in determining the mapping relationship. The resource requirements of a processing node group include computation-time requirements, e.g., the time needed to perform a given type of operation on the computing resources a processing component can provide. The processing times of the components working in the same timing stage can be made close or equal, to reduce the waiting time of some components within that stage and improve the overall processing efficiency of the processing resources.
In an example, several processing components may work in the same timing stage. For example, if the processing results of processing node group A and processing node group B are both sent to processing node group C, i.e., the execution of group C depends on the results of groups A and B, then the components corresponding to groups A and B are in the same timing stage, and the timing stage of the component corresponding to group C follows theirs. The mapping of groups A and B, which share a timing stage, can therefore take processing time into account so that the processing times of the two components are close or equal, reducing the situation in which one component must wait for the other to finish before processing in the next timing stage can begin.
In an example, processing node group A requires 1 ms of computation and processing node group B requires 0.5 ms. Group A can be mapped to a dedicated processing component a, while group B can be mapped to a processing component b that also processes another node group requiring 0.5 ms. When component a starts executing group A, component b starts executing the other node group; once that finishes, component b executes group B. Component b thus spends 1 ms in total on its two groups, the same as component a, so the two times are equal, reducing the situation in which one component waits for the other and improving the processing efficiency of the processing resources as a whole.
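A numeric sketch of this packing (component and group names as assumed above) shows both components finishing at the same time:

```python
# Computation time of each processing node group, in milliseconds (assumed).
groups_ms = {"A": 1.0, "B": 0.5, "other": 0.5}

# Candidate first mapping: group A alone on component a; component b runs
# the other node group and then group B back to back.
schedule = {
    "component a": ["A"],
    "component b": ["other", "B"],
}

for comp, gs in schedule.items():
    total = sum(groups_ms[g] for g in gs)
    print(f"{comp}: {gs} -> {total} ms")
# Both components finish at 1.0 ms, so neither waits on the other.
```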
In one possible implementation, the factors above for determining the mapping relationship are all examples; the mapping may be performed with reference to one or more of them, or to other reasonable factors, and the present disclosure does not limit the factors used to determine the mapping relationship.
In one possible implementation, after the mapping relationship between processing node groups and processing components has been determined, that is, after the target processing components for processing each group have been determined, the first mapping mode of the target processing components may also be determined. The first mapping mode represents the specific correspondence between the target processing components and the processing node groups and the working timing of the target processing components.
Fig. 5 illustrates a schematic diagram of space-time units according to an embodiment of the present disclosure. The abscissa of Fig. 5 represents space units, i.e., processing resources at various levels: at the larger spatial scale, the chip array; at the medium scale, a chip, e.g., a processing component or many-core chip; and at the smaller scale, a processing core (a "core" in Fig. 5). The ordinate of Fig. 5 represents time units, i.e., the processing timing of the processing resources at each level: a larger-scale time unit (e.g., time unit 3) may represent the timing with which a chip processes a processing node group; a medium-scale time unit (e.g., time unit 2) may represent the timing with which a processing core processes a processing node; and a smaller-scale time unit may represent the timing with which the PE array or MAC array within a processing core processes the operators of a processing node.
Fig. 6 shows a schematic diagram of timing according to an embodiment of the present disclosure. In the leftmost space-time diagram of Fig. 6, two points may represent two processing node groups or processing nodes with a dependency relationship; the middle and right space-time diagrams represent two possible mappings. In the middle diagram, the node group or node in the upper left corner is processed in the 1st time unit by the 1st processing component or core; after processing, the result may be sent directly to the 3rd component or core within the 1st time unit and stored there until the 4th time unit, when the 3rd component or core processes another node group or node based on that result. In the right diagram, the node group or node in the upper left corner is processed in the 1st time unit by the 1st component or core, which stores the result and sends it to the 2nd component or core in the 2nd time unit; the 2nd component or core stores the result and sends it to the 3rd component or core in the 4th time unit, so that the 3rd component or core can process another node group or node based on the result. These timings are merely examples; the present disclosure does not limit the specific processing and data transmission timings.
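The two Fig. 6 schedules can be written out as event lists (a purely illustrative sketch of how a first mapping fixes transfer and storage timing):

```python
# (time unit, action) events for the middle schedule of Fig. 6.
middle_schedule = [
    (1, "component 1 processes the node group"),
    (1, "component 1 sends the result to component 3"),
    (1, "component 3 stores the result"),  # held until it is consumed
    (4, "component 3 processes the dependent node group"),
]

# (time unit, action) events for the right-hand schedule of Fig. 6.
right_schedule = [
    (1, "component 1 processes the node group"),
    (1, "component 1 stores the result locally"),
    (2, "component 1 sends the result to component 2"),
    (2, "component 2 stores the result"),
    (4, "component 2 sends the result to component 3"),
    (4, "component 3 processes the dependent node group"),
]

for t, action in right_schedule:
    print(f"time unit {t}: {action}")
```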
In one possible implementation, as the examples above show, the processing of a node or node group and the transfer of its data may be constrained by timing. When determining the timing, the order in which each processing component or core processes its nodes or node groups, and whether the component or core has enough storage resources to hold the data, are among the factors that may be considered.
In one possible implementation, step S13 may include: determining the data transmission paths between the target processing components according to the connection relation between the processing nodes and at least one of the storage resources of the processing components, the resource utilization of the processing components, and whether the processing processes of the processing components generate deadlocks; and determining the first mapping manner according to the data transmission paths of the target processing components.
In one possible implementation, the storage resources of each processing component may be considered when determining the first mapping manner. For example, if a certain processing component generates data that needs to be sent to another processing component in a later time unit, it must be considered whether its own storage resources, or those of the other processing component, can hold the data until the other processing component uses it in that later time unit. Thus, the first mapping manner may be determined based on the storage resources of each processing component, so as to determine the transmission direction and storage location of the data.
In one possible implementation, the resource utilization of each processing component may be considered when determining the first mapping manner. For example, if a certain processing component generates data that needs to be sent to another processing component in a later time unit, the resource utilization of that other processing component needs to be considered: if its storage resource utilization is high and it has no spare storage to hold the data until the later time unit, the data may instead be sent to a processing component with low storage resource utilization for storage. For another example, if the communication bandwidth is busy in the current time unit but not busy in some time unit between the current time unit and the later one, the data may be transmitted in that less busy time unit rather than the current one, reducing the communication pressure of the current time unit without delaying the other processing component's use of the data in the later time unit.
In one possible implementation, whether the processing process generates a deadlock may be considered when determining the first mapping manner. If the processing of a processing node group or the data transmission process would produce a deadlock, the first mapping manner may be changed to avoid the deadlock, reducing the probability that processing cannot continue.
In one possible implementation, the above factors are merely examples, and other reasonable factors may also be considered when determining the first mapping manner; the present disclosure does not limit the factors used. Based on at least one of the above factors and the connection relation between the processing nodes, the data transmission path, that is, the transmission direction of the data and its waiting time in each processing component, may be determined, so that the data is transmitted to and processed in the respective processing components. The first mapping manner, i.e., the timing of data transmission, storage, and processing between the processing components, may then be determined based on the data transmission paths.
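The sketch below shows one way such a path decision could be written; the candidate-relay structure, the stubbed deadlock check, and the lowest-utilization heuristic are illustrative assumptions, not the disclosed method.

def pick_relay(candidates, size, free_storage, utilization, causes_deadlock):
    """Choose a component to hold data until its consumer's time unit."""
    feasible = [c for c in candidates
                if free_storage[c] >= size and not causes_deadlock(c)]
    if not feasible:
        return None  # the caller would change the first mapping manner instead
    # Lowest storage utilization wins, echoing the "send to the processing
    # component with low storage resource utilization" heuristic above.
    return min(feasible, key=lambda c: utilization[c])

relay = pick_relay(["comp2", "comp4"], size=32,
                   free_storage={"comp2": 16, "comp4": 64},
                   utilization={"comp2": 0.9, "comp4": 0.3},
                   causes_deadlock=lambda c: False)
print(relay)  # comp4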
In one possible implementation, in step S14, a mapping relationship between the processing cores in each processing component and the processing nodes in each processing node group may be determined in a similar manner to the above determination of the target processing component, and a second mapping manner of the processing cores may be determined in a similar manner to the above first mapping manner.
In an example, when determining the mapping relationship between processing cores and processing nodes, at least one of the following factors may be considered: the type of the processing node and the function of the processing core; the resource requirement of the processing node and the computing power of the processing core; the resource requirement of the processing node and the resource utilization of the processing core; the resource requirement of the processing node and the energy consumption of the processing core; the resource requirement of the processing node and the processing time of the processing core. The factors above are merely examples, and other reasonable factors may also be considered when determining the mapping relationship; the present disclosure is not limited in this respect. A sketch of this factor-based selection follows.
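As a sketch of how such factors might be combined, the weighted score below ranks candidate processing cores for a processing node; the weights, the normalization, and the field names are all assumptions made for illustration.

def score_core(node, core, w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    # Factor 1: node type vs. core function; factors 2-5: resource demand vs.
    # computing power, resource utilization, energy consumption, processing time.
    type_ok = 1.0 if node["type"] in core["functions"] else 0.0
    compute = min(core["compute"] / node["demand"], 1.0)
    return (w[0] * type_ok
            + w[1] * compute
            + w[2] * (1.0 - core["utilization"])
            + w[3] * (1.0 - core["energy"])      # normalized energy cost
            + w[4] * (1.0 - core["proc_time"]))  # normalized latency

node = {"type": "conv", "demand": 4.0}
cores = [
    {"id": 0, "functions": {"conv"}, "compute": 8.0, "utilization": 0.7, "energy": 0.5, "proc_time": 0.4},
    {"id": 1, "functions": {"conv"}, "compute": 4.0, "utilization": 0.2, "energy": 0.3, "proc_time": 0.3},
]
print(max(cores, key=lambda c: score_core(node, c))["id"])  # 1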
In another example, determining the second mapping manner of the processing cores may consider at least one of the storage resources of the processing cores, the resource utilization of the processing cores, and whether the processing processes of the processing cores produce deadlocks, so that the data transmission paths between the processing cores may be determined based on these factors and the connection relation between the processing nodes. The second mapping manner, i.e., the timing of data transmission, storage, and processing between the processing cores, may then be determined based on the data transmission paths.
In one possible implementation, in step S15, processing may be performed based on the first mapping manner, the second mapping manner, and the target processing cores determined above. For example, the processing order and data transmission paths of each target processing component may be determined according to the first mapping manner, and then refined layer by layer: within the finer time units of the first mapping manner, the processing order and data transmission paths of each processing core within a target processing component may be determined according to the second mapping manner. In this way, the detailed processing order and data transmission paths of the individual processing cores may be determined, and each processing node of the task to be processed may be processed by the corresponding processing core.
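The layer-by-layer refinement can be pictured with the toy sketch below, in which a coarse per-component schedule (the first mapping manner) is subdivided into fine per-core steps (the second mapping manner); every identifier is hypothetical.

first_mapping = {  # coarse time unit -> (processing component, node group)
    1: ("chipA", "groupA"),
    2: ("chipB", "groupB"),
}
second_mapping = {  # (component, processing node) -> fine time unit inside the coarse one
    ("chipA", "node1"): 1, ("chipA", "node2"): 2,
    ("chipB", "node3"): 1, ("chipB", "node4"): 1,
}

for coarse, (chip, group) in sorted(first_mapping.items()):
    nodes = [n for (c, n) in second_mapping if c == chip]
    for node in sorted(nodes, key=lambda n: second_mapping[(chip, n)]):
        fine = second_mapping[(chip, node)]
        print(f"time unit {coarse}.{fine}: {chip} runs {node} of {group}")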
In an example, there are still complex and diverse ways of implementing processing nodes on the PE array or MAC array within a processing core. The one or more processing nodes (e.g., operators of a neural network model) that a certain processing core needs to execute at a certain moment may be expressed in the form of multi-layer loop control, and the mapping process within the processing core may be expressed as transformations of each loop layer: for example, splitting one loop layer into two, fully expanding one loop layer into parallel operations, merging two loop layers, skewing the multi-layer loop, exchanging the order of two loop layers, changing the position of a data cache within the loop, and the like. Through this mapping process, the work performed by each loop layer may be executed with higher parallelism on the PE array or MAC array, thereby improving execution efficiency. The mapping of the one or more operators onto the PE array or MAC array and the data transmission paths may be determined by these transformations, for example, by mapping the processing performed by each loop layer onto the arithmetic units of the PE array or MAC array.
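The sketch below illustrates three of the named transformations on a trivial one-dimensional operator; the "PE array" is simulated with a plain Python list, since the real intra-core mapping is hardware-specific and not specified here.

N, TILE = 8, 4
x = list(range(N))
y = [0] * N

# (1) Split one loop layer into two (i -> (it, ii)).
for it in range(N // TILE):
    for ii in range(TILE):
        y[it * TILE + ii] = 2 * x[it * TILE + ii]

# (2) Fully expand the inner loop into parallel operations: each of the TILE
#     iterations would be issued to a different PE/MAC unit in one step.
for it in range(N // TILE):
    pe_inputs = [x[it * TILE + ii] for ii in range(TILE)]  # one element per PE
    pe_outputs = [2 * v for v in pe_inputs]                # notionally parallel
    y[it * TILE:(it + 1) * TILE] = pe_outputs

# (3) Exchange the order of the two loop layers, which changes the data access
#     order (and hence where a data cache would sit in the loop nest).
for ii in range(TILE):
    for it in range(N // TILE):
        y[it * TILE + ii] = 2 * x[it * TILE + ii]

print(y)  # [0, 2, 4, 6, 8, 10, 12, 14]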
In one possible implementation, the above mapping, timing determination, and execution may be performed automatically by a computer. However, during mapping and timing determination, problems may arise in the mapping between processing node groups and processing components, the determination of the first mapping manner, the mapping between processing nodes and processing cores, the determination of the second mapping manner, and so on; for example, an unreasonable mapping or an unreasonable timing may result in low processing efficiency. Therefore, the above procedures can be adjusted to improve the rationality of the grouping, mapping, and timing procedures and thereby improve processing efficiency.
In one possible implementation, the method further includes: adjusting at least one of the target processing components, the first mapping manner, the target processing cores, and the second mapping manner according to process parameters during the processing of the task to be processed, wherein the process parameters include at least one of the resource utilization of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time between the target processing components, the resource utilization of the target processing cores, the processing time of the target processing cores, and the data transmission time between the target processing cores.
In one possible implementation, based on the process parameters, quantities such as the efficiency of processing performed in each processing core and each processing component, the resource utilization during data transmission and data storage, and the energy consumption may be determined. If the processing in some processing cores or processing components is unreasonable, this may be the result of an unreasonable processing timing or mapping procedure. Therefore, the mapping procedures and/or processing timing may be adjusted based on the above process parameters, either manually or automatically by a computer; in an example, feedback adjustment may be performed with reference to the training manner of a neural network model, etc. The present disclosure does not limit the adjustment manner. By adjusting these procedures, the overall processing efficiency of the processing resources can be improved.
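A minimal feedback-adjustment sketch follows; the utilization threshold, the choice of metric, and the move-one-group policy are illustrative assumptions rather than the disclosed adjustment procedure.

def adjust(mapping, utilization, threshold=0.9):
    """mapping: node group -> component; utilization: component -> value in [0, 1]."""
    hot = {c for c, u in utilization.items() if u > threshold}
    cold = min(utilization, key=utilization.get)  # least-loaded component
    for group, comp in mapping.items():
        if comp in hot:
            mapping[group] = cold  # move one group off the overloaded component
            break
    return mapping

m = {"groupA": "chipA", "groupB": "chipA", "groupC": "chipB"}
print(adjust(m, {"chipA": 0.95, "chipB": 0.4}))
# {'groupA': 'chipB', 'groupB': 'chipA', 'groupC': 'chipB'}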
In one possible implementation, the method further includes: and adjusting the processing node group according to the process parameters in the processing process of the task to be processed, wherein the process parameters comprise at least one of the resource utilization rate of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time among the target processing components, the resource utilization rate of the target processing cores, the processing time of the target processing cores and the data transmission time among the target processing cores.
In one possible implementation, in addition to the above adjustments to the timing and mapping relationships, the grouping of processing nodes may be adjusted based on the above process parameters. Similarly, the adjustment may be made manually or automatically by a computer based on the process parameters, so that the overall processing efficiency of the processing resources may be improved after the grouping is adjusted.
In one possible implementation, in addition to the above processes in which the computer automatically performs grouping, mapping, timing determination, and processing, the grouping, mapping, and timing determination may also be performed manually, with the processing resources executing the task to be processed based on the results of the manual grouping, mapping, and timing determination. The method further includes: receiving a mapping instruction, wherein the mapping instruction includes instructions for determining the target processing components, the first mapping manner, the target processing cores, and the second mapping manner; and processing the task to be processed according to the mapping instruction to obtain a processing result.
In an example, the mapping instructions may be manually input into a computer, and the processes of grouping, mapping, and determining the timing may be performed based on the mapping instructions. For example, the computer may display the space-time state, such as the space-time diagrams described above, for the operator's reference. The operator may fill the processing node groups, the processing nodes, and the processing timing into the space-time diagram, and the computer may return the filled-in space-time diagram in real time for the operator's reference. After each processing node is mapped to a processing core and the timing is determined, the processing resources can process the task to be processed based on the manually determined mapping relationship and processing timing to obtain a processing result.
According to the task processing method of the embodiments of the present disclosure, the tasks to be processed can be grouped, and the target processing components and the first mapping manner of the processing node groups can be determined, so that the mapping of the task to be processed is applicable to processing resources above the level of the processing cores; the target processing cores and the second mapping manner of the processing nodes can also be determined, realizing mapping at each level of the processing resources and making the task processing flow more universal. In the processes of grouping, determining the mapping relationships, and determining the processing timing, various factors are considered, so that the overall execution efficiency of the processing resources is higher. Further, the grouping, mapping relationships, and processing timing can be adjusted based on the process parameters during execution of the processing task, so as to obtain higher overall processing efficiency.
Fig. 7 illustrates an application diagram of a task processing method according to an embodiment of the present disclosure. As shown in fig. 7, the neural network model 1 may include a processing node 1, a processing node 2, a processing node 3, a processing node 4, and a processing node 5. The neural network model 2 may include a processing node 1, a processing node 2, a processing node 3, and a processing node 4.
In one possible implementation, the above-mentioned processing nodes may be grouped with reference to the above grouping factors, for example, the processing nodes 1 and 2 of the neural network model 1 may be divided into the processing node group a, the processing nodes 3, 4, and 5 of the neural network model 1 may be divided into the processing node group B, the processing nodes 1 and 2 of the neural network model 2 may be divided into the processing node group C, and the processing nodes 3 and 4 of the neural network model 2 may be divided into the processing node group D.
In one possible implementation, the mapping relationship between the processing node groups and the chips may be determined with reference to the above mapping factors, and the first mapping manner of each chip may be determined with reference to the above timing factors. For example, processing node group A may be mapped to chip A, processing node group C to chip C, processing node group D to chip D, and processing node group B to chip B. Chip A and chip C are in the same time unit, i.e., they execute simultaneously, while chip B and chip D share a later time unit, i.e., they execute after chip A and chip C have finished. Further, the first mapping manner between the chips may be determined based on factors such as the storage resources of the chips and the data dependency relationships between the chips, for example, the mutual data dependency between chip A and chip C, the data dependency of chip B on chip A, and the data dependency of chip D on chip C.
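For reference, this fig. 7 example can be written down as plain data; the dict-based encoding below is our own illustrative assumption, not a format defined by the present disclosure.

group_to_chip = {"A": "chipA", "B": "chipB", "C": "chipC", "D": "chipD"}
time_unit = {"chipA": 1, "chipC": 1,   # executed simultaneously
             "chipB": 2, "chipD": 2}   # executed after chipA and chipC
depends_on = {"chipA": ["chipC"], "chipC": ["chipA"],  # mutual dependency
              "chipB": ["chipA"], "chipD": ["chipC"]}

# Sanity check: no chip runs earlier than the chips it depends on.
for consumer, producers in depends_on.items():
    for producer in producers:
        assert time_unit[consumer] >= time_unit[producer]
print("schedule consistent with the dependencies")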
In one possible implementation, the mapping relationship between the processing cores in each chip and the processing nodes in each processing node group, as well as the second mapping manner of the processing cores, may further be determined. Taking chip B as an example, the mapping relationship between each node and the processing cores within the chip may be determined using factors similar to those above, and the second mapping manner may be determined based on similar factors. For example, two processing nodes may begin executing on two processing cores in the 1st time unit, while another processing node is split into two parts that execute in the 3rd and 4th time units, respectively, based on the data generated by the first two processing nodes. While each processing core executes its processing nodes, multiply-accumulate operations may be performed by the MAC array, or data operations by the PE array; the operators of the processing nodes may be expanded so that the arithmetic units in the MAC array or PE array perform highly parallel operations, thereby improving processing efficiency. The final task processing result is thereby obtained.
Fig. 8 shows a block diagram of a task processing device according to an embodiment of the present disclosure, as shown in fig. 8, the device including: an intermediate representation model obtaining module 11, configured to obtain an intermediate representation model of a task to be processed, where the intermediate representation model includes processing nodes of the task to be processed and connection relationships between the processing nodes; a grouping module 12, configured to group each processing node in the intermediate representation model to obtain a processing node group; the first mapping module 13 is configured to determine a target processing component for processing the processing node group according to a performance parameter of each processing component in the processing resource, a resource requirement of the processing node group, and a connection relationship between the processing nodes, and determine a first mapping manner of the target processing component, where the first mapping manner represents a specific correspondence relationship between each target processing component and the processing node group, and a time sequence of operation of the target processing component; a second mapping module 14, configured to determine a target processing core in the target processing component according to a performance parameter of the processing core in the target processing component, a resource requirement of each processing node in the processing node group, and a connection relationship between the processing nodes, and determine a second mapping manner of the target processing core, where the target processing core is configured to process each processing node in the processing node group, and the second mapping manner represents a specific correspondence relationship between each target processing core and the processing node and a time sequence of operation of the target processing core; and the processing module 15 is configured to process the task to be processed according to the first mapping manner, the second mapping manner, and the target processing core, so as to obtain a processing result.
In one possible implementation, the grouping module is further configured to: and grouping according to at least one of the data transmission quantity in the connection relation, the processing resource requirement of each processing node, the storage resource requirement of each processing node, the type of each processing node and the communication times among the processing node groups to obtain the processing node groups.
In one possible implementation, the tasks to be processed include tasks processed by at least one neural network model, and the grouping module is further configured to: and grouping according to the neural network model to which the processing nodes belong to obtain the processing node group, or dividing each processing node into one processing node group respectively.
In one possible implementation, the performance parameters of the processing component include a function and a computing power of the processing component, and the first mapping module is further configured to at least one of: determining the target processing component according to the type of each processing node in the processing node group and the function of each processing component; determining the target processing component according to the resource requirement of the processing node group and the computing power of each processing component; determining the target processing component according to the resource requirement of the processing node group and the resource utilization rate of each processing component; determining the target processing component according to the resource requirement of the processing node group and the energy consumption of each processing component; and determining the target processing component according to the resource requirement of the processing node group and the processing time of each processing component.
In one possible implementation, the first mapping module is further configured to: determining a data transmission path between the target processing components according to at least one of storage resources of the processing components, resource utilization rates of the processing components and whether processing processes of the processing components generate deadlocks and a connection relation between the processing nodes; and determining the first mapping mode according to the data transmission path of the target processing component.
In one possible implementation, the apparatus further includes: the first adjusting module is configured to adjust at least one of the target processing component, the first mapping manner, the target processing core and the second mapping manner according to a process parameter in the task processing process to be processed, where the process parameter includes at least one of a resource utilization rate of the target processing component, an energy consumption of the target processing component, a processing time of the target processing component, a data transmission time between the target processing components, a resource utilization rate of the target processing core, a processing time of the target processing core, and a data transmission time between the target processing cores.
In one possible implementation, the apparatus further includes: the second adjusting module is configured to adjust the processing node group according to a process parameter in the processing process of the task to be processed, where the process parameter includes at least one of a resource utilization rate of the target processing component, an energy consumption of the target processing component, a processing time of the target processing component, a data transmission time between the target processing components, a resource utilization rate of the target processing core, a processing time of the target processing core, and a data transmission time between the target processing cores.
In one possible implementation, the apparatus further includes: the input module is used for receiving mapping instructions, wherein the mapping instructions comprise instructions for determining the target processing component, the first mapping mode, the target processing core and the second mapping mode; and processing the task to be processed according to the mapping instruction to obtain a processing result.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from the principle logic; due to space limitations, the details are not repeated in the present disclosure. It will be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides an apparatus, an electronic device, a computer-readable storage medium, and a program, all of which may be used to implement any of the task processing methods provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the method section, and details are not repeated.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure also provide a computer program product including computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the task processing method provided in any of the above embodiments.
The disclosed embodiments also provide another computer program product for storing computer-readable instructions that, when executed, cause a computer to perform the operations of the task processing method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 9 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 9, an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen between the electronic device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only an edge of a touch or slide action, but also a duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
Fig. 10 shows a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to fig. 10, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), Static Random Access Memory (SRAM), portable compact disk read-only memory (CD-ROM), Digital Versatile Disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (7)

1. A method of task processing, comprising:
Obtaining an intermediate representation model of a task to be processed, wherein the intermediate representation model comprises processing nodes of the task to be processed and connection relations among the processing nodes;
Grouping each processing node in the intermediate representation model to obtain a processing node group;
Determining a target processing component for processing the processing node group according to performance parameters of each processing component in processing resources, resource requirements of the processing node group and connection relations among the processing nodes, and determining a first mapping mode of the target processing component, wherein the first mapping mode represents specific corresponding relations among each target processing component, the processing node group and working time sequences of the target processing components;
Determining a target processing core in the target processing component according to performance parameters of the processing cores in the target processing component, resource requirements of all processing nodes in the processing node group and connection relations among the processing nodes, and determining a second mapping mode of the target processing core, wherein the target processing core is used for processing all processing nodes in the processing node group, and the second mapping mode represents specific corresponding relations among all target processing cores and the processing nodes and working time sequences of the target processing cores;
Processing the task to be processed according to the first mapping mode, the second mapping mode and the target processing core to obtain a processing result;
The performance parameters of the processing component include the functionality and computing power of the processing component,
Wherein the determining of a target processing component for processing the processing node group according to the performance parameters of each processing component in the processing resources, the resource requirements of the processing node group, and the connection relations among the processing nodes comprises at least one of the following steps:
Determining the target processing component according to the type of each processing node in the processing node group and the function of each processing component;
determining the target processing component according to the resource requirement of the processing node group and the computing power of each processing component;
determining the target processing component according to the resource requirement of the processing node group and the resource utilization rate of each processing component;
Determining the target processing component according to the resource requirement of the processing node group and the energy consumption of each processing component;
determining the target processing component according to the resource requirement of the processing node group and the processing time of each processing component;
The determining the first mapping manner of the target processing component includes:
Determining a data transmission path between the target processing components according to at least one of storage resources of the processing components, resource utilization rates of the processing components and whether processing processes of the processing components generate deadlocks and a connection relation between the processing nodes;
determining the first mapping mode according to the data transmission path of the target processing component;
According to the process parameters in the task processing process to be processed, at least one of the target processing component, the first mapping mode, the target processing core and the second mapping mode is adjusted, wherein the process parameters comprise at least one of the resource utilization rate of the target processing component, the energy consumption of the target processing component, the processing time of the target processing component, the data transmission time among the target processing components, the resource utilization rate of the target processing core, the processing time of the target processing core and the data transmission time among the target processing cores;
And adjusting the processing node group according to the process parameters in the processing process of the task to be processed, wherein the process parameters comprise at least one of the resource utilization rate of the target processing components, the energy consumption of the target processing components, the processing time of the target processing components, the data transmission time among the target processing components, the resource utilization rate of the target processing cores, the processing time of the target processing cores and the data transmission time among the target processing cores.
2. The method of claim 1, wherein grouping processing nodes in the intermediate representation model to obtain a group of processing nodes comprises:
and grouping according to at least one of the data transmission quantity in the connection relation, the processing resource requirement of each processing node, the storage resource requirement of each processing node, the type of each processing node and the communication times among the processing node groups to obtain the processing node groups.
3. The method of claim 1, wherein the task to be processed comprises a task processed through at least one neural network model,
Wherein grouping each processing node in the intermediate representation model to obtain a plurality of processing node groups comprises:
grouping according to the neural network model to which the processing node belongs to obtain the processing node group, or
Each processing node is divided into a respective group of processing nodes.
4. The method according to claim 1, wherein the method further comprises:
receiving a mapping instruction, wherein the mapping instruction comprises an instruction for determining the target processing component, the first mapping mode, the target processing core and the second mapping mode;
and processing the task to be processed according to the mapping instruction to obtain a processing result.
5. A task processing device, comprising:
The system comprises an intermediate representation model acquisition module, a processing module and a processing module, wherein the intermediate representation model is used for acquiring an intermediate representation model of a task to be processed, and the intermediate representation model comprises processing nodes of the task to be processed and connection relations among the processing nodes;
The grouping module is used for grouping all the processing nodes in the intermediate representation model to obtain a processing node group;
The first mapping module is used for determining a target processing component for processing the processing node group according to performance parameters of each processing component in processing resources, resource requirements of the processing node group and connection relations among the processing nodes, and determining a first mapping mode of the target processing component, wherein the first mapping mode represents specific corresponding relations among each target processing component, the processing node group and working time sequences of the target processing components;
a second mapping module, configured to determine a target processing core in the target processing component according to a performance parameter of the processing core in the target processing component, a resource requirement of each processing node in the processing node group, and a connection relationship between the processing nodes, and determine a second mapping manner of the target processing core, where the target processing core is configured to process each processing node in the processing node group, and the second mapping manner represents a specific correspondence relationship between each target processing core and the processing node and a time sequence of operation of the target processing core;
the processing module is used for processing the task to be processed according to the first mapping mode, the second mapping mode and the target processing core to obtain a processing result;
The performance parameters of the processing component include the functionality and computing power of the processing component, and the first mapping module is further configured to at least one of: determining the target processing component according to the type of each processing node in the processing node group and the function of each processing component; determining the target processing component according to the resource requirement of the processing node group and the computing power of each processing component; determining the target processing component according to the resource requirement of the processing node group and the resource utilization rate of each processing component; determining the target processing component according to the resource requirement of the processing node group and the energy consumption of each processing component; determining the target processing component according to the resource requirement of the processing node group and the processing time of each processing component;
the first mapping module is further configured to: determining a data transmission path between the target processing components according to at least one of storage resources of the processing components, resource utilization rates of the processing components and whether processing processes of the processing components generate deadlocks and a connection relation between the processing nodes; determining the first mapping mode according to the data transmission path of the target processing component;
the first adjusting module is configured to adjust at least one of the target processing component, the first mapping manner, the target processing core and the second mapping manner according to a process parameter in the task processing process to be processed, where the process parameter includes at least one of a resource utilization rate of the target processing component, an energy consumption of the target processing component, a processing time of the target processing component, a data transmission time between the target processing components, a resource utilization rate of the target processing core, a processing time of the target processing core and a data transmission time between the target processing cores;
The second adjusting module is configured to adjust the processing node group according to a process parameter in the processing process of the task to be processed, where the process parameter includes at least one of a resource utilization rate of the target processing component, an energy consumption of the target processing component, a processing time of the target processing component, a data transmission time between the target processing components, a resource utilization rate of the target processing core, a processing time of the target processing core, and a data transmission time between the target processing cores.
6. An electronic device, comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 4.
7. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 4.