CN116501503A - Architecture mapping method and device for load task, computer equipment and medium


Info

Publication number
CN116501503A
Authority
CN
China
Prior art keywords
task
architecture
sub
dependency
subtask
Prior art date
Legal status
Granted
Application number
CN202310761526.0A
Other languages
Chinese (zh)
Other versions
CN116501503B (en)
Inventor
王筱上
祖云飞
Current Assignee
Shanghai Suiyuan Technology Co ltd
Original Assignee
Shanghai Enflame Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Enflame Technology Co ltd filed Critical Shanghai Enflame Technology Co ltd
Priority to CN202310761526.0A priority Critical patent/CN116501503B/en
Publication of CN116501503A publication Critical patent/CN116501503A/en
Application granted granted Critical
Publication of CN116501503B publication Critical patent/CN116501503B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447Performance evaluation by modeling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an architecture mapping method, apparatus, computer equipment and medium for load tasks. The method comprises the following steps: acquiring a load task to be loaded onto a target architecture for performing a benchmark test, and splitting the load task into a plurality of subtasks; constructing at least one dependency relation group according to the logical dependency relations among the plurality of subtasks, and determining the architecture mapping order of each dependency relation group; and respectively establishing, according to the architecture mapping order, task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture, and taking the task architecture mapping relations as modeling reference information in the benchmark test. The technical scheme of the embodiment of the invention provides a novel, comprehensive, highly available and extensible architecture mapping approach for load tasks, provides efficient and usable data preparation for subsequent flexible and extensible benchmark test modeling, and can reduce the development cost and period of benchmark test modeling to a certain extent.

Description

Architecture mapping method and device for load task, computer equipment and medium
Technical Field
The embodiment of the invention relates to a modeling test technology of an artificial intelligent chip architecture, in particular to an architecture mapping method, an architecture mapping device, computer equipment and a medium of a load task.
Background
When the architecture of an AI (Artificial Intelligence) chip is explored, complex modeling of a specific architecture and load task is usually required, followed by construction and evaluation, so the overall development cost and cycle are relatively high. In order to accurately simulate the running states of multiple task loads on an architecture, and their influence on data-handling delay and power consumption in a multi-level storage subsystem, the mapping relation between the task loads and the architecture needs to be defined first.
Because the modeling platform needs to be compatible with architectures of different hierarchies, different storage and different computing capabilities, it is difficult to define a derivation method that applies to all scenarios and architectures. An inefficient load-task architecture mapping approach can introduce large errors into subsequent system modeling and architecture evaluation.
Therefore, how to efficiently and accurately construct the mapping relation between the load task and the architecture before the benchmark test is performed, so as to provide efficient and usable data preparation for subsequent flexible and extensible benchmark test modeling, is an important problem to be solved at present.
Disclosure of Invention
The embodiment of the invention provides an architecture mapping method, apparatus, computer equipment and medium for load tasks, so as to provide a comprehensive, highly available and extensible architecture mapping approach for load tasks and thereby assist in improving the efficiency of architecture assessment.
In a first aspect, an embodiment of the present invention provides a method for architecture mapping of a load task, where the method includes:
acquiring a load task to be loaded to a target architecture for performing a benchmark test, and splitting the load task into a plurality of subtasks;
constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks, and determining the architecture mapping sequence of each dependency relation group;
and respectively establishing task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping sequence, and taking the task architecture mapping relations as modeling reference information in the benchmark test.
In a second aspect, an embodiment of the present invention further provides an architecture mapping apparatus for a load task, where the apparatus includes:
the subtask splitting module is used for acquiring a load task to be loaded to the target architecture for performing the benchmark test and splitting the load task into a plurality of subtasks;
the architecture mapping sequence determining module is used for constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks and determining the architecture mapping sequence of each dependency relation group;
the task architecture mapping relation establishing module is used for respectively establishing task architecture mapping relation between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping order and taking the task architecture mapping relation as modeling reference information in the benchmark test.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the architecture mapping method of load tasks according to any one of the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium storing computer instructions for causing a processor to implement an architecture mapping method for load tasks according to any of the embodiments of the present invention.
According to the embodiment of the invention, a load task to be loaded onto a target architecture for performing a benchmark test is acquired and split into a plurality of subtasks; at least one dependency relation group is constructed according to the logical dependency relations among the plurality of subtasks, and the architecture mapping order of each dependency relation group is determined; and, according to the architecture mapping order, task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture are respectively established and used as modeling reference information in the benchmark test. These technical means provide a novel, comprehensive, highly available and extensible architecture mapping approach for load tasks, provide efficient and usable data preparation for subsequent flexible and extensible benchmark test modeling, can reduce the development cost and period of benchmark test modeling to a certain extent, and help improve the efficiency of architecture evaluation, so that more, broader and deeper evaluations can be completed in the architecture exploration stage.
Drawings
Fig. 1 is a flowchart of a method for mapping a load task architecture according to a first embodiment of the present invention;
fig. 2 is a flowchart of a method for mapping a load task architecture according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a dependency grouping applicable to the technical solution of the embodiment of the present invention;
FIG. 4 is a schematic diagram of a logic for determining the architectural mapping order of each dependency group according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of mapping heterogeneous sub-tasks in a load task to heterogeneous sub-systems in a target architecture, to which the technical solution of the present invention is applicable;
FIG. 6 is a logic diagram of a task architecture mapping relationship between each sub-task in each dependency group and each subsystem in a target architecture, to which the technical method of the present invention is applicable;
fig. 7 is a flowchart of a method for mapping a load task architecture according to a third embodiment of the present invention;
fig. 8 is a schematic diagram of a data relay space node to which the technical solution of the embodiment of the present invention is applicable;
fig. 9 is a schematic diagram of another data relay space node to which the technical solution of the embodiment of the present invention is applicable;
FIG. 10 is a schematic diagram of data residence conditions of a subsystem adapted to the technical solution of the embodiment of the present invention at different time points;
FIG. 11 is a schematic diagram of a specific application scenario to which the technical solution of the embodiment of the present invention is applied;
fig. 12 is a block diagram of an architecture mapping apparatus for load tasks according to a fourth embodiment of the present invention;
fig. 13 is a schematic structural diagram of an electronic device implementing the architecture mapping method for load tasks according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of an architecture mapping method for load tasks according to an embodiment of the present invention, where the method may be implemented by an architecture mapping device for load tasks, and the device may be implemented by software and/or hardware, and may be generally integrated in a computer device with a data processing function, where the method is applicable to a case where a specific load task is mapped into a specific target architecture before modeling a benchmark test on the target architecture. Correspondingly, the method specifically comprises the following steps:
S110, acquiring a load task to be loaded to a target architecture to execute a benchmark test, and splitting the load task into a plurality of subtasks.
The target architecture refers to an AI architecture to be subjected to architecture exploration, and can be understood as a set of hardware resources for processing load tasks. The target architecture may be an existing hardware entity architecture, or may be a custom software implementation architecture in a development stage, etc., and all the hardware resource sets included in the target architecture may be isomorphic hardware resources, or may be heterogeneous hardware resources, etc., which is not limited in this embodiment.
Specifically, the target architecture may include one or more hardware subsystems, where different subsystems have a set topological connection relationship to form a set hierarchy. Wherein each subsystem is configured to implement a set subsystem function. Such as a computing function, a codec function, or a storage function, etc.
In this embodiment, the target architecture may be described by subsystem definition and subsystem specification, and these two kinds of information may be collectively referred to as architecture definition information of the target architecture. The subsystem definition may include the following information: the target architecture comprises a subsystem level, an instantiation number of subsystems, identification information of each subsystem, a topological connection relation between each subsystem and other subsystems, subsystem functions and an instantiation number of functional modules in each subsystem, for example, a computing subsystem comprises a plurality of computing units as functional modules. Meanwhile, for each subsystem, description is made by subsystem specifications, respectively. Subsystems of different functional types typically differ in subsystem specifications.
In a specific example, for a subsystem implementing computing-core computation, the subsystem specification generally includes: the micro-architecture type, the highest frequency, the vector computing power, the tensor computation shape, the read and write bandwidths, and the number of read and write ports; for a subsystem implementing an on-chip storage function, the subsystem specification generally includes: the micro-architecture type, the storage capacity, the read and write bandwidths of the subsystems connected to it, and the number of read and write ports of the subsystems connected to it; for a subsystem implementing an off-chip storage function, the subsystem specification generally includes: the micro-architecture type, the storage capacity, the read and write bandwidths of the subsystems connected to it, and the number of read and write ports of the subsystems connected to it; for a subsystem implementing an interconnection function between subsystems, the subsystem specification generally includes: the micro-architecture type, the connected subsystem levels, the read and write bandwidths, and the number of read and write ports; and for a subsystem implementing an interconnection function within a subsystem, the subsystem specification generally includes: the micro-architecture type, the subsystem type, the read and write bandwidths, the number of read and write ports, and the like.
It should be noted that, for a hardware entity architecture, the instantiated number may be understood as the number actually included in the hardware entity architecture, and for a custom software implementation architecture, the instantiated number may be understood as the number obtained by software simulation.
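By way of example and not limitation, the architecture definition information described above may be organized in data structures such as the following Python sketch; the class and field names (SubsystemSpec, SubsystemDef, TargetArchitecture, and so on) are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class SubsystemSpec:
    """Specification of one subsystem; which fields apply depends on its function."""
    micro_arch_type: str
    read_bandwidth_gbps: float
    write_bandwidth_gbps: float
    read_ports: int
    write_ports: int
    max_frequency_mhz: Optional[float] = None     # compute subsystems
    vector_compute_power: Optional[float] = None  # compute subsystems
    tensor_shape: Optional[Tuple[int, ...]] = None
    storage_capacity_mb: Optional[float] = None   # on-chip / off-chip storage

@dataclass
class SubsystemDef:
    """Definition of one subsystem instance within the target architecture."""
    subsystem_id: str            # identification information, e.g. "SIP[2]"
    level: int                   # subsystem level in the hierarchy
    function: str                # "compute", "storage", "codec", "interconnect", ...
    instance_count: int          # instantiated number of functional modules
    connected_to: List[str]      # topological connections to other subsystems
    spec: Optional[SubsystemSpec] = None

@dataclass
class TargetArchitecture:
    """Architecture definition information: subsystem definitions plus specifications."""
    subsystems: Dict[str, SubsystemDef] = field(default_factory=dict)

    def add(self, sub: SubsystemDef) -> None:
        self.subsystems[sub.subsystem_id] = sub
```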
In this embodiment, a load task may be understood as a task that needs to be loaded into the target architecture to perform benchmark testing, i.e., the load task may be performed by the target architecture to implement one or more load functions. Specifically, the load task may be a task of realizing a single function type, such as a calculation task, a storage task, or an encoding/decoding task, or may be a multi-function type composite task formed by combining a plurality of tasks of a single function type, which is not limited in this embodiment.
In this embodiment, the tester may set the load task in a user-defined manner according to the actual architecture exploration requirement, so as to meet the actual modeling evaluation requirement. Specifically, the load task may be split into one or more subtasks according to a preset splitting rule. For example, if the load task is a calculation task based on a setting calculation graph, the load task may be split into a plurality of subtasks according to each calculation operator included in the calculation graph, or the number of function types included in the load task may be first analyzed and split into a plurality of subtasks in units of function types, or the total number of hardware resources required for the load task may be first estimated, and based on the total number of hardware resources and a preset number of subtask divisions, the average number of hardware resources required for each subtask may be estimated, and based on the average number of hardware resources, the load task may be split into a plurality of subtasks, or the like, which is not limited in this embodiment.
In an optional implementation manner of this embodiment, the original load description information of each subtask in the load task may be initialized and constructed, and then the load task may be simply and conveniently split into multiple subtasks by analyzing the original load description information of each subtask.
Optionally, the original load description information of each subtask may include: the task name, domain description information, operand description information and operand access type of each subtask, dependency description information among the subtasks, and the like.
The domain description information comprises the data dimensions and operand shape description information contained in the subtask; the operand description information comprises an operand name, operand domain description information and operand data precision; the operand access type comprises an access type such as read or write; and the dependency relationship description information comprises the explicit dependency relationships between the subtask and other subtasks.
Further, the original load description information needs to specify a required resource type of each subtask, that is, the required resource type defines what kind of functional hardware resource (subsystem) a subtask needs to be configured to execute.
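By way of example and not limitation, the original load description information of a subtask may be captured in a structure such as the following sketch; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

class AccessType(Enum):
    READ = "read"
    WRITE = "write"

@dataclass
class OperandDesc:
    name: str                 # operand name, e.g. "Tensor[1]"
    domain: Tuple[int, ...]   # operand shape / domain description information
    dtype: str                # operand data precision, e.g. "fp16"

@dataclass
class SubtaskDesc:
    """Original load description information of one subtask."""
    task_name: str                                 # e.g. "S[3]"
    domain: Tuple[int, ...]                        # data dimensions of the subtask
    operands: Dict[str, OperandDesc] = field(default_factory=dict)
    accesses: Dict[str, AccessType] = field(default_factory=dict)  # operand -> read/write
    explicit_deps: List[str] = field(default_factory=list)  # subtasks this one depends on
    required_resource_type: str = "compute"        # functional subsystem type it needs
```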
S120, constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks, and determining the architecture mapping sequence of each dependency relation group.
In this embodiment, by analyzing the original load description information of each subtask, the explicit or implicit logical dependency relationship between every two subtasks can be obtained. Specifically, the explicit dependency of subtask 1 on subtask 2 can be obtained directly by parsing the original load description information; or, the implicit dependency of subtask 3 on subtask 4 can be mined by determining that subtask 3 reads operand X after subtask 4 has written operand X.
Further, after the logical dependencies among the plurality of subtasks are obtained, the subtasks having direct or indirect logical dependencies may be divided into the same dependency group. Further, the plurality of subtasks may be divided into one or more dependency groupings. Alternatively, if a subtask and any subtask do not have a logical dependency relationship, the subtasks may be separately divided into an independent dependency relationship group, i.e., each dependency relationship group includes one or more subtasks.
If the number of the constructed dependency groups is a plurality, the architecture mapping order of each dependency group may be determined first. The architecture mapping order can be understood as the order in which each dependency group is in order to build a mapping relationship with each subsystem in the target architecture.
In a specific example, if the architecture mapping order determined for the dependency group 1 and the dependency group 2 is the dependency group 2- > dependency group 1, each subtask in the dependency group 2 may be mapped into the target architecture first, and then each subtask in the dependency group 1 may be mapped into the target architecture.
In this embodiment, a preset mapping order determining policy may be adopted to determine the architectural mapping order of each dependency group. The mapping order determining policy may be a dependency group with a large calculation amount of priority mapping, or may be a dependency group with a priority mapping matching the number of calculation units in the target architecture, which is not limited in this embodiment.
S130, respectively establishing task architecture mapping relations between each sub-task in each dependency relation group and each sub-system in the target architecture according to the architecture mapping sequence, and taking the task architecture mapping relations as modeling reference information in the benchmark test.
In this embodiment, after determining the architecture mapping sequence, each dependency relationship group may be sequentially obtained, and according to the logical dependency relationship of each sub-task in each dependency relationship group, each sub-task is sequentially obtained and mapped to each subsystem in the target architecture, so as to establish a task architecture mapping relationship between the sub-task in the load task and each subsystem in the target architecture.
Wherein, one or more mapping strategies constructed in advance can be used for mapping a specific sub-task to a specific subsystem in the target architecture. The mapping policy may specify that certain subtasks may be allocated to multiple computing resources, that certain subtasks be performed by only a single computing resource, that multiple subtasks may be allocated to the same computing resource for sequential execution, that heterogeneous tasks need to be sequentially allocated to a specific heterogeneous architecture according to their assigned architecture resources, and so on.
Of course, it can be understood that, in addition to the one or more pre-constructed mapping policies, user-defined mapping policies can be added manually by the tester through a reserved manual-intervention interface, for example a data-parallel mapping policy, a model-parallel mapping policy or a pipeline-parallel mapping policy, so as to support the tester's deliberate exploration of the mapping effect in a specific direction.
In this embodiment, after the task architecture mapping relationship between each sub-task in the load task and each sub-system in the target architecture is obtained, the task architecture mapping relationship may be used as a modeling reference information in the benchmark test, so as to provide efficient and usable data preparation for system modeling and architecture evaluation when the load task is configured on the target architecture to execute the benchmark test.
According to the embodiment of the invention, a load task to be loaded onto a target architecture for performing a benchmark test is acquired and split into a plurality of subtasks; at least one dependency relation group is constructed according to the logical dependency relations among the plurality of subtasks, and the architecture mapping order of each dependency relation group is determined; and, according to the architecture mapping order, task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture are respectively established and used as modeling reference information in the benchmark test. These technical means provide a novel, comprehensive, highly available and extensible architecture mapping approach for load tasks, provide efficient and usable data preparation for subsequent flexible and extensible benchmark test modeling, can reduce the development cost and period of benchmark test modeling to a certain extent, and help improve the efficiency of architecture evaluation, so that more, broader and deeper evaluations can be completed in the architecture exploration stage.
Example 2
Fig. 2 is a flowchart of a method for mapping a load task architecture according to a second embodiment of the present invention, where the method is based on the above embodiment, and in this embodiment, at least one dependency group is constructed according to a logical dependency relationship between a plurality of sub-tasks, and an architecture mapping order of each dependency group is determined, and implementation forms of operations such as establishing a task architecture mapping relationship between each sub-task in each dependency group and each subsystem in a target architecture are refined.
Accordingly, as shown in fig. 2, the method specifically may include:
s210, acquiring a load task to be loaded to a target architecture to execute a benchmark test, and splitting the load task into a plurality of subtasks.
In this embodiment, different load tasks may be distinguished by task identification, and a plurality of subtasks split by one load task are identified by using the task identification and the task number in combination.
In a specific example, splitting a load task whose task identification is S may produce subtasks of the form: S[1], S[2], S[3], …, S[n].
S220, deducing at least one subtask relation pair according to the original load description information of each subtask.
In this embodiment, each subtask relationship pair includes two subtasks having logical dependencies. That is, after splitting the load task into multiple subtasks, the multiple subtasks may be formed into one or more subtask relationship pairs. It will be appreciated that after deriving at least one subtask relationship pair, there may be 0, 1 or more isolated subtasks that do not belong to any subtask relationship pair, each of which has no logical dependency relationship with other subtasks. Meanwhile, different subtask relationship pairs can contain the same subtask, which indicates that the subtask has a logic dependency relationship with two or more other subtasks at the same time.
In an optional implementation manner of this embodiment, deriving at least one subtask relationship pair according to the original load description information of each subtask may include:
the method comprises the steps of analyzing the original load description information of each subtask to obtain operand description information and operand access types of each subtask, wherein the operand description information is identical, the operand access types are respectively two read and write subtasks, and a subtask relation pair is established.
Specifically, the access relationship of each subtask in the task load to each operand can be constructed from the original load description information of each subtask, for example in the form S[id] -> Access[id]. Accordingly, the logical dependencies between subtasks can be obtained by a Domain Product computation over these access expressions.
In one specific example: subtask S[3] performs a read operation on operand Tensor[1], from which a read-access expression can be constructed; subtask S[1] and subtask S[2] perform write operations on operands Tensor[1] and Tensor[2] respectively, from which the corresponding write-access expressions can be constructed.
Further, by substituting the above expressions into the domain product expression, the dependency of subtask S[3] on the operand Tensor[1] written by subtask S[1] can be calculated.
Correspondingly, it can be determined that subtask S[3] has an implicit logical dependency relationship on subtask S[1], and a subtask relation pair can then be constructed from S[3] and S[1].
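By way of example and not limitation, the derivation of subtask relation pairs can be sketched as a simple join of read and write accesses over shared operands, assuming the SubtaskDesc structure sketched earlier; the actual domain product computation over access expressions is only approximated here.

```python
from typing import Dict, List, Tuple

def derive_relation_pairs(subtasks: List[SubtaskDesc]) -> List[Tuple[str, str]]:
    """Derive (depended-upon, dependent) subtask relation pairs.

    Implicitly, a subtask that reads an operand depends on the subtask that wrote
    it; explicitly, pairs come from each subtask's explicit_deps list."""
    pairs: List[Tuple[str, str]] = []
    writers: Dict[str, List[str]] = {}            # operand name -> writing subtasks
    for st in subtasks:
        for op, acc in st.accesses.items():
            if acc is AccessType.WRITE:
                writers.setdefault(op, []).append(st.task_name)
    for st in subtasks:
        for op, acc in st.accesses.items():
            if acc is AccessType.READ:
                for w in writers.get(op, []):
                    if w != st.task_name:
                        pairs.append((w, st.task_name))   # e.g. ("S[1]", "S[3]")
        for dep in st.explicit_deps:
            pairs.append((dep, st.task_name))
    return pairs
```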
In another optional implementation manner of this embodiment, deriving at least one subtask relationship pair according to the original load description information of each subtask may further include:
and analyzing the original load description information of each subtask to obtain explicit dependency relationship description information, and establishing subtask relationship pairs respectively corresponding to the explicit dependency relationship description information.
Optionally, explicit dependency relationship description information in the form S[id] -> S[id+1] may be defined directly in the original load description information; by parsing the original load description information, a subtask relation pair can be constructed directly from S[id] and S[id+1].
Further, the logical dependency direction between the two subtasks may be specified in each subtask relation pair, i.e., it is specified which of the two subtasks is depended upon. In a specific example of the form S[id] -> S[id+1], the subtask S[id+1] to the right of the arrow is the subtask that is depended upon.
S230, constructing at least one dependency relation group according to each subtask relation pair.
In an optional implementation manner of this embodiment, the constructing at least one dependency relationship group according to each subtask relationship pair may include:
constructing at least one dependency tree by taking the subtasks as nodes according to the same subtasks contained in each subtask relation pair, wherein any subtask on different dependency trees has no dependency; the subtasks on each dependency tree are partitioned into the same dependency group.
In this alternative embodiment, by combining pairs of subtask relationships that include the same subtask, one or more dependency trees may be constructed that are node-wise subtasks.
Specifically, fig. 3 shows a schematic structural diagram of a dependency relationship group to which the technical solution of the embodiment of the present invention is applicable. Taking the dependency relationship group 1 in fig. 3 as an example, it is assumed that subtask relationship pair 1 includes subtask 1 and subtask 2, and subtask 2 depends on subtask 1; subtask relation pair 2 comprises subtask 1 and subtask 3, and subtask 3 depends on subtask 1; subtask relationship pair 3 includes subtask 3 and subtask 4, subtask 4 depending on subtask 3. By analyzing the same subtasks in subtask relationship pair 1, subtask relationship pair 2, and subtask relationship pair 3, a dependency tree 1 as shown in FIG. 3 can be constructed, and each subtask on the dependency tree 1 is divided into dependency groups 1. Wherein upper nodes in the dependency tree 1 are relied upon by lower nodes.
It should be noted that, in the process of constructing the subtask relationship pair, an isolated subtask that does not have a logical dependency relationship with any subtask may occur, and such isolated subtasks may be separately grouped according to a separate dependency relationship, for example, the subtask 11 is only included in the dependency relationship group 3 shown in fig. 3.
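By way of example and not limitation, the grouping of subtasks into dependency relation groups can be sketched as a connected-components computation over the subtask relation pairs (for instance with a union-find structure), with isolated subtasks naturally forming single-element groups:

```python
from typing import Dict, List, Tuple

def build_dependency_groups(all_subtasks: List[str],
                            pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Group subtasks so that directly or indirectly dependent ones share a group."""
    parent = {t: t for t in all_subtasks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                 # union the two components

    groups: Dict[str, List[str]] = {}
    for t in all_subtasks:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())            # isolated subtasks become 1-element groups
```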
S240, determining the architecture mapping sequence of each dependency relation group.
Specifically, the policy may be determined according to a preset mapping order, and an architectural mapping order of each dependency group may be determined.
In an optional implementation manner of this embodiment, determining the architectural mapping order of each dependency group may include:
respectively calculating attribute values of all sub-tasks in each dependency relation group under at least one task attribute, and determining attribute total values respectively corresponding to each dependency relation group according to calculation results; determining the architecture mapping sequence of each dependency relation group according to the total value of each attribute; wherein the task attributes include at least one of a calculation amount, an operation access amount, and a priority index.
In this optional embodiment, the attribute value of each subtask in each dependency relation group under at least one task attribute can be calculated from the original load description information of each subtask. Taking the calculation amount as an example, the calculation amount of each subtask can be evaluated by analyzing the domain description information, operand description information and operand access type of each subtask in the dependency relation group, and the calculation amounts of the subtasks are then accumulated to obtain a total calculation amount as the attribute total value of the dependency relation group. The architecture mapping order of the dependency relation groups can then be determined in order of the total calculation amount from large to small, or from small to large.
The calculation amount of a subtask can be understood as the minimum number of operations required to execute the task function of the subtask; the operation access amount can be understood as the minimum number of operand read and write operations required to execute the task function of the subtask; and the priority index can be understood as a preset execution priority, which may take several levels such as high, medium or low.
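By way of example and not limitation, the attribute-total policy can be sketched as follows, assuming a per-subtask estimator is available (the estimate_flops name in the usage comment is hypothetical):

```python
from typing import Callable, List

def order_groups_by_total(groups: List[List[str]],
                          attr_of: Callable[[str], float],
                          largest_first: bool = True) -> List[List[str]]:
    """Order dependency relation groups by the summed attribute value of their subtasks."""
    totals = [(sum(attr_of(t) for t in g), g) for g in groups]
    totals.sort(key=lambda item: item[0], reverse=largest_first)
    return [g for _, g in totals]

# usage sketch: order by estimated calculation amount (estimate_flops is hypothetical)
# ordered_groups = order_groups_by_total(groups, attr_of=estimate_flops)
```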
In another optional implementation manner of this embodiment, determining an architectural mapping order of each dependency group may further include:
calculating the matching degree index between each sub-task in each dependency relation group and each subsystem in the target architecture respectively, and determining the index total value corresponding to each dependency relation group according to the calculation result; determining the architecture mapping sequence of each dependency relation group according to the total value of each index; the matching degree index comprises at least one of the matching degree of the number of the computing units, the matching degree of the computing capacity and the consistency of heterogeneous attributes.
The matching degree of the number of computing units may be understood as the degree to which the number of computing units required to execute the task function of a subtask matches the number of computing units contained in each subsystem of the target architecture; optionally, the closer the two numbers are, the higher the matching degree. The computing power matching degree can be understood as the degree to which the computing power consumed by executing the task function of a subtask matches the computing power of each subsystem in the target architecture. Heterogeneous attribute consistency can be understood as the degree to which the heterogeneous types of hardware resources required to execute the task function of a subtask (e.g., two heterogeneous resources such as a codec and a CPU) match the heterogeneous hardware resources that the subsystems in the target architecture can provide.
Fig. 4 is a schematic logic diagram of determining the architecture mapping order of each dependency group, to which the technical solution of the embodiment of the present invention is applicable. As shown in fig. 4, taking as the mapping order determining policy the matching degree between each sub-task in each dependency group and each subsystem in the target architecture, after the architecture definition information of the target architecture is obtained, the architecture mapping order of dependency group 1, dependency group 2 and dependency group 3 can be determined by combining the original load description information of each sub-task in each dependency group; in this example, dependency group 2 is mapped first, dependency group 3 next, and dependency group 1 last.
S250, acquiring a target dependency relation group of current processing according to the architecture mapping sequence, and acquiring target subtasks of the current processing according to the logic dependency relation among the subtasks in the target dependency relation group.
In this embodiment, after determining the architecture mapping order, each current processing target dependency relationship group may be sequentially obtained, and each current processing target dependency relationship group may be sequentially mapped according to the logical dependency relationship between the subtasks.
Alternatively, the subtasks in the target dependency relation group may be sequentially acquired for architecture mapping in order from the depended-upon subtasks toward the dependent subtasks. In a specific example, for the dependency group 1 shown in fig. 3, the subtasks may be mapped in the order of subtask 1, subtask 2, subtask 3, and subtask 4.
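By way of example and not limitation, the order from the depended-upon subtasks toward the dependent subtasks corresponds to a topological order of the dependency tree; a minimal sketch, assuming the relation pairs are oriented as (depended-upon, dependent):

```python
from collections import deque
from typing import Dict, List, Tuple

def topo_order(group: List[str], pairs: List[Tuple[str, str]]) -> List[str]:
    """Kahn's algorithm: depended-upon subtasks are emitted before their dependents."""
    members = set(group)
    indegree: Dict[str, int] = {t: 0 for t in group}
    children: Dict[str, List[str]] = {t: [] for t in group}
    for src, dst in pairs:                    # src is depended upon by dst
        if src in members and dst in members:
            children[src].append(dst)
            indegree[dst] += 1
    ready = deque(t for t in group if indegree[t] == 0)
    order: List[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order   # e.g. subtask 1, 2, 3, 4 for dependency group 1 of FIG. 3
```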
S260, determining all available hardware resources under the expected execution time point matched with the target subtask according to the current residual hardware resources in the target architecture and the expected release time point of each occupied hardware resource.
In this embodiment, taking a target subtask as an example, a specific implementation manner of performing architecture mapping on the target subtask is described. Specifically, first, the current remaining hardware resources in the target architecture may be determined according to all the hardware resources included in the target architecture and the occupied hardware resources that have been allocated to other subtasks before mapping to the target subtasks. Then, according to the logic dependency relationship of each subtask, the expected execution time point of each subtask and the expected release time point of the occupied hardware resource can be determined.
It will be appreciated that the goal of the architecture mapping for each sub-task in the load task is to allocate each sub-task to a subsystem in the target architecture for execution. Therefore, the execution of the first sub-task must correspond to an initialized starting time point; once the starting time point is determined, the expected execution time point of each sub-task relative to the starting time point can be determined according to the logical dependency relations between the sub-tasks. Furthermore, after the function type of each subtask and the architecture description information of the target architecture are determined, the execution end time of each subtask, that is, the expected release time point of its occupied hardware resources, can also be determined or estimated.
Based on the above information, all available hardware resources at the expected execution time point matched with the target subtask can be predicted in advance, before the target subtask is executed. At this time, the subsystem best adapted to the target subtask can be acquired from all the available hardware resources, and a mapping relationship can be established between that subsystem and the target subtask.
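By way of example and not limitation, the bookkeeping of remaining hardware resources and expected release time points can be sketched as follows; all names are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ResourceTracker:
    """Tracks which subsystems are free and when occupied ones are expected back."""
    all_resources: List[str]                                       # subsystem ids
    release_time: Dict[str, float] = field(default_factory=dict)   # id -> expected release

    def available_at(self, t: float) -> List[str]:
        """Subsystems free now, or expected to be released by time point t."""
        return [r for r in self.all_resources
                if self.release_time.get(r, 0.0) <= t]

    def occupy(self, resource: str, expected_release: float) -> None:
        self.release_time[resource] = expected_release

# usage sketch
# tracker = ResourceTracker(all_resources=["SIP[1]", "SIP[2]", "SIP[3]"])
# tracker.occupy("SIP[1]", expected_release=120.0)
# candidates = tracker.available_at(t=100.0)   # -> ["SIP[2]", "SIP[3]"]
```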
S270, acquiring a target subsystem matched with the target sub-task under all available hardware resources according to a preset mapping strategy, and establishing a task architecture mapping relation between the target sub-task and the target subsystem.
As mentioned above, the mapping policy may be preset according to practical needs; for example, the target sub-task may be mapped to one or more target subsystems matching its required resource type. In the present embodiment, according to the architecture definition information of the target architecture, the different subsystems in the target architecture can be represented as Arch[id]. In an alternative implementation of this embodiment, as shown in fig. 5, each heterogeneous sub-task included in the load task may be mapped to a corresponding heterogeneous subsystem in the target architecture.
Further, fig. 6 is a logic schematic diagram illustrating a task architecture mapping relationship between each sub-task in each dependency relationship group and each sub-system in the target architecture, which is applicable to the technical method of the embodiment of the present invention. As shown in fig. 6, after obtaining the architecture definition information of the target architecture and the ordered dependency groups, each sub-task may be mapped sequentially to each subsystem based on the logical dependency of each sub-task in each dependency group and a preset mapping policy, so as to finally form a sub-task list marked with resource mapping information, that is, a task architecture mapping relationship between the target sub-task and the target subsystem is established.
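By way of example and not limitation, the selection of a target subsystem under a simple mapping policy (match the subtask's required resource type and take the first available candidate) can be sketched as follows, reusing the structures sketched earlier; richer policies such as data-parallel or pipeline-parallel mapping would replace the selection rule:

```python
from typing import List, Optional

def select_target_subsystem(subtask: SubtaskDesc,
                            available: List[str],
                            arch: TargetArchitecture) -> Optional[str]:
    """Pick one available subsystem whose function matches the required resource type."""
    for sub_id in available:
        sub = arch.subsystems.get(sub_id)
        if sub is not None and sub.function == subtask.required_resource_type:
            return sub_id          # establishes the mapping: subtask -> sub_id
    return None                    # no match; a fallback or manual policy would apply
```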
According to the technical scheme of this embodiment, at least one subtask relation pair is derived from the original load description information of each subtask; at least one dependency relation group is constructed from the subtask relation pairs; the currently processed target dependency relation group is acquired according to the architecture mapping order, and the currently processed target subtask is acquired according to the logical dependency relations among the subtasks in the target dependency relation group; all available hardware resources at the expected execution time point matched with the target subtask are determined according to the current remaining hardware resources in the target architecture and the expected release time points of the occupied hardware resources; and, according to a preset mapping policy, a target subsystem matched with the target subtask is acquired from all available hardware resources, and a task architecture mapping relation between the target subtask and the target subsystem is established. These technical means provide a novel, comprehensive, highly available and extensible architecture mapping approach for load tasks, provide efficient and usable data preparation for subsequent flexible and extensible benchmark test modeling, can reduce the development cost and period of benchmark test modeling to a certain extent, and help improve the efficiency of architecture evaluation, so that more, broader and deeper evaluations can be completed in the architecture exploration stage.
Example 3
Fig. 7 is a flowchart of a method for mapping a load task according to a third embodiment of the present invention, which is refined based on the above embodiment, and in this embodiment, operations of deriving a data relay space node and a data relay time node are added, so that more abundant modeling reference information can be added to the benchmark test.
Accordingly, as shown in fig. 7, the method specifically may include:
s710, acquiring a load task to be loaded to a target architecture to execute a benchmark test, and splitting the load task into a plurality of subtasks.
S720, constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks, and determining the architecture mapping sequence of each dependency relation group.
And S730, respectively establishing task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping sequence, and taking the task architecture mapping relations as modeling reference information in the benchmark test.
S740, deducing the data relay space node matched with each sub-task according to the logic dependency relationship among the sub-tasks, the task architecture mapping relationship among the sub-tasks and the sub-systems and the hierarchical architecture relationship among the sub-systems in the target architecture.
In this embodiment, the concept of a data relay node is introduced, where the data relay node includes a data relay space node and a data relay time node. The data relay space node can be understood as a subsystem where the data exchange operation is located when two sub-tasks with a dependency relationship perform the data exchange operation in the target architecture.
Specifically, as shown in FIG. 8, in the target architecture, subsystem A, subsystem B (B-1 and B-2), and subsystem C (C-1, C-2, C-3, and C-4) form a top-down hierarchical structure. In the architecture mapping process, sub-task 1 of the load task is mapped to subsystems C-1 and C-2, and sub-task 2 is mapped to subsystems C-3 and C-4. Since the logical dependency relationship between sub-task 1 and sub-task 2 is known, and the hierarchical architecture relationship and connection relationship among the subsystems are known, it can be determined that the data relay space node between sub-task 1 and sub-task 2 can only be subsystem A. Similarly, as shown in FIG. 9, since there is a direct connection between subsystem B-1 and subsystem C-2, and a direct connection between subsystem B-2 and subsystem C-1, the data relay space node between sub-task 1 and sub-task 2 may be subsystem B-1 or subsystem B-2.
After the data relay space node is accurately determined, the method can be used for correcting and optimizing the mapping relation between the sub-task and the sub-system, and can also be used as new modeling reference information for realizing benchmark test aiming at the load task and the target architecture so as to help improve the efficiency of architecture assessment.
In an optional implementation manner of this embodiment, the deriving the data relay space node matched with each sub-task according to the logical dependency relationship between each sub-task, the task architecture mapping relationship between each sub-task and each sub-system, and the hierarchical architecture relationship between each sub-system in the target architecture may include:
acquiring a first subtask and a second subtask having a logical dependency relationship, wherein the first subtask depends on the second subtask; acquiring, from the task architecture mapping relations between the sub-tasks and the subsystems, the target subsystem matched with the second subtask; according to the hierarchical architecture relationship among the subsystems in the target architecture, sequentially acquiring, in order from lower layers to higher layers, an alternative subsystem having a data access relationship with the target subsystem; if it is determined that a connection relationship exists between the first subtask and the alternative subsystem, taking the alternative subsystem as the data relay space node between the first subtask and the second subtask; and if no connection relationship exists between the first subtask and the alternative subsystem, returning to the operation of sequentially acquiring an alternative subsystem having a data access relationship with the target subsystem, until the data relay space node between the first subtask and the second subtask is determined.
For example, assume that subtask S[1] depends on subtask S[2], i.e., S[1] -> S[2], and that subtask S[2] is mapped onto computing unit SIP[2] by the architecture mapping, i.e., S[2] -> SIP[2]. By performing a composition operation on the two expressions, S[1] -> SIP[2] can be calculated.
This expression indicates that subtask S[1] also has a dependency relationship on the data accessed by the computing unit SIP[2].
Traversing the target architecture layer by layer from the bottom-layer subsystems toward the highest-layer subsystem, a storage subsystem L1[2] having a data access relationship with SIP[2] is obtained first, i.e., SIP[2] -> L1[2]; by performing the composition calculation again, S[1] -> L1[2] can be obtained.
This expression indicates that subtask S[1] has a logical dependency on the data residing in the storage subsystem L1[2]; at this point, it must further be judged whether S[1] and L1[2] have a connection relationship in the target architecture:
if a connection relationship exists, L1[2] is determined to be the data relay space node of S[1] and S[2]; if no connection relationship exists, the next higher-level storage subsystem above L1[2] continues to be searched, until the data relay space node of S[1] and S[2] is determined.
S750, deducing the data relay time node matched with each subsystem according to the hierarchical structure relation among the subsystems in the target structure and the preset storage management rule.
In this embodiment, the data relay time node may be understood as a time point when two sub-tasks with logical dependency relationships implement data relay operations in a certain subsystem. After determining the data relay time node, the data relay time node can be used for correcting and optimizing the data relay space node and correcting and optimizing the mapping relation between the sub-tasks and the sub-systems. In addition, the model reference information can be used as new model reference information for realizing benchmark test for load tasks and target architecture, so as to help improve the efficiency of architecture assessment.
In an optional implementation manner of this embodiment, the deriving the data relay time node matched by each subsystem according to the hierarchical architecture relationship between each subsystem in the target architecture and the preset storage management rule may include:
according to a preset storage management rule and operation data description information of each sub-task, acquiring the residence condition of each subsystem in the target architecture on one or more operands at each time point; a data relay time node matched with each subsystem is deduced according to the residence condition of each subsystem in the target architecture to one or more operands at each time point.
It will be appreciated that a sub-task, when executed, can only reuse data that still resides in the data relay space node; if, at the time point when the data reuse would occur, the data has already been cleared from the data relay space node (i.e., the data no longer resides there), the sub-task cannot reuse it at that node and the data has to be carried in from a more distant external storage space.
By way of example and not limitation, FIG. 10 shows a schematic diagram of the residence of operands A, B and C in one subsystem at time points n-1, n and n+1. Optionally, the preset storage management rule may include a scoreboard algorithm based on reuse gain, a dynamic programming algorithm based on a loss model, and the like, which is not limited in this embodiment.
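By way of example and not limitation, the residency bookkeeping behind FIG. 10 can be sketched as follows, with a trivial oldest-first eviction rule standing in for the scoreboard or dynamic-programming rules mentioned above:

```python
from typing import Dict, List, Set

def residency_timeline(writes: Dict[int, List[str]],
                       capacity: int) -> Dict[int, Set[str]]:
    """Which operands reside in one subsystem at each time point.

    writes maps a time point to the operands written there; when capacity is
    exceeded the oldest operands are evicted first (a placeholder rule)."""
    resident: List[str] = []
    timeline: Dict[int, Set[str]] = {}
    for t in sorted(writes):
        for op in writes[t]:
            if op in resident:
                resident.remove(op)     # refresh the operand's position
            resident.append(op)
        while len(resident) > capacity:
            resident.pop(0)             # evict the oldest operand
        timeline[t] = set(resident)
    return timeline

# usage sketch: operands A, B and C over three time points (cf. FIG. 10)
# timeline = residency_timeline({0: ["A", "B"], 1: ["C"], 2: ["A"]}, capacity=2)
# a data relay time node for an operand is any time point at which it still resides
```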
S760, taking the data relay space node and the data relay time node as modeling reference information in the benchmark test.
According to the technical scheme of this embodiment, the data relay space node matched with each sub-task is derived according to the logical dependency relations among the sub-tasks, the task architecture mapping relations between the sub-tasks and the subsystems, and the hierarchical architecture relation among the subsystems in the target architecture; the data relay time node matched with each subsystem is derived according to the hierarchical architecture relation among the subsystems in the target architecture and a preset storage management rule; and the data relay space nodes and the data relay time nodes are used as modeling reference information in the benchmark test. By adding the operations of deriving the data relay space nodes and the data relay time nodes, richer modeling reference information can be provided for the benchmark test, which further helps improve the efficiency of architecture evaluation, so that more, broader and deeper evaluations can be completed in the architecture exploration stage.
On the basis of the above embodiments, after deriving the data relay time node matched with each subsystem according to the hierarchical architecture relationship between each subsystem in the target architecture and the preset storage management rule, the method may further include:
and correcting the task architecture mapping relation between each sub-task and each subsystem in the target architecture by using the data relay space node and the data relay time node.
Through the arrangement, the task architecture mapping relation between each sub-task and each subsystem in the target architecture is more reasonable, and the availability is higher.
Specific application scene
Fig. 11 is a schematic diagram of a specific application scenario to which the technical solution of the embodiment of the present invention is applicable. In this application scenario, in order to map the load task onto the target architecture, the original load description information of each subtask in the load task and the architecture definition information of the target architecture are first acquired; optionally, a manual intervention policy may also be input, which may include one or both of a mapping order determining policy and a mapping policy. If no manual intervention policy is input, a default mapping order determining policy and a default mapping policy are used. Based on the above information, the architecture mapping process is performed and the task architecture mapping relations between each sub-task in the load task and each subsystem in the target architecture are output; based on these mapping relations, the two types of data relay nodes, namely the data relay space nodes and the data relay time nodes, can be derived, and by optionally feeding the data relay nodes back, the task architecture mapping relations between the sub-tasks and the subsystems can be updated and corrected during the architecture mapping process. Finally, the task architecture mapping relations and the two types of data relay nodes can be used as modeling reference information for the benchmark test.
Example IV
Fig. 12 is a block diagram of a load task architecture mapping device according to a fourth embodiment of the present invention. As shown in fig. 12, the apparatus includes: a subtask splitting module 1210, an architecture mapping order determining module 1220, and a task architecture mapping relation establishing module 1230, wherein:
the subtask splitting module 1210 is configured to obtain a load task to be loaded to a target architecture for performing a benchmark test, and split the load task into a plurality of subtasks;
the architecture mapping order determining module 1220 is configured to construct at least one dependency group according to the logical dependencies among the plurality of subtasks, and determine an architecture mapping order of each dependency group;
the task architecture mapping relation establishing module 1230 is configured to respectively establish a task architecture mapping relation between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping order, and use the task architecture mapping relation as modeling reference information in the benchmark test.
According to the embodiment of the present invention, a load task to be loaded onto a target architecture for a benchmark test is acquired and split into a plurality of subtasks; at least one dependency group is constructed according to the logical dependencies among the subtasks, and the architecture mapping order of each dependency group is determined; and the task architecture mapping relationship between each subtask in each dependency group and each subsystem in the target architecture is established according to the architecture mapping order and used as modeling reference information in the benchmark test. This provides a comprehensive, highly available and extensible architecture mapping approach for load tasks, offers efficient and usable data preparation, with flexibility and extensibility, for subsequent benchmark-test modeling, can reduce the development cost and cycle of benchmark-test modeling to a certain extent, and assists in improving the efficiency of architecture evaluation, so that more, broader and deeper evaluations can be completed in the architecture exploration stage.
Based on the above embodiments, the architecture mapping order determining module 1220 may specifically include:
the subtask relation pair deducing unit is used for deducing at least one subtask relation pair according to the original load description information of each subtask, wherein each subtask relation pair comprises two subtasks with logic dependency relations;
and the dependency relation group construction unit is used for constructing at least one dependency relation group according to each subtask relation pair.
On the basis of the above embodiments, the subtask relationship pair deriving unit may be specifically configured to:
analyzing the original load description information of each subtask to obtain the operand description information and the operand access type of each subtask, and establishing a subtask relationship pair for two subtasks whose operand description information is the same and whose operand access types are read and write, respectively;
and/or
And analyzing the original load description information of each subtask to obtain explicit dependency relationship description information, and establishing subtask relationship pairs respectively corresponding to the explicit dependency relationship description information.
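For illustration, such subtask relationship pairs might be derived as in the following minimal Python sketch; the field names "name", "operands" and "depends_on" are hypothetical stand-ins for the original load description information, not names used by the embodiment.

    from itertools import combinations

    def derive_relation_pairs(subtasks):
        # subtasks: list of dicts, e.g.
        # {"name": "t1", "operands": {"A": "read"}, "depends_on": ["t0"]}
        pairs = set()
        # Implicit pairs: two subtasks touching the same operand,
        # one writing it and the other reading it.
        for a, b in combinations(subtasks, 2):
            for op, acc_a in a["operands"].items():
                acc_b = b["operands"].get(op)
                if acc_b is not None and {acc_a, acc_b} == {"read", "write"}:
                    pairs.add((a["name"], b["name"]))
        # Explicit pairs: dependencies stated directly in the load description.
        names = {t["name"] for t in subtasks}
        for t in subtasks:
            for dep in t.get("depends_on", []):
                if dep in names and dep != t["name"]:
                    pairs.add((dep, t["name"]))
        return pairs

In this sketch, two subtasks writing and reading the same operand yield an implicit pair (the read/write branch above), and explicitly described dependencies yield the second kind of pair.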
On the basis of the above embodiments, the dependency group construction unit may specifically be configured to:
constructing at least one dependency tree with the subtasks as nodes according to the same subtasks contained in the subtask relationship pairs, wherein subtasks on different dependency trees have no dependency on each other;
the subtasks on each dependency tree are partitioned into the same dependency group.
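As an illustrative sketch of this grouping step, a union-find over the subtask relationship pairs is one possible way to realise the "same subtask" grouping; it is an assumption for illustration, not necessarily the embodiment's data structure, and it reuses the pair representation of the previous sketch.

    def build_dependency_groups(pairs):
        # Union-find over the subtask relationship pairs: pairs that share a
        # subtask end up on the same dependency tree, i.e. in the same group.
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        for a, b in pairs:
            parent[find(a)] = find(b)

        groups = {}
        for task in list(parent):
            groups.setdefault(find(task), set()).add(task)
        return list(groups.values())

    # Example: {("t0", "t1"), ("t1", "t2"), ("t3", "t4")} -> two groups,
    # {"t0", "t1", "t2"} and {"t3", "t4"}.  A subtask that appears in no pair
    # would form a group of its own (not shown).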
Based on the above embodiments, the architecture mapping order determining module 1220 may specifically be configured to:
respectively calculating attribute values of all sub-tasks in each dependency relation group under at least one task attribute, and determining attribute total values respectively corresponding to each dependency relation group according to calculation results;
determining the architecture mapping sequence of each dependency relation group according to the total value of each attribute;
wherein the task attributes include at least one of a calculation amount, an operation access amount, and a priority index.
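A minimal sketch of this ordering policy follows; the attribute names "compute", "mem_access" and "priority", the equal default weights and the "larger total first" rule are assumptions for illustration only.

    def order_groups_by_attribute_totals(groups, attrs, weights=None):
        # groups: list of sets of subtask names.
        # attrs:  {subtask: {"compute": ..., "mem_access": ..., "priority": ...}}
        weights = weights or {"compute": 1.0, "mem_access": 1.0, "priority": 1.0}

        def total(group):
            return sum(weights[k] * attrs[t].get(k, 0.0)
                       for t in group for k in weights)

        # One possible policy: groups with a larger attribute total are mapped first.
        return sorted(groups, key=total, reverse=True)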
Based on the above embodiments, the architecture mapping order determining module 1220 may specifically be configured to:
calculating the matching degree index between each sub-task in each dependency relation group and each subsystem in the target architecture respectively, and determining the index total value corresponding to each dependency relation group according to the calculation result;
determining the architecture mapping sequence of each dependency relation group according to the total value of each index;
The matching degree index comprises at least one of the matching degree of the number of the computing units, the matching degree of the computing capacity and the consistency of heterogeneous attributes.
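Similarly, ordering by matching-degree totals could be sketched as below; the field names and the equal-weight combination of the three factors are assumptions, not the embodiment's exact formula.

    def matching_index(subtask, subsystem):
        # Hypothetical matching-degree index combining the three factors above.
        unit_match = min(subtask["units"], subsystem["units"]) / subtask["units"]
        cap_match = min(subtask["flops"], subsystem["flops"]) / subtask["flops"]
        hetero_match = 1.0 if subtask["kind"] == subsystem["kind"] else 0.0
        return (unit_match + cap_match + hetero_match) / 3.0

    def order_groups_by_matching(groups, subtask_info, subsystems):
        def total(group):
            return sum(matching_index(subtask_info[t], s)
                       for t in group for s in subsystems)
        # Groups with a larger matching-degree total are mapped first.
        return sorted(groups, key=total, reverse=True)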
Based on the above embodiments, the task architecture mapping relationship establishment module 1230 may be specifically configured to:
acquiring a target dependency relation group which is currently processed according to the architecture mapping sequence, and acquiring target subtasks which are currently processed according to the logic dependency relation among all subtasks in the target dependency relation group;
determining all available hardware resources at the estimated execution time point matched with the target subtask according to the current residual hardware resources in the target architecture and the estimated release time points of the occupied hardware resources;
and acquiring a target subsystem matched with the target sub-task under all available hardware resources according to a preset mapping strategy, and establishing a task architecture mapping relation between the target sub-task and the target subsystem.
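An illustrative sketch of this step under assumed inputs: the field names "ready", "duration", "units" and "free_at" are hypothetical, "free_at" stands in for the estimated release time point of occupied resources, and "earliest_start" is just one example of a preset mapping strategy.

    def map_group(group_tasks, subsystems, policy="earliest_start"):
        # group_tasks: subtasks of the current dependency group, already ordered
        #              by their logical dependencies, e.g.
        #              {"name": "t0", "ready": 0.0, "duration": 2.0, "units": 4}
        # subsystems:  e.g. {"name": "s0", "units": 8, "free_at": 0.0}
        mapping = {}
        for task in group_tasks:
            # Subsystems whose resources are available (or released in time).
            candidates = [s for s in subsystems if s["units"] >= task["units"]]
            if not candidates:
                raise RuntimeError("no subsystem can host " + task["name"])
            if policy == "earliest_start":
                target = min(candidates,
                             key=lambda s: max(s["free_at"], task["ready"]))
            else:  # e.g. "best_fit": smallest subsystem that still fits
                target = min(candidates, key=lambda s: s["units"])
            start = max(target["free_at"], task["ready"])
            target["free_at"] = start + task["duration"]  # estimated release time
            mapping[task["name"]] = (target["name"], start)
        return mapping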
On the basis of the above embodiments, the method may further include:
the data relay space node deducing unit is used for deducing the data relay space node matched with each sub-task according to the logic dependency relationship among the sub-tasks, the task architecture mapping relationship among the sub-tasks and the sub-systems and the hierarchical architecture relationship among the sub-systems in the target architecture after respectively establishing the task architecture mapping relationship among the sub-tasks in each dependency relationship group and the sub-systems in the target architecture according to the architecture mapping sequence;
The data relay time node deducing unit is used for deducing the data relay time node matched with each subsystem according to the hierarchical structure relation among the subsystems in the target structure and the preset storage management rule;
the new modeling reference information adding unit is used for taking the data relay space node and the data relay time node as one modeling reference information in the benchmark test.
On the basis of the above embodiments, the data relay spatial node deriving unit may be specifically configured to:
acquiring a first subtask and a second subtask with a logic dependency relationship, wherein the first subtask is depended on by the second subtask;
acquiring a target subsystem matched with a second sub-task in a task architecture mapping relation between each sub-task and each sub-system;
according to the hierarchical architecture relation among all subsystems in the target architecture, sequentially acquiring an alternative subsystem with a data access relation with a second subsystem according to the sequence from a lower layer to a higher layer;
if the connection relation between the first sub-task and the alternative sub-system is determined, the alternative sub-system is used as a data relay node between the first sub-task and the second sub-task;
And if the first sub-task and the alternative sub-system do not have the connection relation, returning to execute the operation of sequentially acquiring one alternative sub-system which has the data access relation with the second sub-system until the data relay node between the first sub-task and the second sub-task is determined.
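A minimal sketch of this bottom-up walk, assuming the hierarchy is given as a parent map and reachability as a predicate; both are hypothetical representations of the architecture definition information, and the first subtask is represented here by the subsystem it is mapped to.

    def derive_space_node(first_sys, second_sys, higher_level, connected):
        # Walk from the subsystem mapped to the second subtask towards higher
        # levels of the hierarchy; the first level that the first subtask's
        # subsystem can also reach is taken as the data relay (space) node.
        # higher_level: {subsystem: next higher subsystem or None}
        # connected(a, b): True if subsystem a has a data access relation to b.
        candidate = second_sys
        while candidate is not None:
            if connected(first_sys, candidate):
                return candidate
            candidate = higher_level.get(candidate)
        return None  # no common level found in this sketch

    # Example hierarchy: core0 -> L2_cluster0 -> L3, core1 -> L2_cluster1 -> L3
    higher = {"core0": "L2_cluster0", "core1": "L2_cluster1",
              "L2_cluster0": "L3", "L2_cluster1": "L3", "L3": None}

    def connected(a, b):
        node = a
        while node is not None:
            if node == b:
                return True
            node = higher.get(node)
        return False

    print(derive_space_node("core0", "core1", higher, connected))  # -> L3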
On the basis of the above embodiments, the data relay time node deriving unit may be specifically configured to:
according to a preset storage management rule and operation data description information of each sub-task, acquiring the residence condition of each subsystem in the target architecture on one or more operands at each time point;
a data relay time node matched with each subsystem is deduced according to the residence condition of each subsystem in the target architecture to one or more operands at each time point.
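An illustrative sketch of this derivation, assuming the residence situation is available as a per-subsystem timeline of resident operand sets (as in the FIG. 10 example) and treating every change of the resident set as a relay time node; this is one possible reading, not the embodiment's exact rule.

    def derive_time_nodes(residence):
        # residence: {subsystem: {time_point: set of resident operands}}
        relay_nodes = {}
        for subsystem, timeline in residence.items():
            nodes, previous = [], None
            for t in sorted(timeline):
                current = timeline[t]
                if previous is not None and current != previous:
                    nodes.append(t)   # resident set changed: data relayed here
                previous = current
            relay_nodes[subsystem] = nodes
        return relay_nodes

    # Example: {"L2": {0: {"A"}, 1: {"A", "B"}, 2: {"C"}}} -> {"L2": [1, 2]}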
On the basis of the above embodiments, the method may further include a relationship correction module, configured to:
after the data relay time nodes matched with all the subsystems are deduced according to the hierarchical architecture relation among all the subsystems in the target architecture and the preset storage management rules, the task architecture mapping relation among all the subsystems in the target architecture is corrected by using the data relay space nodes and the data relay time nodes.
The architecture mapping device for the load task provided by the embodiment of the invention can execute the architecture mapping method for the load task provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example five
Fig. 13 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. As shown in Fig. 13, the computer device includes a processor 1300, a memory 1310, an input device 1320 and an output device 1330. The number of processors 1300 in the computer device may be one or more; one processor 1300 is taken as an example in Fig. 13. The processor 1300, the memory 1310, the input device 1320 and the output device 1330 in the computer device may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 13.
The memory 1310 is a computer-readable storage medium and may be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the architecture mapping method for load tasks in the embodiment of the present invention (e.g., the subtask splitting module 1210, the architecture mapping order determining module 1220 and the task architecture mapping relation establishing module 1230 shown in Fig. 12). The processor 1300 executes various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 1310, that is, implements the architecture mapping method for load tasks described above.
Namely: acquiring a load task to be loaded to a target architecture for performing a benchmark test, and splitting the load task into a plurality of subtasks; constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks, and determining the architecture mapping sequence of each dependency relation group; and respectively establishing task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping sequence, and taking the task architecture mapping relations as modeling reference information in the benchmark test.
The memory 1310 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required by a function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 1310 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 1310 may further include memories remotely arranged relative to the processor 1300, and these remote memories may be connected to the device/terminal/server through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The input device 1320 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output device 1330 may include a display device such as a display screen.
Example six
A sixth embodiment of the present invention also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform an architecture mapping method of load tasks, the method comprising:
acquiring a load task to be loaded to a target architecture for performing a benchmark test, and splitting the load task into a plurality of subtasks; constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks, and determining the architecture mapping sequence of each dependency relation group; and respectively establishing task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping sequence, and taking the task architecture mapping relations as modeling reference information in the benchmark test.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present invention is not limited to the above-described method operations, but may also perform the related operations in the architecture mapping method of the load task provided in any embodiment of the present invention.
From the above description of the embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software plus necessary general-purpose hardware, and may of course also be implemented by hardware, although in many cases the former is a preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the above embodiments of the architecture mapping apparatus, the units and modules included therein are only divided according to functional logic, and the division is not limited thereto as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only used to distinguish them from each other and are not used to limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied thereto. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the present invention. Therefore, although the present invention has been described in connection with the above embodiments, it is not limited to those embodiments and may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (13)

1. A method for architecture mapping of load tasks, comprising:
acquiring a load task to be loaded to a target architecture for performing a benchmark test, and splitting the load task into a plurality of subtasks;
constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks, and determining the architecture mapping sequence of each dependency relation group;
respectively establishing task architecture mapping relations between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping sequence, and taking the task architecture mapping relations as modeling reference information in the benchmark test;
Deducing a data relay space node matched with each sub-task according to the logic dependency relationship among the sub-tasks, the task architecture mapping relationship among the sub-tasks and the sub-systems and the hierarchy architecture relationship among the sub-systems in the target architecture;
deducing a data relay time node matched with each subsystem according to a hierarchical architecture relation among the subsystems in the target architecture and a preset storage management rule;
and taking the data relay space node and the data relay time node as modeling reference information in the benchmark test.
2. The method of claim 1, wherein constructing at least one dependency group based on logical dependencies among a plurality of subtasks comprises:
deducing at least one subtask relation pair according to the original load description information of each subtask, wherein each subtask relation pair comprises two subtasks with logic dependency relations;
and constructing at least one dependency relation group according to each subtask relation pair.
3. The method of claim 2, wherein deriving at least one subtask relationship pair from the raw load description information for each subtask comprises:
analyzing the original load description information of each subtask to obtain the operand description information and the operand access type of each subtask, and establishing a subtask relationship pair for two subtasks whose operand description information is the same and whose operand access types are read and write, respectively;
and/or
And analyzing the original load description information of each subtask to obtain explicit dependency relationship description information, and establishing subtask relationship pairs respectively corresponding to the explicit dependency relationship description information.
4. The method of claim 2, wherein constructing at least one dependency group from each subtask relationship pair comprises:
constructing at least one dependency tree with the subtasks as nodes according to the same subtasks contained in the subtask relationship pairs, wherein subtasks on different dependency trees have no dependency on each other;
the subtasks on each dependency tree are partitioned into the same dependency group.
5. The method of claim 1, wherein determining the architectural mapping order of each dependency group comprises:
respectively calculating attribute values of all sub-tasks in each dependency relation group under at least one task attribute, and determining attribute total values respectively corresponding to each dependency relation group according to calculation results;
Determining the architecture mapping sequence of each dependency relation group according to the total value of each attribute;
wherein the task attributes include at least one of a calculation amount, an operation access amount, and a priority index.
6. The method of claim 1, wherein determining the architectural mapping order of each dependency group comprises:
calculating the matching degree index between each sub-task in each dependency relation group and each subsystem in the target architecture respectively, and determining the index total value corresponding to each dependency relation group according to the calculation result;
determining the architecture mapping sequence of each dependency relation group according to the total value of each index;
the matching degree index comprises at least one of the matching degree of the number of the computing units, the matching degree of the computing capacity and the consistency of heterogeneous attributes.
7. The method of claim 1, wherein establishing task architecture mappings between each sub-task in each dependency group and each subsystem in the target architecture according to the architecture mapping order comprises:
acquiring a target dependency relation group which is currently processed according to the architecture mapping sequence, and acquiring target subtasks which are currently processed according to the logic dependency relation among all subtasks in the target dependency relation group;
Determining all available hardware resources at the estimated execution time point matched with the target subtask according to the current residual hardware resources in the target architecture and the estimated release time points of the occupied hardware resources;
and acquiring a target subsystem matched with the target sub-task under all available hardware resources according to a preset mapping strategy, and establishing a task architecture mapping relation between the target sub-task and the target subsystem.
8. The method of claim 1, wherein deriving the data relay space node that matches each sub-task based on logical dependencies between each sub-task, task architecture mappings between each sub-task and each sub-system, and hierarchical architecture relationships between each sub-system in the target architecture, comprises:
acquiring a first subtask and a second subtask with a logic dependency relationship, wherein the first subtask is depended on by the second subtask;
acquiring a target subsystem matched with a second sub-task in a task architecture mapping relation between each sub-task and each sub-system;
according to the hierarchical architecture relation among all subsystems in the target architecture, sequentially acquiring an alternative subsystem with a data access relation with a second subsystem according to the sequence from a lower layer to a higher layer;
If the connection relation between the first sub-task and the alternative sub-system is determined, the alternative sub-system is used as a data relay node between the first sub-task and the second sub-task;
and if the first sub-task and the alternative sub-system do not have the connection relation, returning to execute the operation of sequentially acquiring one alternative sub-system which has the data access relation with the second sub-system until the data relay node between the first sub-task and the second sub-task is determined.
9. The method of claim 1, wherein deriving the data relay time node for each subsystem match based on the hierarchical relationships between the subsystems in the target architecture and the preset storage management rules, comprises:
according to a preset storage management rule and operation data description information of each sub-task, acquiring the residence condition of each subsystem in the target architecture on one or more operands at each time point;
a data relay time node matched with each subsystem is deduced according to the residence condition of each subsystem in the target architecture to one or more operands at each time point.
10. The method of claim 1, further comprising, after deriving the data relay time node for each subsystem match based on the hierarchical relationship between the subsystems in the target architecture and the preset storage management rules:
And correcting the task architecture mapping relation between each sub-task and each subsystem in the target architecture by using the data relay space node and the data relay time node.
11. An architecture mapping apparatus for load tasks, comprising:
the subtask splitting module is used for acquiring a load task to be loaded to the target architecture for performing the benchmark test and splitting the load task into a plurality of subtasks;
the architecture mapping sequence determining module is used for constructing at least one dependency relation group according to the logic dependency relation among the plurality of subtasks and determining the architecture mapping sequence of each dependency relation group;
the task architecture mapping relation establishing module is used for respectively establishing task architecture mapping relation between each sub-task in each dependency relation group and each subsystem in the target architecture according to the architecture mapping order and taking the task architecture mapping relation as modeling reference information in the benchmark test;
the data relay space node deducing unit is used for deducing the data relay space node matched with each sub-task according to the logic dependency relationship among the sub-tasks, the task architecture mapping relationship among the sub-tasks and the sub-systems and the hierarchical architecture relationship among the sub-systems in the target architecture;
The data relay time node deducing unit is used for deducing the data relay time node matched with each subsystem according to the hierarchical structure relation among the subsystems in the target structure and the preset storage management rule;
the new modeling reference information adding unit is used for taking the data relay space node and the data relay time node as one modeling reference information in the benchmark test.
12. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the architecture mapping method of the load task of any one of claims 1-10.
13. A computer readable storage medium storing computer instructions for causing a processor to perform the architecture mapping method of load tasks of any of claims 1-10.