CN115827256A - Task transmission scheduling management system for multi-core storage and computation integrated accelerator network - Google Patents


Info

Publication number
CN115827256A
CN115827256A (application CN202310127045.4A)
Authority
CN
China
Prior art keywords
task
data
accelerator
processing
monitoring
Prior art date
Legal status
Granted
Application number
CN202310127045.4A
Other languages
Chinese (zh)
Other versions
CN115827256B (en)
Inventor
李涛
熊大鹏
胡建伟
Current Assignee
Shanghai Yizhu Intelligent Technology Co ltd
Original Assignee
Shanghai Yizhu Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Yizhu Intelligent Technology Co ltd
Priority to CN202310127045.4A
Publication of CN115827256A
Application granted
Publication of CN115827256B
Status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)

Abstract

The invention belongs to the field of data processing and relates to multi-core accelerator technology. It is used to solve the problem that existing storage and computation integrated accelerators limit application performance by adopting a static hardware allocation method, and in particular provides a task transmission scheduling management system for a multi-core storage and computation integrated accelerator network. The system comprises a scheduling management platform that is communicatively connected to a task management module and to the accelerator network. The task management module is used for managing and analyzing task transmission processing: an application program is compiled into data-driven tasks, each task is given a unique characteristic value, and the task address space and external data are set dynamically. Through a real-time detection and dynamic task allocation algorithm, each node in the accelerator network forms a reconfigurable storage and computation integrated accelerator core that performs near-data computation on data, and each node comprises a reallocation module supporting real-time detection and scheduling to control the accelerator and its data.

Description

Task transmission scheduling management system for multi-core storage and computation integrated accelerator network
Technical Field
The invention belongs to the field of data processing, relates to a multi-core accelerator technology, and particularly relates to a task transmission scheduling management system for a multi-core storage and computation integrated accelerator network.
Background
To meet application requirements for low latency and simultaneous processing of large amounts of data, a conventional multi-core storage and computation integrated accelerator connects multiple cores to a shared memory to exchange instructions and data among the cores; the whole interaction process is controlled by a main processor connected to the accelerator and is completed by executing the data transmission and task execution instructions specified in the program design;
traditional scientific simulation applications generally divide data into equally sized blocks and perform independent data operations and iterations on them. Emerging high-performance computing applications, however, combine traditional scientific simulation with advanced data analysis and machine learning; their data are often stored in sparse data structures that are more difficult to organize into conventional partitionable layouts, which causes irregular fine-grained data access and a large amount of data interaction and transformation. Most existing storage and computation integrated accelerators adopt a static hardware allocation method, so hardware resources cannot be allocated effectively when the data structure changes, and application performance is limited.
in view of the above technical problem, the present application proposes a solution.
Disclosure of Invention
The invention aims to provide a task transmission scheduling management system for a multi-core storage and computation integrated accelerator network, which solves the problem that the static hardware allocation method adopted by existing storage and computation integrated accelerators limits application performance.
The technical problem to be solved by the invention is: how to provide a task transmission scheduling management system that achieves load balancing among the accelerators so as to maximize hardware utilization.
The purpose of the invention can be realized by the following technical scheme:
the task transmission scheduling management system for the multi-core storage and computation integrated accelerator network comprises a scheduling management platform, wherein the scheduling management platform is in communication connection with a task management module and the accelerator network;
the task management module is used for managing and analyzing the task transmission processing: compiling an application program into a data-driven task mode, providing a unique characteristic value for each task, and dynamically setting a task address space and external data according to a data access mode of the task;
each node of the accelerator network comprises a storage and computation integrated accelerator core, a reallocation module and a plurality of monitoring modules, wherein the storage and computation integrated accelerator core is used for performing near-data computation on data, and the nodes of the accelerator network are connected in a mesh topology to transmit data and tasks; a plurality of storage and computation integrated accelerator cores form an accelerator group, and the monitoring modules correspond one-to-one to the accelerator groups;
the monitoring module is used for monitoring and analyzing the hardware utilization rate of the accelerator group, obtaining the matching value and the priority value of the accelerator group, and sending the matching value and the priority value of the accelerator group to the reallocation module through the scheduling management platform;
the reallocation module is used for performing task reallocation processing on the accelerator groups.
As a preferred embodiment of the present invention, the feature value of the task is composed of a feature parameter a and a feature parameter B, where the feature parameter a is a data access mode of the task, and includes a data accessor mode, an active domain object mode, an object relationship mapping mode, and a layer mode; the characteristic parameter B is a data memory value of the task.
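As an illustrative sketch only (the class and enum names below are assumptions, not taken from the patent), the two-part characteristic value described above can be modeled as:

```python
from dataclasses import dataclass
from enum import Enum

class AccessMode(Enum):
    """Characteristic parameter A: the task's data access mode."""
    DATA_ACCESSOR = "data accessor"
    ACTIVE_DOMAIN_OBJECT = "active domain object"
    OBJECT_RELATIONAL_MAPPING = "object relational mapping"
    LAYER = "layer"

@dataclass(frozen=True)
class TaskFeature:
    """Unique characteristic value assigned to each data-driven task."""
    access_mode: AccessMode  # characteristic parameter A
    memory_bytes: int        # characteristic parameter B: data memory value

# Example: a task using the layer access mode with a 4 KiB data footprint.
task = TaskFeature(access_mode=AccessMode.LAYER, memory_bytes=4096)
```

The frozen dataclass keeps the characteristic value immutable, which matches its role as a fixed identifier attached to the task at compile time.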
As a preferred embodiment of the present invention, the specific process of the monitoring module monitoring and analyzing the hardware utilization of the accelerator group includes: setting a monitoring period, marking an accelerator group as a monitoring object, and acquiring historical processing data of the monitoring object within the monitoring period, wherein the historical processing data comprises the characteristic value of each processing task, duration data SC and utilization data LS; summing the utilization coefficients LY of all processing processes and averaging to obtain the utilization data LS of the monitoring object; and obtaining the application coefficient YY of the monitoring object when performing a processing task by numerical calculation on the duration data SC and the utilization data LS.
As a preferred embodiment of the present invention, the duration data SC is the process duration of the processing task performed by the monitoring object, and the acquisition process of the utilization data LS includes: dividing the process duration of the processing task into a plurality of processing time intervals, and acquiring processing data CL and memory data NC of the monitoring object in each processing time interval, wherein the processing data CL is the maximum processor utilization rate of the monitoring object within the interval and the memory data NC is the maximum memory utilization rate of the monitoring object within the interval; the utilization coefficient LY of the monitoring object in each processing time interval is obtained by numerical calculation on the processing data CL and the memory data NC.
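A minimal sketch of this interval analysis, using the formula LY = α1 × CL + α2 × NC given later in the description; the default coefficients 4.28 and 2.37 are the fitted values quoted there, and the function names are assumptions:

```python
def utilization_coefficient(cl, nc, alpha1=4.28, alpha2=2.37):
    """LY for one processing time interval: cl is the peak processor
    utilization and nc the peak memory utilization within that interval.
    The patent requires the proportional coefficients to satisfy a1 > a2 > 1."""
    assert alpha1 > alpha2 > 1
    return alpha1 * cl + alpha2 * nc

def utilization_data(intervals):
    """LS: the mean of the per-interval LY values over one processing task.
    `intervals` is a list of (CL, NC) pairs, one per processing time interval."""
    lys = [utilization_coefficient(cl, nc) for cl, nc in intervals]
    return sum(lys) / len(lys)
```

For example, a task whose two intervals peaked at (1.0, 1.0) and (0.0, 0.0) yields an LS of half of α1 + α2.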
As a preferred embodiment of the present invention, the process of acquiring the matching value of the monitoring object includes: marking the processing task with the largest application coefficient YY within the monitoring period as the matching task; marking the processing tasks whose characteristic parameter A (data access mode) is the same as that of the matching task as analysis tasks; forming a memory range from the maximum and minimum values of the characteristic parameter B of the analysis tasks and dividing the memory range into a plurality of memory intervals; and marking the memory interval that matches the characteristic parameter B of the matching task as the matching interval of the monitoring object, whereby the characteristic parameter A of the matching task and the matching interval together form the matching value of the monitoring object.
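The interval matching can be sketched as follows. The patent does not fix how many memory intervals the range is divided into, nor that they are equal-width, so both choices here are assumptions:

```python
def matching_interval(analysis_b_values, matching_b, n_intervals=4):
    """Split the memory range [min B, max B] of the analysis tasks into
    n_intervals equal-width memory intervals and return the (low, high)
    interval containing the matching task's characteristic parameter B."""
    low, high = min(analysis_b_values), max(analysis_b_values)
    width = (high - low) / n_intervals
    for i in range(n_intervals):
        a, b = low + i * width, low + (i + 1) * width
        if a <= matching_b <= b:  # shared boundaries fall into the earlier interval
            return (a, b)
    return (low, high)  # B outside the observed range: fall back to the whole range
```

The returned interval, together with the matching task's characteristic parameter A, would then form the matching value of the monitoring object.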
As a preferred embodiment of the present invention, the process of acquiring the priority value of the monitoring object includes: summing and averaging the application coefficients YY of the analysis tasks to obtain the application value corresponding to the characteristic parameter A of the matching task; obtaining the application values corresponding to the remaining characteristic parameters A of the monitoring object's processing tasks in the same way; and arranging the characteristic parameters A in descending order of application value to obtain the priority value of the monitoring object.
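A sketch of the priority ranking described above; the function name and input representation are assumptions:

```python
from collections import defaultdict

def priority_value(task_history):
    """task_history: (access_mode, YY) pairs from the monitoring object's
    historical processing tasks. The application value of each characteristic
    parameter A is the mean YY of its tasks; the priority value is the list
    of access modes sorted by descending application value."""
    by_mode = defaultdict(list)
    for mode, yy in task_history:
        by_mode[mode].append(yy)
    app_value = {mode: sum(v) / len(v) for mode, v in by_mode.items()}
    return sorted(app_value, key=app_value.get, reverse=True)
```

A mode whose tasks historically ran with higher application coefficients therefore ranks first when tasks are later assigned in mode II.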
As a preferred embodiment of the present invention, the specific process of the reallocation module performing task reallocation processing on the accelerator group includes: each storage and computation integrated node carries a control module to detect utilization and perform task allocation; task data space allocation is first performed in allocation mode I, while the hardware utilization rate is monitored in real time and the tasks with the highest and lowest utilization are marked; whether a to-be-determined node exists is then monitored, and if so, task data space allocation is performed in allocation mode II; if not, allocation mode I continues to be used for task data space allocation after the current task is completed.
As a preferred embodiment of the present invention, the allocation procedure of allocation mode I includes: acquiring the matching value of the accelerator group and judging whether the task list contains a task matching that value; if such a task exists, allocating the corresponding task to the accelerator group; if not, marking the node corresponding to the accelerator group as a to-be-determined node; the judgment basis is that the characteristic parameter A of the task is the same as the characteristic parameter A of the matching value and the characteristic parameter B lies within the memory interval;
the allocation process of the second allocation mode comprises the following steps: and obtaining a priority value of the accelerator group corresponding to the node to be determined, and screening a task from the task list and performing task allocation on the accelerator group according to the descending order of the priority value and the numerical value of the characteristic parameter B.
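Combining the two modes, a hedged sketch of the reallocation decision follows; the task representation and function names are assumptions, and the patent does not specify tie-breaking among equally ranked tasks:

```python
def allocate_mode_one(task_list, matching_value):
    """Mode I: return a task whose characteristic parameter A equals the
    matching value's A and whose parameter B lies in the matching interval;
    return None when no such task exists (the node becomes to-be-determined)."""
    mode, (low, high) = matching_value
    for task in task_list:
        if task["A"] == mode and low <= task["B"] <= high:
            return task
    return None

def allocate_mode_two(task_list, priority_modes):
    """Mode II: for a to-be-determined node, walk the access modes in
    descending priority order and pick the task with the largest parameter B."""
    for mode in priority_modes:
        candidates = [t for t in task_list if t["A"] == mode]
        if candidates:
            return max(candidates, key=lambda t: t["B"])
    return None

tasks = [{"A": "layer", "B": 450}, {"A": "orm", "B": 900}, {"A": "orm", "B": 100}]
```

Mode I enforces a strict match against the accelerator group's matching value, while mode II relaxes this to the group's ranked preferences so a pending node still receives work.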
The working method of the task transmission scheduling management system for the multi-core storage and computation integrated accelerator network comprises the following steps:
the method comprises the following steps: compiling an application program into a data-driven task mode, providing a unique characteristic value for each task, and dynamically setting a task address space and external data according to the data access mode of the task;
step two: monitoring and analyzing the hardware utilization rate of the accelerator group: setting a monitoring period, marking the accelerator group as a monitoring object, acquiring historical processing data of the monitoring object in the monitoring period, and performing data analysis to obtain a matching value and a priority value of the monitoring object;
step three: performing task reallocation processing on the accelerator group: and carrying out task data space allocation on each storage and computation integrated node by using an allocation mode I and an allocation mode II.
The invention has the following beneficial effects:
1. through a real-time detection and dynamic task allocation algorithm, each node in the accelerator network forms a reconfigurable storage and computation integrated accelerator core that performs near-data computation on data; each node comprises a reallocation module supporting real-time detection and scheduling to control the accelerator and its data; the nodes are connected in a mesh topology to transmit data and tasks, and task allocation and recombination are supported through the task address space mode and the task addressing transmission method;
2. the hardware utilization rate of the accelerator groups can be detected and analyzed through the monitoring module: the historical processing data of each accelerator group are comprehensively analyzed within a monitoring period to obtain its priority value and matching value, the historical processing tasks of the accelerator groups are screened, and the hardware utilization rate of the processing process is fed back, thereby ensuring the hardware utilization rate of each accelerator group during task processing;
3. the reallocation module can perform task reallocation processing analysis on the accelerator group through the combined application of allocation mode I and allocation mode II: allocation mode I ensures the task adaptation degree of the accelerator group through the characteristic parameter A and the memory interval, and allocation mode II allocates tasks according to the data access mode on the basis of allocation mode I.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of a system according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method according to a second embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To meet application requirements for low latency and simultaneous processing of large amounts of data, a conventional multi-core storage and computation integrated accelerator connects multiple cores to a shared memory to exchange instructions and data among the cores; the whole interaction process is controlled by a main processor connected to the accelerator and is completed by executing the data transmission and task execution instructions specified in the program design. Meanwhile, to ensure the reliability of tasks, this method must strictly control the order of data processing and shared-memory access. Under these two constraints, the conventional multi-core storage and computation integrated accelerator adopts a compiler-controlled static task allocation method, so various tasks cannot be processed simultaneously; and when the data volume of a task changes, the static allocation method cannot effectively utilize the hardware resources on the accelerator, resulting in wasted hardware resources.
Example one
As shown in fig. 1, the task transmission scheduling management system for a multi-core storage-computation-integrated accelerator network includes a scheduling management platform, where the scheduling management platform is communicatively connected to a task management module and an accelerator network.
The task management module is used for managing and analyzing the task transmission processing: compiling an application program into a data-driven task mode, providing a unique characteristic value for each task, and dynamically setting a task address space and external data according to a data access mode of the task; the characteristic value of the task is composed of a characteristic parameter A and a characteristic parameter B, wherein the characteristic parameter A is a data access mode of the task and comprises a data accessor mode, an active domain object mode, an object relation mapping mode and a layer mode; the characteristic parameter B is a data memory value of the task.
Each node of the accelerator network comprises a storage and computation integrated accelerator core, a reallocation module and a plurality of monitoring modules; the storage and computation integrated accelerator core is used for performing near-data computation on data, and the nodes of the accelerator network are connected in a mesh topology to transmit data and tasks; a plurality of storage and computation integrated accelerator cores form an accelerator group, and the monitoring modules correspond one-to-one to the accelerator groups. Through a real-time detection and dynamic task allocation algorithm, each node in the accelerator network forms a reconfigurable storage and computation integrated accelerator core to perform near-data computation on data; the nodes are connected in a mesh topology to transmit data and tasks, and task allocation and recombination are supported through the task address space mode and the task addressing transmission method.
The monitoring module is used for monitoring and analyzing the hardware utilization rate of the accelerator group: a monitoring period is set, the accelerator group is marked as a monitoring object, and historical processing data of the monitoring object within the monitoring period are acquired, comprising the characteristic value of each processing task, duration data SC and utilization data LS. The duration data SC is the process duration of the processing task of the monitoring object, and the utilization data LS is acquired as follows: the process duration of a processing task is divided into a plurality of processing time intervals, and processing data CL and memory data NC of the monitoring object in each interval are acquired, wherein the processing data CL is the maximum processor utilization rate and the memory data NC is the maximum memory utilization rate of the monitoring object within the interval; the utilization coefficient LY of the monitoring object in the interval is obtained through the formula LY = α1 × CL + α2 × NC, wherein α1 and α2 are proportional coefficients satisfying α1 > α2 > 1; the utilization coefficients LY of all the processing intervals are summed and averaged to obtain the utilization data LS of the monitoring object. The application coefficient YY of the monitoring object when performing the processing task is obtained through the formula YY = (β1 × LS)/(β2 × SC), wherein β1 and β2 are proportional coefficients satisfying β1 > β2 > 1; the application coefficient reflects the overall processing efficiency of the monitoring object when performing the task, and a larger value indicates higher overall processing efficiency for the corresponding task. The processing task with the largest application coefficient YY within the monitoring period is marked as the matching task, and the processing tasks whose characteristic parameter A (data access mode) is the same as that of the matching task are marked as analysis tasks; the maximum and minimum values of the characteristic parameter B of the analysis tasks form a memory range, which is divided into a plurality of memory intervals; the memory interval matching the characteristic parameter B of the matching task is marked as the matching interval of the monitoring object, and the characteristic parameter A of the matching task together with the matching interval forms the matching value of the monitoring object. The application coefficients YY of the analysis tasks are summed and averaged to obtain the application value corresponding to the characteristic parameter A of the matching task; the application values corresponding to the remaining characteristic parameters A of the monitoring object's processing tasks are obtained in the same way, and the characteristic parameters A are arranged in descending order of application value to obtain the priority value of the monitoring object. The priority value and the matching value of the monitoring object are then sent to the reallocation module through the scheduling management platform. By detecting and analyzing the hardware utilization rate of the accelerator groups in this way, and comprehensively analyzing the historical processing data of each accelerator group within the monitoring period to obtain its priority value and matching value, the historical processing tasks of the accelerator groups are screened and the hardware utilization rate of the processing process is fed back, thereby ensuring the hardware utilization rate of each accelerator group during task processing.
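The application coefficient step above can be sketched directly from the formula. This is an illustrative sketch only: β1 and β2 are left as parameters because the patent only constrains β1 > β2 > 1, and the default values and the function name are assumptions.

```python
def application_coefficient(ls, sc, beta1=3.0, beta2=2.0):
    """YY = (beta1 * LS) / (beta2 * SC): the larger the value, the higher the
    monitoring object's overall processing efficiency for the task.
    ls is the utilization data LS; sc is the duration data SC."""
    assert beta1 > beta2 > 1  # constraint stated in the patent
    return (beta1 * ls) / (beta2 * sc)
```

Note that YY grows with the utilization data LS and shrinks with the duration SC, matching the stated relationship between efficiency and processing time.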
The reallocation module is used for performing task reallocation processing on the accelerator group: each storage and computation integrated node carries a control module to detect utilization and perform task allocation. Task data space allocation is first performed in allocation mode I, while the hardware utilization rate is monitored in real time and the tasks with the highest and lowest utilization are marked; whether a to-be-determined node exists is then monitored, and if so, task data space allocation is performed in allocation mode II; if not, allocation mode I continues to be used after the current task is completed. The allocation process of mode I comprises: the matching value of the accelerator group is acquired and the task list is checked for a task matching that value; if such a task exists, it is allocated to the accelerator group, and if not, the node corresponding to the accelerator group is marked as a to-be-determined node; the judgment basis is that the characteristic parameter A of the task is the same as the characteristic parameter A of the matching value and the characteristic parameter B lies within the memory interval. The allocation process of mode II comprises: the priority value of the accelerator group corresponding to the to-be-determined node is acquired, and a task is screened from the task list and allocated to the accelerator group in descending order of the priority value and of the numerical value of the characteristic parameter B. Task reallocation processing analysis is thus performed on the accelerator group; through the combined application of allocation modes I and II, mode I ensures the task adaptation degree of the accelerator group through the characteristic parameter A and the memory interval, while mode II allocates tasks according to the data access mode on the basis of mode I.
Example two
As shown in fig. 2, the task transmission scheduling management method for a multi-core storage and computation integrated accelerator network includes the following steps:
the method comprises the following steps: compiling an application program into a data-driven task mode, providing a unique characteristic value for each task, and dynamically setting a task address space and external data according to the data access mode of the task;
step two: monitoring and analyzing the hardware utilization rate of the accelerator group: setting a monitoring period, marking an accelerator group as a monitoring object, acquiring historical processing data of the monitoring object in the monitoring period, performing data analysis to obtain a matching value and a priority value of the monitoring object, and feeding back the hardware utilization rate in the processing process of the monitoring object through the matching value and the priority value;
step three: performing task reallocation processing on the accelerator group: each storage and computation integrated node carries a control module to detect utilization and perform task allocation, and task data space allocation is performed using allocation modes I and II, wherein mode I ensures the task adaptation degree of the accelerator group through the characteristic parameter A and the memory interval, and mode II allocates tasks according to the data access mode on the basis of mode I.
The task transmission scheduling management system for the multi-core storage and calculation integrated accelerator network compiles an application program into a data-driven task mode during working, provides a unique characteristic value for each task, and dynamically sets a task address space and external data according to the data access mode of the task; monitoring and analyzing the hardware utilization rate of the accelerator group: setting a monitoring period, marking the accelerator group as a monitoring object, acquiring historical processing data of the monitoring object in the monitoring period, and performing data analysis to obtain a matching value and a priority value of the monitoring object; performing task reallocation processing on the accelerator group: and carrying out task data space allocation on each storage and computation integrated node by using an allocation mode I and an allocation mode II.
The foregoing is merely exemplary and illustrative of the present invention and various modifications, additions and substitutions may be made by those skilled in the art to the specific embodiments described without departing from the scope of the invention as defined in the following claims.
The above formulas are obtained by collecting a large amount of data and performing software simulation, and the coefficients in the formulas are set by those skilled in the art according to the actual situation. Taking the formula LY = α1 × CL + α2 × NC as an example: those skilled in the art collect multiple groups of sample data and set a corresponding utilization coefficient for each group; the set coefficients and the collected sample data are substituted into the formula, any two instances form a system of linear equations in two unknowns, and the calculated coefficients are screened and averaged to obtain the values of α1 and α2, which are 4.28 and 2.37 respectively;
the size of the coefficient is a specific numerical value obtained by quantizing each parameter, so that the subsequent comparison is convenient, and the size of the coefficient depends on the number of sample data and the corresponding application coefficient preliminarily set by a person skilled in the art for each group of sample data; as long as the proportional relationship between the parameters and the quantized values is not affected, for example, the application coefficient is proportional to the value of the utilization data.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (9)

1. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network is characterized by comprising a scheduling management platform, wherein the scheduling management platform is in communication connection with a task management module and the accelerator network;
the task management module is used for managing and analyzing task transmission and processing: compiling an application program into a data-driven task mode, providing a unique characteristic value for each task, and dynamically setting the task address space and external data according to the data access mode of each task;
each node of the accelerator network comprises a storage and computation integrated accelerator core, a reallocation module and a plurality of monitoring modules, wherein the storage and computation integrated accelerator core is used for performing near-data computation, and all nodes of the accelerator network are connected into a mesh topology to transmit data and tasks; a plurality of storage and computation integrated accelerator cores form an accelerator group, and the monitoring modules correspond to the accelerator groups one to one;
the monitoring module is used for monitoring and analyzing the hardware utilization rate of the accelerator group to obtain the matching value and the priority value of the accelerator group, and for sending the matching value and the priority value to the reallocation module through the scheduling management platform;
the reallocation module is used for performing task reallocation processing on the accelerator groups.
2. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 1, wherein the characteristic value of a task is composed of a characteristic parameter A and a characteristic parameter B, wherein the characteristic parameter A is the data access mode of the task and is one of a data accessor mode, an active domain object mode, an object-relational mapping mode and a layer mode, and the characteristic parameter B is the data memory value of the task.
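A minimal sketch of claim 2's two-part characteristic value in Python; the `Task` class, the mode labels, and the use of a byte count for parameter B are illustrative assumptions, not details specified by the claim:

```python
from dataclasses import dataclass

# Illustrative labels for the four data access modes listed in claim 2.
ACCESS_MODES = ("data_accessor", "active_domain_object",
                "object_relational_mapping", "layer")

@dataclass(frozen=True)
class Task:
    """A data-driven task carrying the two-part characteristic value."""
    name: str
    access_mode: str   # characteristic parameter A: one of ACCESS_MODES
    memory_value: int  # characteristic parameter B: the task's data memory value

    @property
    def characteristic_value(self):
        # The unique characteristic value is the pair (A, B).
        return (self.access_mode, self.memory_value)

t = Task("conv1", "data_accessor", 4096)
print(t.characteristic_value)  # ('data_accessor', 4096)
```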
3. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 2, wherein the specific process by which the monitoring module monitors and analyzes the hardware utilization rate of the accelerator group includes: setting a monitoring period, marking the accelerator group as a monitoring object, and acquiring historical processing data of the monitoring object within the monitoring period, wherein the historical processing data includes the characteristic value of each processing task, duration data SC and utilization data LS; summing the utilization coefficients LY of all processing processes and averaging them to obtain the utilization data LS of the monitoring object; and obtaining an application coefficient YY for each processing task of the monitoring object by numerical calculation on the duration data SC and the utilization data LS.
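Claim 3 states only that LS is the average of the utilization coefficients LY and that YY is obtained "by numerical calculation" on SC and LS; the sketch below assumes that calculation is a weighted sum, with the weights `a1` and `a2` being purely illustrative:

```python
def utilization_data(ly_values):
    """LS: mean of the utilization coefficients LY of all processing
    processes of the monitoring object in the monitoring period."""
    return sum(ly_values) / len(ly_values)

def application_coefficient(sc, ls, a1=0.4, a2=0.6):
    """YY: the claim does not give the formula; this weighted sum and
    the weights a1, a2 are assumptions for illustration only."""
    return a1 * sc + a2 * ls

ls = utilization_data([0.5, 0.7, 0.6])        # ~0.6
yy = application_coefficient(sc=2.0, ls=ls)    # ~1.16 under the assumed weights
```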
4. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 3, wherein the duration data SC is the process duration for which the monitoring object performs a processing task, and the process of obtaining the utilization data LS includes: dividing the process duration of the processing task into a plurality of processing time intervals, and acquiring processing data CL and memory data NC of the monitoring object in each processing time interval, wherein the processing data CL is the maximum processor utilization rate of the monitoring object in the processing time interval and the memory data NC is the maximum memory utilization rate of the monitoring object in the processing time interval; the utilization coefficient LY of the monitoring object in a processing time interval is obtained by numerical calculation on the processing data CL and the memory data NC.
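Claim 4's per-interval utilization coefficient LY can be sketched as follows; as the claim again leaves the "numerical calculation" unspecified, a weighted sum with illustrative weights `b1` and `b2` is assumed:

```python
def utilization_coefficient(cpu_samples, mem_samples, b1=0.5, b2=0.5):
    """LY for one processing time interval: CL is the maximum processor
    utilization and NC the maximum memory utilization observed in the
    interval; the weights b1, b2 are assumptions, not from the claim."""
    cl = max(cpu_samples)   # processing data CL
    nc = max(mem_samples)   # memory data NC
    return b1 * cl + b2 * nc

ly = utilization_coefficient([0.2, 0.8, 0.5], [0.3, 0.6])  # 0.5*0.8 + 0.5*0.6 ≈ 0.7
```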
5. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 4, wherein the process of acquiring the matching value of the monitoring object includes: marking the processing task with the maximum application coefficient YY in the monitoring period as the matching task; marking the processing tasks whose characteristic parameter A is the same data access mode as that of the matching task as analysis tasks; forming a memory range from the maximum and minimum values of the characteristic parameter B of the analysis tasks and dividing the memory range into a plurality of memory intervals; and marking the memory interval containing the characteristic parameter B of the matching task as the matching interval of the monitoring object, wherein the characteristic parameter A of the matching task and the matching interval together form the matching value of the monitoring object.
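Claim 5's matching value might be computed as sketched below; tasks are represented as `(access_mode, memory_value)` pairs for illustration, and the number of memory intervals `n_intervals` is an assumption, since the claim says only "a plurality":

```python
def matching_value(tasks, yy, n_intervals=4):
    """Matching value per claim 5. `tasks` is the list of processing
    tasks in the period as (access_mode, memory_value) pairs and `yy[i]`
    is the application coefficient of task i; n_intervals is assumed."""
    i_match = max(range(len(tasks)), key=lambda i: yy[i])  # matching task
    mode, mem = tasks[i_match]
    mems = [m for a, m in tasks if a == mode]              # analysis tasks
    lo, hi = min(mems), max(mems)                          # memory range
    width = (hi - lo) / n_intervals or 1                   # interval width
    k = min(int((mem - lo) / width), n_intervals - 1)      # matching interval index
    interval = (lo + k * width, lo + (k + 1) * width)
    return mode, interval

print(matching_value([("a", 10), ("a", 30), ("b", 20)], [0.9, 0.5, 0.7]))
# ('a', (10.0, 15.0))
```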
6. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 5, wherein the process of acquiring the priority value of the monitoring object includes: summing the application coefficients YY of the analysis tasks and averaging them to obtain the application value corresponding to the characteristic parameter A of the matching task; obtaining the application values corresponding to the remaining characteristic parameters A of the processing tasks of the monitoring object in the same way; and arranging the characteristic parameters A in descending order of application value to obtain the priority value of the monitoring object.
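A sketch of claim 6's priority value: for each data access mode (characteristic parameter A), the application coefficients YY of its tasks are averaged, and the modes are ranked in descending order of that average. The `(access_mode, memory_value)` task representation is illustrative only:

```python
from collections import defaultdict

def priority_value(tasks, yy):
    """Priority value of a monitoring object (claim 6): access modes
    ordered by the average application coefficient YY of their tasks,
    from largest to smallest."""
    totals = defaultdict(lambda: [0.0, 0])
    for (mode, _), y in zip(tasks, yy):
        totals[mode][0] += y
        totals[mode][1] += 1
    app = {m: s / n for m, (s, n) in totals.items()}   # application values
    return sorted(app, key=app.get, reverse=True)

order = priority_value([("a", 10), ("a", 30), ("b", 20)], [0.9, 0.5, 0.8])
# mode 'a' averages 0.7, mode 'b' averages 0.8, so order == ['b', 'a']
```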
7. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 6, wherein the specific process by which the reallocation module performs task reallocation on the accelerator groups includes: equipping each storage and computation integrated node with a control module to detect the utilization rate and perform task allocation; performing task data space allocation in allocation mode I, monitoring the hardware utilization rate in real time, and marking the tasks with the highest and lowest utilization rates; monitoring whether any pending node exists, and if so, performing task data space allocation in allocation mode II; if not, continuing to allocate task data space in allocation mode I after the current tasks are completed.
8. The task transmission scheduling management system for the multi-core storage and computation integrated accelerator network as claimed in claim 7, wherein the allocation process of allocation mode I includes: acquiring the matching value of an accelerator group and judging whether the task list contains a task that matches the matching value of the accelerator group; if so, allocating the corresponding task to the accelerator group; if not, marking the node corresponding to the accelerator group as a pending node; the judgment criteria are that the characteristic parameter A of the task is the same as the characteristic parameter A of the matching value and that the characteristic parameter B lies within the memory interval;
the allocation process of allocation mode II includes: obtaining the priority value of the accelerator group corresponding to the pending node, and screening tasks from the task list and allocating them to the accelerator group in descending order of priority value and of the numerical value of the characteristic parameter B.
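Under the same illustrative `(access_mode, memory_value)` task representation, the two allocation modes of claims 7 and 8 might be sketched as below; the matching value is assumed to be a `(mode, (low, high))` pair and the priority value a list of modes ranked best-first:

```python
def allocate_mode_one(task_list, matching_value):
    """Allocation mode I (claim 8): pick a task whose parameter A equals
    the group's matching mode and whose parameter B lies in the matching
    interval; None means the node becomes a pending node."""
    mode, (low, high) = matching_value
    for task in task_list:
        if task[0] == mode and low <= task[1] <= high:
            return task
    return None

def allocate_mode_two(task_list, priority_value):
    """Allocation mode II (claim 8): for a pending node, choose tasks in
    descending order of the group's priority ranking of parameter A and,
    within a mode, in descending order of parameter B."""
    rank = {m: i for i, m in enumerate(priority_value)}
    candidates = [t for t in task_list if t[0] in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda t: (rank[t[0]], -t[1]))

print(allocate_mode_one([("a", 12), ("b", 20)], ("a", (10, 15))))       # ('a', 12)
print(allocate_mode_two([("a", 12), ("b", 20), ("b", 30)], ["b", "a"]))  # ('b', 30)
```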
9. A working method of the task transmission scheduling management system for the multi-core storage and computation integrated accelerator network according to any one of claims 1 to 8, characterized by comprising the following steps:
the method comprises the following steps: compiling an application program into a data-driven task mode, providing a unique characteristic value for each task, and dynamically setting a task address space and external data according to the data access mode of the task;
step two: monitoring and analyzing the hardware utilization rate of the accelerator group: setting a monitoring period, marking the accelerator group as a monitoring object, acquiring historical processing data of the monitoring object in the monitoring period, and performing data analysis to obtain a matching value and a priority value of the monitoring object;
step three: performing task reallocation on the accelerator groups: allocating task data space to each storage and computation integrated node using allocation mode I and allocation mode II.
CN202310127045.4A 2023-02-17 2023-02-17 Task transmission scheduling management system for multi-core memory and calculation integrated accelerator network Active CN115827256B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310127045.4A CN115827256B (en) 2023-02-17 2023-02-17 Task transmission scheduling management system for multi-core memory and calculation integrated accelerator network


Publications (2)

Publication Number Publication Date
CN115827256A true CN115827256A (en) 2023-03-21
CN115827256B CN115827256B (en) 2023-05-16

Family

ID=85521690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310127045.4A Active CN115827256B (en) 2023-02-17 2023-02-17 Task transmission scheduling management system for multi-core memory and calculation integrated accelerator network

Country Status (1)

Country Link
CN (1) CN115827256B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030084088A1 (en) * 2001-10-31 2003-05-01 Shaffer Larry J. Dynamic allocation of processing tasks using variable performance hardware platforms
US7647335B1 (en) * 2005-08-30 2010-01-12 ATA SpA - Advanced Technology Assessment Computing system and methods for distributed generation and storage of complex relational data
CN101937370A (en) * 2010-08-16 2011-01-05 中国科学技术大学 Method and device supporting system-level resource distribution and task scheduling on FCMP (Flexible-core Chip Microprocessor)
US20140075439A1 (en) * 2012-06-08 2014-03-13 Huawei Technologies Co., Ltd. Virtualization management method and related apparatuses for managing hardware resources of a communication device
CN104035896A (en) * 2014-06-10 2014-09-10 复旦大学 Off-chip accelerator applicable to fusion memory of 2.5D (2.5 dimensional) multi-core system
CN104317658A (en) * 2014-10-17 2015-01-28 华中科技大学 MapReduce based load self-adaptive task scheduling method
CN104794100A (en) * 2015-05-06 2015-07-22 西安电子科技大学 Heterogeneous multi-core processing system based on on-chip network
CN115526725A (en) * 2022-11-24 2022-12-27 深圳市泰铼科技有限公司 Securities programmed trading risk analysis system based on big data analysis


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116414726A (en) * 2023-03-24 2023-07-11 苏州亿铸智能科技有限公司 Task dynamic allocation data parallel computing method based on memory and calculation integrated accelerator
CN116414726B (en) * 2023-03-24 2024-03-15 苏州亿铸智能科技有限公司 Task dynamic allocation data parallel computing method based on memory and calculation integrated accelerator
CN116049908A (en) * 2023-04-03 2023-05-02 北京数力聚科技有限公司 Multi-party privacy calculation method and system based on blockchain



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB02 Change of applicant information

Address after: Room 1911, Building 1, Jiazhaoye Yuefeng Building, No. 101 Tayuan Road, High tech Zone, Suzhou City, Jiangsu Province, 215011

Applicant after: Suzhou Yizhu Intelligent Technology Co.,Ltd.

Address before: 200120 building C, No. 888, Huanhu West 2nd Road, Lingang New Area, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Shanghai Yizhu Intelligent Technology Co.,Ltd.