CN115421735A - Heterogeneous deployment method and device for deep learning task and electronic equipment - Google Patents

Heterogeneous deployment method and device for deep learning task and electronic equipment

Info

Publication number
CN115421735A
CN115421735A (application CN202211081830.2A / CN202211081830A)
Authority
CN
China
Prior art keywords
deployment
task
algorithm
deployed
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211081830.2A
Other languages
Chinese (zh)
Inventor
程勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lichi Semiconductor Co ltd
Original Assignee
Shanghai Lichi Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lichi Semiconductor Co ltd filed Critical Shanghai Lichi Semiconductor Co ltd
Priority to CN202211081830.2A priority Critical patent/CN115421735A/en
Publication of CN115421735A publication Critical patent/CN115421735A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a heterogeneous deployment method and device for a deep learning task, and an electronic device. Task node information of a task to be deployed and device information of the target device are acquired; the algorithms are compiled into deployment files that can be deployed to the computing units; a plurality of deployment strategies are determined according to the device information and the task node information; the strategies are simulated based on the deployment files, a target strategy is selected from the simulated strategies, and the task to be deployed is deployed according to the target strategy. This avoids both the inconsistent deployment results caused by reliance on engineers' experience during deployment and the extra resource overhead of performing deployment directly on the end-side device; moreover, an algorithm deployment scheme chosen by evaluating actual execution results in a simulation environment better matches the real device scenario and yields a better deployment effect.

Description

Heterogeneous deployment method and device for deep learning task and electronic equipment
Technical Field
The invention relates to the technical field of deep learning, in particular to a heterogeneous deployment method and device of a deep learning task and electronic equipment.
Background
With the development of technologies such as the internet and big data, deep learning algorithms are applied ever more widely, the business scenarios involved in their practical application are increasingly complex, and highly complex data-processing services can be realized through the cooperation of multiple deep learning algorithms. Mainstream AI computing hardware units include DSPs, GPUs, CPUs, ASICs, and the like, each with its own advantages and disadvantages in versatility, task concurrency, performance, and power consumption. How to efficiently deploy multiple algorithms to the different hardware computing units of a device and achieve optimal performance has therefore become the key to coordinating multiple algorithms. Current deep learning deployment falls into two modes: online compile-and-execute, and offline compilation followed by deployment and execution.
To pursue optimal performance, heterogeneous deployment currently relies mainly on the offline mode, in which algorithms are pre-compiled and then deployed to the end-side device. For example, a deep learning algorithm may be deployed entirely to a dedicated high-performance AI accelerator, but this forces the algorithms to execute serially and cannot guarantee efficient use of all hardware resources, so the performance requirement may not be met. Alternatively, the hardware computing unit for each algorithm can be specified manually based on experience and the theoretical computing power of the different hardware; however, because various practical constraints of the system limit the actual utilization of the hardware computing units, actual computing power differs greatly from theoretical computing power, so manual specification cannot achieve optimal energy efficiency. Moreover, with end-side heterogeneous deployment, the execution unit of an algorithm cannot be changed after deployment. When multiple algorithms are deployed simultaneously, the scheduling strategy is mainly set through experience and theoretical calculation, which leaves considerable uncertainty.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide a heterogeneous deployment method and apparatus for a deep learning task, and an electronic device.
According to a first aspect of the present invention, a heterogeneous deployment method for a deep learning task is provided, the method comprising: acquiring task node information of a task to be deployed and device information of the device on which the task is to be deployed, wherein the task node information comprises a plurality of algorithms related to the task nodes of the task and the device information comprises a plurality of computing units of the device; compiling the algorithms into deployment files deployable to the computing units; determining a plurality of deployment strategies according to the device information and the task node information; simulating the deployment strategies based on the deployment files and determining a target strategy among the simulated strategies; and deploying the task to be deployed according to the target strategy.
According to an embodiment of the present invention, the acquiring of the task node information of the task to be deployed includes: splitting the task to be deployed into a plurality of subtasks; and constructing a directed acyclic graph (DAG) of the subtasks, wherein the nodes of the graph represent the algorithms involved in the subtasks and the edges represent the dependency relationships among the algorithms.
According to an embodiment of the present invention, compiling the algorithms into deployment files deployable to the computing units includes: compiling, based on the development environment, each algorithm into a plurality of deployment files, each of which can run on one of the plurality of computing units.
According to an embodiment of the present invention, the determining a plurality of deployment policies according to the device information and the task node information includes: determining the demand information of the algorithm related to the task node on the computing unit; and determining a computing unit deployed by an algorithm used by each task node according to the demand information and the task node information.
According to an embodiment of the present invention, the simulating the deployment policies based on the deployment file to determine a target policy of the simulation policies includes: loading the deployment strategy to end-side equipment for simulation in an interprocess communication mode; acquiring a simulation result of the deployment strategy on the end-side equipment; and determining the deployment strategy with the optimal simulation result performance in the plurality of deployment strategies as a target strategy.
According to an embodiment of the present invention, the deploying the task to be deployed according to the target policy includes: acquiring a deployment file for deploying the algorithm to the corresponding computing unit based on the target strategy; loading the deployment file to a corresponding computing unit of the device.
According to an embodiment of the present invention, the task node information further includes at least one of: the task list, the dependency relationships among the plurality of task nodes, and the priorities of the tasks in the task list.
According to an embodiment of the present invention, the device information further includes an acceleration unit configuration of the computing unit and/or a task that the computing unit can handle.
According to a second aspect of the present invention, there is also provided a heterogeneous deployment apparatus for a deep learning task, the apparatus comprising: an acquisition module for acquiring task node information of a task to be deployed and device information of the device on which the task is to be deployed, wherein the task node information comprises a plurality of algorithms related to the task nodes of the task and the device information comprises a plurality of computing units of the device; a compiling module for compiling the algorithms into deployment files deployable to the computing units; a strategy-search module for determining a plurality of deployment strategies according to the device information and the task node information; a simulation module for simulating the deployment strategies based on the deployment files and determining a target strategy among the simulated strategies; and a deployment module for deploying the task according to the target strategy.
According to a third aspect of the present invention, there is also provided an electronic device comprising at least one processor, and at least one memory connected to the processor, a bus; the processor and the memory complete mutual communication through the bus; the processor is configured to invoke program instructions in the memory to perform the heterogeneous deployment method of the deep learning task described above.
According to the heterogeneous deployment method and device for the deep learning task and the electronic device, task node information of the task to be deployed and device information of the device on which it is to be deployed are obtained, the task node information comprising a plurality of algorithms related to the task nodes and the device information comprising a plurality of computing units of the device. The algorithms are compiled into deployment files deployable to the computing units, and a plurality of deployment strategies are determined according to the device information and the task node information. The strategies are then simulated based on the deployment files, a target strategy is determined among them, and the task is deployed according to the target strategy. In this way, the candidate strategies for deploying the task to the device's computing units are determined in the development environment and simulated on a development board, which avoids the inconsistent deployment results caused by differences in engineers' experience. At the same time, the extra resource overhead of performing deployment directly on the end-side device is effectively avoided; furthermore, a deployment scheme determined by evaluating actual execution results in a simulation environment better matches the device's actual application scenario, and the deployment effect is better.
It is to be understood that the teachings of the present invention need not achieve all of the above-described benefits, but rather that specific embodiments may achieve specific technical results, and that other embodiments of the present invention may achieve benefits not mentioned above.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart illustrating an implementation of a heterogeneous deployment method of a deep learning task according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an implementation of an application example of the heterogeneous deployment method of the deep learning task according to an embodiment of the present invention;
fig. 3 shows a DAG diagram constructed according to a task to be deployed in an application example of the heterogeneous deployment method for the deep learning task according to the embodiment of the present invention;
FIG. 4 is a schematic flow chart illustrating an implementation of compiling a plurality of algorithms in an application example of the heterogeneous deployment method of the deep learning task according to the embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an implementation flow of determining multiple deployment policies in an application example of a heterogeneous deployment method for a deep learning task according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating an implementation flow of simulating multiple deployment strategies in an application example of a heterogeneous deployment method for a deep learning task according to an embodiment of the present invention;
fig. 7 is a schematic diagram illustrating a component structure of a heterogeneous deployment device of a deep learning task according to an embodiment of the present invention;
fig. 8 is a schematic diagram showing a structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given only to enable those skilled in the art to better understand and to implement the present invention, and do not limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
Fig. 1 shows a schematic flow chart of an implementation of a heterogeneous deployment method of a deep learning task according to an embodiment of the present invention.
Referring to fig. 1, the heterogeneous deployment method for deep learning tasks in the embodiment of the present invention at least includes the following operation flows: operation 101, acquiring task node information of a task to be deployed and device information of a device deployed by the task to be deployed, where the task node information includes a plurality of algorithms related to task nodes of the task to be deployed, and the device information includes a plurality of computing units of the device; an operation 102 of compiling the algorithm into a deployment file capable of being deployed to the computing unit; operation 103, determining a plurality of deployment policies according to the device information and the task node information; operation 104, simulating the multiple deployment policies based on the deployment file, and determining a target policy of the multiple simulation policies; and operation 105, deploying the task to be deployed according to the target strategy.
In operation 101, task node information of a task to be deployed and device information of a device on which the task to be deployed is deployed are obtained, where the task node information includes a plurality of algorithms related to task nodes of the task to be deployed, and the device information includes a plurality of computing units of the device.
In this embodiment of the present invention, the task node information of the task to be deployed is obtained mainly to determine the algorithms involved in its task nodes. Specifically, the task to be deployed may be a vehicle automatic driving task, which may include a plurality of subtasks, such as lane recognition while the vehicle is traveling, analysis of the vehicle's emergency video or images, and obstacle recognition during parking. Each subtask in turn includes a plurality of task nodes. The task to be deployed can also be the analysis of road conditions around the vehicle or of the distribution of obstacles, for example confirming and processing the distance to surrounding vehicles while driving; this involves task nodes such as image acquisition, image recognition, target detection, lane-line detection, determining whether a nearby vehicle or obstacle is present in the image, confirming the distance to that vehicle or obstacle, and radar point-cloud classification, and each task node may employ one or more algorithms to process the information. For example, a landmark algorithm can be used for lane-line detection, YOLOv3/v5 for target detection, PointNet for radar point-cloud classification, and DeepLabV3+ for semantic segmentation.
In this embodiment of the present invention, the task node information may include, in addition to the plurality of algorithms involved by the task node of the task to be deployed, at least one of the following: the task list, the dependency relationships among the plurality of task nodes, and the priorities of the tasks in the task list.
Here, the task to be deployed may be split into multiple subtasks, and a DAG (Directed Acyclic Graph) of the subtasks is constructed. The nodes of the DAG represent the algorithms involved in the subtasks, and its edges represent the dependency relationships among the algorithms.
For example, the task to be deployed may be split into task1 and task2. The task nodes of task1 may be processed by three algorithms A, B, and C, which depend on one another in sequence: algorithm B runs only after algorithm A has finished, and algorithm C only after algorithm B has finished (for part of the data, algorithm B may be skipped and algorithm C executed directly after algorithm A). Thus B depends on A, and C depends on B. Task2 may be processed by algorithms D, E, F, and G: after D finishes, E and F are executed, and G is executed once both E and F have finished. Hence E depends on D, F depends on D, E and F are parallel to each other, and G depends on both E and F.
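The dependency structure in this example can be sketched as a small DAG using Python's standard-library `graphlib`; the algorithm names follow the example above, and the code is an illustrative sketch rather than the patented implementation:

```python
from graphlib import TopologicalSorter

# Dependency edges from the example: B depends on A, C depends on B;
# E and F depend on D, and G depends on both E and F.
task1 = {"A": set(), "B": {"A"}, "C": {"B"}}
task2 = {"D": set(), "E": {"D"}, "F": {"D"}, "G": {"E", "F"}}

def execution_order(dag):
    """Return one valid serial execution order respecting the dependencies."""
    return list(TopologicalSorter(dag).static_order())

print(execution_order(task1))  # ['A', 'B', 'C']
print(execution_order(task2))  # 'D' first, 'G' last; E and F may run in parallel
```

A scheduler could also call `TopologicalSorter.get_ready()` iteratively to discover which algorithms (such as E and F) are executable concurrently.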
In this embodiment of the present invention, the device information may include, in addition to a plurality of computing units of the device, an acceleration unit configuration condition of the computing unit or a task that the computing unit can handle.
For example, the device on which the task is to be deployed may include computing units such as a CPU, GPU, DSP, FPGA, and ASIC. The device information may also indicate whether each computing unit is configured with an acceleration unit and which tasks each computing unit can handle. For example, algorithms A, B, C, D, E, F, and G may all be executable on the CPU, and the CPU is configured with an acceleration unit.
In operation 102, the algorithm is compiled into a deployment file that can be deployed to a computing unit.
In this embodiment of the present invention, compiling the algorithms into deployment files deployable to the computing units includes: compiling each algorithm, based on the development environment, into a plurality of deployment files, each runnable on one of the computing units.
The compilation is performed on a PC in the development environment rather than on the end-side device, which effectively avoids the end-side device's memory-resource limitations.
The programming language, communication protocol, and so on used by an algorithm may vary from one computing unit to another. Therefore, each algorithm is compiled in advance into a plurality of deployment files, one runnable on each of the computing units. For example, algorithms A, B, C, D, E, F, and G may each be compiled into deployment files executable on the CPU, GPU, DSP, FPGA, and ASIC.
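The per-unit pre-compilation step can be sketched as follows; `compile_for_unit` is a hypothetical stand-in for the real neural-network compiler (which would emit an actual binary), and the artifact names mirror the A1..G4 naming used later in fig. 4:

```python
UNITS = ["CPU", "GPU", "DSP", "ASIC"]

def compile_for_unit(algorithm, unit):
    """Hypothetical compiler stub: record the artifact name for one
    (algorithm, computing-unit) pair, e.g. ('A', 'CPU') -> 'A1'."""
    return f"{algorithm}{UNITS.index(unit) + 1}"

def compile_all(algorithms):
    """Pre-compile every algorithm for every computing unit, so any
    deployment strategy can later pick the matching file."""
    return {a: {u: compile_for_unit(a, u) for u in UNITS} for a in algorithms}

files = compile_all(["A", "B", "C", "D", "E", "F", "G"])
print(files["A"]["CPU"], files["G"]["ASIC"])  # A1 G4
```

Because the whole matrix is produced up front in the development environment, the later strategy search only selects among already-built files instead of recompiling on the device.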
In operation 103, a plurality of deployment policies is determined based on the device information and the task node information.
In this embodiment of the present invention, determining a plurality of deployment strategies according to the device information and the task node information means determining multiple ways of realizing the task nodes based on information such as the data-processing capability of the device and the requirements of the task nodes. The requirement information of each task node's algorithms with respect to the computing units may be determined first, and the computing unit on which each task node's algorithm is deployed is then determined from the requirement information and the task node information.
In this embodiment of the present invention, the requirement information of a task node's algorithm with respect to a computing unit may include the algorithm's requirements on conditions such as the unit's memory and the presence of an acceleration unit.
In this embodiment of the present invention, a deep learning algorithm may be employed to perform a strategy search over the algorithms' requirement information for the computing units and their dependency relationships, yielding a plurality of algorithm deployment strategies. For example, one strategy may deploy algorithm A, which belongs to one subtask of the task to be deployed, to the CPU, while another strategy deploys it to the GPU.
The search for deployment strategies is thus also completed on a PC or similar machine in the development environment, avoiding extra overhead on the end-side device during task deployment. Meanwhile, the automated strategy search effectively avoids the inconsistency caused by differences in human experience.
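An exhaustive version of the strategy search can be sketched as below; the capability and requirement tables are invented for illustration, and a real search might use a learned or heuristic method rather than brute-force enumeration:

```python
from itertools import product

# Hypothetical tables: real values would come from the device information
# and from each algorithm's demand information for the computing units.
UNIT_CAPS = {
    "CPU": {"mem_mb": 4096, "accel": True},
    "GPU": {"mem_mb": 8192, "accel": True},
    "DSP": {"mem_mb": 512,  "accel": False},
}

ALGO_REQS = {
    "A": {"mem_mb": 256,  "accel": False},
    "B": {"mem_mb": 1024, "accel": True},   # needs an acceleration unit
    "C": {"mem_mb": 256,  "accel": False},
}

def feasible(algo, unit):
    """Check one algorithm's memory and acceleration-unit requirements."""
    req, cap = ALGO_REQS[algo], UNIT_CAPS[unit]
    return cap["mem_mb"] >= req["mem_mb"] and (cap["accel"] or not req["accel"])

def search_policies():
    """Enumerate every assignment of algorithms to computing units that
    satisfies the demand information; each dict is one candidate strategy."""
    algos = list(ALGO_REQS)
    return [dict(zip(algos, units))
            for units in product(UNIT_CAPS, repeat=len(algos))
            if all(feasible(a, u) for a, u in zip(algos, units))]

print(len(search_policies()))  # 18 feasible strategies with these tables
```

Each resulting strategy can then be handed to the simulation stage, which ranks the candidates by measured rather than theoretical performance.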
In operation 104, the plurality of deployment policies is simulated based on the deployment file, and a target policy of the plurality of simulation policies is determined.
In this embodiment of the present invention, simulating the deployment strategies based on the deployment files and determining the target strategy can be implemented by the following operations: loading each deployment strategy onto the end-side device for simulation by means of interprocess communication; acquiring the simulation result of the strategy on the end-side device; and determining the strategy with the best simulation performance among the plurality of strategies as the target strategy.
In this embodiment of the present invention, each deployment strategy specifies the computing unit to which each algorithm used by the task nodes is deployed. Since each algorithm was compiled in operation 102 into a plurality of deployment files, the file for deploying each algorithm to its assigned computing unit can be determined from the strategy, and the selected files are loaded onto the development board of the computing units for simulation. The actual execution results are therefore obtained in a simulation environment close to the real operating environment, and the strategy with the best simulated efficiency is selected as the target strategy; because the simulation environment better matches the actual application scenario, the simulation results have higher practical value.
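The loading of a strategy to the simulation endpoint over an interprocess-communication channel can be sketched as follows; here a thread stands in for the device-side process, and the cost model and field names are invented for illustration:

```python
import json
import threading
from multiprocessing import Pipe

def device_simulator(conn):
    """Stand-in for the end-side development board: receive a deployment
    strategy over the connection and reply with a mock simulation result
    (a real board would execute the actual deployment files)."""
    policy = json.loads(conn.recv())
    load = {}
    for unit in policy.values():           # count algorithms per unit
        load[unit] = load.get(unit, 0) + 1
    # Fabricated cost model: latency grows with the busiest unit's load.
    conn.send(json.dumps({"latency_ms": 10 * max(load.values())}))
    conn.close()

def simulate_policy(policy):
    """Ship one strategy to the simulator endpoint and collect its result."""
    parent, child = Pipe()
    t = threading.Thread(target=device_simulator, args=(child,))
    t.start()
    parent.send(json.dumps(policy))
    result = json.loads(parent.recv())
    t.join()
    return result

print(simulate_policy({"A": "CPU", "B": "GPU", "C": "CPU"}))
```

In the patent's setting the far end of the channel would be a separate process on the development board rather than a thread, but the request/response pattern is the same.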
For example, if n deployment strategies are determined in operation 103, each strategy is simulated on the development board to obtain results such as operating performance and bandwidth information; the development board reports the simulation result of each strategy, and the strategy with the best performance among the results is selected as the target strategy. The criterion for judging the results can be set according to the actual application scenario; for example, among the strategies whose overall operating performance meets a set performance condition, the one whose result shows the highest device bus bandwidth is taken as the target strategy.
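The selection of the target strategy from the simulation results, using the example criterion above (a performance condition plus a bandwidth comparison), can be sketched as follows; the field names and numbers are hypothetical:

```python
def pick_target(results, latency_budget_ms):
    """Among simulation results whose operating performance meets the set
    condition (latency within budget), choose the strategy whose result
    shows the best bus-bandwidth figure."""
    ok = [r for r in results if r["latency_ms"] <= latency_budget_ms]
    if not ok:
        return None  # no strategy satisfies the performance condition
    return max(ok, key=lambda r: r["bandwidth_gbps"])["policy"]

results = [
    {"policy": "P1", "latency_ms": 12, "bandwidth_gbps": 3.2},
    {"policy": "P2", "latency_ms": 9,  "bandwidth_gbps": 4.1},
    {"policy": "P3", "latency_ms": 25, "bandwidth_gbps": 6.0},  # too slow
]
print(pick_target(results, latency_budget_ms=15))  # P2
```

Other application scenarios could swap in a different key function (power, memory footprint) without changing the overall selection flow.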
In operation 105, the task to be deployed is deployed according to the target policy.
In this embodiment of the present invention, deploying the task according to the target strategy can be performed as a batch operation. After the target strategy is determined, the deployment files for deploying each algorithm to its corresponding computing unit under the target strategy are first obtained, and those files are then loaded onto the corresponding computing units of the device.
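The batch deployment step can be sketched as follows; `loader` abstracts the device-specific load call, which is hypothetical here:

```python
def deploy(target_policy, deployment_files, loader):
    """Batch-load the per-unit deployment files selected by the target
    strategy. `target_policy` maps algorithm -> computing unit, and
    `deployment_files` is the precompiled algorithm -> unit -> file matrix."""
    for algo, unit in target_policy.items():
        loader(unit, deployment_files[algo][unit])

# Illustrative run: a list stands in for the real device-load call.
loaded = []
deploy(
    {"A": "CPU", "B": "GPU"},
    {"A": {"CPU": "A1"}, "B": {"GPU": "B2"}},
    loader=lambda unit, f: loaded.append((unit, f)),
)
print(loaded)  # [('CPU', 'A1'), ('GPU', 'B2')]
```

Because every file was pre-compiled in operation 102, this final step is a pure copy/load pass and needs no compilation on the end-side device.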
Fig. 2 is a schematic flow chart illustrating an implementation example of an application of the heterogeneous deployment method for the deep learning task according to the embodiment of the present invention.
Referring to fig. 2, an application example of the heterogeneous deployment method for the deep learning task in the embodiment of the present invention at least includes the following operation flows:
Operation 201: construct the algorithms involved in the task to be deployed into a DAG graph, and acquire information such as the concurrency and dependency relationships of the task's nodes.
Specifically, reference may be made to a DAG diagram constructed according to a task to be deployed in an application example of the heterogeneous deployment method for deep learning a task according to the embodiment of the present invention shown in fig. 3.
The raw data involved in the task to be deployed is taken as the input data. The task involves a plurality of data-processing subtasks, of which only task1 and task2 are shown as examples; an actual application may include more tasks.
The deep learning algorithms involved in each task can be organized into a DAG graph according to the concurrency, dependency, and other relations among them: each algorithm becomes a node of the DAG, and the dependency relationships between algorithms are represented as its edges.
As shown in fig. 3, the task to be deployed may include task1 and task2. The task nodes of task1 may be processed by three algorithms A, B, and C, which depend on one another in sequence: algorithm B runs only after algorithm A has finished, and algorithm C only after algorithm B has finished (for part of the data, algorithm B may be skipped and algorithm C executed directly after algorithm A). Thus B depends on A, and C depends on B. Task2 may be processed by algorithms D, E, F, and G: after D finishes, E and F are executed, and G is executed once both E and F have finished. Hence E depends on D, F depends on D, E and F are parallel to each other, and G depends on both E and F.
At operation 202, each algorithm is compiled by a neural network compiler into deployment files that can be run on the respective computing units.
All deep learning algorithms are compiled in the development environment into deployment files executable by the different computing units of the device, so the memory resource limitations of the end-side device are effectively avoided.
Fig. 4 is a schematic flow chart illustrating the compilation of multiple algorithms in an application example of the heterogeneous deployment method of the deep learning task according to the embodiment of the present invention. Algorithm A may be compiled into a deployment file A1 executable on the CPU, a deployment file A2 executable on the GPU, a deployment file A3 executable on the DSP, and a deployment file A4 executable on the ASIC. Correspondingly, algorithm B may be compiled into a deployment file B1 executable on the CPU, a deployment file B2 executable on the GPU, a deployment file B3 executable on the DSP, a deployment file B4 executable on the ASIC, and so on; likewise, algorithm G may be compiled into a deployment file G1 executable on the CPU, a deployment file G2 executable on the GPU, a deployment file G3 executable on the DSP, and a deployment file G4 executable on the ASIC.
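The cross-product compilation of fig. 4 can be sketched as follows; `compile_algorithm` is a stand-in for a real neural-network compiler, and the file-name scheme is purely illustrative:

```python
COMPUTE_UNITS = ["CPU", "GPU", "DSP", "ASIC"]

def compile_algorithm(name, unit):
    # Placeholder: a real neural-network compiler would emit a binary
    # targeted at the given computing unit.
    return f"{name}_{unit}.bin"

def compile_all(algorithms):
    # One deployment file per (algorithm, computing unit) pair, e.g. A1..A4.
    return {alg: {unit: compile_algorithm(alg, unit) for unit in COMPUTE_UNITS}
            for alg in algorithms}

files = compile_all(["A", "B", "G"])
```

For example, `files["A"]["CPU"]` names the CPU deployment file of algorithm A, corresponding to A1 in fig. 4.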
At operation 203, device information describing the device is obtained, and all possible deployment policies are searched for according to the device information and the DAG graph constructed in operation 201.
Fig. 5 is a schematic implementation flow diagram for determining multiple deployment policies in an application example of the heterogeneous deployment method for the deep learning task according to the embodiment of the present invention. Referring to fig. 5, a deep learning algorithm may be adopted to automatically search for deployment policies according to the DAG graph formed by the task nodes of the multiple tasks, the device information, and the like, so as to obtain policy deployment files of multiple deployment policies.
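The embodiment performs an automatic (e.g. learned) search of the policy space; the sketch below instead enumerates the full assignment space exhaustively, only to make concrete what a "deployment policy" is here — a mapping from every algorithm node to one computing unit. All names are illustrative:

```python
from itertools import product

def enumerate_policies(algorithms, units):
    """Yield every mapping of algorithm -> computing unit (one deployment policy each)."""
    for assignment in product(units, repeat=len(algorithms)):
        yield dict(zip(algorithms, assignment))

# 2 units ** 3 algorithms = 8 candidate deployment policies
policies = list(enumerate_policies(["A", "B", "C"], ["CPU", "GPU"]))
```

A real search would prune this space using, for example, the capability and acceleration-unit configuration of each computing unit from the device information.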
At operation 204, a simulation is performed based on the end-side development board according to all possible deployment policies and corresponding deployment files.
The searched scheduling policies and simulation data may be loaded to the end-side development board for simulation by means of IPC (Inter-Process Communication).
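As a rough stand-in for this IPC transfer (the real board-side simulator and message format are not specified in the embodiment), Python's `multiprocessing` pipe can model the round trip; the function names and the returned metric are hypothetical:

```python
from multiprocessing import Pipe, Process

def board_simulator(conn):
    # "Board" side: receive a policy plus simulation data, reply with a result.
    payload = conn.recv()
    conn.send({"policy": payload["policy"], "latency_ms": 40.0})
    conn.close()

def run_simulation(policy, data):
    # Host side: send the scheduling policy and simulation data over IPC,
    # then collect the simulation result from the simulator process.
    host_end, board_end = Pipe()
    proc = Process(target=board_simulator, args=(board_end,))
    proc.start()
    host_end.send({"policy": policy, "data": data})
    result = host_end.recv()
    proc.join()
    return result
```

The host would invoke `run_simulation` once per candidate policy and gather the returned results for comparison.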
At operation 205, a simulation result of the simulation executed by the end-side development board is acquired, and the optimal algorithm deployment policy is determined based on the simulation result.
Since the plurality of algorithms related to the service to be deployed are deployed to processes on the plurality of computing units, calls between the algorithms are involved; therefore, in some schemes the deployment policy may also be understood as an algorithm scheduling policy.
Fig. 6 is a schematic diagram illustrating an implementation flow of simulating multiple deployment policies in an application example of the heterogeneous deployment method for the deep learning task according to the embodiment of the present invention.
The generated policy configuration file, the corresponding deployment files generated in operation 202, and simulation data of the actual scene are imported into the end-side development board for simulation by means of IPC. The simulation data of the actual scene includes, for example, whether each computing unit is configured with an acceleration unit.
The simulation results may include performance, bandwidth information, and the like. The optimal deployment policy is determined according to the simulation results.
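Selecting the target policy from the simulation results can be sketched as below; the metric names (`latency_ms`, `bandwidth_mb`) and the latency-only figure of merit are assumptions for illustration, since the embodiment does not fix a concrete scoring function:

```python
def select_target_policy(results):
    """results: {policy_id: {"latency_ms": float, "bandwidth_mb": float}}.
    Return the policy id with the lowest simulated latency."""
    return min(results, key=lambda p: results[p]["latency_ms"])

sim_results = {
    "policy1": {"latency_ms": 42.0, "bandwidth_mb": 120.0},
    "policy2": {"latency_ms": 35.5, "bandwidth_mb": 180.0},
}
target = select_target_policy(sim_results)  # "policy2"
```

A weighted combination of latency and bandwidth (or any other measured metric) could replace the latency-only key without changing the selection structure.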
The specific implementation process of operations 201 to 205 is similar to the specific implementation process of operations 101 to 105 in the embodiment shown in fig. 1, and is not described here again.
According to the heterogeneous deployment method and apparatus for the deep learning task and the electronic device, task node information of the task to be deployed and device information of the device to which the task is deployed are obtained, where the task node information includes a plurality of algorithms related to the task nodes of the task to be deployed, and the device information includes a plurality of computing units of the device. The algorithms are compiled into deployment files deployable to the computing units, and a plurality of deployment policies are determined according to the device information and the task node information. The deployment policies are then simulated based on the deployment files, a target policy is determined among the plurality of deployment policies, and the task to be deployed is deployed according to the target policy. In this way, a plurality of deployment policies for deploying the task to the plurality of computing units of the device can be determined in the device-side development environment and then simulated on the development board, which avoids the inconsistent deployment results caused by differences in engineer experience when deployment relies on manual expertise. Meanwhile, the extra resource overhead of performing deployment directly on the end-side device is effectively avoided. Furthermore, an algorithm deployment manner determined by evaluating actual execution results in a simulation environment better matches the actual application scenario of the device, so the deployment effect is better.
Similarly, based on the above heterogeneous deployment method of the deep learning task, an embodiment of the present invention further provides a computer-readable storage medium storing a program which, when executed by a processor, causes the processor to perform at least the following operation steps: operation 101, acquiring task node information of a task to be deployed and device information of the device to which the task to be deployed is deployed, where the task node information includes a plurality of algorithms related to the task nodes of the task to be deployed, and the device information includes a plurality of computing units of the device; operation 102, compiling the algorithms into deployment files capable of being deployed to the computing units; operation 103, determining a plurality of deployment policies according to the device information and the task node information; operation 104, simulating the multiple deployment policies based on the deployment files, and determining a target policy among the multiple deployment policies; and operation 105, deploying the task to be deployed according to the target policy.
Further, based on the above heterogeneous deployment method of the deep learning task, an embodiment of the present invention further provides a heterogeneous deployment apparatus of the deep learning task, as shown in fig. 7, where the apparatus 70 includes: an acquiring module 701, configured to acquire task node information of a task to be deployed and device information of the device to which the task to be deployed is deployed, where the task node information includes a plurality of algorithms related to the task nodes of the task to be deployed, and the device information includes a plurality of computing units of the device; a compiling module 702, configured to compile the algorithms into deployment files that can be deployed to the computing units; a policy search module 703, configured to determine a plurality of deployment policies according to the device information and the task node information; a simulation module 704, configured to simulate the multiple deployment policies based on the deployment files and determine a target policy among the multiple deployment policies; and a deployment module 705, configured to deploy the task to be deployed according to the target policy.
In this embodiment of the present invention, the obtaining module 701 includes: the task splitting sub-module is used for splitting the task to be deployed into a plurality of sub-tasks; and the task graph constructing submodule is used for constructing a loop-free directed graph of the subtasks, the nodes of the loop-free directed graph are used for showing a plurality of algorithms related to the subtasks, and the edges of the loop-free directed graph are used for showing the dependency relationship among the algorithms.
In this embodiment of the present invention, the compiling module 702 includes: and the compiling submodule is used for compiling each algorithm into a plurality of deployment files capable of running based on a plurality of computing units respectively based on the development environment.
In this embodiment of the present invention, the policy search module 703 includes: the requirement determining submodule is used for determining the requirement information of the algorithm related to the task node on the computing unit; and the unit determining submodule is used for determining a computing unit deployed by an algorithm used by each task node according to the demand information and the task node information.
In this embodiment of the present invention, the simulation module 704 includes: a simulation submodule, configured to load the deployment policy to the end-side device for simulation in an inter-process communication manner; a result acquisition submodule, configured to acquire a simulation result of the deployment policy on the end-side device; and a target determining submodule, configured to determine, among the plurality of deployment policies, the deployment policy with the optimal simulation result performance as the target policy.
In this embodiment of the present invention, the deployment module 705 comprises: the file acquisition submodule is used for acquiring a deployment file for deploying the algorithm to the corresponding computing unit based on the target strategy; and the loading submodule is used for loading the deployment file to a corresponding computing unit of the equipment.
In this embodiment of the present invention, the task node information further includes at least one of: the task list, the dependency relationships among the plurality of task nodes, and the priorities of the tasks in the task list.
In this embodiment of the present invention, the device information further includes an acceleration unit configuration of the computing unit and/or a task that the computing unit is capable of handling.
Fig. 8 is a schematic diagram showing a component structure of an electronic device according to an embodiment of the present invention, and as shown in fig. 8, the device 80 includes at least one processor 801, at least one memory 802 connected to the processor 801, and a bus 803; the processor 801 and the memory 802 complete communication with each other through the bus 803; the processor 801 is configured to call program instructions in the memory 802 to perform the heterogeneous deployment method of the deep learning task described above.
Here, it should be noted that: the above description of the heterogeneous deployment device and the electronic device embodiment for the deep learning task is similar to the description of the method embodiment shown in fig. 1 to 6, and has similar beneficial effects to the method embodiment shown in fig. 1 to 6, and therefore, the description is omitted. For technical details that are not disclosed in the embodiments of the heterogeneous deployment device and the electronic device for deep learning task of the present invention, please refer to the description of the method embodiments shown in fig. 1 to fig. 6 of the present invention for understanding, and therefore, for brevity, no further description is provided.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present invention, and shall cover the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for heterogeneous deployment of deep learning tasks, the method comprising:
acquiring task node information of a task to be deployed and equipment information of equipment deployed by the task to be deployed, wherein the task node information comprises a plurality of algorithms related to the task node of the task to be deployed, and the equipment information comprises a plurality of computing units of the equipment;
compiling the algorithm into a deployment file deployable to the computing unit;
determining a plurality of deployment strategies according to the equipment information and the task node information;
simulating the plurality of deployment strategies based on the deployment file, and determining a target strategy among the plurality of deployment strategies;
and deploying the task to be deployed according to the target strategy.
2. The method according to claim 1, wherein the acquiring task node information of the task to be deployed comprises:
splitting the task to be deployed into a plurality of subtasks;
and constructing a loop-free directed graph of the subtasks, wherein the nodes of the loop-free directed graph are used for showing a plurality of algorithms related to the subtasks, and the edges of the loop-free directed graph are used for showing the dependency relationship among the algorithms.
3. The method of claim 1, wherein the compiling the algorithm into a deployment file deployable to the computing unit comprises:
based on the development environment, each algorithm is compiled into a plurality of deployment files capable of running based on the plurality of computing units, respectively.
4. The method of claim 1, wherein the determining a plurality of deployment policies according to the device information and the task node information comprises:
determining the demand information of the algorithm related to the task node on the computing unit;
and determining a computing unit deployed by an algorithm used by each task node according to the demand information and the task node information.
5. The method of claim 1, wherein the simulating the plurality of deployment strategies based on the deployment file and determining a target strategy among the plurality of deployment strategies comprises:
loading the deployment strategy to end-side equipment for simulation in an interprocess communication mode;
acquiring a simulation result of the deployment strategy on the end-side equipment;
and determining, as the target strategy, the deployment strategy with the optimal simulation result performance among the plurality of deployment strategies.
6. The method of claim 1, wherein deploying the task to be deployed according to the target policy comprises:
acquiring a deployment file for deploying the algorithm to the corresponding computing unit based on the target strategy;
loading the deployment file to a corresponding computing unit of the device.
7. The method of any of claims 1-6, wherein the task node information further comprises at least one of:
the task list, the dependency relationships among the plurality of task nodes, and the priorities of the tasks in the task list.
8. The method according to any one of claims 1-6, wherein the device information further comprises an acceleration unit configuration of the computing unit and/or tasks that the computing unit is capable of handling.
9. A heterogeneous deployment device of deep learning tasks, the device comprising:
an acquiring module, configured to acquire task node information of a task to be deployed and device information of a device to which the task to be deployed is deployed, wherein the task node information comprises a plurality of algorithms related to the task nodes of the task to be deployed, and the equipment information comprises a plurality of computing units of the equipment;
a compiling module for compiling the algorithm into a deployment file deployable to the computing unit;
the strategy searching module is used for determining a plurality of deployment strategies according to the equipment information and the task node information;
the simulation module is used for simulating the deployment strategies based on the deployment file and determining a target strategy in the simulation strategies;
and the deployment module is used for deploying the task to be deployed according to the target strategy.
10. An electronic device, comprising at least one processor, at least one memory, and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the heterogeneous deployment method of the deep learning task of any of claims 1-8.
CN202211081830.2A 2022-09-06 2022-09-06 Heterogeneous deployment method and device for deep learning task and electronic equipment Pending CN115421735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211081830.2A CN115421735A (en) 2022-09-06 2022-09-06 Heterogeneous deployment method and device for deep learning task and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211081830.2A CN115421735A (en) 2022-09-06 2022-09-06 Heterogeneous deployment method and device for deep learning task and electronic equipment

Publications (1)

Publication Number Publication Date
CN115421735A true CN115421735A (en) 2022-12-02

Family

ID=84201841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211081830.2A Pending CN115421735A (en) 2022-09-06 2022-09-06 Heterogeneous deployment method and device for deep learning task and electronic equipment

Country Status (1)

Country Link
CN (1) CN115421735A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116192725A (en) * 2023-04-23 2023-05-30 安徽中科晶格技术有限公司 Distributed SDN controller deployment method, system and equipment based on FPS algorithm
CN116956756A (en) * 2023-09-21 2023-10-27 浪潮电子信息产业股份有限公司 Model deployment method, task processing method, device, equipment and storage medium
CN116956756B (en) * 2023-09-21 2024-02-09 浪潮电子信息产业股份有限公司 Model deployment method, task processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115421735A (en) Heterogeneous deployment method and device for deep learning task and electronic equipment
Bonakdarpour et al. A framework for automated distributed implementation of component-based models
CN110704178B (en) Machine learning model training method, platform, electronic device and readable storage medium
CN111768006A (en) Artificial intelligence model training method, device, equipment and storage medium
Tang et al. A container based edge offloading framework for autonomous driving
CN104636204A (en) Task scheduling method and device
Bonakdarpour et al. Automated conflict-free distributed implementation of component-based models
CN111352711B (en) Multi-computing engine scheduling method, device, equipment and storage medium
CN112035238A (en) Task scheduling processing method and device, cluster system and readable storage medium
CN115421897B (en) Core particle-oriented deep neural network pipeline parallel scheduling method and device
CN115220787A (en) Driving control instruction generation method, heterogeneous calculation method, related device and system
CN115134371A (en) Scheduling method, system, equipment and medium containing edge network computing resources
Yi et al. Fast training of deep learning models over multiple gpus
CN112099882B (en) Service processing method, device and equipment
CN115437781B (en) GPU resource management method and system
Wu et al. Latency modeling and minimization for large-scale scientific workflows in distributed network environments
CN115309501A (en) Cluster resource planning method, device, apparatus and medium
Zhang et al. Cost-efficient and latency-aware workflow scheduling policy for container-based systems
Kang et al. Learning scalable and transferable multi-robot/machine sequential assignment planning via graph embedding
CN112799787A (en) Improved parallel behavior execution conflict resolution method in simulation operation and storage medium thereof
Hernandez et al. Reliable DAG scheduling on grids with rewinding and migration
Cho et al. Hybrid Resource Scheduling Scheme for Video Surveillance in GPU-FPGA Accelerated Edge Computing System
CN113095645B (en) Heterogeneous unmanned aerial vehicle task allocation method aiming at emergency scene with uneven task distribution
CN117270893A (en) Application program deployment method, device, computer equipment and storage medium
CN115082286A (en) GPU (graphics processing Unit) operation method and device of edge node, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination