CN115796041A - Neural network model deployment method, system, device and storage medium - Google Patents

Neural network model deployment method, system, device and storage medium

Info

Publication number
CN115796041A
Authority
CN
China
Prior art keywords
segmentation
strategy
neural network
network model
termination set
Prior art date
Legal status
Pending
Application number
CN202211553282.9A
Other languages
Chinese (zh)
Inventor
江欣聪
Current Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Original Assignee
Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Goldway Intelligent Transportation System Co Ltd
Priority to CN202211553282.9A
Publication of CN115796041A


Abstract

The application discloses a neural network model deployment method, system, device and storage medium, belonging to the technical field of deep learning. The method comprises the following steps: generating a computational graph of the neural network model based on the loaded neural network model; segmenting the computational graph into a plurality of termination sets, each termination set comprising one or more nodes in the computational graph; for each termination set, generating an optimal segmentation strategy corresponding to the termination set through heuristic search based on the hardware resources of the heterogeneous platform for processing the neural network model; and generating an optimal allocation strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal allocation strategy. The application aims to improve the execution efficiency of a heterogeneous platform when executing a neural network model, and to solve the technical problem of low execution efficiency of existing heterogeneous platforms when executing neural network models.

Description

Neural network model deployment method, system, device and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a neural network model deployment method, system, device, and storage medium.
Background
Deep learning is widely used in various fields such as image recognition, search technology and speech recognition. The underlying technology of deep learning is a model composed of a plurality of neural network layers, namely a neural network model.
Neural network layers mainly involve computation-intensive matrix operations, and in order to meet the running-time requirements of a neural network model, the operators of the neural network model are usually allocated to the various heterogeneous computing units of the heterogeneous platform that processes the model. In an actual production environment, developers manually allocate operators for different neural network models and different platforms, so the allocation strategy is not an optimal strategy, which in turn leads to low execution efficiency when the heterogeneous platform executes the neural network model.
Disclosure of Invention
In view of this, the present application provides a neural network model deployment method, system, device and storage medium, which are intended to improve the execution efficiency of a heterogeneous platform when executing a neural network model, and solve the technical problem of low execution efficiency of the existing heterogeneous platform when executing the neural network model.
The application provides a neural network model deployment method, which comprises the following steps:
generating a computational graph of the neural network model based on the loaded neural network model;
segmenting the computational graph into a plurality of termination sets, each termination set comprising one or more nodes in the computational graph;
aiming at each termination set, generating an optimal segmentation strategy corresponding to the termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model;
and generating an optimal distribution strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal distribution strategy.
In one possible embodiment of the present application, the computational graph comprises a plurality of nodes and directed edges, wherein each node represents a respective operator, each directed edge connects a respective first node to a respective second node, the input of the operator represented by the respective second node being the output of the operator represented by the respective first node; the step of segmenting the computational graph into a plurality of termination sets, each termination set including one or more nodes in the computational graph, comprises:
performing binary segmentation on the computational graph to generate a termination set and a remaining subgraph; wherein the termination set is disjoint from the remaining subgraph, and the inputs of the operators represented by the nodes in the termination set are the outputs of the operators represented by the nodes in the remaining subgraph;
judging whether the remaining subgraph meets a preset segmentation stopping condition;
if the remaining subgraph does not meet the preset segmentation stopping condition, performing binary segmentation on the remaining subgraph to generate a further termination set and a new remaining subgraph, and returning to the step of judging whether the remaining subgraph meets the preset segmentation stopping condition and the subsequent steps;
and if the remaining subgraph meets the preset segmentation stopping condition, stopping the binary segmentation of the remaining subgraph, and taking the termination sets obtained by the multiple binary segmentations, together with the remaining subgraph, as the plurality of termination sets of the computational graph.
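For concreteness, the iterative binary segmentation above can be sketched as follows. This code is illustrative and not part of the patent text; it assumes that each round's termination set is the set of current sink nodes (nodes with no outgoing edges into the remaining subgraph, matching the requirement that termination-set inputs come from the remaining subgraph) and that the stopping condition is a size threshold on the remaining subgraph.

```python
def split_into_termination_sets(successors, stop_size=2):
    """Iteratively bisect a DAG into termination sets.

    `successors` maps each node to the set of nodes it feeds.
    Each round peels off the current sink nodes as a termination set;
    the loop stops once the remaining subgraph is small enough.
    """
    remaining = {n: set(s) for n, s in successors.items()}
    termination_sets = []
    while len(remaining) > stop_size:  # preset stopping condition
        # sink nodes: no successor is still in the remaining subgraph
        sinks = {n for n, succ in remaining.items()
                 if remaining.keys().isdisjoint(succ)}
        termination_sets.append(sinks)
        for n in sinks:
            del remaining[n]
    termination_sets.append(set(remaining))  # remaining subgraph as final set
    return termination_sets
```

Run on the chain A -> B -> C -> D with a shortcut edge A -> C, this peels off {D}, then {C}, and keeps {A, B} as the final remaining subgraph.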
In a possible implementation manner of the present application, the plurality of termination sets are sorted in reverse order of their segmentation generation time, and the step of generating, for each termination set, an optimal segmentation strategy corresponding to the termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model includes:
for each termination set, combining the optimal segmentation strategies corresponding to all termination sets sequenced in front of the termination set to generate an initial segmentation strategy;
taking the initial segmentation strategy as a current segmentation strategy;
segmenting the termination set to generate a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy;
comparing, based on an objective function, the operation time consumption obtained by the new segmentation strategy with that of the current segmentation strategy, and determining whether to update the current segmentation strategy to the new segmentation strategy; wherein the objective function is the running time consumption of the neural network model on the heterogeneous platform, given a segmentation strategy and the hardware resources of the heterogeneous platform for processing the neural network model;
acquiring the current iteration count, and judging whether the current iteration count reaches a preset count;
if not, returning to the step of segmenting the termination set to generate a segmentation strategy corresponding to the termination set, combining it with the initial segmentation strategy to generate a new segmentation strategy, and executing the subsequent steps, until the current iteration count reaches the preset count;
and if so, determining the optimal segmentation strategy corresponding to the termination set based on the current segmentation strategy and the initial segmentation strategy.
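The iteration described in the steps above resembles a simulated-annealing style local search. A minimal sketch follows (illustrative only; `perturb` and `run_time` are hypothetical stand-ins for generating a new segmentation of the termination set and for the objective function, i.e. the measured running time on the heterogeneous platform):

```python
import math
import random

def heuristic_search(initial_strategy, perturb, run_time, max_iters=100, seed=0):
    """Iterate a fixed number of times, keeping better strategies and
    occasionally accepting worse ones with a temperature-free probability."""
    rng = random.Random(seed)
    current = initial_strategy
    t_best = run_time(current)
    for _ in range(max_iters):             # preset iteration count
        candidate = perturb(current, rng)  # new segmentation strategy
        t_new = run_time(candidate)
        if t_new < t_best:
            current, t_best = candidate, t_new
        elif rng.random() < math.exp(-(t_new - t_best) / t_best):
            current, t_best = candidate, t_new  # accept a worse strategy
    return current, t_best
```

The returned pair is the final strategy for this termination set and its objective value; the optimal strategy of the termination set is then determined from it together with the initial segmentation strategy, as the last step describes.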
In a possible implementation manner of the present application, the step of segmenting the termination set to generate a segmentation policy corresponding to the termination set, and merging the segmentation policy corresponding to the termination set with the initial segmentation policy to generate a new segmentation policy includes:
carrying out recursive segmentation on the termination set to generate a plurality of termination set segmentation subgraphs;
judging whether the number of the plurality of termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform;
if so, generating a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy;
if not, returning to the step of executing the recursive segmentation of the termination set to generate a plurality of segmentation subgraphs of the termination set and the subsequent steps.
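The recursion in the steps above can be illustrated with a deliberately simplified splitter. This is not the patent's method: here the largest part is merely halved by node count each round, whereas a real implementation would cut along the graph structure; the sketch only shows the loop "split until the number of subgraphs reaches the number of heterogeneous computing units".

```python
def recursive_split(nodes, num_units):
    """Split a termination set's nodes into `num_units` parts, one per
    heterogeneous computing unit; assumes len(nodes) >= num_units."""
    parts = [list(nodes)]
    while len(parts) < num_units:        # until one part per computing unit
        parts.sort(key=len, reverse=True)
        largest = parts.pop(0)
        mid = len(largest) // 2
        parts += [largest[:mid], largest[mid:]]
    return parts
```

Splitting eight nodes across four units yields four parts of two nodes each, so every heterogeneous computing unit receives one parallel subgraph.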
In a possible implementation manner of the present application, the step of comparing, based on an objective function, the operation time consumption of the new segmentation strategy with that of the current segmentation strategy to determine whether to update the current segmentation strategy to the new segmentation strategy includes:
calculating the operation time of the neural network model in the heterogeneous platform based on the new segmentation strategy and the operation time of the neural network model in the heterogeneous platform based on the current segmentation strategy;
comparing the operation time consumption corresponding to the new segmentation strategy with the operation time consumption corresponding to the current segmentation strategy;
if the operation time consumption corresponding to the new segmentation strategy is less than the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy to the new segmentation strategy;
and if the operation time consumption corresponding to the new segmentation strategy is greater than or equal to the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy to the new segmentation strategy with probability exp(- (t(p_new) - t(p_best)) / t(p_best)), wherein t(p_new) is the operation time consumption corresponding to the new segmentation strategy, and t(p_best) is the operation time consumption corresponding to the current segmentation strategy.
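The acceptance rule in the last step is a Metropolis-style criterion. A one-line sketch of the formula (illustrative, not patent text):

```python
import math

def accept_probability(t_new, t_best):
    """Probability of adopting a worse (or equal) segmentation strategy,
    per exp(-(t(p_new) - t(p_best)) / t(p_best))."""
    return math.exp(-(t_new - t_best) / t_best)
```

For example, a strategy 10% slower than the current one (t_new = 1.1, t_best = 1.0) is accepted with probability exp(-0.1), roughly 0.905, so mildly worse strategies are often kept, which helps the search escape local optima.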
In a possible embodiment of the present application, the step of calculating the operation time of the neural network model in the heterogeneous platform based on the new segmentation strategy and the operation time of the neural network model in the heterogeneous platform based on the current segmentation strategy includes:
performing model export on the termination set segmentation subgraphs in the segmentation strategy to obtain a plurality of termination set segmentation sub-graph models;
distributing the termination set segmentation sub-graph models to corresponding heterogeneous computing units based on the hardware resources corresponding to the heterogeneous platform and the model parameters of the termination set segmentation sub-graph models;
determining a scheduling order of the plurality of termination set segmentation sub-graph models according to the dependency and parallel relationships among the termination set segmentation subgraphs in the segmentation strategy;
and controlling the heterogeneous computing units to which the termination set segmentation sub-graph models are distributed to perform operations based on the scheduling order, to obtain the running time consumption of the neural network model on the heterogeneous platform under the segmentation strategy; wherein the segmentation strategy comprises the new segmentation strategy and the current segmentation strategy.
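One way to derive such a scheduling order from the dependency relationships is to group the sub-graph models into waves of a level-wise topological sort: models in the same wave have no dependencies between them and can run in parallel. This is an illustrative sketch, not the patent's scheduler; `deps` is a hypothetical mapping from each sub-graph model to the set of models it depends on.

```python
def schedule_waves(deps):
    """Return scheduling waves for sub-graph models; each wave contains
    models that are mutually independent and may run in parallel."""
    indegree = {m: len(d) for m, d in deps.items()}
    dependents = {m: set() for m in deps}
    for m, d in deps.items():
        for producer in d:
            dependents[producer].add(m)
    waves = []
    ready = {m for m, k in indegree.items() if k == 0}
    while ready:
        waves.append(sorted(ready))
        nxt = set()
        for m in ready:
            for consumer in dependents[m]:
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    nxt.add(consumer)
        ready = nxt
    return waves
```

With two independent models feeding a third, the first wave holds both independent models, so two heterogeneous computing units can operate simultaneously before the dependent model runs.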
In one possible implementation of the present application, the hardware resources include heterogeneous computing unit topologies.
In one possible embodiment of the present application, the step of generating a computation graph of the neural network model based on the loaded neural network model includes:
analyzing the loaded neural network model into a general format to obtain a neural network model in the general format;
and generating a calculation graph of the neural network model based on the neural network model in the general format.
The present application further provides a neural network model deployment system, the system comprising:
a computational graph obtaining module, configured to generate a computational graph of the neural network model based on the loaded neural network model;
a computational graph partitioning module to partition the computational graph into a plurality of termination sets, each termination set including one or more nodes in the computational graph;
the optimal segmentation strategy generation module is used for generating an optimal segmentation strategy corresponding to each termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model;
and the optimal allocation strategy determining module is used for generating an optimal allocation strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal allocation strategy.
In one possible embodiment of the present application, the computational graph comprises a plurality of nodes and directed edges, wherein each node represents a respective operator, each directed edge connects a respective first node to a respective second node, the input of the operator represented by the respective second node being the output of the operator represented by the respective first node; the computation graph segmentation module is specifically configured to:
performing binary segmentation on the calculation graph to generate a termination set and residual subgraphs; wherein the termination set is disjoint from the remaining subgraph, and the inputs of the operators represented by the nodes in the termination set are the outputs of the operators represented by the nodes in the remaining subgraph; judging whether the residual subgraphs meet a preset segmentation stopping condition; if the residual subgraph does not meet the preset segmentation stopping condition, performing binary segmentation on the residual subgraph to generate a termination set and a new residual subgraph, and returning to execute the step of judging whether the residual subgraph meets the preset segmentation stopping condition and the subsequent steps; if the residual subgraphs meet preset segmentation stopping conditions, stopping performing binary segmentation on the residual subgraphs, and taking a termination set obtained by multiple times of binary segmentation and the residual subgraphs as a plurality of termination sets of the calculation graph;
and/or the plurality of termination sets are sorted in a reverse order according to the segmentation generation time of the termination sets, and the optimal segmentation strategy generation module is specifically configured to:
for each termination set, combining the optimal segmentation strategies corresponding to all termination sets ordered before the termination set to generate an initial segmentation strategy; taking the initial segmentation strategy as a current segmentation strategy; segmenting the termination set to generate a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy; comparing, based on an objective function, the operation time consumption obtained by the new segmentation strategy with that of the current segmentation strategy, and determining whether to update the current segmentation strategy to the new segmentation strategy; wherein the objective function is the running time consumption of the neural network model on the heterogeneous platform, given a segmentation strategy and the hardware resources of the heterogeneous platform for processing the neural network model; acquiring the current iteration count, and judging whether the current iteration count reaches a preset count; if not, returning to the step of segmenting the termination set to generate a segmentation strategy corresponding to the termination set, combining it with the initial segmentation strategy to generate a new segmentation strategy, and executing the subsequent steps, until the current iteration count reaches the preset count; if so, determining the optimal segmentation strategy corresponding to the termination set based on the current segmentation strategy and the initial segmentation strategy;
and/or the optimal segmentation strategy generation module is further specifically configured to:
carrying out recursive segmentation on the termination set to generate a plurality of termination set segmentation subgraphs; judging whether the number of the plurality of termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform; if so, generating a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy; if not, returning to the step of executing recursive segmentation on the termination set to generate a plurality of segmentation subgraphs of the termination set and the subsequent steps;
and/or the optimal segmentation strategy generation module is further specifically configured to: calculate the operation time consumption of the neural network model on the heterogeneous platform based on the new segmentation strategy and the operation time consumption based on the current segmentation strategy; compare the operation time consumption corresponding to the new segmentation strategy with that corresponding to the current segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is less than that corresponding to the current segmentation strategy, update the current segmentation strategy to the new segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is greater than or equal to that corresponding to the current segmentation strategy, update the current segmentation strategy to the new segmentation strategy with probability exp(- (t(p_new) - t(p_best)) / t(p_best)), wherein t(p_new) is the operation time consumption corresponding to the new segmentation strategy, and t(p_best) is the operation time consumption corresponding to the current segmentation strategy;
and/or the optimal segmentation strategy generation module is further specifically configured to: carrying out model derivation on the cut subgraphs of the termination set in the cutting strategy to obtain a plurality of cut subgraph models of the termination set; distributing the termination set segmentation sub-graph model to corresponding heterogeneous computing units based on hardware resources corresponding to the heterogeneous platform and model parameters of the termination set segmentation sub-graph model; determining a scheduling sequence of a plurality of termination set segmentation sub-graph models according to the dependency relationship and the parallel relationship among the termination set segmentation sub-graphs in the segmentation strategy; controlling a heterogeneous computing unit distributed with the termination set segmentation sub-graph model to perform operation based on the scheduling sequence to obtain the running time consumption of the neural network model in the heterogeneous platform based on the segmentation strategy; wherein the slicing strategy comprises the new slicing strategy and the current slicing strategy;
and/or the hardware resources comprise heterogeneous computing unit topologies;
and/or the computation graph acquisition module is specifically configured to: analyzing the loaded neural network model into a general format to obtain a neural network model in the general format; and generating a calculation graph of the neural network model based on the neural network model in the universal format.
The present application further provides a neural network model deployment device, the device comprising: a memory, a processor, and a neural network model deployment program stored on the memory and executable on the processor, the neural network model deployment program configured to implement the steps of the neural network model deployment method as described above.
The present application further provides a storage medium having a neural network model deployment program stored thereon, where the neural network model deployment program, when executed by a processor, implements the steps of the neural network model deployment method as described above.
Compared with the prior art, in which the optimal allocation strategy of a neural network model cannot be obtained and the execution efficiency of a heterogeneous platform executing the neural network model is therefore low, the present application generates a computational graph of the neural network model based on the loaded neural network model; segments the computational graph into a plurality of termination sets, each termination set comprising one or more nodes in the computational graph; generates, for each termination set, an optimal segmentation strategy corresponding to the termination set through heuristic search based on the hardware resources of the heterogeneous platform for processing the neural network model; and generates an optimal allocation strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal allocation strategy. In other words, the method divides the computational graph of the network model into a plurality of termination sets, generates an optimal segmentation strategy corresponding to each termination set through heuristic search based on the hardware resources of the heterogeneous platform for processing the neural network model, and finally generates an optimal allocation strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, thereby improving the execution efficiency of the heterogeneous platform when executing the neural network model.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a first embodiment of a neural network model deployment method of the present application;
FIG. 2 is a detailed flowchart of step S20 in FIG. 1;
FIG. 3 is a detailed flowchart of step S30 in FIG. 1;
FIG. 4 is a schematic structural diagram of a neural network model deployment device of a hardware operating environment according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
An embodiment of the present application provides a neural network model deployment method, as shown in fig. 1, in an embodiment of the neural network model deployment method, the method includes the following steps:
s10, generating a calculation graph of the neural network model based on the loaded neural network model;
step S20, cutting the calculation graph into a plurality of termination sets, wherein each termination set comprises one or more nodes in the calculation graph;
s30, aiming at each termination set, generating an optimal segmentation strategy corresponding to the termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model;
and S40, generating an optimal distribution strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal distribution strategy.
The present embodiment is intended to: the execution efficiency of the heterogeneous platform in the process of executing the neural network model is improved.
Specifically, the method divides a calculation graph of the network model into a plurality of termination sets, generates an optimal division strategy corresponding to each termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model, and finally generates an optimal distribution strategy of the neural network model according to the optimal division strategy corresponding to each termination set, so that the execution efficiency of the heterogeneous platform in executing the neural network model is improved.
Further, in the application, the computation graph is subjected to binary segmentation to generate a termination set and residual subgraphs, and when the residual subgraphs do not meet the preset segmentation stopping condition, the residual subgraphs are subjected to iterative binary segmentation to generate the termination set and new residual subgraphs, so that all the termination sets of the computation graph are obtained, an optimal distribution strategy is generated, a suboptimal distribution strategy of a neural network model is avoided, and the execution efficiency of the heterogeneous platform in executing the neural network model is improved.
Further, in the present application, if the operation time consumption corresponding to the new segmentation strategy is less than that corresponding to the current segmentation strategy, the current segmentation strategy is updated to the new segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is greater than or equal to that corresponding to the current segmentation strategy, the current segmentation strategy is updated to the new segmentation strategy with probability exp(- (t(p_new) - t(p_best)) / t(p_best)), wherein t(p_new) is the operation time consumption corresponding to the new segmentation strategy and t(p_best) is that corresponding to the current segmentation strategy. That is, in the present application, if the running time consumption of the new segmentation strategy on the heterogeneous platform is greater than or equal to that of the current segmentation strategy, the current segmentation strategy is still updated to the new segmentation strategy with a certain probability, so that the search does not become trapped in a local optimum and fail to reach the global optimum when obtaining the optimal segmentation strategy, thereby improving the execution efficiency of the heterogeneous platform when executing the neural network model.
Further, in the present application, the termination set is recursively segmented to generate a plurality of termination set segmentation subgraphs until the number of termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform. Because the termination set segmentation subgraphs generated by recursively segmenting the termination set are in a parallel relationship, when their number reaches the number of heterogeneous computing units, the heterogeneous computing units in the heterogeneous platform can be utilized to the maximum extent at the same time, further improving the execution efficiency of the heterogeneous platform when executing the neural network model.
Further, in the present application, after the termination set segmentation sub-graph models are distributed to the corresponding heterogeneous computing units based on the hardware resources corresponding to the heterogeneous platform and the model parameters of the termination set segmentation sub-graph models, the scheduling order of the multiple termination set segmentation sub-graph models is determined according to the dependency and parallel relationships among the termination set segmentation subgraphs in the segmentation strategy, so that sub-graph models in a parallel relationship can run in parallel, which accelerates the execution of the neural network model by the heterogeneous platform and improves the execution efficiency of the heterogeneous platform when executing the neural network model.
Further, in the present application, when computing the running time consumption of a segmentation strategy on the heterogeneous platform, the hardware resources corresponding to the heterogeneous platform include the heterogeneous computing unit topology; when the termination set segmentation sub-graph models are distributed to the corresponding heterogeneous computing units based on these hardware resources and the model parameters of the sub-graph models, the connection relationships between the heterogeneous computing units are taken into account, which reduces the communication overhead when the heterogeneous platform executes the neural network model.
Further, in the application, the loaded neural network model is analyzed into a general format to obtain the neural network model in the general format, and the neural network model in the general format is supported to be derived from different deep learning frames, so that the neural network model deployment method is suitable for models trained by different deep learning frames and has universality.
The specific steps are as follows:
and S10, generating a calculation graph of the neural network model based on the loaded neural network model.
As an example, the loaded neural network model is a model generated by training with a deep learning framework, such as TensorFlow, PyTorch, or MXNet.
As an example, in the image domain, the neural network model is a convolutional neural network model; in the fields of speech recognition and natural language processing, the neural network model may be a recurrent neural network model, a self-attention model, or a Transformer network model.
As an example, step S10 includes the steps of:
s11, analyzing the loaded neural network model into a universal format to obtain a neural network model in the universal format;
and S12, generating a calculation graph of the neural network model based on the neural network model with the general format.
As an example, a common format of the Neural Network model may be an ONNX (Open Neural Network Exchange) format.
As an example, the specific implementation process of step S12 is: reading the storage file corresponding to the neural network model in the general format, traversing each operator in the storage file, extracting the feature information (input, output, shape, function, and the like) of each traversed operator, constructing the graph representation of the operator from this feature information, and constructing the computation graph of the neural network model from the graph representations of all operators in the storage file.
The computation graph of the neural network model can be represented as G = (V, E), where V is the set of nodes in the computation graph (i.e., the set of graph representations of the operators in the neural network model) and E is the set of directed edges in the computation graph (i.e., the dependency relationships among all operators in the neural network model). Operators in the neural network model fall into computation, data processing, and control classes. For example: computation operators such as tensor addition, subtraction, multiplication, division, matrix multiplication, convolution, and activation; data processing operators such as transpose, embedding, tensor concatenation, and tensor splitting; control operators such as loops and jumps. The inputs and outputs of an operator are tensors, and an operator may input or output more than one tensor, the exact number not being fixed, as with operations such as concat and split. In addition to its input and output tensors, each operator may have additional parameters; for example, a convolution operation has stride, padding, and dilation parameters.
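The operator feature information and the graph G = (V, E) described above can be sketched in Python. The `OperatorNode` fields and the edge-building rule (connect the producer of a tensor to each of its consumers) are illustrative assumptions, not the patent's actual data structures:

```python
from dataclasses import dataclass, field

# Hypothetical minimal graph representation of G = (V, E).
@dataclass
class OperatorNode:
    name: str                 # unique operator name
    op_type: str              # e.g. "Conv", "Relu", "Concat"
    inputs: list              # names of input tensors
    outputs: list             # names of output tensors
    attrs: dict = field(default_factory=dict)  # e.g. stride, padding

def build_computation_graph(nodes):
    """Add a directed edge A -> B whenever an output tensor of A is an input of B."""
    producer = {t: n.name for n in nodes for t in n.outputs}
    edges = {(producer[t], n.name)
             for n in nodes for t in n.inputs if t in producer}
    return {n.name: n for n in nodes}, edges

ops = [
    OperatorNode("conv1", "Conv", ["x", "w1"], ["t1"], {"stride": 1}),
    OperatorNode("relu1", "Relu", ["t1"], ["t2"]),
    OperatorNode("conv2", "Conv", ["t2", "w2"], ["y"]),
]
V, E = build_computation_graph(ops)
# E == {("conv1", "relu1"), ("relu1", "conv2")}
```

Graph inputs and weights (x, w1, w2) have no producer node, so they generate no edges; only true operator-to-operator dependencies enter E.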
As an example, step S10 includes the steps of:
s101, simplifying the loaded neural network model;
step S102, analyzing the simplified neural network model into a general format to obtain a neural network model in the general format;
and S103, generating a calculation graph of the neural network model based on the neural network model with the general format.
As an example, the simplification of the loaded neural network model may be the merging of batch normalization layers of the neural network model.
In the application, the loaded neural network model is simplified, and the execution efficiency of the neural network model can be improved on the premise of ensuring the correct equivalence of the neural network model.
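As a concrete illustration of batch normalization merging, the sketch below folds a batch normalization layer into the weights and bias of the preceding operator. This is the standard folding identity, shown here for a per-output-channel linear operator; it is an assumed instance of the "simplification" the text mentions, not the patent's specific procedure:

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (op(x) - mean) / sqrt(var + eps) + beta into the
    preceding operator's weights w (output channels first) and bias b."""
    scale = gamma / np.sqrt(var + eps)
    # Scale each output channel of the weight tensor.
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

After folding, the batch normalization node can be removed from the computation graph, giving an equivalent model with fewer operators to schedule.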
And step S20, cutting the computational graph into a plurality of termination sets, wherein each termination set comprises one or more nodes in the computational graph.
As an example, a computational graph includes a plurality of nodes, each node representing a respective operator, and directed edges, each directed edge connecting a respective first node to a respective second node, the inputs of the operators represented by the respective second nodes being the outputs of the operators represented by the respective first nodes.
It should be noted that the computation graph is segmented into multiple termination sets because the nature of a termination set guarantees data dependency: when a node in the termination set is executed, the inputs of the operator represented by that node have already been produced by the remaining subgraph.
As an example, as shown in fig. 2, the step of segmenting the computational graph into a plurality of termination sets, each termination set including one or more nodes in the computational graph, includes:
s21, performing binary segmentation on the calculation graph to generate a termination set and residual subgraphs; wherein the termination set is disjoint from the remaining subgraph, and the inputs of the operators represented by the nodes in the termination set are the outputs of the operators represented by the nodes in the remaining subgraph.
As an example, the step of performing binary segmentation on the computation graph to generate a termination set and remaining subgraphs includes:
and performing binary segmentation on the calculation graph based on a depth-first traversal algorithm to generate a termination set and residual subgraphs.
And S22, judging whether the residual subgraphs meet preset segmentation stopping conditions.
As an example, the preset cut-stopping condition is that the remaining subgraph is empty, or a single node exists in the remaining subgraph, or a directed edge does not exist between nodes in the remaining subgraph.
And S23, if the residual subgraphs do not meet the preset condition for stopping segmentation, performing binary segmentation on the residual subgraphs to generate a termination set and new residual subgraphs, and returning to the step of judging whether the residual subgraphs meet the preset condition for stopping segmentation and the subsequent steps.
As an example, the step of performing binary splitting on the remaining subgraph to generate a termination set and a new remaining subgraph comprises:
and performing binary segmentation on the remaining subgraphs based on a depth-first traversal algorithm to generate a termination set and new remaining subgraphs.
And step S24, if the residual subgraphs meet preset segmentation stopping conditions, stopping performing binary segmentation on the residual subgraphs, and taking a termination set obtained by the repeated binary segmentation and the residual subgraphs as a plurality of termination sets of the calculation graph.
As an example, for the computation graph V0, perform binary segmentation to generate termination set S1 and remaining subgraph V1. If remaining subgraph V1 does not meet the preset stop-segmentation condition, perform binary segmentation on V1 to generate termination set S2 and new remaining subgraph V2. If remaining subgraph V2 does not meet the preset stop-segmentation condition, perform binary segmentation on V2 to generate termination set S3 and new remaining subgraph V3; and so on. If remaining subgraph Vn-1 does not meet the preset stop-segmentation condition, perform binary segmentation on Vn-1 to generate termination set Sn and new remaining subgraph Vn. If remaining subgraph Vn meets the preset stop-segmentation condition, stop performing binary segmentation on Vn, and take the termination sets S1, S2, S3, ..., Sn obtained by the repeated binary segmentation together with the remaining subgraph Vn as the multiple termination sets of the computation graph.
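One simple way to realize this repeated binary segmentation is to cut off a suffix of a topological order at each step: a topological suffix only receives edges from the rest of the graph and never feeds back into it, so it satisfies the termination set property. The sketch below (pure Python, illustrative names, fixed chunk size) assumes this suffix-based strategy; the patent's own binary segmentation is based on depth-first traversal:

```python
def topo_order(nodes, edges):
    """Kahn's algorithm over a DAG given as a node list and (src, dst) edges."""
    indeg = {n: 0 for n in nodes}
    for _, dst in edges:
        indeg[dst] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for src, dst in edges:
            if src == n:
                indeg[dst] -= 1
                if indeg[dst] == 0:
                    ready.append(dst)
    return order

def split_into_termination_sets(nodes, edges, chunk=2):
    """Repeatedly cut the topological suffix off as a termination set until the
    remaining subgraph is a single node or empty (the stop condition)."""
    remaining = topo_order(nodes, edges)
    sets = []
    while len(remaining) > 1:
        sets.append(remaining[-chunk:])   # suffix never feeds the remainder
        remaining = remaining[:-chunk]
    if remaining:
        sets.append(remaining)            # final remaining subgraph is also a termination set
    return sets

nodes = ["a", "b", "c", "d", "e"]
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
sets = split_into_termination_sets(nodes, edges)
# sets == [["d", "e"], ["b", "c"], ["a"]]
```

For the linear chain a→b→c→d→e, the first cut yields S1 = {d, e}, the second S2 = {b, c}, and the remainder {a} becomes the last termination set, mirroring the S1, ..., Sn, Vn walkthrough above.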
And S30, aiming at each termination set, generating an optimal segmentation strategy corresponding to the termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model.
As an example, the platform for processing the neural network model is a heterogeneous platform.
As an example, the plurality of termination sets in step S20 are sorted in reverse order by their cut generation time.
As an example, the computation graph is segmented into multiple termination sets as described above, yielding termination sets S1, S2, S3, ..., Sn and remaining subgraph Vn. On this basis, sorting the multiple termination sets in reverse order of their segmentation generation time gives the ordering Sn+1, Sn, ..., S3, S2, S1, where termination set Sn+1 is the remaining subgraph Vn.
As shown in fig. 3, as an example, the step of generating, for each termination set, an optimal segmentation policy corresponding to the termination set by a heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model includes:
and S31, aiming at each termination set, merging the optimal segmentation strategies corresponding to 5 of all the termination sets sequenced in front of the termination set to generate an initial segmentation strategy.
As an example, for the termination set ordered first, since there is no termination set before it, there are no optimal segmentation strategies to merge. That is, the termination set ordered first has no initial segmentation strategy.
And S32, taking the initial segmentation strategy as a current segmentation strategy.
As an example, for the termination set ordered first, since there is no initial segmentation strategy, there is no current segmentation strategy.
And S33, segmenting the termination set to generate a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy.
As an example, for the termination set ordered first, there is no current segmentation strategy. This can therefore be understood as follows: for the termination set ordered first, the termination set is segmented to generate its corresponding segmentation strategy, and the segmentation strategy corresponding to the termination set is the new segmentation strategy.
As an example, the step of segmenting the termination set to generate the segmentation strategy corresponding to the termination set and merging the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy includes: Step S331, performing recursive segmentation on the termination set to generate multiple termination set segmentation subgraphs.
As an example, the step of performing recursive partitioning on the termination set to generate multiple termination set partitioning subgraphs includes:
and carrying out recursive segmentation on the termination set based on a depth-first traversal algorithm to generate a plurality of termination set segmentation subgraphs.
Step S332, judging whether the number of the multiple termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform.
And S333, if so, generating a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy.
And step S334, if not, returning to the step of performing recursive segmentation on the termination set to generate multiple termination set segmentation subgraphs and the subsequent steps.
Step S34, comparing the running time consumption of the new segmentation strategy with that of the current segmentation strategy based on an objective function, and determining whether to update the current segmentation strategy to the new segmentation strategy; wherein the objective function is the running time consumption of the neural network model on the heterogeneous platform, given a segmentation strategy and the hardware resources of the heterogeneous platform for processing the neural network model.
As an example, the step of comparing the running time consumed by the new slicing policy and the current slicing policy based on an objective function to determine whether to update the current slicing policy to the new slicing policy includes:
step S341, calculating the operation time of the neural network model in the heterogeneous platform based on the new segmentation strategy, and the operation time of the neural network model in the heterogeneous platform based on the current segmentation strategy.
As an example, the step of calculating the running time of the neural network model in the heterogeneous platform based on the new segmentation strategy includes:
and A1, carrying out model derivation on the cut subgraphs of the termination set in the new cutting strategy to obtain a plurality of cut subgraph models of the termination set.
As an example, the plurality of termination set split sub-graph models are termination set split sub-graph models in a common format.
Step A2, distributing the termination set segmentation sub-graph model to corresponding heterogeneous computing units based on hardware resources corresponding to the heterogeneous platform and model parameters of the termination set segmentation sub-graph model;
As an example, when the platform is a heterogeneous platform, the computing units are heterogeneous computing units.
As an example, the hardware resources include the number of heterogeneous computing units, the computing power of each heterogeneous computing unit, the storage capacity of each heterogeneous computing unit, and the operator types to which each heterogeneous computing unit is adapted.
As an example, the model parameters include the computation amount of the terminal set segmentation sub-graph model, the computation type of the terminal set segmentation sub-graph model, and the memory occupation amount of the terminal set segmentation sub-graph model.
As an example, when the number of the heterogeneous computing units is the same as the number of the termination set split sub-graph models, and the termination set split sub-graph models are allocated to the corresponding heterogeneous computing units, the computing power of the heterogeneous computing units is greater than or equal to the computing amount of the termination set split sub-graph models, the applicable operator type of the heterogeneous computing units is the same as the operator type of the termination set split sub-graph models, and the storage power of the heterogeneous computing units is greater than or equal to the memory occupation amount of the termination set split sub-graph models.
As an example, when the number of heterogeneous computing units is less than the number of termination set segmentation sub-graph models, some heterogeneous computing unit is assigned multiple termination set segmentation sub-graph models. In this case, the computing power of that heterogeneous computing unit must be greater than the computation amount of any one of the multiple termination set segmentation sub-graph models assigned to it, or greater than their total computation amount; the operator types to which the heterogeneous computing unit is adapted must match the operator types of the multiple termination set segmentation sub-graph models assigned to it; and the storage capacity of the heterogeneous computing unit must be greater than the total memory occupation of the multiple assigned termination set segmentation sub-graph models.
As an example, when the number of heterogeneous computing units is less than the number of the termination set split sub-graph models, the termination set split sub-graph models with parallel relations are allocated to different heterogeneous computing units according to the parallel relations between the termination set split sub-graphs in the new splitting strategy.
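A minimal greedy sketch of the allocation rules above, assuming each sub-graph model is described by its computation amount, memory occupation, and operator types, and each unit by its remaining capacities and supported operator types (all field names are hypothetical, not from the patent):

```python
def assign_subgraphs(subgraphs, units):
    """Greedily place each termination set segmentation sub-graph model on the
    first heterogeneous unit whose remaining compute/memory capacity and
    supported operator types fit."""
    placement = {}
    for sg in subgraphs:
        for u in units:
            fits = (u["flops_left"] >= sg["flops"]
                    and u["mem_left"] >= sg["mem"]
                    and sg["ops"] <= u["ops"])   # operator types must be supported
            if fits:
                u["flops_left"] -= sg["flops"]
                u["mem_left"] -= sg["mem"]
                placement[sg["name"]] = u["name"]
                break
        else:
            raise ValueError(f"no unit fits subgraph {sg['name']}")
    return placement

units = [
    {"name": "npu", "flops_left": 100, "mem_left": 100, "ops": {"Conv", "Relu"}},
    {"name": "cpu", "flops_left": 50, "mem_left": 50, "ops": {"Conv", "Relu", "Concat"}},
]
subgraphs = [
    {"name": "s1", "flops": 80, "mem": 40, "ops": {"Conv"}},
    {"name": "s2", "flops": 30, "mem": 20, "ops": {"Concat"}},
]
placement = assign_subgraphs(subgraphs, units)
# placement == {"s1": "npu", "s2": "cpu"}
```

A fuller implementation would additionally weigh the topology of the heterogeneous computing units, so that communicating sub-graph models land on closely connected units.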
As an example, the hardware resources also include the topology of the heterogeneous computing units.
And A3, determining the scheduling sequence of the multiple termination set segmentation sub-graph models according to the dependency relationship and the parallel relationship among the termination set segmentation sub-graphs in the new segmentation strategy.
As an example, according to the parallel relationships among the termination set segmentation subgraphs in the new segmentation strategy, the running order of termination set segmentation sub-graph models in a parallel relationship is set to run in parallel; according to the dependency relationships among the termination set segmentation subgraphs in the new segmentation strategy, the running order of sub-graph models with dependencies is set to run one after another. The scheduling order of the multiple termination set segmentation sub-graph models corresponding to the new segmentation strategy is thereby determined.
As an example, when recursively segmenting the termination set, the termination set segmentation subgraphs in a parallel relationship can be identified and marked. Parallelism takes several different forms: bit-level, instruction-level, data-level, and task-level parallelism.
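The dependency/parallel scheduling described above can be sketched as a level-by-level (task-parallel) schedule: all sub-graphs whose dependencies are already satisfied form one level and may run concurrently. Names and the `deps` encoding are illustrative assumptions:

```python
def schedule_levels(subgraphs, deps):
    """Group sub-graphs into levels: a sub-graph becomes ready once everything
    it depends on has been scheduled, and all sub-graphs within one level are
    mutually independent, so they can run in parallel."""
    done, levels = set(), []
    pending = set(subgraphs)
    while pending:
        ready = {s for s in pending if deps.get(s, set()) <= done}
        if not ready:
            raise ValueError("cyclic dependency between sub-graphs")
        levels.append(sorted(ready))
        done |= ready
        pending -= ready
    return levels

deps = {"g3": {"g1", "g2"}, "g4": {"g3"}}
levels = schedule_levels(["g1", "g2", "g3", "g4"], deps)
# levels == [["g1", "g2"], ["g3"], ["g4"]]
```

Here g1 and g2 are in a parallel relationship and share a level, while g3 and g4 run sequentially after their dependencies.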
And A4, controlling the heterogeneous computing unit distributed with the termination set segmentation sub-graph model to perform operation based on the scheduling sequence to obtain the running time of the neural network model in the heterogeneous platform based on the new segmentation strategy.
As an example, the multiple termination set segmentation sub-graph models are termination set segmentation sub-graph models in the general format. When controlling the heterogeneous computing units to which the termination set segmentation sub-graph models are assigned to run based on the scheduling order, the termination set segmentation sub-graph models are converted from the general format into programs that the heterogeneous computing units can run. The conversion process is completed by the conversion tool of the heterogeneous platform, and different heterogeneous platforms have different conversion tools.
As an example, controlling the heterogeneous computing units to which the termination set segmentation sub-graph models are assigned to run based on the scheduling order is completed in a runtime module. The runtime module provides an abstraction over the heterogeneous platform resources: it provides registration interfaces for different heterogeneous computing units, so that a heterogeneous computing unit can be registered with the runtime module by describing its corresponding hardware parameters, and it also provides the resource initialization, call execution, and resource release interfaces that the different heterogeneous computing units need to supply. In addition, the runtime module supports the input of the topology of the heterogeneous computing units; this information is used to take the connection relationships between the heterogeneous computing units into account when distributing the termination set segmentation subgraphs in the new segmentation strategy to the corresponding heterogeneous computing units, reducing communication overhead.
As an example, the step of calculating the running time of the neural network model in the heterogeneous platform based on the current segmentation strategy includes:
b1, carrying out model derivation on the cut subgraphs of the termination set in the current cutting strategy to obtain a plurality of cut subgraph models of the termination set;
b2, distributing the termination set segmentation subgraph to corresponding heterogeneous computing units based on hardware resources corresponding to the heterogeneous platform and model parameters of the termination set segmentation subgraph model;
b3, determining a scheduling sequence of a plurality of termination set segmentation sub-graph models according to the dependency relationship and the parallel relationship among the termination set segmentation sub-graphs in the current segmentation strategy;
and step B4, controlling the heterogeneous computing unit distributed with the termination set segmentation sub-graph model to perform operation based on the scheduling sequence to obtain the running time of the neural network model in the heterogeneous platform based on the current segmentation strategy.
It should be noted that the specific process of calculating the operation time consumption of the neural network model in the heterogeneous platform based on the current segmentation strategy is the same as the specific process of calculating the operation time consumption of the neural network model in the heterogeneous platform based on the new segmentation strategy, and is not described herein again.
As an example, in practice, since the initial segmentation strategy is merged from the optimal segmentation strategies corresponding to all the termination sets ordered before the current termination set, and the running times of those optimal strategies were already measured when they were determined, only the running time of the segmentation strategy for the current termination set needs to be computed when evaluating the new segmentation strategy. Adding it to the running times of the optimal strategies of all earlier termination sets gives the running time of the new segmentation strategy. Similarly, when the current segmentation strategy is the initial segmentation strategy, its running time is the sum of the running times of the optimal segmentation strategies corresponding to all the termination sets ordered before the current termination set.
Step S342, comparing the running time consumption corresponding to the new segmentation strategy with the running time consumption corresponding to the current segmentation strategy.
Step S343, if the running time consumption corresponding to the new segmentation strategy is less than the running time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy to the new segmentation strategy.
Step S344, if the running time consumption corresponding to the new segmentation strategy is greater than or equal to the running time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy to the new segmentation strategy with probability exp(-(t(p_new) - t(p_best))/t(p_best)), where t(p_new) is the running time consumption corresponding to the new segmentation strategy and t(p_best) is the running time consumption corresponding to the current segmentation strategy.
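Steps S343/S344 together form a simulated-annealing-style acceptance rule. A direct sketch, assuming strategies are compared purely by their measured running times:

```python
import math
import random

def accept_new_strategy(t_new, t_best, rng=random.random):
    """Acceptance rule of steps S343/S344: always accept a strictly faster
    strategy; otherwise accept with probability
    exp(-(t(p_new) - t(p_best)) / t(p_best))."""
    if t_new < t_best:
        return True
    return rng() < math.exp(-(t_new - t_best) / t_best)
```

Accepting an occasionally worse strategy with a probability that shrinks as the slowdown grows lets the heuristic search escape local optima instead of greedily locking onto the first good split.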
And S35, acquiring the current iteration count, and judging whether the current iteration count reaches a preset number of times.
As an example, the preset number Kmax is specified in advance by the user, and is not particularly limited in this embodiment.
And S36, if not, returning to the step of segmenting the termination set to generate a segmentation strategy corresponding to the termination set, merging the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy, and the subsequent steps, until the current iteration count reaches the preset number of times.
And step S37, if yes, determining an optimal segmentation strategy corresponding to the termination set based on the current segmentation strategy and the initial segmentation strategy.
As an example, the initial segmentation strategy is obtained by merging the optimal segmentation strategies corresponding to all termination sets sorted before the termination set, in the termination set segmentation subgraphs of the current segmentation strategy, the termination set segmentation subgraphs corresponding to the initial segmentation strategy are deleted, and the remaining termination set segmentation subgraphs are the optimal segmentation strategies corresponding to the termination set.
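A minimal sketch of step S37, assuming each segmentation strategy is represented simply as a list of sub-graph identifiers (a simplification of the patent's actual strategy representation):

```python
def per_set_strategy(current_strategy, initial_strategy):
    """Remove the sub-graphs contributed by the initial segmentation strategy
    from the current (merged) strategy; the remaining sub-graphs form the
    optimal segmentation strategy for this termination set."""
    initial = set(initial_strategy)
    return [sg for sg in current_strategy if sg not in initial]

# The current strategy merges earlier sets' subgraphs p1, p2 with this set's q1, q2:
best_for_set = per_set_strategy(["p1", "p2", "q1", "q2"], ["p1", "p2"])
# best_for_set == ["q1", "q2"]
```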
And step S40, generating an optimal distribution strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal distribution strategy.
As an example, the multiple termination sets are sorted in a reverse order according to the segmentation generation time of the termination sets, and the optimal distribution strategy of the neural network model can be obtained after the optimal segmentation strategies corresponding to the termination sets are sorted in the reverse order.
Compared with the prior art that the execution efficiency of the heterogeneous platform when the neural network model is executed is low due to the fact that the optimal allocation strategy of the neural network model cannot be obtained, in the embodiment, a calculation graph of the neural network model is generated based on the loaded neural network model; segmenting the computational graph into a plurality of termination sets, each termination set comprising one or more nodes in the computational graph; aiming at each termination set, generating an optimal segmentation strategy corresponding to the termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model; and generating an optimal distribution strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal distribution strategy. Therefore, in the embodiment, the computational graph of the network model is divided into a plurality of termination sets, an optimal segmentation strategy corresponding to each termination set is generated through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model, and finally, an optimal allocation strategy of the neural network model is generated according to the optimal segmentation strategy corresponding to each termination set, so that the execution efficiency of the heterogeneous platform in executing the neural network model is improved.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a neural network model deployment device of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 4, the neural network model deployment device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display and an input unit such as a Keyboard, and the optional user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The Memory 1005 may be a Random Access Memory (RAM), or a Non-Volatile Memory (NVM) such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in FIG. 4 does not constitute a limitation of the neural network model deployment device, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 4, a memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and a neural network model deployment program.
In the neural network model deployment device shown in fig. 4, the network interface 1004 is mainly used for data communication with other devices; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 of the neural network model deployment device according to the present application may be disposed in the neural network model deployment device, and the neural network model deployment device calls the neural network model deployment program stored in the memory 1005 through the processor 1001 to implement any of the steps of the neural network model deployment method described above.
The specific implementation of the neural network model deployment device in the present application is substantially the same as that of each embodiment of the neural network model deployment method described above, and details are not described here again.
The present application further provides a neural network model deployment system, the system comprising:
a computational graph obtaining module, configured to generate a computational graph of the neural network model based on the loaded neural network model;
a computational graph partitioning module to partition the computational graph into a plurality of termination sets, each termination set including one or more nodes in the computational graph;
the optimal segmentation strategy generation module is used for generating an optimal segmentation strategy corresponding to each termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model;
and the optimal allocation strategy determining module is used for generating an optimal allocation strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal allocation strategy.
In one possible embodiment of the present application, the computational graph comprises a plurality of nodes and directed edges, wherein each node represents a respective operator, each directed edge connects a respective first node to a respective second node, the input of the operator represented by the respective second node being the output of the operator represented by the respective first node; the computation graph segmentation module is specifically configured to:
performing binary segmentation on the calculation graph to generate a termination set and residual subgraphs; wherein the termination set is disjoint from the remaining subgraph, and the inputs of the operators represented by the nodes in the termination set are the outputs of the operators represented by the nodes in the remaining subgraph; judging whether the residual subgraphs meet a preset segmentation stopping condition; if the residual subgraph does not meet the preset segmentation stopping condition, performing binary segmentation on the residual subgraph to generate a termination set and a new residual subgraph, and returning to execute the step of judging whether the residual subgraph meets the preset segmentation stopping condition and the subsequent steps; if the residual subgraphs meet preset segmentation stopping conditions, stopping performing binary segmentation on the residual subgraphs, and taking a termination set obtained by multiple times of binary segmentation and the residual subgraphs as a plurality of termination sets of the calculation graph;
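The repeated binary segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the stop condition (here, remaining-subgraph size) and the choice of cut (peeling off the current sink nodes, so that every node in the termination set consumes outputs of the remaining subgraph) are assumptions.

```python
from collections import defaultdict

def split_into_termination_sets(nodes, edges, min_size=1):
    """Peel termination sets off the computational graph by repeated binary
    segmentation: each round, the current sink nodes form a termination set
    and the rest is the remaining subgraph."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    remaining = set(nodes)
    termination_sets = []
    while len(remaining) > min_size:           # preset stop condition (assumed: size)
        # sinks: nodes whose outputs feed no node still in the remaining subgraph
        sinks = {n for n in remaining if not (succ[n] & remaining)}
        if not sinks or sinks == remaining:
            break
        termination_sets.append(sinks)         # termination set of this round
        remaining -= sinks                     # new remaining subgraph
    termination_sets.append(remaining)         # final remaining subgraph counts too
    return termination_sets
```

Note that the sets come out in reverse topological order (outputs first), which matches the reverse-order sorting by segmentation generation time used later.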
and/or the plurality of termination sets are sorted in a reverse order according to the segmentation generation time of the termination sets, and the optimal segmentation strategy generation module is specifically configured to:
for each termination set, combining the optimal segmentation strategies corresponding to all termination sets sequenced in front of the termination set to generate an initial segmentation strategy; taking the initial segmentation strategy as a current segmentation strategy; segmenting the termination set to generate a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy; comparing the operation time consumption obtained by the new segmentation strategy and the current segmentation strategy based on a target function, and determining whether to update the current segmentation strategy into the new segmentation strategy; wherein the objective function is that the neural network model is based on a segmentation strategy and hardware resources of a heterogeneous platform for processing the neural network model, and the operation in the heterogeneous platform is time-consuming; acquiring the current iteration times, and judging whether the current iteration times reach preset times or not; if not, returning to execute the segmentation of the termination set to generate a segmentation strategy corresponding to the termination set, combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy, and generating a new segmentation strategy and subsequent steps until the current iteration frequency reaches a preset frequency; if so, determining an optimal segmentation strategy corresponding to the termination set based on the current segmentation strategy and the initial segmentation strategy;
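The iteration loop above, including the probabilistic acceptance rule exp(-(t(p_new)-t(p_best))/t(p_best)), amounts to a simulated-annealing-style search. A minimal sketch, where `propose` (re-segmenting the termination set and merging with the initial strategy) and `cost` (the objective function: run time on the heterogeneous platform) are supplied by the caller and are assumptions of this illustration:

```python
import math
import random

def heuristic_search(initial_strategy, propose, cost, iterations=100):
    """Heuristic search for the optimal segmentation strategy of one
    termination set, following the accept-if-faster / accept-with-probability
    rule described in the text."""
    current = initial_strategy
    for _ in range(iterations):                # preset number of iterations
        new = propose(current)
        t_new, t_best = cost(new), cost(current)
        if t_new < t_best:
            current = new                      # strictly faster: accept
        elif random.random() < math.exp(-(t_new - t_best) / t_best):
            current = new                      # slower: accept with probability
    return current
```

Accepting an occasionally worse strategy lets the search escape local minima of the run-time objective.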
and/or the optimal segmentation strategy generation module is further specifically configured to:
carrying out recursive segmentation on the termination set to generate a plurality of termination set segmentation subgraphs; judging whether the number of the plurality of termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform; if so, generating a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy; if not, returning to the step of executing the recursive segmentation of the termination set to generate a plurality of segmentation subgraphs of the termination set and the subsequent steps;
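The recursive segmentation loop above, which stops once the number of termination set segmentation subgraphs reaches the number of heterogeneous computing units, can be sketched as follows; the split rule (halving the largest piece each round) is an assumption for illustration, since the patent does not fix one:

```python
def recursive_partition(termination_set, num_units):
    """Recursively segment a termination set until the number of
    segmentation subgraphs reaches the number of heterogeneous
    computing units."""
    pieces = [list(termination_set)]
    while len(pieces) < num_units:
        largest = max(pieces, key=len)
        if len(largest) < 2:
            break                              # nothing left to split
        pieces.remove(largest)
        mid = len(largest) // 2
        pieces.append(largest[:mid])           # recursive binary split
        pieces.append(largest[mid:])
    return pieces
```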
and/or the optimal segmentation strategy generation module is further specifically configured to: calculating the operation time consumption of the neural network model in the heterogeneous platform based on the new segmentation strategy and the operation time consumption of the neural network model in the heterogeneous platform based on the current segmentation strategy; comparing the operation time consumption corresponding to the new segmentation strategy with the operation time consumption corresponding to the current segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is less than the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy into the new segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is greater than or equal to the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy into the new segmentation strategy with a probability exp(-(t(p_new)-t(p_best))/t(p_best)), wherein t(p_new) is the operation time consumption corresponding to the new segmentation strategy, and t(p_best) is the operation time consumption corresponding to the current segmentation strategy;
and/or the optimal segmentation strategy generation module is further specifically configured to: carrying out model derivation on the termination set segmentation subgraphs in the segmentation strategy to obtain a plurality of termination set segmentation sub-graph models; distributing the termination set segmentation sub-graph models to corresponding heterogeneous computing units based on hardware resources corresponding to the heterogeneous platform and model parameters of the termination set segmentation sub-graph models; determining a scheduling sequence of the plurality of termination set segmentation sub-graph models according to the dependency relationship and the parallel relationship among the termination set segmentation sub-graphs in the segmentation strategy; controlling the heterogeneous computing units distributed with the termination set segmentation sub-graph models to perform operation based on the scheduling sequence to obtain the running time consumption of the neural network model in the heterogeneous platform based on the segmentation strategy; wherein the segmentation strategy comprises the new segmentation strategy and the current segmentation strategy;
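Deriving a scheduling sequence from the dependency and parallel relationships among the sub-graph models is, in essence, a layered topological sort: models in the same layer have no mutual dependency and can run in parallel on different computing units. A minimal sketch using the standard-library scheduler (the dependency-dict input format is an assumption for illustration):

```python
from graphlib import TopologicalSorter

def schedule_rounds(dependencies):
    """Group termination set segmentation sub-graph models into scheduling
    rounds: each round contains mutually independent models that may run in
    parallel on different heterogeneous computing units."""
    ts = TopologicalSorter(dependencies)       # maps model -> its predecessors
    ts.prepare()
    rounds = []
    while ts.is_active():
        ready = sorted(ts.get_ready())         # all models whose inputs are done
        rounds.append(ready)
        ts.done(*ready)
    return rounds
```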
and/or the hardware resources comprise heterogeneous computing unit topologies;
and/or the computation graph acquisition module is specifically configured to: analyzing the loaded neural network model into a general format to obtain a neural network model in the general format; and generating a calculation graph of the neural network model based on the neural network model in the general format.
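Generating the computational graph from a model already parsed into a general format can be sketched as follows: operator records become nodes, and a directed edge is drawn wherever one operator's output tensor is another operator's input. The record layout used here (name / inputs / outputs per operator) is an assumption for illustration, not a format fixed by the application.

```python
def build_computational_graph(generic_model):
    """Build the computational graph (operator nodes + directed edges) from a
    neural network model in a generic format."""
    producers = {}                             # tensor name -> producing operator
    for op in generic_model:
        for out in op["outputs"]:
            producers[out] = op["name"]
    nodes = [op["name"] for op in generic_model]
    edges = []
    for op in generic_model:
        for inp in op["inputs"]:
            if inp in producers:               # directed edge: producer -> consumer
                edges.append((producers[inp], op["name"]))
    return nodes, edges
```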
The specific implementation of the neural network model deployment apparatus of the present application is substantially the same as that of each embodiment of the neural network model deployment method described above, and details are not described here.
The present application provides a storage medium, and the storage medium stores one or more programs, which can be executed by one or more processors to implement the steps of the neural network model deployment method described in any one of the above.
The specific implementation of the storage medium of the present application is substantially the same as that of each embodiment of the neural network model deployment method, and is not described herein again.
The present application also provides a computer program product, comprising a computer program which, when executed by a processor, implements the steps of the neural network model deployment method described above.
The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the neural network model deployment method described above, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is the better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method described in the embodiments of the present application.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (12)

1. A neural network model deployment method is characterized by comprising the following steps:
generating a computational graph of the neural network model based on the loaded neural network model;
segmenting the computational graph into a plurality of termination sets, each termination set comprising one or more nodes in the computational graph;
aiming at each termination set, generating an optimal segmentation strategy corresponding to the termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model;
and generating an optimal distribution strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal distribution strategy.
2. The neural network model deployment method of claim 1, wherein the computational graph comprises a plurality of nodes and directed edges, wherein each node represents a respective operator, each directed edge connecting a respective first node to a respective second node, the inputs of the operators represented by the respective second nodes being the outputs of the operators represented by the respective first nodes; the step of segmenting the computational graph into a plurality of termination sets, each termination set comprising one or more nodes in the computational graph, comprises:
performing binary segmentation on the calculation graph to generate a termination set and residual subgraphs; wherein the termination set is disjoint from the remaining subgraph, and the inputs of the operators represented by the nodes in the termination set are the outputs of the operators represented by the nodes in the remaining subgraph;
judging whether the residual subgraphs meet a preset segmentation stopping condition;
if the residual subgraph does not meet the preset segmentation stopping condition, performing binary segmentation on the residual subgraph to generate a termination set and a new residual subgraph, and returning to execute the step of judging whether the residual subgraph meets the preset segmentation stopping condition and the subsequent steps;
and if the residual subgraphs meet the preset segmentation stopping condition, stopping performing binary segmentation on the residual subgraphs, and taking the termination set obtained by multiple times of binary segmentation and the residual subgraphs as a plurality of termination sets of the calculation graph.
3. The neural network model deployment method of claim 1, wherein the plurality of termination sets are sorted in reverse order according to their slicing generation time, and the step of generating the optimal slicing strategy corresponding to the termination sets through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model for each termination set comprises:
for each termination set, combining the optimal segmentation strategies corresponding to all termination sets sequenced in front of the termination set to generate an initial segmentation strategy;
taking the initial segmentation strategy as a current segmentation strategy;
segmenting the termination set to generate a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy;
comparing the operation time consumption obtained by the new segmentation strategy and the current segmentation strategy based on a target function, and determining whether to update the current segmentation strategy into the new segmentation strategy; wherein the objective function is that the neural network model is based on a segmentation strategy and hardware resources of a heterogeneous platform for processing the neural network model, and the operation in the heterogeneous platform is time-consuming;
acquiring the current iteration times, and judging whether the current iteration times reach preset times or not;
if not, returning to execute the segmentation of the termination set to generate a segmentation strategy corresponding to the termination set, combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy, and generating a new segmentation strategy and subsequent steps until the current iteration frequency reaches a preset frequency;
and if so, determining the optimal segmentation strategy corresponding to the termination set based on the current segmentation strategy and the initial segmentation strategy.
4. The neural network model deployment method of claim 3, wherein the step of segmenting the termination set to generate the segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy comprises:
carrying out recursive segmentation on the termination set to generate a plurality of termination set segmentation subgraphs;
judging whether the number of the plurality of termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform;
if so, generating a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy;
if not, returning to the step of executing recursive segmentation on the termination set to generate a plurality of segmentation subgraphs of the termination set and the subsequent steps.
5. The neural network model deployment method of claim 3, wherein the step of comparing the operation time consumption obtained by the new segmentation strategy and the current segmentation strategy based on an objective function to determine whether to update the current segmentation strategy into the new segmentation strategy comprises:
calculating the operation time of the neural network model in the heterogeneous platform based on the new segmentation strategy and the operation time of the neural network model in the heterogeneous platform based on the current segmentation strategy;
comparing the operation time consumption corresponding to the new segmentation strategy with the operation time consumption corresponding to the current segmentation strategy;
if the operation time consumption corresponding to the new segmentation strategy is less than the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy into the new segmentation strategy;
and if the operation time consumption corresponding to the new segmentation strategy is greater than or equal to the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy into the new segmentation strategy with a probability exp(-(t(p_new)-t(p_best))/t(p_best)), wherein t(p_new) is the operation time consumption corresponding to the new segmentation strategy, and t(p_best) is the operation time consumption corresponding to the current segmentation strategy.
6. The method for deploying a neural network model according to claim 5, wherein the step of calculating the running time of the neural network model in the heterogeneous platform based on the new segmentation strategy and the running time of the neural network model in the heterogeneous platform based on the current segmentation strategy comprises:
carrying out model derivation on the termination set segmentation subgraphs in the segmentation strategy to obtain a plurality of termination set segmentation sub-graph models;
distributing the termination set segmentation sub-graph model to corresponding heterogeneous computing units based on hardware resources corresponding to the heterogeneous platform and model parameters of the termination set segmentation sub-graph model;
determining a scheduling sequence of a plurality of termination set segmentation sub-graph models according to the dependency relationship and the parallel relationship among the termination set segmentation sub-graphs in the segmentation strategy;
controlling the heterogeneous computing units distributed with the termination set segmentation sub-graph models to perform operation based on the scheduling sequence to obtain the running time consumption of the neural network model in the heterogeneous platform based on the segmentation strategy; wherein the segmentation strategy comprises the new segmentation strategy and the current segmentation strategy.
7. The neural network model deployment method of claim 6, wherein the hardware resources comprise a heterogeneous computational unit topology.
8. The neural network model deployment method of claim 1, wherein the step of generating a computational graph of the neural network model based on the loaded neural network model comprises:
analyzing the loaded neural network model into a general format to obtain a neural network model in the general format;
and generating a calculation graph of the neural network model based on the neural network model in the general format.
9. A neural network model deployment system, the system comprising:
a computational graph obtaining module, configured to generate a computational graph of the neural network model based on the loaded neural network model;
a computational graph partitioning module to partition the computational graph into a plurality of termination sets, each termination set including one or more nodes in the computational graph;
the optimal segmentation strategy generation module is used for generating an optimal segmentation strategy corresponding to each termination set through heuristic search based on hardware resources of a heterogeneous platform for processing the neural network model;
and the optimal allocation strategy determining module is used for generating an optimal allocation strategy of the neural network model according to the optimal segmentation strategy corresponding to each termination set, so that the heterogeneous platform deploys the neural network model based on the optimal allocation strategy.
10. The neural network model deployment system of claim 9, wherein the computational graph comprises a plurality of nodes and directed edges, wherein each node represents a respective operator, each directed edge connecting a respective first node to a respective second node, the inputs of the operators represented by the respective second nodes being the outputs of the operators represented by the respective first nodes; the computation graph segmentation module is specifically configured to:
performing binary segmentation on the calculation graph to generate a termination set and residual subgraphs; wherein the termination set is disjoint from the remaining subgraph, and the inputs of the operators represented by the nodes in the termination set are the outputs of the operators represented by the nodes in the remaining subgraph; judging whether the residual subgraphs meet a preset segmentation stopping condition; if the residual subgraph does not meet the preset segmentation stopping condition, performing binary segmentation on the residual subgraph to generate a termination set and a new residual subgraph, and returning to execute the step of judging whether the residual subgraph meets the preset segmentation stopping condition and the subsequent steps; if the residual subgraphs meet preset segmentation stopping conditions, stopping performing binary segmentation on the residual subgraphs, and taking a termination set obtained by multiple times of binary segmentation and the residual subgraphs as a plurality of termination sets of the calculation graph;
and/or the plurality of termination sets are sorted in a reverse order according to the segmentation generation time of the termination sets, and the optimal segmentation strategy generation module is specifically configured to:
for each termination set, combining the optimal segmentation strategies corresponding to all termination sets sequenced in front of the termination set to generate an initial segmentation strategy; taking the initial segmentation strategy as a current segmentation strategy; segmenting the termination set to generate a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy; comparing the operation time consumption obtained by the new segmentation strategy and the current segmentation strategy based on a target function, and determining whether to update the current segmentation strategy into the new segmentation strategy; wherein the objective function is that the neural network model is based on a segmentation strategy and hardware resources of a heterogeneous platform for processing the neural network model, and the operation in the heterogeneous platform is time-consuming; acquiring current iteration times, and judging whether the current iteration times reach preset times or not; if not, returning to execute the segmentation of the termination set to generate a segmentation strategy corresponding to the termination set, combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy, and generating a new segmentation strategy and subsequent steps until the current iteration frequency reaches a preset frequency; if so, determining an optimal segmentation strategy corresponding to the termination set based on the current segmentation strategy and the initial segmentation strategy;
and/or the optimal segmentation strategy generation module is further specifically configured to:
carrying out recursive segmentation on the termination set to generate a plurality of termination set segmentation subgraphs; judging whether the number of the plurality of termination set segmentation subgraphs reaches the number of heterogeneous computing units in the heterogeneous platform; if so, generating a segmentation strategy corresponding to the termination set, and combining the segmentation strategy corresponding to the termination set with the initial segmentation strategy to generate a new segmentation strategy; if not, returning to the step of executing the recursive segmentation of the termination set to generate a plurality of segmentation subgraphs of the termination set and the subsequent steps;
and/or the optimal segmentation strategy generation module is further specifically configured to: calculating the operation time consumption of the neural network model in the heterogeneous platform based on the new segmentation strategy and the operation time consumption of the neural network model in the heterogeneous platform based on the current segmentation strategy; comparing the operation time consumption corresponding to the new segmentation strategy with the operation time consumption corresponding to the current segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is less than the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy into the new segmentation strategy; if the operation time consumption corresponding to the new segmentation strategy is greater than or equal to the operation time consumption corresponding to the current segmentation strategy, updating the current segmentation strategy into the new segmentation strategy with a probability exp(-(t(p_new)-t(p_best))/t(p_best)), wherein t(p_new) is the operation time consumption corresponding to the new segmentation strategy, and t(p_best) is the operation time consumption corresponding to the current segmentation strategy;
and/or the optimal segmentation strategy generation module is further specifically configured to: carrying out model derivation on the termination set segmentation subgraphs in the segmentation strategy to obtain a plurality of termination set segmentation sub-graph models; distributing the termination set segmentation sub-graph models to corresponding heterogeneous computing units based on hardware resources corresponding to the heterogeneous platform and model parameters of the termination set segmentation sub-graph models; determining a scheduling sequence of the plurality of termination set segmentation sub-graph models according to the dependency relationship and the parallel relationship among the termination set segmentation sub-graphs in the segmentation strategy; controlling the heterogeneous computing units distributed with the termination set segmentation sub-graph models to perform operation based on the scheduling sequence to obtain the running time consumption of the neural network model in the heterogeneous platform based on the segmentation strategy; wherein the segmentation strategy comprises the new segmentation strategy and the current segmentation strategy;
and/or the hardware resources comprise heterogeneous computing unit topologies;
and/or the computation graph acquisition module is specifically configured to: analyzing the loaded neural network model into a general format to obtain a neural network model in the general format; and generating a calculation graph of the neural network model based on the neural network model in the universal format.
11. A neural network model deployment device, the device comprising: a memory, a processor, and a neural network model deployment program stored on the memory and executable on the processor, the neural network model deployment program configured to implement the steps of the neural network model deployment method of any one of claims 1-8.
12. A storage medium having stored thereon a neural network model deployment program that, when executed by a processor, implements the steps of the neural network model deployment method of any one of claims 1 to 8.
CN202211553282.9A 2022-12-05 2022-12-05 Neural network model deployment method, system, device and storage medium Pending CN115796041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211553282.9A CN115796041A (en) 2022-12-05 2022-12-05 Neural network model deployment method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211553282.9A CN115796041A (en) 2022-12-05 2022-12-05 Neural network model deployment method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN115796041A true CN115796041A (en) 2023-03-14

Family

ID=85445924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211553282.9A Pending CN115796041A (en) 2022-12-05 2022-12-05 Neural network model deployment method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN115796041A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166405A (en) * 2023-04-21 2023-05-26 北京燧原智能科技有限公司 Neural network task scheduling strategy determination method and device in heterogeneous scene
CN117172289A (en) * 2023-09-01 2023-12-05 苏州亿铸智能科技有限公司 Tensor segmentation method and device and electronic equipment


Similar Documents

Publication Publication Date Title
US11868890B2 (en) Workflow optimization
CN115796041A (en) Neural network model deployment method, system, device and storage medium
Grasas et al. SimILS: a simulation-based extension of the iterated local search metaheuristic for stochastic combinatorial optimization
US11436056B2 (en) Allocation of shared computing resources using source code feature extraction and clustering-based training of machine learning models
US20050125738A1 (en) Composite network-accesible services
CN113703741B (en) Neural network compiler configuration method and device, computer equipment and storage medium
Khorsand et al. Taxonomy of workflow partitioning problems and methods in distributed environments
Masegosa et al. Learning from incomplete data in Bayesian networks with qualitative influences
Mahmoud et al. Multiobjective task scheduling in cloud environment using decision tree algorithm
Guimarans et al. Combining probabilistic algorithms, constraint programming and lagrangian relaxation to solve the vehicle routing problem
CN115829006A (en) Compiling method and device of neural network model, electronic equipment and storage medium
Nascimento et al. A reinforcement learning scheduling strategy for parallel cloud-based workflows
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
US11775264B2 (en) Efficient deployment of machine learning and deep learning model's pipeline for serving service level agreement
CN115713216A (en) Robot scheduling method and related equipment
WO2021208808A1 (en) Cooperative neural networks with spatial containment constraints
JPWO2021173815A5 (en)
JP7424373B2 (en) Analytical equipment, analytical methods and analytical programs
CN113986495A (en) Task execution method, device, equipment and storage medium
Pham et al. A constraint-based local search for offline and online general vehicle routing
Barbucha et al. JABAT middleware as a tool for solving optimization problems
Bisicchia et al. Dispatching Shots Among Multiple Quantum Computers: An Architectural Proposal
CN117009092B (en) Method and system for dynamically distributing compiling time resources based on multiple multi-arm slot machines
Pawiński et al. An efficient solution of the resource constrained project scheduling problem based on an adaptation of the developmental genetic programming
WO2024087844A1 (en) Graph neural network training method and system, and abnormal account identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination