CN115809684A - Network model heterogeneous deployment method, device, equipment and storage medium

Network model heterogeneous deployment method, device, equipment and storage medium

Info

Publication number
CN115809684A
CN115809684A (application CN202211707960.2A)
Authority
CN
China
Prior art keywords
network, network layer, sub-network, inference engine, layer
Prior art date
Legal status
Pending
Application number
CN202211707960.2A
Other languages
Chinese (zh)
Inventor
林贤早
王康
陈波扬
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211707960.2A
Publication of CN115809684A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a network model heterogeneous deployment method, device, equipment and storage medium, used for deploying a network model onto different hardware inference engines in a heterogeneous manner. The method comprises the following steps: determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model; dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each network layer, wherein each sub-network comprises at least one network layer, the hardware inference engine identifiers corresponding to the at least one network layer in each sub-network are the same, and when a sub-network comprises at least two network layers, the at least two network layers have a direct or indirect input-output connection relationship; and sending the configuration information of each sub-network to the corresponding hardware inference engine according to the hardware inference engine identifier corresponding to each sub-network, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engines.

Description

Network model heterogeneous deployment method, device, equipment and storage medium
Technical Field
The application relates to the field of deployment of deep learning network models, in particular to a network model heterogeneous deployment method, device, equipment and storage medium.
Background
With the rise of deep learning technology, deep learning network models have been applied to many aspects of daily life, such as face detection and positioning, face comparison, voice recognition, fingerprint recognition, iris recognition, and the like. After each network layer of a deep learning network model is constructed, each network layer needs to be converted into a corresponding operator, and a hardware inference engine processes the operator corresponding to each network layer to complete the inference process of the deep learning network model.
However, different hardware inference engines differ in their underlying implementation architectures, in the operator libraries they are equipped with, and so on, so each hardware inference engine performs differently when processing different operators, or even operators of the same type. There is therefore the problem of how to accurately deploy a deep learning network model onto different hardware inference engines for execution.
Disclosure of Invention
The application provides a network model heterogeneous deployment method, device, equipment and storage medium, in which the network model is divided into a plurality of sub-networks so that it can be heterogeneously deployed onto different hardware inference engines.
In a first aspect, the present application provides a network model heterogeneous deployment method, including:
determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model;
dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation;
and sending the configuration information of each sub-network in the plurality of sub-networks of the network model to the corresponding hardware inference engine according to the hardware inference engine identification corresponding to each sub-network in the plurality of sub-networks, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engine.
Further, dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers includes:
traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as input;
if a second network layer with the same hardware inference engine identification as the first network layer exists, dividing each second network layer with the same hardware inference engine identification as the first network layer under a sub-network to which the first network layer belongs;
and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before dividing each second network layer having the same hardware inference engine identifier as the first network layer into sub-networks to which the first network layer belongs, the method further includes:
for each second network layer with the same hardware inference engine identification as the first network layer, if a third network layer with the second network layer as a descendant node exists, judging whether the third network layer is the descendant node of the first network layer;
and under the condition that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, dividing the third network layer into the sub-network to which the first network layer belongs.
Further, the determining the hardware inference engine identifier corresponding to each of the plurality of network layers of the network model includes:
acquiring information of a target operator of each network layer in a plurality of network layers for realizing the network model, wherein the information of the target operator comprises time consumption information of the target operator on a plurality of hardware inference engines;
and determining a hardware inference engine identifier corresponding to each network layer in the plurality of network layers of the network model according to the information of the target operator of each network layer in the plurality of network layers for realizing the network model.
Further, the obtaining information of a target operator of each of a plurality of network layers implementing the network model includes:
acquiring information of a plurality of operators for realizing the network layer aiming at each network layer of the network model, wherein the information of each operator comprises time-consuming information of the operator on a plurality of hardware inference engines;
and determining a target operator for realizing the network layer according to the information of the operators for realizing the network layer.
Further, the heterogeneously deploying the plurality of subnetworks of the network model according to the hardware inference engine identification corresponding to each subnetwork in the plurality of subnetworks comprises:
determining an execution order of each of the plurality of subnetworks according to an execution order of a plurality of network layers of the network model;
and inserting, between two adjacently executed sub-networks, a data conversion node for converting data formats between the hardware inference engines, according to the execution sequence of each sub-network in the plurality of sub-networks and the corresponding hardware inference engine identifier, to complete the model heterogeneous deployment of the network model.
In a second aspect, the present application provides a network model heterogeneous deployment device, including:
the marking module is used for determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of the network model;
a processing module, configured to divide the multiple network layers into multiple subnetworks according to connection relationships of the multiple network layers in the network model and hardware inference engine identifiers corresponding to each of the multiple network layers, where each subnetwork includes at least one network layer, and the hardware inference engine identifiers corresponding to the at least one network layer in each subnetwork are the same, and when the subnetwork includes at least two network layers, the at least two network layers have a direct or indirect input-output connection relationship therebetween;
the processing module is further configured to send the configuration information of each of the plurality of subnetworks of the network model to the corresponding hardware inference engine according to the hardware inference engine identifier corresponding to each of the plurality of subnetworks, so that the plurality of subnetworks of the network model are deployed to the corresponding hardware inference engine in a heterogeneous manner.
Further, when the processing module divides the plurality of network layers into a plurality of subnetworks according to the connection relationship between the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, the processing module is specifically configured to: traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input; if a second network layer with the same hardware inference engine identification as the first network layer exists, dividing each second network layer with the same hardware inference engine identification as the first network layer under a sub-network to which the first network layer belongs; and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before the processing module divides each second network layer having the same hardware inference engine identifier as the first network layer into the sub-network to which the first network layer belongs, the processing module is further configured to: for each second network layer with the same hardware inference engine identifier as the first network layer, if a third network layer that takes the second network layer as a descendant node exists, judge whether the third network layer is a descendant node of the first network layer; and, in the case that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, divide the third network layer into the sub-network to which the first network layer belongs.
Further, when the marking module determines the hardware inference engine identifier corresponding to each of the plurality of network layers of the network model, the marking module is specifically configured to: acquiring information of a target operator of each network layer in a plurality of network layers for realizing the network model, wherein the information of the target operator comprises time-consuming information of the target operator on a plurality of hardware inference engines; and determining a hardware inference engine identifier corresponding to each network layer in the plurality of network layers of the network model according to the information of the target operator of each network layer in the plurality of network layers for realizing the network model.
Further, when the marking module obtains information of a target operator of each network layer of the plurality of network layers implementing the network model, the marking module is specifically configured to: acquiring information of a plurality of operators for realizing the network layer aiming at each network layer of the network model, wherein the information of each operator comprises time-consuming information of the operator on a plurality of hardware inference engines; and determining a target operator for realizing the network layer according to the information of the operators for realizing the network layer.
Further, when the processing module deploys the plurality of subnetworks of the network model in a heterogeneous manner according to the hardware inference engine identifier corresponding to each subnetwork in the plurality of subnetworks, the processing module is specifically configured to: determine an execution order of each of the plurality of sub-networks according to an execution order of the plurality of network layers of the network model; and insert, between two adjacently executed sub-networks, a data conversion node for data format conversion between the hardware inference engines, according to the execution sequence of each sub-network in the plurality of sub-networks and the corresponding hardware inference engine identifier, thereby completing the model heterogeneous deployment of the network model.
In a third aspect, the present application provides an electronic device, which at least comprises a processor and a memory, and when the processor executes a computer program or instructions stored in the memory, the method of the first aspect is implemented.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program or instructions which, when executed by a processor, implement the method of the first aspect.
In the application, a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model is determined. According to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, the plurality of network layers are divided into a plurality of sub-networks, wherein each sub-network comprises at least one network layer, the hardware inference engine identifiers corresponding to the at least one network layer in each sub-network are the same, and when a sub-network comprises at least two network layers, there is a direct or indirect input-output connection relationship between the at least two network layers. According to the hardware inference engine identifier corresponding to each sub-network in the plurality of sub-networks, the configuration information of each sub-network of the network model is sent to the corresponding hardware inference engine, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engines. In this way, the plurality of network layers of the network model can be deployed to different hardware inference engines according to the divided sub-networks, and the inference efficiency of the network model is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1A is a flowchart of a network model heterogeneous deployment method provided in the present application.
FIG. 1B is a schematic diagram of one possible way for a hardware inference engine to construct a sub-network operator, provided in the present application.
Fig. 2 is a schematic diagram of a possible network model divided into a plurality of sub-networks provided in the present application.
Fig. 3 is a schematic diagram of steps of one possible way of dividing sub-networks provided in the present application.
Fig. 4 is a schematic diagram of a possible node partitioning manner provided in the present application.
Fig. 5 is a flowchart of a network model heterogeneous deployment provided in the present application.
Fig. 6 is a schematic structural diagram of a network model heterogeneous deployment device provided in the present application.
Fig. 7 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To make the purpose and embodiments of the present application clearer, the following will clearly and completely describe the exemplary embodiments of the present application with reference to the attached drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or textual entities and are not necessarily intended to define a particular order or sequence unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
With the rise of deep learning technology, deep learning network models have been applied to many aspects of daily life, such as face detection and positioning, face comparison, voice recognition, fingerprint recognition, iris recognition, and the like. After each network layer of a deep learning network model is constructed, each network layer needs to be converted into a corresponding operator, and a hardware inference engine processes the operator corresponding to each network layer to complete the inference process of the deep learning network model.
However, different hardware inference engines differ in their underlying implementation architectures, in the operator libraries they provide, and so on, so each hardware inference engine performs differently when processing different operators, or even operators of the same type. There is therefore the problem of how to place the operator corresponding to each network layer of the deep learning network model onto different hardware inference engines for execution.
Based on this, the application provides a network model heterogeneous deployment method, device, equipment and storage medium: the network layers included in the network model are divided into a plurality of sub-networks, and each sub-network is deployed on a different hardware inference engine, so that the network layers of the network model run on different hardware inference engines and the inference efficiency of the network model is improved.
Fig. 1A is a flowchart of a network model heterogeneous deployment method, which may be applied to an electronic device, and the method includes:
s101: and determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of the network model.
In this step, the network model may be a deep learning network model such as a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) or a Generative Adversarial Network (GAN), or a deep learning network model improved by adding, deleting or modifying network layers. The target operator of each network layer may be the operator, selected from the operator libraries of multiple hardware inference engines, that takes the shortest time to implement the network layer, or an operator in the operator library of a preset hardware inference engine. The hardware inference engines include, for example, the Open Visual Inference and Neural network Optimization (OpenVINO) hardware inference engine developed by Intel and usable on a Central Processing Unit (CPU), the TensorRT hardware inference engine developed by NVIDIA and usable on a Graphics Processing Unit (GPU), and the like. An Operator (OP) may be a computing unit in a deep learning algorithm; in the network model, the operator of each network layer may correspond to the computing logic of that network layer. For example, the convolution algorithm in a convolution layer and the weighted summation process in a fully-connected layer can each be considered an operator. For the same network layer, there may be multiple types of operators implementing the computation logic of the network layer; for example, activation operators may include Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, and the like.
Alternatively, the electronic device may obtain operator information for each network layer of the network model by running each network layer of the network model under different hardware inference engines and different inference acceleration methods. The operator information of a network layer may include the time consumption, memory arrangement and other information of multiple operators of that network layer under different hardware inference engines (that is, the hardware inference engine information corresponding to the operators). The electronic device can determine, according to the operator information of each network layer, the operator that achieves the best performance for that network layer and the hardware inference engine identifier corresponding to that network layer. For example, if network layer a is an activation layer, the activation operators that can realize network layer a include Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, and the like; the time consumption, memory arrangement and other information of these activation operators on OpenVINO, TensorRT or other hardware inference engines is obtained by means of simulated inference tests, so that the target activation operator with the best performance for network layer a and the hardware inference engine identifier corresponding to network layer a can be determined.
It should be noted that different hardware inference engines have their own associated operator libraries, so the operator libraries of any two hardware inference engines may contain different operators. In addition, even if two hardware inference engines support the same operator, the time consumption, memory arrangement and other characteristics of the operator may differ because the underlying hardware corresponding to the two hardware inference engines differs. For example, a CPU, which is better suited to processing small amounts of data quickly, may process activation operators and the like faster, while a GPU, which is better suited to processing large amounts of data, may process convolution operators and the like faster.
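As an illustration of the per-layer selection described above, the following is a minimal sketch, assuming hypothetical operator names, illustrative timing numbers and a latency-only criterion, none of which are part of this disclosure; it simply picks, for one network layer, the operator/engine pair with the lowest measured time consumption.

```python
# Minimal sketch: choose the target operator and hardware inference engine
# for one network layer from profiled time-consumption information.
from dataclasses import dataclass

@dataclass
class OperatorProfile:
    op_name: str          # e.g. "ReLU", "Sigmoid" (illustrative)
    engine_id: str        # e.g. "OpenVINO" (CPU) or "TensorRT" (GPU)
    latency_ms: float     # measured time consumption on that engine

def select_engine_for_layer(profiles):
    """Pick the operator/engine pair with the lowest measured latency."""
    best = min(profiles, key=lambda p: p.latency_ms)
    return best.op_name, best.engine_id

# Hypothetical measurements for one activation layer:
activation_profiles = [
    OperatorProfile("ReLU", "OpenVINO", 0.12),
    OperatorProfile("ReLU", "TensorRT", 0.20),
    OperatorProfile("Sigmoid", "OpenVINO", 0.35),
]
target_op, engine_id = select_engine_for_layer(activation_profiles)
print(target_op, engine_id)  # -> ReLU OpenVINO
```

In practice the choice could also weigh the memory arrangement or other profiled information mentioned above, rather than latency alone.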
S102: dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation.
To better illustrate how the network layers of the network model are divided, take fig. 2 as an example; fig. 2 is a schematic diagram of a possible network model divided into a plurality of sub-networks provided in the present application. As shown in fig. 2, the upper network model has 9 network layers; the white network layers correspond to the same hardware inference engine identifier a, and the gray network layers correspond to the same hardware inference engine identifier B. Network layer 1, network layer 4, network layer 6, network layer 8 and network layer 9 can run on OpenVINO (or on TensorRT or other hardware inference engines that support the operators corresponding to these network layers), while network layer 2, network layer 3, network layer 5 and network layer 7 can run on TensorRT (or on OpenVINO or other hardware inference engines that support the operators corresponding to these network layers). Because network layer 2 and network layer 3, which correspond to hardware inference engine identifier B, lie between network layer 1 and network layer 4, and only one path exists from network layer 1 through network layer 2 and network layer 3 to network layer 4, network layer 1 and network layer 4 cannot be divided into the same sub-network; the same applies to network layer 6, network layer 8 and network layer 9. Any two of network layer 2, network layer 3, network layer 5 and network layer 7, which correspond to the same hardware inference engine identifier B, have a direct or indirect input-output relationship, and the path between any two of them contains no network layer corresponding to hardware inference engine identifier a (or any other hardware inference engine identifier except identifier B). Therefore, the upper network model in fig. 2 is divided into 6 sub-networks, namely sub-network a: network layer 1; sub-network b: network layer 2, network layer 3, network layer 5 and network layer 7; sub-network c: network layer 4; sub-network d: network layer 6; sub-network e: network layer 8; and sub-network f: network layer 9. The input and output of each sub-network may be determined according to the input and output of the at least one network layer comprised by the sub-network. Since sub-networks a, c, d, e and f each include only one network layer, their inputs and outputs correspond to the input and output of the network layer they include. Sub-network b includes multiple network layers, so the input of network layer 2 and the input of network layer 3 can be used as the two inputs of sub-network b, and the output of network layer 3, the output of network layer 5 and the output of network layer 7 can be used as the three outputs of sub-network b, while the output of network layer 2 can be omitted: the calculation result of network layer 2 is passed directly into network layer 3, the calculation result of network layer 3 into network layer 5, and the calculation result of network layer 5 into network layer 7, which increases the efficiency of the sub-networks after the heterogeneous deployment of the network model.
It should be understood that "any two network layers under the same sub-network may have a direct or indirect input-output relationship" means that, within a sub-network, any network layer can be directly connected to another network layer of that sub-network, or connected to it through at least one other network layer of that sub-network. For example, the lower network model in fig. 2 can also be divided into 6 sub-networks: sub-network A: network layer 1; sub-network B: network layer 2 and network layer 3; sub-network C: network layer 4; sub-network D: network layer 5 and network layer 6; sub-network E: network layer 7 and network layer 8; and sub-network F: network layer 9. In contrast to the upper network model, the presence of network layer 5, which belongs to neither sub-network B nor sub-network E, between network layers 2 and 3 of sub-network B and network layers 7 and 8 of sub-network E means that sub-network B and sub-network E cannot be divided into the same sub-network.
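To make the division above concrete, the following is a minimal sketch of how a divided sub-network might be represented, using sub-network b of the upper model in fig. 2; the class layout and field names are illustrative assumptions rather than the configuration format used by the method.

```python
# Minimal sketch of a divided sub-network: which layers it groups, which
# hardware inference engine identifier it carries, and its external I/O.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubNetwork:
    name: str
    engine_id: str                 # hardware inference engine identifier
    layer_ids: List[int]           # network layers grouped into this sub-network
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

# Sub-network b groups layers 2, 3, 5 and 7 (identifier B); it has two external
# inputs and three external outputs as described above, while the intermediate
# results between its layers stay inside the sub-network.
subnet_b = SubNetwork(
    name="b",
    engine_id="B",
    layer_ids=[2, 3, 5, 7],
    inputs=["in_of_layer2", "in_of_layer3"],
    outputs=["out_of_layer3", "out_of_layer5", "out_of_layer7"],
)
```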
S103: sending configuration information of each of the plurality of sub-networks of the network model to the corresponding hardware inference engine according to the hardware inference engine identification corresponding to each of the plurality of sub-networks, so that the plurality of sub-networks of the network model are deployed to the corresponding hardware inference engine in a heterogeneous manner.
A sub-network comprises at least one network layer, and the configuration information of the sub-network includes the number of network layers contained in the sub-network and configuration information such as the input, output and quantization parameters of each network layer in the sub-network.
In this step, the electronic device may send the configuration information and quantization weights of each sub-network in the plurality of sub-networks of the network model to the hardware inference engine corresponding to that sub-network, and each hardware inference engine may invoke the corresponding operators from its own operator library according to the configuration information and quantization weights of its corresponding sub-network, so as to complete the operation process of each sub-network.
After a sub-network including at least two network layers is deployed on a hardware inference engine, the input and output processes between the network layers in the sub-network can be omitted according to the number of network layers included in the sub-network and the input and output of each network layer, and the calculation result of the previous network layer is directly imported into the corresponding next network layer. In other words, the sub-network can be regarded as a sub-network operator containing the computing logic of a plurality of operators; similar to the operators implementing other single network layers, the sub-network operator can directly obtain the output of the sub-network according to the input of the sub-network, without outputting the intermediate results one by one. Fig. 1B is a schematic diagram of a possible way for a hardware inference engine to construct a sub-network operator according to this embodiment. As shown in fig. 1B, the hardware inference engine initializes and constructs a sub-network operator according to the number of network layers included in the sub-network and the input and output of each network layer in the sub-network, then applies for a memory space for storing the configuration information and quantization weights of each network layer in the sub-network operator according to the number of network layers included in the sub-network, and copies the configuration information and quantization weights of each network layer in the sub-network into the memory space, in the execution order of the network layers in the sub-network, as the configuration information and quantization weights of the sub-network operator.
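The construction of a sub-network operator described for fig. 1B can be sketched as follows; this is a minimal, assumed illustration in which configuration information and quantization weights are treated as opaque byte strings, and the function and field names are not from the original disclosure.

```python
# Minimal sketch: size a memory space from the layers of the sub-network and
# copy each layer's configuration and quantization weights in execution order.
from typing import Dict, List

def build_subnetwork_operator(layer_configs: List[Dict[str, bytes]]) -> bytes:
    """layer_configs is ordered by the execution order of the layers in the
    sub-network; each entry carries serialized config and quantization weights."""
    # 1) "Apply for" a memory space sized from the layers' data.
    total_size = sum(len(c["config"]) + len(c["weights"]) for c in layer_configs)
    buffer = bytearray(total_size)
    # 2) Copy configuration and weights layer by layer, in execution order.
    offset = 0
    for c in layer_configs:
        for blob in (c["config"], c["weights"]):
            buffer[offset:offset + len(blob)] = blob
            offset += len(blob)
    return bytes(buffer)  # used as the sub-network operator's configuration
```

The copy is done in the execution order of the layers, matching the order in which the sub-network operator will run them.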
Alternatively, the execution order of each of the plurality of sub-networks can be determined according to the execution order of the plurality of network layers of the network model. Because the input and output memory types of different hardware inference engines differ, a data conversion node for data format conversion between the hardware inference engines can be inserted between two adjacently executed sub-networks according to the execution order of each sub-network and the corresponding hardware inference engine identifier, so that after the sub-network operator corresponding to the previous sub-network is executed, data that has undergone data conversion and has the correct memory type can be input into the next sub-network, thereby completing the model heterogeneous deployment of the network model.
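A minimal sketch of inserting data conversion nodes between adjacently executed sub-networks is shown below; the plan representation and the "convert" marker are illustrative assumptions, not the actual node format.

```python
# Minimal sketch: walk the sub-networks in execution order and insert a data
# conversion node wherever the hardware inference engine changes.
from typing import List, Tuple

def insert_conversion_nodes(ordered_subnets: List[Tuple[str, str]]) -> List[str]:
    """ordered_subnets: (sub-network name, engine identifier) in execution order.
    Returns the execution plan with conversion nodes inserted where the engine
    changes between two adjacently executed sub-networks."""
    plan: List[str] = []
    for i, (name, engine) in enumerate(ordered_subnets):
        plan.append(name)
        if i + 1 < len(ordered_subnets):
            next_engine = ordered_subnets[i + 1][1]
            if engine != next_engine:
                plan.append(f"convert:{engine}->{next_engine}")
    return plan

plan = insert_conversion_nodes([("a", "A"), ("b", "B"), ("c", "A")])
print(plan)  # -> ['a', 'convert:A->B', 'b', 'convert:B->A', 'c']
```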
In the application, a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model is determined; according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, the plurality of network layers are divided into a plurality of sub-networks, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to the at least one network layer in each sub-network is the same, when the sub-network comprises at least two network layers, direct or indirect input and output connection relation is formed between the at least two network layers, according to the hardware inference engine identification corresponding to each sub-network in the plurality of sub-networks, the configuration information of each sub-network in the plurality of sub-networks of the network model is sent to the corresponding hardware inference engine, heterogeneous deployment of the plurality of sub-networks of the network model is completed, the plurality of network layers of the network model can be respectively deployed to different hardware inference engines according to the divided sub-networks, and the inference efficiency of the network model is improved.
In one possible implementation, dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers comprises:
traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input;
if a second network layer with the same hardware inference engine identifier as the first network layer exists, dividing each second network layer with the same hardware inference engine identifier as the first network layer into sub-networks to which the first network layer belongs;
and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before dividing each second network layer having the same hardware inference engine identifier as the first network layer under the sub-network to which the first network layer belongs, the method may further include:
for each second network layer with the same hardware inference engine identification as the first network layer, if a third network layer with the second network layer as a descendant node exists, judging whether the third network layer is the descendant node of the first network layer;
and under the condition that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, dividing the third network layer into the sub-network to which the first network layer belongs.
Optionally, each network layer of the network model may be regarded as a node, and each sub-network obtained after division may be regarded as a sub-graph; each node of the network model is traversed in the execution order of the network layers of the network model, and nodes corresponding to the same hardware inference engine identifier are divided into one sub-graph, so as to obtain a plurality of sub-graphs (i.e., sub-networks) of the network model. To better illustrate how the network model is divided into a plurality of sub-networks, refer to fig. 3 as an example; fig. 3 is a schematic step diagram of a possible sub-network dividing manner provided in the present application. As shown in fig. 3, first, the currently traversed node is denoted src, and all successor nodes next of the node src are found (i.e., the nodes that take the output of the node src as input; that is, the node src serves as the first network layer and all successor nodes next serve as the second network layers corresponding to the first network layer). Whether each successor node next corresponds to the same hardware inference engine identifier as the node src is then judged one by one, and if the currently judged successor node next and the node src correspond to different hardware inference engine identifiers, the next successor node next is judged.
Then, if the current successor node next corresponds to the same hardware inference engine identifier as the node src, all predecessor nodes A of the successor node next other than the node src are found (here it can be considered that at least one output of each predecessor node corresponds to one input of the successor node next), and it is judged one by one whether a path exists from the node src to each predecessor node A (i.e., whether the predecessor node is a descendant node of the node src). If at least one predecessor node B among all predecessor nodes A is a descendant node of the node src (i.e., a path exists from the node src to the predecessor node B), it is judged whether all nodes on the path from the node src to the predecessor node B (including the predecessor node B) correspond to the same hardware inference engine identifier as the node src. If they correspond to the same hardware inference engine identifier, all nodes on the path from the node src to the predecessor node B, together with the successor node next, are divided into one sub-graph; if they do not correspond to the same hardware inference engine identifier, the successor node next and the node src cannot be divided into one sub-graph. Taking nodes C to D in the execution order of the network model as an example, if node C is directly connected to node D, or indirectly connected to it through a plurality of nodes, then node D is a descendant node of node C.
To better illustrate how sub-graphs are obtained by dividing in the above manner, take fig. 4 as an example; fig. 4 is a schematic diagram of a possible node dividing manner. As shown in fig. 4, assume that a circle represents a node corresponding to identifier a (i.e., one hardware inference engine identifier) and a rectangle represents a node corresponding to identifier b (i.e., another hardware inference engine identifier different from identifier a). The currently traversed node is src, and its successor nodes are next1, next2 and next3. The predecessor node of next1 is prev1, and the predecessor node of next2 is prev2. For src and next1: since prev1 is also a predecessor node of src, there is only one directly connected path from src to next1 in the graph, and src and next1 correspond to the same identifier a, so src and next1 can be divided into the same sub-graph. For src and next2: src and next2 correspond to the same identifier a, but because the predecessor node prev2 of next2 corresponds to identifier b (i.e., the nodes on the path from src to next2 do not all correspond to identifier a), src and next2 cannot be divided into the same sub-graph. For src and next3: they correspond to the same identifier a and there is only one unique directly connected path between them, so src and next3 can be divided into the same sub-graph. Finally, src is divided into the same sub-graph as next1 and next3.
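The traversal of fig. 3 and fig. 4 can be sketched as the following simplified routine; it merges a node and one of its successors into the same sub-graph only when every path between them visits only nodes with the same hardware inference engine identifier, which is the effect of the predecessor check described above. The graph encoding, the union-find bookkeeping and the assumption that node ids follow the execution order are illustrative choices, not the patented implementation.

```python
# Simplified sketch of the sub-graph division of fig. 3/fig. 4 (assumes a DAG).
from collections import defaultdict
from typing import Dict, List, Tuple

def divide_subgraphs(engine_of: Dict[int, str],
                     edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Return a mapping node -> sub-graph representative."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    def reachable(a: int, b: int) -> bool:
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        return False

    def paths_all_same(src: int, dst: int) -> bool:
        """True if every path src -> dst visits only nodes with src's identifier."""
        if src == dst:
            return True
        for nxt in succ[src]:
            if reachable(nxt, dst):
                if engine_of[nxt] != engine_of[src] or not paths_all_same(nxt, dst):
                    return False
        return True

    parent = {n: n for n in engine_of}           # simple union-find
    def find(n: int) -> int:
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src in sorted(engine_of):                # execution order assumed = id order
        for nxt in succ[src]:
            if engine_of[nxt] == engine_of[src] and paths_all_same(src, nxt):
                parent[find(nxt)] = find(src)    # merge successor into src's sub-graph
    return {n: find(n) for n in engine_of}

# Fig. 4-style check: prev2 (identifier b) sits on a path from src to next2,
# so src and next2 stay in different sub-graphs, while next1 and next3 join src.
example = divide_subgraphs(
    {0: "a", 1: "a", 2: "b", 3: "a", 4: "a"},    # 0=src, 1=next1, 2=prev2, 3=next2, 4=next3
    [(0, 1), (0, 2), (2, 3), (0, 3), (0, 4)],
)
print(example[0] == example[1], example[0] == example[3], example[0] == example[4])
# -> True False True
```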
Fig. 5 is a flowchart of another network model heterogeneous deployment provided by the present application. As shown in fig. 5, the network model heterogeneous deployment process includes:
s501: each network layer of the network model is labeled.
Operator information for each network layer of the network model can be obtained by running each network layer of the network model under different hardware inference engines and different inference acceleration methods. According to the time consumption, memory arrangement and other information of the multiple operators in the operator information on different hardware inference engines, the operator achieving the best performance for the network layer is determined and the corresponding hardware inference engine identifier is marked for the network layer.
S502: and dividing to obtain a plurality of sub-networks according to the connection relation of each network layer and the mark of the hardware reasoning engine.
The sub-graph division for the currently traversed node can be obtained by determining, using for example the steps shown in fig. 3, whether each successor (second network layer) of the currently traversed node (first network layer) can be divided into the same sub-graph (sub-network) as that node. All nodes are traversed in the execution order of the network model, so the plurality of nodes (network layers) contained in the network model can be divided into a plurality of sub-graphs (sub-networks).
S503: each subnetwork is deployed to a corresponding hardware inference engine.
Each divided sub-network and the network configuration information corresponding to each sub-network are sent to the corresponding hardware inference engine, and the hardware inference engine corresponding to each sub-network calls the corresponding operators in its operator library according to the network configuration information and quantization weights of the sub-network to complete the operation of the sub-network. The network configuration information corresponding to each sub-network includes the number of network layers contained in the sub-network; the parameters, operation instructions, quantization parameters, input, output and quantization weight addresses of all the network layers contained in the sub-network; and information such as the quantization parameter arrangement and alignment of the sub-network.
S504: and (4) inserting conversion operators among different hardware reasoning engines to finish heterogeneous deployment of the network model.
Because the input and output memory types of different sub-networks may be different, the input and output data of different sub-networks are connected in series by judging the memory types and inserting data conversion nodes (i.e. conversion operators) to finally generate a complete heterogeneous deployment model.
Based on the foregoing network model heterogeneous deployment method, the present application provides a network model heterogeneous deployment device, and fig. 6 is a schematic structural diagram of a network model heterogeneous deployment device provided in an embodiment of the present application, where the apparatus includes:
a marking module 601, configured to determine a hardware inference engine identifier corresponding to each of multiple network layers of a network model;
a processing module 602, configured to divide the multiple network layers into multiple sub-networks according to the connection relationships of the multiple network layers in the network model and the hardware inference engine identifiers corresponding to each network layer in the multiple network layers, where each sub-network includes at least one network layer, and the hardware inference engine identifiers corresponding to at least one network layer included in each sub-network are the same, and when the sub-network includes at least two network layers, there is a direct or indirect input/output connection relationship between the at least two network layers;
the processing module 602 is further configured to send, according to the hardware inference engine identifier corresponding to each of the multiple subnetworks, the configuration information of each of the multiple subnetworks of the network model to the corresponding hardware inference engine, so that heterogeneous deployment of the multiple subnetworks of the network model to the corresponding hardware inference engine is achieved.
Further, when the processing module 602 divides the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, the processing module is specifically configured to: traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input; if a second network layer with the same hardware inference engine identification as the first network layer exists, dividing each second network layer with the same hardware inference engine identification as the first network layer under a sub-network to which the first network layer belongs; and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before the processing module 602 divides each second network layer having the same hardware inference engine identifier as the first network layer into the sub-network to which the first network layer belongs, the processing module is further configured to: for each second network layer with the same hardware inference engine identifier as the first network layer, if a third network layer that takes the second network layer as a descendant node exists, judge whether the third network layer is a descendant node of the first network layer; and, in the case that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, divide the third network layer into the sub-network to which the first network layer belongs.
Further, when the marking module 601 determines the hardware inference engine identifier corresponding to each of the multiple network layers of the network model, it is specifically configured to: acquiring operator information of each network layer of a network model, wherein the operator information of the network layer comprises hardware inference engine information corresponding to each operator in a plurality of operators for realizing the network layer, and the hardware inference engine information corresponding to the operator comprises time consumption information of the operator on a hardware inference engine; and determining the hardware inference engine identifier corresponding to each network layer according to the operator information of each network layer.
Further, when the marking module 601 obtains operator information of each network layer of the network model, it is specifically configured to: and obtaining operator information of each network layer of the network model by operating a plurality of operators corresponding to each network layer of the network model on different hardware reasoning engines.
Further, when the processing module 602 deploys the plurality of sub-networks of the network model in a heterogeneous manner according to the hardware inference engine identifier corresponding to each sub-network of the plurality of sub-networks, the processing module is specifically configured to: determine an execution order of each of the plurality of sub-networks according to an execution order of the plurality of network layers of the network model; and insert, between two adjacently executed sub-networks running on different hardware inference engines, data conversion nodes for data format conversion between the hardware inference engines, according to the execution sequence of each sub-network in the plurality of sub-networks and the corresponding hardware inference engine identifiers, to complete the model heterogeneous deployment of the network model.
Fig. 7 is a schematic structural diagram of an electronic device. As shown in fig. 7, the electronic device includes: a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704.
The memory 703 has a computer program stored therein, which when executed by the processor 701 causes the processor 701 to implement the steps of any one of the above-described network model heterogeneous deployment methods.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface 702 is used for communication between the electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a central processing unit, a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
On the basis of the foregoing embodiments, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored; when the program runs on the electronic device, the electronic device is enabled to perform the steps of any one of the foregoing network model heterogeneous deployment methods.
The computer readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc., optical memory such as CDs, DVDs, BDs, HVDs, etc., and semiconductor memory such as ROMs, EPROMs, EEPROMs, non-volatile memories (NAND FLASH), solid State Disks (SSDs), etc.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A network model heterogeneous deployment method, applied to an electronic device, the method comprising the following steps:
determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model;
dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation;
and sending the configuration information of each sub-network in the plurality of sub-networks of the network model to the corresponding hardware inference engine according to the hardware inference engine identification corresponding to each sub-network in the plurality of sub-networks, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engine.
2. The method of claim 1, wherein dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers comprises:
traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input;
if a second network layer with the same hardware inference engine identifier as the first network layer exists, dividing each such second network layer into the sub-network to which the first network layer belongs;
and if no second network layer with the same hardware inference engine identifier as the first network layer exists, dividing the first network layer into a sub-network of its own.
3. The method according to claim 2, wherein before dividing each second network layer having the same hardware inference engine identifier as the first network layer into the sub-network to which the first network layer belongs, the method further comprises:
for each second network layer with the same hardware inference engine identifier as the first network layer, if a third network layer that has the second network layer as a descendant node exists in the network model, judging whether the third network layer is a descendant node of the first network layer;
and in the case that the third network layer is a descendant node of the first network layer and has the same hardware inference engine identifier as the first network layer, dividing the third network layer into the sub-network to which the first network layer belongs.
4. The method of claim 1, wherein determining the hardware inference engine identity corresponding to each of the plurality of network layers of the network model comprises:
acquiring information of a target operator of each network layer in a plurality of network layers for realizing the network model, wherein the information of the target operator comprises time-consuming information of the target operator on a plurality of hardware inference engines;
and determining a hardware inference engine identifier corresponding to each network layer in the plurality of network layers of the network model according to the information of the target operator of each network layer in the plurality of network layers for realizing the network model.
5. The method of claim 4, wherein obtaining information of a target operator of each of a plurality of network layers implementing the network model comprises:
for each network layer of the network model, acquiring information of a plurality of operators for realizing the network layer, wherein the information of each operator comprises time-consuming information of the operator on a plurality of hardware inference engines;
and determining a target operator for realizing the network layer according to the information of the operators for realizing the network layer.
6. The method of claim 1, wherein heterogeneously deploying the plurality of sub-networks of the network model to the corresponding hardware inference engines according to the hardware inference engine identifier corresponding to each of the plurality of sub-networks comprises:
determining an execution order of each of the plurality of subnetworks according to an execution order of a plurality of network layers of the network model;
and inserting, according to the execution order of each of the plurality of sub-networks and the corresponding hardware inference engine identifiers, a data conversion node for converting data formats between the hardware inference engines between two adjacently executed sub-networks, so as to complete the heterogeneous deployment of the network model.
7. A network model heterogeneous deployment device, the device comprising:
the marking module is used for determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of the network model;
the processing module is used for dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation;
the processing module is further configured to send the configuration information of each of the plurality of subnetworks of the network model to the corresponding hardware inference engine according to the hardware inference engine identifier corresponding to each of the plurality of subnetworks, so that the plurality of subnetworks of the network model are deployed to the corresponding hardware inference engine in a heterogeneous manner.
8. The device according to claim 7, wherein the processing module, when dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, is specifically configured to:
traversing the plurality of network layers according to the execution order of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer that takes the output of the first network layer as input; if a second network layer with the same hardware inference engine identifier as the first network layer exists, dividing each such second network layer into the sub-network to which the first network layer belongs; and if no second network layer with the same hardware inference engine identifier as the first network layer exists, dividing the first network layer into a sub-network of its own.
9. An electronic device, characterized in that the electronic device comprises at least a processor and a memory, wherein the processor, when executing computer programs or instructions stored in the memory, implements the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that it stores a computer program or instructions which, when executed by a processor, implement the method according to any one of claims 1 to 6.
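
The partitioning described in claims 1 to 3 can be illustrated with a short sketch. The Python snippet below is only a minimal sketch of that grouping rule, assuming the layers are given in execution order and that each layer carries the engine identifier determined for it; the names Layer and partition_subnetworks are invented for illustration and do not come from the patent, and the claim-3 refinement for a third network layer that feeds a second network layer is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    name: str
    engine_id: str                                   # hardware inference engine identifier for this layer
    inputs: List[str] = field(default_factory=list)  # names of layers whose output this layer consumes

def partition_subnetworks(layers: List[Layer]) -> List[List[str]]:
    """Group layers into sub-networks: layers in one sub-network share the same engine
    identifier and are connected through direct or indirect input/output edges."""
    subnet_of: Dict[str, int] = {}   # layer name -> index of the sub-network it belongs to
    subnets: List[List[str]] = []
    for layer in layers:             # traverse in execution order; the current layer is the "first network layer"
        if layer.name not in subnet_of:          # no same-engine predecessor claimed it: start a new sub-network
            subnet_of[layer.name] = len(subnets)
            subnets.append([layer.name])
        current = subnet_of[layer.name]
        # "second network layers": layers that take the current layer's output as input
        for succ in (l for l in layers if layer.name in l.inputs):
            if succ.engine_id == layer.engine_id and succ.name not in subnet_of:
                subnet_of[succ.name] = current    # same engine identifier: merge into the current sub-network
                subnets[current].append(succ.name)
    return subnets

# Hypothetical four-layer model where one layer falls back to the CPU engine:
layers = [
    Layer("conv1", "npu"),
    Layer("relu1", "npu", inputs=["conv1"]),
    Layer("custom_op", "cpu", inputs=["relu1"]),
    Layer("conv2", "npu", inputs=["custom_op"]),
]
print(partition_subnetworks(layers))
# [['conv1', 'relu1'], ['custom_op'], ['conv2']]
```

Claims 4 to 6 further select, for each layer, the engine on which its target operator is least time-consuming, and insert data conversion nodes between adjacently executed sub-networks that land on different engines. The following is a small sketch of that idea under the assumption that per-operator timing data is available as a mapping from engine identifier to measured latency; select_engine, insert_converters and the timing figures are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple

def select_engine(op_latency: Dict[str, float]) -> str:
    """Pick the hardware inference engine identifier whose measured latency for
    the layer's target operator is smallest (claims 4-5)."""
    return min(op_latency, key=op_latency.get)

def insert_converters(subnet_engines: List[str]) -> List[Tuple[int, int]]:
    """Return positions of adjacent sub-networks (in execution order) that run on different
    engines and therefore need a data conversion node between them (claim 6)."""
    return [(i, i + 1)
            for i in range(len(subnet_engines) - 1)
            if subnet_engines[i] != subnet_engines[i + 1]]

# Made-up timing data and a made-up sub-network execution order:
print(select_engine({"npu": 0.8, "gpu": 1.2, "cpu": 5.0}))   # -> 'npu'
print(insert_converters(["npu", "npu", "cpu", "npu"]))        # -> [(1, 2), (2, 3)]
```
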
CN202211707960.2A 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium Pending CN115809684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211707960.2A CN115809684A (en) 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211707960.2A CN115809684A (en) 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115809684A true CN115809684A (en) 2023-03-17

Family

ID=85486987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211707960.2A Pending CN115809684A (en) 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115809684A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629308A (en) * 2023-07-24 2023-08-22 科大讯飞股份有限公司 Neural network model reasoning method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20200249998A1 (en) Scheduling computation graph heterogeneous computer system
US11126493B2 (en) Methods and systems for autonomous cloud application operations
GB2530516A (en) Intelligent Software Test Augmenting
CN114399019A (en) Neural network compiling method, system, computer device and storage medium
KR20210108319A (en) Method and system for automatic classification based on machine learning
CN115809684A (en) Network model heterogeneous deployment method, device, equipment and storage medium
CN116210010A (en) Method and system for evaluating consistency of engineering system
US20240160977A1 (en) Quantum circuit compilation method, device, compilation framework and quantum operating system
Kusmenko et al. On the engineering of AI-powered systems
JP6888737B2 (en) Learning devices, learning methods, and programs
US20200104123A1 (en) Intelligent agent framework
CN112099882B (en) Service processing method, device and equipment
WO2020169182A1 (en) Method and apparatus for allocating tasks
CN113283575A (en) Processor for reconstructing artificial neural network, operation method thereof and electrical equipment
CN115461718A (en) Memory allocation in neural networks
CN112990461A (en) Method and device for constructing neural network model, computer equipment and storage medium
CN116933841A (en) Operator fusion method and device, electronic equipment and computer readable medium
AU2017101008A4 (en) Systems and Methods for Reducing CPU Time to Compute State Space of Resource Allocation System
KR102376527B1 (en) Method and computer program of processing program for single accelerator using dnn framework on plural accelerators
Saribatur et al. Reactive policies with planning for action languages
CN113705813B (en) Mutation rule supplementing method and device based on genetic algorithm
CN113811897B (en) Inference method and apparatus of neural network model, computer device, and storage medium
CN113821251A (en) Code optimization method, device, equipment and storage medium based on artificial intelligence
JP7424373B2 (en) Analytical equipment, analytical methods and analytical programs
CN110097183B (en) Information processing method and information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination