CN115809684A - Network model heterogeneous deployment method, device, equipment and storage medium

Network model heterogeneous deployment method, device, equipment and storage medium

Info

Publication number
CN115809684A
CN115809684A (application CN202211707960.2A)
Authority
CN
China
Prior art keywords
network, network layer, sub-network, inference engine, layer
Prior art date
Legal status
Pending
Application number
CN202211707960.2A
Other languages
Chinese (zh)
Inventor
林贤早
王康
陈波扬
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211707960.2A
Publication of CN115809684A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a network model heterogeneous deployment method, device, equipment and storage medium, used for deploying a network model onto different hardware inference engines in a heterogeneous manner. The method comprises the following steps: determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model; dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each network layer, wherein each sub-network comprises at least one network layer, the hardware inference engine identifiers corresponding to the at least one network layer in each sub-network are the same, and when a sub-network comprises at least two network layers, the at least two network layers have a direct or indirect input-output connection relationship; and sending the configuration information of each sub-network to the corresponding hardware inference engine according to the hardware inference engine identifier corresponding to each sub-network, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engines.

Description

Network model heterogeneous deployment method, device, equipment and storage medium
Technical Field
The application relates to the field of deployment of deep learning network models, in particular to a network model heterogeneous deployment method, device, equipment and storage medium.
Background
With the rise of deep learning technology, deep learning network models have been applied to many aspects of daily life, such as face detection and positioning, face comparison, voice recognition, fingerprint recognition, iris recognition, and the like. After each network layer of a deep learning network model is constructed, each network layer needs to be converted into a corresponding operator, and a hardware inference engine processes the operator corresponding to each network layer to complete the inference process of the deep learning network model.
However, different hardware inference engines differ in their underlying implementation architectures, in the operator libraries they are equipped with, and so on, so each hardware inference engine performs differently when processing different operators, or even operators of the same type. There is therefore the problem of how to accurately deploy a deep learning network model onto different hardware inference engines for execution.
Disclosure of Invention
The application provides a network model heterogeneous deployment method, device, equipment and storage medium, in which the network model is divided into a plurality of sub-networks so that it can be heterogeneously deployed onto different hardware inference engines.
In a first aspect, the present application provides a network model heterogeneous deployment method, including:
determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model;
dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation;
and sending the configuration information of each sub-network in the plurality of sub-networks of the network model to the corresponding hardware inference engine according to the hardware inference engine identification corresponding to each sub-network in the plurality of sub-networks, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engine.
Further, dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers includes:
traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as input;
if a second network layer with the same hardware inference engine identification as the first network layer exists, dividing each second network layer with the same hardware inference engine identification as the first network layer under a sub-network to which the first network layer belongs;
and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before dividing each second network layer having the same hardware inference engine identifier as the first network layer into sub-networks to which the first network layer belongs, the method further includes:
for each second network layer with the same hardware inference engine identification as the first network layer, if a third network layer with the second network layer as a descendant node exists, judging whether the third network layer is the descendant node of the first network layer;
and under the condition that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, dividing the third network layer into the sub-network to which the first network layer belongs.
Further, the determining the hardware inference engine identifier corresponding to each of the plurality of network layers of the network model includes:
acquiring information of a target operator of each network layer in a plurality of network layers for realizing the network model, wherein the information of the target operator comprises time consumption information of the target operator on a plurality of hardware inference engines;
and determining a hardware inference engine identifier corresponding to each network layer in the plurality of network layers of the network model according to the information of the target operator of each network layer in the plurality of network layers for realizing the network model.
Further, the obtaining information of a target operator of each of a plurality of network layers implementing the network model includes:
acquiring information of a plurality of operators for realizing the network layer aiming at each network layer of the network model, wherein the information of each operator comprises time-consuming information of the operator on a plurality of hardware inference engines;
and determining a target operator for realizing the network layer according to the information of the operators for realizing the network layer.
Further, the heterogeneously deploying the plurality of subnetworks of the network model according to the hardware inference engine identification corresponding to each subnetwork in the plurality of subnetworks comprises:
determining an execution order of each of the plurality of subnetworks according to an execution order of a plurality of network layers of the network model;
and inserting, between two adjacently executed sub-networks, a data conversion node for converting data formats between the hardware inference engines, according to the execution sequence of each sub-network in the plurality of sub-networks and the corresponding hardware inference engine identifier, to complete the model heterogeneous deployment of the network model.
In a second aspect, the present application provides a network model heterogeneous deployment device, including:
the marking module is used for determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of the network model;
a processing module, configured to divide the multiple network layers into multiple subnetworks according to connection relationships of the multiple network layers in the network model and hardware inference engine identifiers corresponding to each of the multiple network layers, where each subnetwork includes at least one network layer, and the hardware inference engine identifiers corresponding to the at least one network layer in each subnetwork are the same, and when the subnetwork includes at least two network layers, the at least two network layers have a direct or indirect input-output connection relationship therebetween;
the processing module is further configured to send the configuration information of each of the plurality of subnetworks of the network model to the corresponding hardware inference engine according to the hardware inference engine identifier corresponding to each of the plurality of subnetworks, so that the plurality of subnetworks of the network model are deployed to the corresponding hardware inference engine in a heterogeneous manner.
Further, when the processing module divides the plurality of network layers into a plurality of subnetworks according to the connection relationship between the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, the processing module is specifically configured to: traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input; if a second network layer with the same hardware inference engine identification as the first network layer exists, dividing each second network layer with the same hardware inference engine identification as the first network layer under a sub-network to which the first network layer belongs; and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before the processing module divides each second network layer having the same hardware inference engine identifier as the first network layer into the sub-network to which the first network layer belongs, the processing module is further configured to: for each second network layer with the same hardware inference engine identifier as the first network layer, if a third network layer that takes the second network layer as a descendant node exists, judge whether the third network layer is a descendant node of the first network layer; and, in the case that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, divide the third network layer into the sub-network to which the first network layer belongs.
Further, when the marking module determines the hardware inference engine identifier corresponding to each of the plurality of network layers of the network model, the marking module is specifically configured to: acquiring information of a target operator of each network layer in a plurality of network layers for realizing the network model, wherein the information of the target operator comprises time-consuming information of the target operator on a plurality of hardware inference engines; and determining a hardware inference engine identifier corresponding to each network layer in the plurality of network layers of the network model according to the information of the target operator of each network layer in the plurality of network layers for realizing the network model.
Further, when the marking module obtains information of a target operator of each network layer of the plurality of network layers implementing the network model, the marking module is specifically configured to: acquiring information of a plurality of operators for realizing the network layer aiming at each network layer of the network model, wherein the information of each operator comprises time-consuming information of the operator on a plurality of hardware inference engines; and determining a target operator for realizing the network layer according to the information of the operators for realizing the network layer.
Further, when the processing module deploys the plurality of subnetworks of the network model in a heterogeneous manner according to the hardware inference engine identifier corresponding to each subnetwork in the plurality of subnetworks, the processing module is specifically configured to: determine an execution order of each of the plurality of sub-networks according to an execution order of the plurality of network layers of the network model; and insert, between two adjacently executed sub-networks, a data conversion node for data format conversion between the hardware inference engines, according to the execution sequence of each sub-network in the plurality of sub-networks and the corresponding hardware inference engine identifier, thereby completing the model heterogeneous deployment of the network model.
In a third aspect, the present application provides an electronic device, which at least comprises a processor and a memory, and when the processor executes a computer program or instructions stored in the memory, the method of the first aspect is implemented.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program or instructions which, when executed by a processor, implement the method of the first aspect.
In the application, a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model is determined. According to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, the plurality of network layers are divided into a plurality of sub-networks, wherein each sub-network comprises at least one network layer, the hardware inference engine identifiers corresponding to the at least one network layer in each sub-network are the same, and when a sub-network comprises at least two network layers, there is a direct or indirect input-output connection relationship between the at least two network layers. According to the hardware inference engine identifier corresponding to each sub-network in the plurality of sub-networks, the configuration information of each sub-network of the network model is sent to the corresponding hardware inference engine, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engines. In this way, the plurality of network layers of the network model can be deployed to different hardware inference engines according to the divided sub-networks, and the inference efficiency of the network model is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1A is a flowchart of a network model heterogeneous deployment method provided in the present application.
FIG. 1B is a schematic diagram of one possible way for a hardware inference engine to construct a sub-network operator, provided in the present application.
Fig. 2 is a schematic diagram of a possible network model divided into a plurality of sub-networks provided in the present application.
Fig. 3 is a schematic diagram of steps of one possible way of dividing sub-networks provided in the present application.
Fig. 4 is a schematic diagram of a possible node partitioning manner provided in the present application.
Fig. 5 is a flowchart of a network model heterogeneous deployment provided in the present application.
Fig. 6 is a schematic structural diagram of a network model heterogeneous deployment device provided in the present application.
Fig. 7 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To make the purpose and embodiments of the present application clearer, the following will clearly and completely describe the exemplary embodiments of the present application with reference to the attached drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or textual entities and are not necessarily intended to define a particular order or sequence unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
With the rise of deep learning technology, deep learning network models have been applied to many aspects of daily life, such as face detection and positioning, face comparison, voice recognition, fingerprint recognition, iris recognition, and the like. After each network layer of a deep learning network model is constructed, each network layer needs to be converted into a corresponding operator, and a hardware inference engine processes the operator corresponding to each network layer to complete the inference process of the deep learning network model.
However, different hardware inference engines differ in their underlying implementation architectures, in the operator libraries they provide, and so on, so each hardware inference engine performs differently when processing different operators, or even operators of the same type. There is therefore the problem of how to place the operator corresponding to each network layer of the deep learning network model onto different hardware inference engines for execution.
Based on this, the application provides a network model heterogeneous deployment method, device, equipment and storage medium: the network layers included in the network model are divided into a plurality of sub-networks, and each sub-network is deployed on a different hardware inference engine, so that the network layers of the network model run on different hardware inference engines and the inference efficiency of the network model is improved.
Fig. 1A is a flowchart of a network model heterogeneous deployment method, which may be applied to an electronic device, and the method includes:
s101: and determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of the network model.
In this step, the network model may be a deep learning network model such as a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) or a Generative Adversarial Network (GAN), or a deep learning network model improved by adding, deleting or modifying network layers. The target operator of each network layer may be the operator, selected from the operator libraries of multiple hardware inference engines, that takes the shortest time to implement the network layer, or an operator in the operator library of a preset hardware inference engine. The hardware inference engines include, for example, the Open Visual Inference and Neural network Optimization (OpenVINO) hardware inference engine developed by Intel and usable on a Central Processing Unit (CPU), the TensorRT hardware inference engine developed by NVIDIA and usable on a Graphics Processing Unit (GPU), and the like. An Operator (OP) may be a computing unit in a deep learning algorithm; in the network model, the operator of each network layer may correspond to the computing logic of that network layer. For example, the convolution algorithm in a convolution layer and the weighted summation process in a fully-connected layer can each be considered an operator. For the same network layer, there may be multiple types of operators implementing the computation logic of the network layer; for example, activation operators may include Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, and the like.
Alternatively, the electronic device may obtain operator information for each network layer of the network model by running each network layer of the network model under different hardware inference engines and different inference acceleration methods. The operator information of a network layer may include the time consumption, memory arrangement and other information of multiple operators of that network layer under different hardware inference engines (that is, the hardware inference engine information corresponding to the operators). The electronic device can determine, according to the operator information of each network layer, the operator that achieves the best performance for that network layer and the hardware inference engine identifier corresponding to that network layer. For example, if network layer a is an activation layer, the activation operators that can realize network layer a include Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, and the like; the time consumption, memory arrangement and other information of these activation operators on OpenVINO, TensorRT or other hardware inference engines is obtained by means of simulated inference tests, so that the target activation operator with the best performance for network layer a and the hardware inference engine identifier corresponding to network layer a can be determined.
It should be noted that different hardware inference engines have their own associated operator libraries, so the operator libraries of any two hardware inference engines may contain different operators. In addition, even if two hardware inference engines support the same operator, the time consumption, memory arrangement and other characteristics of the operator may differ because the underlying hardware corresponding to the two hardware inference engines differs. For example, a CPU, which is better suited to processing small amounts of data quickly, may process activation operators and the like faster, while a GPU, which is better suited to processing large amounts of data, may process convolution operators and the like faster.
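As an illustration of the per-layer selection described above, the following is a minimal sketch, assuming hypothetical operator names, illustrative timing numbers and a latency-only criterion, none of which are part of this disclosure; it simply picks, for one network layer, the operator/engine pair with the lowest measured time consumption.

```python
# Minimal sketch: choose the target operator and hardware inference engine
# for one network layer from profiled time-consumption information.
from dataclasses import dataclass

@dataclass
class OperatorProfile:
    op_name: str          # e.g. "ReLU", "Sigmoid" (illustrative)
    engine_id: str        # e.g. "OpenVINO" (CPU) or "TensorRT" (GPU)
    latency_ms: float     # measured time consumption on that engine

def select_engine_for_layer(profiles):
    """Pick the operator/engine pair with the lowest measured latency."""
    best = min(profiles, key=lambda p: p.latency_ms)
    return best.op_name, best.engine_id

# Hypothetical measurements for one activation layer:
activation_profiles = [
    OperatorProfile("ReLU", "OpenVINO", 0.12),
    OperatorProfile("ReLU", "TensorRT", 0.20),
    OperatorProfile("Sigmoid", "OpenVINO", 0.35),
]
target_op, engine_id = select_engine_for_layer(activation_profiles)
print(target_op, engine_id)  # -> ReLU OpenVINO
```

In practice the choice could also weigh the memory arrangement or other profiled information mentioned above, rather than latency alone.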
S102: dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation.
To better illustrate how the network layers of the network model are divided, take fig. 2 as an example; fig. 2 is a schematic diagram of a possible network model divided into a plurality of sub-networks provided in the present application. As shown in fig. 2, the upper network model has 9 network layers; the white network layers correspond to the same hardware inference engine identifier a, and the gray network layers correspond to the same hardware inference engine identifier B. Network layer 1, network layer 4, network layer 6, network layer 8 and network layer 9 can run on OpenVINO (or on TensorRT or other hardware inference engines that support the operators corresponding to these network layers), while network layer 2, network layer 3, network layer 5 and network layer 7 can run on TensorRT (or on OpenVINO or other hardware inference engines that support the operators corresponding to these network layers). Because network layer 2 and network layer 3, which correspond to hardware inference engine identifier B, lie between network layer 1 and network layer 4, and only one path exists from network layer 1 through network layer 2 and network layer 3 to network layer 4, network layer 1 and network layer 4 cannot be divided into the same sub-network; the same applies to network layer 6, network layer 8 and network layer 9. Any two of network layer 2, network layer 3, network layer 5 and network layer 7, which correspond to the same hardware inference engine identifier B, have a direct or indirect input-output relationship, and the path between any two of them contains no network layer corresponding to hardware inference engine identifier a (or any other hardware inference engine identifier except identifier B). Therefore, the upper network model in fig. 2 is divided into 6 sub-networks, namely sub-network a: network layer 1; sub-network b: network layer 2, network layer 3, network layer 5 and network layer 7; sub-network c: network layer 4; sub-network d: network layer 6; sub-network e: network layer 8; and sub-network f: network layer 9. The input and output of each sub-network may be determined according to the input and output of the at least one network layer comprised by the sub-network. Since sub-networks a, c, d, e and f each include only one network layer, their inputs and outputs correspond to the input and output of the network layer they include. Sub-network b includes multiple network layers, so the input of network layer 2 and the input of network layer 3 can be used as the two inputs of sub-network b, and the output of network layer 3, the output of network layer 5 and the output of network layer 7 can be used as the three outputs of sub-network b, while the output of network layer 2 can be omitted: the calculation result of network layer 2 is passed directly into network layer 3, the calculation result of network layer 3 into network layer 5, and the calculation result of network layer 5 into network layer 7, which increases the efficiency of the sub-networks after the heterogeneous deployment of the network model.
It should be understood that "any two network layers under the same sub-network may have a direct or indirect input-output relationship" means that, within a sub-network, any network layer can be directly connected to another network layer of that sub-network, or connected to it through at least one other network layer of that sub-network. For example, the lower network model in fig. 2 can also be divided into 6 sub-networks: sub-network A: network layer 1; sub-network B: network layer 2 and network layer 3; sub-network C: network layer 4; sub-network D: network layer 5 and network layer 6; sub-network E: network layer 7 and network layer 8; and sub-network F: network layer 9. In contrast to the upper network model, the presence of network layer 5, which belongs to neither sub-network B nor sub-network E, between network layers 2 and 3 of sub-network B and network layers 7 and 8 of sub-network E means that sub-network B and sub-network E cannot be divided into the same sub-network.
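To make the division above concrete, the following is a minimal sketch of how a divided sub-network might be represented, using sub-network b of the upper model in fig. 2; the class layout and field names are illustrative assumptions rather than the configuration format used by the method.

```python
# Minimal sketch of a divided sub-network: which layers it groups, which
# hardware inference engine identifier it carries, and its external I/O.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubNetwork:
    name: str
    engine_id: str                 # hardware inference engine identifier
    layer_ids: List[int]           # network layers grouped into this sub-network
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

# Sub-network b groups layers 2, 3, 5 and 7 (identifier B); it has two external
# inputs and three external outputs as described above, while the intermediate
# results between its layers stay inside the sub-network.
subnet_b = SubNetwork(
    name="b",
    engine_id="B",
    layer_ids=[2, 3, 5, 7],
    inputs=["in_of_layer2", "in_of_layer3"],
    outputs=["out_of_layer3", "out_of_layer5", "out_of_layer7"],
)
```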
S103: sending configuration information of each of the plurality of sub-networks of the network model to the corresponding hardware inference engine according to the hardware inference engine identification corresponding to each of the plurality of sub-networks, so that the plurality of sub-networks of the network model are deployed to the corresponding hardware inference engine in a heterogeneous manner.
A sub-network comprises at least one network layer, and the configuration information of the sub-network includes the number of network layers contained in the sub-network and configuration information such as the input, output and quantization parameters of each network layer in the sub-network.
In this step, the electronic device may send the configuration information and quantization weights of each sub-network in the plurality of sub-networks of the network model to the hardware inference engine corresponding to that sub-network, and each hardware inference engine may invoke the corresponding operators from its own operator library according to the configuration information and quantization weights of its corresponding sub-network, so as to complete the operation process of each sub-network.
After a sub-network including at least two network layers is deployed on a hardware inference engine, the input and output processes between the network layers in the sub-network can be omitted according to the number of network layers included in the sub-network and the input and output of each network layer, and the calculation result of the previous network layer is directly imported into the corresponding next network layer. In other words, the sub-network can be regarded as a sub-network operator containing the computing logic of a plurality of operators; similar to the operators implementing other single network layers, the sub-network operator can directly obtain the output of the sub-network according to the input of the sub-network, without outputting the intermediate results one by one. Fig. 1B is a schematic diagram of a possible way for a hardware inference engine to construct a sub-network operator according to this embodiment. As shown in fig. 1B, the hardware inference engine initializes and constructs a sub-network operator according to the number of network layers included in the sub-network and the input and output of each network layer in the sub-network, then applies for a memory space for storing the configuration information and quantization weights of each network layer in the sub-network operator according to the number of network layers included in the sub-network, and copies the configuration information and quantization weights of each network layer in the sub-network into the memory space, in the execution order of the network layers in the sub-network, as the configuration information and quantization weights of the sub-network operator.
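The construction of a sub-network operator described for fig. 1B can be sketched as follows; this is a minimal, assumed illustration in which configuration information and quantization weights are treated as opaque byte strings, and the function and field names are not from the original disclosure.

```python
# Minimal sketch: size a memory space from the layers of the sub-network and
# copy each layer's configuration and quantization weights in execution order.
from typing import Dict, List

def build_subnetwork_operator(layer_configs: List[Dict[str, bytes]]) -> bytes:
    """layer_configs is ordered by the execution order of the layers in the
    sub-network; each entry carries serialized config and quantization weights."""
    # 1) "Apply for" a memory space sized from the layers' data.
    total_size = sum(len(c["config"]) + len(c["weights"]) for c in layer_configs)
    buffer = bytearray(total_size)
    # 2) Copy configuration and weights layer by layer, in execution order.
    offset = 0
    for c in layer_configs:
        for blob in (c["config"], c["weights"]):
            buffer[offset:offset + len(blob)] = blob
            offset += len(blob)
    return bytes(buffer)  # used as the sub-network operator's configuration
```

The copy is done in the execution order of the layers, matching the order in which the sub-network operator will run them.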
Alternatively, the execution order of each of the plurality of sub-networks can be determined according to the execution order of the plurality of network layers of the network model. Because the input and output memory types of different hardware inference engines differ, a data conversion node for data format conversion between the hardware inference engines can be inserted between two adjacently executed sub-networks according to the execution order of each sub-network and the corresponding hardware inference engine identifier, so that after the sub-network operator corresponding to the previous sub-network is executed, data that has undergone data conversion and has the correct memory type can be input into the next sub-network, thereby completing the model heterogeneous deployment of the network model.
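A minimal sketch of inserting data conversion nodes between adjacently executed sub-networks is shown below; the plan representation and the "convert" marker are illustrative assumptions, not the actual node format.

```python
# Minimal sketch: walk the sub-networks in execution order and insert a data
# conversion node wherever the hardware inference engine changes.
from typing import List, Tuple

def insert_conversion_nodes(ordered_subnets: List[Tuple[str, str]]) -> List[str]:
    """ordered_subnets: (sub-network name, engine identifier) in execution order.
    Returns the execution plan with conversion nodes inserted where the engine
    changes between two adjacently executed sub-networks."""
    plan: List[str] = []
    for i, (name, engine) in enumerate(ordered_subnets):
        plan.append(name)
        if i + 1 < len(ordered_subnets):
            next_engine = ordered_subnets[i + 1][1]
            if engine != next_engine:
                plan.append(f"convert:{engine}->{next_engine}")
    return plan

plan = insert_conversion_nodes([("a", "A"), ("b", "B"), ("c", "A")])
print(plan)  # -> ['a', 'convert:A->B', 'b', 'convert:B->A', 'c']
```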
In the application, a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model is determined; according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, the plurality of network layers are divided into a plurality of sub-networks, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to the at least one network layer in each sub-network is the same, when the sub-network comprises at least two network layers, direct or indirect input and output connection relation is formed between the at least two network layers, according to the hardware inference engine identification corresponding to each sub-network in the plurality of sub-networks, the configuration information of each sub-network in the plurality of sub-networks of the network model is sent to the corresponding hardware inference engine, heterogeneous deployment of the plurality of sub-networks of the network model is completed, the plurality of network layers of the network model can be respectively deployed to different hardware inference engines according to the divided sub-networks, and the inference efficiency of the network model is improved.
In one possible implementation, dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers comprises:
traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input;
if a second network layer with the same hardware inference engine identifier as the first network layer exists, dividing each second network layer with the same hardware inference engine identifier as the first network layer into sub-networks to which the first network layer belongs;
and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before dividing each second network layer having the same hardware inference engine identifier as the first network layer under the sub-network to which the first network layer belongs, the method may further include:
for each second network layer with the same hardware inference engine identification as the first network layer, if a third network layer with the second network layer as a descendant node exists, judging whether the third network layer is the descendant node of the first network layer;
and under the condition that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, dividing the third network layer into the sub-network to which the first network layer belongs.
Optionally, each network layer of the network model may be regarded as a node, and each sub-network obtained after division may be regarded as a sub-graph; each node of the network model is traversed in the execution order of the network layers of the network model, and nodes corresponding to the same hardware inference engine identifier are divided into one sub-graph, so as to obtain a plurality of sub-graphs (i.e., sub-networks) of the network model. To better illustrate how the network model is divided into a plurality of sub-networks, refer to fig. 3 as an example; fig. 3 is a schematic step diagram of a possible sub-network dividing manner provided in the present application. As shown in fig. 3, first, the currently traversed node is denoted src, and all successor nodes next of the node src are found (i.e., the nodes that take the output of the node src as input; that is, the node src serves as the first network layer and all successor nodes next serve as the second network layers corresponding to the first network layer). Whether each successor node next corresponds to the same hardware inference engine identifier as the node src is then judged one by one, and if the currently judged successor node next and the node src correspond to different hardware inference engine identifiers, the next successor node next is judged.
Then, if the current successor node next corresponds to the same hardware inference engine identifier as the node src, all predecessor nodes A of the successor node next other than the node src are found (here it can be considered that at least one output of each predecessor node corresponds to one input of the successor node next), and it is judged one by one whether a path exists from the node src to each predecessor node A (i.e., whether the predecessor node is a descendant node of the node src). If at least one predecessor node B among all predecessor nodes A is a descendant node of the node src (i.e., a path exists from the node src to the predecessor node B), it is judged whether all nodes on the path from the node src to the predecessor node B (including the predecessor node B) correspond to the same hardware inference engine identifier as the node src. If they correspond to the same hardware inference engine identifier, all nodes on the path from the node src to the predecessor node B, together with the successor node next, are divided into one sub-graph; if they do not correspond to the same hardware inference engine identifier, the successor node next and the node src cannot be divided into one sub-graph. Taking nodes C to D in the execution order of the network model as an example, if node C is directly connected to node D, or indirectly connected to it through a plurality of nodes, then node D is a descendant node of node C.
To better illustrate how sub-graphs are obtained by dividing in the above manner, take fig. 4 as an example; fig. 4 is a schematic diagram of a possible node dividing manner. As shown in fig. 4, assume that a circle represents a node corresponding to identifier a (i.e., one hardware inference engine identifier) and a rectangle represents a node corresponding to identifier b (i.e., another hardware inference engine identifier different from identifier a). The currently traversed node is src, and its successor nodes are next1, next2 and next3. The predecessor node of next1 is prev1, and the predecessor node of next2 is prev2. For src and next1: since prev1 is also a predecessor node of src, there is only one directly connected path from src to next1 in the graph, and src and next1 correspond to the same identifier a, so src and next1 can be divided into the same sub-graph. For src and next2: src and next2 correspond to the same identifier a, but because the predecessor node prev2 of next2 corresponds to identifier b (i.e., the nodes on the path from src to next2 do not all correspond to identifier a), src and next2 cannot be divided into the same sub-graph. For src and next3: they correspond to the same identifier a and there is only one unique directly connected path between them, so src and next3 can be divided into the same sub-graph. Finally, src is divided into the same sub-graph as next1 and next3.
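The traversal of fig. 3 and fig. 4 can be sketched as the following simplified routine; it merges a node and one of its successors into the same sub-graph only when every path between them visits only nodes with the same hardware inference engine identifier, which is the effect of the predecessor check described above. The graph encoding, the union-find bookkeeping and the assumption that node ids follow the execution order are illustrative choices, not the patented implementation.

```python
# Simplified sketch of the sub-graph division of fig. 3/fig. 4 (assumes a DAG).
from collections import defaultdict
from typing import Dict, List, Tuple

def divide_subgraphs(engine_of: Dict[int, str],
                     edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Return a mapping node -> sub-graph representative."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    def reachable(a: int, b: int) -> bool:
        stack, seen = [a], set()
        while stack:
            n = stack.pop()
            if n == b:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        return False

    def paths_all_same(src: int, dst: int) -> bool:
        """True if every path src -> dst visits only nodes with src's identifier."""
        if src == dst:
            return True
        for nxt in succ[src]:
            if reachable(nxt, dst):
                if engine_of[nxt] != engine_of[src] or not paths_all_same(nxt, dst):
                    return False
        return True

    parent = {n: n for n in engine_of}           # simple union-find
    def find(n: int) -> int:
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src in sorted(engine_of):                # execution order assumed = id order
        for nxt in succ[src]:
            if engine_of[nxt] == engine_of[src] and paths_all_same(src, nxt):
                parent[find(nxt)] = find(src)    # merge successor into src's sub-graph
    return {n: find(n) for n in engine_of}

# Fig. 4-style check: prev2 (identifier b) sits on a path from src to next2,
# so src and next2 stay in different sub-graphs, while next1 and next3 join src.
example = divide_subgraphs(
    {0: "a", 1: "a", 2: "b", 3: "a", 4: "a"},    # 0=src, 1=next1, 2=prev2, 3=next2, 4=next3
    [(0, 1), (0, 2), (2, 3), (0, 3), (0, 4)],
)
print(example[0] == example[1], example[0] == example[3], example[0] == example[4])
# -> True False True
```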
Fig. 5 is a flowchart of another network model heterogeneous deployment provided by the present application. As shown in fig. 5, the network model heterogeneous deployment process includes:
s501: each network layer of the network model is labeled.
Operator information for each network layer of the network model can be obtained by running each network layer of the network model under different hardware inference engines and different inference acceleration methods. According to the time consumption, memory arrangement and other information of the multiple operators in the operator information on different hardware inference engines, the operator achieving the best performance for the network layer is determined and the corresponding hardware inference engine identifier is marked for the network layer.
S502: and dividing to obtain a plurality of sub-networks according to the connection relation of each network layer and the mark of the hardware reasoning engine.
The sub-graph division for the currently traversed node can be obtained by determining, using for example the steps shown in fig. 3, whether each successor (second network layer) of the currently traversed node (first network layer) can be divided into the same sub-graph (sub-network) as that node. All nodes are traversed in the execution order of the network model, so the plurality of nodes (network layers) contained in the network model can be divided into a plurality of sub-graphs (sub-networks).
S503: each subnetwork is deployed to a corresponding hardware inference engine.
Each divided sub-network and the network configuration information corresponding to each sub-network are sent to the corresponding hardware inference engine, and the hardware inference engine corresponding to each sub-network calls the corresponding operators in its operator library according to the network configuration information and quantization weights of the sub-network to complete the operation of the sub-network. The network configuration information corresponding to each sub-network includes the number of network layers contained in the sub-network; the parameters, operation instructions, quantization parameters, input, output and quantization weight addresses of all the network layers contained in the sub-network; and information such as the quantization parameter arrangement and alignment of the sub-network.
S504: and (4) inserting conversion operators among different hardware reasoning engines to finish heterogeneous deployment of the network model.
Because the input and output memory types of different sub-networks may be different, the input and output data of different sub-networks are connected in series by judging the memory types and inserting data conversion nodes (i.e. conversion operators) to finally generate a complete heterogeneous deployment model.
Based on the foregoing network model heterogeneous deployment method, the present application provides a network model heterogeneous deployment device, and fig. 6 is a schematic structural diagram of a network model heterogeneous deployment device provided in an embodiment of the present application, where the apparatus includes:
a marking module 601, configured to determine a hardware inference engine identifier corresponding to each of multiple network layers of a network model;
a processing module 602, configured to divide the multiple network layers into multiple sub-networks according to the connection relationships of the multiple network layers in the network model and the hardware inference engine identifiers corresponding to each network layer in the multiple network layers, where each sub-network includes at least one network layer, and the hardware inference engine identifiers corresponding to at least one network layer included in each sub-network are the same, and when the sub-network includes at least two network layers, there is a direct or indirect input/output connection relationship between the at least two network layers;
the processing module 602 is further configured to send, according to the hardware inference engine identifier corresponding to each of the multiple subnetworks, the configuration information of each of the multiple subnetworks of the network model to the corresponding hardware inference engine, so that heterogeneous deployment of the multiple subnetworks of the network model to the corresponding hardware inference engine is achieved.
Further, when the processing module 602 divides the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, the processing module is specifically configured to: traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input; if a second network layer with the same hardware inference engine identification as the first network layer exists, dividing each second network layer with the same hardware inference engine identification as the first network layer under a sub-network to which the first network layer belongs; and if a second network layer with the same hardware inference engine identification as the first network layer does not exist, dividing the first network layer into a sub-network.
Further, before the processing module 602 divides each second network layer having the same hardware inference engine identifier as the first network layer into the sub-network to which the first network layer belongs, the processing module is further configured to: for each second network layer with the same hardware inference engine identifier as the first network layer, if a third network layer that takes the second network layer as a descendant node exists, judge whether the third network layer is a descendant node of the first network layer; and, in the case that the third network layer is a descendant node of the first network layer and corresponds to the same hardware inference engine identifier as the first network layer, divide the third network layer into the sub-network to which the first network layer belongs.
Further, when the marking module 601 determines the hardware inference engine identifier corresponding to each of the multiple network layers of the network model, it is specifically configured to: acquiring operator information of each network layer of a network model, wherein the operator information of the network layer comprises hardware inference engine information corresponding to each operator in a plurality of operators for realizing the network layer, and the hardware inference engine information corresponding to the operator comprises time consumption information of the operator on a hardware inference engine; and determining the hardware inference engine identifier corresponding to each network layer according to the operator information of each network layer.
Further, when the marking module 601 obtains operator information of each network layer of the network model, it is specifically configured to: and obtaining operator information of each network layer of the network model by operating a plurality of operators corresponding to each network layer of the network model on different hardware reasoning engines.
Further, when the processing module 602 deploys the plurality of sub-networks of the network model in a heterogeneous manner according to the hardware inference engine identifier corresponding to each sub-network of the plurality of sub-networks, the processing module is specifically configured to: determine an execution order of each of the plurality of sub-networks according to an execution order of the plurality of network layers of the network model; and insert, between two adjacently executed sub-networks running on different hardware inference engines, data conversion nodes for data format conversion between the hardware inference engines, according to the execution sequence of each sub-network in the plurality of sub-networks and the corresponding hardware inference engine identifiers, to complete the model heterogeneous deployment of the network model.
Fig. 7 is a schematic structural diagram of an electronic device. As shown in fig. 7, the electronic device includes: a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704.
The memory 703 has a computer program stored therein, which when executed by the processor 701 causes the processor 701 to implement the steps of any one of the above-described network model heterogeneous deployment methods.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface 702 is used for communication between the electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a central processing unit, a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
On the basis of the foregoing embodiments, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored; when the program runs on the electronic device, the electronic device is enabled to perform the steps of any one of the foregoing network model heterogeneous deployment methods.
The computer readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc., optical memory such as CDs, DVDs, BDs, HVDs, etc., and semiconductor memory such as ROMs, EPROMs, EEPROMs, non-volatile memories (NAND FLASH), solid State Disks (SSDs), etc.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A network model heterogeneous deployment method, applied to an electronic device, the method comprising the following steps:
determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of a network model;
dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation;
and sending the configuration information of each sub-network in the plurality of sub-networks of the network model to the corresponding hardware inference engine according to the hardware inference engine identification corresponding to each sub-network in the plurality of sub-networks, so that the plurality of sub-networks of the network model are heterogeneously deployed to the corresponding hardware inference engine.
2. The method of claim 1, wherein dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers comprises:
traversing the plurality of network layers according to the execution sequence of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer taking the output of the first network layer as the input;
if a second network layer with the same hardware inference engine identifier as the first network layer exists, dividing each such second network layer into the sub-network to which the first network layer belongs;
and if no second network layer with the same hardware inference engine identifier as the first network layer exists, dividing the first network layer into a sub-network of its own.
3. The method according to claim 2, wherein before dividing each second network layer having the same hardware inference engine identifier as the first network layer into the sub-network to which the first network layer belongs, the method further comprises:
for each second network layer with the same hardware inference engine identifier as the first network layer, if a third network layer that has the second network layer as a descendant node exists in the network model, judging whether the third network layer is a descendant node of the first network layer;
and in the case that the third network layer is a descendant node of the first network layer and has the same hardware inference engine identifier as the first network layer, dividing the third network layer into the sub-network to which the first network layer belongs.
4. The method of claim 1, wherein determining the hardware inference engine identity corresponding to each of the plurality of network layers of the network model comprises:
acquiring information of a target operator of each network layer in a plurality of network layers for realizing the network model, wherein the information of the target operator comprises time-consuming information of the target operator on a plurality of hardware inference engines;
and determining a hardware inference engine identifier corresponding to each network layer in the plurality of network layers of the network model according to the information of the target operator of each network layer in the plurality of network layers for realizing the network model.
5. The method of claim 4, wherein obtaining information of a target operator of each of a plurality of network layers implementing the network model comprises:
for each network layer of the network model, acquiring information of a plurality of operators for realizing the network layer, wherein the information of each operator comprises time-consuming information of the operator on a plurality of hardware inference engines;
and determining a target operator for realizing the network layer according to the information of the operators for realizing the network layer.
6. The method of claim 1, wherein heterogeneously deploying the plurality of sub-networks of the network model to the corresponding hardware inference engines according to the hardware inference engine identifier corresponding to each of the plurality of sub-networks comprises:
determining an execution order of each of the plurality of subnetworks according to an execution order of a plurality of network layers of the network model;
and inserting, according to the execution order of each of the plurality of sub-networks and the corresponding hardware inference engine identifiers, a data conversion node for converting data formats between the hardware inference engines between two adjacently executed sub-networks, so as to complete the heterogeneous deployment of the network model.
7. A network model heterogeneous deployment device, the device comprising:
the marking module is used for determining a hardware inference engine identifier corresponding to each network layer in a plurality of network layers of the network model;
the processing module is used for dividing the plurality of network layers into a plurality of sub-networks according to the connection relation of the plurality of network layers in the network model and the hardware inference engine identification corresponding to each network layer in the plurality of network layers, wherein each sub-network comprises at least one network layer, the hardware inference engine identification corresponding to at least one network layer in each sub-network is the same, and when the sub-network comprises at least two network layers, the at least two network layers have direct or indirect input and output connection relation;
the processing module is further configured to send the configuration information of each of the plurality of subnetworks of the network model to the corresponding hardware inference engine according to the hardware inference engine identifier corresponding to each of the plurality of subnetworks, so that the plurality of subnetworks of the network model are deployed to the corresponding hardware inference engine in a heterogeneous manner.
8. The device according to claim 7, wherein the processing module, when dividing the plurality of network layers into a plurality of sub-networks according to the connection relationships of the plurality of network layers in the network model and the hardware inference engine identifier corresponding to each of the plurality of network layers, is specifically configured to:
traversing the plurality of network layers according to the execution order of the plurality of network layers of the network model, taking the currently traversed network layer as a first network layer, and determining at least one second network layer that takes the output of the first network layer as input; if a second network layer with the same hardware inference engine identifier as the first network layer exists, dividing each such second network layer into the sub-network to which the first network layer belongs; and if no second network layer with the same hardware inference engine identifier as the first network layer exists, dividing the first network layer into a sub-network of its own.
9. An electronic device, characterized in that the electronic device comprises at least a processor and a memory, wherein the processor, when executing computer programs or instructions stored in the memory, implements the method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that it stores a computer program or instructions which, when executed by a processor, implement the method according to any one of claims 1 to 6.
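
The partitioning described in claims 1 to 3 can be illustrated with a short sketch. The Python snippet below is only a minimal sketch of that grouping rule, assuming the layers are given in execution order and that each layer carries the engine identifier determined for it; the names Layer and partition_subnetworks are invented for illustration and do not come from the patent, and the claim-3 refinement for a third network layer that feeds a second network layer is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    name: str
    engine_id: str                                   # hardware inference engine identifier for this layer
    inputs: List[str] = field(default_factory=list)  # names of layers whose output this layer consumes

def partition_subnetworks(layers: List[Layer]) -> List[List[str]]:
    """Group layers into sub-networks: layers in one sub-network share the same engine
    identifier and are connected through direct or indirect input/output edges."""
    subnet_of: Dict[str, int] = {}   # layer name -> index of the sub-network it belongs to
    subnets: List[List[str]] = []
    for layer in layers:             # traverse in execution order; the current layer is the "first network layer"
        if layer.name not in subnet_of:          # no same-engine predecessor claimed it: start a new sub-network
            subnet_of[layer.name] = len(subnets)
            subnets.append([layer.name])
        current = subnet_of[layer.name]
        # "second network layers": layers that take the current layer's output as input
        for succ in (l for l in layers if layer.name in l.inputs):
            if succ.engine_id == layer.engine_id and succ.name not in subnet_of:
                subnet_of[succ.name] = current    # same engine identifier: merge into the current sub-network
                subnets[current].append(succ.name)
    return subnets

# Hypothetical four-layer model where one layer falls back to the CPU engine:
layers = [
    Layer("conv1", "npu"),
    Layer("relu1", "npu", inputs=["conv1"]),
    Layer("custom_op", "cpu", inputs=["relu1"]),
    Layer("conv2", "npu", inputs=["custom_op"]),
]
print(partition_subnetworks(layers))
# [['conv1', 'relu1'], ['custom_op'], ['conv2']]
```

Claims 4 to 6 further select, for each layer, the engine on which its target operator is least time-consuming, and insert data conversion nodes between adjacently executed sub-networks that land on different engines. The following is a small sketch of that idea under the assumption that per-operator timing data is available as a mapping from engine identifier to measured latency; select_engine, insert_converters and the timing figures are hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple

def select_engine(op_latency: Dict[str, float]) -> str:
    """Pick the hardware inference engine identifier whose measured latency for
    the layer's target operator is smallest (claims 4-5)."""
    return min(op_latency, key=op_latency.get)

def insert_converters(subnet_engines: List[str]) -> List[Tuple[int, int]]:
    """Return positions of adjacent sub-networks (in execution order) that run on different
    engines and therefore need a data conversion node between them (claim 6)."""
    return [(i, i + 1)
            for i in range(len(subnet_engines) - 1)
            if subnet_engines[i] != subnet_engines[i + 1]]

# Made-up timing data and a made-up sub-network execution order:
print(select_engine({"npu": 0.8, "gpu": 1.2, "cpu": 5.0}))   # -> 'npu'
print(insert_converters(["npu", "npu", "cpu", "npu"]))        # -> [(1, 2), (2, 3)]
```
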
CN202211707960.2A 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium Pending CN115809684A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211707960.2A CN115809684A (en) 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211707960.2A CN115809684A (en) 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115809684A true CN115809684A (en) 2023-03-17

Family

ID=85486987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211707960.2A Pending CN115809684A (en) 2022-12-29 2022-12-29 Network model heterogeneous deployment method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115809684A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629308A (en) * 2023-07-24 2023-08-22 科大讯飞股份有限公司 Neural network model reasoning method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20200249998A1 (en) Scheduling computation graph heterogeneous computer system
US11126493B2 (en) Methods and systems for autonomous cloud application operations
GB2530516A (en) Intelligent Software Test Augmenting
CN114399019A (en) Neural network compiling method, system, computer device and storage medium
KR20210108319A (en) Method and system for automatic classification based on machine learning
CN115809684A (en) Network model heterogeneous deployment method, device, equipment and storage medium
CN116210010A (en) Method and system for evaluating consistency of engineering system
US20240160977A1 (en) Quantum circuit compilation method, device, compilation framework and quantum operating system
Kusmenko et al. On the engineering of AI-powered systems
JP6888737B2 (en) Learning devices, learning methods, and programs
US20200104123A1 (en) Intelligent agent framework
CN112099882B (en) Service processing method, device and equipment
WO2020169182A1 (en) Method and apparatus for allocating tasks
CN113283575A (en) Processor for reconstructing artificial neural network, operation method thereof and electrical equipment
CN115461718A (en) Memory allocation in neural networks
CN112990461A (en) Method and device for constructing neural network model, computer equipment and storage medium
CN116933841A (en) Operator fusion method and device, electronic equipment and computer readable medium
AU2017101008A4 (en) Systems and Methods for Reducing CPU Time to Compute State Space of Resource Allocation System
KR102376527B1 (en) Method and computer program of processing program for single accelerator using dnn framework on plural accelerators
Saribatur et al. Reactive policies with planning for action languages
CN113705813B (en) Mutation rule supplementing method and device based on genetic algorithm
CN113811897B (en) Inference method and apparatus of neural network model, computer device, and storage medium
CN113821251A (en) Code optimization method, device, equipment and storage medium based on artificial intelligence
JP7424373B2 (en) Analytical equipment, analytical methods and analytical programs
CN110097183B (en) Information processing method and information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination