CN114896950A - Model conversion method, model conversion device, and storage medium - Google Patents


Info

Publication number
CN114896950A
Authority
CN
China
Prior art keywords
node
data arrangement
conversion
computing
computing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210815276.XA
Other languages
Chinese (zh)
Other versions
CN114896950B (en)
Inventor
韩建强
刘德龙
陈波扬
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202210815276.XA
Publication of CN114896950A
Application granted
Publication of CN114896950B
Legal status: Active
Anticipated expiration

Classifications

    • G06F40/151 — Handling natural language data; text processing; use of codes for handling textual entities; transformation
    • G06F40/103 — Handling natural language data; text processing; formatting, i.e. changing of presentation of documents
    • G06N3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods


Abstract

The application discloses a model conversion method, a model conversion device, and a storage medium. The model conversion method comprises: determining an initial data arrangement format for each computing node of a network model on a target platform, based on the shapes of the data blocks at the input end and output end of each computing node, and recording the first elapsed time spent on this determination; judging in sequence whether the initial data arrangement formats of each pair of adjacently connected computing nodes are consistent, and if not, adding a conversion operator between the two computing nodes, using it to convert between their initial data arrangement formats, and recording the conversion elapsed time of the operator; and finally, determining the connection structure of all computing nodes in the network model based on the correlation between the first elapsed time and the conversion elapsed time. In this way, the model conversion method provided by the application obtains a model whose performance is optimal for the target platform by confirming the connection structure of all computing nodes in the network model.

Description

Model conversion method, model conversion device, and storage medium
Technical Field
The present application relates to the technical field of computer science, and in particular, to a model transformation method, a model transformation device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, deep learning neural networks are applied more and more widely. The application of a neural network model is generally divided into a training stage and a deployment stage. In the training stage, a training framework (such as Caffe or TensorFlow) is used to learn from sample data and update the weight data. In the deployment stage, the original model is generally converted into a model that the target platform can recognize; after the target platform reads the model's information, it constructs the data structures needed for inference, then performs network inference and obtains the inference result for further processing.
In the deployment stage, owing to limitations of the hardware acceleration units and CPU cores of a given architecture, the data arrangement formats supported by a model differ from platform to platform, and even when multiple arrangement formats are supported, their performance differs as well.
Disclosure of Invention
The application provides a model conversion method, a model conversion device, and a storage medium, which obtain a model with optimal network performance for a target platform by confirming the connection structure of all computing nodes in a network model, so that the overall network performance is optimal when the network model is deployed on the target platform.
In order to solve the above technical problem, one technical solution adopted by the present application is to provide a model conversion method. The method comprises: obtaining the shape of the data block at the input end and the output end of each computing node in a network model; determining an initial data arrangement format of each computing node on the target platform based on those shapes, and determining the first elapsed time each computing node spends obtaining its initial data arrangement format; judging in sequence whether the initial data arrangement formats of two adjacently connected computing nodes are consistent, and if not, adding a conversion operator between the two computing nodes, converting between their initial data arrangement formats with the conversion operator, and obtaining the conversion elapsed time of the conversion operator; and finally, determining the connection structure of all computing nodes in the network model based on the correlation between the first elapsed time and the conversion elapsed time.
Determining an initial data arrangement format of each computing node in a target platform based on the shape of the data block at the input end and the output end of each computing node, wherein the determining comprises: acquiring all data arrangement formats to be tested on a target platform, and testing input data blocks at the input end and output data blocks at the output end of each computing node based on all the data arrangement formats to be tested to obtain the initial data arrangement format of each computing node supported by the target platform.
Testing the input data block at the input end and the output data block at the output end of each computing node based on all the data arrangement formats to be tested, to obtain the initial data arrangement format of each computing node, comprises: performing an operation with each calculation operator; during the operation, arranging the input data block and the output data block of each computing node according to each data arrangement format to be tested, to obtain the real-time performance parameters corresponding to each format; and determining the initial data arrangement format of each computing node's input and output data blocks based on those real-time performance parameters.
Wherein the real-time performance parameters comprise at least one of response time and running speed; determining the initial data arrangement format of the input data block at the input end and the output data block at the output end of each computing node based on the real-time performance parameters corresponding to each to-be-tested data arrangement format, which comprises the following steps: and determining the data arrangement format to be tested with the shortest corresponding response time or/and the fastest running speed as the initial data arrangement format.
The output end of the former computing node of the two adjacent computing nodes is connected with the input end of the conversion operator, and the input end of the latter computing node of the two adjacent computing nodes is connected with the output end of the conversion operator.
The method for converting the initial data arrangement formats of two adjacent computing nodes by using the conversion operator and acquiring the conversion time consumption of the conversion operator comprises the following steps: acquiring an initial data arrangement format corresponding to the output end of a previous calculation node of two adjacent connected calculation nodes as a first data arrangement format; acquiring an initial data arrangement format corresponding to the input end of a subsequent computing node of two adjacent connected computing nodes as a second data arrangement format; and converting the data block output by the output end of the previous calculation node from the first data arrangement format to a second data arrangement format by using a conversion operator, inputting the data block to the next calculation node, and acquiring the conversion time consumption of the conversion operator.
Determining the connection structure of all the computing nodes in the network model based on the correlation between the first elapsed time and the conversion elapsed time comprises: summing the first elapsed time and the conversion elapsed time to obtain a second elapsed time; selecting either of the two adjacently connected computing nodes as a target computing node, and combining the target computing node with the conversion operator to obtain a combined node; determining a third elapsed time for the combined node to obtain its initial data arrangement format on the target platform; and determining the connection structure of all the computing nodes in the network model based on the second elapsed time and the third elapsed time.
Determining the connection structure of all the computing nodes in the network model based on the second elapsed time and the third elapsed time comprises: comparing the second elapsed time with the third elapsed time, and if the third elapsed time is less than the second elapsed time, replacing the target computing node with the combined node and connecting it to the other of the two computing nodes, thereby determining the connection structure of all computing nodes in the network model.
Selecting any one of two adjacent connected computing nodes as a target computing node, and combining the target computing node and a conversion operator to obtain a combined node, wherein the method comprises the following steps: taking the input end of the target calculation node as the input end of the combination node, and taking the output end of the conversion operator as the output end of the combination node; or
Taking the input end of the conversion operator as the input end of the combination node, and taking the output end of the target calculation node as the output end of the combination node; or
Any target calculation node of two adjacent connected calculation nodes is combined with two conversion operators, the input end of one conversion operator in the two conversion operators is used as the input end of the combination node, and the output end of the other conversion operator is used as the output end of the combination node.
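The cost comparison described above can be sketched as follows. This is a minimal illustration with names of our own choosing, not code from the patent: the second elapsed time is the sum of the first elapsed time and the conversion elapsed time, and the combined node replaces the target node only when its third elapsed time is smaller.

```python
def choose_structure(first_elapsed, conversion_elapsed, third_elapsed):
    """Return (use_combined_node, second_elapsed).

    first_elapsed: time to determine the target node's initial format.
    conversion_elapsed: time the conversion operator spends converting.
    third_elapsed: time for the combined node to obtain its initial
    data arrangement format on the target platform.
    """
    second_elapsed = first_elapsed + conversion_elapsed
    # Replace the target node with the combined node only if doing so is
    # cheaper than keeping the node plus a separate conversion operator.
    return third_elapsed < second_elapsed, second_elapsed
```

For example, with a first elapsed time of 5, a conversion elapsed time of 3, and a third elapsed time of 6, the combined node wins (6 < 8); with a third elapsed time of 9, the original target node and conversion operator are kept.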
In order to solve the above technical problem, another technical solution adopted by the present application is: the model conversion device is applied to a target platform and comprises a memory and a processor, wherein the memory is used for storing program data, and the processor is used for executing the program data to realize the conversion method of the network model.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer-readable storage medium having stored therein program data for executing the above-mentioned method of converting a network model when executed by a processor.
The beneficial effect of this application is: different from the prior art, the method for converting the network model provided by the application can determine the initial data arrangement format of each computing node in the target platform and determine the first time consumed for obtaining the initial data arrangement format by each computing node based on the shape of the data block of the input end and the output end of each computing node acquired from the network model; sequentially judging whether the initial data arrangement formats of two adjacent connected computing nodes are consistent, if not, adding a conversion operator between the two adjacent computing nodes, converting the initial data arrangement formats of the two adjacent computing nodes by using the conversion operator, and acquiring the conversion time consumption of the conversion operator; and finally, determining the connection structure of all the computing nodes in the network model based on the correlation between the first time consumption and the conversion time consumption. By the method, the connection structures of all the computing nodes in the network model are determined, so that the network model with the optimal performance data arrangement format can be realized on the target platform, and the performance of the whole network model can be improved when the target platform applies the network model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a schematic flow chart diagram of a first embodiment of a model transformation method provided herein;
FIG. 2 is a schematic flow chart diagram of a second embodiment of a model transformation method provided by the present application;
FIG. 3 is a schematic flow chart diagram illustrating an embodiment of step 23 provided herein;
FIG. 4 is a schematic flow chart diagram of a third embodiment of a model transformation method provided by the present application;
fig. 5 is a schematic structural diagram of a first embodiment of a combining node provided in the present application;
fig. 6 is a schematic structural diagram of a second embodiment of a combining node provided in the present application;
fig. 7 is a schematic structural diagram of a third embodiment of a combining node provided in the present application;
FIG. 8 is a schematic diagram of an embodiment of a model transformation apparatus provided herein;
FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of a model transformation method provided in the present application, the method including:
step 11: and acquiring the shape of the data block at the input end and the output end of each computing node in the network model.
Optionally, the network model includes an image detection model, a text recognition model, a video detection model, and the like.
Specifically, the shape of the data blocks corresponding to the input end and the output end of a computing node is related to the network model.
A data block may be the data output at the output end of a computing node after the node performs an operation, or the data input at the input end of a computing node.
In an embodiment, the network model is an image detection model, and the shape of a data block is determined by at least one 4-dimensional tensor, the four dimensions being N (batch size, number of samples), C (channels), H (height), and W (width). Note that a tensor comprises several dimensions, each dimension has scale information (size), and the scale information in all dimensions forms the shape of each computing node. For example, if the 4-dimensional tensor data of a certain computing node has N = 8, C = 64, H = 112, and W = 112, then the shape of that computing node is 8 × 64 × 112 × 112.
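As an illustration of the shape example above, the following sketch uses NumPy; the library choice and variable names are ours, not the patent's.

```python
import numpy as np

# Hypothetical 4-dimensional data block of a computing node in an image
# detection model, with N (batch) = 8, C (channels) = 64, H = W = 112,
# matching the example in the text.
block = np.zeros((8, 64, 112, 112), dtype=np.float32)

# The shape of the node is the tuple of scale information in all dimensions.
shape = block.shape          # (8, 64, 112, 112)
num_elements = block.size    # 8 * 64 * 112 * 112
```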
In another embodiment, the model is a text recognition model, and the shape of the data block is determined by at least one 4-dimensional tensor representing the text content.
In another embodiment, the model is a video detection model, and the shape of the data block is determined by at least one 5-dimensional tensor, the five dimensions being samples, frames, height, width, and color depth (color_depth).
Specifically, the computing node has corresponding computing functions, such as convolution operation, pooling, deconvolution, and the like. Of course, the computational functionality of a compute node is specifically determined by its specific distribution in the model.
Step 12: determining an initial data arrangement format of each computing node in the target platform based on the shape of the data block at the input end and the output end of each computing node, and determining the first time consumed for each computing node to obtain the initial data arrangement format.
The data blocks corresponding to the input end and the output end of each computing node have data arrangement formats, the data blocks of the input end and the output end of each computing node correspond to at least one data arrangement format, and the type of the data arrangement format is related to the shape of the data block.
For a computing node, the shape of its corresponding data block is fixed, but the block may be stored on the target platform in different ways, i.e. using different data arrangement formats.
In an embodiment, the network model is an image detection model, the data arrangement format of each computing node's data block is determined by N (batch size, number of samples), C (channels), H (height), and W (width), and the data arrangement formats at least include NCHW, NHWC, and CHWN.
For example, if the data arrangement format of the data block of the computing node is NCHW, the following is satisfied: firstly, dividing according to the batch, continuously storing data in each batch, and sequentially storing according to the sequence of the batch; the interior of each batch is divided according to channels, and data in each channel are stored continuously and sequentially according to the channel sequence; dividing the interior of each channel according to rows, continuously storing data in each row, and sequentially storing the data according to the row sequence; the data within each row is stored sequentially in column order.
For example, if the data arrangement format of the data block of the computing node is NHWC, then: firstly, dividing according to the batch, continuously storing data in each batch, and sequentially storing according to the sequence of the batch; the interior of each batch is divided according to rows, and data in each row are continuously stored and are sequentially stored according to the row sequence; the data in each row are divided according to columns, and the data in each column are stored continuously and sequentially according to the column sequence; the data in each column is stored in sequence according to the channel order.
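The two storage orders described above imply different flat memory offsets for the same element. The sketch below is illustrative code of our own, not part of the patent:

```python
def offset_nchw(n, c, h, w, C, H, W):
    # Batches outermost, then channels, then rows, then columns,
    # as in the NCHW description above.
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, c, h, w, C, H, W):
    # Batches outermost, then rows, then columns, then channels,
    # as in the NHWC description above.
    return ((n * H + h) * W + w) * C + c
```

For instance, with C = 2, H = 2, W = 2, the element at (n, c, h, w) = (0, 1, 0, 0) sits at flat offset 4 under NCHW but at offset 1 under NHWC, which is why adjacent nodes expecting different formats cannot share a block directly.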
It is noted that each tensor can also be grouped: the data type of a 4-dimensional tensor can be changed from 4-dimensional to 5-dimensional, and so on. For example, the C (channel) dimension may be grouped so that each group contains 4 channels; if C is not a multiple of 4, C is first aligned upward to the next multiple of 4. The 4-dimensional data is then regarded as 5-dimensional data with dimensions N, C1, C2, H, W, where C2 = 4 and C1 = div_up(C, 4), div_up denoting division with the result rounded up. For an arbitrary channel index c, c2 = c mod 4 and c1 = (c − c2) / 4. Arranging these 5 dimensions arbitrarily yields a data arrangement format that groups C, such as NC1HWC2. In other embodiments, tensors of other data types may also be grouped, without limitation.
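The channel-grouping rule just described can be sketched in NumPy as follows. Function names, the padding choice, and the use of NumPy are our own assumptions; the patent only specifies the index arithmetic.

```python
import numpy as np

def div_up(a, b):
    # Division with the result rounded up, as in the grouping rule above.
    return -(-a // b)

def nchw_to_nc1hwc2(x, group=4):
    """Regroup an NCHW block into NC1HWC2 with C2 = `group`.

    Channels are zero-padded up to a multiple of `group` first, matching
    the upward alignment of C described in the text.
    """
    n, c, h, w = x.shape
    c1 = div_up(c, group)
    pad = c1 * group - c
    x = np.pad(x, ((0, 0), (0, pad), (0, 0), (0, 0)))
    # Split C into (C1, C2), then move C2 innermost: N, C1, H, W, C2.
    x = x.reshape(n, c1, group, h, w)
    return x.transpose(0, 1, 3, 4, 2)
```

With this layout, channel index c lands at c1 = c // 4 in the outer grouped axis and c2 = c mod 4 in the innermost axis.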
Therefore, the initial data arrangement format of each computation node can be determined in the target platform based on the shape of the data block at the input end and the output end of each computation node, and the first time consumed for each computation node to obtain the initial data arrangement format can be determined.
In some embodiments, the network model may be run on a target platform, and then each computing node performs a related operation to determine a data arrangement format of data blocks corresponding to an input end and an output end in an operation process.
For example, a data arrangement format may be assigned to each computing node, and a related operation then performed to decide, according to the operation efficiency, whether the assigned format can be taken as the initial data arrangement format. If not, another data arrangement format is assigned to the computing node, and the decision is made again according to the operation efficiency.
By analogy, each compute node determines the best data arrangement format and determines the best data arrangement format as the initial data arrangement format. Further, the time consumed by each computing node to obtain the initial data arrangement format may be taken as the first elapsed time.
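The per-node search described in the paragraphs above can be sketched as follows. This is a minimal illustration; `run_node` and the timing scheme are assumptions of ours, not the patent's implementation.

```python
import time

def pick_initial_format(run_node, candidate_formats):
    """Pick the fastest data arrangement format for one computing node.

    run_node(fmt) is assumed to execute the node's operation with its
    input/output blocks arranged in format fmt (e.g. "NCHW", "NHWC").
    Returns (best_format, first_elapsed_time): the time spent on the
    whole search is the "first elapsed time" of the text.
    """
    start = time.perf_counter()
    timings = {}
    for fmt in candidate_formats:
        t0 = time.perf_counter()
        run_node(fmt)
        timings[fmt] = time.perf_counter() - t0
    best = min(timings, key=timings.get)
    first_elapsed = time.perf_counter() - start
    return best, first_elapsed
```

In a real deployment `run_node` would perform the node's forward inference on the target platform; here it is only a placeholder for that measurement.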
Step 13: and sequentially judging whether the initial data arrangement formats of the two adjacent connected computing nodes are consistent.
Through step 12, the initial data arrangement format is determined from the data block of each computing node individually, so the initial data arrangement formats of adjacently connected computing nodes may be inconsistent. It is therefore necessary to judge whether the initial data arrangement formats of two adjacently connected computing nodes are consistent; if not, step 14 is executed.
In some embodiments, compute node A and compute node B are connected adjacently with the output of compute node A connected to the input of compute node B. But the initial data arrangement format of the output data block at the output end of the computation node a is not consistent with the initial arrangement format corresponding to the input end of the computation node B. For example, the initial data arrangement format corresponding to the output end of the computing node a is NCHW, and the initial arrangement format corresponding to the input end of the computing node B is NHWC.
Step 14: and adding a conversion operator between two adjacent computing nodes.
In some embodiments, the conversion operator may be a function that implements different data arrangement format conversions.
In some embodiments, the conversion operator may be a conversion node, that is, a conversion node is added between two adjacent computation nodes, and an input end of the conversion node is connected to an output end of a previous computation node, and an output end of the conversion node is connected to an input end of a next computation node.
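Treating the conversion operator as a conversion node, the splicing step can be sketched as follows. The list/dict representation of the graph is an assumption of ours; the patent does not prescribe a data structure.

```python
def insert_conversion_nodes(nodes, fmt_out, fmt_in):
    """Walk a chain of adjacently connected computing nodes in order.

    fmt_out[node] / fmt_in[node] give the initial data arrangement
    format at a node's output / input end. Where the formats of two
    adjacent nodes disagree, a conversion node is spliced in whose
    input matches the previous node's output format and whose output
    matches the next node's input format.
    """
    structure = [nodes[0]]
    for prev, nxt in zip(nodes, nodes[1:]):
        if fmt_out[prev] != fmt_in[nxt]:
            structure.append(("convert", fmt_out[prev], fmt_in[nxt]))
        structure.append(nxt)
    return structure
```

For a chain A → B where A emits NCHW and B consumes NHWC, the result is A → convert(NCHW, NHWC) → B, matching the conversion-node wiring described above.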
Step 15: and converting the initial data arrangement formats of the two adjacent computing nodes by using a conversion operator, and acquiring the conversion time consumption of the conversion operator.
In some embodiments, the conversion operator is a conversion node, the data arrangement format corresponding to the input end of the conversion node is the same as the data arrangement format corresponding to the output end of the previous computation node, and the data arrangement format corresponding to the output end of the conversion node is the same as the data arrangement format corresponding to the input end of the next computation node. After the output end of the previous computing node outputs the data block, the data block passes through the conversion node and is converted into the data arrangement format corresponding to the input end of the next computing node from the data arrangement format corresponding to the output end of the previous computing node. The time consumed by the conversion process may be taken as the conversion elapsed time.
In some embodiments, the conversion operator may be a function that implements different data arrangement format conversions. And converting the data arrangement format of the data block from the data arrangement format corresponding to the output end of the previous calculation node to the data arrangement format corresponding to the input end of the next calculation node by using a conversion operator. The time consumed by the conversion process may be taken as the conversion time.
In one embodiment, the conversion operator may be a function that implements the conversion of different data arrangement formats. And converting the data arrangement format corresponding to the input end of the next calculation node into the data arrangement format corresponding to the output end of the previous calculation node by using a conversion operator. The time consumed by the conversion process may be taken as the conversion elapsed time.
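For the NCHW to NHWC case from the earlier example, such a conversion operator can be sketched as an axis permutation. This is illustrative NumPy code of our own; the patent does not prescribe an implementation.

```python
import time
import numpy as np

def convert_nchw_to_nhwc(block):
    """Convert a data block from NCHW to NHWC and time the conversion.

    The elapsed time returned here corresponds to the "conversion
    elapsed time" of the conversion operator in the text.
    """
    t0 = time.perf_counter()
    # Permute axes N,C,H,W -> N,H,W,C and materialize the new layout
    # in memory (the permutation alone would only change the view).
    out = np.ascontiguousarray(block.transpose(0, 2, 3, 1))
    elapsed = time.perf_counter() - t0
    return out, elapsed
```

For a block of shape (8, 64, 112, 112), the converted block has shape (8, 112, 112, 64), which the next computing node can then consume directly.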
Step 16: and determining the connection structure of all the computing nodes in the network model based on the correlation between the first time consumption and the conversion time consumption.
In some embodiments, if the conversion time consumption is less than the first time consumption, indicating that the optimal initial data arrangement format is determined when the initial data arrangement format of each computing node is determined in the target platform.
In some embodiments, if the conversion time consumption is greater than the first time consumption, it is stated that the optimal initial data arrangement format has not been actually determined when the initial data arrangement format of each computing node is determined in the target platform. Thus, the above embodiments may be repeated to determine the optimal initial data arrangement format.
Based on this, the connection structure of all the compute nodes in the network model can be determined. For example, when the conversion operator is a conversion node, after the conversion node is added, the adjacent computing nodes are changed, and therefore, the connection structure at this time needs to be used as the final connection structure of the adjacent computing nodes in the network model.
Different from the prior art, the model conversion method provided by the application judges whether a conversion operator needs to be inserted between two adjacent computing nodes by determining the initial data arrangement format of the data block at the input end and the output end of each computing node and determining the time consumption of the process, if so, the conversion operator is inserted to convert the initial data arrangement formats of the two adjacent computing nodes and obtain the time consumption for conversion, and finally, the connection structures of all computing nodes of the whole network are determined according to the correlation of the two time consumptions, so that a network model with the optimal data arrangement format can be realized on a target platform, and the performance of the whole network model can be improved when the network model is applied to the target platform.
Referring to fig. 2, fig. 2 is a schematic flow chart of a second embodiment of a model transformation method provided in the present application, the method including:
step 21: and acquiring the shape of the data block at the input end and the output end of each computing node in the network model.
Each data block is a tensor of several dimensions: a vector has one dimension and a matrix has two. Each dimension has its scale information (size), and the scale information in all dimensions constitutes the shape information of the data block. For example, the inputs and outputs of a 2D convolution are all four-dimensional tensors, the four dimensions being N (batch size), C (channel), H (height), and W (width); if N = 8, C = 64, H = 112, and W = 112, the shape of the data block is 8 × 64 × 112 × 112.
Step 22: and acquiring all data to be tested arrangement formats on the target platform.
Alternatively, the target platform may be a mobile phone, a tablet computer, a smart wearable device (e.g., smart glasses, smart watch, bluetooth headset), and the like.
A data block with a definite shape can be stored in memory in different ways; these storage modes are called data arrangement (layout) formats.
In one embodiment, the network model is an image detection model, the shape of the data block of the computation node is determined by at least one 4-dimensional tensor, where the 4-dimensional tensor is N, C, H, W, and the data layout format to be tested at least includes NHWC, CHWN, and NCHW.
Step 23: and testing the input data block at the input end and the output data block at the output end of each computing node based on all the data arrangement formats to be tested to obtain the initial data arrangement format of each computing node, and determining the first time consumption for obtaining the initial data arrangement format by each computing node.
In some embodiments, a traversal mode may be adopted: the input data block at the input end and the output data block at the output end of each computing node are arranged according to each data arrangement format to be tested in turn, and the operation is performed. For example, if the target platform supports three data arrangement formats to be tested, the input data block and the output data block of every computing node need to be tested with all three formats.
In the process, the corresponding optimal data arrangement format to be tested of each computing node is determined, and the optimal data arrangement format to be tested is used as an initial data arrangement format.
It can be understood that, in the same test, different computing nodes may adopt different data arrangement formats to be tested. The number of tests required can therefore be determined from the number of computing nodes and the number of data arrangement formats to be tested. The time of each test is counted and taken as a first time consumption, so there are a plurality of first time consumptions, from which the smallest can be determined. The data arrangement format to be tested of the computing node corresponding to the smallest first time consumption is taken as the initial data arrangement format.
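The traversal-and-timing procedure can be sketched as follows; `run_node`, `pick_initial_format`, and the dummy workload are hypothetical stand-ins for the patent's forward-inference test, not its implementation:

```python
import time
import numpy as np

def run_node(block):
    # Stand-in for one compute node's forward-inference operation (hypothetical).
    return block * 2.0 + 1.0

def pick_initial_format(nchw_block, formats=("NCHW", "NHWC", "CHWN")):
    """Arrange the block in every candidate layout, time the run, keep the fastest."""
    perm = {"NCHW": (0, 1, 2, 3), "NHWC": (0, 2, 3, 1), "CHWN": (1, 2, 3, 0)}
    timings = {}
    for fmt in formats:
        arranged = np.ascontiguousarray(nchw_block.transpose(perm[fmt]))
        start = time.perf_counter()
        run_node(arranged)
        timings[fmt] = time.perf_counter() - start  # one "first time consumption" per test
    best = min(timings, key=timings.get)            # smallest first time consumption
    return best, timings[best]

fmt, elapsed = pick_initial_format(np.zeros((1, 8, 16, 16), dtype=np.float32))
```

On real hardware the winner depends on cache behavior and vectorization, which is exactly why the patent measures rather than assumes a best format.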
During the test process, forward reasoning operation may be performed between the computing nodes, and the operation process of course also includes reading and storing the corresponding data blocks.
In some embodiments, referring to fig. 3, step 23 may be the following flow:
step 31: and performing operation by using each calculation operator.
Wherein, each calculation operator can be used for carrying out forward reasoning operation.
Step 32: and in the operation process, performing data arrangement on the input data block at the input end and the output data block at the output end of each computing node according to the data arrangement formats to be tested to obtain the real-time performance parameters corresponding to the data arrangement formats to be tested.
For example, during operation, computing node A arranges the input data block at its input end according to data arrangement format a to be tested; the block participates in the operation as input data, and computing node A then outputs a data block at its output end, which may be arranged according to data arrangement format b to be tested. In this way, the real-time performance parameters of data blocks under different data arrangement formats to be tested can be determined during operation.
For example, the real-time performance parameter includes at least one of response time and running speed. The response time may be the time the target platform processor needs to complete the forward inference operation after the data are arranged in the data arrangement format to be tested. The running speed may refer to how quickly a data block, once input to the computing node, is computed and output; it may also refer to the running speed of the hardware during this computation, such as the processor frequency.
Step 33: and determining the initial data arrangement format of the input data block at the input end and the output data block at the output end of each computing node based on the real-time performance parameters corresponding to the data arrangement formats to be tested.
In some embodiments, the data arrangement format to be tested having the shortest corresponding response time may be determined as the initial data arrangement format.
In some embodiments, the data arrangement format to be tested, which corresponds to the fastest running speed, may be determined as the initial data arrangement format.
In some embodiments, the data arrangement format to be tested with the shortest response time and the fastest running speed may be determined as the initial data arrangement format.
Step 24: and sequentially judging whether the initial data arrangement formats of the two adjacent connected computing nodes are consistent.
If not, go to step 25. If the initial data arrangement formats of the two adjacent connected computing nodes are consistent, no processing is needed between them.
Step 25: and adding a conversion operator between two adjacent computing nodes.
The conversion operator is a function for realizing conversion among different data arrangement formats.
Step 26: and converting the initial data arrangement formats of the two adjacent computing nodes by using a conversion operator, and acquiring the conversion time consumption of the conversion operator.
In some embodiments, the conversion process is as follows: the data arrangement format of the data block at the output end of the previous computing node of the two adjacent computing nodes is acquired as a first data arrangement format; the data arrangement format of the data block at the input end of the subsequent computing node is acquired as a second data arrangement format; the conversion operator then converts the data block at the output end of the previous computing node from the first data arrangement format to the second data arrangement format, and the data block is input to the subsequent computing node in the second data arrangement format.
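The conversion step above amounts to an axis permutation between the first and second formats. A minimal sketch, treating each format as an axis-order string; `make_conversion_operator` is a hypothetical helper name, not the patent's function:

```python
import numpy as np

def make_conversion_operator(src_fmt, dst_fmt):
    """Return a function converting a data block from src_fmt to dst_fmt.

    Formats are axis-order strings such as "NCHW" or "NHWC": for each axis of
    the destination order, find where it sits in the source order.
    """
    perm = tuple(src_fmt.index(axis) for axis in dst_fmt)
    return lambda block: np.ascontiguousarray(block.transpose(perm))

# First format NCHW at the previous node's output, second format NHWC at the
# subsequent node's input: insert the operator between them.
to_nhwc = make_conversion_operator("NCHW", "NHWC")
out = to_nhwc(np.zeros((2, 3, 4, 5), dtype=np.float32))
print(out.shape)  # (2, 4, 5, 3)
```

Timing a call to such an operator is one way to obtain the conversion time consumption the method compares against.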
Step 27: and determining the connection structure of all the computing nodes in the network model based on the correlation between the first time consumption and the conversion time consumption.
Steps 24 to 27 may have the same or similar technical solutions as those in the above embodiments, and are not described herein again.
Different from the prior art, the initial data arrangement format determined for each computing node is the optimal data arrangement format of that node; whether a conversion operator needs to be inserted between two adjacent computing nodes is then judged so that the data arrangement formats at the output end and the input end of the two adjacent computing nodes agree, thereby determining the optimal network model on the target platform.
Referring to fig. 4, fig. 4 is a schematic flow chart of a third embodiment of a model transformation method provided in the present application, the method including:
step 41: and acquiring the shape of the data block at the input end and the output end of each computing node in the network model.
Step 42: determining an initial data arrangement format of each computing node in the target platform based on the shape of the data block at the input end and the output end of each computing node, and determining the first time consumed for each computing node to obtain the initial data arrangement format.
Step 43: and sequentially judging whether the initial data arrangement formats of the two adjacent connected computing nodes are consistent.
Step 44: and adding a conversion operator between two adjacent computing nodes.
Step 45: and converting the initial data arrangement formats of the two adjacent computing nodes by using a conversion operator, and acquiring the conversion time consumption of the conversion operator.
Steps 41 to 45 may have the same or similar technical solutions as those in the above embodiments, and are not described herein again.
Step 46: and summing the first time consumption and the conversion time consumption to obtain a second time consumption.
Step 47: and selecting any one of the two adjacent connected computing nodes as a target computing node, and combining the target computing node and the conversion operator to obtain a combined node.
In some embodiments, the conversion operator is provided at the output of the target node, i.e. the input of the conversion operator is connected to the output of the target node. Thus, the input of the target computation node is taken as the input of the combination node, and the output of the conversion operator is taken as the output of the combination node. As shown in fig. 5, the combination node 50 includes a computation node 501 and a conversion operator 502, wherein the computation node 501 and the conversion operator 502 are connected, and an output terminal of the computation node 501 is connected to an input terminal of the conversion operator 502.
In some embodiments, the conversion operator is provided at the input of the target node, i.e. the output of the conversion operator is connected to the input of the target node. Thus, the input end of the conversion operator is used as the input end of the combination node, and the output end of the target calculation node is used as the output end of the combination node. As shown in fig. 6, the combination node 60 includes a calculation node 601 and a conversion operator 602, wherein the calculation node 601 is connected to the conversion operator 602, and an input terminal of the calculation node 601 is connected to an output terminal of the conversion operator 602.
In some embodiments, any one of the target computing nodes of the two adjacently connected computing nodes is combined with the two conversion operators, the input end of one of the two conversion operators is used as the input end of the combination node, and the output end of the other conversion operator is used as the output end of the combination node. As shown in fig. 7, the combination node 70 includes a computation node 701, a conversion operator 702 and a conversion operator 703, wherein the computation node 701 is connected to two conversion operators (702, 703), and there is no connection relationship between the conversion operator 702 and the conversion operator 703, that is, an input end of the computation node 701 is connected to an output end of the conversion operator 702, and an output end of the computation node 701 is connected to an input end of the conversion operator 703.
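The three combination patterns above (conversion operator at the output end, at the input end, or at both ends, as in figs. 5 to 7) can be sketched as one wrapper; the `CombinedNode` class and the lambda nodes are hypothetical illustrations, not the patent's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CombinedNode:
    """A target computing node fused with optional conversion operators."""
    node: Callable                      # the target computing node
    pre: Optional[Callable] = None      # conversion operator at the input end (figs. 6, 7)
    post: Optional[Callable] = None     # conversion operator at the output end (figs. 5, 7)

    def __call__(self, block):
        if self.pre is not None:
            block = self.pre(block)     # operator input serves as the combined node's input
        block = self.node(block)
        if self.post is not None:
            block = self.post(block)    # operator output serves as the combined node's output
        return block

# Fig. 5 pattern: compute node followed by a conversion operator at its output.
combined = CombinedNode(node=lambda b: b + 1, post=lambda b: b * 2)
print(combined(3))  # (3 + 1) * 2 = 8
```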
Step 48: and determining the third time consumption for the combined node to obtain the initial data arrangement format in the target platform.
In some embodiments, the data arrangement format currently corresponding to the combination node may be directly used as the initial data arrangement format, and then the combination node is used to perform the operation in the target platform to obtain the third time consumption of the initial data arrangement format.
In some embodiments, a computing node having the same data arrangement format as the combined node is found in the network model. Since the first time consumption corresponding to that node's initial data arrangement format has already been determined, this first time consumption can be taken as the third time consumption for the combined node to obtain the initial data arrangement format in the target platform.
Step 49: and determining the connection structure of all the computing nodes in the network model based on the second time consumption and the third time consumption.
In some embodiments, the second time consumption and the third time consumption may be compared. If the third time consumption is less than the second time consumption, the combined node replaces the target computing node and is connected with the other of the two computing nodes, thereby determining the connection structure of all computing nodes in the network model.
When the third time consumption is less than the second time consumption, the combined node consumes less time, and the target computing node can be replaced directly by the combined node. That is, the conversion operator is absorbed into the target computing node to form a combined node, which is equivalent to replacing the target computing node with the combined node.
It will be appreciated that there may be multiple combined nodes in the process described above, and each combined node may determine whether to replace its target computing node in the manner described above.
If the third time consumption is greater than the second time consumption, the combined node consumes more time; the combination is then unnecessary, and the standalone conversion operator can continue to be used to convert the data arrangement format at lower cost.
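The decision rule of steps 46 to 49 — compare the second time consumption (first time consumption plus conversion time consumption) against the third time consumption of the combined node — can be sketched as follows; `should_merge` is a hypothetical helper name:

```python
def should_merge(first_elapsed, conversion_elapsed, combined_elapsed):
    """Replace the target node with the combined node only if that is faster.

    second time consumption = first time consumption + conversion time consumption
    third time consumption  = time of the combined node on the target platform
    """
    second = first_elapsed + conversion_elapsed
    return combined_elapsed < second

print(should_merge(1.0, 0.4, 1.2))  # True: 1.2 < 1.4, absorb the conversion operator
print(should_merge(1.0, 0.4, 1.6))  # False: keep the standalone conversion operator
```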
Specifically, step 49 is followed by (not shown):
s1: and calculating the time difference between the second consumed time and the third consumed time to serve as the combined benefit.
It will be appreciated that only the compute nodes where the data arrangement format conversion occurs have a consolidated revenue.
S2: and determining the maximum value of each merging profit based on the merging profit of each computing node subjected to data arrangement format conversion, merging the conversion operators into the corresponding computing nodes, and continuing the steps for other computing nodes based on the merged computing nodes to obtain the maximum value of the sum of the merging profits of each computing node.
The combined revenue sum of all the combined computing nodes can be used to prove that the performance of the new model with the completed model transformation on the target platform is optimal.
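One plausible reading of S2's gain-maximizing merge is a greedy loop over candidates in descending gain order, assuming the merging gain (second minus third time consumption) of each candidate pair has already been measured; the function and the example gains are illustrative, not the patent's algorithm:

```python
def maximize_merge_gain(gains):
    """Greedily absorb conversion operators in descending order of merging gain.

    `gains` maps a candidate node name to its merging gain (second time
    consumption minus third time consumption); only positive gains are worth
    merging, since a non-positive gain means the combined node is not faster.
    """
    total = 0.0
    merged = []
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        if gain > 0:
            merged.append(name)
            total += gain
    return merged, total

merged, total = maximize_merge_gain({"conv1": 0.3, "pool1": -0.1, "conv2": 0.2})
print(merged)  # ['conv1', 'conv2']
```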
S3: and deploying the network model of the optimal data arrangement format corresponding to the maximum value of the combined profit sum to a target platform.
It can be understood that the network model of the optimal data arrangement format can be deployed on a target platform to perform a deployment phase in the neural network deep learning.
The deployment stage converts the network model into a model format that the target platform can identify; after acquiring the network model information, the target platform constructs the data structures needed for inference and performs network inference to obtain an inference result.
S4: and inserting an inference operator into the target platform.
Specifically, an inference operator is a function that completes a specific inference operation. It generally takes a plurality of data blocks as inputs and, after the computation is completed, produces a plurality of data blocks as outputs; a number of parameters that may affect the computation exist in the computation process.
Specifically, some of the input data blocks are constants obtained during the training phase of deep neural network learning, generally referred to as weight data (weight blobs). In a neural network, each inference operator corresponds to a neural network layer, and one inference operator completes the inference computation of its corresponding layer.
Alternatively, the inference operator may be a convolution operator, a pooling operator, or other function with a specific inference function, which is not limited herein.
S5: and performing reverse reasoning on the target platform by using a reasoning operator to obtain a model with an optimal data arrangement format.
It can be understood that, when the target platform determines the model of the optimal data arrangement format, the model of the optimal data arrangement format may be inferred by using an inference operator in the deployment stage.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a model transformation device provided in the present application, where the model transformation device 80 includes a memory 801 and a processor 802, the memory 801 is used for storing program data, and the processor 802 is used for executing the program data to implement the method for transforming a network model according to any of the above embodiments, and details are not repeated here.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application, where the computer-readable storage medium 90 stores program data 901, and when the program data 901 is executed by a processor, the program data is used to implement the method for converting a network model according to any one of the above embodiments, and details are not repeated here.
The processor referred to in this application may be referred to as a Central Processing Unit (CPU), may be an integrated circuit chip, or may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The storage medium used in the present application includes various media that can store program codes, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for converting a network model, wherein the network model is deployed on a target platform, the method comprising:
acquiring the shape of a data block at the input end and the output end of each computing node in the network model;
determining an initial data arrangement format of each computing node in the target platform based on the shape of the data block at the input end and the output end of each computing node, and determining a first consumed time for each computing node to obtain the initial data arrangement format;
sequentially judging whether the initial data arrangement formats of two adjacent connected computing nodes are consistent;
if the initial data arrangement formats are not consistent, adding a conversion operator between the two adjacent computing nodes;
converting the initial data arrangement formats of the two adjacent computing nodes by using the conversion operator, and acquiring the conversion time consumption of the conversion operator;
determining the connection structure of all computing nodes in the network model based on the correlation between the first time consumption and the conversion time consumption.
2. The method of claim 1,
determining, in the target platform, an initial data arrangement format for each of the compute nodes based on a shape of a data block at an input and an output of each of the compute nodes, including:
acquiring all data arrangement formats to be tested on the target platform;
and testing the input data block of the input end and the output data block of the output end of each computing node based on all the to-be-tested data arrangement formats to obtain the initial data arrangement format of each computing node.
3. The method of claim 2,
the step of testing the input data block of the input end and the output data block of the output end of each computing node based on all the to-be-tested data arrangement formats to obtain the initial data arrangement format of each computing node comprises:
performing operation by using each calculation operator;
in the operation process, carrying out data arrangement on the input data block at the input end and the output data block at the output end of each computing node according to the data arrangement formats to be tested so as to obtain real-time performance parameters corresponding to the data arrangement formats to be tested;
and determining the initial data arrangement format of the input data block at the input end and the output data block at the output end of each computing node based on the real-time performance parameters corresponding to the to-be-tested data arrangement formats.
4. The method of claim 3,
the real-time performance parameters comprise at least one of response time and running speed;
the determining of the initial data arrangement format of the input data block at the input end and the output data block at the output end of each computing node based on the real-time performance parameters corresponding to each to-be-tested data arrangement format includes:
and determining the data arrangement format to be tested with the shortest corresponding response time or/and the fastest running speed as the initial data arrangement format.
5. The method according to claim 1, wherein the output terminal of the previous one of the two adjacent computing nodes is connected to the input terminal of the conversion operator, and the input terminal of the next one of the two adjacent computing nodes is connected to the output terminal of the conversion operator;
the converting the initial data arrangement formats of the two adjacent computing nodes by using the conversion operator, and acquiring the conversion time consumption of the conversion operator, includes:
acquiring an initial data arrangement format corresponding to the output end of a previous computing node of two adjacent connected computing nodes as a first data arrangement format; and
acquiring an initial data arrangement format corresponding to the input end of a subsequent computing node of two adjacent connected computing nodes as a second data arrangement format;
and converting the data block output by the output end of the previous computing node from a first data arrangement format to a second data arrangement format by using the conversion operator, inputting the data block to the next computing node, and acquiring the conversion time consumption of the conversion operator.
6. The method of claim 1,
determining a connection structure of all computing nodes in the network model based on the correlation between the first elapsed time and the conversion elapsed time, including:
summing the first consumed time and the conversion consumed time to obtain second consumed time;
selecting any one of two adjacent connected computing nodes as a target computing node, and combining the target computing node and the conversion operator to obtain a combined node;
determining third time consumption of the initial data arrangement format obtained by the combined node in the target platform;
determining connection structures of all computing nodes in the network model based on the second elapsed time and the third elapsed time.
7. The method of claim 6,
determining connection structures of all computing nodes in the network model based on the second elapsed time and the third elapsed time includes:
comparing the second elapsed time with the third elapsed time;
and if the third consumed time is less than the second consumed time, replacing the target computing node by using the combined node, and connecting the target computing node with another target computing node to determine the connection structures of all computing nodes in the network model.
8. The method of claim 6,
selecting any one of two adjacent connected computing nodes as a target computing node, and combining the target computing node and the conversion operator to obtain a combined node, including:
taking the input end of the target computing node as the input end of the combined node, and taking the output end of the conversion operator as the output end of the combined node; or
Taking the input end of the conversion operator as the input end of the combination node, and taking the output end of the target calculation node as the output end of the combination node; or
Combining any target calculation node of two adjacent connected calculation nodes with the two conversion operators, taking the input end of one of the two conversion operators as the input end of the combination node, and taking the output end of the other conversion operator as the output end of the combination node.
9. A model transformation device, characterized in that the model transformation device comprises a memory for storing program data and a processor for executing the program data to implement the transformation method of the network model according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein program data for, when executed by a processor, performing the method of transforming a network model according to any one of claims 1-8.
CN202210815276.XA 2022-07-11 2022-07-11 Model conversion method, model conversion device, and storage medium Active CN114896950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210815276.XA CN114896950B (en) 2022-07-11 2022-07-11 Model conversion method, model conversion device, and storage medium


Publications (2)

Publication Number Publication Date
CN114896950A true CN114896950A (en) 2022-08-12
CN114896950B CN114896950B (en) 2022-10-28

Family

ID=82729239



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117521737A (en) * 2024-01-04 2024-02-06 浙江大华技术股份有限公司 Network model conversion method, device, terminal and computer readable storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6195664B1 (en) * 1997-02-21 2001-02-27 Micrografx, Inc. Method and system for controlling the conversion of a file from an input format to an output format
CN111882038A (en) * 2020-07-24 2020-11-03 深圳力维智联技术有限公司 Model conversion method and device
CN112328674A (en) * 2020-11-17 2021-02-05 深圳力维智联技术有限公司 Cross-data-format model conversion acceleration method and device
CN112819153A (en) * 2020-12-31 2021-05-18 杭州海康威视数字技术股份有限公司 Model transformation method and device
CN112990433A (en) * 2021-03-08 2021-06-18 Oppo广东移动通信有限公司 Model time-consuming prediction method and device, electronic equipment and storage medium
WO2021180201A1 (en) * 2020-03-13 2021-09-16 华为技术有限公司 Data processing method and apparatus for terminal network model, terminal and storage medium
CN113449841A (en) * 2020-03-27 2021-09-28 华为技术有限公司 Method and device for inserting conversion operator
CN113469353A (en) * 2020-03-31 2021-10-01 上海商汤智能科技有限公司 Neural network model optimization method, data processing method and device
CN113469352A (en) * 2020-03-31 2021-10-01 上海商汤智能科技有限公司 Neural network model optimization method, data processing method and device
US20220092439A1 (en) * 2020-09-23 2022-03-24 EMC IP Holding Company LLC Decoupled architecture for artificial intelligence model management
WO2022061931A1 (en) * 2020-09-28 2022-03-31 华为技术有限公司 Data format processing method and apparatus
CN114356356A (en) * 2021-12-31 2022-04-15 北京市商汤科技开发有限公司 Model architecture conversion method, device, equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C. J. MANTAS: "A generic fuzzy aggregation operator: rules extraction from and insertion into artificial neural networks", Soft Computing: A Fusion of Foundations, Methodologies and Applications *
DU Weijian et al.: "QingLong: A Neural Network Programming Model Based on Asynchronous Copying of Constants and Variables", Chinese Journal of Computers *


Also Published As

Publication number Publication date
CN114896950B (en) 2022-10-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant