CN112561044A - Neural network model acceleration method and device, server and storage medium - Google Patents
- Publication number: CN112561044A (application CN201910914935.3A)
- Authority: CN (China)
- Prior art keywords: incompatible, node, neural network, network model, nodes
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention is applicable to the technical field of deep learning and provides a neural network model acceleration method and device, a server and a storage medium. The method comprises the following steps: determining the nodes of the neural network model that are incompatible with a third-party engine; partitioning the neural network model based on the incompatible nodes to obtain at least two sub-models; converting the sub-models that do not include an incompatible node into a format supported by the third-party engine; and running the format-converted sub-models through the third-party engine. In the invention, the third-party engine is used for acceleration; when an incompatibility occurs, the third-party engine accelerates the compatible parts in a segmented execution mode, so that the overall running speed of the neural network model can be improved.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a neural network model acceleration method and device, a server, and a storage medium.
Background
A neural network model is a computational model that mimics the structure and function of a biological neural network (the central nervous system of an animal, particularly the brain) and is used to estimate or approximate a function. As research on neural network algorithms has deepened, their accuracy now exceeds that of traditional machine learning algorithms in many applications. Neural network algorithms are therefore gradually replacing traditional algorithms and are beginning to be deployed on terminal devices.
Although neural network algorithms achieve high accuracy, their huge computation load consumes considerable memory bandwidth and power. Terminal devices are often embedded devices with strict efficiency requirements, and a traditional CPU alone can hardly carry this workload. To obtain higher execution efficiency, a third-party acceleration engine is needed to convert the format of the solidified neural network model so that it can be executed on a GPU or similar hardware; however, when the third-party engine is incompatible with the model, acceleration cannot actually be achieved, which hurts the execution efficiency of the neural network algorithm.
Therefore, a new technical solution is needed to solve the above technical problems.
Disclosure of Invention
In view of this, embodiments of the present invention provide a neural network model acceleration method and apparatus, a server, and a storage medium, to address the low operating efficiency of neural network models in the prior art.
A first aspect of an embodiment of the present invention provides a neural network model acceleration method, where the method includes:
determining the nodes of the neural network model that are incompatible with a third-party engine;
partitioning the neural network model based on the incompatible nodes to obtain at least two sub-models;
converting the sub-models that do not include an incompatible node into a format supported by the third-party engine;
and running the format-converted sub-models through the third-party engine.
A second aspect of an embodiment of the present invention provides a neural network model acceleration apparatus, including:
a determining module, configured to determine the nodes of the neural network model that are incompatible with a third-party engine;
a partitioning module, configured to partition the neural network model based on the incompatible nodes to obtain at least two sub-models;
a conversion module, configured to convert the sub-models that do not include an incompatible node into a format supported by the third-party engine;
and a control module, configured to control the third-party engine to run the format-converted sub-models.
A third aspect of the embodiments of the present invention provides a server, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the method of the first aspect is implemented.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method mentioned in the first aspect.
Compared with the prior art, the embodiments of the present invention have the following beneficial effect: when the third-party engine is used for acceleration and an incompatibility occurs, the third-party engine accelerates the compatible parts in a segmented execution mode, so that the overall running speed of the neural network model is increased and operating efficiency is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for accelerating a neural network model based on a third-party engine according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an apparatus for accelerating a neural network model based on a third-party engine according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a server according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical solution of the present invention, the following is illustrated by specific examples.
Example one
Fig. 1 is a schematic flowchart of a method for accelerating a neural network model according to an embodiment of the present invention, where the method may include the following steps:
Step S1: determining the nodes of the neural network model that are not compatible with the third-party engine.
Specifically, in an embedded system, a third-party engine is generally required to increase execution speed. Because the neural network model is already solidified, it must first be converted before it can be executed by the third-party engine. During this conversion, incompatibilities between the third-party engine and the solidified neural network model may appear, so the incompatible nodes between the third-party engine and the neural network model need to be determined first; that is, a node at which an error occurs while the third-party engine converts the neural network model is an incompatible node.
In a preferred aspect of this embodiment, the third-party engine may be a neural network acceleration engine, such as Qualcomm's Snapdragon Neural Processing Engine (SNPE).
In a further preferred scheme of this embodiment, step S1 specifically includes:
First, the neural network model is converted by the third-party engine.
Specifically, the format of the neural network model's execution file is first converted into a format compatible with the third-party engine, and the format-converted neural network model is then executed by the third-party engine; for example, a conversion function of the third-party engine is called to perform the format conversion.
Then, during conversion, if the third-party engine reports a conversion exception, the acceleration engine and the neural network model have become incompatible at that point, and the currently converted node is obtained; this node is an incompatible node. It should be noted that there may be several incompatible nodes in the conversion process: once an incompatible node has appeared, conversion of the neural network model can be restarted from that incompatible node to find the next one, and if another conversion exception occurs, the corresponding incompatible node is obtained. The foregoing steps are repeated until the conversion is complete, yielding all incompatible nodes.
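As an illustration, this detection loop can be sketched as follows (a minimal sketch, not part of the patent; try_convert is a hypothetical wrapper around the engine's conversion call that returns the index of the node at which conversion fails, or None if the remaining nodes convert cleanly):

    def find_incompatible_nodes(num_nodes, try_convert):
        # Collect the indices of all incompatible nodes, in model order.
        incompatible = []
        start = 0
        while start < num_nodes:
            failed_at = try_convert(start)   # resume conversion from this node
            if failed_at is None:            # the remainder converted without error
                break
            incompatible.append(failed_at)
            start = failed_at + 1            # restart after the incompatible node
        return incompatible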
Step S2: segmenting the neural network model based on the incompatible nodes to obtain at least two sub-models.
Specifically, the original neural network model (i.e., the model before format conversion) is divided into at least two sub-models with the incompatible nodes as boundaries. Some sub-models include incompatible nodes and some do not; the sub-models without incompatible nodes are run by the third-party engine.
Step S3: converting the sub-models that do not include incompatible nodes into a format supported by the third-party engine.
Specifically, since the sub-models that do not include incompatible nodes are run by the third-party engine, those sub-models must be converted into a format supported by the third-party engine, for example a format that can run on a GPU or DSP, such as the DLC format supported by Qualcomm's SNPE engine.
Step S4: running the format-converted sub-models through the third-party engine.
Specifically, after the neural network model is segmented, the sub-models that do not include incompatible nodes are run by the third-party engine, and the sub-models that include incompatible nodes are still run in the original manner, for example on the CPU.
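The segmented execution itself can be pictured as follows (again a sketch; engine_run and cpu_run are hypothetical helpers standing in for the third-party engine runtime and the original framework, and submodels is assumed to be in model order):

    def run_pipeline(submodels, x):
        # Compatible sub-models go through the third-party engine
        # (e.g. on a GPU or DSP); the rest stay on the CPU.
        for sm in submodels:
            if sm.compatible:
                x = engine_run(sm.converted, x)
            else:
                x = cpu_run(sm.original, x)
        return x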
In this embodiment, when the incompatibility occurs, the third-party engine accelerates the compatible part in a segmented execution mode, so that the overall operation speed of the neural network model can be increased.
In a preferred scheme of this embodiment, before step S1 the method may further include:
judging whether the third-party engine is compatible with the neural network model; if not, proceeding to step S1; if so, directly running the neural network model on the GPU or DSP through the third-party engine.
In a preferred scheme of this embodiment, step S1 specifically includes:
the transformation of the neural network model is started by a third party engine.
When a conversion exception occurs in the third-party engine, the currently converted node is acquired; this node is an incompatible node.
Conversion of the neural network model is then restarted by the third-party engine from the node following the incompatible node; if another incompatible node exists, the conversion step continues from the node following the current incompatible node, and so on until the conversion is finished, finally obtaining at least two incompatible nodes.
Specifically, to obtain the incompatible nodes, the neural network model is converted by the third-party engine. During conversion, a conversion exception indicates that the current node is incompatible, and the currently converted node is taken as an incompatible node. The third-party engine then resumes converting the neural network from the node following that incompatible node; each time another incompatible node is found, conversion continues from the node following it, until the conversion is finished and at least two incompatible nodes are finally obtained.
In a preferred aspect of this embodiment, the following case may occur: when a conversion exception occurs in the third-party engine, the currently converted node is acquired and taken as an incompatible node, and conversion of the neural network model resumes from the node following it; if no further exception occurs, only one incompatible node is obtained when the conversion finishes. In other words, the third-party engine encounters only one conversion exception while converting the neural network model, and only one incompatible node is obtained. In that case, if the incompatible node is close to (or located at) the start point of the neural network model, the start point and the incompatible node belong to the same sub-model; if the incompatible node is close to (or located at) the end point of the neural network model, the end point and the incompatible node belong to the same sub-model. It should be noted that the start point refers to the first node of the neural network model and the end point refers to the last node.
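This single-node split can be sketched as follows (illustrative only; nodes is the ordered node list, bad is the 0-based index of the incompatible node, and the tie-breaking direction is an assumption):

    def split_around_single_node(nodes, bad):
        # The incompatible node joins whichever end of the model it is
        # closer to, so the larger remaining stretch stays engine-compatible.
        if bad <= len(nodes) - 1 - bad:       # closer to the start
            return nodes[:bad + 1], nodes[bad + 1:]
        else:                                 # closer to the end
            return nodes[:bad], nodes[bad:]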
In a further preferred scheme of this embodiment, when no incompatible node is the start point or the end point of the neural network model (that is, no incompatible node appears at the start point or end point), step S2 specifically includes:
dividing the neural network model into K+1 sub-models based on the K incompatible nodes. If the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M+1)th incompatible nodes belong to the same sub-model, and the Mth and (M-1)th incompatible nodes belong to different sub-models; if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M-1)th incompatible nodes belong to the same sub-model, and the Mth and (M+1)th incompatible nodes belong to different sub-models. Here K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
For example, suppose the neural network model comprises 8 nodes (the 1st node (i.e., the start point), the 2nd node, ..., the 8th node (i.e., the end point)) and there are 3 incompatible nodes, so the model can be divided into 4 sub-models: an A sub-model, a B sub-model, a C sub-model and a D sub-model. Let the 2nd, 5th and 7th nodes be the first, second and third incompatible nodes, respectively. The data length between the first and second incompatible nodes (i.e., between the 2nd and 5th nodes) and the data length between the second and third incompatible nodes (i.e., between the 5th and 7th nodes) are obtained. When the data length between the 2nd and 5th nodes is greater than the data length between the 5th and 7th nodes: the data from the 1st node to the 2nd node, including the 2nd node, is put into the A sub-model; the data after the 2nd node and before the 5th node, excluding both, is put into the B sub-model; the data from the 5th node to the 7th node, including both, is put into the C sub-model; and the data after the 7th node up to the 8th node is put into the D sub-model. The B and D sub-models then contain no incompatible nodes and are run by the third-party engine, while the A and C sub-models contain incompatible nodes and are executed on the CPU.
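The K+1-way split rule can be sketched as follows (a sketch under stated assumptions: length(a, b) is a hypothetical helper returning the data length between two node indices, and treating the model's start and end as the neighbours of the first and last incompatible nodes is an assumption not spelled out in the text):

    def split_points(bad, length, start, end):
        # For each incompatible node, cut so that it joins the shorter of its
        # two neighbouring gaps, keeping the longer gap free of incompatible nodes.
        cuts = []
        for m, node in enumerate(bad):
            prev_nb = bad[m - 1] if m > 0 else start
            next_nb = bad[m + 1] if m + 1 < len(bad) else end
            if length(prev_nb, node) >= length(node, next_nb):
                cuts.append(node)        # cut before it: it joins the next sub-model
            else:
                cuts.append(node + 1)    # cut after it: it joins the previous sub-model
        return cuts                      # K cut positions define K+1 sub-models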
In another preferred scheme of this embodiment, step S1 specifically includes:
the transformation of the neural network model is started by a third party engine.
When a conversion exception occurs in the third-party engine, the currently converted node is acquired and taken as an incompatible node, and the ratio of the data length from the start point of the neural network model to the incompatible node, to the data length of the whole neural network model, is obtained; that is, the data length from the start point to the incompatible node is divided by the data length of the neural network model to obtain the ratio.
If the ratio is greater than or equal to a preset value, the method proceeds to step S2 and the neural network model is directly segmented based on the incompatible node to obtain two sub-models, with the start point of the neural network model and the incompatible node in different sub-models. Since the ratio of the data length from the start point to the incompatible node to the data length of the neural network model exceeds the preset value, the neural network model can usefully be segmented at the incompatible node: the part from the start point to the node before the incompatible node becomes one sub-model, and the part from the incompatible node to the end point becomes the other. The preset value can be set according to the actual situation, for example from 0.1 to 0.9; 0.9 is preferred, i.e., the effect is better when the data length from the start point to the incompatible node accounts for 90% of the total data length of the neural network model.
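This two-way split is straightforward to express (sketch; nodes is the ordered node list and bad the 0-based index of the incompatible node); the case where the ratio falls below the preset value continues below.

    def split_two(nodes, bad):
        # start .. bad-1 runs on the engine; bad .. end stays on the CPU
        return nodes[:bad], nodes[bad:]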
If the ratio is smaller than the preset value, the incompatible node is taken as a first incompatible node, and the third-party engine converts the neural network model starting from the node following the first incompatible node;
when another conversion exception occurs in the third-party engine, the currently converted node is acquired and taken as a second incompatible node, and the ratio of the data length from the first incompatible node to the second incompatible node, to the data length of the neural network model, is obtained. If this ratio is smaller than the preset value, the third-party engine resumes converting the neural network model from the node following the second incompatible node, and the next incompatible node continues to be acquired, until the ratio of the data length from the Nth incompatible node to the (N-1)th incompatible node, to the data length of the neural network model, is greater than or equal to the preset value; acquisition of further incompatible nodes then stops, where N is greater than or equal to 2.
For example, suppose the neural network model has 15 nodes. When a conversion exception occurs at the 3rd node, the 3rd node is the first incompatible node; if the ratio of the data length from the 1st node to the 3rd node to the data length of the neural network model is smaller than the preset value, the third-party engine converts the neural network model starting from the 4th node. If a conversion exception then occurs at the 6th node, the 6th node is the second incompatible node; if the ratio of the data length from the 3rd node to the 6th node to the data length of the neural network model is still smaller than the preset value, the third-party engine continues converting from the 7th node, and so on, until the ratio of the data length between two adjacent incompatible nodes to the data length of the neural network model is greater than or equal to the preset value, at which point acquisition of further incompatible nodes stops. For instance, if subsequent conversion exceptions occur at the 9th and 12th nodes and the ratio of the data length between those two incompatible nodes to the data length of the neural network model is greater than or equal to the preset value, the conversion stops and no further incompatible nodes are acquired;
if the ratio of the data length between the Nth and (N-1)th incompatible nodes to the data length of the neural network model is greater than or equal to the preset value, step S2 specifically includes: segmenting the neural network model based on the (N-1)th and Nth incompatible nodes to obtain three sub-models, where the start point of the neural network model up to the (N-1)th incompatible node belongs to one sub-model, the node following the (N-1)th incompatible node up to the node preceding the Nth incompatible node belongs to another sub-model, and the Nth incompatible node up to the end point belongs to the remaining sub-model. In other words, the neural network model is divided into three segments (each segment being one sub-model): the first segment runs from the start point to the (N-1)th incompatible node, the second from after the (N-1)th incompatible node to before the Nth incompatible node, and the third from the Nth incompatible node to the end point. The second segment is the one the third-party engine is compatible with.
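The ratio-driven search might look like this (a sketch under the same assumptions as the earlier snippets; the 0.9 default mirrors the preferred preset value above):

    def find_long_clean_stretch(num_nodes, try_convert, length, total, threshold=0.9):
        # Collect incompatible nodes until the clean stretch before the first
        # one, or between two consecutive ones, covers at least `threshold`
        # of the model's total data length.
        prev = 0                              # start of the model
        start = 0
        while start < num_nodes:
            failed_at = try_convert(start)
            if failed_at is None:
                return None                   # finished without reaching the threshold
            if length(prev, failed_at) / total >= threshold:
                return prev, failed_at        # segment the model around this stretch
            prev = failed_at
            start = failed_at + 1
        return None

When the returned stretch starts at the model's start point, this yields the two sub-models of the earlier case; otherwise the model is cut around the stretch into three sub-models.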
In a variation of this embodiment, if, by the time the conversion is finished, the ratio of the data length between the Nth and (N-1)th incompatible nodes to the data length of the neural network model has never been greater than or equal to the preset value, the data length between every two adjacent incompatible nodes is obtained, and the maximum of the at least two data lengths so obtained is taken.
Specifically, whenever an incompatible node is obtained and the ratio of the data length from the current incompatible node to the previous one, to the data length of the neural network model, is smaller than the preset value, the attempt to obtain the next incompatible node continues: each new ratio is compared with the preset value until all nodes have been tried. Over the whole process, it may happen that no ratio ever reaches the preset value. In that case, since the parts between incompatible nodes are compatible with the third-party engine, the data length between every two adjacent incompatible nodes is counted, yielding at least two data lengths, and the maximum of these is taken.
In a further preferred scheme of this variation, step S2 specifically includes:
and segmenting the neural network model based on two target incompatible nodes corresponding to the maximum value of the data length to obtain three corresponding submodels, wherein one submodel does not comprise two target incompatible nodes, and the other two submodels respectively comprise one target incompatible node of the two target incompatible nodes. .
For example, if the incompatible nodes bounding the longest data length are b1 and b2, the neural network model is divided into three segments (i.e., three sub-models): the part from the start point of the neural network model to b1 is the first segment, the part between b1 and b2 is the second segment, and the part from b2 to the end point is the third segment. Since the longest part is executed by the third-party engine, processing efficiency is effectively improved.
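Choosing the widest gap can be sketched as follows (illustrative; bad is the ordered list of incompatible node indices and length is the hypothetical helper used above):

    def widest_gap(bad, length):
        # Return the adjacent pair (b1, b2) bounding the longest
        # engine-compatible stretch; the model is then split into
        # [start..b1], the part strictly between b1 and b2, and [b2..end].
        return max(zip(bad, bad[1:]), key=lambda p: length(p[0], p[1]))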
In a preferred embodiment of the present invention, because there are a plurality of incompatible nodes, the neural network model may be divided into a plurality of sub-models based on them. Some of these sub-models are compatible with the third-party engine; the converted, compatible sub-models can be executed by the third-party engine, while the incompatible ones are still executed in the original manner.
After the model is divided into at least two sub-models, the sub-models compatible with the third-party engine are format-converted by the third-party engine, yielding the format-converted sub-models.
In a preferred scheme of this embodiment, take Qualcomm's neural network acceleration engine SNPE and the face recognition model facenet.pb as an example. To improve the execution efficiency of the facenet.pb model, it is converted for execution by SNPE. When an error is found, the node where the error occurs is obtained and taken as an incompatible node, and the neural network model is divided into two models, a P1 model (first model) and a P2 model (second model), with the incompatible node located in the P2 model. P1 is then converted into the DLC format and run on the GPU or DSP through SNPE, while P2 runs on the CPU in the original manner, which improves overall execution efficiency.
Specifically, the neural network model is converted into the DLC format supported by the SNPE engine using Qualcomm's neural network acceleration engine, and the process of determining incompatible nodes may be as follows. An incompatibility is discovered while converting the neural network model (the face recognition model facenet.pb) into the DLC format through the third-party engine: if an error prompt such as "ERROR - Conversion failed: ElementWise Resolver must implement broadcast method" pops up during conversion, the conversion of the current node is abnormal and an incompatible node has appeared. At this point, all node pairs of the neural network model can be printed out and the error judged from experience, or a conversion command can be invoked to try each node one by one to see at which node conversion fails, replacing the value of the --out_node parameter of the following command one by one, where embeddings is the last node (i.e., the end point) of the neural network model, for example:

    snpe-tensorflow-to-dlc --graph facenet.pb --input_dim input 1,160,160,3 --out_node embeddings --dlc facenet.dlc --allow_unconsumed_nodes
Further, the conversion of facenet.pb into facenet.dlc proceeds as follows. The conversion command is entered:

    snpe-tensorflow-to-dlc --graph facenet.pb --input_dim input 1,160,160,3 --out_node embeddings --dlc facenet.dlc --allow_unconsumed_nodes
the acquired execution time is 2019-04-2209: 56:57.578906:
in the process of executing the above conversion command, the error record is specifically as follows: Itensorflow/core/display/CPU _ feature _ guard.cc:141] outer CPU support instructions that is not said Tensorflow organization not said completed to use AVX2 AVX512F FMA 2019-04-2209: 56:57.602994: Itensorflow/core/display/profile _ issues/CPU _ issues.cc: 94] CPU Frequency:3500000000Hz
2019-04-22 09:56:57.603530:I tensorflow/compiler/xla/service/service.cc:161]XLA service 0x54467a0 executing computations on platform Host.Devices:
2019-04-22 09:56:57.603532:I tensorflow/compiler/xla/service/service.cc:168]StreamExecutor device(0):<undefined>,<undefined>
At 2019-04-22 10:00:39,144 the error is formally reported: the prompt "ERROR - Conversion failed: ElementWise Resolver must implement broadcast method" pops up, indicating that conversion cannot currently proceed.
During execution, if the above prompt pops up, the conversion is abnormal. After analysis (all node pairs of the neural network model can be printed out and judged from experience, or each node can be tried one by one with the conversion command, replacing the value of the --out_node parameter one by one, embeddings being the last node of the neural network model), suppose SNPE is found not to support the InceptionResnetV1/Logits/Reshape node in facenet.pb. facenet.pb is then divided into two pb files, facenet_part1.pb and facenet_part2.pb, with InceptionResnetV1/Logits/Reshape as the boundary, by entering and executing a splitting instruction:

    /home/tensorflow-master/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=/home/models/facenet_part1.pb \
    --out_graph=/home/models/facenet_part2.pb \
    --inputs=InceptionResnetV1/Logits/Reshape \
    --outputs=embeddings \
    --transforms='add_default_attributes remove_nodes(op=Identity, op=CheckNumerics) fold_batch_norms fold_old_batch_norms strip_unused_nodes sort_by_execution_order'
After the division, facenet_part1.pb is converted into facenet_part1.dlc by the snpe-tensorflow-to-dlc command, so that facenet_part1.dlc can be run quickly on the GPU or DSP through SNPE, while the remaining facenet_part2.pb is run on the CPU in the conventional way.
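By way of illustration, the two parts could then be chained roughly as follows (a minimal sketch, not part of the patent; the snpe-net-run output path, the raw-tensor shape and the boundary tensor name are assumptions):

    import subprocess
    import numpy as np
    import tensorflow as tf

    # 1) Run the SNPE-compatible part on the GPU via the SNPE runtime.
    subprocess.run(["snpe-net-run", "--container", "facenet_part1.dlc",
                    "--input_list", "input_list.txt", "--use_gpu"], check=True)

    # 2) Load part1's raw output (path and shape assumed for illustration).
    part1_out = np.fromfile("output/Result_0/Reshape.raw", dtype=np.float32)
    part1_out = part1_out.reshape(1, -1)

    # 3) Run the incompatible remainder on the CPU with ordinary TensorFlow.
    with tf.io.gfile.GFile("facenet_part2.pb", "rb") as f:
        graph_def = tf.compat.v1.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.compat.v1.Session() as sess:
        tf.import_graph_def(graph_def, name="")
        embeddings = sess.run(
            "embeddings:0",
            {"InceptionResnetV1/Logits/Reshape:0": part1_out})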
In this embodiment, when the third-party engine is used for acceleration and an incompatibility occurs, the third-party engine accelerates the compatible parts in a segmented execution mode, so that the overall running speed of the neural network model is increased and operating efficiency is improved.
If there are a plurality of incompatible nodes, the part of the neural network model between two incompatible nodes whose data length is not less than the preset value, or between the two incompatible nodes corresponding to the maximum data length (excluding the two incompatible nodes themselves), is selected to run through the third-party engine, further improving operating efficiency.
Example two
Based on the first embodiment, fig. 2 shows a schematic structural diagram of a neural network model acceleration apparatus according to a second embodiment of the present invention. The apparatus is configured to perform the method steps of the first embodiment; for convenience of description, only the parts related to the embodiment of the present invention are shown. The apparatus at least comprises:
and the determining module 1 is used for determining the nodes of the third-party engine which are incompatible with the neural network model.
a segmentation module 2, configured to segment the neural network model based on the incompatible nodes to obtain at least two sub-models;
a conversion module 3, configured to convert the sub-models that do not include incompatible nodes into a format supported by the third-party engine;
and a control module 4, configured to control the third-party engine to run the format-converted sub-models.
In a preferred embodiment, the determining module 1 is specifically configured to:
controlling a third-party engine to start converting the neural network model;
when the conversion of the third-party engine is abnormal, acquiring a currently converted node, and taking the currently converted node as an incompatible node; starting to convert the neural network model by the third-party engine with the next node of the incompatible nodes, and if the incompatible nodes exist, starting to continue the conversion step with the next node of the current incompatible nodes until the conversion is finished to obtain at least two incompatible nodes; or
when a conversion exception occurs in the third-party engine, acquiring the currently converted node, taking it as an incompatible node, and restarting conversion of the neural network model through the third-party engine from the node following the incompatible node; if no further conversion exception occurs, one incompatible node is obtained when the conversion finishes. After the incompatible nodes are determined, the result is fed back to the segmentation module 2.
In a preferred embodiment, if there is one incompatible node, the partitioning module 2 is specifically configured to:
dividing the neural network model based on the incompatible node to obtain two submodels, wherein if the incompatible node is close to the starting point of the neural network model, the starting point of the neural network model and the incompatible node belong to the same submodel; and if the incompatible node is close to the end point of the neural network model, the end point of the neural network model and the incompatible node belong to the same sub-model.
If there are at least two incompatible nodes and any incompatible node does not belong to the start point or the end point of the neural network model, the segmentation module 2 is specifically configured to:
dividing the neural network model into K +1 submodels based on K incompatible nodes, wherein if the data length between the Mth incompatible node and the M-1 st incompatible node is larger than or equal to the data length between the Mth incompatible node and the M +1 st incompatible node, the Mth incompatible node and the M +1 st incompatible node belong to the same submodel, and the Mth incompatible node and the M-1 st incompatible node belong to different submodels; if the data length between the Mth incompatible node and the M-1 th incompatible node is smaller than the data length between the Mth incompatible node and the M +1 th incompatible node, the Mth incompatible node and the M-1 th incompatible node belong to the same submodel, and the Mth incompatible node and the M +1 th incompatible node belong to different submodels, wherein K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
In a preferred embodiment, the determining module 1 is specifically configured to: controlling a third-party engine to start converting the neural network model;
when the conversion of the third-party engine is abnormal, acquiring the currently converted node, taking it as an incompatible node, and acquiring the ratio of the data length from the starting point of the neural network model to the incompatible node to the data length corresponding to the neural network model; if the ratio is greater than or equal to the preset value, the result is fed back to the segmentation module 2.
The segmentation module 2 is specifically configured to: and partitioning the neural network model based on the incompatible node to obtain two submodels, wherein the starting point of the neural network model and the incompatible node are in different submodels.
In another preferred embodiment, if the ratio is smaller than the preset value, the determining module 1 is further configured to: taking the incompatible node as a first incompatible node, and controlling a third-party engine to convert the neural network model from a node next to the first incompatible node;
when the conversion of a third-party engine is abnormal, acquiring a currently converted node, taking the currently converted node as a second incompatible node, acquiring a ratio between the data length from the first incompatible node to the second incompatible node and the data length corresponding to the neural network model, if the ratio is smaller than the preset value, starting to control the third-party engine to convert the neural network model by using a next node of the second incompatible node, and continuing to acquire a next incompatible node until the ratio between the data length from an Nth incompatible node to an (N-1) th incompatible node and the data length of the neural network model is larger than or equal to the preset value, and stopping acquiring the next incompatible node, wherein N is larger than or equal to 2;
at this time, the segmentation module 2 is specifically configured to: segment the neural network model based on the (N-1)th incompatible node and the Nth incompatible node to obtain three sub-models, wherein the starting point of the neural network model up to the (N-1)th incompatible node belongs to one sub-model, the node following the (N-1)th incompatible node up to the node preceding the Nth incompatible node belongs to another sub-model, and the Nth incompatible node up to the end point of the neural network model belongs to the remaining sub-model of the three.
In a preferred embodiment, if, by the time the conversion is finished, the ratio of the data length from the Nth incompatible node to the (N-1)th incompatible node to the data length of the neural network model has never been greater than or equal to the preset value, the determining module 1 is further configured to: acquire the data length between every two adjacent incompatible nodes, take the maximum of the at least two data lengths obtained, and feed the maximum back to the segmentation module 2;
at this time, the dividing module 2 is specifically configured to: and segmenting the neural network model based on two target incompatible nodes corresponding to the maximum value of the data length to obtain three corresponding submodels, wherein one submodel does not comprise the two target incompatible nodes, and the other two submodels respectively comprise one target incompatible node of the two target incompatible nodes.
In a preferred embodiment, the conversion module 3 is specifically configured to: and calling a conversion function of the third-party engine to start converting the neural network model.
In a preferred embodiment, the control module 4 is further configured to control running, on the CPU, the sub-models that include incompatible nodes.
The neural network model acceleration device provided by the embodiment of the invention can execute the neural network model acceleration method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example three
Fig. 3 is a schematic structural diagram of a server according to a third embodiment of the present invention. As shown in fig. 3, the server 3 of this embodiment includes: a processor 30, a memory 31 and a computer program 32 stored in said memory 31 and executable on said processor 30. The processor 30, when executing the computer program 32, implements the steps of any of the embodiments of the method described above.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 32 in the server 3. For example, the computer program 32 may perform the following operations:
determining the nodes of the neural network model that are incompatible with a third-party engine;
partitioning the neural network model based on incompatible nodes to obtain at least two sub-models;
converting the sub-model not including the incompatible node into a format supported by a third-party engine;
running the format-converted submodel through a third-party engine.
In a preferred embodiment, the computer program 32 may further perform the following operations:
initiating, by a third party engine, transformation of the neural network model;
when the conversion of the third-party engine is abnormal, acquiring a currently converted node, and taking the currently converted node as an incompatible node; starting to convert the neural network model by the third-party engine with the next node of the incompatible nodes, and if the incompatible nodes exist, starting to continue the conversion step with the next node of the current incompatible nodes until the conversion is finished to obtain at least two incompatible nodes; or
When the conversion of the third-party engine is abnormal, acquiring a currently converted node, taking the currently converted node as an incompatible node, starting to convert the neural network model through the third-party engine by using a next node of the incompatible node, and if the conversion is not abnormal, obtaining an incompatible node when the conversion is finished.
In a preferred embodiment, the computer program 32 may further perform the following operations:
dividing the neural network model based on the incompatible node to obtain two sub-models, wherein if the incompatible node is close to the starting point of the neural network model, the starting point of the neural network model and the incompatible node belong to the same sub-model; and if the incompatible node is close to the end point of the neural network model, the end point of the neural network model and the incompatible node belong to the same sub-model.
In a preferred embodiment, when there are at least two incompatible nodes and any incompatible node does not belong to the start point or the end point of the neural network model, the computer program 32 may further perform the following operations:
dividing the neural network model into K +1 submodels based on K incompatible nodes, wherein if the data length between the Mth incompatible node and the M-1 st incompatible node is larger than or equal to the data length between the Mth incompatible node and the M +1 st incompatible node, the Mth incompatible node and the M +1 st incompatible node belong to the same submodel, and the Mth incompatible node and the M-1 st incompatible node belong to different submodels; if the data length between the Mth incompatible node and the M-1 th incompatible node is smaller than the data length between the Mth incompatible node and the M +1 th incompatible node, the Mth incompatible node and the M-1 th incompatible node belong to the same submodel, and the Mth incompatible node and the M +1 th incompatible node belong to different submodels, wherein K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
In a preferred embodiment, the computer program 32 may further perform the following operations:
controlling a third-party engine to start converting the neural network model;
when the conversion of the third-party engine is abnormal, acquiring a currently converted node, taking the currently converted node as an incompatible node, and acquiring the ratio of the data length from the starting point of the neural network model to the incompatible node to the data length corresponding to the neural network model;
if the ratio is larger than or equal to a preset value, the neural network model is segmented based on the incompatible node to obtain two sub-models, wherein the starting point of the neural network model and the incompatible node are in different sub-models.
In a preferred embodiment, the computer program 32 may further perform the following operations:
if the ratio is smaller than the preset value, taking an incompatible node as a first incompatible node, and controlling the third-party engine to convert the neural network model from a node next to the first incompatible node;
when the third-party engine is abnormally converted, acquiring a currently converted node, taking the currently converted node as a second incompatible node, acquiring a ratio of a data length between a first incompatible node and the second incompatible node to a data length corresponding to the neural network model, if the ratio is smaller than the preset value, starting to control the third-party engine to convert the neural network model by using a next node of the second incompatible node, and continuing to acquire the next incompatible node until the ratio of the data length between an Nth incompatible node and an (N-1) th incompatible node to the data length of the neural network model is larger than or equal to the preset value, and stopping acquiring the next incompatible node, wherein N is larger than or equal to 2;
and then, segmenting the neural network model based on the (N-1)th incompatible node and the Nth incompatible node to obtain three sub-models, wherein the starting point of the neural network model up to the (N-1)th incompatible node belongs to one sub-model, the node following the (N-1)th incompatible node up to the node preceding the Nth incompatible node belongs to another sub-model, and the Nth incompatible node up to the end point of the neural network model belongs to the remaining sub-model of the three.
In a preferred embodiment, the computer program 32 may further perform the following operations:
if, by the time the conversion is finished, the ratio of the data length from the Nth incompatible node to the (N-1)th incompatible node to the data length of the neural network model has never been greater than or equal to the preset value, acquiring the data length between every two adjacent incompatible nodes, and taking the maximum of the at least two data lengths obtained;
then, the neural network model is segmented based on two target incompatible nodes corresponding to the maximum value of the data length to obtain three corresponding submodels, wherein one submodel does not comprise the two target incompatible nodes, and the other two submodels respectively comprise one target incompatible node of the two target incompatible nodes.
In a preferred embodiment, the computer program 32 may further perform the following operations:
and calling a conversion function of the third-party engine to start converting the neural network model.
In a preferred embodiment, the computer program 32 may further perform the following operations:
a submodel including incompatible nodes is executed on the CPU.
The server 3 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The server may include, but is not limited to, a processor 30, a memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of a server 3 and is not intended to be limiting of server 3, and may include more or fewer components than those shown, or some components in combination, or different components, e.g., the server may also include input output devices, network access devices, buses, etc.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 31 may be an internal storage unit of the server 3, such as a hard disk or memory of the server 3. The memory 31 may also be an external storage device of the server 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the server 3. Further, the memory 31 may include both an internal storage unit and an external storage device of the server 3. The memory 31 is used to store the computer program and other programs and data required by the server, and may also be used to temporarily store data that has been or will be output.
Example four
The embodiment of the present invention further provides a computer-readable storage medium, where at least one executable instruction is stored in the computer-readable storage medium, and the computer-executable instruction may execute the neural network model acceleration method in any of the above method embodiments.
The executable instructions may be operable to cause a processor to:
determining nodes of a third-party engine which are incompatible with the neural network model;
partitioning the neural network model based on incompatible nodes to obtain at least two sub-models;
converting the sub-model not including the incompatible node into a format supported by a third-party engine;
running the format-converted submodel through a third-party engine.
In a preferred embodiment, the executable instructions are further specifically configured to cause a processor to:
initiating, by a third party engine, transformation of the neural network model;
when the conversion of the third-party engine is abnormal, acquiring a currently converted node, and taking the currently converted node as an incompatible node; starting to convert the neural network model by the third-party engine with the next node of the incompatible nodes, and if the incompatible nodes exist, starting to continue the conversion step with the next node of the current incompatible nodes until the conversion is finished to obtain at least two incompatible nodes; or
When the conversion of the third-party engine is abnormal, acquiring a currently converted node, taking the currently converted node as an incompatible node, starting to convert the neural network model through the third-party engine by using a next node of the incompatible node, and if the conversion is not abnormal, obtaining an incompatible node when the conversion is finished.
In a preferred embodiment, the executable instructions are further specifically configured to cause a processor to:
dividing the neural network model based on the incompatible node to obtain two sub-models, wherein if the incompatible node is close to the starting point of the neural network model, the starting point of the neural network model and the incompatible node belong to the same sub-model; and if the incompatible node is close to the end point of the neural network model, the end point of the neural network model and the incompatible node belong to the same sub-model.
In a preferred embodiment, when there are at least two incompatible nodes and any incompatible node does not belong to the start point or the end point of the neural network model, the executable instructions may be further specifically configured to cause the processor to:
dividing the neural network model into K +1 submodels based on K incompatible nodes, wherein if the data length between the Mth incompatible node and the M-1 st incompatible node is larger than or equal to the data length between the Mth incompatible node and the M +1 st incompatible node, the Mth incompatible node and the M +1 st incompatible node belong to the same submodel, and the Mth incompatible node and the M-1 st incompatible node belong to different submodels; if the data length between the Mth incompatible node and the M-1 th incompatible node is smaller than the data length between the Mth incompatible node and the M +1 th incompatible node, the Mth incompatible node and the M-1 th incompatible node belong to the same submodel, and the Mth incompatible node and the M +1 th incompatible node belong to different submodels, wherein K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
In a preferred embodiment, the executable instructions are further specifically configured to cause a processor to perform the following operations:
controlling a third-party engine to start converting the neural network model;
when the conversion of the third-party engine is abnormal, acquiring a currently converted node, taking the currently converted node as an incompatible node, and acquiring the ratio of the data length from the starting point of the neural network model to the incompatible node to the data length corresponding to the neural network model;
if the ratio is larger than or equal to a preset value, the neural network model is segmented based on the incompatible node to obtain two sub-models, wherein the starting point of the neural network model and the incompatible node are in different sub-models.
In a preferred embodiment, the executable instructions are further specifically configured to cause a processor to:
if the ratio is smaller than the preset value, taking an incompatible node as a first incompatible node, and controlling the third-party engine to convert the neural network model from a node next to the first incompatible node;
when the third-party engine is abnormally converted, acquiring a currently converted node, taking the currently converted node as a second incompatible node, acquiring a ratio of a data length between a first incompatible node and the second incompatible node to a data length corresponding to the neural network model, if the ratio is smaller than the preset value, starting to control the third-party engine to convert the neural network model by using a next node of the second incompatible node, and continuing to acquire the next incompatible node until the ratio of the data length between an Nth incompatible node and an (N-1) th incompatible node to the data length of the neural network model is larger than or equal to the preset value, and stopping acquiring the next incompatible node, wherein N is larger than or equal to 2;
and then, segmenting the neural network model based on the (N-1)th incompatible node and the Nth incompatible node to obtain three sub-models, wherein the starting point of the neural network model up to the (N-1)th incompatible node belongs to one sub-model, the node following the (N-1)th incompatible node up to the node preceding the Nth incompatible node belongs to another sub-model, and the Nth incompatible node up to the end point of the neural network model belongs to the remaining sub-model of the three.
In a preferred embodiment, the executable instructions are further specifically configured to cause a processor to:
if, by the time the conversion is finished, the ratio of the data length from the Nth incompatible node to the (N-1)th incompatible node to the data length of the neural network model has never been greater than or equal to the preset value, acquiring the data length between every two adjacent incompatible nodes, and taking the maximum of the at least two data lengths obtained;
then segmenting the neural network model based on the two target incompatible nodes corresponding to the maximum data length to obtain three corresponding submodels, wherein one submodel does not include either of the two target incompatible nodes, and the other two submodels each include one of the two target incompatible nodes.
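A sketch of this fallback selection under the same illustrative representation as the previous snippets; it assumes the scan collected at least two incompatible nodes.

```python
def split_at_widest_gap(nodes, lengths, incompatible):
    """Fallback split around the two adjacent incompatible nodes with the
    largest data length between them (assumes at least two were found)."""
    gaps = [sum(lengths[a + 1:b])
            for a, b in zip(incompatible, incompatible[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    a, b = incompatible[i], incompatible[i + 1]
    # the middle submodel excludes both target incompatible nodes;
    # the outer two submodels each contain one of them
    return [nodes[:a + 1], nodes[a + 1:b], nodes[b:]]
```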
In a preferred embodiment, the executable instructions are further specifically configured to cause a processor to:
and calling a conversion function of the third-party engine to start converting the neural network model.
In a preferred embodiment, the executable instructions may be specifically configured to cause a processor to:
running a submodel that includes an incompatible node on the CPU.
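Putting the pieces together, the dispatch described here might look like the following sketch; engine.run, cpu_backend.run, and the forwarding of intermediate outputs between submodels are assumptions for illustration, not a real third-party API.

```python
def run_segmented(submodels, engine, cpu_backend, has_incompatible):
    """Feed each submodel's output into the next, running compatible
    submodels on the third-party engine and the rest on the CPU."""
    outputs = None
    for sub in submodels:
        backend = cpu_backend if has_incompatible(sub) else engine
        outputs = backend.run(sub, outputs)  # chain intermediate results
    return outputs
```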
In the invention, the third-party engine is used for acceleration; when an incompatibility occurs, the third-party engine still accelerates the compatible parts through segmented execution, so that the overall running speed of the neural network model is increased and the operation efficiency is improved.
If there are multiple incompatible nodes, the part of the neural network model that lies between two incompatible nodes and excludes both of them is selected for execution by the third-party engine, either because its data length is not less than the preset value or because it corresponds to the maximum data length, which further improves the operation efficiency.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art would appreciate that the modules, elements, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (12)
1. A neural network model acceleration method, the method comprising:
determining nodes of the neural network model that are incompatible with a third-party engine;
partitioning the neural network model based on the incompatible nodes to obtain at least two sub-models;
converting sub-models that do not include the incompatible node into a format supported by the third party engine;
running the format-converted submodel through the third-party engine.
2. The method of claim 1, wherein the determining nodes of the neural network model that are incompatible with the third-party engine comprises:
initiating, by the third-party engine, transformation of the neural network model;
when the third-party engine encounters a conversion exception, acquiring a currently converted node, and taking the currently converted node as an incompatible node; resuming conversion of the neural network model by the third-party engine from the node following the incompatible node, and, if a further incompatible node is encountered, continuing the conversion from the node following the current incompatible node until the conversion is finished, so as to obtain at least two incompatible nodes; or
when the third-party engine encounters a conversion exception, acquiring a currently converted node, taking the currently converted node as an incompatible node, and resuming conversion of the neural network model through the third-party engine from the node following the incompatible node; if no further conversion exception occurs, one incompatible node is obtained when the conversion is finished.
3. The method according to claim 2, wherein when there is one incompatible node, the partitioning the neural network model based on the incompatible node is specifically:
dividing the neural network model based on the incompatible node to obtain two submodels, wherein if the incompatible node is nearer to the starting point of the neural network model, the starting point of the neural network model and the incompatible node belong to the same submodel; and if the incompatible node is nearer to the end point of the neural network model, the end point of the neural network model and the incompatible node belong to the same submodel.
4. The method according to claim 2, wherein there are at least two incompatible nodes, and when no incompatible node is the starting point or the end point of the neural network model, the segmenting of the neural network model based on the incompatible nodes to obtain at least two submodels is specifically:
dividing the neural network model into K+1 submodels based on the K incompatible nodes, wherein if the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M+1)th incompatible nodes belong to the same submodel and the Mth and (M-1)th incompatible nodes belong to different submodels; and if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M-1)th incompatible nodes belong to the same submodel and the Mth and (M+1)th incompatible nodes belong to different submodels, wherein K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
5. The method of claim 1, wherein the determining nodes of the neural network model that are incompatible with the third-party engine comprises:
initiating, by the third-party engine, transformation of the neural network model;
when the third-party engine encounters a conversion exception, acquiring the currently converted node, taking the currently converted node as an incompatible node, and acquiring the ratio between the data length from the starting point of the neural network model to the incompatible node and the total data length of the neural network model;
if the ratio is greater than or equal to a preset value, proceeding to the step of segmenting the neural network model based on the incompatible node;
wherein the partitioning of the neural network model based on the incompatible node is specifically:
and partitioning the neural network model based on the incompatible node to obtain two submodels, wherein the starting point of the neural network model and the incompatible node are in different submodels.
6. The method of claim 5, further comprising:
if the ratio is smaller than the preset value, taking the incompatible node as a first incompatible node, and converting the neural network model through the third-party engine starting from the node following the first incompatible node;
when the third-party engine encounters another conversion exception, acquiring the currently converted node and taking it as a second incompatible node, and acquiring the ratio between the data length from the first incompatible node to the second incompatible node and the total data length of the neural network model; if this ratio is smaller than the preset value, converting the neural network model through the third-party engine starting from the node following the second incompatible node, and continuing to acquire the next incompatible node until the ratio between the data length from the Nth incompatible node to the (N-1)th incompatible node and the data length of the neural network model is greater than or equal to the preset value, then stopping acquiring further incompatible nodes, wherein N is greater than or equal to 2;
the method comprises the following steps of segmenting the neural network model based on the incompatible node to obtain at least two submodels:
segmenting the neural network model based on the (N-1)th incompatible node and the Nth incompatible node to obtain three submodels, wherein the nodes from the starting point of the neural network model to the (N-1)th incompatible node belong to one submodel, the nodes from the node following the (N-1)th incompatible node to the node preceding the Nth incompatible node belong to a second submodel, and the nodes from the Nth incompatible node to the end point of the neural network model belong to the third of the three submodels.
7. The method of claim 6, further comprising:
if, by the time the conversion is finished, the ratio between the data length from the Nth incompatible node to the (N-1)th incompatible node and the data length of the neural network model has never reached the preset value, acquiring the data length between every two adjacent incompatible nodes, and taking the maximum of the at least two acquired data lengths;
wherein the segmenting of the neural network model based on the incompatible nodes to obtain at least two submodels is specifically:
segmenting the neural network model based on the two target incompatible nodes corresponding to the maximum data length to obtain three corresponding submodels, wherein one submodel does not include either of the two target incompatible nodes, and the other two submodels each include one of the two target incompatible nodes.
8. The method according to claim 2 or 5, wherein the initiating, by the third-party engine, of the transformation of the neural network model is specifically:
and calling a conversion function of the third-party engine to start converting the neural network model.
9. The method of any one of claims 1 to 7, wherein after the neural network model is segmented based on the incompatible nodes to obtain at least two submodels, the method further comprises:
running a submodel that includes an incompatible node on a CPU.
10. A neural network model acceleration apparatus, comprising:
a determining module, configured to determine nodes of the neural network model that are incompatible with a third-party engine;
a partitioning module, configured to partition the neural network model based on the incompatible node to obtain at least two submodels;
a conversion module, configured to convert a submodel that does not include the incompatible node into a format supported by the third-party engine;
and a control module, configured to control the third-party engine to run the format-converted submodel.
11. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 9 are implemented when the computer program is executed by the processor.
12. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910914935.3A CN112561044B (en) | 2019-09-26 | 2019-09-26 | Neural network model acceleration method and device, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112561044A true CN112561044A (en) | 2021-03-26 |
CN112561044B CN112561044B (en) | 2023-07-14 |
Family
ID=75029642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910914935.3A Active CN112561044B (en) | 2019-09-26 | 2019-09-26 | Neural network model acceleration method and device, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112561044B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150100671A1 (en) * | 2013-10-09 | 2015-04-09 | International Business Machines Corporation | Identifying Compatible System Configurations |
US20170279847A1 (en) * | 2016-03-25 | 2017-09-28 | Cisco Technology, Inc. | Increased granularity and anomaly correlation using multi-layer distributed analytics in the network |
CN109416928A (en) * | 2016-06-07 | 2019-03-01 | 伊路米纳有限公司 | For carrying out the bioinformatics system, apparatus and method of second level and/or tertiary treatment |
CN109690475A (en) * | 2016-09-30 | 2019-04-26 | 英特尔公司 | Hardware accelerator and method for transfer operation |
CN110197253A (en) * | 2018-02-27 | 2019-09-03 | 意法半导体国际有限公司 | The arithmetical unit accelerated for deep learning |
CN109727376A (en) * | 2018-12-29 | 2019-05-07 | 北京沃东天骏信息技术有限公司 | Generate the method, apparatus and selling apparatus of configuration file |
Non-Patent Citations (2)
Title |
---|
NOI QUANG TRUONG: "Deep Learning-Based Super-Resolution Reconstruction and Marker Detection for Drone Landing", IEEE Access *
BI SHENG: "Development and Application of Embedded Artificial Intelligence Technology", Electronic Products World *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023160508A1 (en) * | 2022-02-25 | 2023-08-31 | 华为技术有限公司 | Model management method and communication apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN112561044B (en) | 2023-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11302303B2 (en) | Method and device for training an acoustic model | |
DE102019109148A1 (en) | WAKE-ON-VOICE KEY PHRASE SEGMENTATION | |
CN112671232B (en) | LLC resonant circuit control method and device and terminal equipment | |
CN109746405B (en) | Automatic control method and device for casting blank roller way, terminal equipment and storage medium | |
CN111882038A (en) | Model conversion method and device | |
CN109102468B (en) | Image enhancement method and device, terminal equipment and storage medium | |
CN111275166B (en) | Convolutional neural network-based image processing device, equipment and readable storage medium | |
CN112561044A (en) | Neural network model acceleration method and device, server and storage medium | |
US20190392312A1 (en) | Method for quantizing a histogram of an image, method for training a neural network and neural network training system | |
CN111667045A (en) | Multi-channel neural network model training method and device and computer storage medium | |
WO2018228528A1 (en) | Batch circuit simulation method and system | |
CN114912282A (en) | Simulation calculation method, device, equipment and storage medium | |
CN112148470B (en) | Parameter synchronization method, computer device and readable storage medium | |
CN114936187A (en) | Data file processing method, device, equipment and storage medium | |
CN111967395A (en) | Bank bill identification method and device | |
CN112261023A (en) | Data transmission method and device of convolutional neural network | |
CN109710314B | A method for constructing a graph in a distributed parallel mode based on a graph structure | |
CN113160942A (en) | Image data quality evaluation method and device, terminal equipment and readable storage medium | |
CN110955515A (en) | File processing method and device, electronic equipment and storage medium | |
CN106919341A (en) | A kind of method and device for issuing I/O | |
CN113991678B (en) | Stability control analysis method, device, medium and equipment for power system | |
CN114251214B (en) | Fractional order power system chaotic state judgment method and device | |
CN117852655B (en) | Method for reasoning by using large model and electronic equipment | |
Tao et al. | Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers | |
CN116737925A (en) | Model bypass optimization method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||