CN112561044B - Neural network model acceleration method and device, server and storage medium


Info

Publication number
CN112561044B
Authority
CN
China
Prior art keywords: node, incompatible, neural network, network model, sub-model
Prior art date
Legal status: Active
Application number
CN201910914935.3A
Other languages
Chinese (zh)
Other versions
CN112561044A (en)
Inventor
Jiang Tao (蒋焘)
Current Assignee
Xian Wingtech Electronic Technology Co Ltd
Original Assignee
Xian Wingtech Electronic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xian Wingtech Electronic Technology Co Ltd
Priority to CN201910914935.3A
Publication of CN112561044A
Application granted
Publication of CN112561044B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Combined Controls Of Internal Combustion Engines (AREA)

Abstract

The invention belongs to the technical field of deep learning and provides a neural network model acceleration method and device, a server and a storage medium. The method comprises the following steps: determining the nodes of the neural network model that are incompatible with a third-party engine; dividing the neural network model based on the incompatible nodes to obtain at least two sub-models; converting a sub-model that does not include an incompatible node into a format supported by the third-party engine; and running the format-converted sub-model through the third-party engine. In the invention, the third-party engine is used for acceleration; when an incompatibility occurs, the third-party engine accelerates the compatible parts in a segmented execution mode, so that the overall running speed of the neural network model can be improved.

Description

Neural network model acceleration method and device, server and storage medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a neural network model acceleration method and apparatus, a server, and a storage medium.
Background
A neural network model is a computational model that mimics the structure and function of a biological neural network (the central nervous system of an animal, particularly the brain) and is used to estimate or approximate functions. As research on neural network algorithms has progressed, their accuracy has surpassed that of traditional machine learning algorithms in many applications. Neural network algorithms are therefore gradually replacing conventional algorithms and are beginning to be deployed on terminal devices.
Although neural network algorithms are highly accurate, their computational load is huge, which consumes considerable memory bandwidth and power. Terminal devices are often embedded devices with strict efficiency requirements, and a traditional CPU can hardly carry the task alone. To obtain higher execution efficiency, a third-party acceleration engine is needed to perform format conversion on the frozen neural network model so that it can execute on a GPU or the like. However, incompatibility between the third-party engine and the model prevents acceleration in the real sense and affects the execution efficiency of the neural network algorithm.
Therefore, a new technical solution is needed to solve the above technical problems.
Disclosure of Invention
In view of this, the embodiment of the invention provides a neural network model acceleration method and device, a server and a storage medium, which solve the problem of low operation efficiency of the neural network model in the prior art.
A first aspect of an embodiment of the present invention provides a neural network model acceleration method, the method comprising:
determining the nodes of the neural network model that are incompatible with a third-party engine;
dividing the neural network model based on the incompatible nodes to obtain at least two sub-models;
converting a sub-model that does not include an incompatible node into a format supported by the third-party engine;
and running the format-converted sub-model through the third-party engine.
A second aspect of an embodiment of the present invention provides a neural network model acceleration apparatus, the apparatus comprising:
a determining module, configured to determine the nodes of the neural network model that are incompatible with a third-party engine;
a segmentation module, configured to segment the neural network model based on the incompatible nodes to obtain at least two sub-models;
a conversion module, configured to convert a sub-model that does not include an incompatible node into a format supported by the third-party engine;
and a control module, configured to control the third-party engine to run the format-converted sub-model.
A third aspect of an embodiment of the present invention also provides a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method mentioned in the first aspect when executing the computer program.
A fourth aspect of an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first aspect.
Compared with the prior art, the embodiments of the invention have the following beneficial effect: when the third-party engine is used for acceleration and an incompatibility occurs, the third-party engine accelerates the compatible parts in a segmented execution mode, so that the overall running speed of the neural network model, and hence the operation efficiency, can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for accelerating a neural network model based on a third party engine according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a neural network model acceleration device based on a third party engine according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a server according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical solution of the present invention, the following description is given by way of specific examples.
Example 1
Fig. 1 is a flowchart of a method for accelerating a neural network model according to an embodiment of the present invention, where the method may include the following steps:
Step S1: determining the nodes where the third-party engine is incompatible with the neural network model.
Specifically, in an embedded system, a third-party engine is generally relied on to increase execution speed. Because the neural network model has already been frozen, it must first be converted before it can be executed by the third-party engine. During this conversion, incompatibility between the third-party engine and the frozen neural network model can occur, so the incompatible nodes between the third-party engine and the neural network model must be determined first; that is, a node at which an error occurs while the third-party engine converts the neural network model is an incompatible node.
In a preferred aspect of this embodiment, the third-party engine may be a neural network acceleration engine, such as the Qualcomm Snapdragon Neural Processing Engine (SNPE).
In a further preferred aspect of this embodiment, the step S1 specifically includes:
first, the neural network model is converted by the third party engine.
Specifically, the execution file of the neural network model is first converted into a format compatible with the third-party engine, and the format-converted neural network model is then executed by the third-party engine; for example, the conversion function of the third-party engine is called to convert the format of the neural network model.
Then, during the conversion, if the third-party engine raises a conversion abnormality, the acceleration engine is incompatible with the neural network model, and the node currently being converted is obtained: this is an incompatible node. It should be noted that there may be several incompatible nodes. After one incompatible node has been found, conversion of the neural network model can resume from the node after it in order to find the others; whenever a conversion abnormality occurs, the corresponding incompatible node is recorded, and the foregoing steps are repeated until the conversion is complete, yielding all the incompatible nodes.
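The following is a minimal sketch, not part of SNPE itself, of how such a probe might be scripted: each node name is tried as the converter's output node, and a failed conversion marks that node as incompatible. The CLI name and flags follow the SNPE 1.x commands quoted later in this description and may differ in other releases; the probe function itself is an illustrative assumption.

import subprocess

# Probe every node of a frozen TensorFlow model with the SNPE converter.
# A node whose conversion attempt fails is recorded as incompatible.
def find_incompatible_nodes(pb_path, node_names):
    incompatible = []
    for name in node_names:  # node names in execution order
        result = subprocess.run(
            ["snpe-tensorflow-to-dlc", "--graph", pb_path,
             "--input_dim", "input", "1,160,160,3",
             "--out_node", name, "--dlc", "/tmp/probe.dlc",
             "--allow_unconsumed_nodes"],
            capture_output=True, text=True)
        if "Conversion failed" in result.stderr + result.stdout:
            incompatible.append(name)  # conversion abnormality at this node
    return incompatible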
Step S2: dividing the neural network model based on the incompatible nodes to obtain at least two sub-models.
Specifically, the original neural network model (i.e., the neural network model that has not undergone format conversion) is partitioned into at least two sub-models with the incompatible nodes as boundaries. Some sub-models include incompatible nodes and some do not; the sub-models that do not include incompatible nodes are run by the third-party engine.
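As an illustration of one way such a split could be implemented for a frozen TensorFlow graph, the sketch below uses TensorFlow's extract_sub_graph utility to export the front sub-model; the function name and the choice of boundary node are assumptions for illustration, not the patent's prescribed implementation.

import tensorflow as tf

# Export the front part of a frozen graph up to a chosen boundary node.
# Pass the node just before the incompatible one so that the incompatible
# node stays in the back sub-model, as in the facenet example below.
def export_front_submodel(pb_path, boundary_node, out_path):
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    front = tf.compat.v1.graph_util.extract_sub_graph(graph_def, [boundary_node])
    with tf.io.gfile.GFile(out_path, "wb") as f:
        f.write(front.SerializeToString())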
Step S3: converting the sub-models that do not include incompatible nodes into a format supported by the third-party engine.
In particular, since the sub-models that do not include incompatible nodes are run by the third-party engine, they must be converted into a format supported by that engine, i.e., a format that can run on a GPU or DSP, such as the DLC format supported by the Qualcomm SNPE engine.
Step S4: running the format-converted sub-models through the third-party engine.
Specifically, after the neural network model is segmented, the sub-models excluding incompatible nodes are run by the third-party engine, while the sub-models including incompatible nodes are still run in the original manner, for example on the CPU.
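A hedged sketch of such a pipelined run follows: the converted sub-model executes on the GPU through SNPE's net-run tool, and its raw output is fed to the incompatible sub-model on the CPU through TensorFlow. The output file path and the tensor names are illustrative assumptions; only the snpe-net-run flags follow the SNPE 1.x CLI.

import subprocess
import numpy as np
import tensorflow as tf

def run_split_model(dlc_path, input_list, part2_pb,
                    feed_tensor, fetch_tensor, inter_shape):
    # Stage 1: run the compatible, format-converted sub-model on the GPU.
    subprocess.run(["snpe-net-run", "--container", dlc_path,
                    "--input_list", input_list, "--use_gpu"], check=True)
    # snpe-net-run writes raw output tensors under ./output; the exact file
    # name below is an illustrative assumption.
    inter = np.fromfile("output/Result_0/out.raw", dtype=np.float32)
    # Stage 2: run the incompatible sub-model on the CPU in TensorFlow.
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(part2_pb, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.compat.v1.import_graph_def(graph_def, name="")
    with tf.compat.v1.Session(graph=graph) as sess:
        # feed/fetch are tensor names such as "InceptionResnetV1/.../Reshape:0".
        return sess.run(fetch_tensor, {feed_tensor: inter.reshape(inter_shape)})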
In this embodiment, when incompatibility occurs, the third party engine accelerates the compatible part in a segmented execution manner, so that the overall running speed of the neural network model can be improved.
In a preferred aspect of this embodiment, the method may further include, before step S1:
judging whether the third-party engine is compatible with the neural network model; if not, proceeding to step S1; if so, directly running the neural network model on the GPU or DSP through the third-party engine.
In a preferred aspect of this embodiment, the step S1 specifically includes:
the neural network model is converted by the third party engine.
When the third-party engine raises a conversion abnormality, the node currently being converted is obtained; this node is an incompatible node.
Conversion of the neural network model by the third-party engine then restarts from the node after the incompatible node; if further incompatible nodes exist, the conversion step continues from the node after the current incompatible node each time, until the conversion is complete, finally yielding at least two incompatible nodes.
Specifically, to obtain the incompatible nodes, the neural network model is first converted by the third-party engine. If a conversion abnormality occurs, the current node is incompatible, and the node currently being converted is taken as an incompatible node. The third-party engine then converts the neural network from the node after that incompatible node; if another incompatible node is still found, the conversion step continues from the node after it, and so on until the conversion is complete, finally yielding at least two incompatible nodes.
In a preferred version of this embodiment, the following case may also arise: when the third-party engine raises a conversion abnormality, the node currently being converted is obtained and taken as an incompatible node, and conversion of the neural network model by the third-party engine restarts from the node after it. If no further conversion abnormality occurs, a single incompatible node is obtained when the conversion finishes; that is, the neural network model was converted by the third-party engine with only one interruption. In this case, if the incompatible node is close to (or at) the starting point of the neural network model, the starting point and the incompatible node belong to the same sub-model; if the incompatible node is close to (or at) the end point of the neural network model, the end point and the incompatible node belong to the same sub-model. It should be noted that the starting point refers to the first node of the neural network model and the end point to its last node.
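A minimal sketch of this placement rule, under the assumption that each node's data offset from the model start is known:

# pos[i] is node i's data offset from the model's starting point. The lone
# incompatible node joins whichever end of the model it is closer to, so the
# long remaining stretch stays convertible by the third-party engine.
def attach_single_incompatible(pos, node_idx):
    to_start = pos[node_idx] - pos[0]
    to_end = pos[-1] - pos[node_idx]
    return "start side" if to_start <= to_end else "end side"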
In a further preferred aspect of this embodiment, when no incompatible node falls on the starting point or the end point of the neural network model, step S2 is specifically:
dividing the neural network model into K+1 sub-models based on K incompatible nodes, where: if the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M+1)th incompatible nodes belong to the same sub-model and the Mth and (M-1)th incompatible nodes belong to different sub-models; if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M-1)th incompatible nodes belong to the same sub-model and the Mth and (M+1)th incompatible nodes belong to different sub-models; K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
For example: suppose the neural network model comprises 8 nodes (the 1st node being the starting point and the 8th node the end point) and is to be divided into 4 sub-models, namely sub-models A, B, C and D, and suppose the 2nd, 5th and 7th nodes are the first, second and third incompatible nodes respectively. The data length between the first and second incompatible nodes (the 2nd and 5th nodes) and the data length between the second and third incompatible nodes (the 5th and 7th nodes) are obtained. When the data length between the 2nd and 5th nodes is greater than the data length between the 5th and 7th nodes, the 5th node is grouped with the 7th node: the part from the 1st node to the 2nd node, including the 2nd node, is put into sub-model A; the part after the 2nd node and before the 5th node, including neither of them, is put into sub-model B; the part from the 5th node to the 7th node, including both, is put into sub-model C; and the part after the 7th node, i.e., the 8th node, is put into sub-model D. Sub-models B and D include no incompatible nodes and are run by the third-party engine; sub-models A and C include incompatible nodes and execute on the CPU.
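The sketch below transcribes this K-node rule under stated assumptions: pos gives each node's data offset, inc lists the incompatible node indices in order, and, since the patent only defines the comparison between interior incompatible nodes, letting the first and last incompatible nodes stay with the preceding segment by default is an assumption that happens to reproduce the 8-node example above.

def split_points(pos, inc):
    attach_next = [False] * len(inc)
    for m in range(1, len(inc) - 1):
        left = pos[inc[m]] - pos[inc[m - 1]]   # gap to the (M-1)th node
        right = pos[inc[m + 1]] - pos[inc[m]]  # gap to the (M+1)th node
        attach_next[m] = left >= right         # longer left gap: join the next
    # Cut before a node that joins the next segment, after one that stays.
    cuts = sorted({inc[m] if attach_next[m] else inc[m] + 1
                   for m in range(len(inc))})
    return cuts  # each cut is the first node index of a new sub-model

# Example above: 8 nodes (indices 0..7), incompatible at indices 1, 4, 6.
# split_points(list(range(8)), [1, 4, 6]) returns [2, 4, 7], i.e. sub-models
# {0,1}, {2,3}, {4,5,6}, {7}, matching A, B, C and D.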
In another preferred aspect of the present embodiment, the step S1 specifically includes:
the neural network model is converted by the third party engine.
When the third-party engine raises a conversion abnormality, the node currently being converted is obtained; this node is an incompatible node. The ratio of the data length between the starting point of the neural network model and the incompatible node to the total data length of the neural network model is then obtained, i.e., the data length between the starting point and the incompatible node is divided by the total data length of the neural network model.
If the ratio is greater than or equal to a preset value, the method proceeds to step S2 and the neural network model is segmented directly: it is divided into two sub-models based on the incompatible node, with the starting point of the neural network model and the incompatible node in different sub-models. Since the ratio of the data length from the starting point to the incompatible node to the total data length exceeds the preset value, the model is worth segmenting at this node: the part from the starting point to the node before the incompatible node becomes one sub-model, and the part from the incompatible node to the end point becomes the other. The preset value may be set according to the practical situation, for example 0.1 to 0.9; preferably it is 0.9, i.e., the effect is best when the data length between the starting point and the incompatible node accounts for 90% of the total data length of the neural network model.
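A one-line sketch of this test, with 0.9 standing in for the preferred 90% setting:

def worth_direct_split(start_to_node_len, total_len, preset=0.9):
    # True when the compatible front part covers a large enough share of the
    # model to justify cutting it into just two sub-models at this node.
    return start_to_node_len / total_len >= preset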
If the ratio is smaller than the preset value, the incompatible node is taken as the first incompatible node, and conversion of the neural network model by the third-party engine restarts from the node after the first incompatible node;
when the third-party engine raises another conversion abnormality, the node currently being converted is obtained and taken as the second incompatible node, and the ratio of the data length between the first and second incompatible nodes to the total data length of the neural network model is obtained. If this ratio is smaller than the preset value, conversion of the neural network model by the third-party engine restarts from the node after the second incompatible node and the next incompatible node is sought, and so on, until the ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model is greater than or equal to the preset value, at which point acquisition of further incompatible nodes stops, where N is greater than or equal to 2.
For example, suppose the neural network model has 15 nodes. If a conversion abnormality occurs at the 3rd node, the 3rd node is the first incompatible node; if the ratio of the data length from the 1st node to the 3rd node to the total data length is smaller than the preset value, conversion restarts from the 4th node. If a conversion abnormality then occurs at the 6th node, the 6th node is the second incompatible node; if the ratio of the data length from the 3rd node to the 6th node to the total data length is still smaller than the preset value, conversion restarts from the 7th node, and so on, until the ratio of the data length between two adjacent incompatible nodes to the total data length of the neural network model is greater than or equal to the preset value. For example, if conversion abnormalities also occur at the 9th and 12th nodes, and the ratio of the data length between the 9th and 12th incompatible nodes to the total data length is greater than or equal to the preset value, no further incompatible nodes are acquired.
If there is an Nth incompatible node such that the ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model is greater than or equal to the preset value, step S2 is specifically: dividing the neural network model based on the (N-1)th and Nth incompatible nodes to obtain three sub-models, where the part from the starting point of the neural network model to the (N-1)th incompatible node belongs to one sub-model, the part from the node after the (N-1)th incompatible node to the node before the Nth incompatible node belongs to another sub-model, and the part from the Nth incompatible node to the end point of the neural network model belongs to the third. In other words, the neural network model is partitioned into three segments (each segment representing a sub-model): the first segment runs from the starting point to the (N-1)th incompatible node, the second from the node after the (N-1)th incompatible node to the node before the Nth incompatible node, and the third from the Nth incompatible node to the end point. The second segment is the one compatible with the third-party engine.
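The sketch below illustrates this scan and three-way split. find_next(i) stands in for one converter attempt starting at node i and returns the next incompatible node index or None; it is an illustrative assumption, not an SNPE call.

def scan_and_split(pos, total_len, find_next, preset=0.9):
    inc = []
    node = find_next(0)
    while node is not None:
        inc.append(node)
        if len(inc) >= 2 and (pos[inc[-1]] - pos[inc[-2]]) / total_len >= preset:
            a, b = inc[-2], inc[-1]  # (a, b) bounds the long compatible run
            # Three segments: start..a, a+1..b-1 (engine-compatible), b..end.
            return [(0, a), (a + 1, b - 1), (b, len(pos) - 1)]
        node = find_next(node + 1)   # resume after the failing node
    return None  # no adjacent pair reached the preset; see the fallback below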
In a modification of this embodiment, if by the time the conversion is complete no ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model has reached the preset value, the data length between every two adjacent incompatible nodes is obtained, and the maximum of the at least two data lengths thus obtained is determined.
Specifically, whenever an incompatible node is acquired and the ratio of the data length from it to the previous incompatible node to the total data length of the neural network model is smaller than the preset value, the next incompatible node must be sought and the corresponding ratio computed and compared with the preset value, until all nodes have been tried. If no such ratio ever reaches the preset value, the data length between every two adjacent incompatible nodes is acquired, and the maximum of the at least two data lengths thus acquired is determined.
In a further preferred form of this modification, step S2 is specifically:
segmenting the neural network model based on the two target incompatible nodes corresponding to the maximum data length, obtaining three corresponding sub-models, of which one includes neither target incompatible node and the other two each include one of them.
For example: if the incompatible nodes bounding the longest data length are b1 and b2, the neural network model is divided into three segments (i.e., three sub-models): the part from the starting point of the neural network model to b1 is the first segment, the part after b1 and before b2 is the second segment, and the part from b2 to the end point of the neural network model is the third segment. Since the longest part is executed by the third-party engine, the processing efficiency is effectively improved.
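A sketch of this fallback: with no gap reaching the preset value, the model is split around the two incompatible nodes bounding the widest gap, so the longest compatible stretch still runs on the third-party engine.

def split_at_max_gap(pos, inc):
    widest = max(range(len(inc) - 1),
                 key=lambda i: pos[inc[i + 1]] - pos[inc[i]])
    b1, b2 = inc[widest], inc[widest + 1]  # the two target incompatible nodes
    return [(0, b1), (b1 + 1, b2 - 1), (b2, len(pos) - 1)]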
In a preferred embodiment of the present invention, since there are several incompatible nodes, the neural network model may be divided into several sub-models based on them. Some of these sub-models are compatible with the third-party engine; the converted, compatible sub-models are executed by the third-party engine, while the other, incompatible ones are still executed in the original manner.
It should be noted that, after the neural network model is divided into at least two sub-models, the sub-models compatible with the third-party engine undergo format conversion by the third-party engine to obtain the format-converted sub-models.
In a preferred scheme of this embodiment, take the Qualcomm SNPE acceleration engine and the face recognition model facenet.pb as an example. To improve the execution efficiency of the facenet.pb model, it is first converted through SNPE; when an error is found, the node where the error occurs is obtained and taken as an incompatible node. With node a as the boundary, the neural network model is divided into two sub-models P1 (the first) and P2 (the second), with node a located in P2. P1 is then converted into the DLC format and executed on the GPU or DSP through SNPE, while P2 is executed on the CPU in the original manner, which improves the overall execution efficiency.
Specifically, the Qualcomm neural network acceleration engine is used to convert the neural network model into the DLC format supported by the SNPE engine, and the incompatible nodes can be determined as follows. If an incompatibility is found while the third-party engine converts the neural network model (the face recognition model facenet.pb) into the DLC format, for example an error prompt such as "ERROR - Conversion failed: ElementWise resolver must implement broadcast method" pops up during the conversion, the node currently being converted is abnormal, i.e., an incompatible node. At this point all nodes of the neural network model can be printed out and judged empirically, or a conversion command can be invoked to try each node one by one to see at which node the conversion fails, replacing the value of the --out_node parameter of the following command one node at a time, where embeddings is the last node (i.e., the end point) of the neural network model: snpe-tensorflow-to-dlc --graph facenet.pb --input_dim input 1,160,160,3 --out_node embeddings --dlc facenet.dlc --allow_unconsumed_nodes.
Further, the conversion of facenet.pb into facenet.dlc proceeds as follows. The conversion command is entered: "snpe-tensorflow-to-dlc --graph facenet.pb --input_dim input 1,160,160,3 --out_node embeddings --dlc facenet.dlc --allow_unconsumed_nodes".
While this command executes, the log records are, specifically:

2019-04-22 09:56:57.578906: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-22 09:56:57.602994: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3500000000 Hz
2019-04-22 09:56:57.603530: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x54467a0 executing computations on platform Host. Devices:
2019-04-22 09:56:57.603532: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>

At 2019-04-22 10:00:39.144106 the error is formally reported, and the prompt "ERROR - Conversion failed: ElementWise resolver must implement broadcast method" pops up, indicating that the conversion cannot currently proceed.
During execution, if this prompt pops up, a conversion abnormality has occurred, and analysis is performed (all nodes of the neural network model can be printed out and judged empirically, or each node can be tried one by one with the above conversion command, replacing the value of the --out_node parameter node by node, embeddings being the last node of the neural network model). Suppose the analysis shows that SNPE does not support the InceptionResnetV1/Logits/Reshape node in facenet.pb. At this point facenet.pb is divided into two pb files, facenet_part1.pb and facenet_part2.pb, with the node InceptionResnetV1/Logits/Reshape as the boundary, by entering a division instruction with TensorFlow's graph transform tool:

bazel-bin/tensorflow/tools/graph_transforms/transform_graph
--in_graph=/home/models/facenet_part1.pb
--out_graph=/home/models/facenet_part2.pb
--inputs=InceptionResnetV1/Logits/Reshape
--outputs=embeddings
--transforms='add_default_attributes remove_nodes(op=Identity, op=CheckNumerics) fold_batch_norms fold_old_batch_norms strip_unused_nodes sort_by_execution_order'

and then executing the instruction.
After the division, facenet_part1.pb is converted into facenet_part1.dlc with the snpe-tensorflow-to-dlc command, so that facenet_part1.dlc can be run at high speed on the GPU or DSP through SNPE, while the remaining facenet_part2.pb is run on the CPU by the conventional method.
In this embodiment, when the third party engine is used to accelerate, the third party engine accelerates the compatible part in a segmented execution manner when incompatibility occurs, so that the overall operation speed of the neural network model can be improved, and the operation efficiency can be improved.
If there are several incompatible nodes, the operation efficiency can be further improved by selecting the part of the neural network model between two incompatible nodes whose data length is not smaller than the preset value, or between the two incompatible nodes corresponding to the maximum data length, excluding the incompatible nodes themselves, to run through the third-party engine.
Example two
Based on the first embodiment, fig. 2 shows a schematic structural diagram of a neural network model acceleration device according to a second embodiment of the present invention. The device is configured to perform the method steps of the first embodiment; for convenience of explanation, only the portions related to the embodiments of the present invention are shown. The device comprises at least:
a determining module 1, configured to determine the nodes of the neural network model that are incompatible with the third-party engine;
a segmentation module 2, configured to segment the neural network model based on the incompatible nodes to obtain at least two sub-models;
a conversion module 3, configured to convert the sub-models that do not include incompatible nodes into a format supported by the third-party engine;
and a control module 4, configured to control the third-party engine to run the format-converted sub-models.
In a preferred embodiment, the determining module 1 is specifically configured to:
controlling a third party engine to start converting the neural network model;
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted and taking it as an incompatible node; restarting conversion of the neural network model by the third-party engine from the node after the incompatible node and, if further incompatible nodes exist, continuing the conversion step from the node after the current incompatible node until the conversion is complete, so as to obtain at least two incompatible nodes; or alternatively
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted, taking it as an incompatible node, and restarting conversion of the neural network model by the third-party engine from the node after it; if no further conversion abnormality occurs, one incompatible node is obtained when the conversion finishes. After the incompatible nodes are determined, the result is fed back to the segmentation module 2.
In a preferred embodiment, if the incompatible node is one, the splitting module 2 is specifically configured to:
dividing the neural network model based on the incompatible node to obtain two sub-models, where, if the incompatible node is close to the starting point of the neural network model, the starting point and the incompatible node belong to the same sub-model, and if the incompatible node is close to the end point of the neural network model, the end point and the incompatible node belong to the same sub-model.
If there are at least two incompatible nodes, and any incompatible node does not belong to the start point or the end point of the neural network model, the segmentation module 2 is specifically configured to:
dividing the neural network model into K+1 sub-models based on K incompatible nodes, where: if the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M+1)th incompatible nodes belong to the same sub-model and the Mth and (M-1)th incompatible nodes belong to different sub-models; if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M-1)th incompatible nodes belong to the same sub-model and the Mth and (M+1)th incompatible nodes belong to different sub-models; K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
In a preferred embodiment, the determining module 1 is specifically configured to: controlling a third party engine to start converting the neural network model;
when the third-party engine raises a conversion abnormality, acquire the node currently being converted, take it as an incompatible node, and acquire the ratio of the data length between the starting point of the neural network model and the incompatible node to the total data length of the neural network model; if the ratio is greater than or equal to the preset value, feed the result back to the segmentation module 2.
The segmentation module 2 is specifically configured to: and dividing the neural network model based on the incompatible nodes to obtain two sub-models, wherein the starting point of the neural network model and the incompatible nodes are in different sub-models.
In another preferred embodiment, if the ratio is smaller than the preset value, the determining module 1 is further configured to: taking the incompatible node as a first incompatible node, and starting from the next node of the first incompatible node, controlling a third party engine to convert the neural network model;
when the third-party engine raises another conversion abnormality, acquiring the node currently being converted, taking it as the second incompatible node, and acquiring the ratio of the data length between the first and second incompatible nodes to the total data length of the neural network model; if the ratio is smaller than the preset value, controlling the third-party engine to restart conversion of the neural network model from the node after the second incompatible node and continuing to acquire the next incompatible node, until the ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model is greater than or equal to the preset value, at which point acquisition of further incompatible nodes stops, where N is greater than or equal to 2;
At this time, the segmentation module 2 is specifically configured to: divide the neural network model based on the (N-1)th and Nth incompatible nodes to obtain three sub-models, where the part from the starting point of the neural network model to the (N-1)th incompatible node belongs to one sub-model, the part from the node after the (N-1)th incompatible node to the node before the Nth incompatible node belongs to another sub-model, and the part from the Nth incompatible node to the end point of the neural network model belongs to the third.
In a preferred embodiment, if by the time the conversion is complete no ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model has reached the preset value, the determining module 1 is further configured to: acquire the data length between every two adjacent incompatible nodes, determine the maximum of the at least two data lengths thus acquired, and feed the result back to the segmentation module 2;
at this time, the dividing module 2 is specifically configured to: and dividing the neural network model based on two target incompatible nodes corresponding to the maximum value of the data length to obtain three corresponding sub-models, wherein one sub-model does not comprise the two target incompatible nodes, and the other two sub-models respectively comprise one of the two target incompatible nodes.
In a preferred embodiment, the conversion module 3 is specifically configured to: and calling a conversion function of the third party engine to start converting the neural network model.
In a preferred embodiment, the control module 4 is further configured to: control executes a sub-model including incompatible nodes on the CPU.
The neural network model acceleration device provided by the embodiment of the invention can execute the neural network model acceleration method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example III
Fig. 3 is a schematic structural diagram of a server according to a third embodiment of the present invention. As shown in fig. 3, the server 3 of this embodiment includes: a processor 30, a memory 31 and a computer program 32 stored in said memory 31 and executable on said processor 30. The processor 30, when executing the computer program 32, implements the steps of any of the embodiments of the method described above.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used to describe the execution of the computer program 32 in the server 3. For example, the computer program 32 may perform the following operations:
determining the nodes of the neural network model that are incompatible with the third-party engine;
dividing the neural network model based on incompatible nodes to obtain at least two sub-models;
converting the submodel not including the incompatible nodes into a format supported by a third party engine;
and running the sub-model subjected to format conversion through a third party engine.
In a preferred embodiment, the computer program 32 is further operable to:
starting to transform the neural network model by a third party engine;
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted and taking it as an incompatible node; restarting conversion of the neural network model by the third-party engine from the node after the incompatible node and, if further incompatible nodes exist, continuing the conversion step from the node after the current incompatible node until the conversion is complete, so as to obtain at least two incompatible nodes; or alternatively
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted, taking it as an incompatible node, and restarting conversion of the neural network model by the third-party engine from the node after it; if no further conversion abnormality occurs, one incompatible node is obtained when the conversion finishes.
In a preferred embodiment, the computer program 32 is further operable to:
dividing the neural network model based on the incompatible node to obtain two sub-models, where, if the incompatible node is close to the starting point of the neural network model, the starting point and the incompatible node belong to the same sub-model, and if the incompatible node is close to the end point of the neural network model, the end point and the incompatible node belong to the same sub-model.
In a preferred embodiment, the computer program 32 is further operable to:
dividing the neural network model into K+1 sub-models based on K incompatible nodes, where: if the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M+1)th incompatible nodes belong to the same sub-model and the Mth and (M-1)th incompatible nodes belong to different sub-models; if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M-1)th incompatible nodes belong to the same sub-model and the Mth and (M+1)th incompatible nodes belong to different sub-models; K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
In a preferred embodiment, the computer program 32 is further operable to:
controlling a third party engine to start converting the neural network model;
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted, taking it as an incompatible node, and acquiring the ratio of the data length between the starting point of the neural network model and the incompatible node to the total data length of the neural network model;
and if the ratio is greater than or equal to a preset value, dividing the neural network model based on the incompatible node to obtain two sub-models, where the starting point of the neural network model and the incompatible node are in different sub-models.
In a preferred embodiment, the computer program 32 is further operable to:
if the ratio is smaller than the preset value, taking the incompatible node as the first incompatible node and controlling the third-party engine to convert the neural network model from the node after the first incompatible node;
when the third-party engine raises another conversion abnormality, acquiring the node currently being converted, taking it as the second incompatible node, and acquiring the ratio of the data length between the first and second incompatible nodes to the total data length of the neural network model; if the ratio is smaller than the preset value, controlling the third-party engine to restart conversion of the neural network model from the node after the second incompatible node and continuing to acquire the next incompatible node, until the ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model is greater than or equal to the preset value, at which point acquisition of further incompatible nodes stops, where N is greater than or equal to 2;
and then dividing the neural network model based on the (N-1)th and Nth incompatible nodes to obtain three sub-models, where the part from the starting point of the neural network model to the (N-1)th incompatible node belongs to one sub-model, the part from the node after the (N-1)th incompatible node to the node before the Nth incompatible node belongs to another sub-model, and the part from the Nth incompatible node to the end point of the neural network model belongs to the third.
In a preferred embodiment, the computer program 32 is further operable to:
if by the time the conversion is complete no ratio of the data length between the Nth and (N-1)th incompatible nodes to the total data length of the neural network model has reached the preset value, acquiring the data length between every two adjacent incompatible nodes and determining the maximum of the at least two data lengths thus acquired;
and then segmenting the neural network model based on the two target incompatible nodes corresponding to the maximum data length to obtain three corresponding sub-models, of which one includes neither target incompatible node and the other two each include one of them.
In a preferred embodiment, the computer program 32 is further operable to:
and calling a conversion function of the third-party engine to start converting the neural network model.
In a preferred embodiment, the computer program 32 is further operable to:
a sub-model including incompatible nodes is executed on the CPU.
The server 3 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The server may include, but is not limited to, a processor 30 and a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the server 3 and does not constitute a limitation of it; the server 3 may include more or fewer components than shown, combine certain components, or have different components, and may further include, for example, input and output devices, network access devices and buses.
The processor 30 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the server 3, such as a hard disk or a memory of the server 3. The memory 31 may be an external storage device of the server 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the server 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the server 3. The memory 31 is used for storing the computer program as well as other programs and data required by the server. The memory 31 may also be used for temporarily storing data that has been output or is to be output.
Example IV
The embodiment of the invention also provides a computer-readable storage medium storing at least one executable instruction; the computer-executable instruction can perform the neural network model acceleration method in any of the above method embodiments.
The executable instructions may be operable to cause a processor to:
determining a node of the third party engine incompatible with the neural network model;
dividing the neural network model based on incompatible nodes to obtain at least two sub-models;
converting the sub-models not including the incompatible nodes into a format supported by the third-party engine;
and running the sub-model subjected to format conversion through a third party engine.
In a preferred embodiment, the executable instructions may also be specifically configured to cause a processor to:
starting to transform the neural network model by a third party engine;
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted and taking it as an incompatible node; restarting conversion of the neural network model by the third-party engine from the node after the incompatible node and, if further incompatible nodes exist, continuing the conversion step from the node after the current incompatible node until the conversion is complete, so as to obtain at least two incompatible nodes; or alternatively
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted, taking it as an incompatible node, and restarting conversion of the neural network model by the third-party engine from the node after it; if no further conversion abnormality occurs, one incompatible node is obtained when the conversion finishes.
In a preferred embodiment, the executable instructions may also be specifically configured to cause a processor to:
dividing the neural network model based on the incompatible node to obtain two sub-models, where, if the incompatible node is close to the starting point of the neural network model, the starting point and the incompatible node belong to the same sub-model, and if the incompatible node is close to the end point of the neural network model, the end point and the incompatible node belong to the same sub-model.
In a preferred embodiment, there are at least two incompatible nodes, and when no incompatible node falls on the starting point or the end point of the neural network model, the executable instructions are further specifically configured to cause the processor to perform:
dividing the neural network model into K+1 sub-models based on K incompatible nodes, where: if the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M+1)th incompatible nodes belong to the same sub-model and the Mth and (M-1)th incompatible nodes belong to different sub-models; if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth and (M-1)th incompatible nodes belong to the same sub-model and the Mth and (M+1)th incompatible nodes belong to different sub-models; K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
In a preferred embodiment, the executable instructions may also be specifically configured to cause a processor to:
controlling a third party engine to start converting the neural network model;
when the third-party engine raises a conversion abnormality, acquiring the node currently being converted, taking it as an incompatible node, and acquiring the ratio of the data length between the starting point of the neural network model and the incompatible node to the total data length of the neural network model;
and if the ratio is greater than or equal to a preset value, dividing the neural network model based on the incompatible node to obtain two sub-models, where the starting point of the neural network model and the incompatible node are in different sub-models.
In a preferred embodiment, the executable instructions may also be specifically configured to cause a processor to:
if the ratio is smaller than the preset value, taking the incompatible node as a first incompatible node, and controlling the third party engine to resume converting the neural network model from the node following the first incompatible node;
when the conversion of the third party engine is abnormal again, acquiring the node currently being converted and taking it as a second incompatible node, and acquiring the ratio of the data length between the first incompatible node and the second incompatible node to the data length corresponding to the neural network model; if that ratio is smaller than the preset value, controlling the third party engine to resume converting the neural network model from the node following the second incompatible node, and continuing to acquire the next incompatible node in the same way until the ratio of the data length between the Nth incompatible node and the (N-1)th incompatible node to the data length of the neural network model is greater than or equal to the preset value, at which point acquisition of further incompatible nodes stops, wherein N is greater than or equal to 2;
And then dividing the neural network model based on the (N-1)th incompatible node and the Nth incompatible node to obtain three sub-models, wherein the nodes from the starting point of the neural network model to the (N-1)th incompatible node belong to one sub-model, the nodes from the node following the (N-1)th incompatible node to the node preceding the Nth incompatible node belong to another sub-model, and the nodes from the Nth incompatible node to the end point of the neural network model belong to the third sub-model.
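Continuing the sketch under the same assumptions (hypothetical engine.convert_node(), node counts as data lengths), the iterative case can be outlined as follows; it presumes the prefix check above has already come out below the preset value:

    def scan_until_gap(nodes, engine, preset=0.5):
        """Resume conversion after each incompatible node; once two
        consecutive incompatible nodes are at least 'preset' of the model
        apart, return the three sub-models described above, or None if the
        threshold is never reached before conversion finishes."""
        bad = []
        for i, node in enumerate(nodes):
            try:
                engine.convert_node(node)
            except Exception:
                bad.append(i)
                if len(bad) >= 2 and (bad[-1] - bad[-2]) / len(nodes) >= preset:
                    n_prev, n_cur = bad[-2], bad[-1]
                    return (nodes[:n_prev + 1],       # start .. (N-1)th node
                            nodes[n_prev + 1:n_cur],  # strictly between them
                            nodes[n_cur:])            # Nth node .. end
        return None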
In a preferred embodiment, the executable instructions may also be specifically configured to cause a processor to:
if, by the time the conversion is finished, the ratio of the data length between the Nth incompatible node and the (N-1)th incompatible node to the data length of the neural network model never reaches the preset value, acquiring the data length between each pair of adjacent incompatible nodes and taking the maximum of the at least two acquired data lengths;
then dividing the neural network model based on the two target incompatible nodes corresponding to the maximum data length to obtain three corresponding sub-models, wherein one sub-model does not include either of the two target incompatible nodes, and the other two sub-models each include one of the two target incompatible nodes.
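The fallback rule can likewise be sketched; the index bookkeeping below is illustrative only:

    def max_gap_split(nodes, bad_indices):
        """When no adjacent pair of incompatible nodes ever reaches the
        preset ratio, cut around the widest gap: the middle sub-model
        contains no incompatible node and can still go to the engine,
        while each outer sub-model holds one target incompatible node."""
        bad = sorted(bad_indices)
        gap, j = max((bad[i + 1] - bad[i], i) for i in range(len(bad) - 1))
        left, right = bad[j], bad[j + 1]
        return (nodes[:left + 1],       # ends with one target node
                nodes[left + 1:right],  # widest stretch, engine-compatible
                nodes[right:])          # begins with the other target node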
In a preferred embodiment, the executable instructions may also be specifically configured to cause a processor to:
and calling a conversion function of the third-party engine to start converting the neural network model.
In a preferred embodiment, the executable instructions may be specifically configured to cause a processor to:
execute a sub-model including an incompatible node on the CPU.
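Putting the pieces together, execution could be dispatched as in the following sketch, where run_on_engine and run_on_cpu are caller-supplied executors (the present invention fixes neither their names nor their signatures):

    def run_model(sub_models, bad_indices, run_on_engine, run_on_cpu):
        """Run the sub-models in order, feeding each output into the next;
        a sub-model containing an incompatible node falls back to the CPU,
        the rest run on the third-party engine."""
        bad = set(bad_indices)
        out = None
        for sub in sub_models:
            runner = run_on_cpu if bad.intersection(sub) else run_on_engine
            out = runner(sub, out)
        return out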
In the present invention, when a third-party engine is used for acceleration and an incompatibility occurs, the compatible parts are still accelerated by the third-party engine through segmented execution, so that the overall running speed of the neural network model is improved and operating efficiency is increased.
If a plurality of incompatible nodes exist, the part of the neural network model between two incompatible nodes whose data length is not smaller than the preset value, or between the two incompatible nodes corresponding to the maximum data length, is selected to run through the third party engine, excluding the incompatible nodes themselves, so that operating efficiency can be further improved.
It will be clear to those skilled in the art that, for convenience and brevity of description, for the specific working procedures of the systems, apparatuses, and units described above, reference may be made to the corresponding procedures in the foregoing method embodiments; they are not repeated herein.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the modules, units, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical function division, and other divisions are possible in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed between components may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when the computer program is executed by a processor, the steps of each of the method embodiments described above may be implemented. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content included in the computer readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals or telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications and replacements do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A neural network model acceleration method, the method comprising:
determining a node of the neural network model that is incompatible with the third party engine;
dividing the neural network model based on the incompatible nodes to obtain at least two sub-models;
converting a sub-model that does not include the incompatible node to a format supported by the third party engine;
and running the sub-model subjected to format conversion through the third party engine.
2. The method of claim 1, wherein the determining a node of the neural network model that is incompatible with the third party engine comprises:
starting to convert the neural network model through the third party engine;
when the conversion of the third party engine is abnormal, acquiring the node currently being converted and taking it as an incompatible node; resuming conversion of the neural network model through the third party engine from the node following the incompatible node, and if a further incompatible node exists, continuing the conversion step from the node following the current incompatible node until the conversion is finished, so as to obtain at least two incompatible nodes; or
when the conversion of the third party engine is abnormal, acquiring the node currently being converted and taking it as the incompatible node; resuming conversion of the neural network model through the third party engine from the node following the incompatible node, and if no further conversion abnormality occurs, obtaining the single incompatible node when the conversion is finished.
3. The method according to claim 2, wherein, when there is one incompatible node, the dividing the neural network model based on the incompatible node is specifically:
dividing the neural network model based on the incompatible node to obtain two sub-models, wherein if the incompatible node is closer to the starting point of the neural network model, the starting point of the neural network model and the incompatible node belong to the same sub-model; and if the incompatible node is closer to the end point of the neural network model, the end point of the neural network model and the incompatible node belong to the same sub-model.
4. The method according to claim 2, wherein the number of incompatible nodes is at least two, and when none of the incompatible nodes belongs to the starting point or the end point of the neural network model, the dividing the neural network model based on the incompatible nodes to obtain at least two sub-models is specifically:
dividing the neural network model into K+1 sub-models based on K incompatible nodes, wherein if the data length between the Mth incompatible node and the (M-1)th incompatible node is greater than or equal to the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth incompatible node and the (M+1)th incompatible node belong to the same sub-model, and the Mth incompatible node and the (M-1)th incompatible node belong to different sub-models; if the data length between the Mth incompatible node and the (M-1)th incompatible node is smaller than the data length between the Mth incompatible node and the (M+1)th incompatible node, the Mth incompatible node and the (M-1)th incompatible node belong to the same sub-model, and the Mth incompatible node and the (M+1)th incompatible node belong to different sub-models, wherein K is a natural number greater than or equal to 2, M is a natural number greater than or equal to 1, and M is less than or equal to K.
5. The method of claim 1, wherein the determining a node of the neural network model that is incompatible with the third party engine comprises:
starting to convert the neural network model through the third party engine;
when the conversion of the third party engine is abnormal, acquiring the node currently being converted and taking it as an incompatible node, and acquiring the ratio of the data length between the starting point of the neural network model and the incompatible node to the data length corresponding to the neural network model;
if the ratio is greater than or equal to a preset value, the method goes to the step of dividing the neural network model based on the incompatible nodes;
wherein the dividing the neural network model based on the incompatible node specifically comprises:
dividing the neural network model based on the incompatible node to obtain two sub-models, wherein the starting point of the neural network model and the incompatible node are in different sub-models.
6. The method of claim 5, wherein the method further comprises:
if the ratio is smaller than the preset value, taking the incompatible node as a first incompatible node, and resuming conversion of the neural network model through the third party engine from the node following the first incompatible node;
when the conversion of the third party engine is abnormal again, acquiring the node currently being converted and taking it as a second incompatible node, and acquiring the ratio of the data length between the first incompatible node and the second incompatible node to the data length corresponding to the neural network model; if that ratio is smaller than the preset value, resuming conversion of the neural network model through the third party engine from the node following the second incompatible node, and continuing to acquire the next incompatible node in the same way until the ratio of the data length between the Nth incompatible node and the (N-1)th incompatible node to the data length of the neural network model is greater than or equal to the preset value, then stopping acquiring the next incompatible node, wherein N is greater than or equal to 2;
wherein the dividing the neural network model based on the incompatible nodes to obtain at least two sub-models is specifically:
dividing the neural network model based on the (N-1)th incompatible node and the Nth incompatible node to obtain three sub-models, wherein the nodes from the starting point of the neural network model to the (N-1)th incompatible node belong to one sub-model, the nodes from the node following the (N-1)th incompatible node to the node preceding the Nth incompatible node belong to another sub-model, and the nodes from the Nth incompatible node to the end point of the neural network model belong to the third sub-model.
7. The method of claim 6, wherein the method further comprises:
if, by the time the conversion is finished, the ratio of the data length between the Nth incompatible node and the (N-1)th incompatible node to the data length of the neural network model never reaches the preset value, acquiring the data length between every two adjacent incompatible nodes and obtaining the maximum value of the acquired data lengths;
wherein the dividing the neural network model based on the incompatible nodes to obtain at least two sub-models is specifically:
and dividing the neural network model based on two target incompatible nodes corresponding to the maximum value of the data length to obtain three corresponding sub-models, wherein one sub-model does not comprise the two target incompatible nodes, and the other two sub-models respectively comprise one of the two target incompatible nodes.
8. The method according to claim 2 or 5, wherein the starting to convert the neural network model through the third party engine is specifically:
and calling a conversion function of the third party engine to start converting the neural network model.
9. The method according to any one of claims 1 to 7, wherein, after the dividing the neural network model based on the incompatible nodes to obtain at least two sub-models, the method further comprises:
executing a sub-model including an incompatible node on a CPU.
10. A neural network model acceleration apparatus, characterized by comprising:
the determining module is used for determining nodes of the neural network model that are incompatible with the third party engine;
the segmentation module is used for segmenting the neural network model based on the incompatible nodes to obtain at least two sub-models;
the conversion module is used for converting a sub-model that does not include the incompatible nodes into a format supported by the third party engine;
and the control module is used for controlling the third party engine to run the format-converted sub-model.
11. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 9 when the computer program is executed.
12. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 9.
CN201910914935.3A 2019-09-26 2019-09-26 Neural network model acceleration method and device, server and storage medium Active CN112561044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910914935.3A CN112561044B (en) 2019-09-26 2019-09-26 Neural network model acceleration method and device, server and storage medium

Publications (2)

Publication Number Publication Date
CN112561044A CN112561044A (en) 2021-03-26
CN112561044B 2023-07-14

Family

ID=75029642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910914935.3A Active CN112561044B (en) 2019-09-26 2019-09-26 Neural network model acceleration method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112561044B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703422A (en) * 2022-02-25 2023-09-05 华为技术有限公司 Model management method and communication device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109416928A (en) * 2016-06-07 2019-03-01 伊路米纳有限公司 For carrying out the bioinformatics system, apparatus and method of second level and/or tertiary treatment
CN109690475A (en) * 2016-09-30 2019-04-26 英特尔公司 Hardware accelerator and method for transfer operation
CN109727376A (en) * 2018-12-29 2019-05-07 北京沃东天骏信息技术有限公司 Generate the method, apparatus and selling apparatus of configuration file
CN110197253A (en) * 2018-02-27 2019-09-03 意法半导体国际有限公司 The arithmetical unit accelerated for deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9716625B2 (en) * 2013-10-09 2017-07-25 International Business Machines Corporation Identifying compatible system configurations
US10581901B2 (en) * 2016-03-25 2020-03-03 Cisco Technology, Inc. Increased granularity and anomaly correlation using multi-layer distributed analytics in the network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109416928A (en) * 2016-06-07 2019-03-01 伊路米纳有限公司 For carrying out the bioinformatics system, apparatus and method of second level and/or tertiary treatment
CN109690475A (en) * 2016-09-30 2019-04-26 英特尔公司 Hardware accelerator and method for transfer operation
CN110197253A (en) * 2018-02-27 2019-09-03 意法半导体国际有限公司 The arithmetical unit accelerated for deep learning
CN109727376A (en) * 2018-12-29 2019-05-07 北京沃东天骏信息技术有限公司 Generate the method, apparatus and selling apparatus of configuration file

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Learning-Based Super-Resolution Reconstruction and Marker Detection for Drone Landing; Noi Quang Truong; IEEE Access; 2019-05-09; full text *
Development and Application of Embedded Artificial Intelligence Technology (嵌入式人工智能技术开发及应用); Bi Sheng; Electronic Products World (电子产品世界); 2019-07-26; pp. 14-16, 25 *

Also Published As

Publication number Publication date
CN112561044A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
US11302303B2 (en) Method and device for training an acoustic model
CN107145359A (en) A kind of BMC changes the optimization method of BIOS option
CN112671232B (en) LLC resonant circuit control method and device and terminal equipment
CN112561044B (en) Neural network model acceleration method and device, server and storage medium
CN112506950A (en) Data aggregation processing method, computing node, computing cluster and storage medium
CN114943649A (en) Image deblurring method, device and computer readable storage medium
CN111024147A (en) Component mounting detection method and device based on CNNs, electronic equipment and storage medium
CN110602229A (en) Terminal system version downloading method, device and system based on dynamic slicing
CN111210826B (en) Voice information processing method and device, storage medium and intelligent terminal
CN112214394A (en) Memory leak detection method, device and equipment
CN116912653A (en) Model training method and device and electronic equipment
CN112287950A (en) Feature extraction module compression method, image processing method, device and medium
CN112148470B (en) Parameter synchronization method, computer device and readable storage medium
CN114936187A (en) Data file processing method, device, equipment and storage medium
CN106919341A (en) A kind of method and device for issuing I/O
CN113202647B (en) Control method, device and terminal for output power of vehicle engine
CN115952172B (en) Data matching method and device based on database temporary table
CN114124992B (en) Method, device, equipment and system for monitoring running state of whole vehicle domain controller
CN111045687B (en) Deployment method and related device for artificial intelligence application
US11481615B1 (en) Anti-spoofing of neural networks
CN116302492A (en) Server memory anti-overflow method, device, equipment and storage medium
CN116708444A (en) Intelligent distribution optimizing system for edge gateway resources
CN113239941A (en) Method, device and equipment for inputting optimal size of model and readable storage medium
CN115617825A (en) Data acquisition method and device
CN117196015A (en) Operator execution method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant