WO2022217419A1 - Neural network model inference method and apparatus, computer device, and storage medium

Neural network model inference method and apparatus, computer device, and storage medium

Info

Publication number
WO2022217419A1
Authority
WO
WIPO (PCT)
Prior art keywords
data structure
connection layer
optimized
layer
neural network
Application number
PCT/CN2021/086552
Other languages
French (fr)
Chinese (zh)
Inventor
庄奇
Original Assignee
深圳元戎启行科技有限公司
Application filed by 深圳元戎启行科技有限公司
Priority to PCT/CN2021/086552
Priority to CN202180050194.4A
Publication of WO2022217419A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models

Definitions

  • the present application relates to a neural network model inference method, apparatus, computer equipment and storage medium.
  • the inference of a neural network model refers to deploying a pre-trained neural network model into actual business scenarios, such as image classification, object detection, and online translation; that is, the process of inputting data into the neural network model and obtaining output data from it.
  • however, the inference of the neural network model takes considerable time.
  • the traditional method is to fuse the connection layer, that is, the input tensors corresponding to the connection layer are written directly into the output tensor of the connection layer, and the input tensors and the connection layer are deleted, so as to reduce memory space occupation and memory copy time.
  • According to various embodiments disclosed in the present application, a neural network model inference method, apparatus, computer device and storage medium are provided.
  • a neural network model inference method, comprising:
  • acquiring a neural network model inference task, where the neural network model inference task includes a model identifier;
  • obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, where the computation graph includes a connection layer;
  • obtaining a pre-built data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
  • determining the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
  • performing optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
  • performing inference according to the optimized neural network model to obtain a model inference result.
  • a neural network model inference apparatus, comprising:
  • a task acquisition module, configured to acquire a neural network model inference task, where the neural network model inference task includes a model identifier;
  • a model parsing module, configured to obtain a neural network model corresponding to the model identifier, and parse the neural network model to obtain a computation graph corresponding to the neural network model, where the computation graph includes a connection layer;
  • a structure generation module, configured to obtain a pre-built data structure template, and generate a target sub-data structure corresponding to the computation graph according to the data structure template;
  • a data determination module, configured to determine the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
  • a model optimization module, configured to perform optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
  • a model inference module, configured to perform inference according to the optimized neural network model to obtain a model inference result.
  • a computer device, comprising a memory and one or more processors, the memory having computer-readable instructions stored therein, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • acquiring a neural network model inference task, where the neural network model inference task includes a model identifier;
  • obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, where the computation graph includes a connection layer;
  • obtaining a pre-built data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
  • determining the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
  • performing optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
  • performing inference according to the optimized neural network model to obtain a model inference result.
  • One or more computer storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring a neural network model inference task, where the neural network model inference task includes a model identifier;
  • obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, where the computation graph includes a connection layer;
  • obtaining a pre-built data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
  • determining the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
  • performing optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
  • performing inference according to the optimized neural network model to obtain a model inference result.
  • FIG. 1 is an application environment diagram of a neural network model inference method in one or more embodiments.
  • FIG. 2 is a schematic flowchart of a neural network model inference method in one or more embodiments.
  • FIG. 3 is a schematic diagram of a computational graph in one or more embodiments.
  • FIG. 4 is a schematic diagram of a computation graph obtained by performing connection layer fusion on the computation graph shown in FIG. 3 in one or more embodiments.
  • FIG. 5 is a schematic diagram of a computational graph including complex join operations in one or more embodiments.
  • FIG. 6 is a schematic diagram of a data structure template in one or more embodiments.
  • FIG. 7 is a schematic flowchart of a step of generating a target sub-data structure corresponding to a computation graph according to a data structure template in one or more embodiments.
  • FIG. 8 is a schematic flowchart of the step of generating an optimized data structure of the output tensor corresponding to a connection layer according to a data structure template and a computation graph in one or more embodiments.
  • FIG. 9 is a block diagram of a neural network model inference apparatus in one or more embodiments.
  • FIG. 10 is a block diagram of a computer device in one or more embodiments.
  • the neural network model inference method provided in this application can be applied to computer equipment, and the computer equipment can be a terminal or a server. It can be understood that the neural network model inference method provided by the present application can be applied to a terminal, a server, or a system including a terminal and a server, and is realized through interaction between the terminal and the server.
  • the neural network model inference method provided in this application can be applied to the application environment shown in FIG. 1 .
  • the terminal 102 communicates with the server 104 through the network.
  • the terminal 102 can acquire the model inference task, and the model inference task carries the model identifier.
  • the terminal 102 obtains the neural network model corresponding to the model identifier, parses the neural network model, and obtains a computation graph corresponding to the neural network model, where the computation graph includes a connection layer; it then obtains a pre-built data structure template, generates the target sub-data structure corresponding to the computation graph according to the data structure template, determines the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer, and performs optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; inference is then carried out according to the optimized neural network model, and the model inference result is obtained.
  • the terminal 102 may specifically include, but is not limited to, various personal computers, notebook computers, smart phones, and tablet computers.
  • the server 104 can be implemented by an independent server or a server cluster composed of multiple servers.
  • the neural network model inference method provided by the present application implements inference on the neural network model, can be applied to various application environments, and the neural network model can include various types.
  • the neural network model may include a convolutional neural network model, a recurrent neural network model, a recursive neural network model, and the like.
  • Neural network models can be used to process many different kinds of data.
  • the neural network model may specifically include an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, a scene classification model, and the like.
  • the inference method for the neural network model provided by the present application can be specifically applied to the field of automatic driving, where the neural network model may specifically include at least one of an image recognition model or a trajectory prediction model.
  • the inference method for the neural network model provided by the present application can also be applied in the text field, where the neural network model may specifically include at least one of an image recognition model, a behavior prediction model, or a risk assessment model.
  • a neural network model inference method is provided; the method is described by taking its application to a computer device as an example.
  • the computer device may be the terminal or the server in FIG. 1. The method includes the following steps:
  • Step 202 Acquire a model inference task, where the model inference task carries a model identifier.
  • Model inference refers to the process of operating on the data input into the neural network model, following the sequence of the model's network structure and the arithmetic operations of the multiple network layers included in that structure, so as to obtain the inference result output by the neural network model.
  • the model inference task is used to instruct the computer device to infer the corresponding neural network model.
  • the computer equipment can be a terminal or a server.
  • the model identifier refers to a unique identifier for marking a neural network model, and is used to distinguish between neural network models.
  • the computer device can acquire the model inference task, and analyze the model inference task, thereby obtaining the model identifier carried in the model inference task.
  • the computer device can determine the neural network model specified by the user according to the received user operation instruction, and generate a model inference task carrying the model identifier.
  • the computer equipment can also determine the neural network model that needs to be called according to the actual operation requirements, and generate model inference tasks.
  • for example, when an image needs to be recognized, the computer device can generate a model inference task, so that after the image is input, inference is performed on the image recognition model according to the model inference task, and the image recognition result output by the image recognition model is obtained.
  • the computer device may store an inference engine in advance, and the computer device may perform a model inference task through the inference engine, and perform inference on the neural network model corresponding to the model identifier.
  • An inference engine refers to a functional module in a computer device that is used to complete inference.
  • Step 204 Obtain a neural network model corresponding to the model identifier, analyze the neural network model, and obtain a computation graph corresponding to the neural network model, and the computation graph includes a connection layer.
  • a neural network model is pre-stored in the computer device, and the neural network model is obtained by training a large amount of sample data, so that a corresponding neural network model can be obtained according to the model identifier.
  • the neural network model corresponding to the model identifier may include at least one of various types of neural network models. For example, depending on the network structure of the neural network model, it may specifically include at least one of a convolutional neural network model (Convolutional Neural Networks, CNN for short), a recurrent neural network model (Recurrent Neural Network, RNN for short), or a recursive neural network model. According to the different functions of the neural network model, it may specifically include at least one of an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, and a scene classification model.
  • the computer device analyzes the acquired neural network model, and obtains a calculation graph corresponding to the neural network model.
  • the computation graph can be an abstract graph of the neural network model in the model inference process, and can include multiple operation layers, tensors corresponding to each operation layer, and directed edges between the operation layers and corresponding tensors.
  • the operation layer can be used to represent the network layer in the network structure of the neural network model, and the operation layer can be used to determine the arithmetic operation performed by the corresponding network layer, such as convolution operation, full connection operation, connection operation, etc.
  • A tensor is a data structure that can be understood as a vector or an array matrix.
  • the shape of a tensor can be represented by dimensions, a one-dimensional tensor can be called a vector, and a tensor with more than two dimensions can be called an array matrix.
  • Tensors include input tensors and output tensors. The input tensors can be used to represent the input data corresponding to the operation layer, and the output tensors can be used to represent the output data of the operation layer.
  • Each operation layer may include multiple pointers to different input tensors and multiple pointers to different output tensors, and each tensor may include one pointer to the production layer and multiple pointers to different demand layers.
  • the production layer of a tensor refers to the operation layer from which the tensor is obtained;
  • the demand layer of a tensor refers to the operation layers in which the tensor is used as an input tensor.
  • the directed edge between the operation layer and the corresponding tensor in the computation graph can be generated through the pointer in the operation layer and the pointer in the tensor.
  • the computer device can refer to the operation layer corresponding to the connection operation as the connection layer.
  • the connection layer is used to splicing the obtained multiple input tensors.
  • the data stored in the output tensor of the connection layer and the data stored in the multiple input tensors are completely identical.
  • a schematic diagram of the computation graph may be shown in FIG. 3 , and the computation graph includes an operation layer layer1, an operation layer layer2, an operation layer layer3, and input tensors and output tensors corresponding to each operation layer.
  • the output tensor tensor1 of the operation layer layer1 and the output tensor tensor2 of the operation layer layer2 are used as the input of layer3, and layer3 connects tensor1 and tensor2 to obtain the output tensor tensor3.
  • Shape in tensor1, tensor2 and tensor3 represents the shape of the tensor, which can be represented by dimensions.
  • the shape in tensor1, 1 × 3 × 40 × 40, represents, from left to right, the 0th dimension, the 1st dimension, the 2nd dimension, and the 3rd dimension.
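  • to make the pointer structure concrete, the following is a minimal illustrative sketch of the FIG. 3 graph in Python (the patent contains no code; the Tensor/Layer classes and the connect helper are hypothetical, and the concrete shapes of tensor2 and tensor3 are assumed for illustration):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(eq=False)  # eq=False keeps identity hashing, so tensors and
class Tensor:         # layers can later serve as dictionary keys
    name: str
    shape: Tuple[int, ...]                    # e.g. (1, 3, 40, 40)
    producer: Optional["Layer"] = None        # pointer to the production layer
    consumers: List["Layer"] = field(default_factory=list)  # demand layers

@dataclass(eq=False)
class Layer:
    name: str
    op: str                                   # arithmetic operation, e.g. "concat"
    inputs: List[Tensor] = field(default_factory=list)
    outputs: List[Tensor] = field(default_factory=list)

def connect(layer: Layer, in_tensors: List[Tensor], out_tensor: Tensor) -> None:
    """Create the directed edges between an operation layer and its tensors."""
    layer.inputs, layer.outputs = list(in_tensors), [out_tensor]
    out_tensor.producer = layer
    for t in in_tensors:
        t.consumers.append(layer)

# FIG. 3: layer3 connects tensor1 and tensor2 into tensor3.
tensor1 = Tensor("tensor1", (1, 3, 40, 40))
tensor2 = Tensor("tensor2", (1, 3, 40, 40))   # shape assumed for illustration
tensor3 = Tensor("tensor3", (1, 6, 40, 40))   # assumed concatenation along dim 1
layer3 = Layer("layer3", op="concat")
connect(layer3, [tensor1, tensor2], tensor3)
```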
  • Step 206 Obtain a pre-built data structure template, and generate a target sub-data structure corresponding to the calculation graph according to the data structure template.
  • the pre-built data structure template refers to the data structure template required to optimize the computational graph.
  • the data structure template is a new data structure template built on the basis of the data structure of the traditional computational graph.
  • the target sub-data structure refers to the data structure of the optimized tensors in the computation graph.
  • among the operation layers, the connection layer can be used to splice the multiple input tensors it obtains, and the data saved in the output tensor of the connection layer is exactly the same as the data held in the multiple input tensors.
  • the connection operation of the connection layer will cause duplicate data to be stored in the memory, and also lead to unnecessary memory copy time.
  • the connection operations of the connection layer will occupy more memory space, and the memory copy time will become longer and longer, resulting in a long inference time and a slow inference speed of the neural network model, which is not conducive to data processing with high real-time requirements.
  • in the traditional method, the connection layer is fused and the duplicated data is deleted to reduce memory space occupation and memory copy time.
  • however, the traditional method is only suitable for simple connection operations, for example, when the input tensors of the connection layer are connected only to the connection layer.
  • when a tensor is used as the input of multiple operation layers and some of those operation layers do not participate in the connection operation, fusing the connection layer in the traditional way produces incorrect inference results.
  • in the case of continuous connection layers, traditional connection layer fusion also has the problem that the position of an optimized tensor within the output tensor of the last connection layer cannot be determined, so model inference cannot be performed.
  • when the output tensor of one operation layer is connected with the output tensors of multiple operation layers at the same time, that is, when the output tensor of this operation layer needs to be input into multiple connection layers, performing connection layer fusion in the traditional way means the output tensor of the operation layer cannot be stored in multiple connection layers at the same time, which leads to errors in the model inference results.
  • therefore, the computer device builds a new data structure template in advance and stores it; when model inference is required, the pre-built data structure template is obtained, and the data structure of the computation graph is optimized according to the template, so that the connection layers can be accurately fused in all situations, ensuring correct inference results while reducing the memory space occupied by the inference process, reducing memory copy time, and accelerating inference.
  • a schematic diagram of a computation graph obtained by performing connection layer fusion on the computation graph shown in FIG. 3 in a traditional manner may be as shown in FIG. 4 .
  • the connection operation can be called a memory copy operation.
  • a schematic diagram of a computational graph including complex connection operations may be shown in FIG. 5 .
  • Layer1-layer7 are all operation layers
  • layer3, layer5 and layer7 are connection layers.
  • the consumer layers pointed to by the dotted arrows from tensor1, tensor3 and tensor5 refer to the demand layers, other than the connection layers, corresponding to each tensor.
  • the pre-built data structure template may be the output structure template of the output tensor corresponding to the connection layer in the calculation graph.
  • the schematic diagram of the data structure template may be as shown in FIG. 6, including multiple template items, sub_producer_layers, sub_consumer_layers, sub_shapes, sub_producer_index_table, sub_consumer_index_table, and sub_consumer_stride_table.
  • sub_producer_layers is used to store the production layers of the output tensor corresponding to the connection layer, including the production layers of the input tensors that need to be preserved; sub_consumer_layers is used to store the demand layers of the output tensor corresponding to the connection layer, including the demand layers of the input tensors that need to be preserved;
  • sub_shapes is used to store the shape of the output tensor corresponding to the connection layer, including the shapes of the input tensors to be preserved;
  • sub_producer_index_table is used to store the index corresponding to each production layer in sub_producer_layers;
  • sub_consumer_index_table is used to store the input tensor index corresponding to each demand layer in sub_consumer_layers; and
  • sub_consumer_stride_table is used to store the size of the input tensor corresponding to each demand layer in sub_consumer_layers.
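  • rendered as code, the template of FIG. 6 could look as follows; this is a sketch under the assumption that each template item is a list or lookup table (the field names come from the patent, the Python container types do not):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ConcatOutputTemplate:
    """Data structure template for the output tensor of a connection layer."""
    # Production layers of the input tensors that must be preserved.
    sub_producer_layers: List[str] = field(default_factory=list)
    # Demand layers of the preserved input tensors (one group per tensor).
    sub_consumer_layers: List[List[str]] = field(default_factory=list)
    # Shapes of the preserved input tensors.
    sub_shapes: List[Tuple[int, ...]] = field(default_factory=list)
    # Index of each production layer in sub_producer_layers.
    sub_producer_index_table: Dict[str, int] = field(default_factory=dict)
    # For each demand layer: index of its input tensor in the output tensor.
    sub_consumer_index_table: Dict[str, int] = field(default_factory=dict)
    # For each demand layer: size of its input tensor.
    sub_consumer_stride_table: Dict[str, int] = field(default_factory=dict)
```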
  • each operation layer includes multiple pointers to different input tensors and multiple pointers to different output tensors
  • when the connection operation in the network structure of the neural network model is complex, for example when a tensor is used as the input of multiple operation layers and some of those operation layers do not participate in the connection operation, then after the connection layer is optimized those other operation layers lose their original input tensors: the output tensor of the connection layer would be used as their input tensor instead, resulting in errors in the output results of those operation layers.
  • if connection layer fusion is performed in the traditional way, there is not only the problem of wrong output results, but also the inability to determine the positions of tensor1, tensor2, tensor4 and tensor6 within tensor7 after fusion, which leaves the deleted connection layers layer3, layer5 and layer7, and the consumer layers, unable to determine their corresponding input tensors, resulting in errors in the inference results.
  • the computer device may optimize the data structure of the computation graph according to the above-mentioned pre-built data structure template. Specifically, the computer device may obtain the original data structure of the output tensor corresponding to each connection layer in the calculation graph according to the data structure template.
  • the original data structure can include the output tensor itself, as well as a pointer to the production layer and multiple pointers to different demand layers.
  • the computer device can optimize the original data structure according to the data structure template, completing the data structure template according to the computation graph and the output tensors corresponding to each connection layer in the computation graph, so as to obtain the optimized data structure of the output tensor corresponding to each connection layer; the target sub-data structure corresponding to the computation graph is then determined from the generated optimized data structures.
  • the target sub-data structure can include the production layers, the demand layers, the shape of the output tensor corresponding to the connection layer, the index corresponding to each production layer, the input tensor index corresponding to each demand layer, and the size of the input tensor corresponding to each demand layer.
  • the production layer and the demand layer of the output tensor corresponding to the connection layer are used to determine the original input tensor of the demand layer that does not participate in the connection operation, so as to ensure the correct output result.
  • the shape of the output tensor is used to preserve the connection performed by the connection layer.
  • the input tensor index corresponding to each demand layer and the size of the input tensor corresponding to each demand layer are used to indicate the position of each input tensor in the output tensor; in the case of continuous connection layers, each demand layer can find its corresponding input tensor according to that position.
  • Step 208 Determine the connection layer data to be optimized in the calculation graph according to the target sub-data structure and the connection layer.
  • Step 210 Perform optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model.
  • the target sub-data structure refers to the optimized data structure that can correctly fuse the connection layers.
  • the computer device may determine the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer in the computation graph.
  • the connection layer data to be optimized may include connection layers in the computation graph and input tensors corresponding to the connection layers.
  • the computer device may perform optimization processing on the connection layer data to be optimized according to the target sub-data structure; the optimization processing may be to fuse the connection layer, that is, to delete the connection layer data to be optimized.
  • performing optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model includes: deleting the connection layer data to be optimized to obtain a deleted computation graph; and connecting the deleted computation graph according to the target data structure to obtain the optimized neural network model.
  • the computer device can sequentially connect the deleted computation graph according to the operation layer order in the target data structure, so as to obtain the optimized computation graph, and then obtain the optimized neural network model from the optimized computation graph.
  • by deleting the connection layer data to be optimized and reconnecting the deleted computation graph, the memory space occupied by the inference process can be reduced, the memory copy time can be reduced, and the inference speed can be accelerated (see the sketch below).
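  • a minimal sketch of this optimization step, reusing the Layer/Tensor objects sketched earlier; the argument names are hypothetical, with fuse_layers/fuse_tensors standing for the connection layer data to be optimized, fused_output for the surviving output tensor, and layer_order for the operation layer order recorded in the target data structure:

```python
def optimize_graph(layers, fuse_layers, fuse_tensors, fused_output, layer_order):
    """Delete the connection layer data to be optimized, then reconnect the
    remaining operation layers in the recorded layer order."""
    kept = [l for l in layers if l not in fuse_layers]
    for layer in kept:
        # A former consumer of a deleted input tensor now reads the fused
        # output tensor; its slice is located via the index/stride tables.
        layer.inputs = [fused_output if t in fuse_tensors else t
                        for t in layer.inputs]
    # Sequentially connect the deleted graph per the target data structure
    # (assumes every kept layer appears in layer_order).
    kept.sort(key=layer_order.index)
    return kept
```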
  • Step 212 inference is performed according to the optimized neural network model to obtain a model inference result.
  • the optimized neural network model reduces the connection operations in the inference process, reduces the memory occupied by repeated data during inference, and reduces the memory copy time, thereby improving the model inference speed.
  • the computer equipment can perform inference according to the optimized neural network model, and perform operations in sequence according to the arithmetic operations of the operation layers corresponding to the optimized neural network model to obtain the inferred data results.
  • for example, for an image recognition model, the computer device may sequentially perform operations on the input image according to the arithmetic operation sequence corresponding to the optimized neural network model, to obtain the image recognition result, as sketched below.
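  • as a toy sketch of this sequential execution (kernels, a mapping from operation names to callables implementing the arithmetic operations, is hypothetical and not part of the patent):

```python
def run_inference(ordered_layers, feed, kernels):
    """Execute the optimized model: visit each operation layer in sequence
    and apply its arithmetic operation to the values of its input tensors."""
    values = dict(feed)                  # tensor -> computed value
    for layer in ordered_layers:
        args = [values[t] for t in layer.inputs]
        values[layer.outputs[0]] = kernels[layer.op](*args)
    return values                        # includes the model inference result
```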
  • in the above method, a neural network model inference task is acquired, the neural network model corresponding to the model identifier in the task is obtained and parsed, and a computation graph corresponding to the neural network model is obtained, where the computation graph includes a connection layer; a pre-built data structure template is then obtained, the target sub-data structure corresponding to the computation graph is generated according to the template, the connection layer data to be optimized is determined in the computation graph according to the target sub-data structure and the connection layer, and the connection layer data to be optimized is optimized according to the target sub-data structure to obtain an optimized neural network model, after which inference is performed according to the optimized model to obtain the model inference result.
  • in this way, the data structure of the computation graph is optimized so that the target sub-data structure can correctly fuse the connection layers in all cases, yielding correct inference results while reducing the memory space occupied by the inference process, reducing memory copy time, and speeding up inference.
  • the step of generating the target sub-data structure corresponding to the computation graph according to the data structure template includes:
  • Step 702 traverse the operation layers in the calculation graph, and identify the connection layers in the operation layer.
  • Step 704 Generate an optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the calculation graph.
  • Step 706 Determine the target sub-data structure corresponding to the computation graph according to the optimized data structure of the output tensor corresponding to the connection layer.
  • the computation graph includes operation layers and tensors, the operation layers include connection layers, and the tensors can be input tensors or output tensors of the operation layers.
  • the computer device can traverse all the operation layers in the computation graph and identify the connection layers among them; when a connection layer is identified, the template item data of the output tensor corresponding to that connection layer is obtained from the computation graph according to the multiple template items in the data structure template, and the obtained template item data is added to the corresponding template items, so as to generate the optimized data structure of the output tensor corresponding to the connection layer, until the optimized data structure of the output tensor corresponding to the last connection layer has been generated.
  • the optimized data structure refers to the data structure obtained by optimizing the original data structure of the output tensor.
  • the computer device may determine the target sub-data structure corresponding to the computation graph according to the generated optimized data structure of the output tensor corresponding to the connection layer.
  • the optimized data structure of the output tensor corresponding to the connection layer directly determines the target sub-data structure corresponding to the computation graph.
  • the computer device can determine the target sub-data structure corresponding to the computation graph from the optimized data structure of the output tensor corresponding to the last connection layer.
  • the computer device can determine the optimized data structures of the output tensors corresponding to the multiple connection layers.
  • the target sub-data structure includes the optimized data structure of the output tensors corresponding to the multiple connection layers.
  • determining the target sub-data structure corresponding to the computation graph from the optimized data structures of the output tensors makes it possible to quickly obtain the data structure required to correctly fuse the connection layers in all complex situations.
  • the above method further includes: topologically sorting the operation layers in the computation graph to obtain a topological sequence; identifying in turn, according to the topological sequence, whether each operation layer is a connection layer; when it is not a connection layer, skipping the operation layer; and when it is a connection layer, generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph.
  • a computation graph can include multiple operation layers, and a computer device can topologically sort the multiple operation layers to obtain a topological sequence.
  • Topological sorting refers to arranging a sequence that satisfies the topological order according to the dependencies between the operation layers in the directed computational graph, and the sequence obtained by topological sorting is a one-dimensional linear sequence.
  • specifically, the computer device can first search the computation graph for an operation layer with an in-degree of 0, that is, one with no input edge, save that operation layer to the stack, and delete the operation layer and its associated directed edges from the computation graph, adjusting the in-degree of the operation layers at the ends of the deleted directed edges (for example, subtracting 1 from the in-degree); the steps of finding an operation layer whose in-degree is 0, deleting it, and adjusting in-degrees are repeated until all the operation layers in the computation graph have been saved to the stack, and all the operation layers in the stack are then output in turn according to their storage order, so as to obtain the topological sequence (see the sketch below).
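  • the procedure described above is essentially Kahn's algorithm; a compact sketch, where successors is an assumed mapping from each operation layer to the layers that consume its outputs:

```python
from collections import deque

def topological_sort(layers, successors):
    """Repeatedly take an operation layer with in-degree 0, save it, delete
    its outgoing directed edges, and adjust the in-degrees of the layers at
    the other ends of the deleted edges."""
    indegree = {layer: 0 for layer in layers}
    for layer in layers:
        for succ in successors.get(layer, ()):
            indegree[succ] += 1
    ready = deque(l for l in layers if indegree[l] == 0)  # no input edge
    sequence = []                        # the "stack" referred to in the text
    while ready:
        layer = ready.popleft()
        sequence.append(layer)
        for succ in successors.get(layer, ()):
            indegree[succ] -= 1          # subtract 1 from the in-degree
            if indegree[succ] == 0:
                ready.append(succ)
    return sequence                      # one-dimensional linear topological sequence
```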
  • the arrangement order among the operation layers in the topology sequence may be determined according to the storage sequence of the operation layers.
  • the computer device can access each operation layer according to the arrangement order of each operation layer in the topology sequence, and identify whether the arithmetic operation corresponding to each operation layer is a connection operation.
  • Arithmetic operations refer to the data processing operations performed by each operation layer.
  • when the operation layer is not a connection layer, the computer device can directly skip it without performing the data structure optimization step.
  • when the operation layer is a connection layer, the data structure optimization step needs to be performed.
  • the computer device generates an optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the calculation graph.
  • by proceeding in topological order, the computer device can ensure that when a certain operation layer is processed, the arithmetic operations corresponding to the preceding operation layers have already been handled, which improves the recognition accuracy of the connection layers and helps quickly obtain the optimized data structure of the output tensor corresponding to each connection layer.
  • the step of generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the calculation graph includes:
  • Step 802 Obtain the current connection layer, and identify whether there is an optimized input tensor in the input tensor corresponding to the current connection layer.
  • Step 804 When an optimized input tensor exists, obtain the optimized data structure of the optimized input tensor, generate the optimized data structure of the output tensor corresponding to the current connection layer according to the optimized data structure of the optimized input tensor, the computation graph, and the data structure template, update the next connection layer to be the current connection layer, and return to the step of identifying whether there is an optimized input tensor among the input tensors corresponding to the current connection layer, until the traversal is completed and the optimized data structures of the output tensors corresponding to the connection layers in the operation layers have been generated.
  • Step 806 when there is no optimized input tensor in the input tensor corresponding to the current connection layer, extract the template data corresponding to the output tensor of the current connection layer in the calculation graph according to the data structure template, and add the extracted template data to In the data structure template, the optimized data structure of the output tensor corresponding to the current connection layer is obtained.
  • the current connection layer refers to the currently accessed connection layer.
  • the optimized input tensor means that the production layer of the input tensor is a connection layer, and the data structure of the input tensor is an optimized data structure generated according to the data structure template and the computation graph.
  • the computer device obtains the current connection layer and identifies whether there is an optimized input tensor among the input tensors corresponding to it. When an optimized input tensor is identified, this indicates that there are continuous connection layers in the computation graph, and the optimized data structure of the output tensor corresponding to the current connection layer can be generated from the optimized data structure of the optimized input tensor. Specifically, since the data structure of the optimized input tensor is already an optimized data structure, the computer device can obtain it and combine it with the unoptimized input tensors corresponding to the current connection layer.
  • the computer device obtains, according to the data structure template, the template data corresponding to the output tensor of the current connection layer from the optimized data structure of the optimized input tensor and from the computation graph, and adds the obtained template data to the data structure template, thereby obtaining the optimized data structure of the output tensor corresponding to the current connection layer.
  • the optimized data structure may include the production layers, the demand layers, the shapes of the input tensors stored in the output tensor corresponding to the current connection layer, the index corresponding to each production layer, the index corresponding to each demand layer, and the size of the input tensor corresponding to each demand layer.
  • the computer device can continue to identify whether the next operation layer is a connection layer; when it is, that operation layer is taken as the next connection layer, the next connection layer is updated to be the current connection layer, and the step of identifying whether there is an optimized input tensor among the input tensors corresponding to the current connection layer is returned to.
  • the generation efficiency of the optimized data structure can be improved to speed up the fusion speed of the connection layer, thereby improving the model inference speed.
  • when there are continuous connection layers in the computation graph, that is, when optimized input tensors exist among the input tensors corresponding to the second and subsequent connection layers, the computer device can generate the optimized data structure of the output tensor corresponding to each connection layer in the above manner, and use the optimized data structure of the output tensor corresponding to the last connection layer as the target sub-data structure corresponding to the computation graph.
  • the computer device can directly extract template data corresponding to the output tensor of the current connection layer in the computation graph according to the data structure template.
  • the data structure template includes multiple template items, and the computer device can sequentially extract template item data corresponding to the output tensor of the current connection layer according to the multiple template items in the data structure template, and add the extracted template item data to the corresponding In the corresponding template item, the optimized data structure of the output tensor corresponding to the current connection layer is obtained.
  • An optimized data structure can be obtained that correctly fuses the connection layers.
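  • putting steps 802-806 together, the traversal might be sketched as follows (using the Layer/Tensor objects from the earlier sketch and plain dicts for the structures; the index and stride tables are omitted here and filled in separately, as sketched further below):

```python
def build_optimized_structures(topo_layers):
    """For each connection layer, in topological order, build the optimized
    data structure of its output tensor, reusing the structure of any input
    tensor produced by an already-optimized connection layer."""
    optimized = {}   # output tensor -> optimized data structure (a dict)
    for layer in topo_layers:
        if layer.op != "concat":
            continue                      # skip non-connection operation layers
        struct = {"sub_producer_layers": [], "sub_consumer_layers": [],
                  "sub_shapes": []}
        for t in layer.inputs:
            if t in optimized:            # continuous connection layers: merge
                prev = optimized[t]       # the optimized structure of t
                for key in struct:
                    struct[key] = struct[key] + prev[key]
            else:                         # unoptimized input tensor
                struct["sub_producer_layers"].append(t.producer)
                struct["sub_shapes"].append(t.shape)
            # demand layers of t other than this connection layer are kept so
            # they can still locate their original inputs after fusion
            struct["sub_consumer_layers"].append(
                [c for c in t.consumers if c is not layer])
        optimized[layer.outputs[0]] = struct
    return optimized  # the last connection layer's entry is the target
                      # sub-data structure of the computation graph
```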
  • extracting the template data corresponding to the output tensor of the current connection layer from the computation graph according to the data structure template, and adding the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer, includes: sequentially extracting the production layer, demand layer, and dimension data corresponding to each input tensor from the computation graph according to the data structure template; adding the extracted production layers, demand layers, and dimension data to the data structure template; establishing the production layer index corresponding to each production layer in the production layer index table of the data structure template; establishing the demand layer index corresponding to each demand layer in the demand layer data table of the data structure template, and counting the size of the input tensor corresponding to each demand layer; and attaching the demand layers other than the connection layer among the demand layers corresponding to the input tensors to the demand layer data table, to obtain the optimized data structure of the output tensor corresponding to the current connection layer.
  • the template data corresponding to the output tensor of the current connection layer includes the production layers, the demand layers, the shape of the output tensor of the current connection layer, the index corresponding to each production layer, the input tensor index corresponding to each demand layer, and the size of the input tensor corresponding to each demand layer.
  • during connection layer fusion, the input tensors of the connection layer are deleted, yet their information still needs to be saved, so it can only be saved in the output tensor of the connection layer.
  • the computer device extracts, from the computation graph, the production layer corresponding to each input tensor of the connection layer as a production layer of the output tensor of the current connection layer.
  • the demand layers corresponding to each input tensor of the connection layer are extracted from the computation graph as demand layers of the output tensor of the current connection layer, and the shape corresponding to each input tensor of the connection layer is extracted from the computation graph as part of the shape of the output tensor of the current connection layer.
  • the computer device may add the extracted template data to the corresponding template item of the data structure template after extracting the template data.
  • the computer device can also establish a production layer index corresponding to each production layer in the production layer index table of the data structure template. The production layer index is used to distinguish multiple production layers and indicates the position of each production layer in the computation graph, which can ensure the accuracy of the position of each production layer in the optimized neural network model.
  • the computer device establishes a requirement layer index corresponding to the requirement layer in the requirement layer data table of the data structure template, and counts the size of the input tensor corresponding to each requirement layer.
  • the demand layer index corresponding to each demand layer is used by that demand layer to determine where its own input tensor lies in the output tensor, indicating the position of each input tensor within the output tensor; after the connection layer is fused, the input tensor corresponding to each demand layer can conveniently be found according to that position.
  • the size of the input tensor corresponding to each requirement layer can be used to represent the number of basic tensors required by each requirement layer.
  • the computer device attaches, in the demand layer data table, the demand layers other than the connection layer among the demand layers corresponding to each input tensor of the connection layer; the attached demand layers are the operation layers that do not participate in the connection operation. This avoids the problem of the output tensor being used directly as the input tensor of an operation layer that does not participate in the connection operation after the connection layer is optimized, improving the accuracy of connection layer fusion (see the sketch below).
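  • the index and stride bookkeeping might then be sketched as follows, continuing the dict-based structure above and assuming, as in the FIG. 6 example, that each preserved input tensor occupies one unit (for larger inputs the stride would be the tensor's actual size):

```python
def build_index_tables(struct):
    """Fill the production layer index table and the demand layer data table
    (index plus stride) of an optimized data structure whose list fields are
    already populated."""
    struct["sub_producer_index_table"] = {
        producer: i
        for i, producer in enumerate(struct["sub_producer_layers"])}
    index_table, stride_table = {}, {}
    for i, demand_layers in enumerate(struct["sub_consumer_layers"]):
        for demand_layer in demand_layers:
            # position of this demand layer's input tensor inside the fused
            # output tensor, and that input tensor's size in units
            index_table[demand_layer] = i
            stride_table[demand_layer] = 1
    struct["sub_consumer_index_table"] = index_table
    struct["sub_consumer_stride_table"] = stride_table
    return struct
```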
  • Layer1-layer7 are operation layers
  • layer3, layer5, and layer7 are connection layers.
  • the consumer layer pointed to by the dotted arrows of tensor1, tensor3, and tensor5 refers to the demand layer other than the connection layer corresponding to each tensor.
  • the computer device can traverse each operation layer in the order of layer1-layer7. When layer1 and layer2 are identified, if it is identified that they are not connected layers, layer1 and layer2 are skipped.
  • when layer3 is identified as a connection layer and the two input tensors tensor1 and tensor2 of layer3 are both unoptimized input tensors, the steps to generate the optimized data structure of tensor3 can be shown in the following A1-F1:
  • in step B1, tensor1's consumer layers represent the demand layers corresponding to tensor1, and tensor2's consumer layers represent the demand layers corresponding to tensor2.
  • the table in step E1 refers to the combined table of sub_consumer_index_table and sub_consumer_stride_table, which can be called the demand layer table.
  • tensor1's consumers represent the demand layer corresponding to tensor1.
  • the index of tensor1's consumers is 0, indicating that the demand layer corresponding to tensor1 can find the input tensor tensor1 at the 0th position of the output tensor.
  • the stride of tensor1's consumers is 1, indicating that the size of tensor1 is 1 unit.
  • the index of tensor2's consumers is 1, it means that the demand layer corresponding to tensor2 can find the input tensor tensor2 in the first position of the output tensor, and the stride of tensor2's consumers is 1, which means that the size of tensor2 is 1 unit.
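  • written out as data, the demand layer table of step E1 reads:

```python
# Demand layer table for the fusion of layer3: sub_consumer_index_table and
# sub_consumer_stride_table combined, as in step E1.
demand_layer_table = {
    "tensor1's consumers": {"index": 0, "stride": 1},  # tensor1 found at
                                                       # position 0, size 1 unit
    "tensor2's consumers": {"index": 1, "stride": 1},  # tensor2 found at
                                                       # position 1, size 1 unit
}
```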
  • the computer device continues to recognize layer4, and if it recognizes that it is not a connection layer, skips layer4.
  • when layer5 is identified as a connection layer and tensor3 among the input tensors of layer5 is an optimized input tensor, the steps to generate the optimized data structure of tensor5 can be shown in the following A2-F2:
  • C2.sub_consumer_layers [tensor1’s consumer layers, tensor2’s consumer layers, tensor3’s consumer layers, tensor4’s consumer layers]
  • D2.sub_shapes [tensor1's shape, tensor2's shape, tensor4's shape]
  • the computer device continues to recognize layer6, and if it recognizes that it is not a connection layer, skips layer6.
  • when layer7 is identified as a connection layer and tensor5 among the input tensors of layer7 is an optimized input tensor, the steps to generate the optimized data structure of tensor7 can be shown in the following A3-F3:
  • C3.sub_consumer_layers [tensor1’s consumer layers, tensor2’s consumer layers, tensor3’s consumer layers, tensor4’s consumer layers, tensor5’s consumer layers, tensor6’s consumer layers]
  • D3.sub_shapes [tensor1’s shape, tensor2’s shape, tensor4’s shape, tensor6’s shape]
  • the computer device uses the obtained optimized data structure of tensor7 as the target sub-data structure corresponding to the calculation graph.
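  • written out from steps C3 and D3, the resulting target sub-data structure of the FIG. 5 graph is:

```python
# Target sub-data structure of the computation graph of FIG. 5: the optimized
# data structure of tensor7 (only the items listed in C3/D3 are shown).
tensor7_struct = {
    "sub_consumer_layers": ["tensor1's consumer layers", "tensor2's consumer layers",
                            "tensor3's consumer layers", "tensor4's consumer layers",
                            "tensor5's consumer layers", "tensor6's consumer layers"],
    "sub_shapes": ["tensor1's shape", "tensor2's shape",
                   "tensor4's shape", "tensor6's shape"],
}
```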
  • determining the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer includes: determining the connection layer to be optimized in the connection layer according to the target sub-data structure; obtaining the connection layer to be optimized in the computation graph The input tensor corresponding to each connection layer in the connection layer; the connection layer to be optimized and the obtained input tensor are used as the connection layer data to be optimized.
  • the computer device determines the connection layer to be optimized in the connection layer according to the target sub-data structure, and the connection layer to be optimized is the connection layer to be deleted. Since the connection layer connects the input tensors of the connection layer, the data stored in the output tensor of the connection layer is exactly the same as the data stored in the input tensor, and the connection layer and the corresponding input tensors can be deleted. Therefore, the computer device obtains the input tensor corresponding to each connection layer in the connection layer to be optimized in the calculation graph, and uses the connection layer to be optimized and the obtained input tensor as the connection layer data to be optimized.
  • for example, the computer device can determine the connection layers to be optimized as layer3, layer5 and layer7 according to the optimized data structure of tensor7, obtain the input tensors corresponding to layer3, layer5 and layer7 in the computation graph, namely tensor1 to tensor6, take layer3, layer5, layer7 and tensor1 to tensor6 as the connection layer data to be optimized, and delete the connection layer data to be optimized.
  • since the obtained target sub-data structure is a data structure that can correctly fuse the connection layers, determining the connection layer data to be optimized according to the target sub-data structure improves the accuracy of that data, so that the connection layers are fused correctly, and the model inference speed is also improved after fusion.
  • a neural network model inference apparatus is provided, including: a task acquisition module 902, a model parsing module 904, a structure generation module 906, a data determination module 908, a model optimization module 910, and a model inference module 912, where:
  • the task acquisition module 902 is configured to acquire a neural network model inference task, where the neural network model inference task includes a model identifier.
  • the model parsing module 904 is configured to obtain a neural network model corresponding to the model identifier, analyze the neural network model, and obtain a computation graph corresponding to the neural network model, and the computation graph includes a connection layer.
  • the structure generation module 906 is configured to obtain a pre-built data structure template, and generate a target sub-data structure corresponding to the computation graph according to the data structure template.
  • the data determination module 908 is configured to determine the connection layer data to be optimized in the calculation graph according to the target sub-data structure and the connection layer.
  • the model optimization module 910 is configured to perform optimization processing on the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model.
  • the model inference module 912 is configured to perform inference according to the optimized neural network model to obtain a model inference result.
  • the computation graph includes operation layers and tensors, the operation layers include a connection layer, and each tensor is an input tensor or an output tensor of an operation layer.
  • the above-mentioned apparatus further includes: an identification module configured to topologically sort the operation layers in the computation graph to obtain a topological sequence; sequentially identify, according to the topological sequence, whether each operation layer is a connection layer; skip the operation layer when it is not a connection layer; and, when it is a connection layer, generate the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph.
  • the structure generation module 906 is further configured to obtain the current connection layer and identify whether there is an optimized input tensor among the input tensors corresponding to the current connection layer; if there is, obtain the optimized data structure of the optimized input tensor, generate the optimized data structure of the output tensor corresponding to the current connection layer according to the optimized data structure of the optimized input tensor, the computation graph, and the data structure template, update the next connection layer to be the current connection layer, and return to the step of identifying whether there is an optimized input tensor among the input tensors corresponding to the current connection layer, until the optimized data structures of the output tensors corresponding to all connection layers in the operation layers are generated.
  • the structure generation module 906 is further configured to, when there is no optimized input tensor among the input tensors corresponding to the current connection layer, extract the template data corresponding to the output tensor of the current connection layer from the computation graph according to the data structure template, and correspondingly add the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer.
  • the structure generation module 906 is further configured to sequentially extract the production layer, the demand layer and the shape corresponding to each input tensor in the calculation graph according to the data structure template; add the extracted production layer, the demand layer and the shape to the data structure template; establish the production layer index corresponding to the production layer in the production layer index table of the data structure template; establish the demand layer index corresponding to the demand layer in the demand layer data table of the data structure template, and count each demand layer The size of the corresponding input tensor; connect the demand layers other than the connection layer in the demand layer corresponding to the input tensor to the demand layer data table to obtain the optimized data structure of the output tensor corresponding to the current connection layer.
  • the data determination module 908 is further configured to determine the connection layer to be optimized in the connection layer according to the target sub-data structure; obtain the input tensor corresponding to each connection layer in the connection layer to be optimized in the calculation graph ; Use the connection layer to be optimized and the obtained input tensor as the connection layer data to be optimized.
  • the model optimization module 910 is further configured to delete the connection layer data to be optimized to obtain a calculation graph after deletion; and connect the deleted calculation graphs according to the target data structure to obtain an optimized neural network model.
  • Each module in the above-mentioned neural network model inference apparatus may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • in one of the embodiments, a computer device is provided; the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, memory, a communication interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store the data of the neural network model inference method.
  • the communication interface of the computer device is used to connect and communicate with an external terminal.
  • the computer-readable instructions, when executed by a processor, implement a neural network model inference method.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps in the above method embodiments.
  • One or more computer storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the steps in each of the foregoing method embodiments.
  • the computer storage medium is a readable storage medium, and the readable storage medium may be non-volatile or volatile.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A neural network model inference method, comprising: obtaining a neural network model inference task, the neural network model inference task comprising a model identifier (202); obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, the computation graph comprising a connection layer (204); obtaining a pre-constructed data structure template, and generating, according to the data structure template, a target sub-data structure corresponding to the computation graph (206); determining, in the computation graph, connection layer data to be optimized according to the target sub-data structure and the connection layer (208); optimizing the connection layer data to be optimized according to the target sub-data structure, so as to obtain an optimized neural network model (210); and performing inference according to the optimized neural network model to obtain a model inference result (212).

Description

Neural network model inference method, apparatus, computer device, and storage medium

Technical Field
The present application relates to a neural network model inference method, apparatus, computer device, and storage medium.
Background Art
Inference of a neural network model refers to deploying a pre-trained neural network model into actual business scenarios, such as image classification, object detection, or online translation; that is, the process of inputting data into the neural network model and obtaining output data through the model. As the network structure of neural network models becomes more complex, inference takes more time. To improve inference speed, the traditional approach is to fuse the connection layer: the input tensors corresponding to the connection layer are written directly into the output tensor of the connection layer, and the input tensors and the connection layer are deleted, thereby reducing memory occupation and memory copy time.
However, the inventor realized that the traditional approach is only suitable for simple connection layer structures. As the network structure of a neural network model becomes more complex, the connection operations of the connection layer also become complicated, and performing connection layer fusion in the traditional way leads to errors in the inference results. Therefore, how to obtain correct inference results while improving inference speed has become a technical problem that needs to be solved.
SUMMARY OF THE INVENTION
According to various embodiments disclosed in the present application, a neural network model inference method, apparatus, computer device, and storage medium are provided.
A neural network model inference method, comprising:
obtaining a neural network model inference task, the neural network model inference task including a model identifier;
obtaining a neural network model corresponding to the model identifier, parsing the neural network model, and obtaining a computation graph corresponding to the neural network model, the computation graph including a connection layer;
obtaining a pre-constructed data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
determining connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
performing inference according to the optimized neural network model to obtain a model inference result.
A neural network model inference apparatus, comprising:
a task acquisition module, configured to acquire a neural network model inference task, the neural network model inference task including a model identifier;
a model parsing module, configured to obtain a neural network model corresponding to the model identifier, parse the neural network model, and obtain a computation graph corresponding to the neural network model, the computation graph including a connection layer;
a structure generation module, configured to obtain a pre-constructed data structure template and generate a target sub-data structure corresponding to the computation graph according to the data structure template;
a data determination module, configured to determine connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
a model optimization module, configured to optimize the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
a model inference module, configured to perform inference according to the optimized neural network model to obtain a model inference result.
A computer device comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining a neural network model inference task, the neural network model inference task including a model identifier;
obtaining a neural network model corresponding to the model identifier, parsing the neural network model, and obtaining a computation graph corresponding to the neural network model, the computation graph including a connection layer;
obtaining a pre-constructed data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
determining connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
performing inference according to the optimized neural network model to obtain a model inference result.
One or more computer storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a neural network model inference task, the neural network model inference task including a model identifier;
obtaining a neural network model corresponding to the model identifier, parsing the neural network model, and obtaining a computation graph corresponding to the neural network model, the computation graph including a connection layer;
obtaining a pre-constructed data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
determining connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer;
optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
performing inference according to the optimized neural network model to obtain a model inference result.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Description of the Drawings
FIG. 1 is an application environment diagram of a neural network model inference method in one or more embodiments.
FIG. 2 is a schematic flowchart of a neural network model inference method in one or more embodiments.
FIG. 3 is a schematic diagram of a computation graph in one or more embodiments.
FIG. 4 is a schematic diagram of a computation graph obtained by performing connection layer fusion on the computation graph shown in FIG. 3 in one or more embodiments.
FIG. 5 is a schematic diagram of a computation graph including complex connection operations in one or more embodiments.
FIG. 6 is a schematic diagram of a data structure template in one or more embodiments.
FIG. 7 is a schematic flowchart of the step of generating a target sub-data structure corresponding to a computation graph according to a data structure template in one or more embodiments.
FIG. 8 is a schematic flowchart of the step of generating an optimized data structure of the output tensor corresponding to a connection layer according to a data structure template and a computation graph in one or more embodiments.
FIG. 9 is a block diagram of a neural network model inference apparatus in one or more embodiments.
FIG. 10 is a block diagram of a computer device in one or more embodiments.
Detailed Description of the Embodiments
The neural network model inference method provided by the present application can be applied to a computer device, which may be a terminal or a server. It can be understood that the method can be applied to a terminal, to a server, or to a system including a terminal and a server and implemented through interaction between the terminal and the server.
The neural network model inference method provided by the present application can be applied to the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 through a network. The terminal 102 can acquire a model inference task carrying a model identifier, obtain the neural network model corresponding to the model identifier, and parse the neural network model to obtain the computation graph corresponding to it, the computation graph including a connection layer. The terminal then obtains a pre-constructed data structure template, generates the target sub-data structure corresponding to the computation graph according to the data structure template, determines the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer, optimizes the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model, and performs inference according to the optimized neural network model to obtain a model inference result. The terminal 102 may specifically include, but is not limited to, various personal computers, notebook computers, smart phones, and tablet computers. The server 104 may be implemented by an independent server or by a server cluster composed of multiple servers.
It can be understood that the neural network model inference method provided by the present application performs inference on a neural network model and can be applied in a variety of application environments, and the neural network model can be of various types. For example, it may include a convolutional neural network model, a recurrent neural network model, a recursive neural network model, and the like. Neural network models can be used to process many different kinds of data; for example, a neural network model may specifically include an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, a scene classification model, and the like.
In one embodiment, the inference method of the neural network model provided by the present application can be specifically applied in the field of autonomous driving, where the neural network model may specifically include at least one of an image recognition model, a trajectory prediction model, and the like.
In one embodiment, the inference method of the neural network model provided by the present application can be specifically applied in the text field, where the neural network model may specifically include at least one of an image recognition model, a behavior prediction model, a risk assessment model, and the like.
In one embodiment, as shown in FIG. 2, a neural network model inference method is provided. The method is described by taking its application to a computer device as an example; the computer device may be the terminal or the server in FIG. 1. The method includes the following steps:
Step 202: acquire a model inference task, the model inference task carrying a model identifier.
Model inference refers to the process of operating on the data input into a neural network model according to the order of the model's network structure, sequentially applying the arithmetic operations corresponding to the multiple network layers included in that structure, so as to obtain the inference result output by the neural network model. The model inference task is used to instruct the computer device to perform inference on the corresponding neural network model. The computer device may be a terminal or a server. The model identifier is a unique identifier that marks a neural network model and is used to distinguish between neural network models.
When model inference needs to be performed, the computer device can acquire the model inference task and parse it to obtain the model identifier carried in the task. Specifically, when a user needs to perform model inference, the computer device can determine the neural network model specified by the user according to the received user operation instruction and generate a model inference task carrying the model identifier. The computer device can also determine the neural network model that needs to be called according to actual operating requirements and generate the model inference task. For example, in the process of image recognition, when an image recognition model needs to be called to recognize an image, the computer device can generate a model inference task, input the image, perform inference on the image recognition model according to the task, and obtain the image recognition result output by the model.
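As a concrete reading of this step, the sketch below shows one way a model inference task carrying a model identifier could be represented and resolved against a model registry. All names here (InferenceTask, MODEL_REGISTRY, resolve_model) are illustrative assumptions, not part of the disclosed embodiment.

```python
from dataclasses import dataclass

# Hypothetical registry mapping model identifiers to pre-stored models (assumption).
MODEL_REGISTRY: dict = {}

@dataclass
class InferenceTask:
    """A model inference task; it carries the model identifier."""
    model_id: str

def resolve_model(task: InferenceTask):
    """Look up the neural network model corresponding to the model identifier."""
    model = MODEL_REGISTRY.get(task.model_id)
    if model is None:
        raise KeyError(f"no model registered under id {task.model_id!r}")
    return model
```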
In one embodiment, the computer device may store an inference engine in advance; the computer device may execute the model inference task through the inference engine and perform inference on the neural network model corresponding to the model identifier. The inference engine refers to the functional module in the computer device that is used to complete inference.
步骤204,获取模型标识对应的神经网络模型,对神经网络模型进行解析,得到神经网络模型对应的计算图,计算图中包括连接层。Step 204: Obtain a neural network model corresponding to the model identifier, analyze the neural network model, and obtain a computation graph corresponding to the neural network model, and the computation graph includes a connection layer.
Neural network models are pre-stored in the computer device; each neural network model is obtained by training on a large amount of sample data, so that the corresponding neural network model can be obtained according to the model identifier. The neural network model corresponding to the model identifier may include at least one of various types of neural network models. For example, depending on the network structure, it may specifically include at least one of a convolutional neural network model (CNN), a recurrent neural network model (RNN), a recursive neural network model, and the like. Depending on its function, the neural network model may specifically include at least one of an image recognition model, a feature extraction model, a speech recognition model, a text recognition model, a scene classification model, and the like.
The computer device parses the acquired neural network model to obtain the computation graph corresponding to it. The computation graph can be an abstraction of the neural network model during inference and can include multiple operation layers, the tensors corresponding to each operation layer, and directed edges between the operation layers and their corresponding tensors. An operation layer represents a network layer in the network structure of the neural network model and determines the arithmetic operation performed by that layer, for example, a convolution operation, a fully connected operation, or a connection operation. A tensor is a data structure that can be understood as a vector or an array matrix: the shape of a tensor can be expressed in dimensions, a one-dimensional tensor can be called a vector, and a tensor of two or more dimensions can be called an array matrix. Tensors include input tensors and output tensors; an input tensor represents the input data of an operation layer, and an output tensor represents its output data. Each operation layer can include multiple pointers to different input tensors and multiple pointers to different output tensors, and each tensor can include one pointer to its production layer and multiple pointers to different demand layers. The production layer refers to the operation layer through which the tensor is computed, and the demand layers refer to the operation layers for which the tensor serves as an input tensor. The directed edges between the operation layers and the corresponding tensors in the computation graph can be generated through the pointers in the operation layers and the pointers in the tensors.
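To make the pointer structure just described concrete, the following is a minimal Python sketch of such a graph representation: layers hold pointers to their input and output tensors, and each tensor holds one pointer to its production layer and a list of pointers to its demand layers. The class and field names are assumptions chosen for illustration, not the names used by any particular inference engine.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Tensor:
    name: str
    shape: List[int]                          # e.g. [1, 3, 40, 40]
    producer: Optional[Layer] = None          # the production layer that computes this tensor
    consumers: List[Layer] = field(default_factory=list)  # the demand layers that read it

@dataclass
class Layer:
    name: str
    op_type: str                              # e.g. "conv", "fc", "concat"
    inputs: List[Tensor] = field(default_factory=list)
    outputs: List[Tensor] = field(default_factory=list)

def connect(layer: Layer, inputs: List[Tensor], output: Tensor) -> None:
    """Create the directed edges between an operation layer and its tensors."""
    layer.inputs = list(inputs)
    layer.outputs = [output]
    output.producer = layer
    for t in inputs:
        t.consumers.append(layer)
```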
The computer device can refer to the operation layer corresponding to a connection operation as a connection layer. The connection layer is used to splice the multiple input tensors it receives; the data stored in the output tensor of the connection layer is exactly the same as the data stored in the multiple input tensors.
In one embodiment, a schematic diagram of a computation graph can be as shown in FIG. 3. The computation graph includes operation layers layer1, layer2, and layer3, and the input and output tensors corresponding to each operation layer. The output tensor tensor1 of layer1 and the output tensor tensor2 of layer2 serve as the inputs of layer3; layer3 connects tensor1 and tensor2 to obtain the output tensor tensor3. The Shape field in tensor1, tensor2, and tensor3 represents the shape of the tensor, expressed in dimensions. For example, Shape: 1×3×40×40 in tensor1 denotes, from left to right, the 0th, 1st, 2nd, and 3rd dimensions. The concat axis=1 annotation in layer3 indicates that tensor1 and tensor2 are connected along the 1st dimension.
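The connection operation of layer3 in FIG. 3 can be reproduced in a few lines of NumPy. The text does not state the shape of tensor2, so a shape of 1×2×40×40 is assumed purely for illustration; the only requirement is that all dimensions other than the concatenation axis match.

```python
import numpy as np

tensor1 = np.zeros((1, 3, 40, 40))   # Shape: 1x3x40x40, as given in FIG. 3
tensor2 = np.zeros((1, 2, 40, 40))   # shape assumed for illustration

# concat axis = 1: splice the two tensors along the 1st dimension.
tensor3 = np.concatenate([tensor1, tensor2], axis=1)
print(tensor3.shape)                 # (1, 5, 40, 40)
```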
Step 206: obtain a pre-constructed data structure template, and generate the target sub-data structure corresponding to the computation graph according to the data structure template.
The pre-constructed data structure template refers to the data structure template required to optimize the computation graph; it is a new data structure template built on the basis of the data structure of the traditional computation graph. The target sub-data structure refers to the data structure of the optimized tensors in the computation graph.
A neural network model includes a large number of operation layers, among which there may be a connection layer. The connection layer is used to splice the multiple input tensors it receives, and the data stored in its output tensor is exactly the same as the data stored in those input tensors. During inference, the connection operation therefore causes duplicate data to be stored in memory and incurs unnecessary memory copy time. As the amount of data grows, the connection operation occupies more memory space and the memory copy time grows longer, so the inference time of the neural network model increases and the inference speed decreases, which is unfavorable for data processing with high real-time requirements. For example, in the field of autonomous driving, data processing results must be obtained quickly through neural network model inference. To speed up inference, the traditional approach fuses the connection layer and deletes the duplicate data, reducing memory occupation and memory copy time. However, the traditional approach is only suitable for simple connection operations, for example when all input tensors of the connection layer are fed into the connection layer for connection. When the connection operations in the network structure of the neural network model are more complex, errors arise: if a tensor serves as the input of multiple operation layers and some of those layers do not participate in the connection operation, fusing the connection layer in the traditional way yields wrong inference results; if there are continuous connections, traditional fusion also cannot determine the positions, within the output tensor of the last connection layer, of the tensors that were optimized away, so model inference cannot proceed; and if the output tensor of one operation layer must be connected with the output tensors of multiple operation layers at the same time, that is, it needs to be input into multiple connection layers, traditional fusion cannot store that output tensor in multiple connection layers simultaneously, which leads to errors in the model inference results.
Therefore, the computer device constructs and stores a new data structure template in advance. When model inference is required, it obtains the pre-constructed data structure template and optimizes the data structure of the computation graph according to it, so that the connection layers can be fused accurately in all cases, correct inference results are guaranteed, the memory space occupied by the inference process is reduced, memory copy time is reduced, and inference is accelerated.
In one embodiment, a schematic diagram of the computation graph obtained by fusing the connection layer of the computation graph shown in FIG. 3 in the traditional way can be as shown in FIG. 4. Since the content stored in tensor3, the output of the connection layer, is exactly the same as the content of tensor1 and tensor2, the connection operation can be regarded as a memory copy operation. By writing the output results of layer1 and layer2 directly into tensor3 and deleting tensor1, tensor2, and layer3, two fewer blocks of memory (tensor1 and tensor2) need to be allocated and one memory copy operation in layer3 is saved.
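The effect of this traditional fusion can be illustrated with NumPy views: tensor3 is allocated once, and each production layer writes into its slice of tensor3 directly, so the copy performed by layer3 disappears. The sketch uses the shapes assumed above and dummy computations; it illustrates the traditional fusion being discussed, not the method of the present application.

```python
import numpy as np

# Allocate tensor3 once; tensor1 and tensor2 are never materialized.
tensor3 = np.empty((1, 5, 40, 40))

# layer1 writes directly into channels 0..2 of tensor3, layer2 into
# channels 3..4, replacing the memory copy previously done by layer3.
out1 = tensor3[:, 0:3]   # view standing in for layer1's output buffer
out2 = tensor3[:, 3:5]   # view standing in for layer2's output buffer
out1[...] = 1.0          # dummy stand-in for layer1's computation
out2[...] = 2.0          # dummy stand-in for layer2's computation

assert (tensor3[:, :3] == 1.0).all() and (tensor3[:, 3:] == 2.0).all()
```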
In one embodiment, a schematic diagram of a computation graph containing complex connection operations can be as shown in FIG. 5. layer1 through layer7 are all operation layers, and layer3, layer5, and layer7 are connection layers. The consumer layers pointed to by the dotted arrows from tensor1, tensor3, and tensor5 refer to the demand layers, other than the connection layers, corresponding to each tensor.
In one embodiment, the pre-constructed data structure template may be the output structure template of the output tensor corresponding to a connection layer in the computation graph. A schematic diagram of the data structure template can be as shown in FIG. 6, which includes multiple template items: sub_producer_layers, sub_consumer_layers, sub_shapes, sub_producer_index_table, sub_consumer_index_table, and sub_consumer_stride_table. sub_producer_layers stores the production layers of the output tensor corresponding to the connection layer, including the production layers of the input tensors that need to be stored; sub_consumer_layers stores the demand layers of that output tensor, including the demand layers of the input tensors that need to be stored; sub_shapes stores the shapes of that output tensor, including the shapes of the input tensors that need to be stored; sub_producer_index_table stores the index corresponding to each production layer in sub_producer_layers; sub_consumer_index_table stores the index of the input tensor corresponding to each demand layer in sub_consumer_layers; and sub_consumer_stride_table stores the size of the input tensor corresponding to each demand layer in sub_consumer_layers.
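The six template items enumerated above map naturally onto a record type. A minimal Python sketch keeping the field names of FIG. 6 might look as follows; the concrete container types (lists, and dictionaries keyed by layer name) are assumptions about details the text leaves open.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SubDataStructure:
    # production layers whose outputs are stored in the fused output tensor
    sub_producer_layers: List["Layer"] = field(default_factory=list)
    # demand layers that read some stored input tensor from the output tensor
    sub_consumer_layers: List["Layer"] = field(default_factory=list)
    # shape of each stored input tensor
    sub_shapes: List[List[int]] = field(default_factory=list)
    # index of each production layer, keyed by layer name
    sub_producer_index_table: Dict[str, int] = field(default_factory=dict)
    # index of the input tensor needed by each demand layer, keyed by layer name
    sub_consumer_index_table: Dict[str, int] = field(default_factory=dict)
    # size of the input tensor corresponding to each demand layer
    sub_consumer_stride_table: Dict[str, int] = field(default_factory=dict)
```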
In the data structure of the acquired computation graph, each operation layer includes multiple pointers to different input tensors and multiple pointers to different output tensors. When the connection operations in the network structure of the neural network model are complex, for example when a tensor serves as the input of multiple operation layers and the operation layers other than the connection layer do not participate in the connection operation, those other operation layers lose their original input tensor after the connection layer is optimized; the output tensor of the connection layer would be used as the output tensor of the operation layers other than the connection layer, resulting in errors in the output results of those layers. Further, when the computation graph is as shown in FIG. 5, there are multiple continuous connection layers, and fusing the connection layers in the traditional way not only produces wrong outputs but also makes it impossible to determine the positions of tensor1, tensor2, tensor4, and tensor6 within the fused tensor7, so the deleted connection layers layer3, layer5, and layer7, as well as the consumer layers, cannot determine their corresponding input tensors, resulting in erroneous inference results.
The computer device can optimize the data structure of the computation graph according to the pre-constructed data structure template described above. Specifically, the computer device can obtain, according to the data structure template, the original data structure of the output tensor corresponding to each connection layer in the computation graph. The original data structure can include the output tensor itself, one pointer to its production layer, and multiple pointers to different demand layers. The computer device can optimize the original data structure according to the data structure template, filling in the template according to the computation graph and the output tensor corresponding to each connection layer, thereby obtaining the optimized data structure of the output tensor corresponding to each connection layer in the computation graph (the optimized data structure being the data structure after optimization), and determining the target sub-data structure corresponding to the computation graph from the generated optimized data structures. The target sub-data structure can include the production layers, demand layers, and shapes of the output tensor corresponding to a connection layer, the index corresponding to each production layer, the input tensor index corresponding to each demand layer, and the size of the input tensor corresponding to each demand layer. There may be one connection layer or multiple connection layers. The production layers and demand layers of the output tensor are used to determine the original input tensors of the demand layers that do not participate in the connection operation, ensuring correct output results; the shape of the output tensor is used to ensure the correctness of the connection operation performed by the connection layer; and the index and size of the input tensor corresponding to each demand layer indicate the position of that input tensor within the output tensor, so that, when continuous connection layers occur, the input tensor corresponding to each demand layer can be located from that position.
Step 208: determine the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layer.
Step 210: optimize the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model.
The target sub-data structure refers to an optimized data structure capable of correctly fusing the connection layers.
The computer device can determine the connection layer data to be optimized in the computation graph according to the target sub-data structure and the connection layers in the computation graph. The connection layer data to be optimized can include the connection layers in the computation graph and the input tensors corresponding to those connection layers. The computer device can optimize the connection layer data to be optimized according to the target sub-data structure; the optimization can consist of fusing the connection layers, that is, deleting the connection layer data to be optimized.
In one embodiment, optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model includes: deleting the connection layer data to be optimized to obtain a computation graph after deletion, and connecting the computation graph after deletion according to the target data structure to obtain the optimized neural network model. After deleting the connection layer data to be optimized, the computer device can connect the remaining computation graph in sequence according to the operation layer order in the target data structure, thereby obtaining an optimized computation graph and, from it, the optimized neural network model. By deleting the connection layer data to be optimized and reconnecting the deleted computation graph, the memory space occupied by the inference process is reduced, memory copy time is reduced, and inference is accelerated.
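A rough sketch of this deletion-and-reconnection step, reusing the Layer/Tensor and SubDataStructure sketches above, is given below. How the fused buffer is allocated and how each demand layer locates its slice are engine-specific details; here the producers and remaining consumers are simply rewired to the fused output tensor.

```python
def fuse_connection_layers(layers, target, fused_output):
    """Delete the connection layer data to be optimized, then reconnect.

    `layers` is the operation-layer list, `target` a SubDataStructure, and
    `fused_output` the Tensor kept as the fused output buffer.
    """
    kept = []
    for layer in layers:
        if layer.op_type == "concat":
            continue  # the connection layer and its input tensors are dropped
        kept.append(layer)
    # Every recorded production layer now writes straight into the fused tensor.
    for producer in target.sub_producer_layers:
        producer.outputs = [fused_output]
    # Demand layers other than the connection layer read their slice of the
    # fused tensor; the offset comes from the index and stride tables.
    for consumer in target.sub_consumer_layers:
        consumer.inputs = [fused_output]
    return kept
```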
Step 212: perform inference according to the optimized neural network model to obtain a model inference result.
After the computer device obtains the optimized neural network model, the optimized model, compared with the traditional neural network model, reduces the connection operations during inference, reduces the memory occupied by duplicate data, and reduces memory copy time, thereby improving model inference speed. The computer device can perform inference according to the optimized neural network model, performing operations in sequence according to the arithmetic operations of the operation layers of the optimized model to obtain the inferred data result. For example, the computer device can sequentially operate on an input image according to the sequence of arithmetic operations corresponding to the optimized neural network model to obtain the recognized image result.
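As an illustration of running the optimized model, a toy executor might walk the operation layers in order and apply each layer's arithmetic operation. The callable `op` attribute is an assumption for illustration; the Layer/Tensor sketches above do not define it.

```python
def run(ordered_layers, feed):
    """Execute operation layers sequentially.

    `feed` maps graph-input tensor names to arrays; each layer is assumed to
    expose a callable `op` implementing its arithmetic operation.
    """
    values = dict(feed)                    # tensor name -> computed value
    for layer in ordered_layers:
        args = [values[t.name] for t in layer.inputs]
        values[layer.outputs[0].name] = layer.op(*args)
    return values
```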
In this embodiment, the neural network model inference task is acquired; the neural network model corresponding to the model identifier in the task is obtained; the model is parsed to obtain the computation graph corresponding to it, the computation graph including a connection layer; a pre-constructed data structure template is obtained; the target sub-data structure corresponding to the computation graph is generated according to the template; the connection layer data to be optimized is determined in the computation graph according to the target sub-data structure and the connection layer; the connection layer data to be optimized is optimized according to the target sub-data structure to obtain an optimized neural network model; and inference is performed according to the optimized model to obtain a model inference result. By constructing a data structure template in advance and generating the target sub-data structure corresponding to the computation graph from it, the data structure of the computation graph can be optimized, so that the optimized target sub-data structure can correctly fuse the connection layers in all cases, yielding correct inference results while reducing the memory space occupied by the inference process, reducing memory copy time, and accelerating inference.
In one embodiment, as shown in FIG. 7, the step of generating the target sub-data structure corresponding to the computation graph according to the data structure template includes:
Step 702: traverse the operation layers in the computation graph and identify the connection layers among them.
Step 704: generate the optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph.
Step 706: determine the target sub-data structure corresponding to the computation graph according to the optimized data structure of the output tensor corresponding to the connection layer.
The computation graph includes operation layers and tensors; the operation layers include connection layers, and a tensor can be an input tensor or an output tensor of an operation layer. The computer device can traverse all the operation layers in the computation graph and identify the connection layers among them. When a connection layer is identified, the device obtains, according to the multiple template items in the data structure template, the template item data of the output tensor corresponding to that connection layer from the computation graph, and adds each piece of obtained template item data to the corresponding template item, thereby generating the optimized data structure of the output tensor corresponding to the connection layer, until the optimized data structure of the output tensor corresponding to the last connection layer has been generated. The optimized data structure refers to the data structure obtained by optimizing the original data structure of the output tensor.
The computer device can determine the target sub-data structure corresponding to the computation graph according to the generated optimized data structures of the output tensors corresponding to the connection layers. When only one connection layer exists in the computation graph, the optimized data structure of the output tensor corresponding to that connection layer is directly taken as the target sub-data structure. When multiple continuous connection layers exist, as in the computation graph shown in FIG. 5, the computer device can take the optimized data structure of the output tensor corresponding to the last connection layer as the target sub-data structure. When the computation graph includes multiple parallel connection layers, that is, when the output tensor of one operation layer needs to be input into multiple connection layers at the same time, the computer device can take the optimized data structures of the output tensors corresponding to all of those connection layers as the target sub-data structure; in this case, the target sub-data structure includes the optimized data structures of the output tensors corresponding to multiple connection layers.
In this embodiment, the operation layers in the computation graph are traversed and the connection layers are identified; it is only necessary to generate the optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph, and to determine the target sub-data structure corresponding to the computation graph from those optimized data structures, so that the data structures required to correctly fuse the connection layers in all complex cases can be obtained quickly.
In one embodiment, the above method further includes: topologically sorting the operation layers in the computation graph to obtain a topological sequence; identifying, in the order of the topological sequence, whether each operation layer is a connection layer; when an operation layer is not a connection layer, skipping it; and when it is a connection layer, generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph.
A computation graph can include multiple operation layers, and the computer device can topologically sort them to obtain a topological sequence. Topological sorting means arranging the operation layers into a sequence that satisfies the topological order according to the dependencies between them in the directed computation graph; the resulting sequence is a one-dimensional linear sequence. Specifically, the computer device can first find, in the computation graph, an operation layer with in-degree 0, that is, one with no input edges, store that layer in a stack, and delete the layer and its associated directed edges from the computation graph, adjusting the in-degree of the operation layers whose incoming edges were deleted, for example decrementing the in-degree by 1. It then repeats the steps of finding an operation layer with in-degree 0 and deleting and adjusting, until all operation layers in the computation graph have been saved to the stack; all operation layers in the stack can then be output in the order in which they were stored, yielding the topological sequence. The arrangement of the operation layers in the topological sequence can be determined by that storage order. The computer device can visit the operation layers in the order of the topological sequence and identify whether the arithmetic operation corresponding to each layer is a connection operation; an arithmetic operation refers to the data processing operation performed by each operation layer. If it is not a connection operation, the layer is not a connection layer, and the computer device can skip it without performing the data structure optimization step. If it is a connection operation, the layer is a connection layer, the optimization step is required, and the computer device generates the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph.
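The traversal described here is essentially Kahn's topological sort over in-degrees. A self-contained sketch, using a plain adjacency mapping rather than the graph classes above, could be:

```python
from collections import deque

def topological_sort(successors):
    """successors maps each layer name to the layers it feeds (directed edges)."""
    indegree = {name: 0 for name in successors}
    for outs in successors.values():
        for name in outs:
            indegree[name] += 1
    # start from the layers with in-degree 0, i.e. no input edges
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        # "delete" the outgoing edges and adjust the successors' in-degrees
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(successors):
        raise ValueError("cycle detected: no topological order exists")
    return order

# FIG. 3: layer1 and layer2 both feed layer3.
print(topological_sort({"layer1": ["layer3"], "layer2": ["layer3"], "layer3": []}))
```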
In this embodiment, by topologically sorting the operation layers in the computation graph, the computer device can ensure that when a given operation layer is processed, the arithmetic operations of the preceding operation layers have already been completed, which improves the accuracy of connection layer identification and helps obtain the optimized data structure of the output tensor corresponding to each connection layer quickly.
In one embodiment, as shown in FIG. 8, the step of generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph includes:
Step 802: obtain the current connection layer, and identify whether an optimized input tensor exists among the input tensors corresponding to the current connection layer.
Step 804: when one exists, obtain the optimized data structure of the optimized input tensor, generate the optimized data structure of the output tensor corresponding to the current connection layer according to that optimized data structure, the computation graph, and the data structure template, update the next connection layer as the current connection layer, and return to the step of identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer, until the traversal is complete and the optimized data structures of the output tensors corresponding to the connection layers in the operation layers have been generated.
Step 806: when no optimized input tensor exists among the input tensors corresponding to the current connection layer, extract, according to the data structure template, the template data corresponding to the output tensor of the current connection layer from the computation graph, and add the extracted template data to the data structure template correspondingly, obtaining the optimized data structure of the output tensor corresponding to the current connection layer.
The current connection layer refers to the connection layer currently being visited. An optimized input tensor is an input tensor whose production layer is a connection layer and whose data structure is an optimized data structure generated according to the data structure template and the computation graph.
The computer device obtains the current connection layer and identifies whether an optimized input tensor exists among its input tensors. When an optimized input tensor is identified, this indicates that continuous connection layers exist in the computation graph, and the optimized data structure of the output tensor corresponding to the current connection layer can be generated from the optimized data structure of that optimized input tensor. Specifically, since the data structure of the optimized input tensor is already an optimized data structure, the computer device can obtain it and combine it with the unoptimized input tensors corresponding to the current connection layer. According to the data structure template, the device obtains the template data corresponding to the output tensor of the current connection layer from the optimized data structure of the optimized input tensor and from the computation graph, and adds the obtained template data to the data structure template, thereby obtaining the optimized data structure corresponding to the current connection layer. That optimized data structure can include the production layers, demand layers, and shapes of the input tensors stored in the output tensor corresponding to the current connection layer, the index corresponding to each production layer, the index corresponding to each demand layer, and the size of the input tensor corresponding to each demand layer. The computer device can then identify whether the next operation layer is a connection layer; if so, it takes that layer as the next connection layer, updates it as the current connection layer, and returns to the step of identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer, until the optimized data structures of the output tensors corresponding to all connection layers in the operation layers have been generated. By combining the optimized data structure of the optimized input tensor with the unoptimized input tensors corresponding to the current connection layer, the generation efficiency of the optimized data structure can be improved, accelerating connection layer fusion and thus model inference.
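For this continuous-connection case, one plausible reading is that the optimized structure of an already-optimized input tensor is spliced into the structure being built for the current connection layer, with its recorded indices shifted past the entries already present. The sketch below, reusing SubDataStructure from above, makes that reading concrete; the index arithmetic is an assumption about a detail the text leaves open.

```python
def merge_optimized_input(current, optimized):
    """Splice an already-optimized input tensor's SubDataStructure into `current`."""
    base = len(current.sub_shapes)   # entries already present in the current structure
    current.sub_producer_layers.extend(optimized.sub_producer_layers)
    current.sub_consumer_layers.extend(optimized.sub_consumer_layers)
    current.sub_shapes.extend(optimized.sub_shapes)
    # Shift recorded indices so they point at positions in the larger fused tensor.
    for name, idx in optimized.sub_producer_index_table.items():
        current.sub_producer_index_table[name] = base + idx
    for name, idx in optimized.sub_consumer_index_table.items():
        current.sub_consumer_index_table[name] = base + idx
    current.sub_consumer_stride_table.update(optimized.sub_consumer_stride_table)
```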
In one embodiment, when consecutive connection layers exist in the computation graph, that is, when an optimized input tensor exists among the input tensors of the second and every subsequent connection layer, the computer device can generate the optimized data structure of the output tensor of each connection layer in the manner described above, and take the optimized data structure of the output tensor of the last connection layer as the target sub-data structure corresponding to the computation graph.
When no optimized input tensor is identified among the input tensors of the current connection layer, the input tensors of the current connection layer are all basic tensors. The computer device can then extract the template data corresponding to the output tensor of the current connection layer directly from the computation graph according to the data structure template. Specifically, the data structure template includes multiple template items; the computer device can extract, item by item, the template-item data corresponding to the output tensor of the current connection layer and add the extracted data to the corresponding template items, obtaining the optimized data structure of the output tensor of the current connection layer. In this way, an optimized data structure that fuses the connection layer correctly is obtained.
In one embodiment, extracting the template data corresponding to the output tensor of the current connection layer from the computation graph according to the data structure template, and adding the extracted template data to the data structure template to obtain the optimized data structure of the output tensor of the current connection layer, includes: extracting, in turn, the production layer, demand layers, and shape corresponding to each input tensor from the computation graph according to the data structure template; adding the extracted production layers, demand layers, and shapes to the data structure template; establishing, in the production-layer index table of the data structure template, the index corresponding to each production layer; establishing, in the demand-layer data table of the data structure template, the index corresponding to each demand layer, and counting the size of the input tensor corresponding to each demand layer; and attaching the demand layers other than the connection layer, among the demand layers corresponding to the input tensors, to the demand-layer data table, obtaining the optimized data structure of the output tensor of the current connection layer.
The template data corresponding to the output tensor of the current connection layer includes the production layers, demand layers, and shapes of the output tensor of the current connection layer, the index of each production layer, the input-tensor index of each demand layer, and the size of the input tensor corresponding to each demand layer. During connection-layer fusion the input tensors of the connection layer are deleted, so the information they carry must be preserved, and the only place it can be preserved is the output tensor of the connection layer. Specifically, the computer device extracts from the computation graph the production layer of each input tensor of the connection layer as a production layer of the output tensor of the current connection layer, extracts the demand layers of each input tensor as demand layers of that output tensor, and extracts the shape of each input tensor as part of the shape of that output tensor. After extracting the template data, the computer device adds it to the corresponding template items of the data structure template. The computer device also establishes, in the production-layer index table of the data structure template, an index for each production layer; the production-layer index distinguishes the production layers and indicates each production layer's position in the computation graph, ensuring that the positions of the production layers in the optimized neural network model remain accurate. The computer device establishes, in the demand-layer data table of the data structure template, an index for each demand layer, and counts the size of the input tensor corresponding to each demand layer. The demand-layer index tells a demand layer at which position in the output tensor to find its own input tensor; that is, it records the position of each input tensor within the output tensor, so that after the connection layers are fused, the input tensor of each demand layer can be located by this position. The size of the input tensor corresponding to each demand layer indicates how many basic tensors that demand layer needs. The computer device attaches the demand layers other than the connection layer, among the demand layers of each input tensor of the connection layer, to the demand-layer data table; the attached demand layers are operation layers that do not participate in the concatenation, which avoids the problem of the output tensor being handed directly, after optimization, to operation layers that did not participate in the concatenation, and so improves the correctness of connection-layer fusion.
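A matching sketch of this from-scratch construction, under the same hypothetical assumptions as the earlier snippets:

```python
# Hypothetical: build the optimized data structure for a connection layer
# whose input tensors are all basic tensors (no optimized input exists).
def build_structure(layer):
    s = OptimizedStructure()
    for offset, t in enumerate(layer.inputs):
        s.sub_producer_layers.append(t.producer)            # production layer
        s.sub_consumer_layers.append(
            [c for c in t.consumers if c is not layer])     # demand layers other than this connection layer
        s.sub_shapes.append(t.shape)
        s.sub_producer_index_table[t.producer] = offset     # production-layer index
        s.sub_consumer_index_table[t] = offset              # where demand layers find t in the output tensor
        s.sub_consumer_stride_table[t] = 1                  # size: one basic-tensor unit
    return s
```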
The process of generating the target sub-data structure of a computation graph is illustrated with the computation graph in FIG. 5. layer1 through layer7 are operation layers, and layer3, layer5, and layer7 are connection layers; the consumer layers pointed to by the dashed arrows from tensor1, tensor3, and tensor5 are the demand layers, other than the connection layers, of each tensor. The computer device traverses the operation layers in the order layer1 through layer7. layer1 and layer2 are identified as not being connection layers and are skipped. layer3 is identified next: it is a connection layer, and its two input tensors tensor1 and tensor2 are both unoptimized input tensors, so the steps for generating the optimized data structure of tensor3 are as shown in A1 to F1 below:
A1. sub_producer_layers = [layer1, layer2]
B1. sub_consumer_layers = [tensor1's consumer layers, tensor2's consumer layers]
C1. sub_shapes = [tensor1's shape, tensor2's shape]
D1. sub_producer_index_table:

sub producer name | layer1 | layer2
index             | 0      | 1
E1. sub_consumer_index_table and sub_consumer_stride_table:

sub consumer name | tensor1's consumers | tensor2's consumers
index             | 0                   | 1
stride            | 1                   | 1
F1. Attach, to the demand-layer table in E1, the consumers of tensor1 and tensor2 other than the connection layer layer3.
In step B1, tensor1's consumer layers denotes the demand layers of tensor1, and tensor2's consumer layers denotes the demand layers of tensor2. The table in step E1 is the merged form of sub_consumer_index_table and sub_consumer_stride_table and may be called the demand-layer table. In the demand-layer table, tensor1's consumers denotes the demand layers of tensor1; an index of 0 for tensor1's consumers means that the demand layers of tensor1 can find the input tensor tensor1 at position 0 of the output tensor, and a stride of 1 for tensor1's consumers means that tensor1 has a size of one unit. Likewise, an index of 1 for tensor2's consumers means that the demand layers of tensor2 can find the input tensor tensor2 at position 1 of the output tensor, and a stride of 1 for tensor2's consumers means that tensor2 has a size of one unit.
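Read this way, the demand-layer table also suggests how, after fusion, a surviving demand layer would locate its former input inside the fused output tensor. The following is again only a hypothetical sketch:

```python
# Hypothetical: a demand layer slices its former input out of the fused
# output tensor using the index and stride recorded for it.
def locate_input(struct, key, fused_output):
    start = struct.sub_consumer_index_table[key]   # e.g. 0 for tensor1's consumers
    size = struct.sub_consumer_stride_table[key]   # e.g. 1 unit for tensor1
    return fused_output[start:start + size]        # a view, so no memory copy
```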
The computer device continues with layer4; it is not a connection layer and is skipped. layer5 is identified next: it is a connection layer, and among its input tensors tensor3 is an optimized input tensor, so the steps for generating the optimized data structure of tensor5 are as shown in A2 to G2 below:
A2. Obtain the optimized data structure of tensor3 and combine it with tensor4.
B2. sub_producer_layers = [layer1, layer2, layer4]
C2. sub_consumer_layers = [tensor1's consumer layers, tensor2's consumer layers, tensor3's consumer layers, tensor4's consumer layers]
D2. sub_shapes = [tensor1's shape, tensor2's shape, tensor4's shape]
E2. sub_producer_index_table:

sub producer name | layer1 | layer2 | layer4
index             | 0      | 1      | 2
F2. sub_consumer_index_table and sub_consumer_stride_table:

sub consumer name | tensor1's consumers | tensor2's consumers | tensor3's consumers | tensor4's consumers
index             | 0                   | 1                   | 0                   | 2
stride            | 1                   | 1                   | 2                   | 1
G2. Attach, to the demand-layer table in F2, the consumers of tensor1 through tensor4 other than the connection layers layer3 and layer5.
The computer device continues with layer6; it is not a connection layer and is skipped. layer7 is identified next: it is a connection layer, and among its input tensors tensor5 is an optimized input tensor, so the steps for generating the optimized data structure of tensor7 are as shown in A3 to H3 below:
A3. Obtain the optimized data structure of tensor5 and combine it with tensor6.
B3. sub_producer_layers = [layer1, layer2, layer4, layer6]
C3. sub_consumer_layers = [tensor1's consumer layers, tensor2's consumer layers, tensor3's consumer layers, tensor4's consumer layers, tensor5's consumer layers, tensor6's consumer layers]
D3. sub_shapes = [tensor1's shape, tensor2's shape, tensor4's shape, tensor6's shape]
E3. sub_producer_index_table:

sub producer name | layer1 | layer2 | layer4 | layer6
index             | 0      | 1      | 2      | 3
F3. sub_consumer_index_table:

sub consumer name | tensor1's consumers | tensor2's consumers | tensor3's consumers | tensor4's consumers | tensor5's consumers | tensor6's consumers
index             | 0                   | 1                   | 0                   | 2                   | 0                   | 3
G3. sub_consumer_stride_table:

sub consumer name | tensor1's consumers | tensor2's consumers | tensor3's consumers | tensor4's consumers | tensor5's consumers | tensor6's consumers
stride            | 1                   | 1                   | 2                   | 1                   | 3                   | 1
H3. Attach, to sub_consumer_index_table and sub_consumer_stride_table, the consumers of tensor1 through tensor6 other than the connection layers layer3, layer5, and layer7.
The computer device takes the resulting optimized data structure of tensor7 as the target sub-data structure corresponding to the computation graph.
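Written out with the hypothetical record sketched earlier, with string keys standing in for the actual layer and tensor objects, the resulting target sub-data structure would read roughly as follows:

```python
# Illustrative instance of the target sub-data structure from steps A3 to H3;
# every key below is a hypothetical stand-in, not a name from the embodiment.
tensor7_struct = OptimizedStructure(
    sub_producer_layers=["layer1", "layer2", "layer4", "layer6"],
    sub_consumer_layers=[["tensor1's consumers"], ["tensor2's consumers"],
                         ["tensor3's consumers"], ["tensor4's consumers"],
                         ["tensor5's consumers"], ["tensor6's consumers"]],
    sub_shapes=["tensor1's shape", "tensor2's shape",
                "tensor4's shape", "tensor6's shape"],
    sub_producer_index_table={"layer1": 0, "layer2": 1, "layer4": 2, "layer6": 3},
    sub_consumer_index_table={"tensor1's consumers": 0, "tensor2's consumers": 1,
                              "tensor3's consumers": 0, "tensor4's consumers": 2,
                              "tensor5's consumers": 0, "tensor6's consumers": 3},
    sub_consumer_stride_table={"tensor1's consumers": 1, "tensor2's consumers": 1,
                               "tensor3's consumers": 2, "tensor4's consumers": 1,
                               "tensor5's consumers": 3, "tensor6's consumers": 1},
)
```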
In one embodiment, determining, in the computation graph, the connection layer data to be optimized according to the target sub-data structure and the connection layers includes: determining the connection layers to be optimized among the connection layers according to the target sub-data structure; obtaining, in the computation graph, the input tensor corresponding to each connection layer to be optimized; and taking the connection layers to be optimized and the obtained input tensors as the connection layer data to be optimized.
The computer device determines the connection layers to be optimized among the connection layers according to the target sub-data structure; the connection layers to be optimized are the connection layers to be deleted. Because a connection layer merely concatenates its input tensors, the data stored in its output tensor is exactly the same as the data stored in its input tensors, so the connection layer and its corresponding input tensors can all be deleted. The computer device therefore obtains, in the computation graph, the input tensor corresponding to each connection layer to be optimized, and takes the connection layers to be optimized together with the obtained input tensors as the connection layer data to be optimized.
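A sketch of this selection step, using the same hypothetical stand-ins as before:

```python
# Hypothetical: the connection layers to be optimized, plus every input
# tensor they consume, form the connection layer data to be deleted.
def collect_fusion_targets(connection_layers):
    tensors = [t for layer in connection_layers for t in layer.inputs]
    return list(connection_layers), tensors
```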
Taking FIG. 7 as an example, the computer device can determine from the optimized data structure of tensor7 that the connection layers to be optimized are layer3, layer5, and layer7, and obtain their input tensors in the computation graph, namely tensor1 through tensor6. layer3, layer5, and layer7 together with tensor1 through tensor6 are therefore taken as the connection layer data to be optimized, and the connection layer data to be optimized is deleted.
In this embodiment, because the obtained target sub-data structure is a data structure that fuses the connection layers correctly, determining the connection layer data to be optimized according to the target sub-data structure improves the accuracy of that data, so the connection layers are fused correctly according to it; and once the connection layers are fused, model inference speed also improves.
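The optimization pass itself can then be sketched as deletion followed by re-connection through the fused output tensor, taking the layers and tensors collected above. This is a hypothetical outline only, with assumed graph-rewriting helpers:

```python
# Hypothetical: delete the connection layer data to be optimized, then rewire
# the pruned graph through the fused output tensor using the target structure.
def fuse_connection_layers(graph, target, doomed_layers, doomed_tensors,
                           fused_output):
    for obj in doomed_layers + doomed_tensors:
        graph.remove(obj)                                    # assumed API
    # Production layers now write straight into the fused output tensor,
    # each at the position recorded in the production-layer index table.
    for producer, pos in target.sub_producer_index_table.items():
        graph.redirect_output(producer, fused_output, pos)   # assumed API
    # Attached demand layers read their former inputs back out of the fused
    # output tensor by index and stride (see locate_input above).
    for key, pos in target.sub_consumer_index_table.items():
        size = target.sub_consumer_stride_table[key]
        graph.redirect_inputs(key, fused_output, pos, size)  # assumed API
    return graph
```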
In one embodiment, as shown in FIG. 9, a neural network model inference apparatus is provided, including a task acquisition module 902, a model parsing module 904, a structure generation module 906, a data determination module 908, a model optimization module 910, and a model inference module 912, wherein:

the task acquisition module 902 is configured to obtain a neural network model inference task, the neural network model inference task including a model identifier;

the model parsing module 904 is configured to obtain the neural network model corresponding to the model identifier and parse the neural network model to obtain the computation graph corresponding to the neural network model, the computation graph including a connection layer;

the structure generation module 906 is configured to obtain a pre-built data structure template and generate the target sub-data structure corresponding to the computation graph according to the data structure template;

the data determination module 908 is configured to determine, in the computation graph, the connection layer data to be optimized according to the target sub-data structure and the connection layer;

the model optimization module 910 is configured to optimize the connection layer data to be optimized according to the target sub-data structure to obtain the optimized neural network model; and

the model inference module 912 is configured to perform inference according to the optimized neural network model to obtain the model inference result.
In one embodiment, the computation graph includes operation layers and tensors, the operation layers include connection layers, and each tensor is an input tensor or an output tensor of an operation layer. The structure generation module 906 is further configured to traverse the operation layers in the computation graph to identify the connection layers; generate the optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph; and determine the target sub-data structure corresponding to the computation graph according to the optimized data structures of the output tensors corresponding to the connection layers.

In one embodiment, the apparatus further includes an identification module configured to topologically sort the operation layers in the computation graph to obtain a topological sequence; identify, in the order of the topological sequence, whether each operation layer is a connection layer; skip the operation layer when it is not a connection layer; and generate the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph when it is a connection layer.

In one embodiment, the structure generation module 906 is further configured to obtain the current connection layer and identify whether an optimized input tensor exists among the input tensors corresponding to the current connection layer; and, when one exists, obtain the optimized data structure of the optimized output tensor, generate the optimized data structure of the output tensor corresponding to the current connection layer according to the optimized data structure of the optimized output tensor, the computation graph, and the data structure template, update the next connection layer to be the current connection layer, and return to the step of identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer, until the optimized data structures of the output tensors corresponding to all connection layers among the operation layers are generated.

In one embodiment, the structure generation module 906 is further configured to, when no optimized input tensor exists among the input tensors corresponding to the current connection layer, extract from the computation graph, according to the data structure template, the template data corresponding to the output tensor of the current connection layer, and add the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer.

In one embodiment, the structure generation module 906 is further configured to extract, in turn, from the computation graph according to the data structure template, the production layer, demand layers, and shape corresponding to each input tensor; add the extracted production layers, demand layers, and shapes to the data structure template; establish, in the production-layer index table of the data structure template, the production-layer index corresponding to each production layer; establish, in the demand-layer data table of the data structure template, the demand-layer index corresponding to each demand layer, and count the size of the input tensor corresponding to each demand layer; and attach the demand layers other than the connection layer, among the demand layers corresponding to the input tensors, to the demand-layer data table to obtain the optimized data structure of the output tensor corresponding to the current connection layer.

In one embodiment, the data determination module 908 is further configured to determine the connection layers to be optimized among the connection layers according to the target sub-data structure; obtain, in the computation graph, the input tensor corresponding to each connection layer to be optimized; and take the connection layers to be optimized and the obtained input tensors as the connection layer data to be optimized.

In one embodiment, the model optimization module 910 is further configured to delete the connection layer data to be optimized to obtain a computation graph after deletion, and connect the computation graph after deletion according to the target sub-data structure to obtain the optimized neural network model.
For specific limitations of the neural network model inference apparatus, reference may be made to the limitations of the neural network model inference method above, which are not repeated here. Each module in the above neural network model inference apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a communication interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores data for a neural network model inference method. The communication interface of the computer device is used to connect and communicate with an external terminal. The computer-readable instructions, when executed by the processor, implement a neural network model inference method.
Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps in each of the method embodiments described above.
One or more computer storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps in each of the method embodiments described above.
The computer storage medium is a readable storage medium, which may be non-volatile or volatile.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware; the computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

1. A neural network model inference method, comprising:
    obtaining a neural network model inference task, the neural network model inference task comprising a model identifier;
    obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, the computation graph comprising a connection layer;
    obtaining a pre-built data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
    determining, in the computation graph, connection layer data to be optimized according to the target sub-data structure and the connection layer;
    optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
    performing inference according to the optimized neural network model to obtain a model inference result.

2. The method according to claim 1, wherein the computation graph comprises operation layers and tensors, the operation layers comprise the connection layer, each tensor is an input tensor or an output tensor of an operation layer, and generating the target sub-data structure corresponding to the computation graph according to the data structure template comprises:
    traversing the operation layers in the computation graph to identify the connection layers among the operation layers;
    generating an optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph; and
    determining the target sub-data structure corresponding to the computation graph according to the optimized data structures of the output tensors corresponding to the connection layers.

3. The method according to claim 2, further comprising:
    topologically sorting the operation layers in the computation graph to obtain a topological sequence;
    identifying, in the order of the topological sequence, whether each operation layer is a connection layer;
    when an operation layer is not a connection layer, skipping the operation layer; and
    when an operation layer is a connection layer, generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph.

4. The method according to any one of claims 2 to 3, wherein generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph comprises:
    obtaining a current connection layer, and identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer; and
    when an optimized input tensor exists, obtaining the optimized data structure of the optimized output tensor, generating the optimized data structure of the output tensor corresponding to the current connection layer according to the optimized data structure of the optimized output tensor, the computation graph, and the data structure template, updating the next connection layer to be the current connection layer, and returning to the step of identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer, until the optimized data structures of the output tensors corresponding to all connection layers among the operation layers are generated.

5. The method according to claim 4, further comprising:
    when no optimized input tensor exists among the input tensors corresponding to the current connection layer, extracting, from the computation graph according to the data structure template, template data corresponding to the output tensor of the current connection layer, and adding the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer.

6. The method according to claim 5, wherein extracting, from the computation graph according to the data structure template, the template data corresponding to the output tensor of the current connection layer, and adding the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer comprises:
    extracting, in turn, from the computation graph according to the data structure template, the production layer, the demand layers, and the shape corresponding to each input tensor;
    adding the extracted production layers, demand layers, and shapes to the data structure template;
    establishing, in a production-layer index table of the data structure template, a production-layer index corresponding to each production layer;
    establishing, in a demand-layer data table of the data structure template, a demand-layer index corresponding to each demand layer, and counting the size of the input tensor corresponding to each demand layer; and
    attaching the demand layers other than the connection layer, among the demand layers corresponding to the input tensors, to the demand-layer data table to obtain the optimized data structure of the output tensor corresponding to the current connection layer.

7. The method according to claim 1, wherein determining, in the computation graph, the connection layer data to be optimized according to the target sub-data structure and the connection layer comprises:
    determining, among the connection layers, the connection layers to be optimized according to the target sub-data structure;
    obtaining, in the computation graph, the input tensor corresponding to each of the connection layers to be optimized; and
    taking the connection layers to be optimized and the obtained input tensors as the connection layer data to be optimized.

8. The method according to claim 1, wherein optimizing the connection layer data to be optimized according to the target sub-data structure to obtain the optimized neural network model comprises:
    deleting the connection layer data to be optimized to obtain a computation graph after deletion; and
    connecting the computation graph after deletion according to the target sub-data structure to obtain the optimized neural network model.
9. A neural network model inference apparatus, comprising:
    a task acquisition module, configured to obtain a neural network model inference task, the neural network model inference task comprising a model identifier;
    a model parsing module, configured to obtain a neural network model corresponding to the model identifier and parse the neural network model to obtain a computation graph corresponding to the neural network model, the computation graph comprising a connection layer;
    a structure generation module, configured to obtain a pre-built data structure template and generate a target sub-data structure corresponding to the computation graph according to the data structure template;
    a data determination module, configured to determine, in the computation graph, connection layer data to be optimized according to the target sub-data structure and the connection layer;
    a model optimization module, configured to optimize the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
    a model inference module, configured to perform inference according to the optimized neural network model to obtain a model inference result.

10. The apparatus according to claim 9, wherein the computation graph comprises operation layers and tensors, the operation layers comprise the connection layer, each tensor is an input tensor or an output tensor of an operation layer, and the structure generation module is further configured to: traverse the operation layers in the computation graph to identify the connection layers among the operation layers; generate an optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph; and determine the target sub-data structure corresponding to the computation graph according to the optimized data structures of the output tensors corresponding to the connection layers.
11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a neural network model inference task, the neural network model inference task comprising a model identifier;
    obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, the computation graph comprising a connection layer;
    obtaining a pre-built data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
    determining, in the computation graph, connection layer data to be optimized according to the target sub-data structure and the connection layer;
    optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
    performing inference according to the optimized neural network model to obtain a model inference result.

12. The computer device according to claim 11, wherein the computation graph comprises operation layers and tensors, the operation layers comprise the connection layer, each tensor is an input tensor or an output tensor of an operation layer, and the computer-readable instructions, when executed by the processor, further cause the processor to: traverse the operation layers in the computation graph to identify the connection layers among the operation layers; generate an optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph; and determine the target sub-data structure corresponding to the computation graph according to the optimized data structures of the output tensors corresponding to the connection layers.

13. The computer device according to claim 12, wherein the computer-readable instructions, when executed by the processor, further cause the processor to: topologically sort the operation layers in the computation graph to obtain a topological sequence; identify, in the order of the topological sequence, whether each operation layer is a connection layer; skip the operation layer when it is not a connection layer; and generate the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph when it is a connection layer.

14. The computer device according to any one of claims 12 to 13, wherein the computer-readable instructions, when executed by the processor, further cause the processor to: obtain a current connection layer, and identify whether an optimized input tensor exists among the input tensors corresponding to the current connection layer; and, when one exists, obtain the optimized data structure of the optimized output tensor, generate the optimized data structure of the output tensor corresponding to the current connection layer according to the optimized data structure of the optimized output tensor, the computation graph, and the data structure template, update the next connection layer to be the current connection layer, and return to the step of identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer, until the optimized data structures of the output tensors corresponding to all connection layers among the operation layers are generated.

15. The computer device according to claim 14, wherein the computer-readable instructions, when executed by the processor, further cause the processor to: when no optimized input tensor exists among the input tensors corresponding to the current connection layer, extract, from the computation graph according to the data structure template, template data corresponding to the output tensor of the current connection layer, and add the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer.
16. One or more computer storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a neural network model inference task, the neural network model inference task comprising a model identifier;
    obtaining a neural network model corresponding to the model identifier, and parsing the neural network model to obtain a computation graph corresponding to the neural network model, the computation graph comprising a connection layer;
    obtaining a pre-built data structure template, and generating a target sub-data structure corresponding to the computation graph according to the data structure template;
    determining, in the computation graph, connection layer data to be optimized according to the target sub-data structure and the connection layer;
    optimizing the connection layer data to be optimized according to the target sub-data structure to obtain an optimized neural network model; and
    performing inference according to the optimized neural network model to obtain a model inference result.

17. The storage medium according to claim 16, wherein the computation graph comprises operation layers and tensors, the operation layers comprise the connection layer, each tensor is an input tensor or an output tensor of an operation layer, and the computer-readable instructions, when executed by the processor, further perform the following steps: traversing the operation layers in the computation graph to identify the connection layers among the operation layers; generating an optimized data structure of the output tensor corresponding to each connection layer according to the data structure template and the computation graph; and determining the target sub-data structure corresponding to the computation graph according to the optimized data structures of the output tensors corresponding to the connection layers.

18. The storage medium according to any one of claims 16 to 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: topologically sorting the operation layers in the computation graph to obtain a topological sequence; identifying, in the order of the topological sequence, whether each operation layer is a connection layer; skipping the operation layer when it is not a connection layer; and generating the optimized data structure of the output tensor corresponding to the connection layer according to the data structure template and the computation graph when it is a connection layer.

19. The storage medium according to claim 18, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining a current connection layer, and identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer; and, when one exists, obtaining the optimized data structure of the optimized output tensor, generating the optimized data structure of the output tensor corresponding to the current connection layer according to the optimized data structure of the optimized output tensor, the computation graph, and the data structure template, updating the next connection layer to be the current connection layer, and returning to the step of identifying whether an optimized input tensor exists among the input tensors corresponding to the current connection layer, until the optimized data structures of the output tensors corresponding to all connection layers among the operation layers are generated.

20. The storage medium according to claim 19, wherein the computer-readable instructions, when executed by the processor, further perform the following step: when no optimized input tensor exists among the input tensors corresponding to the current connection layer, extracting, from the computation graph according to the data structure template, template data corresponding to the output tensor of the current connection layer, and adding the extracted template data to the data structure template to obtain the optimized data structure of the output tensor corresponding to the current connection layer.
PCT/CN2021/086552 2021-04-12 2021-04-12 Neural network model inference method and apparatus, computer device, and storage medium WO2022217419A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/086552 WO2022217419A1 (en) 2021-04-12 2021-04-12 Neural network model inference method and apparatus, computer device, and storage medium
CN202180050194.4A CN115867923A (en) 2021-04-12 2021-04-12 Neural network model inference method, neural network model inference device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/086552 WO2022217419A1 (en) 2021-04-12 2021-04-12 Neural network model inference method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022217419A1 true WO2022217419A1 (en) 2022-10-20

Family

ID=83639411

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086552 WO2022217419A1 (en) 2021-04-12 2021-04-12 Neural network model inference method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN115867923A (en)
WO (1) WO2022217419A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245741A (en) * 2018-03-09 2019-09-17 佳能株式会社 Optimization and methods for using them, device and the storage medium of multilayer neural network model
US20190378014A1 (en) * 2018-06-08 2019-12-12 Oki Electric Industry Co., Ltd. Neural network load reduction device, information processing unit, and neural network load reduction method and computer-readable storage medium
CN109919315A (en) * 2019-03-13 2019-06-21 科大讯飞股份有限公司 A kind of forward inference method, apparatus, equipment and the storage medium of neural network

Also Published As

Publication number Publication date
CN115867923A (en) 2023-03-28

Similar Documents

Publication Publication Date Title
WO2021114625A1 (en) Network structure construction method and apparatus for use in multi-task scenario
WO2019136993A1 (en) Text similarity calculation method and device, computer apparatus, and storage medium
CN110458324B (en) Method and device for calculating risk probability and computer equipment
CN109815333A (en) Information acquisition method, device, computer equipment and storage medium
CN106649503A (en) Query method and system based on sql
CN111126668A (en) Spark operation time prediction method and device based on graph convolution network
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
WO2024022354A1 (en) Object recommendation method and apparatus for implementing ia in view of rpa and ai, and storage medium
CN112560444A (en) Text processing method and device, computer equipment and storage medium
CN115062016A (en) Incidence relation extraction method and device and computer equipment
WO2022141489A1 (en) Deep learning model reasoning method and apparatus, computer device, and storage medium
CN109656947B (en) Data query method and device, computer equipment and storage medium
Garrido-Munoz et al. A holistic approach for image-to-graph: application to optical music recognition
WO2022217419A1 (en) Neural network model inference method and apparatus, computer device, and storage medium
WO2023093689A1 (en) Computational graph optimization method and apparatus, and device
WO2020132933A1 (en) Short text filtering method and apparatus, medium and computer device
CN114090722B (en) Method and device for automatically completing query content
CN113811897B (en) Inference method and apparatus of neural network model, computer device, and storage medium
CN114780700A (en) Intelligent question-answering method, device, equipment and medium based on machine reading understanding
US11386155B2 (en) Filter evaluation in a database system
CN117421565B (en) Markov blanket-based equipment assessment method and device and computer equipment
US11893012B1 (en) Content extraction using related entity group metadata from reference objects
CN111309572B (en) Test analysis method and device, computer equipment and storage medium
CN116383883B (en) Big data-based data management authority processing method and system
US20230013748A1 (en) Artificial Intelligence (AI) Framework to Identify Object-Relational Mapping Issues in Real-Time

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21936314

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21936314

Country of ref document: EP

Kind code of ref document: A1