WO2022041015A1 - Neural network model optimization method and apparatus - Google Patents

Neural network model optimization method and apparatus

Info

Publication number
WO2022041015A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing
node
neural network
network model
time
Prior art date
Application number
PCT/CN2020/111529
Other languages
English (en)
French (fr)
Inventor
焦建兵
张卫兵
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202080103328.XA priority Critical patent/CN115956247A/zh
Priority to PCT/CN2020/111529 priority patent/WO2022041015A1/zh
Publication of WO2022041015A1 publication Critical patent/WO2022041015A1/zh


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the technical field of artificial intelligence (AI), and in particular, to a method and apparatus for optimizing a neural network model.
  • AI: artificial intelligence
  • a computation graph (Graph) is usually used to represent the computation process of the neural network model.
  • the computational graph of the neural network model is a computational graph obtained by decomposing each neuron in the neural network model into tensor-oriented operators.
  • the calculation graph can represent the mathematical expression of each operator and the connection relationship between the operators, that is, can represent the mathematical expression of the neurons of the neural network model and the connection relationship between the neurons.
  • because the neural network structure is usually complex, after the neural network model is mapped into a computational graph, the topology of the computational graph is also complex, the computational complexity is high, and the computation time required to perform computational tasks is relatively long.
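  • as an illustration (not part of the original text), the following minimal Python sketch shows how a single neuron can be represented as a computation graph of tensor-oriented operators; the node structure and operators here are assumptions chosen for clarity.

```python
# Illustrative only: a toy computation graph in which each neuron/operator
# becomes a node holding a mathematical expression and its input edges.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    name: str
    op: Optional[Callable[..., float]] = None   # the node's mathematical expression
    inputs: List["Node"] = field(default_factory=list)

    def evaluate(self, feeds: dict) -> float:
        if self.name in feeds:                  # graph input: value supplied by caller
            return feeds[self.name]
        args = [n.evaluate(feeds) for n in self.inputs]
        return self.op(*args)

# Build a small graph for one "neuron": out = max(0, x * w + b), with w = 2.0, b = 1.0
x   = Node("x")
mul = Node("mul", op=lambda v: v * 2.0, inputs=[x])
add = Node("add", op=lambda v: v + 1.0, inputs=[mul])
out = Node("relu", op=lambda v: max(0.0, v), inputs=[add])

print(out.evaluate({"x": 3.0}))   # 7.0
```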
  • the present application provides a method and device for optimizing a neural network model, which solves the problem in the prior art that when a computing task is performed through a neural network model, the required computing time is relatively long.
  • a method for optimizing a neural network model, including: obtaining a first calculation graph of a neural network model; generating a second calculation graph according to a preset rule and the first calculation graph, wherein the time for calculating first input data using the second calculation graph is less than the time for calculating the first input data using the first calculation graph; the preset rule includes at least one of the following: a mathematical fusion rule, a mathematical splitting rule, an instruction fusion rule, an instruction splitting rule, and a hardware fusion rule; and outputting the second calculation graph.
  • the neural network model optimization method provided by the present application can optimize the first computational graph of the neural network model into a second computational graph with stronger computational performance and less computational time required to perform computational tasks.
  • the computing speed of the neural network model when performing the computing task is improved, and the time required for the neural network model to perform the computing task is reduced.
  • when a terminal device configured with a neural network model (hereinafter referred to as a terminal device) invokes the neural network model to perform a computing task, optimizing the neural network model by using the neural network model optimization method provided in the embodiments of the present application can improve the computing performance of the terminal device and save the computing time of the terminal device.
  • the mathematical fusion rule is: fusing multiple first computing nodes into one second computing node; wherein the mathematical expression corresponding to the second computing node is the mathematical expression determined after mathematical derivation is performed on the mathematical expressions corresponding to the multiple first computing nodes; the time for calculating second input data using the multiple first computing nodes is greater than the time for calculating the second input data using the second computing node.
  • after the neural network model optimization device fuses the first calculation graph using the mathematical fusion rule, the resulting calculation graph has fewer computing nodes, its topology is simpler, its computing capability is stronger, and it takes less time to compute data. Therefore, when the neural network model optimization device adopts the mathematical fusion rule to optimize the computation graph of the neural network model, the computational performance of the computation graph of the neural network model can be improved, and the computation time required for the computation graph of the neural network model to perform computation tasks can be reduced.
  • the mathematical splitting rule is: splitting one third computing node into a plurality of fourth computing nodes; wherein the mathematical expression corresponding to the third computing node is the mathematical expression determined after mathematical derivation is performed on the mathematical expressions corresponding to the plurality of fourth computing nodes; the time for calculating third input data using the third computing node is greater than the time for calculating the third input data using the plurality of fourth computing nodes.
  • the neural network model optimization device adopts the mathematical splitting rule to split one computing node into multiple computing nodes because the time for the multiple computing nodes obtained after splitting to perform the computing task is shorter than the time for the single computing node before splitting to perform the computing task. Therefore, using the mathematical splitting rule to optimize the computation graph of the neural network model can also improve the computational performance of the computation graph of the neural network model and reduce the time required for the computation graph of the neural network model to compute data.
  • the instruction fusion rule is: according to a received node fusion instruction, fusing a plurality of fifth computing nodes into one sixth computing node; wherein the node fusion instruction is used to instruct fusing the plurality of fifth computing nodes into one sixth computing node; the time for computing fourth input data using the plurality of fifth computing nodes is longer than the time for computing the fourth input data using the sixth computing node.
  • after the neural network model optimization device fuses the first calculation graph using the instruction fusion rule, the resulting calculation graph has fewer computing nodes, its topology is simpler, its computing capability is stronger, and it takes less time to compute data. Therefore, when the neural network model optimization device uses the instruction fusion rule to optimize the computation graph of the neural network model, the computational performance of the computation graph of the neural network model can be improved, and the computation time required for the computation graph of the neural network model to perform computation tasks can be reduced.
  • the node fusion instruction in the instruction fusion rule may be a manual input instruction.
  • the neural network model optimization device can fuse the nodes in the neural network model calculation graph according to the manually input instruction, which improves the applicable scenarios of the neural network model optimization method.
  • the instruction splitting rule is used to: split one seventh computing node into a plurality of eighth computing nodes according to a received node splitting instruction; wherein the node splitting instruction is used to instruct splitting one seventh computing node into multiple eighth computing nodes; the time for computing fifth input data using the seventh computing node is greater than the time for computing the fifth input data using the multiple eighth computing nodes.
  • after the neural network model optimization device adopts the instruction splitting rule to split one computing node into multiple computing nodes, the time for the multiple computing nodes obtained after splitting to perform the computing task is shorter than the time for the single computing node before splitting to perform the computing task. Therefore, using the instruction splitting rule to optimize the computation graph of the neural network model can also improve the computational performance of the computation graph of the neural network model and reduce the time required for the computation graph of the neural network model to compute data.
  • the node splitting instruction in the instruction splitting rule may be a manually input instruction.
  • the neural network model optimization device can split the nodes in the neural network model calculation graph according to the manually input instruction, which improves the applicable scenarios of the neural network model optimization method.
  • the hardware fusion rule is: the ninth computing node transmits data to the tenth computing node using a first transmission path; wherein the time for the ninth computing node to transmit data to the tenth computing node using the first transmission path is shorter than the time for the ninth computing node to transmit data to the tenth computing node using a second transmission path; the second transmission path is the transmission path used by the ninth computing node to transmit data to the tenth computing node in the first calculation graph.
  • by optimizing the data transmission paths between nodes, the computational performance of the computation graph of the neural network model can be improved and the time required for the computation graph of the neural network model to perform computation tasks can be reduced.
  • an apparatus for optimizing a neural network model including: a communication unit and a processing unit.
  • the communication unit is used to obtain the first calculation graph of the neural network model;
  • the processing unit is used to generate the second calculation graph according to the preset rule and the first calculation graph;
  • wherein the time for calculating the first input data using the second calculation graph is less than the time for calculating the first input data using the first calculation graph;
  • the preset rules include at least one of the following: mathematical fusion rules, mathematical splitting rules, instruction fusion rules, instruction splitting rules, and hardware fusion rules; the second calculation graph is then output.
  • the mathematical fusion rule is: fuse multiple first computing nodes into a second computing node; wherein, the mathematical expression corresponding to the second computing node is: The mathematical expression determined after mathematical derivation is performed on the mathematical expressions corresponding to the plurality of first computing nodes; the time for computing the second input data by using the plurality of first computing nodes is longer than the time for computing the second input data by the second computing nodes.
  • the mathematical splitting rule is: splitting a third computing node into multiple fourth computing nodes; wherein, the mathematical expression corresponding to the third computing node is: The mathematical expression determined after mathematical derivation is performed on the mathematical expressions corresponding to the plurality of fourth computing nodes; the time for calculating the third input data by using the third computing nodes is greater than the time for calculating the third input data by the plurality of fourth computing nodes .
  • the instruction fusion rule is: according to a received node fusion instruction, a plurality of fifth computing nodes are fused into one sixth computing node; wherein the node fusion instruction is used to instruct fusing the plurality of fifth computing nodes into one sixth computing node; the time for computing the fourth input data using the plurality of fifth computing nodes is longer than the time for computing the fourth input data using the sixth computing node.
  • the instruction splitting rule is used to split one seventh computing node into a plurality of eighth computing nodes according to a received node splitting instruction; wherein the node splitting instruction is used to instruct splitting one seventh computing node into multiple eighth computing nodes; the time for calculating the fifth input data using the seventh computing node is greater than the time for calculating the fifth input data using the multiple eighth computing nodes.
  • the hardware fusion rule is: the ninth computing node transmits data to the tenth computing node using the first transmission path; wherein the time for the ninth computing node to transmit data to the tenth computing node using the first transmission path is shorter than the time for the ninth computing node to transmit data to the tenth computing node using the second transmission path; the second transmission path is the transmission path used by the ninth computing node to transmit data to the tenth computing node in the first calculation graph.
  • the present application provides an apparatus for optimizing a neural network model, including: a processor and a storage medium; the storage medium includes instructions, and the processor is used to execute the instructions so as to implement the method described in the first aspect and any possible implementation manner of the first aspect.
  • the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions are executed on a neural network model optimization device, the neural network model optimization device is caused to perform the method described in the first aspect and any possible implementation manner of the first aspect.
  • the present application provides a computer program product comprising instructions that, when the computer program product is run on a neural network model optimization device, enable the neural network model optimization device to perform the method described in the first aspect and any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a convolutional neural network used in an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of a calculation diagram of a neural network model provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the architecture of a software stack in the prior art provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for optimizing a neural network model according to an embodiment of the present application
  • FIG. 6 is a schematic diagram of the architecture of an improved software stack provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of node optimization using mathematical fusion rules according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of node optimization using a mathematical splitting rule according to an embodiment of the present application.
  • FIG. 9a is a schematic flowchart of a computing node performing a computing task in the prior art according to an embodiment of the present application;
  • FIG. 9b is a schematic flowchart of a computing node that is optimized by adopting a hardware fusion rule to perform a computing task according to an embodiment of the present application;
  • FIG. 10 is a schematic structural diagram of an apparatus for optimizing a neural network model provided by an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of another apparatus for optimizing a neural network model provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of the hardware structure of a neural network model optimization apparatus provided by an embodiment of the application.
  • FIG. 13 is a schematic diagram of a hardware structure of another apparatus for optimizing a neural network model provided by an embodiment of the present application.
  • the neural network model provided by the present application may be any artificial neural network model, such as a convolutional neural network model, a back propagation (back propagation, BP) neural network model, etc., which is not specifically limited in this embodiment of the present application.
  • FIG. 1 is a system architecture 100 provided by an embodiment of the present application.
  • a data collection device 160 is used to collect training data.
  • the training data may include training images and classification results corresponding to the training images, wherein the results of the training images may be manually pre-marked results.
  • Target models 101 may also be referred to as target rules 101 .
  • after collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 obtains the target model/rule 101 by training based on the training data maintained in the database 130.
  • the training device 120 processes the input original image and compares the output image with the original image until the difference between the image output by the training device 120 and the original image is less than a certain threshold, thus completing the training of the target model 101.
  • the target model 101 in this embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 may not necessarily train the target model 101 entirely based on the training data maintained by the database 130, and may also obtain training data from the cloud or other places for model training; the above description should not be construed as a limitation on this embodiment of the present application.
  • the target model 101 trained according to the training device 120 can be applied to different systems or devices, such as mobile phone terminals, tablet computers, laptop computers, augmented reality (AR)/virtual reality (VR) devices, and vehicle-mounted terminals, and may also be a server or a cloud.
  • the training device 120 can generate a corresponding target model 101 based on different training data for different goals or different tasks, and the corresponding target model 101 can be used to achieve the above goals or complete the above tasks, so as to provide users with the required results.
  • the target model 101 obtained by training according to the training device 120 may be a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), and the like.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application; the positional relationship between the devices and modules shown in FIG. 1, the type of training data, and the type or function of the neural network do not constitute any restriction.
  • model converter 110 may reside in client device 140 .
  • the training data may also be text, voice or other types of data.
  • the model converter can also have other names, such as model compiler, etc.; any device that can realize a function similar to that of the model converter 110 can be understood as the model converter in this application.
  • the model file of the target model 101 trained by the training device 120 is platform-independent (that is, it can be compiled to run on different hardware platforms). To apply the target model 101 on the client device 140, the model file of the target model 101 trained by the training device 120 needs to be processed by the model converter 110, which compiles the model file of the target model 101 from its current format into a format supported by the client device.
  • the model file of the target model 101 needs to be input into the model converter 110, the model converter 110 compiles the target model 101 to obtain a model file supported by the client device 140, and the compiled model file is then deployed to the client device 140.
  • the conversion process of the target model 101 by the model converter 110 may also be referred to as compilation.
  • the developer of the custom operator also needs to provide the model converter 110 with the parameter definition function, parameter parsing function, output tensor size (shape) derivation function, implementation function, and call (forward) function of the custom operator, etc.
  • the target model 101 is a model developed under the TensorFlow framework, and the operators in some or all of the layers in the target model 101 are custom-defined by the developer, that is, they are not operators provided in the AI software stack of the TensorFlow framework.
  • when the developer inputs the model file of the target model 101 into the model converter 110 to compile, through the model converter 110, a model file that can run on the client device, the developer also needs to provide the model converter 110 with the custom operator's parameter definition function, parameter parsing function, output size (shape) derivation function, implementation function, and call (forward) function, etc.
  • the structure of the neural network in the embodiment of the present application may be as shown in FIG. 2 .
  • a convolutional neural network (CNN) 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230 .
  • the convolutional/pooling layer 220 may include layers 221 to 226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 may include many convolution operators.
  • the convolution operator is also called a kernel; its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • each weight matrix is stacked to form the depth dimension of the convolutional image, where the dimension can be understood as determined by the "multiple" described above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to blur unwanted noise in the image.
  • the multiple weight matrices have the same size (row × column), the feature maps extracted by the multiple weight matrices of the same size also have the same size, and the multiple extracted feature maps of the same size are then combined to form the output of the convolution operation.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions .
  • the initial convolutional layer (e.g., 221) usually extracts lower-level features; as the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (e.g., 226) become more and more complex, such as features of high-level semantics.
  • features with higher semantics are more suitable for the problem to be solved.
  • one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a certain range to produce an average value as the result of average pooling.
  • the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size.
  • the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
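  • as a non-authoritative illustration of the pooling operators described above, the following sketch applies 2x2 max pooling and 2x2 average pooling to a small 4x4 array; the sizes and values are arbitrary examples.

```python
# Illustrative example of 2x2 average and max pooling on a 4x4 "image".
image = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]

def pool(img, size, reduce_fn):
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(0, rows, size):
        row = []
        for c in range(0, cols, size):
            window = [img[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))   # one output pixel per sub-region
        out.append(row)
    return out

print(pool(image, 2, max))                              # max pooling -> 2x2 output
print(pool(image, 2, lambda w: sum(w) / len(w)))        # average pooling -> 2x2 output
```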
  • after processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one output or a set of outputs with the desired number of classes. Therefore, the neural network layer 230 may include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and an output layer 240; the parameters contained in the multiple hidden layers may be pre-trained based on training data relevant to a specific task type, and the task type may include, for example, image recognition, image classification, image super-resolution reconstruction, and so on.
  • after the multiple hidden layers in the neural network layer 230, the last layer of the entire convolutional neural network 200 is the output layer 240; the output layer 240 has a loss function similar to categorical cross entropy and is specifically used to calculate the prediction error.
  • once the forward propagation of the entire convolutional neural network 200 (propagation from 210 to 240 in FIG. 2) is completed, back propagation (propagation from 240 to 210 in FIG. 2) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
  • the convolutional neural network 200 shown in FIG. 2 is only used as an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
  • the neural network model is an information processing system composed of a large number of processing units (referred to as neurons) connected to each other, and the neurons in the neural network model contain corresponding mathematical expressions. After the data is fed into the neuron, the neuron runs the mathematical expressions it contains, performs calculations on the input data, and generates output data.
  • the input data of each neuron is the output data of the previous neuron connected to it; the output data of each neuron is the input data of the next neuron connected to it.
  • after data is input, the neural network model selects corresponding neurons for the input data according to its own learning and training, calculates the input data with these neurons, and determines and outputs the final operation result.
  • the neural network can continuously learn and evolve in the process of data operation, and continuously optimize its own operation process according to the feedback of the operation results.
  • the number of neurons in the neural network model is usually fixed, but the mathematical expression in each neuron, or the corresponding weight value of the neuron can change continuously according to the continuous training of the neural network model.
  • the calculation graph is used to express the calculation process when the neural network model performs the calculation task in an intuitive form, so that the calculation process when the neural network model performs the calculation task is more clear and clear.
  • the terminal device invokes the corresponding neural network model according to the computing task and converts the neural network into a corresponding computation graph. After that, the terminal device further splits the computation graph into the form of single operators and sends them to the chip in a lower-level language that the chip can recognize; the chip then runs each operator, thereby executing the computing task according to the neural network model.
  • FIG. 3 is a structure of a computation graph of a neural network model provided in an embodiment of the present application.
  • a, b, and c represent three inputs; node 1, node 2, and node 3 represent three computing nodes in the computation graph; the connections between the nodes are shown as line segments with arrows, where the direction of the arrow denotes the direction of data transmission.
  • FIG. 3 is only an exemplary illustration, and the calculation diagram in actual application may be more complicated.
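  • a minimal sketch of a topology like the one in FIG. 3 is given below; the three inputs a, b, and c feed the three computing nodes, but the operators assigned to each node are assumptions made purely for illustration.

```python
# Hypothetical version of the FIG. 3 topology: inputs a, b, c flow through
# three computing nodes; the actual operators in the figure are not specified,
# so simple arithmetic is used here as a stand-in.
def node1(a, b):          # computing node 1
    return a + b

def node2(b, c):          # computing node 2
    return b * c

def node3(x, y):          # computing node 3, fed by node 1 and node 2
    return x - y

a, b, c = 1.0, 2.0, 3.0
print(node3(node1(a, b), node2(b, c)))   # data flows along the arrows
```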
  • the software stack includes the following four parts: a user program layer, a computing framework layer, an operator layer, and a chip layer.
  • the user program layer is an upper-level language expression of the neural network model, for example, a neural network model expressed in python language.
  • the computational framework layer is used to convert the neural network model expressed by the upper-layer language into a general or specific computational graph representation.
  • the operator layer is used to split each computing node in the computing graph of the computing framework, convert these computing nodes into a lower-level language that the chip can recognize, and deliver the converted computing nodes to the chip.
  • the chip layer is used to run each computing node that is issued to achieve the effect of computing data using the neural network model.
  • an embodiment of the present application provides a neural network model optimization method: the neural network model optimization device obtains a first calculation graph of the neural network model and generates a second calculation graph according to a preset rule and the first calculation graph, wherein, for the same input data, the time for the second calculation graph to calculate the input data is less than the time for the first calculation graph to calculate the input data; after that, the neural network model optimization device outputs the second calculation graph.
  • the neural network model optimization method provided by the present application can optimize the first computational graph of the neural network model into a second computational graph with stronger computational performance and less computational time required to perform computational tasks.
  • the computing speed of the neural network model when performing the computing task is improved, and the time required for the neural network model to perform the computing task is reduced.
  • the neural network model optimization method provided by the embodiment of the present application is used to optimize the neural network model, which can improve the computing performance of the terminal device and save the computing time of the terminal device.
  • the neural network model optimization method provided by the present application includes:
  • the neural network model optimization apparatus acquires a first calculation graph of the neural network model.
  • the first calculation graph is a calculation graph directly generated by the terminal device according to the topology structure of the above-mentioned neural network model.
  • the number of computing nodes in the first computing graph is the same as or similar to the number of neurons in the neural network model.
  • the terminal device is preconfigured with a neural network model for image processing (referred to as the first neural network model) for the camera application, and the terminal device is preconfigured with a speech recognition neural network model (referred to as the second neural network model) for the voice assistant.
  • the terminal device invokes the first neural network model to optimize the captured image and generate the photographed image.
  • the terminal device invokes the second neural network model, processes the voice input by the user, and determines the voice input of the user.
  • the terminal device performs corresponding operations according to the user's voice input.
  • the neural network model optimization apparatus described in the embodiments of this application may be a terminal device, a module or unit in the terminal device, or an apparatus integrated in the terminal device.
  • the neural network model optimization apparatus generates a second calculation graph according to the preset rule and the first calculation graph.
  • the time for calculating the first input data by using the second calculation graph is less than the time for calculating the first input data by using the first calculation graph.
  • the preset rule is used to optimize the computation graph of the neural network model, so as to obtain a computation graph with better computation performance and less time required to perform computation tasks. Therefore, for the same input data, the time for the second calculation graph to calculate the input data is less than the time for the first calculation graph to calculate the input data.
  • the neural network model optimization device uses the first calculation graph and the second calculation graph respectively to calculate the same input data.
  • the neural network model optimization device measures the time for the first calculation graph to calculate the input data and the time for the second calculation graph to calculate the input data, and determines whether the time for the second calculation graph to calculate the input data is less than the time for the first calculation graph to calculate the input data.
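  • a sketch of how such a timing comparison might be carried out is shown below; the two graph functions and the measurement loop are illustrative assumptions, not an interface defined by the patent.

```python
# Illustrative timing check: run the same input through both graphs and keep
# the optimized graph only if it is measurably faster.
import time

def measure(graph_fn, input_data, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        graph_fn(input_data)
    return (time.perf_counter() - start) / repeats

def first_graph(x):          # stand-in for the first computation graph
    return ((x * 2.0) + 1.0) * 3.0 + 4.0

def second_graph(x):         # stand-in for the optimized second graph
    return x * 6.0 + 7.0     # same mathematics, fewer operations

x = 5.0
assert abs(first_graph(x) - second_graph(x)) < 1e-9     # same result on the same input
if measure(second_graph, x) < measure(first_graph, x):
    print("use the second computation graph")
```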
  • the neural network model optimization device outputs a second calculation graph.
  • the second calculation graph is split into multiple corresponding operators, and the operators are converted into a lower-level expression that the chip can understand and sent to the chip, so that the chip can run the neural network model according to the delivered operators.
  • the neural network model optimization method provided by the present application can optimize the first computational graph of the neural network model into a second computational graph with stronger computational performance and less computational time required to perform computational tasks.
  • the computing speed of the neural network model when performing the computing task is improved, and the time required for the neural network model to perform the computing task is reduced.
  • the neural network model optimization method provided by the embodiment of the present application is used to optimize the neural network model, which can improve the computing performance of the terminal device and save the computing time of the terminal device.
  • in combination with the software stack shown in FIG. 4 above, as shown in FIG. 6, in this embodiment of the present application the software stack can be modified into five layers, that is, a computational graph optimization layer is added to the software stack shown in FIG. 4; the computational graph optimization layer is used to implement the neural network model optimization method described in the embodiments of the present application.
  • when the terminal device performs a computing task according to the software stack shown in FIG. 6, it can be implemented by the following steps (a schematic sketch of the five steps follows Step 5):
  • Step 1 After receiving the computing task, the terminal device invokes the corresponding user program layer to determine the neural network model for executing the computing task.
  • a variety of neural network models for performing different computing tasks can be preset in the terminal device, for example, a neural network model for image processing, a neural network model for speech recognition, a neural network model for data processing, and so on.
  • after the terminal device receives the computing task, it can select a corresponding neural network model to execute the computing task according to the type of the computing task.
  • if the computing task received by the terminal device is an image processing computing task, the terminal device determines to use the neural network model for image processing to perform the computing task.
  • if the computing task received by the terminal device is a speech recognition computing task, the terminal device determines to use the neural network model for speech recognition to perform the computing task.
  • if the computing task received by the terminal device is a data processing computing task, the terminal device determines to use the neural network model for data processing to perform the computing task.
  • Step 2 The terminal device invokes the computing framework layer to convert the neural network model into a first computing graph.
  • Step 3 The terminal device instructs the neural network model optimization device to call the computation graph optimization layer to optimize the first computation graph into the second computation graph.
  • the terminal device may instruct the neural network model optimization apparatus to optimize the first calculation graph into the second calculation graph by executing the neural network model optimization method described in the embodiment of the present application.
  • Step 4 The terminal device invokes the operator layer to split each computing node in the second computing graph; the terminal device converts each computing node into a lower-level language that can be recognized by the chip, and sends it to the chip layer.
  • Step 5 The terminal device instructs the chip to execute the computing task according to the delivered computing node.
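  • the five steps above can be summarized as the following placeholder pipeline; every function name here is a hypothetical stand-in, not an API defined by the patent.

```python
# Placeholder pipeline for steps 1-5 of the improved software stack.
def select_model(task):                 # step 1: user program layer picks a model
    return {"image": "image_model", "speech": "speech_model"}.get(task)

def build_first_graph(model):           # step 2: computing framework layer
    return f"first_graph({model})"

def optimize_graph(graph):              # step 3: computation graph optimization layer
    return graph.replace("first", "second")

def split_into_operators(graph):        # step 4: operator layer
    return [f"{graph}::op{i}" for i in range(3)]

def run_on_chip(operators):             # step 5: chip layer
    for op in operators:
        print("executing", op)

run_on_chip(split_into_operators(optimize_graph(build_first_graph(select_model("image")))))
```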
  • alternatively, the four-layer structure of the software stack can still be maintained, and the computational graph optimization layer can be multiplexed into the computational framework layer, so as to implement the neural network model optimization method described in the embodiments of the present application.
  • the specific implementation process of the terminal invoking the neural network model to perform the computing task is similar to the above steps 1-5.
  • the difference is that the terminal device combines steps 2 and 3, and when the terminal device invokes the computing framework layer, the contents recorded in steps 2 and 3 are implemented in sequence.
  • alternatively, the four-layer structure of the software stack may still be maintained, and the computational graph optimization layer may be multiplexed into the operator layer, so as to implement the neural network model optimization method described in the embodiments of the present application.
  • the specific implementation process of the terminal invoking the neural network model to perform the computing task is similar to the above steps 1-5.
  • the difference is that the terminal device combines steps 3 and 4, and when the terminal device invokes the operator layer, the contents recorded in steps 3 and 4 are implemented in sequence.
  • the preset rules recorded in the embodiments of the present application include at least one of the following: a mathematical fusion rule, a mathematical splitting rule, an instruction fusion rule, an instruction splitting rule, and a hardware fusion rule.
  • the above five preset rules are described below respectively.
  • the mathematical fusion rule is: fuse multiple first computing nodes into a second computing node; wherein, the mathematical expression corresponding to the second computing node is: after mathematical derivation of the mathematical expressions corresponding to the multiple first computing nodes The determined mathematical expression; the time for calculating the second input data by using the plurality of first calculation nodes is greater than the time for calculating the second input data by using the second calculation nodes.
  • that the time for calculating the second input data using the multiple first computing nodes is greater than the time for calculating the second input data using the second computing node means that the sum of the times for the terminal device to call the multiple first computing nodes to calculate the second input data is greater than the time for the terminal device to call the second computing node to calculate the second input data.
  • the neural network model optimization device fuses a plurality of first computing nodes into a second computing node according to a mathematical fusion rule, which can be specifically implemented as:
  • the neural network model optimization device traverses the computing nodes in the first calculation graph, and when the mathematical expressions corresponding to multiple consecutive first computing nodes can be derived into one mathematical expression, the neural network model optimization device fuses these first computing nodes into one second computing node.
  • the mathematical expression corresponding to the second computing node is a mathematical expression derived according to the mathematical expressions corresponding to the plurality of first computing nodes.
  • for example, the neural network model optimization device stores templates for fusing multiple mathematical expressions into one mathematical expression. After determining the mathematical expressions corresponding to the multiple first computing nodes, the device matches these mathematical expressions against the mathematical fusion templates stored in the device, and once a corresponding mathematical fusion template is matched, it determines the fused mathematical expression corresponding to the multiple mathematical expressions according to that template.
  • for example, the first calculation graph includes computing node 1 and computing node 2, where computing node 1 is an upstream node of computing node 2, and data is calculated by computing node 1 and computing node 2 in sequence.
  • a and b are fixed parameters of the mathematical expression in computing node 1, and the values of a and b are fixed values; x1 is the input data of computing node 1 (that is, the data output by the upstream node of computing node 1).
  • c and d are fixed parameters of the mathematical expression in computing node 2, and the values of c and d are fixed values; x2 is the input data of computing node 2 (that is, the data output by computing node 1).
  • the neural network model optimization device deduces the above formula 1 and formula 2, and obtains the following formula 3:
  • x3 is the input data (that is, the data output by the upstream node of computing node 1); the values of a and b in Formula 3 are the same as those in Formula 1, and the values of c and d in Formula 3 are the same as those in Formula 2.
  • the neural network model optimization device integrates the computing node 1 and the computing node 2 into the computing node 3, and the mathematical expression corresponding to the computing node 3 is the above formula 3.
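  • Formulas 1 to 3 are not reproduced in this text, so the following sketch assumes simple affine expressions in order to illustrate the fusion of computing node 1 and computing node 2 into computing node 3; the concrete forms of the formulas are assumptions.

```python
# Hedged illustration of the mathematical fusion rule, with assumed formulas:
#   node 1 (Formula 1, assumed): y1 = a * x1 + b
#   node 2 (Formula 2, assumed): y2 = c * x2 + d
#   fused node 3 (Formula 3, assumed): y3 = c * (a * x3 + b) + d
a, b, c, d = 2.0, 1.0, 3.0, 4.0

def node1(x1):
    return a * x1 + b

def node2(x2):
    return c * x2 + d

def node3(x3):                      # single fused node replacing node 1 -> node 2
    return c * (a * x3 + b) + d

x = 5.0
assert node2(node1(x)) == node3(x)  # same result, one node instead of two
```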
  • the neural network model optimization apparatus can integrate the computing node 1 and the computing node 2 into the computing node 3, thereby improving the computing performance of the neural network model computing graph and reducing the computing time required for the neural network model computing graph to perform computing tasks.
  • the neural network model optimization device also reduces the number of nodes in the computation graph of the neural network model, and reduces the complexity of the computation graph.
  • after the neural network model optimization device fuses the first calculation graph using the mathematical fusion rule, the resulting calculation graph has fewer computing nodes, its topology is simpler, its computing capability is stronger, and it takes less time to compute data. Therefore, when the neural network model optimization device adopts the mathematical fusion rule to optimize the computation graph of the neural network model, the computational performance of the computation graph of the neural network model can be improved, and the computation time required for the computation graph of the neural network model to perform computation tasks can be reduced.
  • the mathematical splitting rule is: splitting a third computing node into multiple fourth computing nodes; wherein, the mathematical expression corresponding to the third computing node is: mathematically deriving the mathematical expressions corresponding to the multiple fourth computing nodes The mathematical expression determined later; the time for calculating the third input data by using the third calculation node is longer than the time for calculating the third input data by using a plurality of fourth calculation nodes.
  • that is, the time for the original computing node to perform the computing task is longer than the time required for the multiple computing nodes obtained after splitting to perform the computing task in sequence.
  • the computing complexity of the mathematical expression may exceed the computing capability of the computing node. This will cause the computing performance of the computing node to decrease, and it will take a long time for the computing task to be performed by the node.
  • for example, when a computing node has a weak ability to calculate a complex mathematical expression, splitting the computing node into multiple computing nodes enhances the computing capability.
  • in this case, the neural network model optimization device can split that one computing node into multiple computing nodes through the mathematical splitting rule, so as to improve the computing performance of the neural network model calculation graph and reduce the time required for the calculation graph to perform computing tasks.
  • g and h are fixed parameters of the mathematical expression in computing node 4, the values of g and h are fixed values, and x4 is the input data of computing node 4 (that is, the data output by the upstream node of computing node 4).
  • the neural network model optimization device can split formula 4 into the following two calculation formulas: formula 5 and formula 6:
  • the value of g in Formula 5 is the same as the value of g in the above Formula 4, and x5 is the input data of computing node 5 (that is, the data output by the upstream node of the original computing node 4).
  • x6 is the output data determined after the operation according to Formula 5 (i.e., the output data of computing node 5 and the input data of computing node 6).
  • the neural network model optimization device determines that the time for computing node 4 to perform the computing task according to Formula 4 is greater than the sum of the time for computing node 5 to perform the computing task according to Formula 5 and the time for computing node 6 to perform the computing task according to Formula 6. That is to say, for the same input data, the time for computing node 4 to calculate the input data is longer than the time for computing node 5 and computing node 6 to calculate the input data in sequence.
  • the neural network model optimization device splits the above computing node 4 into a computing node 5 and a computing node 6 , the mathematical expression corresponding to the computing node 5 is formula 5, and the mathematical expression corresponding to the computing node 6 is formula 6.
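  • Formulas 4 to 6 are likewise not reproduced in this text; the sketch below uses an assumed composite expression to illustrate splitting computing node 4 into computing node 5 and computing node 6 while preserving the result.

```python
# Hedged illustration of the mathematical splitting rule, with assumed formulas:
#   node 4 (Formula 4, assumed): y  = h * (x4 + g) ** 2
#   node 5 (Formula 5, assumed): x6 = x5 + g          (uses g, as in the text)
#   node 6 (Formula 6, assumed): y  = h * x6 ** 2
g, h = 3.0, 0.5

def node4(x4):                      # original, more complex node
    return h * (x4 + g) ** 2

def node5(x5):                      # first of the two split nodes
    return x5 + g

def node6(x6):                      # second split node, fed by node 5
    return h * x6 ** 2

x = 7.0
assert node4(x) == node6(node5(x))  # splitting preserves the result
```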
  • the neural network model optimization device adopts the mathematical splitting rule to split one computing node into multiple computing nodes because the time for the multiple computing nodes obtained after splitting to perform the computing task is shorter than the time for the single computing node before splitting to perform the computing task. Therefore, using the mathematical splitting rule to optimize the computation graph of the neural network model can also improve the computational performance of the computation graph of the neural network model and reduce the time required for the computation graph of the neural network model to compute data.
  • the instruction fusion rule is: according to the received node fusion instruction, fuse multiple fifth computing nodes into a sixth computing node; wherein, the node fusion instruction is used to instruct to fuse multiple fifth computing nodes into a sixth computing node ;
  • the time for calculating the fourth input data by using the plurality of fifth computing nodes is greater than the time for calculating the fourth input data by using the sixth computing nodes.
  • the fusion instruction in the instruction fusion rule may be issued by a staff member through a compiler or the like, or may be issued by other devices that interact with the neural network model optimization device.
  • the neural network model optimization device optimizes the first calculation graph according to one or more of the mathematical fusion rule, the mathematical splitting rule, and the hardware fusion rule to obtain a third calculation graph.
  • the staff can manually review the third calculation graph to determine whether there are nodes in the third calculation graph that can perform node fusion. If the staff determines that there is a node that can perform node fusion in the third calculation graph, the staff determines the fusion mode of the node, and issues a node fusion instruction through the compiler.
  • the compiler sends the node fusion instruction to the neural network model optimization device.
  • the neural network model optimization device optimizes the corresponding nodes according to the received instructions.
  • when the staff member issues the node fusion instruction through the compiler, the staff member enters, in the compiler, program code written in an upper-level language corresponding to the node fusion instruction; the compiler compiles the program code written in the upper-level language into a lower-level language that can be recognized by the neural network model optimization device and sends it to the neural network model optimization device.
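  • the following sketch shows one way a manually issued node fusion instruction could be applied to a computation graph held as an ordered list of node names; the instruction format and node names are hypothetical.

```python
# Hypothetical node-fusion instruction: fuse the named fifth computing nodes
# into a single sixth computing node inside a graph held as an ordered list.
def apply_fusion_instruction(graph, instruction):
    targets = instruction["fuse_nodes"]                 # e.g. ["node_5a", "node_5b"]
    fused_name = instruction["into"]                    # e.g. "node_6"
    kept = [n for n in graph if n not in targets]
    # place the fused node where the first fused node used to be
    first = min(graph.index(n) for n in targets)
    position = sum(1 for n in graph[:first] if n not in targets)
    kept.insert(position, fused_name)
    return kept

graph = ["node_1", "node_5a", "node_5b", "node_9"]
instruction = {"fuse_nodes": ["node_5a", "node_5b"], "into": "node_6"}
print(apply_fusion_instruction(graph, instruction))     # ['node_1', 'node_6', 'node_9']
```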
  • after the neural network model optimization device fuses the first calculation graph using the instruction fusion rule, the resulting calculation graph has fewer computing nodes, its topology is simpler, its computing capability is stronger, and it takes less time to compute data. Therefore, when the neural network model optimization device uses the instruction fusion rule to optimize the computation graph of the neural network model, the computational performance of the computation graph of the neural network model can be improved, and the computation time required for the computation graph of the neural network model to perform computation tasks can be reduced.
  • the node fusion instruction in the instruction fusion rule may be a manual input instruction.
  • the neural network model optimization device can fuse the nodes in the neural network model calculation graph according to the manually input instruction, which improves the applicable scenarios of the neural network model optimization method.
  • the instruction splitting rule is: according to the received node splitting instruction, split a seventh computing node into multiple eighth computing nodes; wherein the node splitting instruction is used to instruct splitting one seventh computing node into multiple eighth computing nodes; the time for computing the fifth input data using the seventh computing node is greater than the time for computing the fifth input data using the plurality of eighth computing nodes.
  • the instruction splitting rule is used to instruct to split a node into multiple nodes.
  • the specific implementation of the instruction splitting rule is similar to that of the instruction fusion rule described above; it only needs to replace the node-fusion-related content with node-splitting content. For the specific implementation, refer to the above description of the node fusion rule, which will not be repeated here.
  • after the neural network model optimization device adopts the instruction splitting rule to split one computing node into multiple computing nodes, the time for the multiple computing nodes obtained after splitting to perform the computing task is shorter than the time for the single computing node before splitting to perform the computing task. Therefore, using the instruction splitting rule to optimize the computation graph of the neural network model can also improve the computational performance of the computation graph of the neural network model and reduce the time required for the computation graph of the neural network model to compute data.
  • the node splitting instruction in the instruction splitting rule may be a manually input instruction.
  • the neural network model optimization device can split the nodes in the neural network model calculation graph according to the manually input instruction, which improves the applicable scenarios of the neural network model optimization method.
  • The hardware fusion rule is: a ninth computing node transmits data to a tenth computing node using a first transmission path, where the time for the ninth computing node to transmit data to the tenth node using the first transmission path is less than the time for the ninth computing node to transmit data to the tenth node using a second transmission path; the second transmission path is the path along which the ninth computing node transmits data to the tenth node in the first computation graph.
  • Taking hardware fusion for a storage device as an example: in the prior art, when the terminal device invokes computing nodes to perform a computing task, it usually follows a pattern of "off-chip storage, on-chip computation, off-chip storage". As shown in Figure 9a, the process in which the terminal device invokes two connected computing nodes in the computation graph (computing node 7 and computing node 8, where computing node 7 is an upstream node of computing node 8) to perform the computing task is:
  • Step I: Computing node 7 reads the first data (that is, the input data of computing node 7) from the storage device.
  • Step II: Computing node 7 computes the first data to generate the second data (that is, the output data of computing node 7 and the input data of computing node 8).
  • Step III: Computing node 7 stores the second data in the storage device.
  • Step IV: Computing node 8 reads the second data from the storage device.
  • Step V: Computing node 8 computes the second data to generate the third data.
  • Step VI: Computing node 8 stores the third data in the storage device.
  • It can be seen from the above process that, with the prior art, each computing node requires two read/write interactions with the storage device when the terminal device invokes the neural network model to perform a computing task: for example, steps I and III performed by computing node 7, and steps IV and VI performed by computing node 8. When the computation graph contains many computing nodes, the terminal device has to perform a large number of read and write operations; limited by the read/write performance of the storage device, it then spends a great deal of time reading and writing data when invoking the neural network model to perform computing tasks.
  • In view of this, in the embodiments of this application, the computing nodes in the computation graph are improved so that all or some of the computing nodes can transmit data directly to one another. In this way, the number of interactions between the computing nodes and the storage device is reduced, which improves the computing performance of the computation graph of the neural network model and reduces the time it needs to perform computing tasks.
  • For example, if computing node 7 and computing node 8 can transmit data to each other, the process in which the terminal device invokes computing node 7 and computing node 8 to perform the computing task is as shown in Figure 9b:
  • Step VII: Computing node 7 reads the first data from the storage device.
  • Step VIII: Computing node 7 computes the first data to generate the second data.
  • Step IX: Computing node 7 sends the second data to computing node 8; correspondingly, computing node 8 receives the second data from computing node 7.
  • Step X: Computing node 8 computes the second data to generate the third data.
  • Step XI: Computing node 8 stores the third data in the storage device.
  • After the neural network model optimization apparatus optimizes the computation graph according to the hardware fusion rule, the number of interactions between the computing nodes and the hardware device is reduced, and data can be transmitted between the computing nodes in the computation graph along more reasonable and faster transmission paths. This improves the computing performance of the computation graph of the neural network model and reduces the time it needs to perform computing tasks, as the sketch below illustrates.
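The difference between the two flows can be shown with a small sketch that simply counts storage accesses. The Storage class and the two stand-in operators below are assumptions made for the example, not part of the patent; the point is that the Figure 9b flow halves the interactions with the storage device while producing the same result.

```python
# Sketch only: compare "store then reload" (Fig. 9a) with direct node-to-node transfer (Fig. 9b).
class Storage:
    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0
    def read(self, key):
        self.reads += 1
        return self.data[key]
    def write(self, key, value):
        self.writes += 1
        self.data[key] = value

node7 = lambda x: x * 2          # stand-ins for the two computing nodes
node8 = lambda x: x + 1

def run_via_storage(store):                  # Fig. 9a: storage device between the nodes
    first = store.read("first")              # step I
    store.write("second", node7(first))      # steps II-III
    second = store.read("second")            # step IV
    store.write("third", node8(second))      # steps V-VI

def run_fused(store):                        # Fig. 9b: node 7 hands its output to node 8
    first = store.read("first")              # step VII
    second = node7(first)                    # step VIII, intermediate kept on chip
    store.write("third", node8(second))      # steps IX-XI

for runner in (run_via_storage, run_fused):
    s = Storage()
    s.data["first"] = 10                     # initial input, not counted as node traffic
    runner(s)
    print(runner.__name__, "result:", s.data["third"],
          "reads:", s.reads, "writes:", s.writes)
# run_via_storage: 2 reads / 2 writes; run_fused: 1 read / 1 write, same result.
```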
  • It should be noted that, under the hardware fusion rule, hardware fusion of the computation graph of the neural network model can also be achieved by at least one of the following: replacing the original low-speed storage device with a high-speed storage device, and increasing the bandwidth with which the neural network model accesses the storage device.
  • It should be pointed out that, in the embodiments of this application, the neural network model optimization apparatus computes the same input data with the operator before optimization and with the operator after optimization, respectively. Based on the time taken by the pre-optimization operator to compute the input data and the time taken by the optimized operator to compute the same data, the apparatus determines whether the optimized operator computes the input data in less time than the operator before optimization.
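As a concrete illustration of that check, the following sketch times a placeholder operator before and after a hand-made rewrite on the same input. The operators and the use of NumPy and timeit are assumptions for the example, not the apparatus's actual measurement mechanism.

```python
# Sketch only: keep the optimized operator only if it computes the same input faster.
import timeit
import numpy as np

x = np.random.rand(1_000_000)          # the same input data for both operators

def op_before(x):                      # operator before optimization (placeholder)
    return x ** 2 + 2 * x + 1

def op_after(x):                       # operator after optimization (placeholder rewrite)
    return (x + 1) ** 2

assert np.allclose(op_before(x), op_after(x))              # same numerical result
t_before = timeit.timeit(lambda: op_before(x), number=50)
t_after = timeit.timeit(lambda: op_after(x), number=50)
print(f"before: {t_before:.4f}s  after: {t_after:.4f}s  keep optimized: {t_after < t_before}")
```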
  • The foregoing mainly describes the solutions of the embodiments of this application from the perspective of interaction between network elements. It can be understood that, to implement the above functions, each network element, for example, the neural network model optimization apparatus, includes at least one of a hardware structure and a software module corresponding to each function.
  • the present application can be implemented in hardware or a combination of hardware and computer software with the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving the hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the neural network model optimization apparatus may be divided into functional units according to the above method examples.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and other division methods may be used in actual implementation.
  • In one possible design, as shown in FIG. 10, the neural network model optimization apparatus 1000 includes a northbound interface 1001, a southbound interface 1002, and one or more of the following: a mathematical fusion module 1003, a mathematical splitting module 1004, a hardware fusion module 1005, an instruction fusion module 1006, and an instruction splitting module 1007.
  • the northbound interface 1001 is used for interfacing with the upper computing framework layer. After the computing framework generates the corresponding first computation graph according to the neural network model, the neural network model optimization apparatus 1000 obtains the first computation graph from the computing framework through the northbound interface 1001 .
  • The southbound interface 1002 is used for interfacing with the operator layer below. After the neural network model optimization apparatus 1000 optimizes the first computation graph and generates the second computation graph, it delivers the second computation graph to the operator layer through the southbound interface 1002, so that the operator layer can split the second computation graph and deliver each computing node of the second computation graph to the chip.
  • the mathematical fusion module 1003 is configured to optimize the first calculation graph according to the mathematical fusion rules described in the above embodiments.
  • the mathematical splitting module 1004 is configured to optimize the first calculation graph according to the mathematical splitting rules described in the above embodiments.
  • the hardware fusion module 1005 is configured to optimize the first computation graph according to the hardware fusion rules described in the above embodiments.
  • the instruction fusion module 1006 is configured to optimize the first computation graph according to the instruction fusion rules described in the above embodiments.
  • the instruction splitting module 1007 is configured to optimize the first computation graph according to the instruction splitting rules described in the above embodiments.
  • northbound interface 1001 and southbound interface 1002 may be implemented in one unit.
  • the northbound interface 1001 and the southbound interface 1002 are integrated in the communication unit.
  • One or more of the above-mentioned mathematical fusion module 1003 , mathematical splitting module 1004 , hardware fusion module 1005 , instruction fusion module 1006 , and instruction splitting module 1007 may also be integrated into one unit.
  • one or more of the math fusion module 1003 , the math splitting module 1004 , the hardware fusion module 1005 , the instruction fusion module 1006 , and the instruction splitting module 1007 are integrated and implemented in the processing unit.
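Purely as a structural sketch, the composition of these modules could look like the following. The class and method names mirror the reference numerals in FIG. 10 but are otherwise assumptions, and the framework and operator-layer objects are hypothetical.

```python
# Sketch only: northbound interface obtains the first graph, the rule modules rewrite it,
# and the southbound interface delivers the second graph to the operator layer.
class NeuralNetworkModelOptimizationApparatus:
    def __init__(self, rules):
        # rules: callables standing in for the mathematical fusion (1003),
        # mathematical splitting (1004), hardware fusion (1005),
        # instruction fusion (1006) and instruction splitting (1007) modules
        self.rules = rules

    def northbound_receive(self, framework):
        """Northbound interface 1001: obtain the first computation graph."""
        return framework.generate_graph()          # hypothetical framework API

    def optimize(self, first_graph):
        graph = first_graph
        for rule in self.rules:                    # each module rewrites the graph in turn
            graph = rule(graph)
        return graph                               # the second computation graph

    def southbound_deliver(self, operator_layer, second_graph):
        """Southbound interface 1002: hand the optimized graph to the operator layer."""
        operator_layer.split_and_dispatch(second_graph)   # hypothetical operator-layer API
```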
  • In the case of integrated units, FIG. 11 shows another possible schematic structural diagram of the neural network model optimization apparatus involved in the above embodiments (referred to as the neural network model optimization apparatus 1100). The apparatus 1100 includes a processing unit 1101 and a communication unit 1102, and may further include a storage unit 1103.
  • the schematic structural diagram shown in FIG. 11 can be used to illustrate the structure of the apparatus for optimizing the neural network model involved in the above embodiment.
  • the processing unit 1101 is used to control and manage the actions of the network equipment, for example, control the neural network model optimization apparatus to execute S501 , S502 , and S503 in FIG. 5 , and/or actions performed by the apparatus for optimizing the neural network model in other processes described in the embodiments of the present application.
  • the processing unit 1101 can communicate with other devices through the communication unit 1102 .
  • the storage unit 1103 is used for storing program codes and data of the neural network model optimization apparatus.
  • When the schematic structural diagram shown in FIG. 11 is used to illustrate the structure of the neural network model optimization apparatus involved in the above embodiments, the neural network model optimization apparatus 1100 may be the neural network model optimization apparatus itself, or may be a chip within the neural network model optimization apparatus.
  • When the neural network model optimization apparatus 1100 is the neural network model optimization apparatus, the processing unit 1101 may be a processor or a controller, and the communication unit 1102 may be a communication interface, a transceiver, a transceiver circuit, a transceiver apparatus, or the like.
  • the communication interface is a general term, which may include one or more interfaces.
  • the storage unit 1103 may be a memory.
  • the processing unit 1101 may be a processor or a controller, and the communication unit 1102 may be an input interface and/or an output interface, pins or circuit etc.
  • The storage unit 1103 may be a storage unit within the chip (for example, a register or a cache), or may be a storage unit located outside the chip in the neural network model optimization apparatus (for example, a read-only memory (ROM) or a random access memory (RAM)).
  • the communication unit may also be referred to as a transceiver unit.
  • the antenna and control circuit with transceiver function in the neural network model optimization apparatus 1100 can be regarded as the communication unit 1102 of the neural network model optimization apparatus 1100
  • the processor with processing function can be regarded as the processing unit 1101 of the neural network model optimization apparatus 1100 .
  • Optionally, the component in the communication unit 1102 that implements the receiving function may be regarded as a communication unit, which is used to perform the receiving steps in the embodiments of this application; it may be a receiver, a receiving machine, a receiving circuit, or the like. The component in the communication unit 1102 that implements the sending function may be regarded as a sending unit, which is used to perform the sending steps in the embodiments of this application; the sending unit may be a transmitter, a transmitting machine, a sending circuit, or the like.
  • the integrated units in FIG. 11 can be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as independent products.
  • Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a neural network model optimization apparatus, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.
  • Storage media for storing computer software products include: U disk, removable hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program codes.
  • the units in FIG. 11 may also be referred to as modules, eg, a processing unit may be referred to as a processing module.
  • An embodiment of this application further provides a schematic diagram of the hardware structure of a neural network model optimization apparatus (referred to as the neural network model optimization apparatus 1200). Referring to FIG. 12 or FIG. 13, the neural network model optimization apparatus 1200 includes a processor 1201 and, optionally, a memory 1202 connected to the processor 1201.
  • the neural network model optimization apparatus 1200 further includes a transceiver 1203 .
  • the processor 1201, the memory 1202 and the transceiver 1203 are connected by a bus.
  • the transceiver 1203 is used to communicate with other devices or communication networks.
  • the transceiver 1203 may include a transmitter and a receiver.
  • a device in the transceiver 1203 for implementing the receiving function may be regarded as a receiver, and the receiver is configured to perform the receiving steps in the embodiments of the present application.
  • a device in the transceiver 1203 for implementing the sending function may be regarded as a transmitter, and the transmitter is used to perform the sending step in the embodiment of the present application.
  • the schematic structural diagram shown in FIG. 12 may be used to illustrate the structure of the neural network model optimization apparatus or the neural network model optimization apparatus involved in the foregoing embodiments.
  • When the schematic structural diagram shown in FIG. 12 is used to illustrate the structure of the neural network model optimization apparatus involved in the foregoing embodiments, the processor 1201 is used to control and manage the actions of the neural network model optimization apparatus; for example, the processor 1201 is configured to support the apparatus in performing S501, S502, and S503 in FIG. 5, and/or the actions performed by the neural network model optimization apparatus in other processes described in the embodiments of this application.
  • the processor 1201 may communicate with other network entities through the transceiver 1203 .
  • the memory 1202 is used to store program codes and data of the neural network model optimization apparatus.
  • the processor 1201 includes a logic circuit and at least one of an input interface and an output interface. Wherein, the output interface is used for executing the sending action in the corresponding method, and the input interface is used for executing the receiving action in the corresponding method.
  • Based on the second possible implementation, referring to FIG. 13, the schematic structural diagram shown in FIG. 13 may be used to illustrate the structure of the neural network model optimization apparatus involved in the foregoing embodiments.
  • When the schematic structural diagram shown in FIG. 13 is used to illustrate the structure of the neural network model optimization apparatus involved in the foregoing embodiments, the processor 1201 is used to control and manage the actions of the neural network model optimization apparatus; for example, the processor 1201 is configured to support the apparatus in performing S501, S502, and S503 in FIG. 5, and/or the actions performed by the neural network model optimization apparatus in other processes described in the embodiments of this application.
  • the processor 1201 may communicate with other network entities through at least one of an input interface and an output interface.
  • the memory 1202 is used to store program codes and data of the neural network model optimization apparatus.
  • FIG. 12 and FIG. 13 may also illustrate the system chip in the neural network model optimization apparatus.
  • the actions performed by the above-mentioned apparatus for optimizing the neural network model can be implemented by the system chip, and the specific actions performed can refer to the above, and details are not repeated here.
  • each step in the method provided in this embodiment may be completed by an integrated logic circuit of hardware in a processor or an instruction in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the processor in this application may include, but is not limited to, at least one of the following: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller (MCU), or Artificial intelligence processors and other types of computing devices that run software, each computing device may include one or more cores for executing software instructions to perform operations or processing.
  • The processor may be a separate semiconductor chip, or may be integrated with other circuits into one semiconductor chip; for example, it may form a system-on-chip (SoC) with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits).
  • the processor may further include necessary hardware accelerators, such as field programmable gate arrays (FPGA), PLDs (Programmable Logic Devices) , or a logic circuit that implements dedicated logic operations.
  • The memory in the embodiments of this application may be of at least one of the following types: a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, or an electrically erasable programmable read-only memory (EEPROM).
  • In some scenarios, the memory may also be a compact disc read-only memory (CD-ROM) or other optical disc storage (including a compact disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto.
  • Embodiments of the present application further provide a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to execute any of the foregoing methods.
  • Embodiments of the present application also provide a computer program product containing instructions, which, when run on a computer, enables the computer to execute any of the above methods.
  • An embodiment of the present application further provides a chip, the chip includes a processor and an interface circuit, the interface circuit is coupled to the processor, the processor is used to run a computer program or instructions to implement the above method, and the interface circuit is used to connect with the processor. communicate with other modules outside the chip.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or in a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Abstract

本申请提供一种神经网络模型优化方法及装置,涉及人工智能技术领域,用于提高神经网络模型的计算性能,降低执行计算任务时间。该方法包括:获取神经网络模型的第一计算图;根据预设规则,以及第一计算图,生成第二计算图;其中,针对同样的输入数据,第二计算图计算该输入数据的时间,少于第一计算图计算该输入数据的时间;预设规则包括以下至少一项:数学拆分规则,指令融合规则,指令拆分规则,以及硬件融合规则;输出第二计算图。这样,将神经网络模型的第一计算图,优化为计算性能更强,执行计算任务时所需计算时间更少的第二计算图。从而提高了神经网络模型执行计算任务时的计算速度,降低了执行计算任务时间。

Description

神经网络模型优化方法及装置 技术领域
本申请涉及人工智能(artificial intelligence,AI)技术领域,尤其涉及一种神经网络模型优化方法及装置。
背景技术
在神经网络模型中,通常用计算图(Graph)表征神经网络模型的计算过程。神经网络模型的计算图是一种将神经网络模型中的各个神经元拆分为面向张量数据的算子后得到的计算图。该计算图能够表征各个算子的数学表达以及算子之间的连接关系,也即能够表征神经网络模型的神经元的数学表达,以及神经元之间的连接关系。
由于神经网络结构通常比较复杂,在将神经网络模型映射为计算图之后,计算图的拓扑也将比较复杂,计算复杂度较高,执行计算任务时,所需的计算时间比较长。
发明内容
本申请提供一种神经网络模型优化方法及装置,解决了现有技术中通过神经网络模型执行计算任务时,所需的计算时间较长的问题。
为解决上述问题,本申请采用如下技术方案:
第一方面,提供一种神经网络模型优化方法,包括:获取神经网络模型的第一计算图;根据预设规则,以及第一计算图,生成第二计算图;其中,利用第二计算图计算第一输入数据的时间,少于利用第一计算图计算第一输入数据的时间;预设规则包括以下至少一项:数学融合规则,数学拆分规则,指令融合规则,指令拆分规则,以及硬件融合规则;输出第二计算图。
基于上述技术方案,本申请提供的神经网络模型优化方法,能够将神经网络模型的第一计算图,优化为计算性能更强,执行计算任务时所需计算时间更少的第二计算图。从而提高了神经网络模型执行计算任务时的计算速度,降低了神经网络模型执行计算任务所需的时间。
相应的,在配置有神经网络模型的终端设备(以下简称终端设备)调用神经网络模型执行计算任务时,采用本申请实施例提供的神经网络模型优化方法对神经网络模型进行优化,可以提高终端设备的计算性能,节省终端设备的计算时间。
结合上述第一方面,在一种可能的实现方式中,数学融合规则为:将多个第一计算节点,融合为一个第二计算节点;其中,第二计算节点对应的数学表达式为:对多个第一计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用多个第一计算节点计算第二输入数据的时间,大于利用第二计算节点计算第二输入数据的时间。
基于此,神经网络模型优化装置采用数学融合规则对第一计算图进行融合之后,得到的计算图的计算节点数量更少,计算图的拓扑结构更加简单,同时计算图的计算能力更强,计算数据所需的时间更少。因此,神经网络模型优化装置采用数学融合规 则对神经网络模型的计算图进行优化时,可以提升神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需的计算时间。
结合上述第一方面,在一种可能的实现方式中,数学拆分规则为:数学拆分规则为:将一个第三计算节点拆分为多个第四计算节点;其中,第三计算节点对应的数学表达式为:对多个第四计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用第三计算节点计算第三输入数据的时间,大于利用多个第四计算节点计算第三输入数据的时间。
基于此,神经网络模型优化装置采用数学拆分规则,将一个计算节点拆分为多个计算节点之后,由于拆分后的多个计算节点执行计算任务的时间,小于拆分前的一个计算节点执行计算任务的时间。因此,神经网络模型优化装置采用数学拆分规则对神经网络模型的计算图进行优化同样可以提高神经网络模型计算图的计算性能,降低神经网络模型计算图计算数据所需的时间。
结合上述第一方面,在一种可能的实现方式中,指令融合规则为:根据接收到的节点融合指令,将多个第五计算节点融合为一个第六计算节点;其中,节点融合指令用于指示将多个第五计算节点融合为一个第六计算节点;利用多个第五计算节点计算第四输入数据的时间,大于利用第六计算节点计算第四输入数据的时间。
基于此,神经网络模型优化装置采用指令融合规则对第一计算图进行融合之后,得到的计算图的计算节点数量更少,计算图的拓扑结构更加简单,同时计算图的计算能力更强,计算数据所需的时间更少。因此,神经网络模型优化装置采用指令融合规则对神经网络模型的计算图进行优化时,可以提升神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需的计算时间。
此外,指令融合规则中的节点融合指令可以为人工输入的指令。此时,神经网络模型优化装置可以根据人工输入的指令对神经网络模型计算图中的节点进行融合,提升了神经网络模型优化方法的适用场景。
结合上述第一方面,在一种可能的实现方式中,指令拆分规则用于:根据接收到的节点拆分指令,将一个第七计算节点拆分为多个第八计算节点;其中,节点拆分指令用于指示将一个第七计算节点拆分为多个第八计算节点;利用第七计算节点计算第五输入数据的时间,大于利用多个第八计算节点计算第五输入数据的时间。
基于此,神经网络模型优化装置采用指令拆分规则,将一个计算节点拆分为多个计算节点之后,由于拆分后的多个计算节点执行计算任务的时间,小于拆分前的一个计算节点执行计算任务的时间。因此,神经网络模型优化装置采用指令拆分规则对神经网络模型的计算图进行优化同样可以提高神经网络模型计算图的计算性能,降低神经网络模型计算图计算数据所需的时间。
此外,指令融合规则中的节点拆分指令可以为人工输入的指令。此时,神经网络模型优化装置可以根据人工输入的指令对神经网络模型计算图中的节点进行拆分,提升了神经网络模型优化方法的适用场景。
结合上述第一方面,在一种可能的实现方式中,硬件融合规则为:第九计算节点采用第一传输路径向第十计算节点传输数据;其中,第九计算节点采用第一传输路径向第十节点传输数据的时间小于第九计算节点采用第二传输路径向第十节点传输数据 的时间;第二传输路径为第一计算图中第九计算节点向第十节点传输数据的传输路径。
基于此,神经网络模型可以通过优化数据在节点中的传输路径,达到提高神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需时间的目的。
第二方面,提供一种神经网络模型优化装置,包括:通信单元和处理单元。通信单元,用于获取神经网络模型的第一计算图;处理单元,用于根据预设规则,以及第一计算图,生成第二计算图;第二计算图计算第一输入数据的时间,少于第一计算图计算第一输入数据的时间;预设规则包括以下至少一项:数学融合规则,数学拆分规则,指令融合规则,指令拆分规则,以及硬件融合规则;通信单元,还用于输出第二计算图。
结合上述第二方面,在一种可能的实现方式中,数学融合规则为:将多个第一计算节点,融合为一个第二计算节点;其中,第二计算节点对应的数学表达式为:对多个第一计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用多个第一计算节点计算第二输入数据的时间,大于第二计算节点计算第二输入数据的时间。
结合上述第二方面,在一种可能的实现方式中,数学拆分规则为:将一个第三计算节点拆分为多个第四计算节点;其中,第三计算节点对应的数学表达式为:对多个第四计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用第三计算节点计算第三输入数据的时间,大于多个第四计算节点计算第三输入数据的时间。
结合上述第二方面,在一种可能的实现方式中,指令融合规则为:根据接收到的节点融合指令,将多个第五计算节点融合为一个第六计算节点;其中,节点融合指令,用于指示将多个第五计算节点融合为一个第六计算节点;利用多个第五计算节点计算第四输入数据的时间,大于第六计算节点计算第四输入数据的时间。
结合上述第二方面,在一种可能的实现方式中,指令拆分规则用于:根据接收到的节点拆分指令,将一个第七计算节点拆分为多个第八计算节点;其中,节点拆分指令,用于指示将一个第七计算节点拆分为多个第八计算节点;利用第七计算节点计算第五输入数据的时间,大于多个第八计算节点计算第五输入数据的时间。
结合上述第二方面,在一种可能的实现方式中,硬件融合规则为:第九计算节点采用第一传输路径向第十计算节点传输数据;其中,第九计算节点采用第一传输路径向第十节点传输数据的时间小于第九计算节点采用第二传输路径向第十节点传输数据的时间;第二传输路径为第一计算图中第九计算节点向第十节点传输数据的传输路径。
第三方面,本申请提供了一种神经网络模型优化装置,包括:处理器和存储介质;存储介质包括指令,处理器用于运行指令,以实现如第一方面和第一方面的任一种可能的实现方式中所描述的方法。
第四方面,本申请提供了一种计算机可读存储介质,计算机可读存储介质中存储有指令,当该指令在神经网络模型优化装置上运行时,使得神经网络模型优化装置执行如第一方面和第一方面的任一种可能的实现方式中所描述的方法。
第五方面,本申请提供一种包含指令的计算机程序产品,当该计算机程序产品在神经网络模型优化装置上运行时,使得神经网络模型优化装置执行如第一方面和第一方面的任一种可能的实现方式中所描述的方法。
应当理解的是,本申请中对技术特征、技术方案、有益效果或类似语言的描述并 不是暗示在任意的单个实施例中可以实现所有的特点和优点。相反,可以理解的是对于特征或有益效果的描述意味着在至少一个实施例中包括特定的技术特征、技术方案或有益效果。因此,本说明书中对于技术特征、技术方案或有益效果的描述并不一定是指相同的实施例。进而,还可以任何适当的方式组合本实施例中所描述的技术特征、技术方案和有益效果。本领域技术人员将会理解,无需特定实施例的一个或多个特定的技术特征、技术方案或有益效果即可实现实施例。在其他实施例中,还可在没有体现所有实施例的特定实施例中识别出额外的技术特征和有益效果。
附图说明
图1为本申请实施例提供的一种系统架构的结构示意图;
图2为本申请实施例体用的一种卷积神经网络结构示意图;
图3为本申请实施例提供的一种神经网络模型的计算图的结构示意图;
图4为本申请实施例提供的现有技术中的软件栈的架构示意图;
图5为本申请实施例提供的一种神经网络模型优化方法的流程示意图;
图6为本申请实施例提供的一种改进后的软件栈的架构示意图;
图7为本申请实施例提供的一种采用数学融合规则进行节点优化的示意图;
图8为本申请实施例提供的一种采用数学拆分规则进行节点优化的示意图;
图9a为本申请实施例提供的现有技术中计算节点执行计算任务的流程示意图;
图9b为本申请实施例提供的采用硬件融合规则优化后的计算节点执行计算任务的流程示意图;
图10为本申请实施例提供的一种神经网络模型优化装置的结构示意图;
图11为本申请实施例提供的另一种神经网络模型优化装置的结构示意图;
图12为本申请实施例提供的一种神经网络模型优化装置的硬件结构示意图;
图13为本申请实施例提供的又一种神经网络模型优化装置的硬件结构示意图。
具体实施方式
在本申请的描述中,除非另有说明,“/”表示“或”的意思,例如,A/B可以表示A或B。本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。此外,“至少一个”是指一个或多个,“多个”是指两个或两个以上。“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。
需要说明的是,本申请中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
本申请提供的神经网络模型可以为任意一种人工神经网络模型,例如卷积神经网络模型,反向传播(back propagation,BP)神经网络模型等,本申请实施例对此不做具体限定。
图1是本申请实施例提供的一种系统架构100。在图1中,数据采集设备160用于采集训练数据。以用于图像处理的目标模型101为例来说,训练数据可以包括训练 图像以及训练图像对应的分类结果,其中,训练图像的结果可以是人工预先标注的结果。目标模型101也可以称为目标规则101。
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。
下面对训练设备120基于训练数据得到目标模型101进行描述,训练设备120对输入的原始图像进行处理,将输出的图像与原始图像进行对比,直到训练设备120输出的图像与原始图像的差值小于一定的阈值,从而完成目标模型101的训练。
本申请实施例中的目标模型101具体可以为神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备120训练得到的目标模型101可以应用于不同的系统或设备中,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)AR/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。
训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型101,该相应的目标模型101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
根据训练设备120训练得到目标模型101,可以是CNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNNS)等等。
值得注意的是,图1仅是本申请实施例提供的一种系统架构的示意图,图1中所示设备、器件、模块等之间的位置关系、训练数据的类型以及神经网络的类型或功能不构成任何限制。例如,在图1中,模型转换器110可以置于客户设备140中。又如,其中的训练数据也可以是文本、语音或其他类型的数据。又如,模型转换器也可以有其他名称,例如模型编译器等等,只要能实现与模型转换器110类似功能的设备或装置都可理解为本申请中的模型转换器。
训练设备120训练的目标模型101模型文件是平台无关的(即经过编译可运行在不同的硬件平台上),如果想在客户设备140上应用目标模型101,则训练设备120训练好的目标模型101需要经过模型转换器110的处理,将目标模型101的模型文件从当前格式编译到客户设备支持的格式。
例如,目标模型101是TensorFlow框架下开发得到的模型,则需要将目标模型101的模型文件输入模型转换器110,模型转换器110对目标模型101进行编译,得到客户设备140支持的模型文件,然后再将编译得到的模型文件部署到客户设备140上。通常来说,模型转换器110对目标模型101的转换处理,也可以称为编译。
为了编译成功,自定义算子开发者还需要向模型转换器110提供模型中的各层包括的算子的参数定义函数、参数解析函数、输出张量(shape)大小的推导函数、实现函数以及调用(forward)函数等内容。
又如,目标模型101是TensorFlow框架下开发得到的模型,且目标模型101中部 分或全部层中的算子是开发者自定义的,即不属于TensorFlow框架的AI软件栈中的算子的情况下,开发者在将目标模型101的模型文件输入模型转换器110,以通过模型转换器110编译得到可以运行在客户设备上的模型文件时,还需要向模型转换器110提供自定义算子的参数定义函数、参数解析函数、输出大小(shape)的推导函数、实现函数以及调用(forward)函数等内容。
本申请实施例中的神经网络的结构可以如图2所示。
如图2所示,卷积神经网络(CNN)200可以包括输入层210,卷积层/池化层220(其中池化层为可选的),以及神经网络层230。
卷积层/池化层220:
卷积层:
如图2所示卷积层/池化层220可以包括如示例221-226层,举例来说:在一种实现中,221层为卷积层,222层为池化层,223层为卷积层,224层为池化层,225为卷积层,226为池化层;在另一种实现方式中,221、222为卷积层,223为池化层,224、225为卷积层,226为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层221为例,介绍一层卷积层的内部工作原理。
卷积层221可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的特征图的尺寸也相同,再将提取到的多个尺寸相同的特征图合并形成卷积运算的输出。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。
当卷积神经网络200有多个卷积层的时候,初始的卷积层(例如221)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络200深度的加深,越往后的卷积层(例如226)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图2中220所示例的221-226各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
神经网络层230:
在经过卷积层/池化层220的处理后,卷积神经网络200还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层220只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络200需要利用神经网络层230来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层230中可以包括多层隐含层(如图2所示的231、232至23n)以及输出层240,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
在神经网络层230中的多层隐含层之后,也就是整个卷积神经网络200的最后层为输出层240,该输出层240具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络200的前向传播(如图2由210至240方向的传播为前向传播)完成,反向传播(如图2由240至210方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络200的损失,及卷积神经网络200通过输出层输出的结果和理想结果之间的误差。
需要说明的是,如图2所示的卷积神经网络200仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在。
为了便于理解本申请实施例提供的技术方案,首先对本申请实施例中的部分用语进行解释说明。
1、神经网络模型
神经网络模型是由大量处理单元(记为神经元)互相连接组成的信息处理系统,神经网络模型中的神经元中包含有相应的数学表达式。数据输入神经元之后,神经元运行其包含的数学表达式,对输入数据进行计算,生成输出数据。其中,每个神经元的输入数据为与其连接的上一个神经元的输出数据;每个神经元的输出数据为与其连接的下一个神经元的输入数据。
在神经网络模型中,输入数据之后,神经网络模型根据自身的学习训练,为输入数据选择相应的神经元,并根据这些神经对对输入数据进行计算,确定并输出最终的运算结果。同时,神经网络在数据运算过程中还可以不断学习进化,根据对运算结果 的反馈不断优化自身的运算过程,神经网络模型运算训练次数越多,得到的结果反馈越多,计算的结果越准确。神经网络模型的神经元的数量通常是固定的,但是每个神经元中的数学表达式,或者神经元对应的权重值可以根据神经网络模型的不断训练而不断变化。
2、计算图(Graph)
计算图用于以直观的形式表达神经网络模型执行计算任务时的计算过程,使神经网络模型执行计算任务时的计算过程更加清晰明了。
在终端设备调用神经网络模型执行计算任务时,终端设备根据计算任务调用相应的神经网络模型,并将该神经网络转换为相应的计算图。在此之后,终端设备将该计算图进一步拆分为单个算子的形式,以芯片能够识别的下层语言形式下发给芯片,之后芯片运行各个算子,达到根据神经网络模型执行计算任务的目的。
一种示例,如图3所示,为本申请实施例提供的一种神经网络模型的计算图的结构。其中abc代表三个输入,节点1、节点2和节点3分别表征计算图的三个计算节点,节点之间的连接关系以带箭头的线段示出,其中,箭头的指向记为数据的传输方向。如图3所示的计算图的计算过程为:终端设备将3个输入数据:a=4,b=6,c=3输入到神经网络模型的计算图中。首先,将数据b=6和c=3输入到计算节点1中,执行计算节点1中的计算过程,得到的输出数据为u=18。其次,将数据a=4和计算节点1中的输出数据u=18,输入到计算节点2中,执行计算节点2中的计算过程,得到的输出数据为v=22。最终,将计算节点2中的输出数据v=22,输入到计算节点3中,得到最终的输出结果j=66。
需要指出的是,图3仅为示例性说明,实际运用中的计算图可能更为复杂。
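The three-node example above (Figure 3, with inputs a=4, b=6, c=3) can be reproduced with a short sketch; the dictionary-based graph representation below is an illustrative assumption, not the patent's data structure.

```python
# Sketch only: evaluate the small computation graph of Fig. 3 in topological order.
graph = {
    "node1": {"inputs": ["b", "c"], "fn": lambda b, c: b * c},       # u = b * c
    "node2": {"inputs": ["a", "node1"], "fn": lambda a, u: a + u},   # v = a + u
    "node3": {"inputs": ["node2"], "fn": lambda v: 3 * v},           # j = 3 * v
}

def evaluate(graph, feeds):
    values = dict(feeds)
    for name, node in graph.items():          # nodes are listed in topological order
        args = [values[i] for i in node["inputs"]]
        values[name] = node["fn"](*args)
    return values

out = evaluate(graph, {"a": 4, "b": 6, "c": 3})
print(out["node1"], out["node2"], out["node3"])   # 18 22 66
```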
3、算子
算子用于表征计算图中,每个计算节点的计算过程。例如上述图3中,计算节点1中的数学表达式:u=b×c,记为节点1中的算子。
4、使芯片和神经网络模型结合的软件栈
为了使芯片和神经网络模型更好的结合,以更好的发挥出芯片和神经网络模型的计算性能,提出了一种如图4所示的使芯片和神经网络模型结合的软件栈。
如图4所示,该软件栈包括以下四部分:用户程序层,计算框架层,算子层,芯片层。
其中,用户程序层为神经网络模型的上层语言表达,例如,使用python语言表达的神经网络模型。
计算框架层用于将上层语言表达的神经网络模型,转化为通用的或者特定的计算图的表达形式。
算子层用于拆分计算框架的计算图中的各个计算节点,并将这些计算节点转换为芯片能够识别的下层语言,并将转换后的计算节点下发给芯片。
芯片层用于运行下发的各个计算节点,达到使用该神经网络模型计算数据的效果。
以上是对本申请涉及到的部分内容以及概念所作的简单介绍。
为了解决现有技术中,神经网络模型的计算图拓扑复杂,计算复杂度较高,执行计算任务时,所需的计算时间比较长的问题,本申请实施例提供的一种神经网络模型 优化方法,神经网络模型优化装置在将获取神经网络模型的第一计算图之后;根据预设规则,以及第一计算图,生成第二计算图;其中,针对同样的输入数据,第二计算图计算该输入数据的时间,少于第一计算图计算该输入数据的时间;在此之后,神经网络模型优化装置输出第二计算图。
基于上述技术方案,本申请提供的神经网络模型优化方法,能够将神经网络模型的第一计算图,优化为计算性能更强,执行计算任务时所需计算时间更少的第二计算图。从而提高了神经网络模型执行计算任务时的计算速度,降低了神经网络模型执行计算任务所需的时间。
相应的,在终端设备调用神经网络模型执行计算任务时,采用本申请实施例提供的神经网络模型优化方法对神经网络模型进行优化,可以提高终端设备的计算性能,节省终端设备的计算时间。
以下,对本申请提供的神经网络模型优化方法进行详细描述。如图5所示,本申请实施例提供的神经网络模型优化方法包括:
S501、神经网络模型优化装置获取神经网络模型的第一计算图。
其中,第一计算图为终端设备根据上述神经网络模型的拓扑结构,直接生成的计算图。第一计算图中的计算节点的个数与神经网络模型中的神经元的数量相同或相近。
需要指出的是,在当前的终端设备(例如手机)中,通常会为不同的应用预置不同的神经网络模型,在终端执行不同的应用的计算任务时,通过调用该应用对应的神经网络模型执行计算任务。
例如,终端设备预先为相机应用配置了进行图像处理的神经网络模型(记为第一神经网络模型),终端设备预先为语音助手配置了语音识别神经网络模型(记为第二神经网络模型)。
在终端设备的相机应用被打开,并执行完拍摄动作之后。终端设备调用第一神经网络模型对拍摄的图像进行优化处理,生成拍摄后的图像。
在终端设备的语音助手应用被打开,并检测到语音助手应用的语音输入之后,终端设备调用第二神经网络模型,对用户输入的语音进行处理,确定用户的语音输入。终端设备根据用户的语音输入执行相应的操作。
需要说明的是,本申请实施例中所记载的神经网络模型优化装置,可以是终端设备,也可以是终端设备中的一个模块或单元,或者是集成在终端设备中的装置。
S502、神经网络模型优化装置根据预设规则,以及第一计算图,生成第二计算图。
其中,利用第二计算图计算第一输入数据的时间,少于利用第一计算图计算第一输入数据的时间。
一种可能的实现方式中,该预设规则用于对神经网络模型的计算图进行优化,以得到计算性能更好,执行计算任务所需时间更少的计算图。因此,针对同样的输入数据,第二计算图计算输入数据的时间,少于第一计算图计算输入数据的时间。
具体来说,神经网络模型优化装置分别采用第一计算图和第二计算图,计算相同的输入数据。神经网络模型优化装置根据第一计算图计算该输入数据的时间,以及第二计算图计算该输入数据的时间,确定第二计算图的计算输入数据的时间是否少于第一计算图计算输入数据的时间。
S503、神经网络模型优化装置输出第二计算图。
一种可能的实现方式中,神经网络模型优化装置在获取到神经网络模型优化装置输出的第二计算图之后,将第二计算图拆分为相应的多个算子,并将算子转换为芯片能够理解的下层表达,下发给芯片,以使得芯片根据下发的算子,运行该神经网络模型。
基于上述技术方案,本申请提供的神经网络模型优化方法,能够将神经网络模型的第一计算图,优化为计算性能更强,执行计算任务时所需计算时间更少的第二计算图。从而提高了神经网络模型执行计算任务时的计算速度,降低了神经网络模型执行计算任务所需的时间。
相应的,在终端设备调用神经网络模型执行计算任务时,采用本申请实施例提供的神经网络模型优化方法对神经网络模型进行优化,可以提高终端设备的计算性能,节省终端设备的计算时间。
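A hedged sketch of the overall flow S501 to S503 follows: obtain the first computation graph, rewrite it with the preset rules, and output the second computation graph after checking on the same input that it is not slower. All function and parameter names are assumptions made for illustration.

```python
# Sketch only: S501 obtain the first graph, S502 apply preset rules, S503 output the result.
import time

def optimize_model(get_first_graph, preset_rules, sample_input, run_graph):
    first_graph = get_first_graph()                      # S501: obtain the first computation graph
    second_graph = first_graph
    for rule in preset_rules:                            # S502: apply the preset rules in turn
        second_graph = rule(second_graph)

    t0 = time.perf_counter(); run_graph(first_graph, sample_input)
    t1 = time.perf_counter(); run_graph(second_graph, sample_input)
    t2 = time.perf_counter()
    # keep the optimized graph only if it computes the same input in less time
    return second_graph if (t2 - t1) < (t1 - t0) else first_graph   # S503: output
```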
一种可能的实现方式中,结合上述图4所示出的软件栈,如图6所示,在本申请实施例中,可以将软件栈修改为5层,即在图4中示出的计算框架层和算子层之间,增加计算图优化层;该计算图优化层用于实现本申请实施例中所记载的神经网络模型的优化方法。
具体来说,终端设备根据如图6所示的软件栈执行计算任务时,可以通过如下步骤实现:
步骤1、终端设备接收到计算任务之后,调用相应的用户程序层,确定执行该计算任务的神经网络模型。
需要指出的是,终端设备中可以预置多种用于执行不同计算任务的神经网络模型;例如,用于进行图像处理的神经网络模型,用于进行语音识别的神经网络模型,用于进行数据处理的神经网络模型等。在终端接收到计算任务之后,可以根据该计算任务的类型,选择对应的神经网络模型来执行该计算任务。
一种示例,终端设备接收到的计算任务为图像处理计算任务,则终端设备确定使用用于进行图像处理的神经网络模型执行该计算任务。
再一种示例,终端设备接收到的计算任务为语音识别计算任务,则终端设备确定使用用于进行语音识别的神经网络模型执行该计算任务。
又一种示例,终端设备接收到的计算任务为数据处理计算任务,则终端设备确定使用用于进行数据处理的神经网络模型执行该计算任务。
步骤2、终端设备调用计算框架层,将神经网络模型转换为第一计算图。
步骤3、终端设备指示神经网络模型优化装置调用计算图优化层,将第一计算图优化为第二计算图。
具体来说,终端设备可以指示神经网络模型优化装置通过执行本申请实施例所记载的神经网络模型的优化方法,将第一计算图优化为第二计算图。
步骤4、终端设备调用算子层,拆分第二计算图中的每个计算节点;终端设备将每个计算节点转换为芯片能够识别的下层语言,下发至芯片层。
步骤5、终端设备指示芯片根据下发的计算节点,执行该计算任务。
又一种可能的实现方式中,结合上述图4所示出的软件栈,在本申请实施例中, 可以仍旧保持该软件栈的4层结构,将计算图优化层复用在计算框架层中,以实现本申请实施例所记载的神经网络模型的优化方法。
在该情况下,终端调用神经网络模型执行计算任务的具体实现过程与上述步骤1-步骤5类似。区别之处在于,终端设备将步骤2和步骤3合并,在终端设备调用计算框架层时,依次实现步骤2和步骤3中所记载的内容。
另一种可能的实现方式中,结合上述图4所示出的软件栈,在本申请实施例中,可以仍旧保持该软件栈的4层结构,将计算图优化层复用在计算框架层中,以实现本申请实施例所记载的神经网络模型的优化方法。
在该情况下,终端调用神经网络模型执行计算任务的具体实现过程与上述步骤1-步骤5类似。区别之处在于,终端设备将步骤3和步骤4合并,在终端设备调用算子层时,依次实现步骤3和步骤4中所记载的内容。
一种可能的实现方式中,结合上述S502,本申请实施例所记载的预设规则包括以下至少一项:数学融合规则,数学拆分规则,指令融合规则,指令拆分规则,以及硬件融合规则。以下分别对上述五种预设规则进行说明。
Ⅰ、数学融合规则
数学融合规则为:将多个第一计算节点,融合为一个第二计算节点;其中,第二计算节点对应的数学表达式为:对多个第一计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用多个第一计算节点计算第二输入数据的时间,大于利用第二计算节点计算第二输入数据的时间。
需要指出的是,利用多个第一计算节点计算第二输入数据的时间,大于利用第二计算节点计算第二输入数据的时间;指的是终端设备调用多个第一计算节点计算第二输入数据的时间之和,大于终端设备调用第二计算节点计算第二输入数据的时间。
一种可能的实现方式中,神经网络模型优化装置根据数学融合规则,将多个第一计算节点,融合为一个第二计算节点,具体可以实现为:
神经网络模型优化装置遍历第一计算图中的计算节点,在连续多个第一计算节点对应的数学表达式能够推导为一个数学表达式的情况下,神经网络模型装置将该多个第一计算节点融合为第二计算节点。其中,第二计算节点对应的数学表达式为根据多个第一计算节点对应的数学表达式推导出的一个数学表达式。
一种具体的实现方式中,神经网络模型优化装置中具有将多个数学表达式融合为一个数学表达式的模板。在神经网络模型优化装置确定多个第一节点中对应的数学表达式之后,将该多个数学表达式与神经网络模型优化装置中的数学融合模板进行匹配,在匹配到相应的数学融合模板之后,根据该数学融合模板,确定该多个数学表达式对应的融合后的数学表达式。
举例来说,如图7所示,第一计算图中包括计算节点1和计算节点2,其中,计算节点1为计算节点2的上联节点,数据依次经过计算节点1和计算节点2进行计算。
计算节点1对应的数学表达式为如下公式1所示:
a×x 1+b     公式1
其中,a和b为计算节点1中的数学表达式的固定参数,a和b的值为固定值;x 1为计算节点1的输入数据(即节点1的上联节点输出的数据)。
计算节点2对应的数学表达式为如下公式2所示:
c×x 2+d    公式2
其中,c和d为计算节点2中的数学表达式的固定参数,c和d的值为固定值;x 2为计算节点2的输入数据(即计算节点1的上联节点输出的数据)。
神经网络模型优化装置对上述公式1和公式2进行推导,得到如下公式3:
e×x 3+f     公式3
其中,e=a×c,f=b×c+d,x 3为计算节点1的输入数据(即节点1的上联节点输出的数据),公式3中a和b的值为与公式1中a和b的值相同,公式3中c和d的值为与公式2中c和d的值相同。
神经网络模型优化装置将计算节点1和计算节点2融合为计算节点3,计算节点3对应的数学表达式为上述公式3。
这样,神经网络模型优化装置可以将计算节点1和计算节点2融合为计算节点3,从而提升神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需的计算时间。此外,神经网络模型优化装置还减少了神经网络模型计算图的节点数量,降低了计算图的复杂度。
基于此,神经网络模型优化装置采用数学融合规则对第一计算图进行融合之后,得到的计算图的计算节点数量更少,计算图的拓扑结构更加简单,同时计算图的计算能力更强,计算数据所需的时间更少。因此,神经网络模型优化装置采用数学融合规则对神经网络模型的计算图进行优化时,可以提升神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需的计算时间。
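The derivation above (公式1 to 公式3) can be checked numerically with a few lines. The constants below are arbitrary and serve only to confirm that the fused operator e·x + f, with e = a·c and f = b·c + d, reproduces the two-node chain.

```python
# Sketch only: verify that fusing node 1 and node 2 into node 3 preserves the output.
a, b, c, d = 2.0, 3.0, 4.0, 5.0
e, f = a * c, b * c + d

node1 = lambda x: a * x + b          # 公式1
node2 = lambda x: c * x + d          # 公式2
node3 = lambda x: e * x + f          # 公式3, the fused operator

for x in (0.0, 1.0, -2.5, 7.0):
    assert abs(node2(node1(x)) - node3(x)) < 1e-9
print("fused node reproduces the two-node chain for all test inputs")
```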
Ⅱ、数学拆分规则
数学拆分规则为:将一个第三计算节点拆分为多个第四计算节点;其中,第三计算节点对应的数学表达式为:对多个第四计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用第三计算节点计算第三输入数据的时间,大于利用多个第四计算节点计算第三输入数据的时间。
在神经网络模型的计算图中,对于计算图中的某一个计算节点,可能会存在以下情况:该计算节点执行计算任务的时间,大于将该计算节点拆分为多个计算节点后由拆分后的多个计算节点依次执行计算任务所需的时间。
例如,当一个计算节点中的数学表达式过于复杂时,该数学表达式的计算复杂度可能会超过该计算节点的计算能力。这将导致计算节点的计算性能下降,由该节点执行计算任务时耗费较长的时间。
又例如,一个计算节点计算复杂数学表达式的能力较弱,而将该计算节点拆分为多个计算节点,有多个计算节点计算该复杂数学表达式的一部分时,多个计算节点的计算能力反而增强。
针对这种情况,神经网络模型优化装置可以通过数学拆分规则,将该一个计算节点,拆分为多个计算节点,以提升神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需的时间。
举例来说,如图8所示,计算节点4对应的数学表达式如下公式4所示:
g×x 4+h 2     公式4
其中,g和h为计算节点4中的数学表达式的固定参数,g和h的值为固定值,x 4为计算节点4的输入数据(即计算节点4的上联节点输出的数据)。
针对公式4,神经网络模型优化装置可以将公式4拆分为如下公式5和公式6两个计算公式:
g×x 5     公式5
其中,g的值与上述公式4中g的值相同,x 5为计算节点4的输入数据(即计算节点4的上联节点输出的数据)。
x 6+h 2     公式6
其中,h的值与上述公式4中h的值相同,x 6为根据公式5进行运算后确定的输出数据(即计算节点5的输出数据,计算节点6的输入数据)。
神经网络模型优化装置确定计算节点根据公式4执行计算任务的时间,大于计算节点5根据公式5执行计算任务的时间与计算节点6根据公式6执行计算任务的时间。也即是说,针对同一个输入数据,计算节点4计算该输入数据的时间,大于计算节点5和计算节点6依次计算该输入数据的时间。
此时,神经网络模型优化装置将上述计算节点4拆分为计算节点5和计算节点6,计算节点5对应的数学表达式为公式5,计算节点6对应的数学表达式为公式6。
基于此,神经网络模型优化装置采用数学拆分规则,将一个计算节点拆分为多个计算节点之后,由于拆分后的多个计算节点执行计算任务的时间,小于拆分前的一个计算节点执行计算任务的时间。因此,神经网络模型优化装置采用数学拆分规则对神经网络模型的计算图进行优化同样可以提高神经网络模型计算图的计算性能,降低神经网络模型计算图计算数据所需的时间。
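Similarly, the splitting of 公式4 into 公式5 and 公式6 can be verified with a short sketch; the use of NumPy and the constant values are assumptions made only for the check.

```python
# Sketch only: verify that node 5 followed by node 6 reproduces node 4.
import numpy as np

g, h = 1.7, 0.4
node4 = lambda x: g * x + h ** 2                 # 公式4
node5 = lambda x: g * x                          # 公式5
node6 = lambda x: x + h ** 2                     # 公式6

x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(node4(x), node6(node5(x)))    # the split chain is equivalent
print("split nodes reproduce the original operator")
```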
Ⅲ、指令融合规则
指令融合规则为:根据接收到的节点融合指令,将多个第五计算节点融合为一个第六计算节点;其中,节点融合指令用于指示将多个第五计算节点融合为一个第六计算节点;利用多个第五计算节点计算第四输入数据的时间,大于利用第六计算节点计算第四输入数据的时间。
其中,该指令融合规则中的融合指令,可以是工作人员通过编译器等下发的,也可以是与神经网络模型优化装置交互的其他装置下发的。
以下,以工作人员通过编译器向神经网络模型下发为例进行详细说明。
神经网络模型优化装置根据数学融合规则,数学拆分规则,以及硬件融合规则中的一种或多种规则,对第一计算图进行优化,得到第三计算图。此时工作人员可以对第三计算图进行人工审核,确定第三计算图中是否存在可以进行节点融合的节点。若工作人员确定第三计算图中存在可以进行节点融合的节点,则工作人员确定该节点的融合方式,并通过编译器下发节点融合指令。编译器将节点融合指令下发至神经网络模型优化装置。神经网络模型优化装置根据收到的指令对对应的节点进行优化。
需要指出的是,在工作人员通过编译器下发节点融合指令时,工作人员在编译器中输入该节点融合指令对应的上层语音编写的程序代码;编译器将该上层语音编写的程序代码编译为神经网络模型优化装置能够识别的下层语言,并下发至神经网络模型优化装置。
基于上述技术方案,神经网络模型优化装置采用指令融合规则对第一计算图进行融合之后,得到的计算图的计算节点数量更少,计算图的拓扑结构更加简单,同时计算图的计算能力更强,计算数据所需的时间更少。因此,神经网络模型优化装置采用指令融合规则对神经网络模型的计算图进行优化时,可以提升神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需的计算时间。
此外,指令融合规则中的节点融合指令可以为人工输入的指令。此时,神经网络模型优化装置可以根据人工输入的指令对神经网络模型计算图中的节点进行融合,提升了神经网络模型优化方法的适用场景。
Ⅳ、指令拆分规则
指令拆分规则为:根据接收到的节点拆分指令,将一个第七计算节点拆分为多个第八计算节点;其中,节点拆分指令用于指示将一个第七计算节点拆分为多个第八计算节点;利用第七计算节点计算第五输入数据的时间,大于利用多个第八计算节点计算第五输入数据的时间。
需要说明的是,与上述指令融合规则相反,指令拆分规则用于指示将一个节点拆分为多个节点。
指令拆分规则的具体实现方式,与上述指令融合规则相似,只需将其中的节点融合相关内容替换为节点拆分内容即可,其具体实现可以参照上述对节点融合规则的描述,此处不再赘述。
基于此,神经网络模型优化装置采用指令拆分规则,将一个计算节点拆分为多个计算节点之后,由于拆分后的多个计算节点执行计算任务的时间,小于拆分前的一个计算节点执行计算任务的时间。因此,神经网络模型优化装置采用指令拆分规则对神经网络模型的计算图进行优化同样可以提高神经网络模型计算图的计算性能,降低神经网络模型计算图计算数据所需的时间。
此外,指令融合规则中的节点拆分指令可以为人工输入的指令。此时,神经网络模型优化装置可以根据人工输入的指令对神经网络模型计算图中的节点进行拆分,提升了神经网络模型优化方法的适用场景。
Ⅴ、硬件融合规则
需要说明的是,硬件融合规则为:第九计算节点采用第一传输路径向第十计算节点传输数据;其中,第九计算节点采用第一传输路径向第十节点传输数据的时间小于第九计算节点采用第二传输路径向第十节点传输数据的时间;第二传输路径为第一计算图中第九计算节点向第十节点传输数据的传输路径。
以针对存储设备的硬件融合为例,对硬件融合规则进行详细说明:
在现有技术中,终端设备调用计算节点执行计算任务时,通常以“片外存储-片上计算-片外存储”的模式进行。
如图9a所示,终端设备在调用计算图中的两个相连的计算节点(计算节点7和计算节点8,其中计算节点7为计算节点8的上联节点)执行计算任务的流程为:
步骤Ⅰ、计算节点7从存储设备中读取第一数据(相当于计算节点7的输入数据)。
步骤Ⅱ、计算节点7计算第一数据,生成第二数据(相当于计算节点7的输出数据,或者计算节点8的输入数据)。
步骤Ⅲ、计算节点7将第二数据存储在存储设备中。
步骤Ⅳ、计算节点8从存储设备中读取第二数据。
步骤Ⅴ、计算节点8计算第二数据,生成第三数据。
步骤Ⅵ、计算节点8将第三数据存储在存储设备中。
基于上述过程可知,终端设备采用现有技术调用神经网络模型执行计算任务时,针对每个计算节点都需要进行两次读写过程。例如,计算节点7所执行的步骤Ⅰ和步骤Ⅲ;计算节点8所执行的步骤Ⅳ和步骤Ⅵ。
在计算图中的计算节点数量较多的情况下,终端设备需要进行大量的读写过程。受限于存储设备的读写性能,终端设备在调用神经网络模型执行计算任务时,需要花费大量的时间进行数据的读写。
针对上述情况,本申请实施例中,对计算图中的计算节点进行改进,使得计算图中的全部或者部分计算节点之间可以互相传输数据。这样,可以使得计算节点减少与存储设备的交互次数,从而提升神经网络模型的计算图的计算性能,降低神经网络模型的计算图执行计算任务所需的时间。
举例来说,计算节点7和计算节点8之间可以互相传输数据,终端设备在调用计算节点7和计算节点8执行计算任务的过程,如图9b所示:
步骤Ⅶ、计算节点7从存储设备中读取第一数据。
步骤Ⅷ、计算节点7计算第一数据,生成第二数据。
步骤Ⅸ、计算节点7向计算节点8发送第二数据。相应的,计算节点8接收来自计算节点7的第二数据。
步骤Ⅹ、计算节点8计算第二数据,生成第三数据。
步骤Ⅺ、计算节点8将第三数据存储在存储设备中。
基于上述过程可知,在神经网络模型优化装置根据硬件融合规则,对计算图进行优化后,减少了计算节点与硬件设备的交互次数。计算图中的各个计算节点之间可以以更加合理,快捷的传输路径传输数据。从而,达到提高神经网络模型计算图的计算性能,降低神经网络模型计算图执行计算任务所需时间的目的。
需要说明的是,在硬件融合规则中,还可以通过将原本的低速存储设备替换为高速存储设备,提升神经网络模型访问存储设备的带宽中的至少一种方式,实现对神经网络模型计算图的硬件融合。
需要指出的是,在本申请实施例中,神经网络模型优化装置分别采用优化前的算子和优化后的算子,计算相同的输入数据。神经网络模型优化装置根据优化前的算子计算该输入数据的时间,以及优化后的算子计算该输入数据的时间,确定优化后的算子计算输入数据的时间,是否少于优化前的算子计算输入数据的时间。
本申请上述实施例中的各个方案在不矛盾的前提下,均可以进行结合。
上述主要从各个网元之间交互的角度对本申请实施例的方案进行了介绍。可以理解的是,各个网元,例如,神经网络模型优化装置为了实现上述功能,其包含了执行各个功能相应的硬件结构和软件模块中的至少一个。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬 件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对神经网络模型优化装置进行功能单元的划分,例如,可以对应各个功能划分各个功能单元,也可以将两个或两个以上的功能集成在一个处理单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。需要说明的是,本申请实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在一种可能的设计中,如图10所示,神经网络模型优化装置1000包括:北向接口1001、南向接口1002,以及以下一项或者多项:数学融合模块1003,数学拆分模块1004,硬件融合模块1005,指令融合模块1006,以及指令拆分模块1007。
其中,北向接口1001用于与上一层计算框架层对接。在计算框架根据神经网络模型生成对应的第一计算图之后,神经网络模型优化装置1000通过北向接口1001从计算框架中获取第一计算图。
南向接口1002用于与下一层的算子层对接。在神经网络模型优化装置1000对第一计算图进行优化,生成第二计算图之后,神经网络模型优化装置1000通过南向接口1002向算子层下发第二计算图,以使得算子层对第二计算图做拆分,将第二计算图的各个计算节点下发至芯片中。
数学融合模块1003,用于根据上述实施例中所记载的数学融合规则,对第一计算图进行优化。
数学拆分模块1004,用于根据上述实施例中所记载的数学拆分规则,对第一计算图进行优化。
硬件融合模块1005,用于根据上述实施例中所记载的硬件融合规则,对第一计算图进行优化。
指令融合模块1006,用于根据上述实施例中所记载的指令融合规则,对第一计算图进行优化。
指令拆分模块1007,用于根据上述实施例中所记载的指令拆分规则,对第一计算图进行优化。
需要指出的是,上述北向接口1001和南向接口1002可以集成在一个单元中实现。例如,将北向接口1001和南向接口1002集成在通信单元中实现。
上述数学融合模块1003,数学拆分模块1004,硬件融合模块1005,指令融合模块1006,以及指令拆分模块1007中的一项或者多项,也可以集成在一个单元中实现。例如,将数学融合模块1003,数学拆分模块1004,硬件融合模块1005,指令融合模块1006,以及指令拆分模块1007中的一项或者多项集成在处理单元中实现。
在采用集成的单元的情况下,图11示出了上述实施例中所涉及的神经网络模型优化装置(记为神经网络模型优化装置1100)的又一种可能的结构示意图,该神经网络模型优化装置1100包括处理单元1101和通信单元1102,还可以包括存储单元1103。图11所示的结构示意图可以用于示意上述实施例中所涉及的神经网络模型优化装置的结构。
当图11所示的结构示意图用于示意上述实施例中所涉及的神经网络模型优化装置的结构时,处理单元1101用于对网络设备的动作进行控制管理,例如,控制神经网络模型优化装置执行图5中的S501、S502、以及S503,和/或本申请实施例中所描述的其他过程中的神经网络模型优化装置执行的动作。处理单元1101可以通过通信单元1102与其他装置通信。存储单元1103用于存储神经网络模型优化装置的程序代码和数据。
当图11所示的结构示意图用于示意上述实施例中所涉及的神经网络模型优化装置的结构时,神经网络模型优化装置1100可以是神经网络模型优化装置,也可以是神经网络模型优化装置内的芯片。
其中,当神经网络模型优化装置1100为神经网络模型优化装置时,处理单元1101可以是处理器或控制器,通信单元1102可以是通信接口、收发器、收发机、收发电路、收发装置等。其中,通信接口是统称,可以包括一个或多个接口。存储单元1103可以是存储器。当神经网络模型优化装置1100为神经网络模型优化装置或神经网络模型优化装置内的芯片时,处理单元1101可以是处理器或控制器,通信单元1102可以是输入接口和/或输出接口、管脚或电路等。存储单元1103可以是该芯片内的存储单元(例如,寄存器、缓存等),也可以是神经网络模型优化装置或神经网络模型优化装置内的位于该芯片外部的存储单元(例如,只读存储器(read-onlymemory,简称ROM)、随机存取存储器(random access memory,简称RAM)等)。
其中,通信单元也可以称为收发单元。神经网络模型优化装置1100中的具有收发功能的天线和控制电路可以视为神经网络模型优化装置1100的通信单元1102,具有处理功能的处理器可以视为神经网络模型优化装置1100的处理单元1101。可选的,通信单元1102中用于实现接收功能的器件可以视为通信单元,通信单元用于执行本申请实施例中的接收的步骤,通信单元可以为接收机、接收器、接收电路等。通信单元1102中用于实现发送功能的器件可以视为发送单元,发送单元用于执行本申请实施例中的发送的步骤,发送单元可以为发送机、发送器、发送电路等。
图11中的集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者神经网络模型优化装置等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。存储计算机软件产品的存储介质包括:U盘、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
图11中的单元也可以称为模块,例如,处理单元可以称为处理模块。
本申请实施例还提供了一种神经网络模型优化装置(记为神经网络模型优化装置1200)的硬件结构示意图,参见图12或图13,该神经网络模型优化装置1200包括处理器1201,可选的,还包括与处理器1201连接的存储器1202。
在第一种可能的实现方式中,参见图12,神经网络模型优化装置1200还包括收发器1203。处理器1201、存储器1202和收发器1203通过总线相连接。收发器1203 用于与其他设备或通信网络通信。可选的,收发器1203可以包括发射机和接收机。收发器1203中用于实现接收功能的器件可以视为接收机,接收机用于执行本申请实施例中的接收的步骤。收发器1203中用于实现发送功能的器件可以视为发射机,发射机用于执行本申请实施例中的发送的步骤。
基于第一种可能的实现方式,图12所示的结构示意图可以用于示意上述实施例中所涉及的神经网络模型优化装置或神经网络模型优化装置的结构。
当图12所示的结构示意图用于示意上述实施例中所涉及的神经网络模型优化装置的结构时,处理器1201用于对神经网络模型优化装置的动作进行控制管理,例如,处理器1201用于支持神经网络模型优化装置执行图5中的S501、S502、以及S503,和/或本申请实施例中所描述的其他过程中的神经网络模型优化装置执行的动作。处理器1201可以通过收发器1203与其他网络实体通信。存储器1202用于存储神经网络模型优化装置的程序代码和数据。
在第二种可能的实现方式中,处理器1201包括逻辑电路以及输入接口和输出接口中的至少一个。其中,输出接口用于执行相应方法中的发送的动作,输入接口用于执行相应方法中的接收的动作。
基于第二种可能的实现方式,参见图13,图13所示的结构示意图可以用于示意上述实施例中所涉及的神经网络模型优化装置的结构。
当图13所示的结构示意图用于示意上述实施例中所涉及的神经网络模型优化装置的结构时,处理器1201用于对神经网络模型优化装置的动作进行控制管理,例如,处理器1201用于支持神经网络模型优化装置执行图5中的S501、S502、以及S503,和/或本申请实施例中所描述的其他过程中的神经网络模型优化装置执行的动作。处理器1201可以通过输入接口和输出接口中的至少一个与其他网络实体通信。存储器1202用于存储神经网络模型优化装置的程序代码和数据。
其中,图12和图13也可以示意神经网络模型优化装置中的系统芯片。该情况下,上述神经网络模型优化装置执行的动作可以由该系统芯片实现,具体所执行的动作可参见上文,在此不再赘述。图12和图13也可以示意神经网络模型优化装置中的系统芯片。该情况下,上述神经网络模型优化装置执行的动作可以由该系统芯片实现,具体所执行的动作可参见上文,在此不再赘述。
在实现过程中,本实施例提供的方法中的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
本申请中的处理器可以包括但不限于以下至少一种:中央处理单元(central processing unit,CPU)、微处理器、数字信号处理器(DSP)、微控制器(microcontroller unit,MCU)、或人工智能处理器等各类运行软件的计算设备,每种计算设备可包括一个或多个用于执行软件指令以进行运算或处理的核。该处理器可以是个单独的半导体芯片,也可以跟其他电路一起集成为一个半导体芯片,例如,可以跟其他电路(如编解码电路、硬件加速电路或各种总线和接口电路)构成一个SoC(片上系统),或者也可以作为一个ASIC的内置处理器集成在所述ASIC当中,该集成了处理器的ASIC可以单独封装或者也可以跟其他电路封装在一起。该处理器除了包括用于执行软件指 令以进行运算或处理的核外,还可进一步包括必要的硬件加速器,如现场可编程门阵列(field programmable gate array,FPGA)、PLD(可编程逻辑器件)、或者实现专用逻辑运算的逻辑电路。
本申请实施例中的存储器,可以包括如下至少一种类型:只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically erasable programmabler-only memory,EEPROM)。在某些场景下,存储器还可以是只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
本申请实施例还提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述任一方法。
本申请实施例还提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述任一方法。
本申请实施例还提供了一种芯片,该芯片包括处理器和接口电路,该接口电路和该处理器耦合,该处理器用于运行计算机程序或指令,以实现上述方法,该接口电路用于与该芯片之外的其它模块进行通信。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或者数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,简称DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可以用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带),光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,简称SSD))等。
尽管在此结合各实施例对本申请进行了描述,然而,在实施所要求保护的本申请过程中,本领域技术人员通过查看附图、公开内容、以及所附权利要求书,可理解并实现公开实施例的其他变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一”或“一个”不排除多个的情况。单个处理器或其他单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
尽管结合具体特征及其实施例对本申请进行了描述,显而易见的,在不脱离本申 请的精神和范围的情况下,可对其进行各种修改和组合。相应地,本说明书和附图仅仅是所附权利要求所界定的本申请的示例性说明,且视为已覆盖本申请范围内的任意和所有修改、变化、组合或等同物。显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。
最后应说明的是:以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (14)

  1. 一种神经网络模型优化方法,其特征在于,包括:
    获取神经网络模型的第一计算图;
    根据预设规则,以及所述第一计算图,生成第二计算图;其中,利用所述第二计算图计算第一输入数据的时间,少于利用所述第一计算图计算所述第一输入数据的时间;所述预设规则包括以下至少一项:数学融合规则,数学拆分规则,指令融合规则,指令拆分规则,以及硬件融合规则;
    输出所述第二计算图。
  2. 根据权利要求1所述的方法,其特征在于,所述数学融合规则为:将多个第一计算节点,融合为一个第二计算节点;其中,所述第二计算节点对应的数学表达式为:对所述多个第一计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用所述多个第一计算节点计算第二输入数据的时间,大于利用所述第二计算节点计算所述第二输入数据的时间。
  3. 根据权利要求2所述的方法,其特征在于,所述数学拆分规则为:将一个第三计算节点拆分为多个第四计算节点;其中,所述第三计算节点对应的数学表达式为:对所述多个第四计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用所述第三计算节点计算第三输入数据的时间,大于利用所述多个第四计算节点计算所述第三输入数据的时间。
  4. 根据权利要求2或3所述的方法,其特征在于,所述指令融合规则为:根据接收到的节点融合指令,将多个第五计算节点融合为一个第六计算节点;其中,所述节点融合指令用于指示将所述多个第五计算节点融合为所述一个第六计算节点;利用所述多个第五计算节点计算第四输入数据的时间,大于利用所述第六计算节点计算所述第四输入数据的时间。
  5. 根据权利要求2-4任一项所述的方法,其特征在于,所述指令拆分规则用于:根据接收到的节点拆分指令,将一个第七计算节点拆分为多个第八计算节点;其中,所述节点拆分指令用于指示将所述一个第七计算节点拆分为所述多个第八计算节点;利用所述第七计算节点计算第五输入数据的时间,大于利用所述多个第八计算节点计算所述第五输入数据的时间。
  6. 根据权利要求2-5任一项所述的方法,其特征在于,所述硬件融合规则为:第九计算节点采用第一传输路径向第十计算节点传输数据;其中,所述第九计算节点采用第一传输路径向第十节点传输数据的时间小于所述第九计算节点采用第二传输路径向第十节点传输数据的时间;所述第二传输路径为所述第一计算图中第九计算节点向第十节点传输数据的传输路径。
  7. 一种神经网络模型优化装置,其特征在于,包括:通信单元和处理单元;
    所述通信单元,用于获取神经网络模型的第一计算图;
    所述处理单元,用于根据预设规则,以及所述第一计算图,生成第二计算图;所述第二计算图计算第一输入数据的时间,少于所述第一计算图计算第一输入数据的时间;所述预设规则包括以下至少一项:数学融合规则,数学拆分规则,指令融合规则,指令拆分规则,以及硬件融合规则;
    所述通信单元,还用于输出所述第二计算图。
  8. 根据权利要求7所述的装置,其特征在于,所述数学融合规则为:将多个第一计算节点,融合为一个第二计算节点;其中,所述第二计算节点对应的数学表达式为:对所述多个第一计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用所述多个第一计算节点计算第二输入数据的时间,大于所述第二计算节点计算第二输入数据的时间。
  9. 根据权利要求8所述的装置,其特征在于,所述数学拆分规则为:将一个第三计算节点拆分为多个第四计算节点;其中,所述第三计算节点对应的数学表达式为:对所述多个第四计算节点对应的数学表达式进行数学推导后所确定的数学表达式;利用所述第三计算节点计算第三输入数据的时间,大于所述多个第四计算节点计算第三输入数据的时间。
  10. 根据权利要求8或9所述的装置,其特征在于,所述指令融合规则为:根据接收到的节点融合指令,将多个第五计算节点融合为一个第六计算节点;其中,所述节点融合指令,用于指示将所述多个第五计算节点融合为所述一个第六计算节点;利用所述多个第五计算节点计算第四输入数据的时间,大于所述第六计算节点计算第四输入数据的时间。
  11. 根据权利要求8-10任一项所述的装置,其特征在于,所述指令拆分规则用于:根据接收到的节点拆分指令,将一个第七计算节点拆分为多个第八计算节点;其中,所述节点拆分指令,用于指示将所述一个第七计算节点拆分为所述多个第八计算节点;利用所述第七计算节点计算第五输入数据的时间,大于所述多个第八计算节点计算第五输入数据的时间。
  12. 根据权利要求8-11任一项所述的装置,其特征在于,所述硬件融合规则为:第九计算节点采用第一传输路径向第十计算节点传输数据;其中,所述第九计算节点采用第一传输路径向第十节点传输数据的时间小于所述第九计算节点采用第二传输路径向第十节点传输数据的时间;所述第二传输路径为所述第一计算图中第九计算节点向第十节点传输数据的传输路径。
  13. 一种神经网络模型优化装置,其特征在于,所述装置包括处理器和存储介质,所述存储介质包括指令,所述指令被所述处理器运行时,使得所述装置执行如权利要求1至6任一项所述的方法。
  14. 一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,其特征在于,当所述指令在计算机上运行时,使得所述计算机执行如权利要求1至6任一项所述的方法。
PCT/CN2020/111529 2020-08-26 2020-08-26 神经网络模型优化方法及装置 WO2022041015A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080103328.XA CN115956247A (zh) 2020-08-26 2020-08-26 神经网络模型优化方法及装置
PCT/CN2020/111529 WO2022041015A1 (zh) 2020-08-26 2020-08-26 神经网络模型优化方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/111529 WO2022041015A1 (zh) 2020-08-26 2020-08-26 神经网络模型优化方法及装置

Publications (1)

Publication Number Publication Date
WO2022041015A1 true WO2022041015A1 (zh) 2022-03-03

Family

ID=80352254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111529 WO2022041015A1 (zh) 2020-08-26 2020-08-26 神经网络模型优化方法及装置

Country Status (2)

Country Link
CN (1) CN115956247A (zh)
WO (1) WO2022041015A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023231635A1 (zh) * 2022-05-31 2023-12-07 华为技术有限公司 一种模型传输的方法及装置
WO2024006017A1 (en) * 2022-06-30 2024-01-04 Qualcomm Incorporated Model performance linter

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629330B (zh) * 2023-04-24 2024-04-16 北京大学 一种算子检测方法、装置以及计算机设备
CN116820524B (zh) * 2023-08-22 2023-11-28 腾讯科技(深圳)有限公司 模型更新方法、装置、计算机设备及存储介质


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205762A1 (en) * 2018-01-04 2019-07-04 Datavaloris S.A.S. Method for topological optimization of graph-based models
CN110321999A (zh) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 神经网络计算图优化方法
CN110766147A (zh) * 2018-07-25 2020-02-07 赛灵思公司 神经网络编译器架构及编译方法
CN110659728A (zh) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 神经网络优化方法、装置、计算机设备及存储介质
CN110717584A (zh) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 神经网络编译方法、编译器、计算机设备及可读存储介质


Also Published As

Publication number Publication date
CN115956247A8 (zh) 2023-07-11
CN115956247A (zh) 2023-04-11

Similar Documents

Publication Publication Date Title
WO2022041015A1 (zh) 神经网络模型优化方法及装置
US20220180202A1 (en) Text processing model training method, and text processing method and apparatus
WO2020228376A1 (zh) 文本处理方法、模型训练方法和装置
US20220051056A1 (en) Semantic segmentation network structure generation method and apparatus, device, and storage medium
WO2021057056A1 (zh) 神经网络架构搜索方法、图像处理方法、装置和存储介质
KR102656620B1 (ko) 전자 장치, 그의 제어 방법 및 비일시적 컴퓨터 판독가능 기록매체
CN112633010B (zh) 基于多头注意力和图卷积网络的方面级情感分析方法及系统
WO2021190597A1 (zh) 一种神经网络模型的处理方法以及相关设备
WO2022156561A1 (zh) 一种自然语言处理方法以及装置
US20230229898A1 (en) Data processing method and related device
KR102448382B1 (ko) 텍스트와 연관된 이미지를 제공하는 전자 장치 및 그 동작 방법
CN113064968B (zh) 一种基于张量融合网络的社交媒体情感分析方法及系统
CN114528898A (zh) 基于自然语言命令的场景图修改
CN112699215B (zh) 基于胶囊网络与交互注意力机制的评级预测方法及系统
CN113191479A (zh) 联合学习的方法、系统、节点及存储介质
CN116912629B (zh) 基于多任务学习的通用图像文字描述生成方法及相关装置
WO2023125628A1 (zh) 神经网络模型优化方法、装置及计算设备
EP4318311A1 (en) Neural network model training method, data processing method, and apparatuses
CN109002498B (zh) 人机对话方法、装置、设备及存储介质
CN110084356B (zh) 一种深度神经网络数据处理方法和装置
WO2022127603A1 (zh) 一种模型处理方法及相关装置
CN114090789A (zh) 基于知识图谱的中医养生智能多轮交互系统
US20240232618A9 (en) Training method and apparatus for neural network model, and data processing method and apparatus
CN116975654B (zh) 对象互动方法、装置、电子设备及存储介质
US20240153259A1 (en) Single image concept encoder for personalization using a pretrained diffusion model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20950687

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20950687

Country of ref document: EP

Kind code of ref document: A1