WO2023125628A1

WO2023125628A1 - Neural network model optimization method and apparatus, and computing device

Info

Publication number: WO2023125628A1
Application number: PCT/CN2022/142689
Authority: WO
Inventors: 袁熙昊; 林菁; 严一超; 王兵
Original assignee: 华为技术有限公司
Priority date: 2021-12-31
Filing date: 2022-12-28
Publication date: 2023-07-06
Also published as: CN116432736A

Abstract

Disclosed are a neural network model optimization method and apparatus, and a computing device, which relate to the field of artificial intelligence. The method comprises: a computing device acquiring a neural network model to be optimized, and matching a sub-graph included in a sub-graph set with a sub-graph possibly formed by an operator included in said neural network model; and if a first sub-graph in said neural network model is matched by means of the computing device, replacing the first sub-graph with a second sub-graph, which is equivalent to the first sub-graph, in the sub-graph set to obtain an optimized neural network model, wherein the optimized neural network model includes the second sub-graph, and the processing efficiency of the second sub-graph on input data is higher than the processing efficiency of the first sub-graph on the input data. In this way, insofar as it is ensured that the precision of processing data by the neural network model is lossless, the duration in which the neural network model processes the data is significantly shortened.

Description

Neural network model optimization method, device and computing equipment

This application claims the priority of the Chinese patent application with the application number 202111673491.2 and the title of the invention "Neural Network Model Optimization Method, Device and Computing Equipment" filed with the China Patent Office on December 31, 2021, the entire contents of which are hereby incorporated by reference In this application.

technical field

The present application relates to the field of artificial intelligence, in particular to a neural network model optimization method, device and computing equipment.

Background technique

Artificial Intelligence (AI) is a theory, method, technology and application system that uses computers to simulate and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain results. Artificial intelligence technology is widely used in machine learning (Machine Learning, ML), natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory. Processing data based on neural network models to realize application functions such as recognition is a key technology for artificial intelligence applications.

Usually, the cloud-side device can use the training set to train the neural network model, so that the neural network model has application functions such as recognition, and deploy the neural network model to at least one terminal (such as: smart phones, cameras, self-driving cars, etc.). The terminal uses the configured neural network model to process the acquired application data (such as: image, voice, etc.) to realize application functions such as recognition. In order to improve the accuracy of the neural network model processing data, the neural network model gradually presents a trend of complex structure and increasing parameters, which leads to higher and higher computing resources required by the neural network model to process data, and the processing time of data is getting higher and higher. long.

Contents of the invention

The present application provides a neural network model optimization method, device and computing equipment, thereby shortening the time for the neural network model to process data on the premise of ensuring the accuracy of the neural network model processing data.

In a first aspect, a method for optimizing a neural network model is provided, and the method is executed by a computing device. The method includes: the computing device obtains the neural network model to be optimized, searches for an equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set, and replaces the first subgraph in the neural network model to be optimized with an equivalent price submap. The equivalent subgraph and the first subgraph have the same output for the same input data, and the processing efficiency of the equivalent subgraph on the input data is greater than that of the first subgraph on the input data, and the subgraph set includes multiple subgraphs.

In this way, after replacing the subgraph in the neural network model to be optimized, since the processing efficiency of the equivalent subgraph to the input data is greater than the processing efficiency of the first subgraph to the input data, thus ensuring that the neural network model processes the data Under the premise of no loss of accuracy, the time for the neural network model to process data is significantly shortened. This method can automatically optimize the neural network model. It is simple, intuitive, efficient, and highly scalable. It only needs to input the neural network model to quickly complete the optimization of the neural network model. It does not require any data and has no loss of accuracy. It is applicable to a wide range of scenarios. In some time-delay-sensitive scenarios, such as target recognition, automatic driving, license plate recognition, target detection and other scenarios, the neural network model optimization method provided by the embodiment of the present application is especially applicable, which can effectively improve the reasoning speed of the neural network model and shorten the time of the neural network model. Reasoning is time-consuming and improves user experience.

In a possible implementation manner, searching for an equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set includes: determining a second subgraph corresponding to the first subgraph in the subgraph set, the first The second subgraph and the first subgraph have the same input data, and the output is also the same; it is determined that the computing resources used to execute the first subgraph in the computing device perform the data processing efficiency of the second subgraph higher than that of executing the first subgraph. Data processing efficiency; treat the second subgraph as an equivalent subgraph.

In another possible implementation manner, determining the second subgraph corresponding to the first subgraph in the subgraph set includes: inputting input data into the first subgraph, using computing resources to run the first subgraph, and outputting an operation result; Inputting input data into at least one subgraph in the subgraph set, and determining the subgraph identical to the running result as the second subgraph.

In another possible implementation manner, the method further includes: recording a mapping relationship between the first subgraph and the second subgraph to the subgraph set.

In another possible implementation manner, the sub-graph set includes a first mapping relationship between the first sub-graph and the second sub-graph; determining the second sub-graph corresponding to the first sub-graph in the sub-graph set includes: according to the first The mapping relationship determines the second subgraph corresponding to the first subgraph.

The subgraphs with equivalent relationships indicated by the subgraph set, that is, the first mapping relationship, are matched with the subgraphs in the neural network model to be optimized. The neural network model to be optimized includes multiple operators, multiple operators form multiple subgraphs, and at least two operators form a subgraph. If the computing device determines the first subgraph to be replaced in the neural network model to be optimized, replace the first subgraph with a second subgraph equivalent to the first subgraph in the subgraph set to obtain an optimized neural network model. The optimized neural network model includes a second subgraph. When processing data based on computing resources, the duration of the optimized neural network model is shorter than the duration of the neural network model to be optimized. Among them, subgraphs with equivalence relations are used to output the same result according to the same input data.

Exemplarily, the first subgraph in the neural network model to be optimized that is the same as the subgraph included in the subgraph set is determined according to the subgraph features. Subgraph features include operator type, subgraph structure and operator parameters. In this way, subgraph matching is performed according to subgraph features, and a subgraph identical to a subgraph included in the subgraph set is searched out from the neural network model to be optimized, which can effectively improve the accuracy of the search.

In another possible implementation, before determining the first subgraph to be replaced in the neural network model to be optimized according to the subgraph set, the method further includes: obtaining the operator set according to the neural network model of multiple application scenarios; The set of operators searches subgraphs with equivalence relations to generate subgraph sets. Therefore, a method for automatically searching equivalent subgraphs is provided, which automatically searches possible equivalent subgraphs based on operator sets, without omission, and saves manpower.

In this way, the subgraph in the neural network model to be optimized is replaced based on the equivalent subgraph. Since the replaced subgraph in the optimized neural network model is compatible with the computing resources of the optimized neural network model to process data, that is, computing resources Calculating the replaced subgraph can effectively utilize the computing power of computing resources, thereby significantly shortening the processing time of the neural network model on the premise of ensuring that the accuracy of the data processed by the neural network model is not lost. This method can automatically optimize the neural network model. It is simple, intuitive, efficient, and highly scalable. It only needs to input the neural network model to quickly complete the optimization of the neural network model. It does not require any data and has no loss of accuracy. It is applicable to a wide range of scenarios. In some time-delay-sensitive scenarios, such as target recognition, automatic driving, license plate recognition, target detection and other scenarios, the neural network model optimization method provided by the embodiment of the present application is especially applicable, which can effectively improve the reasoning speed of the neural network model and shorten the time of the neural network model. Reasoning is time-consuming and improves user experience.

After the computing device obtains the optimized neural network model, the optimized neural network model can be deployed to at least one terminal, so that when the terminal processes application data according to the optimized neural network model, the data processing time is shortened and the terminal data processing performance is improved.

Wherein, replacing the first sub-graph with the second sub-graph equivalent to the first sub-graph in the sub-graph set includes: if the computing power of the computing resources for computing the second sub-graph is higher than the computing power of computing resources for computing the first sub-graph, Replace the first subgraph with the second subgraph. The computing power of the computing resource computing subgraph is related to the duration of the computing resource computing subgraph. The computing power may be the data processing efficiency of computing resources for computing the first subgraph.

In another possible implementation manner, determining that the computing resources used to execute the first subgraph in the computing device have higher data processing efficiency when executing the second subgraph than the data processing efficiency when executing the first subgraph includes: computing The resource call cost function runs the first subgraph to record the first data processing efficiency; calculates the resource call cost function to run the second subgraph to record the second data processing efficiency; determines the execution by comparing the first data processing efficiency with the second data processing efficiency The data processing efficiency of the second sub-graph is higher than the data processing efficiency of the execution of the first sub-graph.

Exemplarily, replacing the first subgraph with a second subgraph equivalent to the first subgraph in the subgraph set includes: respectively determining the duration of the second subgraph and the duration of the first subgraph based on the calculation resource utilization cost function, and the cost The function is used to calculate the duration of subgraphs with an equivalence relationship based on the same computing resource; if the duration of the second subgraph is less than the duration of the first subgraph, replace the first subgraph with the second subgraph.

In this way, based on the cost function, the influence of subgraph replacement on the inference performance of the neural network model can be measured, and whether to perform subgraph replacement can be automatically decided to improve the accuracy of subgraph replacement, thereby ensuring that subgraph replacement based on the current hardware platform can bring performance promote. It should be understood that the current hardware platform may refer to computing resources for processing data using an optimized neural network model.

In another possible implementation manner, the method further includes: recording the mapping relationship of the computing resource, the first sub-graph, and the second sub-graph to the sub-graph set.

In another possible implementation manner, the sub-graph set includes a second mapping relationship between computing resources, the first sub-graph, and the second sub-graph; determining the second sub-graph corresponding to the first sub-graph in the sub-graph set includes: Determine the second subgraph corresponding to the first subgraph according to the second mapping relationship; determine that the computing resource used to execute the first subgraph in the computing device executes the data processing efficiency of the second subgraph is higher than the data processing efficiency when executing the first subgraph The processing efficiency includes: determining according to the second mapping relationship that the computing resource used to execute the first subgraph has a higher data processing efficiency when executing the second subgraph than when executing the first subgraph.

Exemplarily, the sub-graph set is also used to instruct computing resources to calculate the computing power correspondence between the second sub-graph and the first sub-graph. Replacing the first subgraph with a second subgraph equivalent to the first subgraph in the subgraph set includes: replacing the first subgraph with the second subgraph determined according to the computing power correspondence. Therefore, since the computing power correspondence has already indicated the affinity between equivalent subgraphs and computing resources, the subgraph replacement of the neural network model to be optimized based on the computing power correspondence can effectively improve the speed of subgraph replacement and save subgraph replacement. duration.

In another possible implementation, during the optimization process of the neural network model to be optimized, the computing device can also automatically derive the computing power corresponding relationship of the equivalent subgraph that is compatible with the current hardware platform, and any hit computing power corresponding relationship is medium The neural network model of the valence subgraph, according to the method provided by the embodiment of the present application, can obtain reasoning performance improvement on the corresponding hardware platform, and can also guide the structural design of the next-generation hardware-friendly neural network model according to the corresponding relationship of computing power.

In another possible implementation manner, the computing device updates the operator set, that is, adds a new operator to the operator set, performs an equivalent subgraph search according to the updated operator set, and obtains an updated equivalent subgraph.

In a second aspect, a neural network model optimization device is provided, and the device includes various modules for executing the neural network model optimization method in the first aspect or any possible design of the first aspect.

A third aspect provides a processor, the processor is configured to execute the operation steps of the neural network model optimization method in the first aspect or any possible design of the first aspect.

In a fourth aspect, there is provided a computing device, the computing device includes at least one processor and a memory, and the memory is used to store a set of computer instructions; When the device executes the set of computer instructions, it executes the first aspect or the operation steps of the neural network model optimization method in any possible implementation manner of the first aspect.

In a fifth aspect, there is provided a computer-readable storage medium, including: computer software instructions; when the computer software instructions are run in the computing device, the computing device is made to execute the computer program described in the first aspect or any one of the possible implementation manners of the first aspect. Operational steps of the method.

In a sixth aspect, a computer program product is provided. When the computer program product is run on a computer, the computing device executes the operation steps of the method described in the first aspect or any possible implementation manner of the first aspect.

In a seventh aspect, a chip system is provided, and the chip system includes a processor, configured to implement the functions of the processor in the method of the first aspect above. In a possible design, the chip system further includes a memory for storing program instructions and/or data. The system-on-a-chip may consist of chips, or may include chips and other discrete devices.

On the basis of the implementation manners provided in the foregoing aspects, the present application may further be combined to provide more implementation manners.

Description of drawings

Fig. 1 is the structural representation of a kind of neural network provided by the present application;

Fig. 2 is a schematic structural diagram of a convolutional neural network provided by the present application;

FIG. 3 is a schematic diagram of a system architecture provided by the present application;

FIG. 4 is a schematic diagram of a method for generating a sub-atlas provided by the present application;

FIG. 5 is a schematic diagram of a generation operator set provided by the present application;

FIG. 6 is a schematic diagram of generating an equivalent subgraph relationship provided by the present application;

FIG. 7 is a schematic diagram of a neural network model optimization method provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of sub-graph replacement provided by the embodiment of the present application;

FIG. 9 is a schematic diagram of another neural network model optimization method provided in the embodiment of the present application;

FIG. 10 is a schematic diagram of a generated computing power correspondence provided by the present application;

FIG. 11 is a schematic diagram of a neural network model optimization scenario provided by the present application;

FIG. 12 is a schematic structural diagram of a neural network model optimization device provided by the present application;

FIG. 13 is a schematic structural diagram of a computing device provided in the present application.

Detailed ways

For ease of understanding, the following first introduces related terms and neural network related concepts involved in the embodiments of the present application.

(1) neural network

A neural network can be made up of neurons, which can refer to operational units that take x _s and intercept 1 as input. The output of the arithmetic unit satisfies the following formula (1).

Among them, s=1, 2, ... n, n is a natural number greater than 1, W _s is the weight of x _s , and b is the bias of the neuron. f is the activation function of the neuron, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neuron into an output signal. The output signal of the activation function can be used as the input of the next layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple above-mentioned single neurons, that is, the output of one neuron can be the input of another neuron. The input of each neuron can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neurons. Weights characterize the strength of connections between different neurons. The weight determines the influence of the input on the output. A weight close to 0 means that changing the input does not change the output. Negative weights mean that increasing the input decreases the output.

As shown in FIG. 1 , it is a schematic structural diagram of a neural network provided in the embodiment of the present application. The neural network 100 includes N processing layers, where N is an integer greater than or equal to 3. The first layer of the neural network 100 is the input layer 110, which is responsible for receiving input signals, and the last layer of the neural network 100 is the output layer 130, which is responsible for outputting the processing results of the neural network. The other layers except the first layer and the last layer are intermediate layers 140, and these intermediate layers 140 together form a hidden layer 120, and each intermediate layer 140 in the hidden layer 120 can receive input signals and output signals. The hidden layer 120 is responsible for the processing of the input signal. Each layer represents a logical level of signal processing, and through multiple layers, data signals can be processed by multi-level logic.

In some feasible embodiments, the input signals of the neural network may be signals in various forms such as video signals, voice signals, text signals, image signals or temperature signals. The image signal can be the scenery captured by the camera (image sensor), the environmental image captured by the monitoring equipment, and the facial image acquired by the access control system, etc. The input signal of the neural network also includes various other computer-processable engineering signals, which will not be listed one by one here. If the neural network is used to carry out deep learning on the image signal, the quality of the image processed by the neural network can be improved.

(2) Deep Neural Network

Deep Neural Network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. The deep neural network is divided according to the position of different layers. The neural network inside the deep neural network can be divided into three categories: input layer, hidden layer and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to any neuron in the i+1-th layer.

Although the deep neural network looks complicated, it is actually not complicated in terms of the work of each layer. Simply put, it is the following linear relationship expression:

in,

is the input vector,

is the output vector,

Is the offset vector, W is the weight matrix (also called coefficient), and α() is the activation function. Each layer is just an input vector

After such a simple operation to get the output vector

Due to the large number of layers in the deep neural network, the coefficient W and the offset vector

The number is also higher. The definition of these parameters in the deep neural network is as follows: Take the coefficient W as an example: Assume that in a three-layer deep neural network, the 4th neuron of the second layer to the 2nd neuron of the third layer The linear coefficients are defined as

Among them, the superscript 3 represents the number of layers where the coefficient W is located, and the subscript corresponds to the index 2 of the third layer output and the index 4 of the second layer input.

In summary, the coefficient from the kth neuron of the L-1 layer to the jth neuron of the L layer is defined as

It should be noted that the input layer has no W parameter. In deep neural networks, more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks. Training the deep neural network is also the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).

(3) Convolutional neural network

Convolutional Neuron Network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network consists of a feature extractor consisting of a convolutional layer and a subsampling layer. The feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or feature map. The convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network. In the convolutional layer of a convolutional neural network, a neuron can only be connected to some adjacent neurons. A convolutional layer can output several feature maps, and the feature map can refer to the intermediate results during the operation of the convolutional neural network. Neurons in the same feature map share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information that is independent of location. That is, the statistics for one part of the image are the same as for other parts. That means that the image information learned in one part can also be used in another part. So for all positions on the image, the same learned image information can be used. In the same convolution layer, multiple convolution kernels can be used to extract different image information. Generally, the more the number of convolution kernels, the richer the image information reflected by the convolution operation.

The convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.

Exemplarily, as shown in FIG. 2 , it is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application. The convolutional neural network 200 may include an input layer 210 , a convolutional/pooling layer 220 (where the pooling layer is optional) and a neural network layer 230 .

The convolutional layer/pooling layer 220 may include layers 221 to 226, for example. In one example, layer 221 may be a convolutional layer, layer 222 may be a pooling layer, layer 223 may be a convolutional layer, layer 224 may be a pooling layer, and layer 225 may be a convolutional layer, for example. , the layer 226 may be, for example, a pooling layer. In another example, layers 221 and 222 may be, for example, convolutional layers, layer 223 may be, for example, a pooling layer, layers 224 and 225 may be, for example, convolutional layers, and layer 226 may be, for example, a pooling layer. The output of a convolutional layer can be used as input to a subsequent pooling layer, or as input to another convolutional layer to continue the convolution operation.

Taking the convolutional layer 221 as an example, the inner working principle of one convolutional layer will be introduced.

The convolution layer 221 may include many convolution operators, and the convolution operators may also be called kernels. The role of the convolution operator in image processing is equivalent to a filter that extracts specific information from the input image matrix. The convolution operator can essentially be a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix is usually one pixel by one pixel (or two pixels by two pixels, depending on the value of the stride) along the horizontal direction on the input image. processing to complete the work of extracting specific features from the image. The size of this weight matrix is related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image. During the convolution operation, the weight matrix is extended to the full depth of the input image. Therefore, convolution with a single weight matrix will produce a single depth dimension of the convolutional output, but in most cases instead of using a single weight matrix, multiple weight matrices of the same size (row×column) are applied, That is, multiple matrices of the same shape. The output of each weight matrix is stacked to form the depth dimension of the convolved image. Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to filter unwanted noise in the image. Do blurring etc. The multiple weight matrices have the same size (row×column), and the feature maps extracted by the multiple weight matrices of the same size are also of the same size, and then the extracted multiple feature maps of the same size are combined to form the convolution operation. output.

The weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions .

When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (such as layer 221 ) often extracts more general features, which can also be called low-level features. As the depth of the convolutional neural network 200 deepens, the features extracted by the later convolutional layers (for example, layer 226) become more and more complex, such as high-level semantic features, and the higher the semantic features, the more suitable for unresolved issues.

Since it is often necessary to reduce the number of training parameters, it is often necessary to periodically introduce pooling layers after convolutional layers. Each layer from layer 221 to layer 226 as shown in the convolutional layer/pooling layer 220 in Figure 2 can be a layer of convolutional layer followed by a layer of pooling layer, or a multi-layer convolutional layer followed by a layer or multiple pooling layers. In image processing, the sole purpose of pooling layers is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling an input image to obtain an image of a smaller size. The average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling. The maximum pooling operator can take the pixel with the largest value within a specific range as the result of maximum pooling. Also, just like the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image. The size of the image output after being processed by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.

After being processed by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not enough to output the required output information. Because as mentioned earlier, the convolutional layer/pooling layer 220 extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (required class information or other relevant information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one or a group of outputs with the required number of classes. Therefore, the neural network layer 230 may include a multi-layer hidden layer (layer 231, layer 232 to layer 23n as shown in FIG. 2 ) and an output layer 240, and the parameters contained in the multi-layer hidden layer may be determined according to specific tasks. Types of related training data are pre-trained, for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.

After the multi-layer hidden layer in the neural network layer 230, that is, the last layer of the entire convolutional neural network 200 is the output layer 240, which has a loss function similar to the classification cross entropy, and is specifically used to calculate the prediction error. Once The forward propagation of the entire convolutional neural network 200 (as shown in Figure 2 is forward propagation from layer 210 to layer 240 direction) is completed, and the reverse propagation (as shown in Figure 2 is backward propagation from layer 240 to layer 210 direction) ) will begin to update the weight values and deviations of the above-mentioned layers, so as to reduce the loss of the convolutional neural network 200, and the error between the output result of the convolutional neural network 200 through the output layer and the ideal result.

It should be noted that the convolutional neural network 200 shown in FIG. 2 is only an example of a convolutional neural network, and in specific applications, the convolutional neural network may also exist in the form of other network models.

(4) Loss function

In the process of training the deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value that you really want to predict, you can compare the predicted value of the network with the target value you really want, and then according to the difference between the two Different situations to update the weight vector of each layer of neural network (of course, there is usually an initialization process before the first update, which is to pre-configure parameters for each layer in the deep neural network), for example, if the network's prediction value is high , just adjust the weight vector to make it predict lower, and keep adjusting until the deep neural network can predict the real desired target value or a value very close to the real desired target value. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function (loss function) or objective function (objective function), which is used to measure the difference between the predicted value and the target value important equation. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference. Then the training of the deep neural network becomes a process of reducing the loss as much as possible.

(5) Back propagation algorithm

The convolutional neural network can use the back propagation (BP) algorithm to correct the size of the parameters in the initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, passing the input signal forward until the output will generate an error loss, and updating the parameters in the initial super-resolution model by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the parameters of the optimal super-resolution model, such as the weight matrix.

The above-mentioned neural network can also be called a neural network model. The intermediate layers contained in the neural network can also be called operators. Operators are used to implement a unit calculation in a neural network. For example, an operator that implements convolutional layer calculations may be called a convolutional operator (conv). The operator that realizes the calculation of the pooling layer may be called a pooling operator (pool). The operator implementing the calculation of the activation layer may be called an activation operator (Relu). The activation operator can also be called a linear rectification operator. At least two operators can form a subgraph. The subgraph refers to the network structure composed of some intermediate layers in the neural network model.

The embodiment of the present application provides a method for optimizing a neural network model, especially a technology for optimizing a neural network model based on mutually equivalent subgraphs, that is, automatically searches for mathematically equivalent subgraphs based on multiple operators, and automatically Discover subgraphs that give full play to the computing power of the hardware, replace the equivalent subgraphs in the neural network model, and significantly shorten the processing time of the neural network model on the premise of ensuring the accuracy of the data processed by the neural network model without loss.

The implementation of the embodiments of the present application will be described in detail below in conjunction with the accompanying drawings.

FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the present application. As shown in FIG. 3 , the system 300 includes an execution device 310 , a training device 320 , a database 330 , a terminal device 340 , a data storage system 350 and a data collection device 360 .

The execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, a virtual reality (virtual reality, VR), an augmented reality (augmented reality, AR) device, a mixed reality (Mixed Reality, MR) device, an extended reality (Extended Reality, ER) devices, cameras, or vehicle-mounted terminals, etc., or edge devices (for example, boxes carrying chips with processing capabilities), etc.

The training device 320 may be a server or a cloud device or the like. The training device 320 has strong computing capability, and can run the neural network model, perform calculations such as training the neural network model.

As a possible embodiment, the executing device 310 and the training device 320 are different processors deployed on different physical devices (such as servers or servers in a cluster). For example, the execution device 310 may be a neural network model processor (neural network processing unit, NPU), a graphics processing unit (graphic processing unit, GPU), a central processing unit (central processing unit, CPU), other general processors, digital signal Processor (digital signal processing, DSP), application-specific integrated circuit (ASIC), field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices , discrete hardware components, etc. A general purpose processor may be a microprocessor or any conventional processor or the like. The training device 320 may be a GPU, an NPU, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in the solution of this application.

The data collection device 360 is used to collect training data and store the training data in the database 330 . The training data may be data in at least one form of images, voice and text. For example, training data includes training images and objects in the training images.

The training device 320 is used to train the neural network model with the training data until the loss function in the neural network model converges, and the training of the neural network model is completed if the loss function value is less than a specific threshold, so that the neural network model reaches a certain accuracy. Alternatively, if all the training data in the database 330 are used for training, then the training of the neural network model is completed, so that the trained neural network model has functions such as recognition or classification. Furthermore, the training device 320 configures the trained neural network model 301 to the execution device 310 . The execution device 310 is used to realize functions such as processing application data according to the trained neural network model 301 to realize recognition.

In some embodiments, the training device 320 can configure the trained neural network model 301 to multiple execution devices 310 . Each execution device 310 utilizes the trained neural network model 301 to implement functions such as recognition or classification.

For example, for autonomous driving scenarios, when an autonomous vehicle is driving along a predetermined route, the neural network model is used to identify road signs, driving reference objects, and obstacles on the road in the environment to ensure safe and accurate driving of the autonomous vehicle. . Signposts can contain graphical or textual signposts. Driving reference objects can be buildings or plants. Obstacles on the road may include dynamic objects (eg animals) or stationary objects (eg stationary vehicles).

As another example, for monitoring scenarios, the neural network model is mainly used to identify objects (such as cars and users) in environments such as intersections or parks.

As another example, for natural language processing scenarios, the neural network model is mainly used to recognize speech or text.

In order to improve the accuracy of data processing by the neural network model, the training device 320 may also iteratively train the neural network model based on the training data maintained by the database 330 and the application data provided by the execution device 310 . Understandably, iterative training refers to any training after the first training of the neural network model. Since the training data maintained by the database 330 may be a full training set, including application data acquired in different application scenarios, the data in different application scenarios have different or dissimilar application scenario features (such as: environmental features, time features). If the training set in which the training device 320 iteratively trains the neural network model contains data of different or dissimilar application scenario characteristics, it will make it difficult for the neural network model to achieve better results in processing application data in different application scenarios.

In the neural network model optimization method provided in the embodiment of the present application, after the training device 320 completes the training of the neural network model, before deploying the trained neural network model 301 to the execution device 310, optimize the trained neural network model 301 based on the sub-atlas , that is to determine the subgraph equivalent to the subgraph in the neural network model 301, based on the data processing efficiency of the two equivalent subgraphs calculated by the execution device 310, replace the subgraph in the neural network model 301, and convert the optimized neural network Model 301 is deployed to execution device 310 . Therefore, on the premise of ensuring that the accuracy of processing data by the execution device 310 based on the neural network model 301 is not compromised, the duration of data processing by the neural network model 301 is significantly shortened.

It should be noted that, in practical applications, the training data maintained in the database 330 may not all come from the data collection device 360, and may also be received from other devices. In addition, the training device 320 does not necessarily train the neural network model based entirely on the training data maintained by the database 330, and may also obtain training data from the cloud or other places to train the neural network model. The above description should not be used as a limitation to the embodiments of the present application.

Further, according to the functions performed by the execution device 310, the execution device 310 can be further subdivided into the architecture shown in FIG. Preprocessing module 313 .

The I/O interface 312 is used for data interaction with external devices. A user can input data to the I/O interface 312 through the terminal device 340 . Input data can include images or video. In addition, the input data can also come from the database 330 .

The preprocessing module 313 is configured to perform preprocessing according to the input data received by the I/O interface 312 .

When the execution device 310 preprocesses the input data, or in the execution device 310 computing module 311 performs calculation and other related processing, the execution device 310 can call the data, codes, etc. in the data storage system 350 for corresponding processing , the correspondingly processed data and instructions may also be stored in the data storage system 350 .

For example, the optimized neural network model stored in the execution device 310 may be applied to the execution device 310 . After the execution device 310 obtains the application data, the calculation module 311 inputs the application data into the optimized neural network model to obtain a processing result. Since the optimized neural network model is a model optimized by the training device 320 according to the sub-atlas, processing the application data by using the optimized neural network model can meet the accuracy and duration requirements of the user for data processing.

Finally, the I/O interface 312 returns the processing result to the terminal device 340, thereby providing it to the user, so that the user can view the processing result.

In the situation shown in FIG. 3 , the user can manually specify the input data, and the manual specification can be operated through the interface provided by the I/O interface 312 . In another case, the terminal device 340 can automatically send input data to the I/O interface 312. If the terminal device 340 is required to automatically send the input data to obtain authorization from the user, the user can set corresponding permissions in the terminal device 340. The user can view the processing results output by the execution device 310 on the terminal device 340, and the specific presentation form may be specific ways such as display, sound, and action. The terminal device 340 can also be used as a data collection terminal, collecting input data input to the I/O interface 312 as shown in the figure and output processing results of the I/O interface 312 as new sample data, and storing them in the database 330 . Of course, the terminal device 340 may not be used for collection, but the I/O interface 312 will use the input data input to the I/O interface 312 as shown in the figure and the processing results of the output I/O interface 312 as new sample data Stored in database 330 .

Fig. 3 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationship between devices, devices, modules, etc. shown in Fig. 3 does not constitute any limitation. For example, in Fig. 3, the data storage system 350 is The execution device 310 is an external memory, and in other cases, the data storage system 350 may also be placed in the execution device 310 .

The computing device obtains the neural network model to be optimized, searches for an equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set, and replaces the first subgraph in the neural network model to be optimized with an equivalent subgraph . The equivalent subgraph and the first subgraph have the same output for the same input data, and the processing efficiency of the equivalent subgraph on the input data is greater than that of the first subgraph on the input data, and the subgraph set includes multiple subgraphs.

Next, the optimization of the neural network model provided by the embodiment of the present application will be described in detail with reference to FIG. 4 to FIG. 10 . FIG. 4 is a schematic diagram of a method for generating a sub-atlas provided by an embodiment of the present application. Here, the training device 320 in FIG. 3 is taken as an example for illustration. As shown in Figure 4, the method includes the following steps.

Step 410, the training device 320 acquires operator sets according to neural network models of multiple application scenarios.

The training device 320 extracts operators from neural network models applied to different application scenarios, and removes repeated operators to form an operator set. The operator set includes multiple operators, and each operator is used to implement different computing functions. For example, the operator set includes linear rectification operator (Relu), matrix transformation operator (Reshape), convolution operator (Conv), pooling operator (pool), maximum pooling operator (Maxpool) and matrix transposition Operator (Transpose), etc. Application scenarios include but are not limited to target recognition and automatic driving scenarios. The neural network model described in the embodiment of the present application may be a mainstream computer vision (computer vision, CV) model. CV models include, for example, YOLO, AlenNet, Residual Network (ResNet) and Dense Convolutional Network (DenseNet).

Exemplarily, the neural network model shown in (a) in Figure 5 contains 9 operators, wherein the 9 operators include 3 linear rectification operators, 3 convolution operators, 2 matrix transformation operators and 1 matrix transpose operator. As shown in (b) in FIG. 5 , the training device 320 removes repeated operators among the nine operators, and the obtained operator set includes linear rectification operators, convolution operators, matrix transformation operators and matrix transposition operators.

Optionally, the training device 320 may also obtain the operator set according to the neural network model given by the user. Alternatively, the training device 320 acquires a given set of operators.

In step 420, the training device 320 searches for subgraphs with equivalence relations according to the operator set, so as to generate a subgraph set.

The training device 320 constructs multiple legal subgraphs by permuting and combining operators included in the operator set, or constructs multiple legal subgraphs by permuting and combining operators according to characteristics such as the number of operators, operator types, and operator parameters. A legal subgraph can mean that the output of any operator in the subgraph conforms to the input of the operator connected to it.

Furthermore, the training device 320 searches the legal subgraphs for subgraphs with an equivalence relationship to generate a subgraph set, and the subgraph set includes multiple pairs of subgraphs with an equivalence relationship. For example, the training device 320 uses various methods including but not limited to subgraph hashing, random test case comparison output, mathematical equivalence analysis, etc. to determine whether any two legal subgraphs are equivalent, and if the two legal subgraphs are equivalent, output A pair of mutually equivalent subgraphs. Repeat this step to search mutually equivalent subgraphs from multiple legal subgraphs to form a subgraph set. Understandably, the training device 320 can generate the first mapping relationship according to step 410 and step 420, and the training device 320 can determine the subgraph to be replaced in the model to be optimized according to the first mapping relationship. A subgraph with an equivalence relationship is used to output the same result based on the same input data, that is, inputting the same input data to two subgraphs with an equivalence relationship can output the same output data. It should be noted that two mutually equivalent subgraphs may contain different operator types and subgraph structures, but the operator parameters need to be the same.

For example, as shown in FIG. 6 , it is assumed that the operator set includes a linear rectification operator, a convolution operator, a matrix transformation operator, a matrix transposition operator, and a string concatenation operator (Concat). The training device 320 searches the operator set, and obtains that subgraph 1 is equivalent to subgraph 2, that is, the two convolution operators in subgraph 1 are combined into one convolution operator. The operator parameters of the two convolution operators in sub-figure 1 are the same, that is, the output dimension is 64, the input dimension is 16, and the convolution kernel is 3. The operator parameters of the convolution operator in sub-figure 2 are output dimension 128, input dimension 16, and convolution kernel 3. However, if subgraph 1 contains more operators than subgraph 2, the execution device 310 takes more steps to calculate subgraph 1 than the steps for execution device 310 to calculate subgraph 1, and the execution device 310 calculates the duration of subgraph 1 It is longer than the execution device 310 calculates the duration of the sub-graph 1 .

Optionally, the training device 320 can also optimize multiple legal subgraphs, delete redundant paths in the legal subgraphs, improve the accuracy of calculating subgraphs and shorten the duration of calculating subgraphs. For example, the training device 320 optimizes multiple legal subgraphs based on a pruning algorithm.

Compared with mutually equivalent subgraphs designed according to the experience of domain experts, the automatic search for mutually equivalent subgraphs according to the operator set provided by the embodiment of this application can effectively save manpower and cover all possible mutually equivalent subgraphs. picture.

When the computing device replaces the subgraph in the neural network model to be optimized, it determines the second subgraph corresponding to the first subgraph in the subgraph set, and the second subgraph and the first subgraph aim at the same input data, and output Also the same. If the computing device determines that the computing resources used to execute the first sub-graph in the computing device have a higher data processing efficiency when executing the second sub-graph than when executing the first sub-graph, and use the second sub-graph as an equivalent sub-graph, The first subgraph is replaced by the second subgraph to realize the optimization of the neural network model to be optimized. The neural network model optimization method will be described in detail below with reference to FIG. 7 to FIG. 10 .

FIG. 7 is a schematic diagram of a method for optimizing a neural network model provided by an embodiment of the present application. Here, the training device 320 in FIG. 3 is taken as an example for illustration. As shown in Fig. 7, the method includes the following steps.

Step 710, the training device 320 obtains the neural network model to be optimized.

The training device 320 can obtain the neural network model to be optimized from Internet open source models. Alternatively, the training device 320 uses the neural network model provided by the user as the neural network model to be optimized. Alternatively, the training device 320 uses the trained neural network model obtained through self-training as the neural network model to be optimized.

The neural network model to be optimized includes multiple operators, multiple operators form multiple subgraphs, and at least two operators form a subgraph. It can be understood that at least two continuous operators in the neural network model to be optimized form a subgraph. Different subgraphs in the neural network model to be optimized can be composed of different continuous operators.

Step 720, the training device 320 determines the first sub-graph to be replaced in the neural network model to be optimized according to the sub-graph set.

The training device 320 determines the first subgraph in the neural network model to be optimized that is the same as the subgraph included in the subgraph set according to the subgraph features. Subgraph features include operator type, subgraph structure and operator parameters. The operator type refers to the type of operator contained in the subgraph. For example, the operator type includes convolution, matrix transformation, matrix transpose, and linear rectification. The subgraph structure refers to the connection mode of the operators contained in the subgraph. Operator parameters refer to parameters such as weights of operators included in the subgraph.

In some embodiments, the training device 320 matches the subgraph contained in the subgraph set with the subgraph contained in the neural network model to be optimized. If the operator type, subgraph structure and operator parameters of the two subgraphs are the same, then It is determined that the subgraph contained in the neural network model to be optimized is the same as the subgraph contained in the subgraph set, that is, the training device 320 determines a subgraph to be replaced from the neural network model to be optimized. The training device 320 traverses the subgraphs in the subgraph set, and determines all possible subgraphs to be replaced in the neural network model to be optimized.

In an example, the subgraphs with equivalence relations can be presented in the form of a table, as shown in Table 1.

Table 1

等价子图标识Equivalent subgraph identification	等价子图equivalent subgraph

等价子图1Equivalent Subgraph 1	子图1<—>子图2 Subgraph 1 <—> Subgraph 2
等价子图2Equivalent Subgraph 2	子图3<—>子图4Subfigure 3 <—> Subfigure 4
……	……

It can be seen from Table 1 that sub-graph 1 is equivalent to sub-graph 2. Subgraph 3 is equivalent to subgraph 4.

It should be noted that Table 1 only shows the storage form of the subgraph with the equivalence relationship in the storage device in the form of a table, and does not limit the storage form of the corresponding relationship in the storage device. Of course, the corresponding relationship is in The storage form in the storage device may also be stored in other forms, which is not limited in this embodiment.

Assume that the training device 320 determines the first sub-graph to be replaced in the neural network model to be optimized according to the sub-graph set.

In step 730, the training device 320 replaces the first subgraph with a second subgraph equivalent to the first subgraph in the subgraph set to obtain an optimized neural network model.

According to the characteristics of computing resources (such as: processors), such as the number of processor cores and the hardware structure of processors, computing resources have different affinity for operators of different types, that is, different computing resources are suitable for Calculate operators of different operator types. Affinity refers to the degree to which computing resource computing operators effectively use hardware computing power (abbreviation: computing power). Operation ability is one of the basic components of mathematics ability, which refers to the ability to use the knowledge about operations to perform operations and reason to obtain the results of operations.

For example, processor 1 is adapted to calculate a matrix transformation operator, and processor 2 is adapted to calculate a matrix transpose operator. If processor 1 calculates the matrix transpose operator, processor 1 cannot effectively use the computing power of processor 1 to calculate the matrix transpose operator. Therefore, the computing power of the processor 1 to calculate the matrix transformation operator is higher than the computing power of the processor 1 to calculate the matrix transpose operator.

It should be understood that the computing power of the computing resource computing operator is related to the duration of the computing resource computing operator. If the computing power of computing resources can be effectively used when computing resource computing operators, the duration of computing resource computing operators will be shorter; if the computing power of computing resources cannot be effectively used when computing resource computing The duration is longer.

In some embodiments, after the training device 320 determines the first sub-graph to be replaced in the neural network model to be optimized according to the sub-graph set, the execution device 310 calculates the computing power of the second sub-graph and the execution device 310 calculates the first sub-graph , and determine whether to replace the first subgraph with the second subgraph equivalent to the first subgraph in the subgraph set. The execution device 310 may be a device that needs to deploy a neural network model to be optimized, and a resource that processes application data based on the neural network model to be optimized to implement application functions such as recognition.

If the calculation power of the second sub-graph based on computing resources is higher than the calculation power of the first sub-graph based on computing resources, replace the first sub-graph with the second sub-graph; if the calculation power of the second sub-graph based on computing resources is lower than Computing resources calculate the computing power of the first sub-graph, and the second sub-graph is not used to replace the first sub-graph. The computing power of the computing resource to calculate the second sub-graph may refer to the data processing efficiency of the computing resource to calculate the second sub-graph. The computing power of the computing resource to calculate the first sub-graph may refer to the data processing efficiency of the computing resource to calculate the first sub-graph.

Way 1, the training device 320 determining the second sub-graph corresponding to the first sub-graph in the sub-graph set includes: determining the second sub-graph corresponding to the first sub-graph according to the second mapping relationship. Determining that the computing resources used to execute the first subgraph in the computing device have a higher data processing efficiency when executing the second subgraph than when executing the first subgraph, including: determining according to the second mapping relationship The data processing efficiency when the computing resource executes the second subgraph is higher than the data processing efficiency when executing the first subgraph.

The sub-graph set is used to indicate the computing power corresponding relationship of computing resources to calculate equivalent sub-graphs, that is, the second mapping relationship. The training device 320 may determine whether to perform subgraph replacement based on the computing power correspondence. The training device 320 executes step 731, that is, replacing the first subgraph with the second subgraph determined according to the computing power correspondence.

For example, the computing power correspondence is used to represent the correspondence between computing resources, neural network models, sub-graphs, and computing power for running sub-graphs. The computing power correspondence can be presented in the form of a table, as shown in Table 2.

Table 2

It can be seen from Table 2 that computing resource 1, neural network model 1, sub-graph 1, sub-graph 2, sub-graph 3, and sub-graph 4 have a computing power correspondence, that is, computing resource 1 calculates the computing power of sub-graph 1 based on neural network model 1 It is lower than the computing power of computing resource 1 to calculate sub-graph 2 based on neural network model 1. The computing power of computing resource 1 for calculating sub-graph 3 based on neural network model 1 is lower than the computing power of computing resource 1 for computing sub-graph 4 based on neural network model 1.

For example, assuming that the neural network model to be optimized is neural network model 1, the training device 320 determines that the subgraph to be replaced in neural network model 1 is subgraph 1 according to the computing power correspondence, and subgraph 1 and subgraph 2 are a pair of equal Since the computing power of computing resource 1 to calculate sub-graph 1 based on neural network model 1 is lower than the computing power of computing resource 1 to calculate sub-graph 2 based on neural network model 1, the training device 320 can use neural network model 1 to Subfigure 1 is replaced with Subfigure 2.

It should be noted that Table 2 only shows the storage form of the computing power correspondence in the storage device in the form of a table, and does not limit the storage form of the computing power correspondence in the storage device. Of course, the computing power correspondence The storage form in the storage device may also be stored in other forms, which is not limited in this embodiment.

In the second way, the computing power of computing resources to calculate sub-graphs is related to the duration of computing resources to calculate sub-graphs. The training device 320 may determine whether to perform subgraph replacement based on the computing resource utilization cost function. The cost function is used to calculate the duration of subgraphs with equivalence relations based on the same computing resource. The input data of the cost function includes operator type, subgraph structure, operator parameters and input parameters. The output data of the cost function includes the duration of computing subgraphs with equivalence relations based on the same computing resource.

For example, the subgraph set includes a first mapping relationship between the first subgraph and the second subgraph. The training device 320 determining the second subgraph corresponding to the first subgraph in the subgraph set includes: determining the second subgraph corresponding to the first subgraph according to the first mapping relationship. The training device 320 executes step 732 . Step 732: Determine the duration of the second subgraph and the duration of the first subgraph respectively based on the computing resource utilization cost function. Step 733, judging whether the duration of the second sub-picture is greater than the duration of the first sub-picture.

If the duration of the second sub-image is less than the duration of the first sub-image, perform step 734, that is, replace the first sub-image with the second sub-image.

If the duration of the second sub-image is greater than or equal to the duration of the first sub-image, step 735 is performed, that is, the first sub-image is not replaced by the second sub-image.

The optimized neural network model includes a second subgraph, and when data is processed based on computing resources, the duration of the optimized neural network model is shorter than the duration of the neural network model to be optimized.

Mode 3, the sub-image set contains multiple sub-images. The training device 320 can determine the subgraphs equivalent to the subgraphs in the model to be optimized in real time, that is, input the input data into the subgraphs in the model to be optimized and the subgraphs in the subgraph set, and determine the subgraphs with the same result as mutual Equivalent subgraphs. For example, the training device 320 determining the second sub-graph corresponding to the first sub-graph in the sub-graph set may also include: Step 736, the training device 320 inputs the input data to the first sub-graph, and the training device 320 runs the first sub-graph Graph, outputting the running result; the training device 320 inputs the input data to at least one sub-graph in the sub-graph set, and determines the same sub-graph as the running result as the second sub-graph. Furthermore, the training device 320 judges whether to replace the first sub-picture with the second sub-picture according to the second way.

Exemplarily, as shown in Figure 8, the neural network model to be optimized includes a sub-graph 3, and the sub-graph 3 includes 2 matrix transformation operators and 1 matrix transposition operator, and 1 matrix transformation operator is connected to 1 matrix transposition operator operator, and then connect to a matrix transformation operator. Subgraph 4 includes the convolution operator. Subgraph 3 and subgraph 4 are a pair of mutually equivalent subgraphs. Replace subgraph 3 with subgraph 4, that is, replace the exchange (shuffle) operation in the distributed processing of big data with convolution, and obtain the optimized neural network model .

In some other embodiments, after the training device 320 determines to replace the first subgraph with the second subgraph according to the computing power correspondence, it can also determine the duration of the second subgraph and the duration of the first subgraph respectively according to the utilization cost function . If the time length of the second sub-picture is less than the time length of the first sub-picture, then replace the first sub-picture with the second sub-picture. As shown in FIG. 9 , the training device 320 may first execute step 731 , and then execute step 732 and step 733 , and step 734 or step 735 . Therefore, the training device 320 improves the accuracy of subgraph replacement through two judgments.

In some other embodiments, the computing power correspondence may also be deployed to the execution device 310, and the execution device 310 performs an optimization operation of subgraph replacement for the neural network model to be optimized according to the computing power correspondence. For example, the execution device 310 determines the first subgraph to be replaced in the neural network model to be optimized according to the computing power correspondence, and the second subgraph is equivalent to the first subgraph, then replaces the first subgraph with the second subgraph, Get the optimized neural network model.

In some other embodiments, during the optimization process of the neural network model to be optimized, the training device 320 can automatically save the equivalent subgraph that can bring performance benefits to computing resources to form the second mapping relationship, that is, the equivalent of hardware affinity In the subgraph knowledge base, any model that hits the subgraph in the knowledge base, based on the method provided by the embodiment of the present application, can improve the reasoning performance by optimizing the neural network model to be optimized. As shown in (a) in FIG. 10 , assuming that the computing power of computing resource 1 to calculate subgraph 2 based on neural network model 1 is higher than the computing power of computing resource 1 computing subgraph 1 based on neural network model 1, training device 320 can In neural network model 1, subgraph 1 is replaced with subgraph 2, and corresponding relationship 1 is generated, that is, computing resource 1, neural network model 1, subgraph 1 and subgraph 2. As shown in (b) in FIG. 10 , assuming that the computing power of computing resource 2 to calculate subgraph 4 based on neural network model 2 is higher than the computing power of computing resource 2 computing subgraph 3 based on neural network model 2, training device 320 can In neural network model 2, subgraph 3 is replaced with subgraph 4, and a corresponding relationship 2 is generated, that is, computing resource 2, neural network model 2, subgraph 3, and subgraph 4.

The training device 320 can call the underlying AI chip and the operator interface provided by the operator adaptation layer, optimize the neural network model, and collect relevant operator and sub-graph performance data required by the system.

Compared with replacing subgraphs with equivalent subgraphs designed based on the experience of domain experts, since the computing power characteristics of computing resources are not considered, subgraph replacement may instead prolong the performance of the optimized neural network model when processing data based on computing resources. duration. Moreover, the equivalent subgraph designed according to the application is not necessarily applicable to different computing resources. For different computing resources, the equivalent subgraph needs to be re-analyzed and designed, and the equivalent subgraph cannot be reused. The embodiment of the present application decides whether to perform subgraph replacement according to the corresponding relationship of computing power or the cost function, so as to ensure that each subgraph replacement can effectively bring low performance benefits even if it is based on different computing resources. The neural network model optimization method provided by the embodiment of the present application is easy to use, and the process is fully automated. The user only needs to input the neural network model to be optimized, and the optimized neural network model can be obtained without any other operations. The optimization process is simple and efficient.

It should be noted that, for any subgraph that may be replaced in the neural network model to be optimized, the training device 320 may replace the subgraph according to the methods of step 720 and step 730 above. In addition, if the training device 320 has performed subgraph replacement on the neural network model to be optimized to obtain an updated neural network model, then the training device 320 can continue to perform subgraph replacement based on the updated neural network model according to steps 720 and 730, traversing All possible replacement subgraphs until the optimized neural network model is obtained. It is understandable that the training device 320 performs subgraph replacement of the neural network model to be optimized according to step 720 and step 730, so as to obtain multiple updated neural network models, and the finally obtained optimized neural network model may be multiple updated neural network models. The optimal neural network model in the model.

The training device 320 obtains the optimized neural network model after performing subgraph replacement optimization on the neural network model to be optimized, and deploys the optimized neural network model to the execution device 310, and the execution device 310 processes application data based on the optimized neural network model. Exemplarily, as shown in FIG. 7 or FIG. 9 , after step 734 , the executing device 310 executes step 740 and step 750 . Step 740 , the training device 320 deploys the optimized neural network model to the executing device 310 . In step 750, the executing device 310 processes the application data based on the optimized neural network model to realize application functions such as recognition. Therefore, the duration of processing application data by the executing device 310 is reduced.

Compared with modifying the weight of the neural network model based on quantization technology, it can reduce the underlying calculation amount and achieve the acceleration effect. However, quantization techniques generally require sample data to be corrected, otherwise it will cause a large loss of precision. In scenarios without any sample data, quantization techniques are not applicable.

Compared with the pruning technology, the weights or channels with low importance of the neural network model are deleted to reduce the parameter amount of the neural network model and achieve the effect of inference acceleration. Weight pruning is also called unstructured pruning. After the pruning is completed, it will cause sparsification. Generally, specific hardware that supports sparse computing is required, otherwise there will be no acceleration effect; channel pruning is also called structured pruning. Pruning will bring obvious loss of precision. After pruning, training data is needed to train the neural network model to improve the precision, which is not suitable for scenarios without training data.

The inference acceleration algorithm based on equivalent subgraph replacement provided by the embodiment of the present application can automatically search for equivalent subgraphs compatible with the hardware platform, and automatically perform subgraph replacement on the neural network model to be optimized to achieve inference acceleration without loss of accuracy. The embodiment of this application does not require any data, and can achieve inference acceleration without any data, and has a wide range of application scenarios.

The application scenarios described in the embodiments of this application may include target detection, monitoring, automatic driving, speech recognition, product recommendation, machine translation, AI product classification, industrial quality inspection, and so on.

Object detection is an important part of computer vision. Computer vision is an integral part of various intelligent/autonomous systems in various application fields, such as manufacturing, inspection, document analysis, and medical diagnosis. The subject's data and information knowledge. To put it figuratively, it is to install eyes (cameras/video cameras) and brains (algorithms) on computers to replace human eyes to identify and measure targets, so that computers can perceive the environment. Because perception can be thought of as extracting information from sensory signals, computer vision can also be thought of as the science of how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision is to use various imaging systems to replace the visual organs to obtain input information, and then use the computer to replace the brain to complete the processing and interpretation of these input information. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision like humans, and have the ability to adapt to the environment autonomously.

Object detection methods can be applied in scenarios such as face detection, vehicle detection, pedestrian counting, automatic driving, security systems, and medical fields. For example, in an autonomous driving scenario, an autonomous vehicle recognizes objects in the surrounding environment during driving to adjust the speed and direction of the autonomous vehicle so that the autonomous vehicle can drive safely and avoid traffic accidents. Objects may be other vehicles, traffic control devices, or other types of objects. As another example, in the security system, a large number of users are identified to assist the staff to determine the target person as soon as possible. Usually, the input data (such as image or video) is input to the neural network with target detection function, the neural network performs feature extraction on the input data, and the target detection is performed based on the extracted features, and the detection result is obtained.

In addition, the execution device 310 may have stored the optimized neural network model before executing step 750, that is, the execution device 310 processes the application data based on the optimized neural network model, therefore, the execution device 310 may read the optimized neural network model from the memory. The network model processes application data based on the optimized neural network model.

Optionally, the execution device 310 does not store the optimized neural network model, and needs to download the optimized neural network model from the server or optimize the neural network model by itself. The server may refer to a cloud server.

As an example, FIG. 11 is a schematic structural diagram of a system 1100 provided in the present application. As shown in FIG. 11 , the system 1100 may be an entity that provides cloud services to users by using basic resources. System 1100 includes cloud data center 1110 . The cloud data center 1110 includes a device resource pool (including computing resources 1111 , storage resources 1112 and network resources 1113 ) and a cloud service platform 1120 . The computing resource 1111 included in the cloud data center 1110 may be a computing device (such as a server).

An interaction means 1131 may be deployed on the execution device 1130 . The interaction means 1131 may be a browser or an application capable of message interaction with the cloud service platform 1120 . The user can access the cloud service platform 1120 through the interaction device 1131, upload a request to the cloud data center 1110, and request to optimize the neural network model used in the automatic driving scene. After receiving the request uploaded by the execution device 1130 , the cloud data center 1110 optimizes the neural network model requested by the user, and feeds back the optimized neural network model 301 to the execution device 1130 . The execution device 1130 may be a smart terminal or an edge station. The edge station can process the application data of the self-driving car and transmit the processing results to the self-driving car. The processing results are used to instruct the autonomous vehicle to operate. Alternatively, the execution device 1130 may also be an automatic driving vehicle, and the edge station deploys the optimized neural network model 301 to the automatic driving vehicle, and the automatic driving vehicle processes application data according to the optimized neural network model, and instructs the automatic driving vehicle to operate.

It can be understood that, in order to realize the functions in the foregoing embodiments, the computing device includes corresponding hardware structures and/or software modules for performing various functions. Those skilled in the art should easily realize that the present application can be implemented in the form of hardware or a combination of hardware and computer software with reference to the units and method steps of the examples described in the embodiments disclosed in the present application. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.

The neural network model optimization method provided by this embodiment is described in detail above with reference to FIG. 1 to FIG. 11 , and the neural network model optimization device provided by this embodiment will be described below in conjunction with FIG. 12 .

FIG. 12 is a schematic structural diagram of a possible neural network model optimization device provided in this embodiment. These neural network model optimization devices can be used to implement the functions of the training device 320 in the above method embodiments, and therefore can also achieve the beneficial effects of the above method embodiments. In this embodiment, the apparatus for optimizing the neural network model may be the training device 320 shown in FIG. 4 , FIG. 7 or FIG. 9 , or it may be a module (such as a chip) applied to a server.

As shown in FIG. 12 , the neural network model optimization apparatus 1200 includes a communication module 1210 , a module to be replaced 1220 , a replacement module 1230 and a storage module 1240 . The neural network model optimization apparatus 1200 is used to realize the function of the training device 320 in the method embodiment shown in FIG. 4 , FIG. 7 or FIG. 9 above.

The communication module 1210 is used to obtain the neural network model to be optimized, and deploy the optimized neural network model to the execution device 310 . The neural network model to be optimized includes a plurality of operators, the plurality of operators form a plurality of subgraphs, and at least two operators form a subgraph. For example, the communication module 1210 is used to execute step 710 and step 740 in FIG. 7 .

The module to be replaced 1220 is used to find an equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set, the equivalent subgraph and the first subgraph are for the same input data, and output It is also the same, and the processing efficiency of the equivalent subgraph on the input data is greater than the processing efficiency of the first subgraph on the input data, and the subgraph set includes a plurality of subgraphs. For example, the module to be replaced 1220 is used to execute step 720 and step 730 in FIG. 7 .

The replacement module 1230 is used to replace the first subgraph in the neural network model to be optimized with the equivalent subgraph. For example, the replacement module 1230 is used to execute step 734 in FIG. 7 .

The to-be-replaced module 1220 is specifically configured to determine a second sub-graph corresponding to the first sub-graph in the sub-graph set, the second sub-graph and the first sub-graph are for the same input data, and the output is also the same ; determining that the computing resource used to execute the first sub-graph in the computing device has a data processing efficiency higher than that of executing the second sub-graph when executing the second sub-graph; subgraph as the equivalent subgraph.

The storage module 1240 may correspond to storing information such as sub-graph sets and operator sets in the above method embodiments.

The neural network model optimization apparatus 1200 may also include a search module 1250 . The search module 1250 is configured to obtain an operator set according to the neural network models of multiple application scenarios; and, according to the operator set, search for the sub-graph with an equivalence relationship, so as to generate the sub-graph set. For example, the search module 1250 is used to execute step 410 to step 420 in FIG. 4 .

Optionally, the neural network model optimization apparatus 1200 may also include an updating module 1260 . The update module 1260 updates the operator set and the sub-graph set with the newly added operator.

It should be understood that the neural network model optimization device 1200 in the embodiment of the present application may be implemented by a graphics processing unit (graphics processing unit, GPU), a neural network processor (neural network processing unit, NPU), an application-specific integrated circuit (application-specific Integrated circuit, ASIC) implementation, or programmable logic device (programmable logic device, PLD) implementation, the above-mentioned PLD can be a complex program logic device (complex programmable logical device, CPLD), field-programmable gate array (field-programmable gate array, FPGA), generic array logic (GAL), or any combination thereof. When the neural network model optimization method shown in FIG. 4 , FIG. 7 or FIG. 9 can also be realized by software, the neural network model optimization device 1200 and its modules can also be software modules.

The neural network model optimization device 1200 according to the embodiment of the present application may correspond to the implementation of the method described in the embodiment of the present application, and the above-mentioned and other operations and/or functions of the various units in the neural network model optimization device 1200 are respectively in order to realize Fig. 4 , the corresponding flow of each method in FIG. 7 or FIG. 9 , for the sake of brevity, details are not repeated here.

FIG. 13 is a schematic structural diagram of a computing device 1300 provided in this embodiment. As shown, the computing device 1300 includes a processor 1310, a bus 1320, a memory 1330, a memory unit 1350 (also referred to as a main memory unit), and a communication interface 1340. The processor 1310 , memory 1330 , memory unit 1350 and communication interface 1340 are connected through a bus 1320 .

It should be understood that in this embodiment, the processor 1310 may be a CPU, and the processor 1310 may also be other general-purpose processors, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components wait. A general purpose processor may be a microprocessor or any conventional processor or the like.

The processor can also be a GPU, NPU, microprocessor, ASIC, or one or more integrated circuits used to control the program execution of the solution of this application.

The communication interface 1340 is used to realize communication between the computing device 1300 and external devices or devices. In this embodiment, the communication interface 1340 is used for data interaction with other computing devices.

Bus 1320 may include a path for communicating information between the components described above (eg, processor 1310, memory unit 1350, and storage 1330). In addition to the data bus, the bus 1320 may also include a power bus, a control bus, a status signal bus, and the like. However, for clarity of illustration, the various buses are labeled as bus 1320 in the figure. The bus 1320 can be a peripheral component interconnection standard (Peripheral Component Interconnect Express, PCIe) bus, or an extended industry standard architecture (extended industry standard architecture, EISA) bus, unified bus (unified bus, Ubus or UB), computer fast link ( compute express link (CXL), cache coherent interconnect for accelerators (CCIX), etc.

As one example, computing device 1300 may include multiple processors. The processor may be a multi-CPU processor. A processor herein may refer to one or more devices, circuits, and/or computing units for processing data (eg, computer program instructions). The processor 1310 may call the sub-graph set stored in the memory 1330, determine the first sub-graph to be replaced in the neural network model to be optimized according to the sub-atlas, and use the sub-graph equivalent to the first sub-graph in the sub-atlas The second subgraph replaces the first subgraph to obtain an optimized neural network model, and when data is processed based on computing resources, the duration of the optimized neural network model is shorter than the duration of the to-be-optimized neural network model.

It is worth noting that in FIG. 13 , the computing device 1300 includes only one processor 1310 and one memory 1330 as an example. Here, the processor 1310 and the memory 1330 are respectively used to indicate a type of device or device. In a specific embodiment , the quantity of each type of device or equipment can be determined according to business needs.

The memory unit 1350 may correspond to a storage medium for storing information such as a sub-atlas in the foregoing method embodiments. The memory unit 1350 can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. Among them, the non-volatile memory can be read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically programmable Erases programmable read-only memory (electrically EPROM, EEPROM) or flash memory. Volatile memory can be random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, many forms of RAM are available such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), Double data rate synchronous dynamic random access memory (double data date SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous connection dynamic random access memory (synchlink DRAM, SLDRAM) and direct Memory bus random access memory (direct rambus RAM, DR RAM).

The storage 1330 is used to store data, and may be a solid-state hard disk or a mechanical hard disk.

The above-mentioned computing device 1300 may be a general-purpose device or a special-purpose device. For example, the computing device 1300 may be a mobile phone terminal, a tablet computer, a notebook computer, a VR device, an AR device, a mixed reality (Mixed Reality, MR) device or an extended reality (Extended Reality, ER) device, a vehicle terminal, etc., and may also be an edge equipment (eg, a box carrying a chip with processing power), etc. Optionally, computing device 1300 may also be a server or other devices with computing capabilities.

It should be understood that the computing device 1300 according to this embodiment may correspond to the neural network model optimization apparatus 1200 in this embodiment, and may correspond to executing the corresponding subject in FIG. 4 , FIG. 7 or FIG. 9 , and the neural network model optimization The above and other operations and/or functions of each module in the apparatus 1200 are for realizing the corresponding processes in FIG. 4 , FIG. 7 or FIG. 9 , and for the sake of brevity, details are not repeated here.

The method steps in this embodiment may be implemented by means of hardware, and may also be implemented by means of a processor executing software instructions. Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (random access memory, RAM), flash memory, read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM) , PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), register, hard disk, mobile hard disk, CD-ROM or known in the art any other form of storage medium. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and storage medium can be located in the ASIC. Additionally, the ASIC may reside in a computing device. Certainly, the processor and the storage medium may also exist in the network device or the terminal device as discrete components.

In the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices. The computer program or instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer program or instructions may be downloaded from a website, computer, A server or data center transmits to another website site, computer, server or data center by wired or wireless means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media. Described usable medium can be magnetic medium, for example, floppy disk, hard disk, magnetic tape; It can also be optical medium, for example, digital video disc (digital video disc, DVD); It can also be semiconductor medium, for example, solid state drive (solid state drive) , SSD).

The above is only a specific embodiment of the application, but the scope of protection of the application is not limited thereto. Any person familiar with the technical field can easily think of various equivalents within the scope of the technology disclosed in the application. Modifications or replacements, these modifications or replacements shall be covered within the scope of protection of this application. Therefore, the protection scope of the present application should be based on the protection scope of the claims.

Claims

A neural network model optimization method, characterized in that said method is executed by a computing device, said method comprising:

Obtaining a neural network model to be optimized, the neural network model to be optimized includes a plurality of operators, the plurality of operators form a plurality of subgraphs, and at least two operators form a subgraph;

Find the equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set, the equivalent subgraph and the first subgraph are for the same input data, and the output is also the same, and the The processing efficiency of the equivalent subgraph for the input data is greater than the processing efficiency of the first subgraph for the input data, and the subgraph set includes a plurality of subgraphs;

The first subgraph in the neural network model to be optimized is replaced with the equivalent subgraph.
The method according to claim 1, wherein the searching for an equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set comprises:

Determining a second subgraph corresponding to the first subgraph in the subgraph set, the second subgraph and the first subgraph are for the same input data, and the output is also the same;

determining that the computing resources used to execute the first subgraph in the computing device have a higher data processing efficiency when executing the second subgraph than when executing the first subgraph;

The second subgraph is used as the equivalent subgraph.
The method according to claim 2, wherein the determining the second subgraph corresponding to the first subgraph in the subgraph set comprises:

inputting the input data into the first subgraph, running the first subgraph through the computing resource, and outputting an operation result;

Inputting the input data into at least one sub-graph in the sub-graph set, and determining the same sub-graph as the operation result as the second sub-graph.
The method according to claim 3, characterized in that the method further comprises:

Recording the mapping relationship between the first sub-graph and the second sub-graph in the sub-graph set.
The method according to claim 2, wherein the set of sub-graphs includes a first mapping relationship between the first sub-graph and the second sub-graph;

The determining the second sub-image corresponding to the first sub-image in the sub-image set includes:

A second subgraph corresponding to the first subgraph is determined according to the first mapping relationship.
The method according to claim 3 or 4, characterized in that the processing efficiency of the data when the computing resource for executing the first sub-graph in the computing device is determined to execute the second sub-graph is higher than The data processing efficiency when executing the first sub-graph includes:

The computing resource invokes a cost function to run the first subgraph, and records the first data processing efficiency;

The computing resources call a cost function to run the second subgraph, and record the second data processing efficiency;

By comparing the first data processing efficiency with the second data processing efficiency, it is determined that the data processing efficiency when the second sub-graph is executed is higher than the data processing efficiency when the first sub-graph is executed.
The method according to claim 6, further comprising:

Recording the mapping relationship of the computing resource, the first sub-graph, and the second sub-graph to the sub-graph set.
The method according to claim 2, wherein the set of sub-graphs includes a second mapping relationship of computing resources, the first sub-graph, and the second sub-graph;

The determining the second sub-image corresponding to the first sub-image in the sub-image set includes:

determining a second subgraph corresponding to the first subgraph according to the second mapping relationship;

The determining that the computing resource used to execute the first subgraph in the computing device has a data processing efficiency higher than that of executing the first subgraph when executing the second subgraph includes:

It is determined according to the second mapping relationship that the computing resource used to execute the first subgraph has a higher data processing efficiency when executing the second subgraph than when executing the first subgraph.
A neural network model optimization device, characterized in that it comprises:

A communication module, configured to obtain a neural network model to be optimized, the neural network model to be optimized includes a plurality of operators, the plurality of operators form a plurality of subgraphs, and at least two operators form a subgraph;

The module to be replaced is used to find the equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set, the equivalent subgraph and the first subgraph are for the same input data, and output It is also the same, and the processing efficiency of the equivalent subgraph for the input data is greater than the processing efficiency of the first subgraph for the input data, and the subgraph set includes multiple subgraphs;

The replacement module is further configured to replace the first subgraph in the neural network model to be optimized with the equivalent subgraph.
The device according to claim 9, wherein when the module to be replaced searches for an equivalent subgraph of the first subgraph in the neural network model to be optimized in the subgraph set, it is specifically used for:

Determining a second subgraph corresponding to the first subgraph in the subgraph set, the second subgraph and the first subgraph are for the same input data, and the output is also the same;

determining that the computing resources used to execute the first subgraph in the computing device have a higher data processing efficiency when executing the second subgraph than when executing the first subgraph;

The second subgraph is used as the equivalent subgraph.
The device according to claim 10, wherein when the module to be replaced determines the second submap corresponding to the first submap in the submap set, it is specifically used for:

inputting the input data into the first subgraph, running the first subgraph through the computing resource, and outputting an operation result;

Inputting the input data into at least one sub-graph in the sub-graph set, and determining the same sub-graph as the operation result as the second sub-graph.
The device according to claim 11, further comprising:

A storage module, configured to record the mapping relationship between the first sub-graph and the second sub-graph to the sub-graph set.
The device according to claim 10, wherein the set of sub-graphs includes a first mapping relationship between the first sub-graph and the second sub-graph;

When the module to be replaced determines the second submap corresponding to the first submap in the submap set, it is specifically used for:

A second subgraph corresponding to the first subgraph is determined according to the first mapping relationship.
The apparatus according to claim 11 or 12, wherein the module to be replaced determines the processing of data when the computing resources used to execute the first sub-graph in the computing device execute the second sub-graph When the efficiency is higher than the data processing efficiency when executing the first subgraph, it is specifically used for:

The computing resource invokes a cost function to run the first subgraph, and records the first data processing efficiency;

The computing resources call a cost function to run the second subgraph, and record the second data processing efficiency;

By comparing the first data processing efficiency with the second data processing efficiency, it is determined that the data processing efficiency when the second sub-graph is executed is higher than the data processing efficiency when the first sub-graph is executed.
The device according to claim 14, further comprising:

A storage module, configured to record the mapping relationship of the computing resource, the first sub-graph, and the second sub-graph to the sub-graph set.
The device according to claim 10, wherein the set of sub-graphs includes a second mapping relationship of computing resources, the first sub-graph, and the second sub-graph;

When the module to be replaced determines the second submap corresponding to the first submap in the submap set, it is specifically used for:

determining a second subgraph corresponding to the first subgraph according to the second mapping relationship;

The determining that the data processing efficiency of the computing resources used to execute the first subgraph in the computing device when executing the second subgraph is higher than the data processing efficiency when executing the first subgraph includes:

It is determined according to the second mapping relationship that the computing resource used to execute the first subgraph has a higher data processing efficiency when executing the second subgraph than when executing the first subgraph.
A computing device, characterized in that it includes a memory and a processor, the memory is used to store a set of computer instructions; when the processor executes the set of computer instructions, any one of the preceding claims 1-8 is performed Operation steps of the method described in item.