WO2021120177A1 - Method and apparatus for compiling a neural network model - Google Patents

Method and apparatus for compiling a neural network model

Info

Publication number
WO2021120177A1
WO2021120177A1 (PCT/CN2019/127035)
Authority
WO
WIPO (PCT)
Prior art keywords
information
layer
neural network
model file
model
Prior art date
Application number
PCT/CN2019/127035
Other languages
English (en)
French (fr)
Inventor
柯继伟
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2019/127035 (WO2021120177A1)
Priority to CN201980102747.9A (CN114746868A)
Publication of WO2021120177A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method and device for compiling a neural network model.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.
  • With the rapid development of artificial intelligence technology, the performance of neural networks (for example, convolutional neural networks) has been continuously improved, and neural networks have also achieved great success in the processing and analysis of various media signals such as images, video and speech, as well as text and other information.
  • A neural network is usually obtained in the following way: an algorithm engineer builds a model using a learning framework, and after parameter tuning and training optimization, the generated network parameters and the model structure are saved together; the saved file is a model file that can be used for forward inference.
  • Generally speaking, after a user builds and trains a neural network and obtains its model file on one device, model framework, or system, the neural network is usually deployed on another device, model framework, or system for application or retraining. In this case, the model file needs to be compiled, and the compiled model file is then run on that other device, model framework, or system.
  • In order to ensure that the model file is compiled correctly, the user not only needs to provide the model file to be compiled to the model compilation device, but also needs to provide the model compilation device with information such as the functions for obtaining the layer parameters in the model and the derivation function for the layer output size. This makes the user's workload larger and thus affects the user experience.
  • For example, after a user builds and trains a model file of an image recognition neural network with TensorFlow as the development framework and wants to deploy it on a device that uses the caffe framework, the model file needs to be compiled into a file supported by the caffe framework.
  • In this case, the user not only needs to provide the model file to be converted to the compilation device, but also needs to provide information such as the functions for obtaining the layer parameters of the neural network and the derivation function for the layer output size, which makes the user's workload larger and affects the user experience.
  • As another example, after a user builds and trains, with TensorFlow as the development framework and using custom operators, a model file of an image recognition neural network and wants to deploy it on a device that uses that framework, the user not only needs to provide the model file to be converted to the compilation device, but also needs to provide information such as the function for obtaining the layer parameters corresponding to the custom operator and the derivation function for the layer output size, which makes the user's workload larger and affects the user experience.
  • In view of this, the present application provides a method and related apparatus for compiling a neural network model, which can reduce the user's workload in providing the model file to be compiled and improve the efficiency of compiling the neural network model.
  • In a first aspect, the present application provides a method for compiling a neural network model.
  • The method includes: obtaining a first model file of the neural network, where the first model file includes first information, and the first information is used to indicate the output tensor size of a target layer in the neural network; and compiling the first model file according to the first information in the first model file to obtain a second model file of the neural network.
  • In this method, because the first model file directly includes the output tensor size of the target layer in the neural network, when the architecture information of the neural network is compiled according to the first model file, the output tensor size of the target layer can be read directly from the first model file, without needing to obtain it from the model information of the neural network according to functions provided by the user. In this way, the user no longer needs to provide the one or more functions used to obtain the output tensor size of the target layer from the model information of the neural network, which reduces the user's workload; it also improves the efficiency of compiling the first model file into the second model file, which can ultimately increase the user's utilization of the target device that can run the second model file.
  • In addition, compared with providing a function for deriving the output tensor size, directly providing the output tensor size of the target layer can also reduce the size of the first model file, which helps to improve the transmission efficiency of the first model file and save storage space.
  • In some possible implementations, the target layer is a custom layer.
  • That is to say, the first model file of the neural network includes the output tensor size of the user-defined layer.
  • In these implementations, optionally, the first information may be a parameter in the layer definition of the target layer.
  • In some possible implementations, the first model file includes second information, and the second information is used to indicate that the target layer is a custom layer.
  • In these implementations, compiling the first model file according to the first information in the first model file includes: obtaining the first information from the first model file according to the second information; and registering the target layer according to the first information.
  • That is to say, when it is agreed that the first model file includes the output tensor sizes of custom layers, whether the target layer is a custom layer can be determined directly from the second information, so it can be determined whether the first model file contains the output tensor size of the target layer; the first information can therefore be read quickly from the first model file, and the target layer can be registered according to the first information.
  • Such an implementation can further improve the efficiency of compiling the first model file to obtain the second model file.
  • Optionally, the second information may be a parameter in the layer definition of the target layer.
  • In some possible implementations, the second information corresponding to different types of custom layers in the neural network is the same.
  • That is to say, the same value is uniformly used to identify a layer in the neural network as a custom layer.
  • For example, the value of the type parameter in the layer definition of all custom layers is the same value pre-configured by the software stack supported by the target device.
  • In some possible implementations, among all the custom layers of the neural network, different types of custom layers correspond to different second information.
  • That is to say, the second information corresponding to all custom layers is no longer unified into one and the same value; instead, different values are set for the second information corresponding to different types of custom layers according to the type of the custom layer.
  • For example, the type value of custom layers of the pooling layer type is unified into one value, and the type value of custom layers of the convolutional layer type is unified into another value.
  • Such an implementation helps the user, the compilation tool, or the target device to classify and manage these custom layers, thereby reducing the user's workload and improving compilation efficiency and runtime efficiency.
  • In some possible implementations, the first model file further includes third information, and the third information is used to uniquely identify the target layer in the neural network.
  • In these implementations, the second model file includes the correspondence between the third information and the implementation function of the target layer.
  • Optionally, the third information may be a parameter used to indicate the name of the target layer in the layer definition of the target layer, or may be a numerical identifier set by the user for the target layer.
  • In these implementations, because the first model file provides information that uniquely identifies the target layer in the neural network, after compiling the target layer the compilation tool can generate the correspondence between this unique identification information and the implementation function of the target layer.
  • In this way, the user does not need to provide a function indicating how to call the implementation function of the target layer; the target device can find the implementation function of the target layer according to the unique identification information and the correspondence, thereby completing the callback of the implementation function of the target layer.
  • That is to say, these implementations can further reduce the user's workload.
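  • As an illustration of how these three pieces of information might sit together in one layer entry of the first model file, the following is a minimal C++ sketch; the struct name and field names are assumptions made for this illustration and are not defined by the application.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: one layer entry of the first model file, holding the
// three pieces of information discussed above. Names are illustrative only.
struct CustomLayerEntry {
    std::string type;            // second information, e.g. "Custom": marks a custom layer
    std::vector<int> out_shape;  // first information: output tensor size of the target layer
    int proc_id;                 // third information: uniquely identifies the layer and is
                                 // later mapped to the layer's implementation function
};

// Example entry for one custom layer (values are illustrative).
const CustomLayerEntry example_layer{"Custom", {1, 64, 56, 56}, 0};
```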
  • In a second aspect, the present application provides an apparatus for compiling a neural network model, including: an obtaining module, configured to obtain a first model file of the neural network, where the first model file includes first information, and the first information is used to indicate the output tensor size of a target layer in the neural network; and a compilation module, configured to compile the first model file according to the first information in the first model file to obtain a second model file of the neural network.
  • the target layer is a custom layer.
  • the first model file further includes second information, and the second information is used to indicate that the target layer is a custom layer.
  • the compilation module is specifically configured to: obtain the first information from the first model file according to the second information; and register the target layer according to the first information.
  • the second information corresponding to different types of custom layers in the neural network is the same.
  • different types of custom layers correspond to different second information.
  • the first model file further includes third information, and the third information is used to uniquely identify the target layer in the neural network.
  • the second model file includes the correspondence between the third information and the realization function of the target layer.
  • In a third aspect, the present application provides an apparatus for compiling a neural network model.
  • The apparatus includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method in the first aspect.
  • In a fourth aspect, the present application provides a computer-readable medium that stores instructions for execution by a device, and the instructions are used to implement the method in the first aspect.
  • In a fifth aspect, the present application provides a computer program product containing instructions which, when the computer program product runs on a computer, cause the computer to execute the method in the first aspect.
  • In a sixth aspect, the present application provides a chip that includes a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory and executes the method in the first aspect.
  • Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to perform the method in the first aspect.
  • In a seventh aspect, the present application provides a computing device.
  • The computing device includes a processor and a memory, where the memory stores computer instructions, and the processor executes the computer instructions to implement the method in the first aspect.
  • Fig. 1 is a schematic diagram of an artificial intelligence main body framework provided by this application.
  • FIG. 2 is a schematic structural diagram of a system architecture provided by this application.
  • FIG. 3 is a schematic structural diagram of a convolutional neural network provided by this application.
  • FIG. 4 is a schematic structural diagram of another convolutional neural network provided by this application.
  • FIG. 5 is a schematic flowchart of the method for compiling a neural network model provided by this application.
  • FIG. 6 is a schematic structural diagram of the apparatus for compiling a neural network model provided by this application.
  • FIG. 7 is another schematic structural diagram of the apparatus for compiling a neural network model provided by this application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of the artificial intelligence system and is suitable for general artificial intelligence field requirements.
  • Intelligent Information Chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • the data in the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as the Internet of Things data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • Based on the results of the above data processing, some general capabilities can be formed, such as an algorithm or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are an encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include: intelligent manufacturing, intelligent transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, smart terminals, and so on.
  • the embodiments of the present application relate to related applications of neural networks.
  • the following first introduces related terms and other related concepts of neural networks that may be involved in the embodiments of the present application.
  • a deep learning model refers to a machine learning model that includes a deep neural network structure. Algorithm engineers use the deep learning framework to build a model, and after tuning and training the model, save the final generated network parameters and model structure together, and the resulting file is a model file that can be used for forward inference.
  • The model files trained with different deep learning frameworks are not the same, but a complete model file generally contains information such as tensor data, operation units, and the computational graph.
  • A tensor is the data container of a deep learning system, and can be understood as the generalization of a matrix to an arbitrary number of dimensions.
  • A tensor that contains only one number is called a scalar, scalar tensor, zero-dimensional tensor, or 0D tensor; an array of numbers is called a vector, one-dimensional tensor, or 1D tensor; an array of vectors is called a matrix, two-dimensional tensor, or 2D tensor; combining multiple matrices into a new array yields a three-dimensional tensor, which can be intuitively understood as a cube of numbers; combining multiple three-dimensional tensors into an array yields a four-dimensional tensor, and so on. Deep learning generally processes 0D to 4D tensors, but 5D tensors may be encountered when processing video data.
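  • As a rough illustration of the ranks described above, the following C++ sketch represents tensors of different dimensionality simply by their shape vectors; it is only a notational aid added here, not part of the application, and the concrete sizes are made up.

```cpp
#include <vector>

// Shapes only; the number of entries is the tensor's rank (dimensionality).
std::vector<int> scalar_shape = {};              // 0D tensor: a single number
std::vector<int> vector_shape = {128};           // 1D tensor: an array of numbers
std::vector<int> matrix_shape = {28, 28};        // 2D tensor: an array of vectors
std::vector<int> volume_shape = {28, 28, 3};     // 3D tensor: e.g. one RGB image
std::vector<int> batch_shape  = {32, 28, 28, 3}; // 4D tensor: a batch of images
```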
  • An operation (op), which may also be called a computation unit or an operator, represents a symbolic operation process and is the basic unit of mainstream deep learning frameworks, that is, a node in the graph.
  • The inputs and outputs of an operation unit are tensors. All the transformations learned by a deep network can be reduced to a number of tensor operations performed on tensors of numerical data.
  • Common operation units include the add unit, batch normalization unit, convolution unit, gated recurrent unit (GRU), local response normalization (LRN) unit, long short-term memory (LSTM) unit, max pooling unit, rectified linear unit (ReLU) activation function, recurrent neural network (RNN) unit, Softmax function, and so on.
  • A computational graph, also known as a data flow graph, is defined as a directed acyclic graph. Both tensors and computation units are objects in the graph: the computation units are the nodes of the graph, and the tensors are the data flowing along the edges of the graph. Acyclic means that the graph cannot contain cycles; for example, a tensor x cannot be the input of a layer that generates x. The only allowed processing loop (i.e., recurrent connection) is the internal loop of a recurrent layer.
  • The deep learning framework uses the computational graph to express a computation as dependencies between individual instructions. A computational graph can be understood as a way of expressing and evaluating mathematical expressions.
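  • A minimal sketch of such a directed acyclic graph representation is shown below: operation nodes are the graph's nodes and tensors, identified here by name, flow along its edges. The structures and names are assumptions made purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical graph representation: each node is an operation unit and the
// edges are the tensors (identified by name) flowing between nodes.
struct OpNode {
    std::string op_type;                      // e.g. "Convolution", "ReLU"
    std::vector<std::string> input_tensors;   // tensors consumed by this node
    std::vector<std::string> output_tensors;  // tensors produced by this node
};

// A tiny acyclic graph: conv1 feeds relu1; no tensor is an input of the node
// that produces it, so there are no cycles.
std::vector<OpNode> graph = {
    {"Convolution", {"data"},  {"conv1"}},
    {"ReLU",        {"conv1"}, {"relu1"}},
};
```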
  • Convolutional neural network (CNN)
  • Convolutional neural network is a deep neural network with convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistical information of one part of an image is the same as that of other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the larger the number of convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • As mentioned above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing multiple levels of learning at different levels of abstraction using machine learning algorithms.
  • As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
  • a convolutional neural network (CNN) 300 may include an input layer 310, a convolutional layer/pooling layer 320 (where the pooling layer is optional), and a neural network layer 330.
  • The input layer 310 can obtain the image to be processed and pass it to the convolutional layer/pooling layer 320 and the subsequent neural network layer 330 for processing, so as to obtain the processing result of the image.
  • The convolutional layer/pooling layer 320 may include layers 321 to 326. For example, in one implementation, layer 321 is a convolutional layer, layer 322 is a pooling layer, layer 323 is a convolutional layer, layer 324 is a pooling layer, layer 325 is a convolutional layer, and layer 326 is a pooling layer; in another implementation, layers 321 and 322 are convolutional layers, layer 323 is a pooling layer, layers 324 and 325 are convolutional layers, and layer 326 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • Taking the convolutional layer 321 as an example, the internal working principle of a convolutional layer is introduced below with image data as the input; when the input data is speech, text, or another type of data, the internal working principle of the convolutional layer is similar.
  • the convolution layer 321 can include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing convolution on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride), so as to extract specific features from the image.
  • The size of the weight matrix is related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • During the convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolution output with a single depth dimension; in most cases, however, a single weight matrix is not used, and instead multiple weight matrices of the same size (rows × columns), that is, multiple homogeneous matrices, are applied.
  • The output of each weight matrix is stacked to form the depth dimension of the convolved image, where this dimension can be understood as being determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • The multiple weight matrices have the same size (rows × columns), so the convolution feature maps extracted by these weight matrices of the same size also have the same size; the multiple extracted convolution feature maps of the same size are then merged to form the output of the convolution operation.
  • In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. Each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 300 can make correct predictions.
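  • For intuition, the following is a minimal single-channel convolution sketch (stride 1, no padding) corresponding to the sliding weight matrix described above; it is an illustration added here, not code from the application.

```cpp
#include <vector>

// Single-channel 2D convolution, stride 1, no padding.
// in: H x W input, k: kh x kw weight matrix; returns (H-kh+1) x (W-kw+1).
std::vector<std::vector<float>> conv2d(const std::vector<std::vector<float>>& in,
                                       const std::vector<std::vector<float>>& k) {
    int H = in.size(), W = in[0].size();
    int kh = k.size(), kw = k[0].size();
    std::vector<std::vector<float>> out(H - kh + 1, std::vector<float>(W - kw + 1, 0.f));
    for (int i = 0; i + kh <= H; ++i)
        for (int j = 0; j + kw <= W; ++j)
            for (int u = 0; u < kh; ++u)      // slide the weight matrix over the
                for (int v = 0; v < kw; ++v)  // input, one pixel at a time
                    out[i][j] += in[i + u][j + v] * k[u][v];
    return out;
}
```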
  • When the convolutional neural network 300 has multiple convolutional layers, the initial convolutional layer (for example, layer 321) often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network 300 increases, the features extracted by the later convolutional layers (for example, layer 326) become more and more complex, for example high-level semantic features, and features with higher-level semantics are more suitable for the problem to be solved.
  • The convolutional layer/pooling layer 320 can be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers.
  • In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
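  • A correspondingly small max-pooling sketch (window and stride of 2) is shown below, again only as an illustration of the size-reduction step described above and not taken from the application.

```cpp
#include <algorithm>
#include <vector>

// 2x2 max pooling with stride 2: each output pixel is the maximum of the
// corresponding 2x2 sub-region of the input, halving each spatial dimension.
std::vector<std::vector<float>> maxpool2x2(const std::vector<std::vector<float>>& in) {
    int H = in.size() / 2, W = in[0].size() / 2;
    std::vector<std::vector<float>> out(H, std::vector<float>(W));
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j)
            out[i][j] = std::max({in[2 * i][2 * j],     in[2 * i][2 * j + 1],
                                  in[2 * i + 1][2 * j], in[2 * i + 1][2 * j + 1]});
    return out;
}
```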
  • Neural network layer 330
  • After processing by the convolutional layer/pooling layer 320, the convolutional neural network 300 is still not able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 320 only extracts features and reduces the number of parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 300 needs to use the neural network layer 330 to generate the output of one required class or of a group of required classes. Therefore, the neural network layer 330 may include multiple hidden layers (331, 332 to 33n as shown in FIG. 3) and an output layer 340, where the parameters contained in the multiple hidden layers can be obtained based on relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the multiple hidden layers in the neural network layer 330, that is, as the final layer of the entire convolutional neural network 300, comes the output layer 340.
  • the output layer 340 has a loss function similar to the classification cross entropy, which is specifically used to calculate the prediction error.
  • a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430.
  • Compared with FIG. 3, the multiple convolutional layers/pooling layers (421 to 426) in the convolutional layer/pooling layer 420 in FIG. 4 are parallel, and the features extracted by each are all input to the neural network layer 430 for processing.
  • the neural network layer 430 may include multiple hidden layers, that is, hidden layer 1 to hidden layer n, which may be denoted as 431 to 43n.
  • The convolutional neural networks shown in FIG. 3 and FIG. 4 are only examples of two possible convolutional neural networks in the embodiments of the present application.
  • In specific applications, the convolutional neural network in the embodiments of the present application may also exist in the form of other network models.
  • RNN is used to process sequence data.
  • The reason why an RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. For example, to predict what the next word of a sentence will be, the preceding words are generally needed, because the preceding and following words in a sentence are not independent.
  • The specific form is that the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes in the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • an embodiment of the present application provides a system architecture 200.
  • a data collection device 260 is used to collect training data.
  • the training data may include training images and classification results corresponding to the training images, where the results of the training images may be manually pre-labeled results.
  • the target model 201 may also be referred to as a target rule 201.
  • the data collection device 260 stores the training data in the database 230, and the training device 220 trains to obtain the target model/rule 201 based on the training data maintained in the database 230.
  • Specifically, the training device 220 processes the input original image and compares the output image with the original image until the difference between the image output by the training device 220 and the original image is less than a certain threshold, thereby completing the training of the target model 201.
  • the target model 201 in the embodiment of the present application may specifically be a neural network.
  • It should be noted that, in practical applications, the training data maintained in the database 230 does not necessarily all come from collection by the data collection device 260, and may also be received from other devices.
  • In addition, the training device 220 does not necessarily train the target model 201 entirely based on the training data maintained in the database 230; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • the target model 201 obtained by training according to the training device 220 can be applied to different systems or devices, such as the client device 240 shown in FIG. 2.
  • The client device 240 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal; it may also be a server or a cloud.
  • the training device 220 can generate a corresponding target model 201 based on different training data for different goals or different tasks.
  • The corresponding target model 201 can be used to achieve the above goals or complete the above tasks, so as to provide users with the results they need.
  • The target model 201 obtained by training with the training device 220 may be a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), or the like.
  • It should be noted that FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application; the positional relationships among the devices, components, modules, and so on shown in the figure, the type of the training data, and the type or function of the neural network do not constitute any limitation.
  • the model converter 210 may be placed in the client device 240.
  • the training data can also be text, voice or other types of data.
  • The model converter may also have other names, such as model compiler; any device or apparatus that can realize a function similar to that of the model converter 210 can be understood as the model converter in this application.
  • In some scenarios, the model file of the target model 201 trained by the training device 220 is platform-independent (that is, it can be compiled and run on different hardware platforms); if the target model 201 is to be applied on the client device 240, the target model 201 trained by the training device 220 needs to be processed by the model converter 210, so that the model file of the target model 201 is compiled from its current format into a format supported by the client device.
  • That is, the model file of the target model 201 needs to be input into the model converter 210; the model converter 210 compiles the target model 201 to obtain a model file supported by the client device 240, and the compiled model file is then deployed on the client device 240.
  • the conversion processing of the target model 201 by the model converter 210 may also be referred to as compilation.
  • In the prior art, in order for the compilation to succeed, a custom operator developer also needs to provide the model converter 210 with, for each layer in the model, content such as the parameter definition function, the parameter parsing function, the derivation function for the output tensor (shape) size, the implementation function, and the operator invocation (forward) function.
  • For example, suppose the target model 201 is a model developed under the TensorFlow framework, and the operators in some or all of the layers of the target model 201 are customized by the developer, that is, these operators do not belong to the AI software stack of the TensorFlow framework.
  • When the developer inputs the model file of the target model 201 into the model converter 210 so that the model converter 210 can compile a model file that can run on the client device, the developer also needs to provide the model converter 210 with the custom operator's parameter definition function, parameter parsing function, derivation function for the output tensor (shape) size, implementation function, and forward function. It should be understood that, in this application, the terms custom operator and custom layer can be used interchangeably.
  • However, since a custom operator developed by a user may only be used in the user's own deployment environment and does not need to be opened to the outside world, the model converter 210 does not need to require the user to provide details such as how the operator parameters are defined or how the computation is performed; the user only needs to provide the information needed for graph compilation to proceed normally. That is to say, the so-called universal custom operator registration mechanism provided by the prior art when compiling the model actually forces the user to provide a large amount of additional information that is only needed for opening the operator to others. If the user's goal is only to have the model converter 210 support the registration and invocation of the custom operators developed by the user, the development process can be simplified without providing such a large amount of operator registration information.
  • In other words, the model converter 210 does not need such complicated logic to register and call the custom operator developed by the user; it only needs the user to provide the constant shape of the operator in the model.
  • The details of the layer parameters can also be simplified, because the computation function of the user-defined operator is implemented by the user and called through the user's internal interface, and therefore does not need to be registered with the model converter 210.
  • For this reason, this application proposes a lightweight custom operator registration process and callback method for neural network inference scenarios, so that when users expect to compile models that include custom operators using the model converter 210 supported by the chip, the user's workload can be reduced and the user experience improved, which in turn can promote chip sales.
  • Specifically, the model converter 210 may provide a unified custom layer type for the user to use, so as to save the user the work of defining layer parameters and writing parsing functions.
  • Table 1 is an example of the layer definition of the unified custom layer of this application. As shown in Table 1, the layer parameter definition includes the "type", "shape", and "layer identification (proc_id)" parameters, where "shape" and "proc_id" can also be collectively referred to as the "Attribute".
  • The value of the parameter "Type" is "Custom".
  • The parameter "shape" is used to specify the size of the output of the custom operator in the current model.
  • The parameter "proc_id" is used to specify the correspondence between the custom operator and its implementation function.
  • the user does not need to write the shape derivation function of the custom layer, just provide the fixed shape of the custom layer in the network.
  • the user can extract the shape of the custom layer from the training framework after the model training is completed.
  • the user can automatically complete the shape insertion of the custom layer according to the "layer name (layer_name)" of the custom layer through an automated script.
  • It should be understood that the number of shape values is only an example, and more or fewer fixed shape values can be included; the value of each shape dimension is also only an example, and can be a value other than 1, 64, or 56.
  • the user can define the proc_id in the layer parameter.
  • the proc_id corresponds to the calculation function one-to-one, so that the corresponding calculation function can be found according to the proc_id at an appropriate time to complete the call of the custom operator.
  • the definition example of the custom operator is as follows:
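  • The application's original example is not reproduced in this text; the following is a hypothetical reconstruction, based only on the Table 1 parameters described above (type, shape, proc_id), of what such a layer definition could look like in a caffe-style prototxt, embedded here as a C++ string constant. The layer name, the `custom_param` grouping, and the dimension values are illustrative assumptions.

```cpp
// Hypothetical reconstruction only; not the application's original example.
const char* kCustomLayerDefinition = R"proto(
layer {
  name: "my_custom_layer"        # layer_name (illustrative)
  type: "Custom"                 # unified custom layer type
  custom_param {
    shape { dim: 1 dim: 64 dim: 56 dim: 56 }  # fixed output tensor size
    proc_id: 0                   # maps this layer to its implementation function
  }
}
)proto";
```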
  • Based on this proc_id, the client device can retrieve, from the mapping table, the custom operator computation function that it has implemented itself, and complete the call.
  • Table 2 is an example of a mapping table between proc_id and calculation function.
  • As shown in Table 2, the function mapped to the proc_id value 0 is named void custom_reshape_forward(), and the function mapped to the proc_id value 1 is named void custom_permute_forward().
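  • The mapping in Table 2 could be kept on the client device as an ordinary lookup table from proc_id to the user-implemented function, along the lines of the following sketch; the map and the dispatch helper are assumptions made for illustration, while the two function names come from Table 2.

```cpp
#include <functional>
#include <map>

// Stand-in bodies; in practice these are the user's own implementations.
void custom_reshape_forward() { /* user-provided reshape computation */ }
void custom_permute_forward() { /* user-provided permute computation */ }

// proc_id -> implementation function, as in Table 2.
const std::map<int, std::function<void()>> kProcIdTable = {
    {0, custom_reshape_forward},
    {1, custom_permute_forward},
};

// Callback of a custom layer: look up its proc_id and invoke the function.
void call_custom_layer(int proc_id) {
    auto it = kProcIdTable.find(proc_id);
    if (it != kProcIdTable.end()) {
        it->second();  // complete the call of the custom operator
    }
}
```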
  • the model converter 210 may also classify based on the layer type, and provide users with multiple custom types, so that the user can manage the custom operators according to the operation types according to the custom types.
  • Table 3 is an example of the parameter definition of the custom layer provided to the user after the types of the custom layer are divided.
  • For example, the types of custom layers may include "custom convolutional layer (CustomConvolution)" and "custom pooling layer (CustomPooling)".
  • In some implementations, a mapping table between another parameter of the custom layer and the implementation function of the custom layer may also be established, with that parameter transparently passed through and used as the index of the implementation function; as long as the value of that parameter is unique within the entire neural network, it can assist in locating the operator implementation.
  • the parameter of the custom layer also includes the layer name (layer_name) of the custom layer.
  • Fig. 5 is an exemplary flowchart of a method for compiling a neural network model according to an embodiment of the application.
  • the method includes S610 and S620. This method can be executed by the model converter 210 in FIG. 2.
  • S610 Obtain a first model file of the neural network, where the first model file includes first information, and the first information is used to indicate the tensor size of the output of the target layer in the neural network.
  • the neural network in this application includes but is not limited to CNN, DNN, RNN, NLP, GNN, etc.
  • For example, the neural network can be any neural network used to implement any of the following functions: image recognition, object detection, image segmentation, speech recognition, machine translation, automatic labeling, object tracking, super resolution, and so on.
  • the target layer can be any layer in the neural network.
  • the first model file may include multiple pieces of first information, and each first information corresponds to a target layer in the neural network, that is, each first information is used to indicate the output tensor size of a target layer in the neural network.
  • The first model file may be a model file obtained after the developer of the neural network model constructs and trains the model.
  • An example of the first model file is a pb file of tensorflow, a prototxt file of caffe, or a pth file of pytorch, etc.
  • obtaining the first model file may be receiving the first model file, or may be reading the first model file from the memory.
  • the first information may be a parameter in the layer definition of the target layer.
  • the layer definition of the target layer may include the "shape" parameter as shown in Table 1, and the "shape" parameter is the first information.
  • S620 Compile the first model file according to the first information in the first model file to obtain a second model file of the neural network.
  • Compiling the first model file according to the first information in the first model file can be understood as: converting the first model file based on the model converter supported by the target device according to the first information , To generate a second model file that can be run or executable on the target device.
  • the target device usually refers to a specific hardware device, such as a specific chip.
  • The format of the first model file and the format of the second model file may be the same or different.
  • An example of the second model file is a pb file of tensorflow, a prototxt file of caffe, or a pth file of pytorch, etc.
  • In this method, because the first model file directly includes the output tensor size of the target layer in the neural network, when the architecture information of the neural network is compiled according to the first model file, the output tensor size of the target layer can be read directly from the first model file, without needing to obtain it from the model information of the neural network according to functions provided by the user. In this way, the user no longer needs to provide the one or more functions used to obtain the output tensor size of the target layer from the model information of the neural network, which reduces the user's workload; it also improves the efficiency of compiling the first model file into the second model file, which can ultimately increase the user's utilization of the target device that can run the second model file.
  • the target layer may be a user-defined layer, or may be a layer that has been configured in the software stack supported by the target device.
  • In the case where the target layer is a custom layer, the first model file only needs to include the output tensor sizes of the custom layers, and does not need to include the output tensor sizes of the layers that come with the software stack, which would otherwise increase the user's workload.
  • the first model file may further include second information, and the second information is used to indicate that the target layer is a custom layer.
  • In this case, compiling the first model file according to the first information in the first model file may include: obtaining the first information from the first model file according to the second information; and registering the target layer according to the first information.
  • the registration in this application can be understood as an operation that enables the model converter to correctly identify the custom operator developed by the user. If this operation is not performed, the model converter may directly report an error.
  • Registering the target layer in this application can be understood as registering the target layer with the model converter.
  • In other words, in these implementations, the output tensor size of the target layer is directly included in the first model file, so that the first information can be read directly from the first model file and, according to the first information, the target layer can be registered with the software stack supported by the target device.
  • Such an implementation manner can further improve the efficiency of compiling and obtaining the second model file according to the first model file.
  • the second information may be a parameter in the layer definition of the target layer.
  • the layer definition of the target layer may include the "Type" parameter as shown in Table 1, and the "Type" parameter is the second information.
  • In the case where the second information is not included in the first model file, in some implementations, for each layer, the compiler can tentatively read the first information corresponding to the layer directly, that is, tentatively read the output tensor size of the layer directly. If the read succeeds, the layer can be registered according to the read output tensor size; if the read fails, the layer can be registered in the prior-art manner, for example, the shape derivation function corresponding to the layer is obtained and the output tensor size of the layer is obtained according to that function.
  • In the case where the second information is not included in the first model file, in other implementations, for each layer, it can first be determined whether the layer is a layer of the software stack; if so, the output tensor size is obtained in the manner corresponding to that software-stack layer, and otherwise the first information corresponding to the layer is read directly, that is, the output tensor size of the layer is read directly.
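  • Putting the pieces together, the compile step S620 could look roughly like the following sketch: for each layer, the type parameter (second information) decides whether the output tensor size (first information) is read directly from the model file or obtained in the conventional way. All names here are assumptions made for this illustration, not interfaces defined by the application.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed layer record from the first model file.
struct LayerDef {
    std::string type;            // second information, e.g. "Custom"
    std::vector<int> out_shape;  // first information (present for custom layers)
};

// Stand-in for the software stack's own shape handling (values illustrative).
static const std::map<std::string, std::vector<int>> kBuiltinShapes = {
    {"Convolution", {1, 64, 56, 56}},
};

static void RegisterLayer(const LayerDef& layer, const std::vector<int>& out_shape) {
    // Registration makes the layer known to the model converter so that
    // compilation does not fail on it; details are omitted in this sketch.
    (void)layer; (void)out_shape;
}

void CompileLayer(const LayerDef& layer) {
    if (layer.type == "Custom") {
        // Custom layer: the output tensor size is read directly from the model
        // file; no user-provided shape derivation function is needed.
        RegisterLayer(layer, layer.out_shape);
    } else if (kBuiltinShapes.count(layer.type)) {
        // Layer that comes with the software stack: use the stack's own method.
        RegisterLayer(layer, kBuiltinShapes.at(layer.type));
    } else {
        throw std::runtime_error("unsupported layer type: " + layer.type);
    }
}
```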
  • the second information corresponding to different types of custom layers in the neural network may be the same, that is, for all custom layers, regardless of type, the corresponding second information is the same.
  • In such an implementation, since it is sufficient to set the same type value for all custom layers, the user's workload is reduced.
  • For example, the second information corresponding to all custom layers may be a value agreed on in advance in the software stack. In this way, during compilation, it is convenient for the compilation device to determine whether a layer in the neural network is a custom layer.
  • the type parameter in the layer definition of all custom layers is the second information, and the value of the type parameter is the same value pre-configured by the software stack supported by the target device.
  • the "Type" parameter in the layer definition can be set to "Custom".
  • Examples of the types of layers in the neural network include: convolutional layers, pooling layers, fully connected layers, and so on.
  • In some possible implementations, the values of the second information corresponding to different types of custom layers are different.
  • That is, the second information corresponding to all custom layers is no longer unified into one and the same value; instead, different second information is set according to the type of the custom layer. For example, the second information of custom layers of the pooling layer type is unified into one value, and the second information corresponding to custom layers of the convolutional layer type is unified into another value.
  • Such an implementation helps the user, the compilation tool, or the target device to classify and manage these custom layers, thereby reducing the user's workload and improving compilation efficiency and runtime efficiency.
  • the second information is the "Type” parameter in the layer definition of the custom layer
  • the "Type” parameter in the layer definition of the custom layer whose type is the pooling layer can be set to "CustomPooling”
  • the "Type” parameter in the layer definition of the custom layer whose type is the convolutional layer can be set to "CustomConvolution”.
  • the first model file may further include third information, and the third information is used to uniquely identify the target layer in the neural network.
  • the second model file may include the correspondence between the third information and the realization function of the target layer.
  • In these implementations, since the first model file provides information that uniquely identifies the target layer in the neural network, after the compilation tool compiles the target layer, the correspondence between the unique identification information and the implementation function of the target layer can be generated.
  • In this way, the user does not need to provide a function indicating how to call the implementation function of the target layer; the target device can find the implementation function of the target layer according to the unique identification information and the correspondence, thereby completing the callback of the implementation function of the target layer.
  • That is to say, these implementations can further reduce the user's workload.
  • the third information may be a name parameter in the layer definition of the target layer, or may be a numerical identifier set by the user for the target layer.
  • the third information may be the "proc_id” parameter therein, and the parameter may be specified by the user.
  • the third information may be the "layer_name” parameter therein.
  • Fig. 6 is an exemplary structural diagram of a device for compiling a neural network model of the present application.
  • the device 700 includes an obtaining module 710 and a compiling module 720.
  • the apparatus 700 can implement the method shown in FIG. 5 described above.
  • the obtaining module 710 is used to perform S610
  • the compiling module 720 is used to perform S620.
  • the apparatus 700 may be the training device 220 in FIG. 2; in other possible implementation manners, the apparatus 700 may be the client device 240 described in FIG. 2.
  • the device 700 may be deployed in a cloud environment, which is an entity that uses basic resources to provide cloud services to users in a cloud computing mode.
  • the cloud environment includes a cloud data center and a cloud service platform.
  • the cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider.
  • The computing resources included in the cloud data center may be a large number of computing devices (for example, servers).
  • the device 700 may be a server used for compiling a neural network model in a cloud data center.
  • the device 700 may also be a virtual machine created in a cloud data center for compiling a neural network model.
  • the device 700 may also be a software device deployed on a server or a virtual machine in a cloud data center.
  • the software device is used to compile a neural network model.
  • The software device may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on both virtual machines and servers.
  • For example, when the compilation module 720 includes multiple sub-modules, the multiple sub-modules may likewise be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on both virtual machines and servers.
  • In other embodiments, the device 700 can be abstracted by the cloud service provider on the cloud service platform into a cloud service for compiling neural network models and provided to users.
  • When a user compiles a neural network using this cloud service in the cloud environment, the user can upload the neural network model to be compiled to the cloud environment through an application program interface (API) or through the web interface provided by the cloud service platform; the device 700 receives the neural network model to be compiled, compiles it, and returns the finally compiled neural network model to the edge device where the user is located.
  • the apparatus 700 can also be separately deployed on a computing device in any environment.
  • This application also provides an apparatus 800 as shown in FIG. 7, and the apparatus 800 includes a processor 802, a communication interface 803, and a memory 804.
  • An example of the device 800 is a chip.
  • Another example of the apparatus 800 is a computing device.
  • the processor 802, the memory 804, and the communication interface 803 may communicate through a bus.
  • Executable code is stored in the memory 804, and the processor 802 reads the executable code in the memory 804 to execute the corresponding method.
  • the memory 804 may also include an operating system and other software modules required for running processes.
  • For example, the operating system can be LINUX™, UNIX™, WINDOWS™, etc.
  • the executable code in the memory 804 is used to implement the method shown in FIG. 5, and the processor 802 reads the executable code in the memory 804 to execute the method shown in FIG. 5.
  • the processor 802 may be a central processing unit (CPU).
  • The memory 804 may include volatile memory, such as random access memory (RAM).
  • The memory 804 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state disk (SSD).
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

A method and related apparatus for compiling a neural network model in the field of artificial intelligence. In the method for compiling a neural network model, a first model file of a neural network is obtained, where the first model file includes first information, and the first information is used to indicate the output tensor size of a target layer in the neural network (S610); and the first model file is compiled according to the first information in the first model file to obtain a second model file of the neural network (S620). The method can reduce the user's workload in providing the model file to be compiled and improve the efficiency of compiling the neural network model.

Description

编译神经网络模型的方法和装置 技术领域
本申请涉及人工智能领域,特别涉及一种编译神经网络模型的方法和装置。
背景技术
人工智能(artificial intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。
随着人工智能技术的快速发展,神经网络(例如,卷积神经网络)的性能得到了持续的提升,神经网络在图像、视频以及语音等多种媒体信号以及文本等信息的处理与分析中也取得了很大的成就。
神经网络通常通过以下方式获取:算法工程师使用学习框架构建好模型,经过调参和训练优化后,将生成的网络参数和模型结构一起保存,保存的文件即为可用于前向推理的模型文件。
一般来说,当用户在一个设备、模型框架或者系统中构建以及训练得到神经网络的模型文件之后,通常会将该神经网络部署在另一设备、模型框架或系统中应用或者再训练。这种情况下,需要对这个模型文件进行编译,然后在上述另一设备、模型框架或系统中运行编译得到的模型文件。
为了保证对模型文件的正常编译,用户不仅需要向模型编译装置提供待编译的模型文件,还需要向模型编译装置提供获取构成模型中的层参数的函数、层输出大小的推导函数等信息,这会使得用户的工作量较大,从而影响用户体验。
例如,用户以TensorFlow作为开发框架构建以及训练得到一个图像识别的神经网络的模型文件之后,希望将其部署在使用caffe框架的设备上时,需要将该模型文件编译为caffe框架支持的文件。此时,用户不仅需要向编译装置提供待转换的模型文件,还需要向提供获取神经网络的层参数的函数、层输出大小的推导函数等信息,这会使得用户的工作量较大,从而影响用户体验。
又如,用户以TensorFlow作为开发框架,使用自定义算子构建以及训练得到一个图像识别的神经网络的模型文件之后,希望将其部署在使用该框架的设备上时,用户不仅需要向编译装置提供待转换的模型文件,还需要向提供获取该自定义算子对应的层参数的函数、层输出大小的推导函数等信息,这会使得用户的工作量较大,从而影响用户体验。
发明内容
本申请提供一种编译神经网络模型的方法和相关装置,可以减少用户提供待编译模型文件的工作量以及提高神经网络模型的编译效率。
第一方面,本申请提供一种编译神经网络模型的方法,该方法包括:获取神经网络的第一模型文件,所述第一模型文件中包括第一信息,所述第一信息用于指示所述神经网络中目标层的输出的张量大小;根据所述第一模型文件中的所述第一信息,对所述第一模型文件进行编译,以得到所述神经网络的第二模型文件。
该方法中,因为根据第一模型文件中直接包括了神经网络中的目标层的输出张量大小,使得根据第一模型文件对所述神经网络的架构信息进行编译时,可以直接从第一模型文件中读取到目标层的输出张量大小,而不需要根据用户提供的函数从神经网络的模型信息中去获取。这样,不仅可以让用户不在需要提供从神经网络的模型信息中获取目标层的输出张量大小所使用的一个或多个函数,从而降低用户的工作量,还可以提高根据第一模型文件编译得到第二模型文件的效率,最终可以提高用户对能够运行第二模型文件的目标装置的使用率。
此外,直接提供目标层的输出张量大小,与提供获取输出张量函数大小相比,还可以较少第一模型文件的大小,从而有助于提高第一模型文件的传输效率以及介绍存储空间。
在一些可能的实现方式中,所述目标层为自定义层。也就是说,神经网络的第一模型文件中包括的是用户自定义的层的输出张量大小。
这些实现方式中,可选地,第一信息可以是目标层的层定义中的一个参数。
在一些可能的实现方式中,所述第一模型文件中包括第二信息,所述第二信息用于指示所述目标层为自定义层。其中,所述根据所述第一模型文件中的所述第一信息,对所述第一模型文件进行编译,包括:根据所述第二信息从所述第一模型文件中获取所述第一信息;根据所述第一信息注册所述目标层。
也就是说,在约定第一模型文件中包括自定义层的输出张量大小情况下,通过第二信息可以直接确定目标层是否为自定义层,从而可以确定出第一模型文件中是否包含了目标层的输出张量大小,从而可以快速从第一模型文件中读取第一信息,并根据该第一信息注册目标层。这样的实现方式,可以进一步提高根据第一模型文件编译得到第二模型文件的效率。
可选地,所述第二信息可以为目标层的层定义中的一个参数。
在一些可能的实现方式中,所述神经网络中不同类型的自定义层对应的第二信息相同。也就是说,统一使用相同的值来标识神经网络中的层是自定义层。
例如,所有自定义层的层定义中的类型参数的值为目标装置支持的软件栈预先配置的同一个值。
这种实现方式,由于给所有的自定义层对应的第二信息均设置同一个值即可,因此有助于减少用户的工作量。
在一些可能的实现方式中,所述神经网络的所有自定义层中,不同类型的自定义层对应的第二信息不同。
也就是说,不再是所有自定义层对应的第二信息统一成同一个值,而是按照自定义层的类型为不同类型的自定义层对应的第二信息设置不同的值。
例如,池化层类型的自定义层的类型值统一成一个值,卷积层类型的自定义层的类型值统一成另一个值。
这样的实现方式,有助于用户或者编译工具或者目标装置对这些自定义层进行归类和管理,从而较少用户的工作量和编译效率以及运行效率。
在一些可能的实现方式中,所述第一模型文件中还包括第三信息,所述第三信息用于在所述神经网络中唯一标识所述目标层。其中,所述第二模型文件中包括所述第三信息与所述目标层的实现函数的对应关系。
可选地,第三信息可以是目标层的层定义中用于指示所述目标层的名称的参数,或者可以是用户为该目标层设置的数值标识。
这些实现方式中,由于第一模型文件中提供了目标层在神经网络中的唯一标识信息,因此,编译工具对该目标层进行编译之后,可以生成该唯一标识信息与该目标层的实现函数之间的对应关系。这样,不需要用户提供指示如何调用该目标层的实现函数的函数,目标装置就可以根据该唯一标识信息和该对应关系找到该目标层的实现函数,从而完成该目标层实现函数的回调。也就是说,这些实现方式可以进一步减少用户的工作量。
第二方面,本申请提供了一种编译神经网络模型的装置,包括:获取模块,用于获取神经网络的第一模型文件,所述第一模型文件中包括第一信息,所述第一信息用于指示所述神经网络中目标层的输出的张量大小;编译模块,用于根据所述第一模型文件中的所述第一信息,对所述第一模型文件进行编译,以得到所述神经网络的第二模型文件。
在一些可能的实现方式中,所述目标层为自定义层。
在一些可能的实现方式中,所述第一模型文件中还包括第二信息,所述第二信息用于指示所述目标层为自定义层。其中,所述编译模块具体用于:根据所述第二信息从所述第一模型文件中获取所述第一信息;根据所述第一信息注册所述目标层。
在一些可能的实现方式中,所述神经网络中不同类型的自定义层对应的第二信息相同。
在一些可能的实现方式中,所述神经网络的所有自定义层中,不同类型的自定义层对应的第二信息不同。
在一些可能的实现方式中,所述第一模型文件中还包括第三信息,所述第三信息用于在所述神经网络中唯一标识所述目标层。其中,所述第二模型文件中包括所述第三信息与所述目标层的实现函数的对应关系。
第三方面,本申请提供了一种编译神经网络模型的装置,该装置包括:存储器,用于存储程序;处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器用于执行第一方面中的方法。
第四方面,本申请提供一种计算机可读介质,该计算机可读介质存储用于设备执行的指令,该指令用于实现第一方面中的方法。
第五方面,本申请提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行第一方面中的方法。
第六方面,本申请提供一种芯片,所述芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行第一方面中的方法。
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令, 所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行第一方面中的方法。
第七方面,本申请提供了一种计算设备,计算设备包括处理器和存储器,其中:存储器中存储有计算机指令,处理器执行计算机指令,以实现第一方面中的方法。
附图说明
图1是本申请提供的一种人工智能主体框架示意图。
图2为本申请提供的一种系统架构的结构示意图。
图3为本申请提供的一种卷积神经网络的结构示意图。
图4为本申请提供的另一种卷积神经网络的结构示意图。
图5为本申请提供的编译神经网络模型的方法的示意性流程图。
图6为本申请提供的编译神经网络模型的装置的一种示意性流程图。
图7为本申请提供的编译神经网络模型的装置的另一种示意性流程图。
具体实施方式
下面对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
图1示出一种人工智能主体框架示意图,该主体框架描述了人工智能系统总体工作流程,适用于通用的人工智能领域需求。
下面从“智能信息链”(水平轴)和“信息技术(information technology,IT)价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。
“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。
“IT价值链”从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶,平安城市,智能终端等。
本申请实施例涉及了神经网络的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络的相关术语和其他相关概念进行介绍。
(1)深度学习模型
深度学习模型是指一种包含深度神经网络结构的机器学习模型。算法工程师使用深度学习框架构建好模型,并对该模型进行调参和训练优化后,将最终生成的网络参数和模型结构一并保存,得到的文件即为可用于前向推理的模型文件。
不同深度学习框架训练得到的模型文件的格式不尽相同,但完整的模型文件一般都包含了张量数据、运算单元和计算图等信息。
(2)张量
张量(Tensor)是深度学习系统的数据容器,它可以理解为矩阵向任意纬度的扩展。仅包含一个数字的张量叫做标量(Scalar)、标量张量、零维张量或0D张量;数字组成的数组叫做向量(Vector)或一维张量或1D张量;向量组成的数组叫做矩阵(Matrix)或二维张量或2D张量;多个矩阵组合成一个新的数据可以得到一个三维张量,三维张量直观地可以理解为数字组成的立方体;将多个三维张量组合成一个数组,可以创建一个四维张量,以此类推。深度学习处理的一般是0D到4D的张量,但处理视频数据时可能会遇到5D张量。
(3)运算单元
运算单元(operation/operator),也可以称为计算单元、操作符或算子,表示一种符号化的运算过程,是主流深度学习框架的基本单元,即图中的节点。运算单元的输入和输出都是张量。深度网络学到的所有变换都可以简化为数值数据张量上的一些张量运算(tensor operation)。
常见的运算单元有加(add)单元、批正则化(BatchNormalization)单元、卷积单元、门控循环单元(Gated Recurrent Unit)、局部响应归一化(local response normalization,LRN)单元、长短期记忆(long short-term memory,LSTM)单元、最大池化(max pool)单元、稀疏激活函数(rectified liner uints,ReLU)、循环神经网络(recurrent neural networks,RNN)单元和Softmax函数等。
(4)计算图
计算图(graph),又称数据流图,被定义为有向无环图(directed acyclic graph)。张量和运算单元都是图中的对象,运算单元是图的节点,张量是图的边上流动的数据。无环(acyclic)是指图不能有循环,例如,张量x不能成为生成x的某一层的输入。唯一允许的处理循环(即循环连接)是循环层的内部循环。
深度学习框架使用计算图将计算表示为独立的指令之间的依赖关系,我们可以通俗地将计算图理解为表达和评估数学表达式的一种方式。
(5)卷积神经网络(convosutionas neuras network,CNN)
卷积神经网络是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是:图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置,我们都能使用同样的学习得到的图像信息。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息,一般地,卷积核数量越多,卷积操作反映的图像信息越丰富。
卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
下面结合图3重点对CNN的结构进行详细的介绍。如上文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。
本申请实施例中的卷积神经网络的结构可以如图3所示。在图3中,卷积神经网络(CNN)300可以包括输入层310,卷积层/池化层320(其中池化层为可选的),以及神经网络层330。
以图像处理为例(输入数据为文本或语音时的操作类似),其中,输入层310可以获取待处理图像,并将获取到的待处理图像交由卷积层/池化层320以及后面的神经网络层330进行处理,可以得到图像的处理结果。
下面对图3中的CNN 300中内部的层结构进行详细的介绍。
卷积层/池化层320:
卷积层:
如图3所示卷积层/池化层320可以包括如示例321-326层,举例来说:在一种实现中,321层为卷积层,322层为池化层,323层为卷积层,324层为池化层,325为卷积层,326为池化层;在另一种实现方式中,321、322为卷积层,323为池化层,324、325为卷积 层,326为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层321为例,且以输入数据为图像为例,介绍一层卷积层的内部工作原理。输入数据为语音或文本或其他类型的数据时,卷积层的内部工作原理类似。
卷积层321可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用多个尺寸(行×列)相同的权重矩阵,即多个同型矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度,这里的维度可以理解为由上面所述的“多个”来决定。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化等。该多个权重矩阵尺寸(行×列)相同,经过该多个尺寸相同的权重矩阵提取后的卷积特征图的尺寸也相同,再将提取到的多个尺寸相同的卷积特征图合并形成卷积运算的输出。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络200进行正确的预测。
当卷积神经网络300有多个卷积层的时候,初始的卷积层(例如321)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络300深度的加深,越往后的卷积层(例如326)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图3中320所示例的321-326各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。例如,在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。池化层可以包括平均池化算子和/或最大池化算子,以用于对输入图像进行采样得到较小尺寸的图像。平均池化算子可以在特定范围内对图像中的像素值进行计算产生平均值作为平均池化的结果。最大池化算子可以在特定范围内取该范围内值最大的像素作为最大池化的结果。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
神经网络层330:
在经过卷积层/池化层320的处理后,卷积神经网络300还不足以输出所需要的输出 信息。因为如前所述,卷积层/池化层320只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或其他相关信息),卷积神经网络300需要利用神经网络层330来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层330中可以包括多层隐含层(如图3所示的331、332至33n)以及输出层340,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等。
在神经网络层330中的多层隐含层之后,也就是整个卷积神经网络300的最后层为输出层340,该输出层340具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络300的前向传播(如图3由310至340方向的传播为前向传播)完成,反向传播(如图3由340至310方向的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络300的损失,及卷积神经网络300通过输出层输出的结果和理想结果之间的误差。
本申请实施例中的神经网络的结构可以如图4所示。在图4中,卷积神经网络(CNN)400可以包括输入层410,卷积层/池化层420(其中池化层为可选的),以及神经网络层430。与图3相比,图4中的卷积层/池化层420中的多个卷积层/池化层(421至426)并行,将分别提取的特征均输入给全神经网络层430进行处理。神经网络层430可以包括多个隐含层,即隐含层1至隐含层n,可以记为431至43n。
需要说明的是,图3和图4所示的卷积神经网络仅作为一种本申请实施例中的两种可能的卷积神经网络的示例,在具体的应用中,本申请实施例中的卷积神经网络还可以以其他网络模型的形式存在。
(6)循环神经网络(recurrent neural network,RNN)
RNN是用来处理序列数据的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。
如图2所示,本申请实施例提供了一种系统架构200。在图2中,数据采集设备260用于采集训练数据。以用于图像处理的目标模型201为例来说,训练数据可以包括训练图像以及训练图像对应的分类结果,其中,训练图像的结果可以是人工预先标注的结果。目标模型201也可以称为目标规则201。
在采集到训练数据之后,数据采集设备260将这些训练数据存入数据库230,训练设备220基于数据库230中维护的训练数据训练得到目标模型/规则201。
下面对训练设备220基于训练数据得到目标模型201进行描述,训练设备220对输入的原始图像进行处理,将输出的图像与原始图像进行对比,直到训练设备120输出的图像与原始图像的差值小于一定的阈值,从而完成目标模型201的训练。
本申请实施例中的目标模型201具体可以为神经网络。需要说明的是,在实际的应用中,所述数据库230中维护的训练数据不一定都来自于数据采集设备260的采集,也有可 能是从其他设备接收得到的。另外需要说明的是,训练设备220也不一定完全基于数据库230维护的训练数据进行目标模型201的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备220训练得到的目标模型201可以应用于不同的系统或设备中,如应用于图2所示的客户设备240,所述客户设备240可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)AR/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。
训练设备220可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型201,该相应的目标模型201即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
根据训练设备220训练得到目标模型201,可以是CNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNNS)等等。
值得注意的是,图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系、训练数据的类型以及神经网络的类型或功能不构成任何限制。例如,在图2中,模型转换器210可以置于客户设备240中。又如,其中的训练数据也可以是文本、语音或其他类型的数据。又如,模型转换器也可以有其他名称,例如模型编译器等等,只要能实现与模型转换器210类似功能的设备或装置都可理解为本申请中的模型转换器。
训练设备220训练的目标模型201模型文件是平台无关的(即经过编译可运行在不同的硬件平台上),如果想在客户设备240上应用目标模型201,则训练设备220训练好的目标模型201需要经过模型转换器210的处理,将目标模型201的模型文件从当前格式编译到客户设备支持的格式。
例如,目标模型201是TensorFlow框架下开发得到的模型,则需要将目标模型201的模型文件输入模型转换器210,模型转换器210对目标模型201进行编译,得到客户设备240支持的模型文件,然后再将编译得到的模型文件部署到客户设备240上。通常来说,模型转换器210对目标模型201的转换处理,也可以称为编译。
为了编译成功,自定义算子开发者还需要向模型转换器210提供模型中的各层包括的算子的参数定义函数、参数解析函数、输出张量(shape)大小的推导函数、实现函数以及调用(forward)函数等内容。
又如,目标模型201是TensorFlow框架下开发得到的模型,且目标模型201中部分或全部层中的算子是开发者自定义的,即不属于TensorFlow框架的AI软件栈中的算子的情况下,开发者在将目标模型201的模型文件输入模型转换器210,以通过模型转换器210编译得到可以运行在客户设备上的模型文件时,还需要向模型转换器210提供自定义算子的参数定义函数、参数解析函数、输出大小(shape)的推导函数、实现函数以及调用(forward)函数等内容。可以理解的是,本申请中,自定义算子与自定义层可以互相替换。
由此可知,开发者要想在客户设备上部署自己构建的模型,除了提供模型文件,还需要提供模型中的各自定义算子包括的算子参数定义函数、参数解析函数、输出大小(shape)的推导函数、实现函数以及调用(forward)函数等内容,这会增大开发者的工作量,降低 模型的部署效率,影响开发者对客户设备的满意度,导致客户设备甚至是客户设备内的AI芯片的使用率。
但对于模型转换器210而言,由于用户开发的自定义算子只会在其自身部署环境下使用,不需要对外开放,模型转换器210可不要求用户提供如何定义算子参数、算子中的运算等细节,用户只要需要提供确保图编译正常运行所需的信息即可。也就是说,现有技术中对模型进行编译时所提供的一套所谓通用的自定义算子注册机制,实际上迫使用户额外地提供了许多算子对外开放所需要信息,如果用户目标仅是模型转换器210支持其所开发自定义算子的注册与调用,可对开发过程进行简化,不需要提供如此繁多的算子注册信息。
例如,如何推导输出的张量大小,即shape的推导函数是不必要的,因为该推导过程与算子的计算细节强相关,需要考虑各种边界场景,计算量大,但如果用户目标仅是模型转换器210对其所开发自定义算子的注册与调用,并不需要如此复杂的逻辑,只需要用户提供该算子在该模型中的shape常量即可。
又如,层参数细节也可简化,因为用户自定义算子的计算函数是由用户自行完成的,输入用户的内部接口,不需要注册给模型转换器210。
因此现有技术中过于复杂的自定义算子注册机制,给用户增加了工作量,影响用户体验。
针对上述问题,本申请提出了一种神经网络推理场景下轻量化的自定义算子注册流程与回调方法,以使得用户期望基于芯片支持模型转换器210编译包括自定义算子的模型时,尽可能多的减小用户工作量,提升用户体验,进而转化为芯片销量。
本申请提出的方法中,可以由模型转换器210提供统一的自定义层类型供用户调用,以省去用户定义层参数以及解析函数的工作量。表1为本申请的统一的自定义层的层定义的一种示例。如表1所示,该层参数定义包括类型“type”、“形状(shape)”和“层标识(proc_id)”。“形状(shape)”和“层标识(proc_id)”也可以统称为“属性(Attribute)”。参数“Type”的值为“Custom”,参数“shape”用于指定自定义算子在当前模型中的输出通道的大小,参数“proc_id”用于指定自定义算子与实现函数之间的对应关系。
表1统一的自定义层的参数定义
Figure PCTCN2019127035-appb-000001
用户构建的模型中,所有自定义层复用统一的层定义,且用户完成该模型的训练之后,将该层的类型修改为该统一的自定义层类型。以表1所示的自定义层类型为例,用户完成模型训练后,只需将自定义层的类型“type”修改为“Custom”即可。例如:
layer_type=UserDefinedOp→layer_type=Custom
此外,用户不需要写自定义层的shape推导函数,只需提供该自定义层在该网络中的固定shape即可。通常来说,用户可在模型训练完成后,从训练框架中提取该自定义层的shape。甚至,用户可以通过自动化脚本根据该自定义层的“层名称(layer_name)”自动完成自定义层的shape插入。例如:
layer_type=Custom
shape=(1,64,56,56)
可以理解的是,其中shape的个数仅是一种示例,可以包含更多或更少的固定的shape;且shape的值也是示例,可以是1、64和56以外的值。
为使自定义算子经模型转换器210编译后,能方便的找回其算子实现与其表示的网络节点的对应关系,如表1所示,用户可在层参数中定义proc_id。proc_id与计算函数一一对应,以使得在恰当的时间根据该proc_id找到其对应的计算函数,完成自定义算子的调用。例如,完成训练后,自定义算子的定义示例如下:
layer_type=Custom
shape=(1,64,56,56)
proc_id=1
编译成功后,客户设备可以基于此proc_id到映射表中找回自己实现的自定义算子计算函数,并完成调用。表2为proc_id与计算函数的映射表的一种示例。
表2 proc_id与计算函数的映射表
Figure PCTCN2019127035-appb-000002
表2中,值为0的proc_id映射函数(function)名为void custom_reshape_forward()的计算函数,值为1的proc_id映射函数名为void custom_permute_forward()的计算函数。
本申请的方法中,可选地,模型转换器210也可基于层类型进行分类,为用户提供多种自定义类型,方便用户根据该自定义类型对自定义算子根据操作类型进行管理。
表3是对自定义层的类型进行划分后,为用户提供的自定义层的参数定义的一种示例。其中,自定义层的类型包括“用户自定义卷积层(CustomConvolution)”、“用户自定义池化层(CustomPooling)”。
表3多类型的自定义层的参数定义
Figure PCTCN2019127035-appb-000003
本申请的方法中,可选地,模型转换器210对模型编译成功后,也可建立自定义层的 其他参数与该自定义层的实现函数之间的映射表,并透传该其他参数,以用于该实现函数的索引,只要该其他参数的参数值在整个神经网络中是唯一的,即可辅助定位到算子实现。以如下自定义层的参数定义为例:
layer_name=my_custom_op0
layer_type=Custom
shape=(1,64,56,56)
proc_id=1
其中,该自定义层的参数还包括自定义层的层名称(layer_name)。模型转换器210编译成功之后,可以得到如表4所示的映射关系。
表4 layer_name与计算函数的映射表
layer_name fuction
my_custom_op0 void custom_reshape_forward()
下面结合附图介绍本申请一个实施例的编译神经网络模型的方法。图5为本申请实施例的编译神经网络模型的方法的示例性流程图。该方法包括S610和S620。该方法可以由图2中的模型转换器210执行。
S610,获取神经网络的第一模型文件,所述第一模型文件中包括第一信息,所述第一信息用于指示所述神经网络中目标层的输出的张量大小。
本申请中的神经网络包括但不限于CNN、DNN、RNN、NLP、GNN等,该神经网络可以是任意用于实现以下任意一种功能的神经网络:图像识别、目标检测、图像分割、语音识别、机器翻译、自动标注、目标跟踪、超分辨率等。
其中,目标层可以是神经网络中的任意一个层。第一模型文件中可以包括多个第一信息,每个第一信息对应神经网络中的一个目标层,即每个第一信息用于指示神经网络中的一个目标层的输出张量大小。
其中,第一模型文件可以是神经网络模型的开发者构建以及训练后得到的模型文件。第一模型文件的一种示例为tensorflow的pb文件,caffe的prototxt文件或pytorch的pth文件等。
其中,获取第一模型文件,可以是接收第一模型文件,或者可以是从存储器中读取第一模型文件。
其中,第一信息可以是目标层的层定义中的一个参数。例如,目标层的层定义中可以包括如表1中所示的“shape”参数,该“shape”参数即为第一信息。
S620,根据所述第一模型文件中的所述第一信息,对所述第一模型文件进行编译,以得到所述神经网络的第二模型文件。
根据所述第一模型文件中的所述第一信息,对所述第一模型文件进行编译,可以理解为:基于目标装置支持的模型转换器,根据第一信息,对第一模型文件进行转换,以生成在目标装置上可运行或者可执行的第二模型文件。目标装置通常是指特定的硬件装置,例如特指的某种芯片。
其中,第二模型文件的格式与第二模型文件的格式可以相同,也可以不相同。第二模型文件的一种示例为tensorflow的pb文件,caffe的prototxt文件或pytorch的pth文件等。
该方法中,因为根据第一模型文件中直接包括了神经网络中的目标层的输出张量大小,使得根据第一模型文件对所述神经网络的架构信息进行编译时,可以直接从第一模型文件中读取到目标层的输出张量大小,而不需要根据用户提供的函数从神经网络的模型信息中去获取。这样,不仅可以让用户不在需要提供从神经网络的模型信息中获取目标层的输出张量大小所使用的一个或多个函数,从而降低用户的工作量,还可以提高根据第一模型文件编译得到第二模型文件的效率,最终可以提高用户对能够运行第二模型文件的目标装置的使用率。
本实施例中,所述目标层可以是用户自定义的层,也可以是目标装置支持的软件栈中已经配置好的层。
但是通常来说,用户只需在第一模型文件中提供自定义层的输出张量大小即可,因为软件栈中一般自带用于获取软件栈中已经配置好的层的相关信息的函数,不需要用户提供软件栈自带层的输出张量大小,编译装置也能获知软件栈自带层的输出张量大小。第一模型文件中只需包括自定义层的输出张量大小,而不用包括软件栈自带层的输出张量大小,不同额外增加用户的工作量。
本实施例中,第一模型文件中还可以包括第二信息,所述第二信息用于指示所述目标层为自定义层。其中,根据所述第一模型文件对所述第一模型文件进行编译,可以包括:根据第二信息从第一模型文件中获取第一信息;根据第一信息注册所述目标层。
可以理解的是,本申请中的注册可以理解为使模型转换器可正确识别用户开发的自定义算子的一种操作,如不执行该操作,模型转换器可能会直接报错。本申请中的注册目标层可以理解为向模型转换器注册该目标层。
也就是说,通过第二信息可以直接确定出第一模型文件中直接包括了目标层的输出张量大小,从而可以直接从第一模型文件中读取第一信息,并根据该第一信息将目标层注册到目标装置支持的软件栈中。这样的实现方式,可以进一步提高根据第一模型文件编译得到第二模型文件的效率。
本实施例中,第二信息可以是目标层的层定义中的一个参数。例如,目标层的层定义中可以包括如表1中所示的“Type”参数,该“Type”参数即为第二信息。
如果第一模型文件中不包括第二信息,在一些实现方式中,针对每个层,编译装置可以试探性地直接读取该层对应的第一信息,即试探性地直接读取该层的输出张量大小。如果读取成功,则可以根据读取到的输出张量大小注册该层;如果读取失败,则可以按照现有技术中的方式来注册该层。例如,获取该层对应的shape大小推导函数,根据该函数获取该层的输出张量大小。
如果第一模型文件中不包括第二信息,在另一些实现方式中,针对每个层,可以先判断该层是否为软件栈自带层,若是,则使用软件栈自带层对应的方式来获取输出张量大小,否则直接读取该层对应的第一信息,即直接读取该层的输出张量大小。
本实施例中,神经网络中不同类型的自定义层对应的第二信息可以相同,即所有自定义层,不分类型,对应的第二信息为同一个。这样的实现方式,由于给所有的自定义层均设置同一个类型值即可,因此有助于减少用户的工作量。
可选地,所有自定义层对应的第二信息可以是软件栈中预先约定好的一个值。这样,编译时,方便编译装置确定神经网络中的层是否为自定义层。
例如,所有自定义层的层定义中的类型参数为第二信息,且类型参数的值为目标装置支持的软件栈预先配置的同一个值。例如,如表1所示,只要是自定义层,其层定义中的“Type”参数都可以设置为“Custom”。
神经网络中的层的类型的示例包括:卷积层、池化层、全连接层等等。
在另一些实现方式中,所述神经网络的所有自定义层中,不同类型的自定义层对应的第二信息的值不同。
也就是说,不再是所有自定义层对应的第二信息统一成同一个值,而是按照自定义层的类型设置不同的第二信息,例如,池化层类型的自定义层的第二信息统一成同一个值,卷积层类型的自定义层对应的第二信息统一成同一个值。
这样的实现方式,有助于用户或者编译工具或者目标装置对这些自定义层进行归类和管理,从而较少用户的工作量和编译效率以及运行效率。
例如,如表2所示,第二信息为自定义层的层定义中的“Type”参数时,类型为池化层的自定义层的层定义中的“Type”参数可以设置为“CustomPooling”,类型为卷积层的自定义层的层定义中的“Type”参数可以设置为“CustomConvolution”。
在一些实现方式中,第一模型文件中还可以包括第三信息,第三信息用于在所述神经网络中唯一标识所述目标层。相应地,第二模型文件中可以包括第三信息与目标层的实现函数的对应关系。
由于第一模型文件中提供了目标层在神经网络中的唯一标识信息,因此,编译工具对该目标层进行编译之后,可以生成该唯一标识信息与该目标层的实现函数之间的对应关系。这样,不需要用户提供指示如何调用该目标层的实现函数的函数,目标装置就可以根据该唯一标识信息和该对应关系找到该目标层的实现函数,从而完成该目标层实现函数的回调。也就是说,这些实现方式可以进一步减少用户的工作量。
可选地,第三信息可以是目标层的层定义中的名称参数,或者可以是用户为该目标层设置的数值标识。
例如,如表1所示,第三信息可以是其中的“proc_id”参数,该参数可以由用户指定。又如,如表3所示,第三信息可以是其中的“layer_name”参数。
图6是本申请编译神经网络模型的装置的一种示例性结构图。该装置700包括获取模块710和编译模块720。该装置700可以实现前述图5所示的方法。
例如,获取模块710用于执行S610,编译模块720用于执行S620。
在一些可能的实现方式中,装置700可以是图2中的训练设备220;在另一些可能的实现方式中,装置700可以是图2中所述的客户设备240。
装置700可部署在云环境中,云环境是云计算模式下利用基础资源向用户提供云服务的实体。云环境包括云数据中心和云服务平台,所述云数据中心包括云服务提供商拥有的大量基础资源(包括计算资源、存储资源和网络资源),云数据中心包括的计算资源可以是大量的计算设备(例如服务器)。装置700可以是云数据中心中用于对神经网络模型进行编译的服务器。装置700也可以是创建在云数据中心中的用于对神经网络模型进行编译的虚拟机。装置700还可以是部署在云数据中心中的服务器或者虚拟机上的软件装置,该软件装置用于对神经网络模型进行编译,该软件装置可以分布式地部署在多个服务器上、或者分布式地部署在多个虚拟机上、或者分布式地部署在虚拟机和服务器上。例如,编译 模块720包括多个子模块时,这多个子模块可以部署在多个服务器上,或分布式地部署在多个虚拟机上,或者分布式地部署在虚拟机和服务器上。
装置700可以由云服务提供商在云服务平台抽象成一种编译神经网络模型的云服务提供给用户,用户在云服务平台购买该云服务后,云环境利用该云服务向用户提编译练神经网络模型的云服务,用户可以通过应用程序接口(application program interface,API)或者通过云服务平台提供的网页界面上传待编译的神经网络模型至云环境,由装置1100接收待编译的神经网络模型,对待编译的神经网络模型进行编译,最终编译得到的神经网络模型由装置700返回至用户所在的边缘设备。
当装置700为软件装置时,装置700也可以单独部署在任意环境的一个计算设备上。
本申请还提供一种如图7所示的装置800,装置800包括处理器802、通信接口803和存储器804。装置800的一种示例为芯片。装置800的另一种示例为计算设备。
处理器802、存储器804和通信接口803之间可以通过总线通信。存储器804中存储有可执行代码,处理器802读取存储器804中的可执行代码以执行对应的方法。存储器804中还可以包括操作系统等其他运行进程所需的软件模块。操作系统可以为LINUX TM,UNIX TM,WINDOWS TM等。
例如,存储器804中的可执行代码用于实现图5所示的方法,处理器802读取存储器804中的该可执行代码以执行图5所示的方法。
其中,处理器802可以为中央处理器(central processing unit,CPU)。存储器804可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。存储器804还可以包括非易失性存储器(2non-volatile memory,2NVM),例如只读存储器(2read-only memory,2ROM),快闪存储器,硬盘驱动器(hard disk drive,HDD)或固态启动器(solid state disk,SSD)。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各 个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (13)

  1. 一种编译神经网络模型的方法,其特征在于,包括:
    获取神经网络的第一模型文件,所述第一模型文件中包括第一信息,所述第一信息用于指示所述神经网络中目标层的输出的张量大小;
    根据所述第一模型文件中所述第一信息,对所述第一模型文件进行编译,以得到所述神经网络的第二模型文件。
  2. 根据权利要求1所述的方法,其特征在于,所述目标层为自定义层。
  3. 根据权利要求2所述的方法,其特征在于,所述第一模型文件中还包括第二信息,所述第二信息用于指示所述目标层为自定义层;
    其中,所述根据所述第一模型文件中所述第一信息,对所述第一模型文件进行编译,包括:
    根据所述第二信息从所述第一模型文件中获取所述第一信息;
    根据所述第一信息注册所述目标层。
  4. 根据权利要求3所述的方法,其特征在于,所述神经网络中不同类型的自定义层对应的第二信息相同。
  5. 根据权利要求3所述的方法,其特征在于,所述神经网络的所有自定义层中,不同类型的自定义层对应的第二信息不同。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述第一模型文件中还包括第三信息,所述第三信息用于在所述神经网络中唯一标识所述目标层;
    其中,所述第二模型文件中包括所述第三信息与所述目标层的实现函数的对应关系。
  7. 一种编译神经网络模型的装置,其特征在于,包括:
    获取模块,用于获取神经网络的第一模型文件,所述第一模型文件中包括第一信息,所述第一信息用于指示所述神经网络中目标层的输出的张量大小;
    编译模块,用于根据所述第一模型文件中的所述第一信息,对所述第一模型文件进行编译,以得到所述神经网络的第二模型文件。
  8. 根据权利要求7所述的装置,其特征在于,所述目标层为自定义层。
  9. 根据权利要求8所述的装置,其特征在于,所述第一模型文件中还包括第二信息,所述第二信息用于指示所述目标层为自定义层;
    其中,所述编译模块具体用于:根据所述第二信息从所述第一模型文件中获取所述第一信息;根据所述第一信息注册所述目标层。
  10. 根据权利要求9所述的装置,其特征在于,所述神经网络中不同类型的自定义层对应的第二信息相同。
  11. 根据权利要求9所述的装置,其特征在于,所述神经网络的所有自定义层中,不同类型的自定义层对应的第二信息不同。
  12. 根据权利要求7至11中任一项所述的装置,其特征在于,所述第一模型文件中还包括第三信息,所述第三信息用于在所述神经网络中唯一标识所述目标层;
    其中,所述第二模型文件中包括所述第三信息与所述目标层的实现函数的对应关系。
  13. 一种计算机可读存储介质,其特征在于,包括指令,当所述指令在处理器上运行时,所述处理器执行如权利要求1至6中任一项所述的方法。
PCT/CN2019/127035 2019-12-20 2019-12-20 编译神经网络模型的方法和装置 WO2021120177A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/127035 WO2021120177A1 (zh) 2019-12-20 2019-12-20 编译神经网络模型的方法和装置
CN201980102747.9A CN114746868A (zh) 2019-12-20 2019-12-20 编译神经网络模型的方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/127035 WO2021120177A1 (zh) 2019-12-20 2019-12-20 编译神经网络模型的方法和装置

Publications (1)

Publication Number Publication Date
WO2021120177A1 true WO2021120177A1 (zh) 2021-06-24

Family

ID=76478197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127035 WO2021120177A1 (zh) 2019-12-20 2019-12-20 编译神经网络模型的方法和装置

Country Status (2)

Country Link
CN (1) CN114746868A (zh)
WO (1) WO2021120177A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024061287A1 (zh) * 2022-09-23 2024-03-28 维沃移动通信有限公司 人工智能ai模型传输方法、装置、终端及介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115392443B (zh) * 2022-10-27 2023-03-10 之江实验室 类脑计算机操作系统的脉冲神经网络应用表示方法及装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563512A (zh) * 2017-08-24 2018-01-09 腾讯科技(上海)有限公司 一种数据处理方法、装置以及存储介质
CN109697500A (zh) * 2018-12-29 2019-04-30 北京中科寒武纪科技有限公司 数据处理方法、装置、电子设备及存储介质
US10424048B1 (en) * 2019-02-15 2019-09-24 Shotspotter, Inc. Systems and methods involving creation and/or utilization of image mosaic in classification of acoustic events

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563512A (zh) * 2017-08-24 2018-01-09 腾讯科技(上海)有限公司 一种数据处理方法、装置以及存储介质
CN109697500A (zh) * 2018-12-29 2019-04-30 北京中科寒武纪科技有限公司 数据处理方法、装置、电子设备及存储介质
US10424048B1 (en) * 2019-02-15 2019-09-24 Shotspotter, Inc. Systems and methods involving creation and/or utilization of image mosaic in classification of acoustic events

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024061287A1 (zh) * 2022-09-23 2024-03-28 维沃移动通信有限公司 人工智能ai模型传输方法、装置、终端及介质

Also Published As

Publication number Publication date
CN114746868A (zh) 2022-07-12

Similar Documents

Publication Publication Date Title
US20210142008A1 (en) Named entity disambiguation using entity distance in a knowledge graph
CN111797893B (zh) 一种神经网络的训练方法、图像分类系统及相关设备
WO2022083536A1 (zh) 一种神经网络构建方法以及装置
WO2021190597A1 (zh) 一种神经网络模型的处理方法以及相关设备
WO2021233342A1 (zh) 一种神经网络构建方法以及系统
WO2022068627A1 (zh) 一种数据处理方法及相关设备
WO2022042713A1 (zh) 一种用于计算设备的深度学习训练方法和装置
WO2022068623A1 (zh) 一种模型训练方法及相关设备
CN111382868A (zh) 神经网络结构搜索方法和神经网络结构搜索装置
WO2023221928A1 (zh) 一种推荐方法、训练方法以及装置
WO2022111617A1 (zh) 一种模型训练方法及装置
WO2021244249A1 (zh) 一种分类器的训练方法、数据处理方法、系统以及设备
WO2021218517A1 (zh) 获取神经网络模型的方法、图像处理方法及装置
WO2024041479A1 (zh) 一种数据处理方法及其装置
WO2024083121A1 (zh) 一种数据处理方法及其装置
WO2023284716A1 (zh) 一种神经网络搜索方法及相关设备
US20240046067A1 (en) Data processing method and related device
WO2021120177A1 (zh) 编译神经网络模型的方法和装置
Kroshchanka et al. A neural-symbolic approach to computer vision
WO2024012360A1 (zh) 一种数据处理方法及相关装置
WO2023197910A1 (zh) 一种用户行为预测方法及其相关设备
WO2023197857A1 (zh) 一种模型切分方法及其相关设备
WO2023185541A1 (zh) 一种模型训练方法及其相关设备
US20230117973A1 (en) Data processing method and apparatus
WO2023050143A1 (zh) 一种推荐模型训练方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19956913

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19956913

Country of ref document: EP

Kind code of ref document: A1