WO2019114517A1 - Neural network model deployment method, prediction method, and device - Google Patents

Neural network model deployment method, prediction method, and device

Info

Publication number
WO2019114517A1
WO2019114517A1 (PCT/CN2018/116958; CN2018116958W)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
target
neural network
network model
network layer
Prior art date
Application number
PCT/CN2018/116958
Other languages
English (en)
French (fr)
Inventor
朱晓龙
王一同
黄凯宁
梅利健
黄生辉
罗镜民
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP18887768.2A (published as EP3614316A4)
Publication of WO2019114517A1
Priority to US16/659,888 (published as US12020142B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: Computing arrangements based on specific computational models
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 20/00: Machine learning
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06N 3/105: Shells for specifying net layout
    • G06N 5/046: Forward inferencing; production systems
    • G06T 1/20: Processor architectures; processor configuration, e.g. pipelining
    • G06V 10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/82: Image or video recognition or understanding using neural networks

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a neural network model deployment method, a prediction method, and a device.
  • Neural networks, especially the Convolutional Neural Network (CNN), are an important branch of deep learning and have become a research hotspot in the fields of speech analysis and image recognition.
  • The practical application of neural networks is generally divided into neural network model training and neural network model prediction.
  • Neural network model training can be understood as learning and adjusting the parameters of the neural network based on a large amount of sample data, to obtain a neural network model with the required function; neural network model prediction can be understood as performing calculations on predicted input data based on the trained neural network model, to determine a prediction result (such as a classification or recognition result) and thereby implement the required function.
  • Conventionally, neural network model training is performed on a server and the model is deployed on the server, so neural network model prediction requires coordination between the terminal device and the server: the terminal device obtains the predicted input data and submits it to the server, and the neural network model deployed on the server performs the prediction. This undoubtedly makes it difficult to meet the speed and real-time requirements of neural network model prediction.
  • Therefore, how to improve the deployment of neural network models, as a basis for improving the speed and real-time performance of neural network model prediction, has become a problem to be considered by those skilled in the art.
  • The embodiments of the present application provide a neural network model deployment method, a prediction method, and a device, to deploy a neural network model on a terminal device and thereby provide a basis for improving the speed and real-time performance of neural network model prediction; the provided neural network model deployment method is also highly versatile.
  • a neural network model deployment method is applied to a terminal device, including:
  • reading an initial neural network model, and obtaining a layer definition of each network layer of the initial neural network model and operating parameters of each network layer;
  • loading corresponding target operating parameters in the corresponding target network layer of each network layer according to the target operating parameters of each network layer, to obtain a target neural network model deployed on the terminal device.
  • the embodiment of the present application further provides a neural network model prediction method, which is applied to a terminal device, and includes:
  • using the predicted input data as an input of the neural network model, and processing the predicted input data with the neural network model to obtain a prediction result.
  • the embodiment of the present application further provides a neural network model deployment device, which is applied to a terminal device, and includes:
  • a reading module configured to read an initial neural network model, obtain a layer definition of each network layer of the initial neural network model, and operating parameters of each network layer;
  • a target network layer implementation module configured to implement a corresponding target network layer of each network layer through a Layer class in the terminal device according to the layer definition of each network layer, so that each target network layer inherits from the Layer class;
  • a network layer connection module configured to connect the target network layers with a Net class;
  • a format conversion module configured to convert the operating parameters of each network layer into a predetermined format, to obtain target operating parameters of each network layer;
  • the parameter loading module is configured to load corresponding target operating parameters in the corresponding target network layer of each network layer according to the target operating parameters of each network layer, to obtain a target neural network model deployed in the terminal device.
  • the embodiment of the present application further provides a neural network model prediction apparatus, which is applied to a terminal device, and includes:
  • a data acquisition module configured to acquire predicted input data by using a data input device of the terminal device;
  • a model calling module configured to invoke a neural network model pre-deployed on the terminal device;
  • a model processing module configured to use the predicted input data as an input of the neural network model, and process the predicted input data by the neural network model to obtain a predicted result.
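The three prediction modules above can be sketched as a minimal flow. The function names and the stand-in model below are hypothetical illustrations, not the patent's implementation (shown in Python for brevity):

```python
# Hypothetical sketch of the prediction apparatus: data acquisition,
# model calling, and model processing. acquire_input and the deployed
# model are stand-ins for illustration only.
def acquire_input():
    # Stand-in for the terminal device's data input device (e.g. camera).
    return [0.1, 0.2, 0.3]

def invoke_deployed_model():
    # Stand-in for invoking the neural network model pre-deployed on the
    # terminal device; here it is just a callable.
    return lambda data: sum(data)

def predict():
    data = acquire_input()           # data acquisition module
    model = invoke_deployed_model()  # model calling module
    return model(data)               # model processing module

result = predict()
```

Because the model is already deployed on the terminal device, this flow involves no round trip to a server.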
  • the embodiment of the present application further provides a terminal device, including: at least one memory and at least one graphics processor; the memory stores a program, and when the graphics processor invokes the program stored in the memory, the following operations are implemented:
  • reading an initial neural network model, and obtaining a layer definition of each network layer of the initial neural network model and operating parameters of each network layer;
  • loading corresponding target operating parameters in the corresponding target network layer of each network layer according to the target operating parameters of each network layer, to obtain a target neural network model deployed on the terminal device.
  • The embodiment of the present application further provides a storage medium storing a program suitable for execution by a graphics processor; when the program is executed by the graphics processor, the steps of the foregoing neural network model deployment method are implemented.
  • the embodiment of the present application further provides a terminal device, including: at least one memory and at least one graphics processor; the memory stores a program, and when the graphics processor invokes the program stored in the memory, the following operations are implemented:
  • using the predicted input data as an input of the neural network model, and processing the predicted input data with the neural network model to obtain a prediction result.
  • The neural network model deployment method and device provided by the embodiments of the present application are based on a defined framework body of a neural network model suitable for the terminal device: the network layers of the initial neural network model are redefined by the Layer class to obtain target network layers inherited from the Layer class; the target network layers are connected through the Net class; and corresponding target operating parameters, converted into a predetermined format, are loaded in each target network layer to obtain the target neural network model.
  • Optionally, the Layer class is an abstract class.
  • The embodiment of the present application deploys the initial neural network model to the terminal device by using the Layer class as the base class of each target network layer of the target neural network model. Thus, for initial neural network models trained by different learning frameworks, the framework body provided by the embodiment of the present application can be used to redefine their network layers with the Layer class as the base class of the target network layers, and to connect the redefined target network layers by the Net class, achieving a general deployment of initial neural network models trained by different learning frameworks to the terminal device.
  • In this way, when neural network model prediction is performed, it can be realized directly on the neural network model deployed on the terminal device, which provides a basis for improving the speed and real-time performance of neural network model prediction; at the same time, the embodiment of the present application can generally deploy neural network models trained by different learning frameworks to the terminal device, reducing the usage limitations of deploying neural network models to terminal devices and improving the versatility of neural network model deployment.
  • FIG. 1 is a flowchart of a neural network model deployment method according to an embodiment of the present application;
  • FIG. 2 is a diagram showing an example of network layers of a CNN inherited from the Layer class;
  • FIG. 3 is another flowchart of a neural network model deployment method according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a framework body of a neural network model according to an embodiment of the present application;
  • FIG. 5 is a flowchart of a neural network model prediction method according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of partial pseudo code of a conventional convolution calculation method;
  • FIG. 7 is a schematic diagram of partial pseudo code of a kernel function of a GPU according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of another partial pseudo code of a kernel function of a GPU according to an embodiment of the present application;
  • FIG. 9 is a structural block diagram of a neural network model deployment apparatus according to an embodiment of the present application;
  • FIG. 10 is another structural block diagram of a neural network model deployment apparatus according to an embodiment of the present application;
  • FIG. 11 is still another structural block diagram of a neural network model deployment apparatus according to an embodiment of the present application;
  • FIG. 12 is a block diagram of the hardware structure of a terminal device according to an embodiment of the present application;
  • FIG. 13 is a structural block diagram of a neural network model prediction apparatus according to an embodiment of the present application.
  • The neural network model deployment method provided by the embodiments of the present application can deploy a server-trained neural network model on a terminal device, and can implement a general deployment, to the terminal device, of neural network models trained by the server based on different learning frameworks, such as Tensorflow (an artificial intelligence learning system developed by Google) and Caffe (a convolutional neural network framework). This avoids the situation in which, when a neural network model is deployed to a terminal device, the terminal device needs to install a mobile learning framework corresponding to the learning framework with which the server trained the neural network model.
  • To this end, the inventors of the present application have defined a novel framework body for neural network models that is suitable for terminal devices, which redefines each network layer of the neural network model through the Layer class; that is, the network layers of the neural network model can all inherit from the Layer class, and the specific methods of each network layer are implemented on the basis of the Layer class. At the same time, the network layers inherited from the Layer class can be connected through the Net (network) class, constituting the network structure of the neural network model.
  • the Layer class is an abstract class.
  • The embodiment of the present application deploys the neural network model to the terminal device by using the Layer class as the base class of the network layers constituting the neural network model. Thus, for neural network models trained by different learning frameworks, the framework body provided by the embodiment of the present application can be used to redefine the network layers with the Layer class as their base class and to connect the network layers with the Net class, realizing a general deployment of neural network models trained by different learning frameworks to terminal devices.
  • FIG. 1 is a flowchart of a neural network model deployment method provided by an embodiment of the present application. The method can be applied to a terminal device, in particular a terminal device running the iOS operating system (the mobile operating system of Apple Inc.), where it can be executed by the GPU (Graphics Processing Unit) of the iOS terminal. Obviously, the neural network model deployment method provided by the embodiment of the present application is not limited to terminal devices whose operating system is iOS; it may also be applied on terminal devices with other operating systems, with the GPU of the terminal device performing the deployment of the neural network model.
  • a neural network model deployment method provided by an embodiment of the present application may include:
  • Step S100: read an initial neural network model, and obtain a layer definition of each network layer of the initial neural network model and operating parameters of each network layer.
  • The initial neural network model can be considered the neural network model to be deployed to the terminal device. As an alternative, the initial neural network model can be a neural network model trained by the server based on a specific learning framework, such as a neural network model trained by the server based on the Tensorflow learning framework (an artificial intelligence learning system developed by Google).
  • Optionally, the initial neural network model may also be a neural network model that has already been deployed on the terminal device but needs to be re-deployed based on the framework body provided by the embodiment of the present application. For example, a neural network model deployed to the terminal device through the mobile version of a specific learning framework can be re-deployed using the neural network model deployment method provided by the embodiment of the present application, so that the neural network model deployed on the terminal device has the characteristics of the framework body provided by the embodiment of the present application.
  • Optionally, the embodiment of the present application can read the initial neural network model, and obtain the layer definition of each network layer of the initial neural network model and the operating parameters of each network layer.
  • Specifically, the embodiment of the present application can read the network structure of the initial neural network model and obtain the layer definition of each network layer; the network structure of the initial neural network model generally includes multiple types of interconnected network layers. For example, the initial neural network model may be a CNN.
  • An optional network structure of a CNN can include: a normalization layer (BatchNorm), a convolution layer (Convolution), a deconvolution layer (Deconvolution), an activation layer (ReLU), an addition layer (Eltwise), an activation layer with parameters (PReLU), a pooling layer (Pooling), a scaling layer (Resize), a depthwise convolution layer (Depthwise Convolution), a concatenation layer (Concat), and so on.
  • This example of a CNN network structure is only optional; the specific network structure can be adjusted according to actual needs.
  • The embodiment of the present application can read the layer definition of each network layer of the initial neural network model; the layer definition of a network layer can describe the layer attributes of the network layer, including the name information, type information, initialization information, etc. of the network layer.
  • the network layer can be distinguished according to the name information.
  • the name information may be a network layer name set for the network layer, a layer serial number of the network layer in the initial neural network model, and the like.
  • the type information is used to describe the type to which the network layer belongs, such as the normalization layer, the convolution layer, the deconvolution layer, the pooling layer, and so on.
  • the initialization information is used to describe the operating parameters of the initialization parameters of the network layer.
  • Optionally, the operating parameters of each network layer may be read from each network layer of the initial neural network model. The operating parameters of each network layer of the initial neural network model can be considered the weights of the network layers, and the operating parameters of the network layers determine the function of the initial neural network model; during training, what is mainly learned and adjusted are the operating parameters of each network layer of the neural network model.
  • Step S110: according to the layer definition of each network layer, implement the target network layer corresponding to each network layer through the Layer class in the terminal device, so that each target network layer inherits from the Layer class.
  • Optionally, inheritance is a mechanism in programming languages: if class A inherits from class B, then class B can be called the base class, class A can be called a derived class, and class A obtains class B's attributes and methods.
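The base-class/derived-class relationship described above can be shown with a minimal example (hypothetical class names, written in Python for brevity):

```python
# Minimal inheritance example: class A inherits from class B, so B is
# the base class, A is a derived class, and A obtains B's attributes
# and methods.
class B:
    def __init__(self):
        self.attribute = "from B"

    def method(self):
        return "B.method"

class A(B):
    pass  # A defines nothing itself, yet inherits everything from B

a = A()
print(a.attribute)  # from B
print(a.method())   # B.method
```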
  • In the embodiment of the present application, each network layer of the initial neural network model may be redefined to obtain, in the terminal device, the target network layer corresponding to each network layer. A target network layer can be considered a network layer in the target neural network model, and the target neural network model can be considered the result of deploying the initial neural network model to the terminal device with the neural network model deployment method provided by the embodiment of the present application.
  • A network layer of the initial neural network model corresponds to a target network layer of the target neural network model. When the network layers of the initial neural network model are redefined to obtain their corresponding target network layers, the embodiment of the present application implements the target network layer corresponding to each network layer through the Layer class, so that the redefined target network layers all inherit from the Layer class.
  • Optionally, the layer definition of a network layer of the initial neural network model may include: name information, type information, initialization information, and the like of the network layer; correspondingly, the Layer class may be preset with a name attribute, a Layer type attribute, and an initialization method (such as an init method).
  • Correspondingly, the target network layer of each network layer is implemented through the Layer class in the terminal device according to the layer definition of each network layer, so that each target network layer inherits from the Layer class. The implementation process may be: for any network layer, using the Layer class as the base class, adjust the preset name attribute of the Layer class according to the name information of the network layer, adjust the preset Layer type attribute of the Layer class according to the type information of the network layer, and perform initialization with the preset initialization method of the Layer class according to the initialization information of the network layer, so as to obtain the target network layer corresponding to the network layer; the obtained target network layers are thus all derived classes of the Layer class. By performing this processing on each network layer in the initial neural network model, the target network layer corresponding to each network layer can be realized.
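The redefinition process above can be sketched as follows. The Layer class here, with its name attribute, type attribute, init method, and load model method, is a hypothetical Python rendering of the framework body, not the patent's actual code:

```python
# Hypothetical sketch of the Layer base class and the redefinition of a
# network layer as a derived class (attribute and parameter names are
# assumptions for illustration).
class Layer:
    def __init__(self, name, layer_type, init_info=None):
        self.name = name        # preset name attribute
        self.type = layer_type  # preset Layer type attribute
        self.params = None
        self.init(init_info or {})

    def init(self, init_info):
        # Preset initialization method: initialize from the
        # initialization information in the layer definition.
        self.init_info = init_info

    def load_model(self, target_params):
        # Preset parameter-loading method: load the target operating
        # parameters converted to the predetermined format.
        self.params = target_params

class Convolution(Layer):
    # A target network layer inherited from the Layer class.
    pass

# Redefine one network layer from its layer definition:
layer_def = {"name": "conv1", "type": "Convolution", "init": {"kernel": 3}}
conv1 = Convolution(layer_def["name"], layer_def["type"], layer_def["init"])
conv1.load_model([0.5, -0.5])
```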
  • Step S120: connect the target network layers with the Net class.
  • Optionally, after the network layers in the initial neural network model are redefined according to the Layer class, the redefined target network layers can be connected by the Net class according to the connection structure of the network layers in the initial neural network model. For example, if the initial neural network model has network layers A1, B1, and C1, with the A1 network layer connected to the B1 network layer and the B1 network layer connected to the C1 network layer, then after step S110 implements the corresponding target network layers A2, B2, and C2 through the Layer class, the Net class connects the target network layer A2 to the target network layer B2 and the target network layer B2 to the target network layer C2, thereby reproducing the connection structure of the network layers of the initial neural network model.
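The A1, B1, C1 to A2, B2, C2 connection can be sketched with a hypothetical Net class (Python, illustrative only; layers are simplified to names):

```python
# Hypothetical Net class connecting target network layers in order,
# mirroring the connection structure of the initial model's layers.
class Net:
    def __init__(self):
        self.layers = []  # target network layers, in connection order

    def connect(self, layer):
        # Append the layer after the previously connected one.
        self.layers.append(layer)
        return self  # allow chained connections

net = Net()
net.connect("A2").connect("B2").connect("C2")
print(net.layers)  # ['A2', 'B2', 'C2']
```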
  • Step S130: convert the operating parameters of each network layer into a predetermined format, to obtain target operating parameters of each network layer.
  • After the target network layers are redefined and connected, the embodiment of the present application needs to load the operating parameters of each target network layer, so that the target neural network model has functions corresponding to the initial neural network model. Optionally, the embodiment of the present application can convert the operating parameters read from each network layer of the initial neural network model into a predetermined format, to obtain the target operating parameters of each network layer; the target operating parameters of a network layer can be understood as the operating parameters to be loaded into the corresponding target network layer.
  • the predetermined format may be a format of a framework body suitable for the neural network model of the terminal device provided by the embodiment of the present application, and the specific format type may be set according to actual conditions.
  • It should be noted that step S130 may be performed at any time after step S100; it does not have to be performed after step S120.
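Step S130 can be sketched as below. The patent does not specify the predetermined format, so this hypothetical Python example simply flattens each layer's raw weights into flat lists of floats keyed by layer name:

```python
# Hypothetical sketch of step S130: convert the operating parameters
# read from each network layer into a predetermined format (here, flat
# lists of floats keyed by layer name; the real format is an assumption).
def convert_parameters(raw_params):
    target_params = {}
    for layer_name, weights in raw_params.items():
        flat = []

        def _flatten(x):
            # Flatten arbitrarily nested weight lists.
            if isinstance(x, list):
                for item in x:
                    _flatten(item)
            else:
                flat.append(float(x))

        _flatten(weights)
        target_params[layer_name] = flat
    return target_params

raw = {"conv1": [[1, 2], [3, 4]], "bn1": [0.5]}
print(convert_parameters(raw))
# {'conv1': [1.0, 2.0, 3.0, 4.0], 'bn1': [0.5]}
```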
  • Step S140: load corresponding target operating parameters in the corresponding target network layer of each network layer according to the target operating parameters of each network layer, to obtain a target neural network model deployed on the terminal device.
  • For the target operating parameters of any network layer, the embodiment of the present application may load them in the target network layer corresponding to that network layer, obtaining a target network layer loaded with its corresponding target operating parameters. After the corresponding target operating parameters are loaded for each target network layer, the target neural network model is obtained: the network layers have been redefined in the terminal device as target network layers, the target network layers have been connected, and the target operating parameters of each target network layer have been loaded. In this way, the neural network model deployment method provided by the embodiment of the present application completes the deployment of the initial neural network model to the terminal device, and the obtained deployment result is the target neural network model.
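Step S140 can be sketched as matching target network layers by name and attaching their converted parameters. The dict-based layers here are a hypothetical simplification:

```python
# Hypothetical sketch of step S140: for each network layer's target
# operating parameters, find the matching target network layer by name
# and load the parameters into it.
def load_parameters(target_layers, target_params):
    for layer in target_layers:
        if layer["name"] in target_params:
            layer["params"] = target_params[layer["name"]]
    return target_layers

target_layers = [
    {"name": "conv1", "params": None},
    {"name": "relu1", "params": None},  # ReLU has no operating parameters
]
loaded = load_parameters(target_layers, {"conv1": [1.0, 2.0]})
```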
  • It can be seen that the neural network model deployment method provided by the embodiment of the present application includes: reading an initial neural network model, and obtaining a layer definition of each network layer of the initial neural network model and operating parameters of each network layer; implementing, according to the layer definition of each network layer, the corresponding target network layer of each network layer through the Layer class in the terminal device, so that each target network layer inherits from the Layer class; connecting the target network layers with the Net class; converting the operating parameters of each network layer into a predetermined format, to obtain target operating parameters of each network layer; and loading the target operating parameters in the corresponding target network layer of each network layer according to the target operating parameters of each network layer, to obtain a target neural network model deployed on the terminal device.
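The full flow recapped above can be condensed into a hypothetical end-to-end sketch (Python for brevity; all names are illustrative, and layers and the model are plain dicts rather than the Layer/Net classes of the framework body):

```python
# Hypothetical end-to-end sketch of the deployment method
# (read -> redefine layers -> connect -> convert -> load).
def deploy(initial_model):
    # Step S100: layer definitions and raw operating parameters.
    layer_defs = initial_model["layers"]
    raw_params = initial_model["params"]

    # Steps S110/S120: redefine each network layer and keep the
    # original connection order.
    target_layers = [{"name": d["name"], "type": d["type"], "params": None}
                     for d in layer_defs]

    # Step S130: convert operating parameters to the predetermined
    # format (here: lists of floats).
    target_params = {name: [float(v) for v in vals]
                     for name, vals in raw_params.items()}

    # Step S140: load each layer's target operating parameters.
    for layer in target_layers:
        layer["params"] = target_params.get(layer["name"])
    return {"net": target_layers}

initial = {
    "layers": [{"name": "conv1", "type": "Convolution"},
               {"name": "relu1", "type": "ReLU"}],
    "params": {"conv1": [1, 2, 3]},
}
target_model = deploy(initial)
```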
  • The neural network model deployment method provided by the embodiment of the present application is based on the defined framework body of a neural network model suitable for the terminal device: the Layer class is used to redefine the network layers of the initial neural network model to obtain target network layers inherited from the Layer class; the target network layers are connected by the Net class; and corresponding target operating parameters, converted into a predetermined format, are loaded in each target network layer to obtain the target neural network model.
  • Optionally, the Layer class is an abstract class.
  • The embodiment of the present application deploys the initial neural network model to the terminal device by using the Layer class as the base class of each target network layer of the target neural network model. Thus, for initial neural network models trained by different learning frameworks, the framework body provided by the embodiment of the present application can be used to redefine their network layers with the Layer class as the base class of the target network layers and to connect the redefined target network layers by the Net class.
  • In this way, neural network model prediction can be realized directly on the neural network model deployed on the terminal device, which provides a basis for improving the speed and real-time performance of neural network model prediction. There is no need to pass data to the server through the network, which avoids network delay, saves hardware cost, avoids the problem of excessive server load, and eliminates the need for the terminal device to be networked, expanding the scope of application. At the same time, the embodiment of the present application can generally deploy neural network models trained by different learning frameworks to the terminal device, reducing the usage limitations of deploying neural network models to terminal devices and improving the versatility of neural network model deployment.
  • Optionally, the Layer class may be preset with a name attribute, a Layer type attribute, and an initialization method, to implement the redefinition of the target network layer corresponding to each network layer; further, the Layer class may also be preset with a parameter-loading method (such as a load model method) for loading the target operating parameters in the redefined target network layers.
  • FIG. 2 shows an example of CNN network layers inherited from the Layer class. As shown in FIG. 2, the input layer (Identity), convolution layer (Convolution), normalization layer (Batch Norm), activation layer (ReLU), addition layer (Eltwise), parameterized activation layer (PReLU), scaling layer (Resize), and concatenation layer (Concat) of the initial neural network model can all inherit from the Layer class; the name attribute (var name), Layer type attribute (var type), and initialization method (init) preset in the Layer class redefine each network layer of the initial neural network model to obtain each target network layer, and the parameter-loading method (load model) preset in the Layer class loads the target operating parameters of each target network layer; further, after the target neural network model is obtained, the encode method of the Layer class can encode each layer's computing task and submit it to the GPU;
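The Layer design described above can be sketched as follows. This is a minimal Python sketch for illustration only, not the patent's actual Objective-C/Metal implementation; the subclass names and the dictionary-based parameter shape are assumptions, while the name attribute, type attribute, init step, load model step, and encode step come from the description above.

```python
# Minimal sketch of the Layer base class: each network layer of the initial
# model is redefined as a subclass of Layer, carrying a name attribute,
# a type attribute, a parameter-loading method for its converted target
# operating parameters, and an encode step that, in the real framework,
# would encode the layer's computing task and submit it to the GPU.

class Layer:
    def __init__(self, name, layer_type):
        self.name = name          # preset name attribute (var name)
        self.type = layer_type    # preset Layer type attribute (var type)
        self.params = None

    def load_model(self, params):
        # preset parameter-loading method: attach the target operating
        # parameters (already converted to the predetermined format)
        self.params = params

    def encode(self, x):
        # in the real framework this submits work to the GPU;
        # here it just computes on the CPU
        raise NotImplementedError

class ReLULayer(Layer):
    def __init__(self, name):
        super().__init__(name, "ReLU")

    def encode(self, x):
        return [max(0.0, v) for v in x]

class ScaleLayer(Layer):  # stands in for any parameterized layer
    def __init__(self, name):
        super().__init__(name, "Scale")

    def encode(self, x):
        return [v * self.params["scale"] for v in x]
```

A convolution or batch-normalization layer would follow the same pattern, overriding the initialization, parameter-loading, and encode steps.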
  • FIG. 3 is another flowchart of a method for deploying a neural network model provided by an embodiment of the present application.
  • the method may include:
  • Step S200: Read an initial neural network model to obtain the layer definition of each network layer of the initial neural network model, and the operating parameters of each network layer; the layer definition of a network layer includes the name information, type information, and initialization information of the network layer.
  • Step S210: For each network layer of the initial neural network model, take the Layer class as the base class: in the terminal device, adjust the preset name attribute of the Layer class according to the name information of the network layer, adjust the preset Layer type attribute of the Layer class according to the type information of the network layer, and initialize the preset initialization method of the Layer class according to the initialization information of the network layer, so as to obtain the target network layer corresponding to each network layer, such that each obtained target network layer is a derived class of the Layer class.
  • Step S220: Connect the target network layers with the Net class according to the connection structure of the network layers of the initial neural network model, so that the connection structure of the target network layers corresponds to the connection structure of the network layers of the initial neural network model.
  • Step S230: Convert the operating parameters of each network layer into a predetermined format to obtain the target operating parameters of each network layer.
  • Step S240: According to the target operating parameters of each network layer, load the corresponding target operating parameters into the target network layer corresponding to each network layer through the parameter-loading method preset by the Layer class, to obtain the target neural network model deployed on the terminal device.
  • the Net class can also implement the following functions:
  • the embodiment of the present application can add a network layer to the target neural network model through the add-network-layer method (such as an add layer method) preset by the Net class, and the added network layer also inherits from the Layer class;
  • the embodiment of the present application can read and load the parameters required for running the target neural network model through the read-parameter method (such as the load model method) preset by the Net class;
  • the embodiment of the present application can run the target neural network model to perform forward prediction through the prediction method (such as the forward method) preset by the Net class.
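The Net functions just listed can be sketched as follows. This is a hedged Python illustration, not the patent's implementation: the add_layer, load_model, and forward names mirror the methods described above, while the toy Bias/ReLU layers and the parameter dictionary are invented for the example.

```python
# Minimal sketch of the Net class: it holds the connected layers
# (var Layers), exposes an add-layer method, a parameter-loading method,
# and a forward method that runs the layers for forward prediction.

class Layer:
    def __init__(self, name, layer_type):
        self.name, self.type, self.params = name, layer_type, None
    def load_model(self, params):
        self.params = params
    def encode(self, x):
        raise NotImplementedError

class ReLU(Layer):
    def __init__(self, name): super().__init__(name, "ReLU")
    def encode(self, x): return [max(0.0, v) for v in x]

class Bias(Layer):
    def __init__(self, name): super().__init__(name, "Bias")
    def encode(self, x): return [v + self.params["b"] for v in x]

class Net:
    def __init__(self):
        self.layers = []                 # var Layers: the connected Layer objects

    def add_layer(self, layer):
        assert isinstance(layer, Layer)  # added layers must inherit from Layer
        self.layers.append(layer)

    def load_model(self, params_by_name):
        # read and load the parameters each layer needs to run
        for layer in self.layers:
            layer.load_model(params_by_name.get(layer.name))

    def forward(self, x):
        # run the model: forward prediction through the connected layers
        for layer in self.layers:
            x = layer.encode(x)
        return x

net = Net()
net.add_layer(Bias("bias1"))
net.add_layer(ReLU("relu1"))
net.load_model({"bias1": {"b": -1.0}})
print(net.forward([0.5, 2.0]))   # [0.0, 1.0]
```

Here the connection structure is a simple chain; a general model would record the layer graph, but the add/load/forward division of responsibilities is the same.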
  • FIG. 4 is a schematic diagram of the framework body of the neural network model suitable for the terminal device provided by the embodiment of the present application. As shown in FIG. 4, the Layer class may be preset with a name attribute (var name), a Layer type attribute (var type), an initialization method (init), a parameter-loading method (load model), and an encode method (used to encode the computing task of the layer and submit it to the GPU); the Net class is preset with a Layer class identifier (var Layers, used to indicate the connected Layer classes), an add Layer method for adding layers, a load model method for loading model parameters, a forward method for forward prediction, and so on; meanwhile, the CNN network layers Identity, Convolution, BatchNorm, ReLU, Eltwise, PReLU, Resize, and Concat all inherit from the Layer class.
  • the embodiment of the present application can perform general terminal-device deployment of an initial neural network model trained using the Tensorflow learning framework, an initial neural network model trained using the Caffe (convolutional neural network framework) learning framework, and the like;
  • taking the deployment to the terminal device of an initial neural network model trained using the Tensorflow learning framework as an example, the embodiment of the present application can read the initial neural network model trained by the Tensorflow learning framework to obtain a model definition file (model.para) and a model parameter file (model.bin); the model definition file may record the layer definition of each network layer of the initial neural network model, and the model parameter file records the target operating parameters obtained after the operating parameters of the network layers are converted into a predetermined format;
  • further, the target network layer corresponding to each network layer is implemented by the Layer class in the terminal device (such as in the GPU of the terminal device), so that each target network layer inherits from the Layer class, and the target network layers are connected through the Net class, so that the connection structure of the target network layers corresponds to the connection structure of the network layers of the initial neural network model; based on the load model method of the Layer class, the target operating parameters of each network layer recorded in the model parameter file are loaded into the target network layer corresponding to each network layer to obtain the target neural network model, thereby realizing the deployment to the terminal device of the initial neural network model trained using the Tensorflow learning framework.
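The conversion of operating parameters into a predetermined format and their storage in a parameter file like model.bin can be sketched as follows. The patent does not specify the binary layout, so this Python sketch assumes one possible format: little-endian float32 arrays, each prefixed by a count, with a companion index from layer name to byte offset.

```python
# Sketch of packing each layer's operating parameters into a predetermined
# binary format (model.bin-style) and reading them back for loading into
# the target network layers. The layout here is an assumption, not the
# patent's actual file format.

import struct

def pack_params(params):
    """params: dict mapping layer name -> list of float weights."""
    blob, index, offset = b"", {}, 0
    for name, values in params.items():
        chunk = struct.pack("<I%df" % len(values), len(values), *values)
        index[name] = offset       # where this layer's parameters start
        blob += chunk
        offset += len(chunk)
    return blob, index

def unpack_params(blob, index):
    out = {}
    for name, offset in index.items():
        (count,) = struct.unpack_from("<I", blob, offset)
        out[name] = list(struct.unpack_from("<%df" % count, blob, offset + 4))
    return out

# values chosen to be exactly representable in float32
blob, index = pack_params({"conv1": [0.5, -1.25], "bias1": [3.0]})
print(unpack_params(blob, index))  # {'conv1': [0.5, -1.25], 'bias1': [3.0]}
```

In the real framework the index would correspond to the layer definitions in model.para, and the unpacked arrays would be handed to each layer's load model method.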
  • it can be seen that the embodiment of the present application can redefine the network layers of the initial neural network model using the Layer class, so that each redefined target network layer inherits from the Layer class; after the target network layers are connected through the Net class, each target network layer can be loaded with the corresponding target operating parameters converted into the predetermined format, so that the initial neural network model to be deployed on the terminal device is reconstructed and the corresponding target neural network model is deployed on the terminal device;
  • through a unified framework body of the neural network model suitable for the terminal device, the neural network model deployment method can redefine on the terminal device initial neural network models trained by different learning frameworks, realizing general deployment to the terminal device of initial neural network models trained on different learning frameworks and reducing the usage limitations of neural network model deployment; moreover, when performing neural network model prediction, prediction can be implemented directly based on the neural network model deployed on the terminal device, which provides a basis for improving the prediction speed and real-time performance of the neural network model.
  • the neural network model deployment method provided by the embodiment of the present application can be applied to the GPU of a terminal device; after the neural network model is deployed to the terminal device, the GPU can be utilized for calculation (e.g., GPU parallel operation is exploited to the greatest extent), so that neural network model prediction is implemented directly on the terminal device, the speed and real-time performance of forward prediction are improved, and forward prediction of the neural network model based on the GPU of the terminal device is realized.
  • FIG. 5 is a flowchart of a neural network model prediction method provided by an embodiment of the present application, and the neural network model prediction method shown in FIG. 5 is applicable to a terminal device.
  • the neural network model prediction method may include:
  • Step S300: Acquire predicted input data through a data input device of the terminal device.
  • optionally, the predicted input data may be the input data required by the neural network model for prediction, and it differs with the function of the neural network model: for a neural network model with a speech analysis function, the predicted input data may be speech features of a speech segment; for a neural network model with an image recognition function, the predicted input data may be image features of an image; and so on. Obviously, the predicted input data may also be a set of data;
  • the data input device of the terminal device may be a device having the capability of writing data to the terminal device, such as a mouse, a keyboard, a network interface, or a touch screen.
  • Step S310: Invoke a neural network model pre-deployed on the terminal device.
  • optionally, the neural network model pre-deployed on the terminal device may be deployed on the terminal device according to the neural network model deployment method provided by the embodiment of the present application; in the pre-deployed neural network model, each network layer is implemented by the Layer class so that each network layer inherits from the Layer class, the network layers are connected by the Net class, and each network layer is loaded with the corresponding operating parameters in the predetermined format.
  • Step S320: Use the predicted input data as the input of the neural network model, and process the predicted input data by the neural network model to obtain a prediction result.
  • the predicted input data acquired by the terminal device may be used as an input of the neural network model, and the predicted input data is processed by the neural network model to obtain a prediction result;
  • the predicted result may be a classification result or a recognition result.
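Steps S300 to S320 can be sketched as a short script. This is an illustrative Python sketch only: get_input_data and DeployedModel are hypothetical stand-ins for the terminal device's data input device and its pre-deployed network, and the "classifier" is a toy.

```python
# Sketch of the prediction flow: S300 acquire predicted input data,
# S310 invoke the pre-deployed model, S320 run the model on the input
# to obtain a prediction result (e.g. a classification result).

def get_input_data():
    # stands in for reading from the data input device (touch screen, etc.)
    return [0.2, 0.7, 0.1]

class DeployedModel:
    # stands in for the Net built by the deployment method
    def forward(self, features):
        # toy "classifier": index of the largest feature value
        return max(range(len(features)), key=lambda i: features[i])

model = DeployedModel()          # S310: invoke the pre-deployed model
data = get_input_data()          # S300: acquire predicted input data
print(model.forward(data))       # S320: prediction result -> 1
```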
  • when performing neural network model prediction, the neural network model prediction method provided by the embodiment of the present application can process the predicted input data directly based on the neural network model pre-deployed to the terminal device and obtain the prediction result directly at the terminal device, thereby avoiding the interaction process between the terminal device and the server during neural network model prediction and greatly improving the speed and real-time performance of neural network model prediction.
  • optionally, the embodiment of the present application may improve the prediction speed based on parallel operation using the GPU; taking a CNN as the neural network model as an example, in the GPU-based forward prediction process of the neural network model, the embodiment of the present application can also optimize the implementation of the convolution layers of the CNN-form neural network model.
  • in the convolution layer implementation, a relatively efficient method is the im2col+GEMM scheme: first, im2col (an algorithm that converts the original image data into a matrix) is used to convert the feature maps and filters into matrices, and then GEMM (General Matrix Multiplication) is called to compute the inner product of the two matrices, so that the convolution operation is converted into a matrix multiplication operation;
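The im2col+GEMM scheme can be sketched as follows. This is a simplified Python illustration (single channel, stride 1, no padding) rather than the patent's GPU implementation: im2col unrolls each filter-sized window of the feature map into a matrix row, the filter is flattened into a column, and the convolution becomes one matrix multiplication.

```python
# Sketch of im2col + GEMM: convolution as a matrix multiplication.

def im2col(image, kh, kw):
    # unroll every kh x kw window into one row: (outH*outW) x (kh*kw)
    h, w = len(image), len(image[0])
    rows = []
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            rows.append([image[y + i][x + j]
                         for i in range(kh) for j in range(kw)])
    return rows

def gemm(a, b):
    # plain dense matrix multiply
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def conv_im2col(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    cols = im2col(image, kh, kw)
    flat = [[kernel[i][j]] for i in range(kh) for j in range(kw)]  # (kh*kw) x 1
    out_w = len(image[0]) - kw + 1
    flat_out = [v[0] for v in gemm(cols, flat)]
    # fold the flat result back into the output feature map
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv_im2col(image, kernel))  # [[6, 8], [12, 14]]
```

With multiple input/output channels the same idea applies: im2col produces a (outH·outW) × (kh·kw·inC) matrix and the filters form a (kh·kw·inC) × outC matrix, so one GEMM call yields all output channels.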
  • further, the embodiment of the present application can maximize the parallelism of the GPU by using the GPU for parallel scheduling over the width, height, and output channels of the convolution layer, so as to reduce the number of loop levels in the convolution layer implementation; for example, the original six nested loops of the convolution layer can be reduced to three;
  • the filter is: filter[outputChannels][filterHeight][filterWidth][inputChannels];
  • the output image is: output[outputHeight][outputWidth][outputChannels];
  • calculating the output with the conventional convolution method requires six nested loops, whose pseudocode can be as shown in FIG. 6; in the GPU kernel function pseudocode of the embodiment of the present application, the original three outer loops (loop 1, loop 2, loop 3) are computed and scheduled in parallel by the GPU, leaving only the three inner loops of calc_output(oc, y, x).
  • further, the embodiment of the present application can make full use of the GPU's ability to hide memory latency with computation by increasing the number of target points computed consecutively in one loop iteration; for example, eight target points are computed consecutively in one iteration, increasing the compute cycles per iteration and thereby hiding the latency of memory reads and writes;
  • further, when the first target point is calculated, all the pixels corresponding to the first target point are read; when a non-first target point is calculated, since adjacent target points share duplicated pixels, the pixels that are the same as those of the previously calculated target point can be reused, and only the pixels that differ from the previously calculated target point are re-read, so that the number of memory reads is reduced. For example, for a 3x3 convolution, calculating one target point originally requires reading 9 adjacent pixels, and two adjacent target points share 6 pixels; since the present application calculates 8 adjacent target points consecutively, 9 pixels need to be read only for the first point, and each subsequent point reuses 6 pixels from the previous point and re-reads only 3 pixels, so that the number of memory reads is reduced by more than half;
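The memory-read arithmetic above can be checked in a few lines. The sketch assumes a 3x3 convolution with stride 1 and 8 horizontally adjacent target points, exactly as in the example.

```python
# Naively, each target point reads all 9 pixels of its 3x3 window; with
# reuse, the first point reads 9 pixels and each of the remaining 7 points
# re-reads only the 3 pixels not shared with its left neighbor.

points, window, fresh = 8, 9, 3

naive_reads = points * window                  # 8 * 9 = 72
reused_reads = window + (points - 1) * fresh   # 9 + 7 * 3 = 30

print(naive_reads, reused_reads)               # 72 30
print(reused_reads < naive_reads / 2)          # True: cut by more than half
```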
  • the implementation speed of the convolution layer in the embodiment of the present application is compared with conventional schemes such as Caffe2, Tensorflow, and ncnn, as shown in Table 1 below; from the results it can be seen that the embodiment of the present application can achieve a 4-10x speedup in the convolution layer implementation.
  • as an example, the embodiment of the present application can be used to deploy a picture art filter model to an iOS mobile terminal (such as a smartphone running the iOS operating system) and to perform forward prediction on the iOS mobile terminal based on the deployed picture art filter model; thus, a picture art filter model trained using Torch (a scientific computing framework with extensive support for machine learning algorithms) is converted into the iOS mobile terminal model format, so that when the picture art filter model is used to add an artistic filter effect to an image, the effect is added to the acquired image on the iOS mobile terminal directly based on the picture art filter model deployed on the iOS mobile terminal. Taking a CNN network model as an example, the specific application process can be as follows:
  • the iOS mobile terminal reads the Torch-trained picture art filter model and converts it into the predetermined format (such as a binary format) to obtain the definitions of the network layers of the converted picture art filter model and the target operating parameters, i.e., the model definition file model.para and the model parameter file model.bin; model.para defines the structure of the CNN-form picture art filter model, including the definition of each network layer of the picture art filter model, and the model.bin file includes the target operating parameters corresponding to each network layer;
  • the iOS mobile terminal invokes the framework body of a predetermined CNN network model in the GPU; the framework body defines each network layer of the CNN network model by the Layer class, and the network layers defined by the Layer class are connected by the Net class;
  • the model definition file model.para is loaded, the target network layers of the picture art filter model are implemented by the Layer class, and the target network layers are connected by the Net class; the model parameter file model.bin is loaded on the iOS mobile terminal, and each target network layer is loaded with the target operating parameters of the corresponding network layer, so as to construct the redefined picture art filter model;
  • thereafter, the iOS mobile terminal can add an artistic filter effect to a picture in the GPU according to the redefined picture art filter model.
  • it can be seen that the embodiment of the present application supports the deployment to the terminal device of neural network models trained by different training frameworks, such as quick deployment to the terminal device of neural network models trained on learning frameworks like Torch, Tensorflow, and Caffe; this reduces the usage limitations of neural network model deployment, enables general deployment of neural network models trained by different learning frameworks, and improves the versatility of neural network model deployment.
  • moreover, the embodiment of the present application can be implemented on the iOS system without relying on third-party libraries, based only on iOS-native Metal (an application programming interface) and the Objective-C language (a programming language), so that the data size of the library can be greatly reduced; at the same time, the embodiment of the present application can support a rich set of layers, with network layers customized based on the Layer class, providing high extensibility for the network layers of the neural network model.
  • in addition, the embodiment of the present application can optimize the convolution layer based on GPU parallel operation, improving the speed of forward prediction of the CNN network model.
  • the neural network model deployment apparatus described below can be regarded as the program modules required by the terminal device to implement the neural network model deployment method provided by the embodiment of the present application; the neural network model deployment apparatus described below and the neural network model deployment method described above may be referred to in correspondence with each other.
  • FIG. 9 is a structural block diagram of a neural network model deployment apparatus according to an embodiment of the present application.
  • the apparatus may be applied to a terminal device, in particular to the GPU of a terminal device running the iOS operating system. The neural network model deployment apparatus provided by the embodiment of the present application may include:
  • the reading module 100 is configured to read an initial neural network model, obtain a layer definition of each network layer of the initial neural network model, and operating parameters of each network layer;
  • the target network layer implementation module 200 is configured to implement a corresponding target network layer of each network layer by using a Layer class in the terminal device according to the layer definition of each network layer, so that each target network layer inherits from the Layer class;
  • the network layer connection module 300 is configured to connect each target network layer by using a Net class
  • a format conversion module 400 configured to convert operating parameters of each network layer into a predetermined format, to obtain target operating parameters of each network layer
  • the parameter loading module 500 is configured to load corresponding target operating parameters in the corresponding target network layer of each network layer according to target operating parameters of each network layer, to obtain a target neural network model deployed in the terminal device.
  • optionally, the layer definition of a network layer includes the name information, type information, and initialization information of the network layer; correspondingly, that the target network layer implementation module 200 is configured to implement, respectively according to the layer definition of each network layer, the target network layer corresponding to each network layer by the Layer class in the terminal device, so that each target network layer inherits from the Layer class, specifically includes: taking the Layer class as the base class, adjusting the preset name attribute of the Layer class in the terminal device according to the name information of the network layer, adjusting the preset Layer type attribute of the Layer class according to the type information of the network layer, and initializing the preset initialization method of the Layer class according to the initialization information of the network layer, so as to obtain the target network layer corresponding to each network layer, such that each obtained target network layer is a derived class of the Layer class.
  • optionally, that the parameter loading module 500 is configured to load, according to the target operating parameters of each network layer, the corresponding target operating parameters into the target network layer corresponding to each network layer to obtain the target neural network model deployed on the terminal device, specifically includes: loading the corresponding target operating parameters into the target network layer corresponding to each network layer through the parameter-loading method preset by the Layer class, to obtain the target neural network model deployed on the terminal device.
  • the network layer connection module 300 is configured to connect each target network layer by using a Net class, and specifically includes:
  • each target network layer is connected by a Net class, so that the connection structure of each target network layer corresponds to the connection structure of each network layer of the initial neural network model.
  • FIG. 10 is another structural block diagram of a neural network model deployment apparatus provided by an embodiment of the present application. As shown in FIG. 9 and FIG. 10, the apparatus may further include:
  • the Net application module 600 is configured to: add a network layer to the target neural network model through the add-network-layer method preset by the Net class, wherein the added network layer inherits from the Layer class; and/or read and load the parameters required for running the target neural network model through the read-parameter method preset by the Net class; and/or run the target neural network model through the prediction method preset by the Net class to perform forward prediction.
  • the initial neural network model is a CNN network model
  • FIG. 11 shows another structural block diagram of the neural network model deployment apparatus provided by the embodiment of the present application; as shown in FIG. 9 and FIG. 11, the apparatus may further include:
  • the GPU parallel scheduling module 700 is configured to use the GPU of the terminal device to perform parallel scheduling on the width, height, and output channel of the convolution layer of the CNN network model to reduce the number of loop layers implemented by the convolution layer;
  • the target point adding module 800 is configured to increase the number of target points continuously calculated in one cycle
  • the pixel point reading module 900 is configured to read all the pixels corresponding to the first target point when calculating the first target point, and, when calculating a non-first target point, to reuse the pixels that are the same as those of the previously calculated target point and re-read the pixels that differ from the previously calculated target point.
  • the GPU parallel scheduling module 700 can also be separately applied to the apparatus shown in FIG. 8 or FIG.
  • after deployment, the embodiment of the present application can perform neural network model prediction using the target neural network model deployed on the terminal device; for example, the terminal device can acquire input data and invoke the target neural network model deployed on the terminal device (such as a target neural network model deployed in the GPU of the terminal device) to operate on the input data and determine the classification or recognition result of the input data.
  • FIG. 12 shows a hardware structure of the terminal device.
  • the terminal device may include: at least one graphics processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4; the graphics processor 1, the communication interface 2, and the memory 3 communicate with each other through the communication bus 4.
  • the memory stores a program, and when the graphics processor calls the program stored in the memory, the steps of the neural network model deployment method described above can be implemented.
  • the program can be used to:
  • reading an initial neural network model to obtain the layer definition of each network layer of the initial neural network model, and the operating parameters of each network layer;
  • implementing, respectively according to the layer definitions of the network layers, the target network layer corresponding to each network layer by the Layer class in the terminal device, so that each target network layer inherits from the Layer class; connecting the target network layers by the Net class; converting the operating parameters of each network layer into a predetermined format to obtain the target operating parameters of each network layer; and loading the corresponding target operating parameters into the target network layer corresponding to each network layer, to obtain the target neural network model deployed on the terminal device.
  • for the refinement functions and extended functions of the program, reference may be made to the corresponding parts in the foregoing description.
  • the embodiment of the present application further provides a storage medium (such as a memory) storing a program suitable for execution by a graphics processor; when the program is executed by the graphics processor, the steps of the neural network model deployment method described above are implemented.
  • the program can be used to:
  • reading an initial neural network model to obtain the layer definition of each network layer of the initial neural network model, and the operating parameters of each network layer;
  • implementing, respectively according to the layer definitions of the network layers, the target network layer corresponding to each network layer by the Layer class in the terminal device, so that each target network layer inherits from the Layer class; connecting the target network layers by the Net class; converting the operating parameters of each network layer into a predetermined format to obtain the target operating parameters of each network layer; and loading the corresponding target operating parameters into the target network layer corresponding to each network layer, to obtain the target neural network model deployed on the terminal device.
  • for the refinement functions and extended functions of the program, reference may be made to the corresponding parts in the foregoing description.
  • the embodiment of the present application further provides a neural network model prediction apparatus; after the neural network model deployment apparatus provided by the embodiment of the present application deploys the neural network model to the terminal device, the neural network model deployed on the terminal device implements neural network model prediction.
  • FIG. 13 is a structural block diagram of a neural network model predicting apparatus provided by an embodiment of the present application, and the apparatus is applicable to a terminal device, in particular, a GPU in a terminal device applicable to an IOS operating system.
  • the neural network model prediction apparatus provided by the embodiment may include:
  • the data obtaining module 10 is configured to obtain predicted input data by using a data input device of the terminal device;
  • the model invoking module 20 is configured to invoke a neural network model pre-deployed by the terminal device;
  • the model processing module 30 is configured to use the predicted input data as an input of the neural network model, and the predicted input data is processed by the neural network model to obtain a predicted result.
  • optionally, the neural network model may be a CNN network model; that the model processing module 30 is configured to process the predicted input data by the neural network model may specifically include:
  • the width, height and output channel of the convolution layer of the CNN network model are concurrently scheduled using the GPU of the terminal device to reduce the number of loop layers implemented by the convolution layer.
  • optionally, that the model processing module 30 is configured to process the predicted input data by the neural network model may further include: increasing the number of target points computed consecutively in one loop iteration; and reading all the pixels corresponding to the first target point when calculating the first target point, reusing, when calculating a non-first target point, the pixels that are the same as those of the previously calculated target point, and re-reading the pixels that differ from the previously calculated target point.
  • the embodiment of the present application further provides a terminal device, whose hardware structure may be as shown in FIG. 12; the terminal device may include at least one memory and at least one graphics processor; the memory stores a program, and when the graphics processor invokes the program stored in the memory, the steps of the neural network model prediction method described above are implemented.
  • the program can be used to:
  • acquiring predicted input data through the data input device of the terminal device; invoking the neural network model pre-deployed on the terminal device; and using the predicted input data as the input of the neural network model, and processing the predicted input data by the neural network model to obtain a prediction result.
  • the embodiment of the present application further provides a storage medium storing a program suitable for execution by a graphics processor, where the program is used to: acquire predicted input data through the data input device of the terminal device; invoke the neural network model pre-deployed on the terminal device; and use the predicted input data as the input of the neural network model, processing the predicted input data by the neural network model to obtain a prediction result.
  • for the refinement functions and extended functions of the program, reference may be made to the corresponding parts in the foregoing description.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • the software module can be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Combined Controls Of Internal Combustion Engines (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)

Abstract

A neural network model deployment method, prediction method, and device. The method includes: reading an initial neural network model to obtain the layer definition of each network layer of the initial neural network model and the operating parameters of each network layer (S100); implementing, respectively according to the layer definitions of the network layers, the target network layer corresponding to each network layer by the Layer class in the terminal device, so that each target network layer inherits from the Layer class (S110); connecting the target network layers by the Net class (S120); converting the operating parameters of each network layer into a predetermined format to obtain the target operating parameters of each network layer (S130); and loading, respectively according to the target operating parameters of each network layer, the corresponding target operating parameters into the target network layer corresponding to each network layer, to obtain the target neural network model deployed on the terminal device (S140). The method enables deployment of a neural network model to a terminal device and improves the versatility of deploying neural network models to terminal devices.

Description

一种神经网络模型部署方法、预测方法及设备
本申请要求于2017年12月13日提交的、申请号为201711330928.6、发明名称为“一种神经网络模型部署方法、预测方法及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据处理技术领域,具体涉及一种神经网络模型部署方法、预测方法及设备。
背景技术
神经网络,特别是卷积神经网络(CNN,Convolutional Neural Network),作为深度学习的一个重要分支,已成为语音分析、图像识别等领域的研究热点。神经网络的实际应用一般分为神经网络模型训练,和神经网络模型预测。
神经网络模型训练可以理解为是,基于大量的样本数据,对神经网络的参数进行学习和调整,得到具备需求功能的神经网络模型;神经网络模型预测可以理解为是,依据训练好的神经网络模型对预测输入数据进行运算,确定预测结果(如分类或识别结果),实现需求功能。
一般而言,神经网络模型训练是在服务器进行,并部署在服务器,这就使得神经网络模型预测需由终端设备和服务器协同进行,如神经网络模型训练好后,由终端设备获取预测输入数据并提交到服务器,由在服务器部署的神经网络模型进行神经网络模型预测,这无疑使神经网络模型预测难以满足速度和实时性方面的要求;因此如何改进神经网络模型的部署方式,以为提升神经网络模型预测的速度和实时性提供基础,成为了本领域技术人员需要考虑的问题。
发明内容
有鉴于此,本申请实施例提供一种神经网络模型部署方法、预测方法及设备,以将神经网络模型部署在终端设备,为提升神经网络模型预测的速度和实时性提供基础,并且所提供的神经网络模型部署方法具有通用性高的特点。
为实现上述目的,本申请实施例提供如下技术方案:
一种神经网络模型部署方法,应用于终端设备中,包括:
读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类;
以Net类连接各目标网络层;
将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
本申请实施例还提供一种神经网络模型预测方法,应用于终端设备中,包括:
通过所述终端设备的数据输入装置,获取预测输入数据;
调用所述终端设备预部署的神经网络模型;
将所述预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
本申请实施例还提供一种神经网络模型部署装置,应用于终端设备中,包括:
读取模块,用于读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
目标网络层实现模块,用于分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类;
网络层连接模块,用于以Net类连接各目标网络层;
格式转换模块,用于将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
参数加载模块,用于分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
本申请实施例还提供一种神经网络模型预测装置,应用于终端设备中,包括:
数据获取模块,用于通过所述终端设备的数据输入装置,获取预测输入数据;
模型调用模块,用于调用所述终端设备预部署的神经网络模型;
模型处理模块,用于将所述预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
本申请实施例还提供一种终端设备,包括:至少一个存储器和至少一个图形处理器;所述存储器存储有程序,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类;
以Net类连接各目标网络层;
将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
本申请实施例还提供一种存储介质,所述存储介质存储有适于图形处理器执行的程序,所述程序被所述图形处理器执行时,实现上述所述的神经网络模型部署方法的各个步骤。
本申请实施例还提供一种终端设备,包括:至少一个存储器和至少一个图形处理器;所述存储器存储有程序,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
通过所述终端设备的数据输入装置,获取预测输入数据;
调用所述终端设备预部署的神经网络模型;
将所述预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
基于上述技术方案,本申请实施例提供的神经网络模型部署方法及设备,基于所定义的适于终端设备的神经网络模型的框架主体,以Layer类对初始神经网络模型的各网络层进行重定义,得到继承自Layer类的各目标网络层;并且通过Net类连接各目标网络层,在各目标网络层中加载转换为预定格式的相应的目标运行参数,得到目标神经网络模型。Layer类是一个抽象类,本申请实施例通过将Layer类作为目标神经网络模型的各目标网络层的基类,从而对于不同学习框架训练的初始神经网络模型,在将初始神经网络模型部署至终端设备上时,可使用本申请实施例提供的框架主体,以Layer类为目标网络层的基类进行初始神经网络模型的各网络层的重定义,并以Net类连接重定义的各目标网络层,实现不同学习框架训练的初始神经网络模型至终端设备的通用部署,从而在进行神经网络模型预测时,可直接基于终端设备部署的神经网络模型实现神经网络模型预测,为提升神经网络模型预测的速度和实时性提供了基础;同时,本申请实施例可对不同学习框架训练的神经网络模型进行至终端设备的通用部署,降低了神经网络模型至终端设备部署的使用局限,提升了神经网络模型部署的通用性。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为本申请实施例提供的神经网络模型部署方法的流程图;
图2为继承自Layer类的CNN的网络层示例图;
图3为本申请实施例提供的神经网络模型部署方法的另一流程图;
图4为本申请实施例提供的神经网络模型的框架主体示意图;
图5为本申请实施例提供的神经网络模型预测方法的流程图;
图6为常规的卷积计算方式的部分伪码示意图;
图7为本申请实施例提供的GPU的kernel函数的部分伪码示意图;
图8为本申请实施例提供的GPU的kernel函数的另一部分伪码示意图;
图9为本申请实施例提供的神经网络模型部署装置的结构框图;
图10为本申请实施例提供的神经网络模型部署装置的另一结构框图;
图11为本申请实施例提供的神经网络模型部署装置的再一结构框图;
图12为本申请实施例提供的终端设备的硬件结构框图;
图13为本申请实施例提供的神经网络模型预测装置的结构框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供的神经网络模型部署方法,可将服务器训练好的神经网络模型部署在终端设备,同时可实现服务器基于不同学习框架训练的神经网络模型至终端设备的通用部署;如可实现服务器基于Tensorflow(Tensorflow是谷歌研发的一种人工智能学习系统)、caffe(卷积神经网络框架)等不同学习框架训练的神经网络模型至终端设备的通用部署,避免在进行神经网络模型至终端设备的部署时,终端设备需要安装与服务器训练神经网络模型的学习框架相应的移动版的学习框架的情况发生;
为提升神经网络模型至终端设备部署的通用性,本申请的发明人定义了适于终端设备的神经网络模型的一种新型的框架主体,该框架主体由Layer(层)类重定义神经网络模型的各网络层,即神经网络模型的各网络层可继承自Layer类,并由Layer类实现各网络层的具体方法;同时,继承自Layer类的神经网络模型的各网络层可通过Net(网络)类连接,组成神经网络模型的网络结构。其中,Layer类是一个抽象类,本申请实施例通过将Layer类作为构建神经网络模型的网络层的基类,从而对于不同学习框架训练的神经网络模型,在将神经网络模型部署至终端设备上时,可使用本申请实施例提供的框架主体,以Layer类为网络层的基类进行神经网络模型的各网络层的重定义,并以Net类连接各网络层,实现不同学习框架训练的神经网络模型至终端设备的通用部署。
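为直观说明该框架主体的组织方式,下面给出一个极简的示意性草图(以Python表意,仅为假设性示例:其中的Layer、Net类与init、load model、encode、forward方法对应上文描述的概念,并非本申请实际基于Metal与Objective-C的实现):

```python
class Layer:
    """抽象基类:神经网络模型的各网络层均继承自Layer。"""
    def __init__(self, name, layer_type):
        self.name = name            # 名字属性
        self.type = layer_type      # Layer类型属性
        self.params = None          # 本层的目标运行参数

    def load_model(self, params):
        # 加载本层的目标运行参数
        self.params = params

    def encode(self, x):
        # 编码本层计算任务(实际实现中提交给GPU执行),由派生类实现
        raise NotImplementedError


class ReLU(Layer):
    """示意:一个继承自Layer的激活层。"""
    def __init__(self, name):
        super().__init__(name, "ReLU")

    def encode(self, x):
        return [max(0.0, v) for v in x]


class Net:
    """Net类:连接继承自Layer的各网络层,组成神经网络模型的网络结构。"""
    def __init__(self):
        self.layers = []            # 所连接的各网络层

    def add_layer(self, layer):
        # 添加一层网络层到神经网络模型中
        self.layers.append(layer)

    def load_model(self, model):
        # 读取并加载运行神经网络模型所需的参数(按层名对应)
        for layer in self.layers:
            layer.load_model(model.get(layer.name))

    def forward(self, x):
        # 运行神经网络模型,进行前向预测:按连接顺序逐层计算
        for layer in self.layers:
            x = layer.encode(x)
        return x
```

由此,不同学习框架训练的模型只需被重定义为若干Layer派生类并由Net连接,即可共用同一套前向预测入口。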
基于此思路,在将服务器训练好的神经网络模型部署至终端设备上时,图1示出了本申请实施例提供的神经网络模型部署方法的流程图,该方法可应用于终端设备,尤其是可应用于IOS操作系统的终端设备(IOS操作系统是苹果公司的移动操作系统),如具体可由IOS终端的GPU(Graphics Processing Unit,图形处理器)执行实现;显然,本申请实施例提供的神经网络模型部署方法并不限于终端设备的操作系统为IOS,也可在其他操作系统的终端设备上进行应用,由终端设备的GPU执行实现神经网络模型的部署;
参照图1,本申请实施例提供的神经网络模型部署方法可以包括:
步骤S100、读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数。
可选的,初始神经网络模型可以认为是需部署到终端设备的神经网络模型;作为一种可选方式,初始神经网络模型可以是服务器基于特定学习框架训练好的神经网络模型,如服务器基于Tensorflow等学习框架训练好的神经网络模型,Tensorflow学习框架是谷歌研发的一种人工智能学习系统;
作为另一种可选方式,初始神经网络模型可以是已部署在终端设备上,但需基于本申请实施例提供的框架主体重新进行部署的神经网络模型,如对于终端设备通过移动版的特定学习框架部署到终端设备的神经网络模型,可利用本申请实施例提供的神经网络模型部署方法重新进行部署,以使得部署在终端设备的神经网络模型具有本申请实施例提供的框架主体的特性;例如对于终端设备通过移动版的Tensorflow学习框架部署到终端设备的神经网络模型,可利用本申请实施例提供的神经网络模型部署方法进行重新部署。
在确定需部署到终端设备的初始神经网络模型后,本申请实施例可读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
可选的,本申请实施例可读取初始神经网络模型的网络结构,得到各网络层的层定义,初始神经网络模型的网络结构一般包括多个类型的相互连接的网络层,如以初始神经网络模型的网络形式为CNN为例,CNN的一种可选网络结构可以包括:规范化层(BatchNorm),卷积层(Convolution),反卷积层(Deconvolution),激活层(ReLU),相加层(Eltwise),带参数的激活层(PReLU),降采样层(Pooling),缩放层(Resize),深度可分离卷积层(Depthwise Convolution),拼接层(Concat)等;显然,此处的CNN网络结构的示例仅是可选的,具体的网络结构还可根据实际需要调整设置;
通过读取初始神经网络模型的网络结构,本申请实施例可读取得到初始神经网络模型的各网络层的层定义,一网络层的层定义可对该网络层的层属性进行描述,包括该网络层的名字信息、类型信息、初始化信息等。
不同网络层的名字信息不同,根据名字信息可以对网络层加以区分,例如该名字信息可以是为网络层设置的网络层名称、网络层在初始神经网络模型中的层序号等。类型信息用于说明网络层所属的类型,如规范化层、卷积层、反卷积层、池化层等。初始化信息用于说明网络层初始化所需的运行参数。
可选的,本申请实施例可从初始神经网络模型的各网络层中,读取各网络层的运行参数;初始神经网络模型的各网络层的运行参数可以认为是,初始神经网络模型的各网络层的权重,初始神经网络模型的各网络层的运行参数,决定了初始神经网络模型所具有的功能,在神经网络模型训练过程中,学习调整的主要是神经网络模型的各网络层的运行参数。
步骤S110、分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类。
可选的,继承是编程语言中的一种方法,若类A继承自类B,则类B可称为基类,类A可称为派生类,且类A可以获得类B的属性和方法;本申请实施例在对初始神经网络模型进行至终端设备的部署时,可对初始神经网络模型的各网络层进行重定义,在终端设备中得到各网络层相应的目标网络层,目标网络层可以认为是目标神经网络模型中的网络层,目标神经网络模型可以认为是基于本申请实施例提供的神经网络模型部署方法,完成初始神经网络模型至终端设备的部署,所得到的部署结果;相应的,初始神经网络模型的一网络层可对应目标神经网络模型的一目标网络层;在对初始神经网络模型的各网络层进行重定义,得到各网络层相应的目标网络层时,本申请实施例可借助继承方式,由Layer类实现各网络层相应的目标网络层,使得重定义的各目标网络层均继承自Layer类;
可选的,作为一种示例,初始神经网络模型的一网络层的层定义可以包括:该网络层的名字信息、类型信息、初始化信息等;相应的,Layer类可以预置有名字属性,Layer类型属性(Layer type),以及初始化方法(如init方法);
可选的,作为一种可选实现,分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类的实现过程可以是:对于任一网络层,以Layer类为基类,在终端设备中根据该网络层的名字信息,调整Layer类预置的名字属性,根据该网络层的类型信息,调整Layer类预置的Layer类型属性,根据该网络层的初始化信息,由Layer类预置的初始化方法进行初始化操作,以得到各网络层相应的目标网络层,使所得到的各目标网络层均为Layer类的派生类。从而由Layer类对初始神经网络模型中的每一网络层均进行此处理,则可实现各网络层相应的目标网络层。
步骤S120、以Net类连接各目标网络层。
可选的,在得到初始神经网络模型的各网络层相应的目标网络层后,初始神经网络模型中的每一网络层的结构均以Layer类为基类进行了重定义,得到了重定义的各目标网络层;本申请实施例可根据所述初始神经网络模型的各网络层的连接结构,以Net类连接各目标网络层,形成目标神经网络模型的各目标网络层的连接结构,并且目标神经网络模型的各目标网络层的连接结构,与初始神经网络模型的各网络层的连接结构相应;
可选的,为便于连接,以初始神经网络模型具有A1、B1、C1网络层,且A1网络层连接B1网络层、B1网络层连接C1网络层为例,则通过步骤S110以Layer类实现A1网络层相应的目标网络层A2,B1网络层相应的目标网络层B2,C1网络层相应的目标网络层C2后,可根据初始神经网络模型中A1、B1、C1网络层的连接结构,以Net类连接目标网络层A2、目标网络层B2、目标网络层C2,使得目标网络层A2连接目标网络层B2,目标网络层B2连接目标网络层C2,从而得到与初始神经网络模型的各网络层的连接结构相应的,各目标网络层的连接结构。
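上述A1、B1、C1的重定义与连接过程可用如下草图表意(假设性示例:以简化的Python结构表示,layer_defs表示按原连接顺序读取的层定义,并非实际的模型文件格式):

```python
# 假设:读取初始神经网络模型得到的层定义,按A1→B1→C1的连接顺序排列
layer_defs = [
    {"name": "A1", "type": "Convolution"},
    {"name": "B1", "type": "BatchNorm"},
    {"name": "C1", "type": "ReLU"},
]

class Layer:
    """示意性基类:各目标网络层均以Layer为基类。"""
    def __init__(self, name, layer_type):
        self.name = name
        self.type = layer_type

def make_target_layer(definition):
    # 根据一条层定义,重定义得到相应的目标网络层(A1对应A2等)
    return Layer(definition["name"].replace("1", "2"), definition["type"])

# 以Net(此处用有序列表表意)按初始模型的连接结构依次连接各目标网络层
target_layers = [make_target_layer(d) for d in layer_defs]
connection = [layer.name for layer in target_layers]   # ['A2', 'B2', 'C2']
```

连接顺序即初始模型中各网络层的连接结构,目标网络层之间不需要重新推断拓扑。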
步骤S130、将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数。
在通过前述步骤实现各目标网络层,以及以Net类连接各目标网络层,形成目标神经网络模型的网络层结构后,本申请实施例还需对各目标网络层的运行参数进行加载,以使得目标神经网络模型具有与初始神经网络模型相应的功能;
基于此,本申请实施例可将读取的初始神经网络模型的各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;一网络层的目标运行参数可以理解为是,需要加载到相应的目标网络层中的运行参数;可选的,预定格式可以是本申请实施例提供的适于终端设备的神经网络模型的框架主体的格式,具体格式类型可根据实际情况设定。
需要说明的是,步骤S130可以在步骤S100执行之后便执行,不一定是在步骤S120执行之后。
步骤S140、分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
可选的,对于一网络层的目标运行参数而言,本申请实施例可在该网络层相应的目标网络层中加载相应的目标运行参数,得到加载有相应目标运行参数的目标网络层;从而对于每一目标网络层均进行相应的目标运行参数的加载,则可在终端设备中已重定义得到各目标网络层,且连接各目标网络层的基础上,进行各目标网络层的目标运行参数的加载,得到目标神经网络模型,从而基于本申请实施例提供的神经网络模型部署方法,完成初始神经网络模型至终端设备的部署,得到的部署结果为目标神经网络模型。
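步骤S130与S140所述的格式转换与参数加载可用如下草图示意(假设性示例:此处假定预定格式为小端float32二进制,仅用于说明"转换为预定格式—在目标网络层中还原加载"的对应关系,并非本申请实际采用的格式):

```python
import struct

def to_predetermined_format(run_params):
    # 将一网络层的运行参数转换为预定格式(此处假设为小端float32二进制)
    return struct.pack(f"<{len(run_params)}f", *run_params)

def load_target_params(blob):
    # 在目标网络层中加载目标运行参数时,从预定格式还原为数值
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

# 转换后再加载,应还原出原运行参数(0.5与-1.25可被float32精确表示)
blob = to_predetermined_format([0.5, -1.25])
restored = load_target_params(blob)
```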
本申请实施例提供的神经网络模型部署方法包括:读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类;以Net类连接各目标网络层;将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
本申请实施例提供的神经网络模型部署方法,基于所定义的适于终端设备的神经网络模型的框架主体,以Layer类对初始神经网络模型的各网络层进行重定义,得到继承自Layer类的各目标网络层;并且通过Net类连接各目标网络层,在各目标网络层中加载转换为预定格式的相应的目标运行参数,得到目标神经网络模型。Layer类是一个抽象类,本申请实施例通过将Layer类作为目标神经网络模型的各目标网络层的基类,从而对于不同学习框架训练的初始神经网络模型,在将初始神经网络模型部署至终端设备上时,可使用本申请实施例提供的框架主体,以Layer类为目标网络层的基类进行初始神经网络模型的各网络层的重定义,并以Net类连接重定义的各目标网络层,实现不同学习框架训练的初始神经网络模型至终端设备的通用部署,从而在进行神经网络模型预测时,可直接基于终端设备部署的神经网络模型实现神经网络模型预测,为提升神经网络模型预测的速度和实时性提供了基础,无需通过网络将数据传递给服务器,避免了网络延迟,节省了硬件成本,避免了服务器负载过高的问题,也无需终端设备联网,扩大了应用范围;同时,本申请实施例可对不同学习框架训练的神经网络模型进行至终端设备的通用部署,降低了神经网络模型至终端设备部署的使用局限,提升了神经网络模型部署的通用性。
可选的,Layer类中可预置名字属性,Layer类型属性以及初始化方法,以实现各网络层相应的目标网络层的重定义;进一步,Layer类中还可预置加载参数方法(如load model),实现重定义后的目标网络层中目标运行参数的加载;
作为一种示例,以初始神经网络模型为CNN形式为例,图2示出了继承自Layer类的CNN的网络层示例,如图2所示,初始神经网络模型中的输入层(Identify),卷积层(Convolution),规范化层(Batch Norm),激活层(ReLU),相加层(Eltwise),带参数的激活层(PReLU),缩放层(Resize),拼接层(Concat)均可继承自Layer类;并由Layer类中预置的名字属性(var name)、Layer类型属性(var type)、初始化方法(init)实现上述初始神经网络模型中的各网络层的重定义,得到各目标网络层,并通过Layer类中预置的加载参数方法(load model)实现各目标网络层的目标运行参数的加载;进一步,在得到目标神经网络模型后,还可通过Layer类中encode(编码)方法,编码本层计算任务并提交给GPU;
可选的,图3示出了本申请实施例提供的神经网络模型部署方法的另一流程图,参照图3,该方法可以包括:
步骤S200、读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;其中,一网络层的层定义包括:该网络层的名字信息,类型信息,初始化信息。
步骤S210、对于初始神经网络模型的任一网络层,以Layer类为基类,在终端设备中根据该网络层的名字信息,调整Layer类预置的名字属性,根据该网络层的类型信息,调整Layer类预置的Layer类型属性,根据该网络层的初始化信息,由Layer类预置的初始化方法进行初始化操作,以得到各网络层相应的目标网络层,使所得到的各目标网络层均为Layer类的派生类。
步骤S220、根据初始神经网络模型的各网络层的连接结构,以Net类连接各目标网络层。
可选的,以Net类连接各目标网络层后,各目标网络层的连接结构与初始神经网络模型的各网络层的连接结构相应。
步骤S230、将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数。
步骤S240、分别根据各网络层的目标运行参数,以Layer类预置的加载参数方法,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
可选的,Net类除连接各目标网络层外,还可实现如下功能:
添加一层网络层到神经网络模型中;相应的,本申请实施例可通过Net类预置的添加网络层方法(如add Layer方法),添加一层网络层到目标神经网络模型中,所添加的网络层也是继承自Layer类;
读取并加载运行神经网络模型所需的参数;相应的,本申请实施例可通过Net类预置的读取参数方法(如load model方法),读取并加载目标神经网络模型运行所需的参数;
运行神经网络模型,进行前向预测;相应的,本申请实施例可通过Net预置的预测方法(如forward方法),运行目标神经网络模型,进行前向预测。
可选的,以CNN模型为例,图4示出了本申请实施例提供的适于终端设备的神经网络模型的框架主体示意;如图4所示,Layer类可预置有名字属性(var name),Layer类型属性(var type),初始化方法(init),加载参数方法(load model),encode方法(用于编码本层计算任务并提交给GPU方法);Net类预置有Layer类标识(var Layers,用于指示所连接的Layer类),加载层add Layer方法,加载模型参数load model方法,进行前向预测forward方法等;同时,CNN网络的Identify,Convolution,BatchNorm,ReLU,Eltwise,PReLU,Resize,Concat均继承自Layer类。
使用图4所示框架主体,本申请实施例可对使用Tensorflow学习框架训练的初始神经网络模型,Caffe(卷积神经网络框架)学习框架训练的初始神经网络模型等,进行通用的神经网络模型至终端设备的部署;
以对使用Tensorflow学习框架训练的初始神经网络模型,进行至终端设备的部署为例,则本申请实施例可读取Tensorflow学习框架训练的初始神经网络模型,得到模型定义文件(model.para)和模型参数文件(model.bin),模型定义文件可记录有初始神经网络模型的各网络层的层定义,模型参数文件记录有各网络层的运行参数转换成预定格式后的目标运行参数;
从而可根据各网络层的层定义,在终端设备中(如在终端设备的GPU中)由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类,并通过Net类连接各目标网络层,使得各目标网络层的连接结构,与初始神经网络模型中各网络层的连接结构相应;基于Layer类的load model方法,将模型参数文件记录的各网络层的目标运行参数,在各网络层相应的目标网络层中进行加载,得到目标神经网络模型,实现使用Tensorflow学习框架训练的初始神经网络模型,至终端设备的部署。
由于本申请实施例可通过Layer类对初始神经网络模型的各网络层进行重定义,使得重定义的各目标网络层均继承自Layer类,并且通过Net类连接各目标网络层后,可在各目标网络层中加载转换为预定格式的相应的目标运行参数,实现需在终端设备部署的初始神经网络模型的重构,在终端设备上部署得到相应的目标神经网络模型;基于本申请实施例提供的神经网络模型部署方法,可通过统一的适于终端设备的神经网络模型的框架主体,将不同学习框架训练的初始神经网络模型在终端设备上进行重定义,实现不同学习框架训练的初始神经网络模型至终端设备的通用部署,降低了神经网络模型部署的使用局限;并且在进行神经网络模型预测时,可直接基于终端设备部署的神经网络模型实现神经网络模型预测,为提升神经网络模型预测的速度和实时性提供了基础。
可选的,本申请实施例提供的神经网络模型部署方法可应用于终端设备的GPU,在将神经网络模型部署至终端设备后,可利用GPU计算(如最大限度地利用GPU的并行运算)直接在终端设备实现神经网络模型预测,提升前向预测的速度和实时性,实现基于终端设备的GPU的神经网络模型的前向预测。
可选的,图5示出了本申请实施例提供的神经网络模型预测方法的流程图,图5所示神经网络模型预测方法可应用于终端设备,参照图5,该神经网络模型预测方法可以包括:
步骤S300、通过终端设备的数据输入装置,获取预测输入数据。
预测输入数据可以是进行神经网络模型预测所需的输入数据;视神经网络模型的功能的不同,预测输入数据的形式也可能不同,如具有语音分析功能的神经网络模型,预测输入数据可以是语音的语音特征等,具有图像识别功能的神经网络模型,预测输入数据可以是图像的图像特征等,显然,预测输入数据也可能是一组数据;
可选的,终端设备的数据输入装置可以是鼠标、键盘、网络接口、触摸屏等具有向终端设备写入数据能力的装置设备。
步骤S310、调用终端设备预部署的神经网络模型。
可选的,终端设备预部署的神经网络模型可基于本申请实施例提供的神经网络模型部署方法,部署在终端设备;终端设备预部署的神经网络模型可以由Layer类实现各网络层,使各网络层均继承自Layer类,并由Net类连接各网络层,通过在各网络层加载相应的预定格式的运行参数实现。
步骤S320、将预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
在调用终端设备预部署的神经网络模型后,可将终端设备获取的预测输入 数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果;可选的,预测结果可以是分类结果或者识别结果。
本申请实施例提供的神经网络模型预测方法,可在进行神经网络模型预测时,直接基于预部署到终端设备的神经网络模型进行预测输入数据的处理,从而直接在终端设备处得到预测结果,免去神经网络模型预测过程中终端设备与服务器的交互过程,极大的提升了神经网络模型预测的速度和实时性。
可选的,在进行神经网络模型预测时,本申请实施例可基于利用GPU的并行运算,提升预测速度;以神经网络模型为CNN为例,在基于GPU的神经网络模型前向预测过程中,本申请实施例还可对CNN形式的神经网络模型进行卷积层优化实现。
CNN网络因为网络层次深,参数量大,所以将CNN网络模型部署到终端设备后,在终端GPU上进行前向预测时,将涉及到非常大的计算量,而通常卷积层的参数是占比最多的,计算量一般占据70%以上;在常规的卷积层的实现中,比较高效的方法是im2col+GEMM方案,首先使用im2col(一种将原始图像数据转化为矩阵的算法)将feature maps(特征图)和filter(滤波器)转换成矩阵,再调用GEMM(Generalized Matrix Multiplication,广义矩阵乘法)对两矩阵做内积,这样一来卷积操作就被转化为了矩阵乘法运算;这种办法对于filter数目越多(也即feature maps通道数越多),filter尺寸越大的情况,效率越高;而本申请实施例是在终端设备上进行CNN网络模型的前向预测,而终端设备因为计算能力有限,一般feature maps(特征图)通道数不宜过大,因此若使用常规的im2col进行CNN网络模型卷积层在终端设备的计算,反而会增加内存读取的次数。
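上述im2col+GEMM方案可用如下小例子示意(假设性草图:单通道输入、stride为1、无padding,filter按行展平后与展开矩阵做乘法,卷积由此转化为矩阵乘):

```python
def im2col(img, kh, kw):
    # 将特征图按滑窗展开成矩阵:每行是一个滑窗位置内的全部像素
    h, w = len(img), len(img[0])
    return [[img[y + dy][x + dx] for dy in range(kh) for dx in range(kw)]
            for y in range(h - kh + 1)
            for x in range(w - kw + 1)]

def gemm(a, filters):
    # 广义矩阵乘法:展开矩阵与按行展平的各filter做内积
    return [[sum(p * q for p, q in zip(row, f)) for f in filters] for row in a]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
filters = [[1, 0, 0, -1]]        # 一个2x2 filter按行展平:左上减右下
out = gemm(im2col(img, 2, 2), filters)
```

可见展开矩阵的行数随滑窗位置数增长,因此filter数目多、尺寸大时该方案摊薄展开开销,反之则内存读取次数增加,这正是上文所述在终端设备上不宜直接套用im2col的原因。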
针对此情况,本申请实施例可最大限度利用GPU的并行性,将卷积层的width(宽度),height(高度)和output Channels(输出通道)三个维度使用GPU进行并行调度,以降低卷积层实现的循环层数;例如可将卷积层实现原本的6层循环降低到3层循环;
可选的,作为一种示例,假设卷积层的输入image为:
input[inputHeight][inputWidth][inputChannels];
filter为:filter[outputChannels][filterHeight][filterWidth][inputChannels];
输出image为:output[outputHeight][outputWidth][outputChannels];
以步长stride=1为例,使用常规的卷积计算方式计算output需要6层循环,伪码表示可如图6所示;而在本申请实施例中,GPU的kernel(核)函数伪码可如图7所示,原本loop 1,loop 2,loop 3三层循环由GPU并行地计算和调度,只余下calc_output(oc,y,x)的三层循环。
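图7所述的并行化思路可用如下顺序模拟的草图表意(假设性示例:stride=1、无padding;在真实GPU上,oc、y、x三个维度由线程网格并行调度,每个线程只执行calc_output中的3层循环,此处以Python三重遍历顺序模拟并行部分):

```python
def calc_output(inp, flt, oc, y, x):
    # 每个GPU线程负责的计算:仅余inputChannels×filterHeight×filterWidth三层循环
    acc = 0.0
    for fy in range(len(flt[oc])):
        for fx in range(len(flt[oc][0])):
            for ic in range(len(inp[0][0])):
                acc += inp[y + fy][x + fx][ic] * flt[oc][fy][fx][ic]
    return acc

def convolution(inp, flt):
    # loop1~loop3(y, x, oc)在GPU上由线程网格并行完成,此处用遍历顺序模拟
    oh = len(inp) - len(flt[0]) + 1
    ow = len(inp[0]) - len(flt[0][0]) + 1
    return [[[calc_output(inp, flt, oc, y, x)
              for oc in range(len(flt))]
             for x in range(ow)]
            for y in range(oh)]

# 极小示例:2x2单通道输入,1个1x1 filter(权重2)
inp = [[[1.0], [2.0]], [[3.0], [4.0]]]
flt = [[[[2.0]]]]   # [outputChannels][filterHeight][filterWidth][inputChannels]
out = convolution(inp, flt)
```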
进一步,本申请实施例可充分利用GPU计算隐藏内存延迟的特性,增加一个循环内连续计算的目标点数量;如在一次循环中连续计算8个目标点,增加了循环内计算的cycles(周期),从而隐藏了内存读写的延迟;
同时,本申请实施例可在计算第一个目标点时,读取第一个目标点相应的全部像素点,而在计算非第一个目标点时,由于相邻的两个目标点存在重复的像素点,因此在计算非第一个目标点时,可复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点,使得内存读取的次数得到减少;比如对于3x3的卷积,计算一个目标点原本需要读取9个相邻的像素点,而相邻的两个目标点有6个像素点是重复的,本申请连续计算8个相邻目标点,只有第一次需要读取9个像素点,之后每次都复用前一次的6个像素点,一次只需重新读取3个像素点,从而使内存读取的次数减少了一半以上;
作为一种示例,针对kernel size(核尺寸)为3x3的卷积,优化后GPU的kernel函数伪码如图8所示;可见,convolution中的loop 4循环内只计算了output[y][x][oc]一个点,convolution3x3中的loop 4循环内计算了y到y+7连续的8个点,循环内计算所需的cycles是原来的8倍,可以更好地利用GPU计算隐藏内存读取延迟的特性;而convolution中每计算一个点,都需要读取input image 9(filterHeight*filterWidth)次,convolution3x3连续计算垂直y方向上相邻的8个点,计算相邻的两点所需的9个pixel中有6个点(pixel_line1,pixel_line2)是重复的,每次只需读取新的3个点;convolution中计算8点读取的次数为9*8=72次,convolution3x3中为9+3*7=30次,内存读取次数减少了58%。
在通过本申请实施例进行卷积层优化实现后,基于IOS系统,将本申请实施例的卷积层的实现速度与Caffe2、Tensorflow和ncnn等传统方案相比,如下表1示出的比对结果,可以看出,本申请实施例可在实现卷积层时,实现4-10倍的加速。
(表1:本申请实施例与Caffe2、Tensorflow、ncnn等方案的卷积层实现速度比对结果,原文以图片形式给出)
本申请实施例可用于图片艺术滤镜模型至IOS移动端(如IOS操作系统的智能手机)的部署,及在IOS移动端基于部署好的图片艺术滤镜模型进行前向预测,从而将使用Torch(Torch是一个广泛支持机器学习算法的科学计算框架)训练好的图片艺术滤镜模型转换成IOS移动端的模型格式,从而在使用图片艺术滤镜模型对图片添加艺术滤镜效果时,直接基于在IOS移动端部署后的图片艺术滤镜模型,在IOS移动端对获取的图片添加艺术滤镜效果;以CNN网络模型为例,具体应用过程可以如下:
IOS移动端可读取使用Torch训练好的图片艺术滤镜模型,进行预定格式的转换(如转换成二进制格式),得到转换后的图片艺术滤镜模型的各网络层的定义和目标运行参数,获取到模型定义文件model.para和模型参数文件model.bin;model.para定义了CNN形式的图片艺术滤镜模型的结构,包括图片艺术滤镜模型的各网络层的定义,model.bin文件包括了网络各层对应的目标运行参数;
IOS移动端调用在GPU中预定的CNN网络模型的框架主体,该框架主体由Layer类定义CNN网络模型的各网络层,且由Net类连接Layer类定义的各网络层;
在IOS移动端的GPU中,加载模型定义文件model.para,由Layer类实现图片艺术滤镜模型的各目标网络层,并以Net类连接各目标网络层;加载模型参数文件model.bin,在IOS移动端的GPU中,对各目标网络层进行相应网络层的目标运行参数的加载,构建出重定义的图片艺术滤镜模型;
基于在IOS移动端的GPU中重定义的图片艺术滤镜模型,IOS移动端在对一个图片添加艺术滤镜效果时,可直接由GPU中重定义的图片艺术滤镜模型对该图片进行处理,为其添加艺术滤镜效果。
本申请实施例可对不同训练框架训练的神经网络模型实现至终端设备部署的支持,如可支持torch,Tensorflow,caffe等学习框架训练的神经网络模型,至终端设备快捷部署,降低了神经网络模型部署的使用局限,可对不同学习框架训练的神经网络模型进行通用部署,提升了神经网络模型部署的通用性;进一步,本申请实施例可基于IOS系统实现,不需依赖第三方库,而是可基于IOS原生的Metal(一种应用程序编程接口)和Objective-C语言(一种编程语言)编写,库的数据大小得到了极大的减少;同时,本申请实施例可支持丰富的layer,基于layer类进行网络层的自定义,对于神经网络模型的网络层具有较高的扩展性能;进一步,本申请实施例可基于GPU并行运行实现卷积层的实现优化,提升了CNN网络模型前向预测的速度。
下面对本申请实施例提供的神经网络模型部署装置进行介绍,下文描述的神经网络模型部署装置可以认为是,终端设备为实现本申请实施例提供的神经网络模型部署方法所需设置的程序模块;下文描述的神经网络模型部署装置可与上文描述的神经网络模型部署方法,相互对应参照。
图9为本申请实施例提供的神经网络模型部署装置的结构框图,该装置可应用于终端设备,尤其是可应用于IOS操作系统的终端设备中的GPU,参照图9,本申请实施例提供的神经网络模型部署装置可以包括:
读取模块100,用于读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
目标网络层实现模块200,用于分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类;
网络层连接模块300,用于以Net类连接各目标网络层;
格式转换模块400,用于将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
参数加载模块500,用于分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
可选的,一网络层的层定义包括:该网络层的名字信息,类型信息,初始化信息;相应的,目标网络层实现模块200,用于分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类,具体包括:
对于初始神经网络模型的任一网络层,以Layer类为基类,在终端设备中根据该网络层的名字信息,调整Layer类预置的名字属性,根据该网络层的类型信息,调整Layer类预置的Layer类型属性,根据该网络层的初始化信息,由Layer类预置的初始化方法进行初始化操作,以得到各网络层相应的目标网络层,使所得到的各目标网络层均为Layer类的派生类。
可选的,参数加载模块500,用于分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型,具体包括:
分别根据各网络层的目标运行参数,以Layer类预置的加载参数方法,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
可选的,网络层连接模块300,用于以Net类连接各目标网络层,具体包括:
根据初始神经网络模型的各网络层的连接结构,以Net类连接各目标网络层,使各目标网络层的连接结构与初始神经网络模型的各网络层的连接结构相应。
可选的,图10示出了本申请实施例提供的神经网络模型部署装置的另一结构框图,结合图9和图10所示,该装置还可以包括:
Net类应用模块600,用于通过Net类预置的添加网络层方法,添加一层网络层到目标神经网络模型中,其中,所添加的网络层继承自Layer类;和/或,通过Net类预置的读取参数方法,读取并加载目标神经网络模型运行所需的参数;和/或,通过Net预置的预测方法,运行目标神经网络模型,进行前向预测。
可选的,所述初始神经网络模型为CNN网络模型;相应的,图11示出了本申请实施例提供的神经网络模型部署装置的再一结构框图,结合图9和图11所示,该装置还可以包括:
GPU并行调度模块700,用于将CNN网络模型的卷积层的宽度,高度和输出通道三个维度使用终端设备的GPU进行并行调度,以降低卷积层实现的循环层数;
目标点增加模块800,用于增加一个循环内连续计算的目标点数量;
像素点读取模块900,用于在计算第一个目标点时,读取第一个目标点相应的全部像素点,在计算非第一个目标点时,复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点。
可选的,GPU并行调度模块700也可单独应用于图8或图9所示装置中。
在终端设备中得到部署好的目标神经网络模型后,本申请实施例可通过终端设备部署的目标神经网络模型,进行神经网络模型预测,如终端设备可获取输入数据,调用终端设备中部署的目标神经网络模型(如在终端设备的GPU中部署的目标神经网络模型)对输入数据进行运算,确定输入数据的分类或识别结果。
本申请实施例提供的神经网络模型部署装置可应用于终端设备,可选的,图12示出了终端设备的硬件结构,参照图12,该终端设备可以包括:至少一个图形处理器1,至少一个通信接口2,至少一个存储器3和至少一个通信总线4;
在本申请实施例中,图形处理器1、通信接口2、存储器3、通信总线4的数量为至少一个,且图形处理器1、通信接口2、存储器3通过通信总线4完成相互间的通信;
其中,存储器存储有程序,图形处理器调用存储器所存储的程序时,可实现上述所述的神经网络模型部署方法的各个步骤。
可选的,所述程序可用于:
读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类;
以Net类连接各目标网络层;
将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
可选的,所述程序的细化功能和扩展功能可参照上文相应部分。
本申请实施例还提供一种存储介质,该存储介质如存储器等,该存储介质存储有适于图形处理器执行的程序,所述程序被所述图形处理器执行时,实现上述所述的神经网络模型部署方法的各个步骤。
可选的,所述程序可用于:
读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
分别根据所述各网络层的层定义,在终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自Layer类;
以Net类连接各目标网络层;
将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在终端设备部署的目标神经网络模型。
可选的,所述程序的细化功能和扩展功能可参照上文相应部分。
本申请实施例还提供一种神经网络模型预测装置,通过本申请实施例提供的神经网络模型部署装置,将神经网络模型部署到终端设备后,由终端设备部署的神经网络模型实现神经网络模型预测;图13示出了本申请实施例提供的神经网络模型预测装置的结构框图,该装置可应用于终端设备,尤其是可应用于IOS操作系统的终端设备中的GPU,参照图13,本申请实施例提供的神经网络模型预测装置可以包括:
数据获取模块10,用于通过终端设备的数据输入装置,获取预测输入数据;
模型调用模块20,用于调用终端设备预部署的神经网络模型;
模型处理模块30,用于将预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
可选的,所述神经网络模型可以为CNN网络模型;模型处理模块30,用于由所述神经网络模型处理所述预测输入数据,可具体包括:
在处理所述预测输入数据时,将CNN网络模型的卷积层的宽度,高度和输出通道三个维度使用终端设备的GPU进行并行调度,以降低卷积层实现的循环层数。
可选的,模型处理模块30,用于由所述神经网络模型处理所述预测输入数据,还可包括:
增加一个循环内连续计算的目标点数量;
在计算第一个目标点时,读取第一个目标点相应的全部像素点,在计算非第一个目标点时,复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点。
可选的,本申请实施例还提供一种终端设备,该终端设备的硬件结构可如图12所示,该终端可包括:至少一个存储器和至少一个图形处理器;所述存储器存储有程序,所述图形处理器调用所述存储器存储的程序时,实现上文所述的神经网络模型预测方法的各个步骤。
可选的,所述程序可用于:
通过终端设备的数据输入装置,获取预测输入数据;
调用终端设备预部署的神经网络模型;
将预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
可选的,本申请实施例还提供一种存储介质,该存储介质存储有适于图形处理器执行的程序,所述程序用于:
通过终端设备的数据输入装置,获取预测输入数据;
调用终端设备预部署的神经网络模型;
将预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
可选的,所述程序的细化功能和扩展功能可参照上文相应部分。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的核心思想或范围的情况下,在其它实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (23)

  1. 一种神经网络模型部署方法,其特征在于,应用于终端设备中,包括:
    读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
    分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类;
    以Net类连接各目标网络层;
    将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
    分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
  2. 根据权利要求1所述的神经网络模型部署方法,其特征在于,一网络层的层定义包括:所述网络层的名字信息、类型信息、初始化信息;所述分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类包括:
    对于所述初始神经网络模型的任一网络层,以所述Layer类为基类,在所述终端设备中根据所述网络层的名字信息,调整所述Layer类预置的名字属性,根据所述网络层的类型信息,调整所述Layer类预置的Layer类型属性,根据所述网络层的初始化信息,由所述Layer类预置的初始化方法进行初始化操作,以得到各网络层相应的目标网络层,使所得到的各目标网络层均为所述Layer类的派生类。
  3. 根据权利要求1或2所述的神经网络模型部署方法,其特征在于,所述分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型包括:
    分别根据各网络层的目标运行参数,以所述Layer类预置的加载参数方法,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
  4. 根据权利要求1或2所述的神经网络模型部署方法,其特征在于,所述以Net类连接各目标网络层包括:
    根据所述初始神经网络模型的各网络层的连接结构,以Net类连接各目标网络层,使各目标网络层的连接结构与所述初始神经网络模型的各网络层的连接结构相应。
  5. 根据权利要求4所述的神经网络模型部署方法,其特征在于,所述方法还包括:
    通过所述Net类预置的添加网络层方法,添加一层网络层到目标神经网络模型中,其中,所添加的网络层继承自所述Layer类;
    和/或,通过所述Net类预置的读取参数方法,读取并加载目标神经网络模型运行所需的参数;
    和/或,通过所述Net预置的预测方法,运行目标神经网络模型,进行前向预测。
  6. 根据权利要求1或2所述的神经网络模型部署方法,其特征在于,所述初始神经网络模型为卷积神经网络CNN模型;所述方法还包括:
    将所述CNN网络模型的卷积层的宽度,高度和输出通道三个维度使用终端设备的图形处理器GPU进行并行调度,以降低所述卷积层实现的循环层数。
  7. 根据权利要求6所述的神经网络模型部署方法,其特征在于,所述方法还包括:
    增加一个循环内连续计算的目标点数量;
    在计算第一个目标点时,读取第一个目标点相应的全部像素点,在计算非第一个目标点时,复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点。
  8. 一种神经网络模型预测方法,其特征在于,应用于终端设备中,包括:
    通过所述终端设备的数据输入装置,获取预测输入数据;
    调用所述终端设备预部署的神经网络模型;
    将所述预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
  9. 根据权利要求8所述的神经网络模型预测方法,其特征在于,所述神经网络模型为卷积神经网络CNN模型;所述由所述神经网络模型处理所述预测输入数据包括:
    在处理所述预测输入数据时,将所述CNN网络模型的卷积层的宽度,高度和输出通道三个维度使用所述终端设备的GPU进行并行调度,以降低所述卷积层实现的循环层数。
  10. 根据权利要求9所述的神经网络模型预测方法,其特征在于,所述由所述神经网络模型处理所述预测输入数据还包括:
    增加一个循环内连续计算的目标点数量;
    在计算第一个目标点时,读取第一个目标点相应的全部像素点,在计算非第一个目标点时,复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点。
  11. 一种神经网络模型部署装置,其特征在于,应用于终端设备中,包括:
    读取模块,用于读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
    目标网络层实现模块,用于分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类;
    网络层连接模块,用于以Net类连接各目标网络层;
    格式转换模块,用于将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
    参数加载模块,用于分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
  12. 一种神经网络模型预测装置,其特征在于,应用于终端设备中,包括:
    数据获取模块,用于通过所述终端设备的数据输入装置,获取预测输入数据;
    模型调用模块,用于调用所述终端设备预部署的神经网络模型;
    模型处理模块,用于将所述预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
  13. 一种终端设备,其特征在于,包括:至少一个存储器和至少一个图形处理器;所述存储器存储有程序,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    读取初始神经网络模型,得到所述初始神经网络模型的各网络层的层定义,及各网络层的运行参数;
    分别根据所述各网络层的层定义,在所述终端设备中由Layer类实现各网络层相应的目标网络层,使各目标网络层均继承自所述Layer类;
    以Net类连接各目标网络层;
    将各网络层的运行参数转换为预定格式,得到各网络层的目标运行参数;
    分别根据各网络层的目标运行参数,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
  14. 根据权利要求13所述的终端设备,其特征在于,一网络层的层定义包括:所述网络层的名字信息、类型信息、初始化信息;所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    对于所述初始神经网络模型的任一网络层,以所述Layer类为基类,在所述终端设备中根据所述网络层的名字信息,调整所述Layer类预置的名字属性,根据所述网络层的类型信息,调整所述Layer类预置的Layer类型属性,根据所述网络层的初始化信息,由所述Layer类预置的初始化方法进行初始化操作,以得到各网络层相应的目标网络层,使所得到的各目标网络层均为所述Layer类的派生类。
  15. 根据权利要求13或14所述的终端设备,其特征在于,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    分别根据各网络层的目标运行参数,以所述Layer类预置的加载参数方法,在各网络层相应的目标网络层中加载相应的目标运行参数,得到在所述终端设备部署的目标神经网络模型。
  16. 根据权利要求13或14所述的终端设备,其特征在于,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    根据所述初始神经网络模型的各网络层的连接结构,以Net类连接各目标网络层,使各目标网络层的连接结构与所述初始神经网络模型的各网络层的连接结构相应。
  17. 根据权利要求16所述的终端设备,其特征在于,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    通过所述Net类预置的添加网络层方法,添加一层网络层到目标神经网络模型中,其中,所添加的网络层继承自所述Layer类;
    和/或,通过所述Net类预置的读取参数方法,读取并加载目标神经网络模型运行所需的参数;
    和/或,通过所述Net预置的预测方法,运行目标神经网络模型,进行前向预测。
  18. 根据权利要求13或14所述的终端设备,其特征在于,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    将所述CNN网络模型的卷积层的宽度,高度和输出通道三个维度使用终端设备的图形处理器GPU进行并行调度,以降低所述卷积层实现的循环层数。
  19. 根据权利要求18所述的终端设备,其特征在于,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    增加一个循环内连续计算的目标点数量;
    在计算第一个目标点时,读取第一个目标点相应的全部像素点,在计算非第一个目标点时,复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点。
  20. 一种存储介质,其特征在于,所述存储介质存储有适于图形处理器执行的程序,所述程序被所述图形处理器执行时,实现如权利要求1-7任一项所述的神经网络模型部署方法的各个步骤。
  21. 一种终端设备,其特征在于,包括:至少一个存储器和至少一个图形处理器;所述存储器存储有程序,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    通过所述终端设备的数据输入装置,获取预测输入数据;
    调用所述终端设备预部署的神经网络模型;
    将所述预测输入数据作为所述神经网络模型的输入,由所述神经网络模型处理所述预测输入数据,得到预测结果。
  22. 根据权利要求21所述的终端设备,其特征在于,所述神经网络模型为卷积神经网络CNN模型;所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    在处理所述预测输入数据时,将所述CNN网络模型的卷积层的宽度,高度和输出通道三个维度使用所述终端设备的GPU进行并行调度,以降低所述卷积层实现的循环层数。
  23. 根据权利要求22所述的终端设备,其特征在于,所述图形处理器调用所述存储器存储的程序时,实现以下操作:
    增加一个循环内连续计算的目标点数量;
    在计算第一个目标点时,读取第一个目标点相应的全部像素点,在计算非第一个目标点时,复用前一次计算的目标点中相同的像素点,并重新读取与前一次计算的目标点不同的像素点。
PCT/CN2018/116958 2017-12-13 2018-11-22 一种神经网络模型部署方法、预测方法及设备 WO2019114517A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18887768.2A EP3614316A4 (en) 2017-12-13 2018-11-22 METHOD OF APPLYING A MODEL OF NEURAL NETWORK, PREDICTION METHOD AND DEVICE
US16/659,888 US12020142B2 (en) 2017-12-13 2019-10-22 Neural network model deployment method, prediction method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711330928.6A CN109919308B (zh) 2017-12-13 2017-12-13 一种神经网络模型部署方法、预测方法及相关设备
CN201711330928.6 2017-12-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/659,888 Continuation US12020142B2 (en) 2017-12-13 2019-10-22 Neural network model deployment method, prediction method and related device

Publications (1)

Publication Number Publication Date
WO2019114517A1 true WO2019114517A1 (zh) 2019-06-20

Family

ID=66818916

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116958 WO2019114517A1 (zh) 2017-12-13 2018-11-22 一种神经网络模型部署方法、预测方法及设备

Country Status (4)

Country Link
US (1) US12020142B2 (zh)
EP (1) EP3614316A4 (zh)
CN (1) CN109919308B (zh)
WO (1) WO2019114517A1 (zh)


Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583287B (zh) 2017-09-29 2024-04-12 浙江莲荷科技有限公司 实物识别方法及验证方法
CN109919308B (zh) 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 一种神经网络模型部署方法、预测方法及相关设备
CN108268619B (zh) 2018-01-08 2020-06-30 阿里巴巴集团控股有限公司 内容推荐方法及装置
CN108446817B (zh) 2018-02-01 2020-10-02 阿里巴巴集团控股有限公司 确定业务对应的决策策略的方法、装置和电子设备
CN110569856B (zh) 2018-08-24 2020-07-21 阿里巴巴集团控股有限公司 样本标注方法及装置、损伤类别的识别方法及装置
CN110570316A (zh) 2018-08-31 2019-12-13 阿里巴巴集团控股有限公司 训练损伤识别模型的方法及装置
CN110569696A (zh) 2018-08-31 2019-12-13 阿里巴巴集团控股有限公司 用于车辆部件识别的神经网络系统、方法和装置
CN110569864A (zh) 2018-09-04 2019-12-13 阿里巴巴集团控股有限公司 基于gan网络的车损图像生成方法和装置
CN110263731B (zh) * 2019-06-24 2021-03-16 电子科技大学 一种单步人脸检测系统
CN110458285B (zh) * 2019-08-14 2021-05-14 中科寒武纪科技股份有限公司 数据处理方法、装置、计算机设备和存储介质
US11556450B2 (en) * 2019-10-11 2023-01-17 International Business Machines Corporation Hybrid data-model parallelism for efficient deep learning
CN110942139A (zh) * 2019-11-22 2020-03-31 深圳市魔数智擎人工智能有限公司 深度学习神经网络部署系统及其方法
CN110852449B (zh) * 2019-11-25 2023-11-14 北京百度网讯科技有限公司 模型迁移方法和电子设备
CN110955470B (zh) * 2019-12-06 2024-01-19 深圳前海环融联易信息科技服务有限公司 算法模型接口化方法、装置、计算机设备及存储介质
CN111242286A (zh) * 2020-01-14 2020-06-05 Oppo广东移动通信有限公司 一种数据格式变换方法、装置及计算机可读存储介质
CN111222637B (zh) * 2020-01-17 2023-11-28 上海商汤智能科技有限公司 神经网络模型部署方法及装置、电子设备和存储介质
CN111290762B (zh) * 2020-01-19 2023-05-12 深圳云天励飞技术有限公司 一种深度学习网络的部署方法、装置及终端设备
CN111340215B (zh) * 2020-02-21 2024-05-31 平安科技(深圳)有限公司 一种网络模型推理加速方法、装置、存储介质和智能设备
CN111652351A (zh) * 2020-05-09 2020-09-11 济南浪潮高新科技投资发展有限公司 一种神经网络模型的部署方法、设备及介质
CN113743598B (zh) * 2020-05-27 2023-08-04 杭州海康威视数字技术股份有限公司 一种ai芯片的运行方式的确定方法和装置
CN111753948B (zh) * 2020-06-23 2022-11-01 展讯通信(上海)有限公司 模型处理方法及相关设备
CN112115974B (zh) * 2020-08-18 2024-04-09 郑州睿如信息技术有限公司 一种城市垃圾分类处理智能视觉检测方法
CN111966361B (zh) * 2020-09-25 2024-04-05 北京百度网讯科技有限公司 用于确定待部署模型的方法、装置、设备及其存储介质
EP3975060A1 (en) * 2020-09-29 2022-03-30 Samsung Electronics Co., Ltd. Method and apparatus for analysing neural network performance
CN112016665B (zh) * 2020-10-20 2021-04-06 深圳云天励飞技术股份有限公司 计算神经网络在处理器上运行时间的方法及装置
CN112749626B (zh) * 2020-12-10 2022-09-13 同济大学 一种面向dsp平台的快速人脸检测与识别方法
CN113570030B (zh) * 2021-01-18 2024-05-10 腾讯科技(深圳)有限公司 数据处理方法、装置、设备以及存储介质
CN113301127B (zh) * 2021-05-07 2022-06-14 淮阴工学院 一种牲畜饲料检测系统
CN116151352B (zh) * 2023-04-13 2024-06-04 中浙信科技咨询有限公司 基于大脑信息通路整合机制的卷积循环神经网络诊断方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104964719A (zh) * 2015-06-30 2015-10-07 安徽工业大学 一种基于bp神经网络的户用电子水表流量计量方法
CN105205558A (zh) * 2015-09-23 2015-12-30 南京磐能电力科技股份有限公司 一种面向建筑能耗预测的bp神经网络模型配置方法
CN106295803A (zh) * 2016-08-10 2017-01-04 中国科学技术大学苏州研究院 深度神经网络的构建方法
US20170068889A1 (en) * 2015-09-04 2017-03-09 Baidu Usa Llc Systems and methods for efficient neural network deployments

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5142665A (en) * 1990-02-20 1992-08-25 International Business Machines Corporation Neural network shell for application programs
US5235673A (en) * 1991-04-18 1993-08-10 International Business Machines Corporation Enhanced neural network shell for application programs
US20070061277A1 (en) * 2003-09-05 2007-03-15 International Business Machines Corporation Method, system, and storage medium for providing dynamic deployment of grid services over a computer network
US10614354B2 (en) * 2015-10-07 2020-04-07 Altera Corporation Method and apparatus for implementing layers on a convolutional neural network accelerator
KR102161902B1 (ko) * 2016-03-31 2020-10-05 후지쯔 가부시끼가이샤 신경망 모델에 대한 훈련 방법, 장치 및 전자 장치
CN107329936A (zh) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 一种用于执行神经网络运算以及矩阵/向量运算的装置和方法
CN107341761A (zh) * 2017-07-12 2017-11-10 成都品果科技有限公司 一种深度神经网络的计算执行方法和系统
CN109919308B (zh) 2017-12-13 2022-11-11 腾讯科技(深圳)有限公司 一种神经网络模型部署方法、预测方法及相关设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104964719A (zh) * 2015-06-30 2015-10-07 安徽工业大学 一种基于bp神经网络的户用电子水表流量计量方法
US20170068889A1 (en) * 2015-09-04 2017-03-09 Baidu Usa Llc Systems and methods for efficient neural network deployments
CN105205558A (zh) * 2015-09-23 2015-12-30 南京磐能电力科技股份有限公司 一种面向建筑能耗预测的bp神经网络模型配置方法
CN106295803A (zh) * 2016-08-10 2017-01-04 中国科学技术大学苏州研究院 深度神经网络的构建方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3614316A4 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3767548A1 (en) * 2019-07-03 2021-01-20 Nokia Technologies Oy Delivery of compressed neural networks
EP3767549A1 (en) * 2019-07-03 2021-01-20 Nokia Technologies Oy Delivery of compressed neural networks
EP4024875A4 (en) * 2019-08-30 2022-10-26 Sony Group Corporation RECEIVING DEVICE, RECEIVING METHOD AND TRANSMISSION DEVICE AND TRANSMISSION METHOD
CN113469328A (zh) * 2021-06-24 2021-10-01 上海寒武纪信息科技有限公司 执行转数穿过的装置、板卡、方法及可读存储介质
CN113469328B (zh) * 2021-06-24 2024-03-19 上海寒武纪信息科技有限公司 执行转数穿过的装置、板卡、方法及可读存储介质
CN116128046A (zh) * 2023-04-14 2023-05-16 杭州国芯科技股份有限公司 嵌入式设备的多输入神经网络模型串行块的存储方法

Also Published As

Publication number Publication date
CN109919308A (zh) 2019-06-21
EP3614316A4 (en) 2021-01-27
EP3614316A1 (en) 2020-02-26
US12020142B2 (en) 2024-06-25
US20200050939A1 (en) 2020-02-13
CN109919308B (zh) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2019114517A1 (zh) 一种神经网络模型部署方法、预测方法及设备
US11205118B2 (en) Power-efficient deep neural network module configured for parallel kernel and parallel input processing
CN110390387B (zh) 对深度学习应用所用资源进行评估
US11087203B2 (en) Method and apparatus for processing data sequence
WO2022042113A1 (zh) 数据处理方法、装置、电子设备及存储介质
WO2019184888A1 (zh) 一种基于卷积神经网络的图像处理的方法和装置
TWI775210B (zh) 用於卷積運算的資料劃分方法及處理器
CN107578375B (zh) 图像处理方法及装置
CN110689136A (zh) 一种深度学习模型获得方法、装置、设备及存储介质
CN111617480A (zh) 一种点云渲染方法及装置
EP4040378A1 (en) Burst image-based image restoration method and apparatus
EP3971781A1 (en) Method and apparatus with neural network operation
CN108389153A (zh) 一种视图加载的方法及终端设备
CN108648136B (zh) 对二维查找表进行压缩的方法及装置
CN115829000A (zh) 数据处理方法、装置、电子设备及存储介质
JP2021103519A (ja) 行動認識のための時空間の平滑化フィーチャを正規化する方法およびシステム
CN107247944A (zh) 基于深度学习的人脸检测速度优化方法及装置
US20240054618A1 (en) Method and apparatus with image processing based on a neural network
CN114727082B (zh) 图像处理装置、图像信号处理器、图像处理方法和介质
CN112163985B (zh) 图像处理方法、装置、存储介质及电子设备
WO2023217270A1 (zh) 图像超分方法、超分网络参数调整方法、相关装置及介质
US20230281458A1 (en) Method and system for reducing complexity of a processing pipeline using feature-augmented training
Yuan et al. Image Inpainting with Semantic U-Transformer
CN111340215A (zh) 一种网络模型推理加速方法、装置、存储介质和智能设备
CN116893851A (zh) 指令生成方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18887768

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018887768

Country of ref document: EP

Effective date: 20191119

NENP Non-entry into the national phase

Ref country code: DE