CN110766145A - Learning task compiling method of artificial intelligence processor and related product - Google Patents


Publication number
CN110766145A
Authority
CN
China
Prior art keywords
layer
neural network
parameter
processor
convolutional
Prior art date
Legal status
Pending
Application number
CN201911296833.6A
Other languages
Chinese (zh)
Inventor
Not disclosed (不公告发明人)
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Publication of CN110766145A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The present application relates to a method, apparatus, storage medium, and system for compiling a learning task for an artificial intelligence processor. The method first fuses redundant neural network layers into a convolutional layer to optimize the structure of a convolutional neural network, and then compiles the learning task of the artificial intelligence processor based on the optimized network. The method compiles learning tasks efficiently, and the compiled tasks require less data exchange when executed on a device.

Description

Learning task compiling method of artificial intelligence processor and related product
Related applications:
This application claims priority to Chinese patent application No. 201811639927.4, filed on December 29, 2018, entitled "Method, apparatus, storage medium, and system for optimizing a convolutional neural network."
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a learning task compiling method for an artificial intelligence processor and a related product.
Background
When an artificial intelligence processor runs a neural network, a general-purpose processor (CPU) is generally first required to compile the neural network, including its operators, into an executable file. The executable file contains device information, that is, which device in the heterogeneous computer system the file is to be executed on. Assembling and linking the executable file yields an executable program of the neural network, which is then stored.
The CPU may read the executable program from its storage location and obtain a plurality of tasks from it. These tasks are distributed to the artificial intelligence processor for execution, finally producing an operation result.
Generally, a neural network contains a large number of operators, and when executing the operation logic of these operators, the artificial intelligence processor usually loops through the following steps: read the operation result of the previous operator from the off-chip cache, execute the operation task of the current operator based on that result, and write the result of the current operator back to the off-chip cache once its operation task is finished.
Therefore, when a device executes the operation task of a neural network, one round of data exchange is needed for each operator executed. This not only reduces data-processing efficiency but also occupies off-chip communication resources.
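As a back-of-the-envelope illustration (not part of the patent), the data-exchange cost described above scales linearly with the operator count, which is why fusing operators pays off:

```python
def offchip_exchanges(num_operators, exchanges_per_operator=2):
    """One read of the previous operator's result plus one write of the current
    operator's result per operator executed."""
    return num_operators * exchanges_per_operator

# A Conv + Batch Norm + Scale chain (3 operators) versus the fused single conv.
assert offchip_exchanges(3) == 6
assert offchip_exchanges(1) == 2
```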
Disclosure of Invention
In view of the foregoing, it is desirable to provide a learning task compiling method for an artificial intelligence processor and a related product.
A method of compiling a learning task for an artificial intelligence processor, the method comprising:
a general processor acquires configuration parameters, wherein the configuration parameters comprise a first training parameter and a second training parameter of a Scale layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor;
the general processor fuses the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result;
the general processor fuses the second training parameter of the Scale layer with the bias parameter of the convolutional layer of the convolutional neural network to obtain a second fusion result;
optimizing the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network;
and compiling the optimized convolutional neural network to obtain a corresponding binary instruction sequence to be distributed to the artificial intelligence processor to execute a corresponding learning task.
In one embodiment, the fusing, by the general processor, the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result, including:
and the general processor performs multiplication operation on the first training parameter of the Scale layer and the weight parameter of the convolution layer to obtain the first fusion result.
In one embodiment, the general purpose processor fuses the second training parameter of the Scale layer with the bias parameter of the convolutional layer of the convolutional neural network to obtain a second fusion result, including:
and the general processor performs addition operation on the second training parameter of the Scale layer and the bias parameter of the convolution layer to obtain a second fusion result.
In one embodiment, the optimizing the convolutional neural network by the general processor according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network includes:
and deleting the Scale layer by the general processor, changing the weight parameter of the convolution layer into the first fusion result, and changing the bias parameter of the convolution layer into the second fusion result.
In one embodiment, the general-purpose processor performs a convolution calculation on the input data of the convolutional layer with the first fusion result and the second fusion result, respectively, to obtain an output result of the convolutional layer.
In one embodiment, performing this convolution calculation to obtain the output result of the convolutional layer includes:
the general processor performs multiplication operation on the input data and the first fusion result to obtain a first operation result;
and the general processor performs addition operation on the first operation result and the second fusion result to obtain the output result.
In one embodiment, the first training parameters of the Scale layer include at least one first training sub-parameter for performing convolution calculation of the Scale layer; the second training parameters of the Scale layer comprise at least one second training sub-parameter for performing convolution calculations of the Scale layer.
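The Scale-layer fusion of the embodiments above can be sketched as follows. This is a minimal, hypothetical Python illustration, assuming a per-output-channel Scale layer y = γ·x + β applied after a 1-D convolution; note that the exact bias fold is γ·b + β, which the claims summarize as an addition involving the second training parameter. All names are illustrative:

```python
def fuse_scale_into_conv(weights, bias, gamma, beta):
    """weights: [out_ch][k] filter taps; bias: [out_ch];
    gamma/beta: per-output-channel Scale parameters (hypothetical names)."""
    fused_w = [[g * w for w in ch] for ch, g in zip(weights, gamma)]
    fused_b = [g * b + bt for b, g, bt in zip(bias, gamma, beta)]
    return fused_w, fused_b

def conv1d_valid(x, weights, bias):
    """Plain 1-D 'valid' convolution (cross-correlation), one row per filter."""
    k = len(weights[0])
    return [[sum(w[j] * x[i + j] for j in range(k)) + b
             for i in range(len(x) - k + 1)]
            for w, b in zip(weights, bias)]

x = [1.0, 2.0, 3.0, 4.0]
W, b = [[0.5, -1.0]], [0.25]
gamma, beta = [2.0], [0.1]

# Reference path: convolution followed by the Scale layer.
ref = [[g * v + bt for v in ch]
       for ch, g, bt in zip(conv1d_valid(x, W, b), gamma, beta)]
# Optimized path: a single convolution with fused parameters (Scale layer deleted).
fW, fb = fuse_scale_into_conv(W, b, gamma, beta)
opt = conv1d_valid(x, fW, fb)
assert all(abs(a - c) < 1e-9 for r, o in zip(ref, opt) for a, c in zip(r, o))
```

The assertion confirms that the optimized single-layer network produces the same output as the original two-layer structure, which is the premise that lets the Scale layer be deleted without losing precision.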
A method of compiling a learning task for an artificial intelligence processor, the method comprising:
a general processor acquires configuration parameters, wherein the configuration parameters comprise a first training parameter and a second training parameter of a convolutional neural network corresponding to a learning task of the artificial intelligence processor;
the general processor fuses the first training parameter and the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result;
the general processor fuses the second training parameters with the bias parameters of the convolutional layer of the convolutional neural network to obtain a second fusion result;
optimizing the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network;
and compiling the optimized convolutional neural network to obtain corresponding binary instructions to be distributed to the artificial intelligence processor to execute corresponding learning tasks.
A system for compiling a learning task for an artificial intelligence processor, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any embodiment of the disclosure when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the embodiments of the disclosure.
In the above technical solution, the redundant neural network layers in the convolutional neural network are deleted first, simplifying the network structure, and the simplified convolutional network structure is then compiled. Since the optimized neural network fuses the Batch Norm layer and/or the Scale layer into the convolutional layer, the operators of those layers are fused into the convolution operator of the convolutional layer; that is, the number of operators in the fused neural network is reduced. Because one round of data exchange is needed for each operator executed when running the learning task of a neural network, compiling the network with this method reduces the data exchange during processing when the network is executed on a device.
Drawings
FIG. 1 is a block diagram of a general purpose processor 100 in one embodiment;
FIG. 2 is a flowchart illustrating step S110 according to an embodiment;
FIG. 3 is a schematic diagram of two-layer network architecture optimization in one embodiment;
FIG. 4 is a flowchart illustrating step S110 according to another embodiment;
FIG. 5 is a schematic diagram of two-layer network architecture optimization in another embodiment;
FIG. 6 is a flowchart illustrating step S110 according to another embodiment;
FIG. 7 is a schematic diagram of a three-tier network architecture optimization in one embodiment;
FIG. 8 is a flowchart illustrating step S110 according to another embodiment;
FIG. 9 is a flowchart illustrating step S110 according to another embodiment;
FIG. 10 is a block diagram of a learning task for compiling an artificial intelligence processor in one embodiment;
FIG. 11 is a block diagram of a computer system according to an embodiment;
FIG. 12 is a schematic flow chart of a neural network processing method according to an embodiment;
FIG. 13 is a flowchart illustrating a method for compiling a learning task for an artificial intelligence processor, according to an embodiment;
FIG. 14 is a block diagram of a task scheduling apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The terms "first," "second," and "third," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Fig. 11 is a schematic structural diagram of a computer system according to an embodiment of the present application. The computer system is a heterogeneous computer system. The general-purpose processor 100 (CPU) may compile computer program code to obtain an executable file, and may also execute the computer instructions of the general-purpose processor. The artificial intelligence processor 200 (IPU) may be an accelerated processing unit (APU), a graphics processing unit (GPU), a neural network processing unit (NPU), or another artificial intelligence processor, and can execute the computer instructions of the artificial intelligence processor.
Optionally, when the general-purpose processor 100 compiles the computer program code, it generally compiles the general-purpose processor code and the artificial intelligence processor code separately to obtain executable files for the general-purpose processor 100 and the artificial intelligence processor 200. Each executable file contains device information. The resulting executable files are assembled and linked to produce a binary instruction sequence (the program executable).
The general-purpose processor 100 may derive a plurality of tasks from a sequence of binary instructions. The tasks are distributed to a general processor and an artificial intelligence processor for execution, and finally, the output result of the program is obtained.
The off-chip cache 300 is used to store data during program execution; for example, the execution result of each task may be stored there.
Specifically, as shown in fig. 1, the general-purpose processor 100 includes a memory 110 and a plurality of processor cores 120, the memory 110 storing instructions executable by the processor cores 120. The memory 110 may be on-chip or off-chip storage, and the processor cores may communicate via an internal bus. A processor core can execute the tasks of optimizing and compiling the neural network structure.
Further, in order to determine which processor core each task obtained from the binary instruction sequence executes on, and the execution order of the tasks, in an optional embodiment the heterogeneous computer system may further be connected to a task scheduling device 400. The task scheduling device may schedule the plurality of tasks into which the binary instruction sequence obtained by the general-purpose processor is divided. The scheduling process may include: planning and splitting each task according to its basic information (such as type, size, and dependency relationships) to obtain the task's decomposition information, that is, a scheme for splitting the task into jobs; and scheduling each task to obtain scheduling information, that is, the executing processor and/or processor core of each job. The operation results of the corresponding tasks are obtained after the jobs are executed.
As shown in fig. 14, the task scheduling device 400 may include a first read/write circuit 410, a matching circuit 420, and a selection circuit 430, electrically connected in sequence, with the selection circuit 430 connected to the artificial intelligence processor. The task scheduling device 400 may process the decomposition information and overall task information of a task to obtain scheduling information used by the artificial intelligence processor to determine the jobs to be processed and their processing order. The scheduling information may include job identifiers for a plurality of jobs, the identity information of the artificial intelligence processor corresponding to each job, and the order in which the jobs are to be processed by that processor. Optionally, the artificial intelligence processor 200 may include a plurality of processor cores and a control device for controlling their operation, with each processor core connected to the control device.
Specifically, the first read/write circuit 410 is configured, upon receiving a task scheduling request for a task (e.g., a learning task of the artificial intelligence processor), to obtain the decomposition information and overall task information of the task and the state information of the processor according to the request. Alternatively, the first read/write circuit may be an I/O circuit.
The matching circuit 420 is configured to match each job of a task with a processor core according to the decomposition information and overall task information of the task and the state information of the processor cores of the artificial intelligence processor, and to add each job successfully matched with a processor core to the job set to be scheduled. The job set to be scheduled may include jobs of a plurality of tasks. Further, if one or more jobs of a task are not successfully matched with a processor core within a preset time (such as 128 or 256 beats), a scheduling-failure signal for the task is obtained.
Specifically, the matching circuit 420 may obtain, from the overall task information and decomposition information of a task, the processor core information (such as processor core type) required by each job, and, from the size of each job, information such as the required processing capability. The processor state information of a core may include its type, its operation state (whether it is idle), and its processing capability. In this way, the matching circuit 420 can match each job of a task with a processor core based on the overall task information, the decomposition information, and the processor core state information. Alternatively, the matching circuit 420 may be formed by more than one comparator connected in parallel; the input data of each comparator may be the decomposition information and overall task information of a job together with the state information of a processor core, and the output may be a match-success or match-failure signal. Further, if a job is successfully matched with a processor core, the matching circuit may also obtain information such as an identifier of the matched processor core, which identifies the core (e.g., a processor core number).
The selection circuit 430 is configured to select a target job from the job set to be scheduled according to the target weight of each job in the set, and to obtain the scheduling information. Specifically, the task scheduling device 400 may send the jobs in the set one by one to the processor cores for processing, and the selection circuit 430 determines the target job to be scheduled next according to the target weight of each job. The target weight of each job may be obtained by calculation or, of course, may be preset.
Alternatively, in one embodiment, the selection circuit 430 may include an arithmetic unit coupled to the matching circuit 420 and a selector coupled to the arithmetic unit, the selector being coupled to the artificial intelligence processor 200. The arithmetic unit is used to determine the scheduling priority of each job according to its target weight; that is, the arithmetic unit can sort the jobs in the job set to be scheduled by target weight to obtain their scheduling priorities. The selector is used to take the job with the highest scheduling priority in the set as the target job according to the scheduling priorities and to obtain the scheduling information. The job with the highest scheduling priority may be the job with the largest target weight; that is, the target job is the job with the largest target weight in the job set to be scheduled. The job with the largest target weight is thus scheduled first, so that it can preempt processor resources preferentially, optimizing the task scheduling process.
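The selection rule described above — schedule the job with the largest target weight first — can be illustrated with a small, hypothetical sketch; the names and the heap-based implementation are illustrative, not the patented circuit:

```python
import heapq

def schedule_order(jobs):
    """jobs: mapping of job id -> target weight; returns ids highest-weight first."""
    heap = [(-w, jid) for jid, w in jobs.items()]  # negate weights for a max-heap
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Job "b" has the largest target weight, so it preempts processor resources first.
assert schedule_order({"a": 3, "b": 7, "c": 5}) == ["b", "c", "a"]
```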
In one embodiment, there is more than one job set to be scheduled, each used for storing jobs of the same job category; the category of each job may be the same as the category of the task it belongs to. In particular, the selection circuit includes an arithmetic unit, which may be connected to the matching circuit, and a selector, which may be connected to the second processor. The arithmetic unit is used to determine the target weight of each job in the job set corresponding to each job category according to the expected weight and current historical weight of the jobs in that set, and to take the job with the largest target weight in each category's set as that category's candidate job. The selector is used to determine the target job from the target weights of the candidate jobs and to obtain the scheduling information.
In one embodiment, as shown in fig. 12, a neural network processing method is proposed; its implementation is described below by taking as an example its application to the computer system shown in fig. 11. The method comprises the following steps:
and step S11, compiling the convolutional neural network of the learning task of the artificial intelligent processor to obtain a corresponding binary instruction sequence.
And step S12, distributing the binary instructions to the artificial intelligence processor so that the artificial intelligence processor executes the corresponding learning task.
A convolutional neural network is a neural network comprising a plurality of neural network layers, each carrying operators. An operator is a mapping from function space to function space, O: X → X; an operator indicates what operation needs to be performed. The operators in a neural network are connected through weights to form the network structure. Specifically, the general-purpose processor 100 obtains a binary instruction sequence after compiling the convolutional neural network. Optionally, the instructions in the sequence are computer instructions of the artificial intelligence processor. The general-purpose processor obtains the corresponding learning task from the binary instruction sequence. Further, since the instructions are computer instructions of the artificial intelligence processor, the general-purpose processor distributes the learning task to devices on the artificial intelligence processor for execution, producing the processing result of the convolutional neural network.
In one embodiment, when step S11 is executed, the neural network may be compiled by the method of fig. 13 for compiling a learning task of the artificial intelligence processor; its implementation is described below by taking as an example its application to the computer system shown in fig. 11. The method comprises the following steps:
and step S110, optimizing the structure of the convolutional neural network corresponding to the learning task of the artificial intelligent processor to obtain the optimized convolutional neural network.
And step S120, compiling the optimized convolutional neural network to obtain a binary execution sequence.
As shown in fig. 2, the step S110, that is, the optimization process for the convolutional neural network, may specifically include:
at step 202, the general purpose processor obtains configuration parameters. Wherein the configuration parameters include a first training parameter and a second training parameter of the Batch Norm layer. Specifically, a first training parameter and a second training parameter for performing convolution calculations of the Batch Norm layer may be obtained under the Caffe framework. Optionally, the Batch Norm layer contains a Batch Norm operator, which indicates that a Batch Norm operation is required. Further, Caffe refers to a convolutional neural network framework, a commonly used deep learning framework. The Caffe source code file supports configuration and modification, that is, the model can be redefined and optimized in the Caffe configuration process. The Caffe framework refers to a mathematical model obtained by training using a machine learning algorithm.
In step 204, the general-purpose processor fuses the first training parameter of the Batch Norm layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result. Specifically, the first training parameter obtained in step 202 and the weight parameter of the convolutional layer may undergo a fusion calculation to obtain the first fusion result. In an alternative embodiment, the first training parameter of the Batch Norm layer comprises at least one first training sub-parameter for performing the convolution calculation of the Batch Norm layer. Specifically, if the first training parameter includes a plurality of first training sub-parameters, all of them are fused with the weight parameter of the convolutional layer.
In step 206, the general-purpose processor fuses the second training parameter of the Batch Norm layer with the bias parameter of the convolutional layer of the convolutional neural network to obtain a second fusion result. Specifically, the second training parameter obtained in step 202 and the bias parameter of the convolutional layer may undergo a fusion calculation to obtain the second fusion result. In an alternative embodiment, the second training parameter of the Batch Norm layer comprises at least one second training sub-parameter for performing the convolution calculation of the Batch Norm layer. Specifically, if the second training parameter includes a plurality of second training sub-parameters, all of them are fused with the bias parameter of the convolutional layer.
In step 208, the general-purpose processor optimizes the convolutional neural network according to the first fusion result and the second fusion result to obtain the optimized convolutional neural network. Specifically, the optimization may be completed according to the first fusion result obtained in step 204 and the second fusion result obtained in step 206.
In this convolutional neural network optimization method, the computation of the Batch Norm layer is fused into the convolutional layer, so network performance can be greatly improved without losing network precision; meanwhile, the redundant neural network layer is deleted after fusion, simplifying the network structure and increasing the network's operation speed.
This method of compiling a learning task of an artificial intelligence processor, based on the above convolutional neural network optimization, first deletes the redundant neural network layers in the convolutional neural network to simplify the network structure, and then compiles the simplified structure. Since the optimized network fuses the Batch Norm layer into the convolutional layer, the Batch Norm operator is fused into the convolution operator; that is, the number of operators in the fused network is reduced. Because one round of data exchange is needed for each operator executed when running the learning task, compiling the network with this method reduces data exchange during processing when the network is executed on a device.
In the neural network processing method of this embodiment, the convolutional neural network is first optimized: redundant layers are deleted and the network structure is simplified; the optimized network is then compiled to obtain a binary instruction sequence. When the binary instruction sequence is divided into a plurality of learning tasks to be executed, the operators in the corresponding convolutional neural network have been fused, reducing their number; since the heterogeneous computer system must exchange data with the off-chip cache once after each operator's operation, this technical solution reduces the number of data exchanges with the off-chip cache. In one embodiment, the first training parameter of the Batch Norm layer and the weight parameter of the convolutional layer are multiplied to obtain the first fusion result.
In this case, the two-layer continuous structure comprising a Convolution layer and a Batch Norm layer, as shown in fig. 3, may be optimized into a single Convolution layer; that is, the calculation of the Batch Norm layer is fused into the Convolution layer, so the Batch Norm layer may be deleted.
The Batch Norm layer is mainly used for normalizing input data, i.e.
x_norm = (x − μ) / √σ
where x represents the input data of the Batch Norm layer; x_norm represents the output data of the Batch Norm layer after normalization; μ represents the cumulatively calculated mean; and σ represents the cumulatively calculated variance.
Normalization mainly simplifies subsequent data processing: it can map the input data into the interval [0,1] or [−1,1] and transform a dimensional expression into a dimensionless one, so that indexes of different units or orders of magnitude can be compared and weighted conveniently, making data processing more convenient and faster.
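For illustration (function names and data values are ours, not the patent's), the two kinds of mapping discussed above, min-max normalization into [0,1] and the mean/variance normalization used by the Batch Norm layer, might be sketched as:

```python
import math

def min_max_normalize(xs):
    # Map values into the interval [0, 1].
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def mean_var_normalize(xs):
    # x_norm = (x - mu) / sqrt(sigma), with sigma the variance,
    # as in the Batch Norm normalization formula above.
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [(x - mu) / math.sqrt(var) for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
print(min_max_normalize(data))   # [0.0, 0.333..., 0.666..., 1.0]
print(mean_var_normalize(data))  # zero-mean, dimensionless values
```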
For example, the calculation formula of the Batch Norm layer is

y = alpha × (x − mean/scale) / √(var/scale) + beta

After simplification, formula (1) is obtained. Please refer to formula (1):

y = (alpha / √(var/scale)) × x + (beta − alpha × mean / (scale × √(var/scale)))   (1)
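The simplification can be checked numerically. The following sketch (parameter values are ours, chosen only for illustration) evaluates the Batch Norm calculation in both the original form and the simplified form of formula (1) and confirms they agree:

```python
import math

# Illustrative parameter values (not from the patent).
alpha, mean, var, scale, beta = 2.0, 3.0, 4.0, 1.0, 0.5
x = 5.0

# Original form: alpha * (x - mean/scale) / sqrt(var/scale) + beta
original = alpha * (x - mean / scale) / math.sqrt(var / scale) + beta

# Simplified form (1): K * x + offset
K = alpha / math.sqrt(var / scale)
offset = beta - alpha * mean / (scale * math.sqrt(var / scale))
simplified = K * x + offset

assert abs(original - simplified) < 1e-12
print(original)  # 2.5
```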
The training process of the Batch Norm layer extracts mini-batch samples from the total samples for multiple rounds of forward training, and updates the calculation parameters in the Caffe framework by moving average.
In one embodiment, if the first training parameter of the Batch Norm layer includes a plurality of first training sub-parameters, the plurality of first training sub-parameters are operated to obtain a first intermediate operation result; and fusing the first intermediate operation result with the weight parameter of the convolutional layer to obtain the first fusion result.
Specifically, to fuse the calculation process of the Batch Norm layer into the Convolution layer, a first training parameter and a second training parameter for performing the calculation of the Batch Norm layer may be acquired. Referring to formula (1), the first training parameter of the Batch Norm layer includes a plurality of first training sub-parameters (alpha, var, scale), and the second training parameter of the Batch Norm layer includes a plurality of second training sub-parameters (alpha, mean, var, scale, beta).
Each of the first training parameters and second training parameters (alpha, mean, var, scale, beta) is a vector, and there may be more than one parameter of each kind in the Caffe framework. For example, if a plurality of first training sub-parameters alpha exist in the Caffe framework, all of the alpha in the Caffe framework are obtained.
Specifically, referring to formula (1), the plurality of first training sub-parameters (alpha, var, scale) in the first training parameter are multiplied into the weights of the Convolution layer. That is, the plurality of first training sub-parameters are operated to obtain the first intermediate operation result

alpha / √(var/scale)

which is multiplied by the weights of the Convolution layer to obtain the first fusion result

(alpha / √(var/scale)) × weights
In one embodiment, the general purpose processor adds the second training parameter of the Batch Norm layer and the bias parameter of the convolutional layer to obtain the second fusion result.
In one embodiment, if the second training parameter of the Batch Norm layer includes a plurality of second training sub-parameters, the plurality of second training sub-parameters are operated to obtain a second intermediate operation result; and fusing the second intermediate operation result with the bias parameters of the convolutional layer to obtain a second fusion result.
For example, referring to formula (1) again, the plurality of second training sub-parameters (alpha, mean, var, scale, beta) in the second training parameter of the Batch Norm layer are added to the bias of the Convolution layer. That is, the plurality of second training sub-parameters are operated to obtain the second intermediate operation result

beta − alpha × mean / (scale × √(var/scale))

which is added to the bias of the Convolution layer to obtain the second fusion result

beta − alpha × mean / (scale × √(var/scale)) + bias
In one embodiment, the Batch Norm layer is deleted, the weight parameter of the convolutional layer is changed to the first fusion result, and the bias parameter of the convolutional layer is changed to the second fusion result.
In a multi-layer neural network, the Batch Norm layer is a network structure that contributes little to model inference. For example, in a lightweight convolutional neural network there are a large number of continuous Convolution and Batch Norm layer structures; during forward propagation, constructing and executing the Batch Norm layers consumes a large amount of computing resources while making the network structure repetitive and complicated. Therefore, after the calculation of the Batch Norm layer is fused into the convolutional layer through steps 204 and 206, the Batch Norm layer can be deleted.
Further, the weight parameter of the convolutional layer may be changed to the first fusion result obtained in step 204; referring to formula (1), the weight parameter of the convolutional layer becomes

(alpha / √(var/scale)) × weights

The bias parameter of the convolutional layer may be changed to the second fusion result obtained in step 206; referring to formula (1) again, the bias parameter of the convolutional layer becomes

beta − alpha × mean / (scale × √(var/scale)) + bias

Thus, the normalization process executed by the Batch Norm layer is fused into the Convolution layer, the Batch Norm layer is deleted, and the optimization of the two-layer continuous structure of the Convolution layer and the Batch Norm layer is completed. In this convolutional neural network optimization method, fusing the normalized data processing of the Batch Norm layer into the convolutional layer can greatly improve network performance without losing network precision; meanwhile, deleting the Batch Norm layer after the fusion simplifies the network structure and increases the network operation speed.
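A minimal Python sketch of this fusion (function and variable names are ours; for simplicity it assumes a bias-free convolution, so the second fusion result becomes the new bias directly, and models the convolution per output channel as a dot product):

```python
import math

def fuse_batchnorm_into_conv(weights, alpha, mean, var, scale, beta):
    # Fold a Batch Norm layer into the preceding bias-free convolution,
    # following steps 204 and 206: per output channel, the weights are
    # multiplied by the first intermediate result, and the bias becomes
    # the second fusion result.
    fused_w, fused_b = [], []
    for ch, w_ch in enumerate(weights):
        k = alpha[ch] / math.sqrt(var[ch] / scale)      # first intermediate result
        offset = beta[ch] - alpha[ch] * mean[ch] / (scale * math.sqrt(var[ch] / scale))
        fused_w.append([k * w for w in w_ch])           # first fusion result
        fused_b.append(offset)                          # second fusion result (conv bias = 0)
    return fused_w, fused_b

# Toy check: a 1x1 "convolution" (dot product) followed by Batch Norm
# matches the fused layer exactly when the convolution has no bias.
w = [[1.0, 2.0], [0.5, -1.0]]
alpha, mean, var, scale, beta = [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], 1.0, [0.5, 0.0]
fw, fb = fuse_batchnorm_into_conv(w, alpha, mean, var, scale, beta)

x = [1.0, 1.0]
for ch in range(2):
    conv_out = sum(wi * xi for wi, xi in zip(w[ch], x))
    bn_out = alpha[ch] * (conv_out - mean[ch] / scale) / math.sqrt(var[ch] / scale) + beta[ch]
    fused_out = sum(wi * xi for wi, xi in zip(fw[ch], x)) + fb[ch]
    assert abs(bn_out - fused_out) < 1e-12
```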
As an optional implementation, if the convolutional neural network includes a plurality of Batch Norm layers, the above optimization process is performed for each Batch Norm layer, and the normalization processes of the plurality of Batch Norm layers are all fused into the Convolution layers, so that the plurality of redundant Batch Norm layers can be deleted, the network structure becomes clearer, and the network performance is greatly improved.
In an embodiment, as shown in fig. 4, the step S110, that is, the optimization process of the convolutional neural network, may specifically include:
step 302, the general purpose processor obtains configuration parameters.
The configuration parameters comprise a first training parameter and a second training parameter of a Scale layer of the convolutional neural network corresponding to a learning task of the artificial intelligence processor. Specifically, the first training parameter and the second training parameter for performing the calculation of the Scale layer may be acquired from the Caffe framework.
And 304, fusing the first training parameter of the Scale layer and the weight parameter of the convolutional layer of the convolutional neural network by the general processor to obtain a first fusion result.
Specifically, the first training parameter of the Scale layer obtained in step 302 and the weight parameter of the convolutional layer of the convolutional neural network may be subjected to fusion calculation to obtain a first fusion result.
As an optional implementation, the first training parameter of the Scale layer includes at least one first training sub-parameter for performing the calculation of the Scale layer.
Specifically, if the first training parameter of the Scale layer includes a plurality of first training sub-parameters, all the first training sub-parameters of the Scale layer and the weight parameter of the convolutional layer are subjected to fusion calculation.
And step 306, fusing the second training parameter of the Scale layer and the bias parameter of the convolutional layer of the convolutional neural network by the general processor to obtain a second fusion result.
Specifically, the second training parameter of the Scale layer acquired in step 302 and the bias parameter of the convolutional layer may be subjected to fusion calculation to obtain a second fusion result.
As an optional implementation, the second training parameter of the Scale layer includes at least one second training sub-parameter for performing the calculation of the Scale layer.
Specifically, if the second training parameter of the Scale layer includes a plurality of second training sub-parameters, all the second training sub-parameters of the Scale layer and the bias parameter of the convolutional layer are subjected to fusion calculation.
And 308, optimizing the convolutional neural network by the general processor according to the first fusion result and the second fusion result to obtain the optimized convolutional neural network.
Specifically, the optimization of the convolutional neural network may be completed according to the first fusion result obtained in step 304 and the second fusion result obtained in step 306.
In the convolutional neural network optimization method, the computation process of the Scale layer is fused into the convolutional layer, so that the network performance can be greatly improved on the premise of not losing the network precision; meanwhile, the redundant neural network layer is deleted after network fusion is realized, the network structure can be simplified, and the network operation speed is increased.
The method for compiling the learning task of the artificial intelligence processor based on this convolutional neural network optimization method deletes redundant neural network layers in the convolutional neural network, simplifies the network structure, and compiles the simplified convolutional network structure. Because the optimized neural network fuses the Scale layer into the convolutional layer, the Scale operator of the Scale layer is also fused into the convolution operator of the convolutional layer; that is, the number of operators in the fused neural network is reduced. Since the CPU compiles the neural network operator by operator, this method reduces the number of operators to be compiled, so its compilation efficiency is high. In addition, when the learning task corresponding to the neural network is executed, one data exchange is needed each time the operation of one operator is executed; therefore, when a neural network compiled by this method is executed on a device, the data exchange during processing can be reduced.
In one embodiment, the first training parameter of the Scale layer and the weight parameter of the convolutional layer are multiplied to obtain the first fusion result.
In this case, the two-layer continuous structure consisting of the Convolution layer and the Scale layer shown in fig. 5 may be optimized into a single Convolution layer; that is, the calculation of the Scale layer is fused into the Convolution layer, so that the Scale layer may be deleted.
The Scale layer mainly scales and shifts the normalized data, i.e., y = γ × x_norm + β, where x_norm represents the normalized input data of the Scale layer, γ represents the scaling amount, and β represents the shift amount.
For example, the calculation formula of the Scale layer is shown in formula (2):

y = alpha × x + beta   (2)
Specifically, in order to fuse the calculation process of the Scale layer into the Convolution layer, a first training parameter and a second training parameter for performing the calculation of the Scale layer may be acquired. Referring to formula (2), the first training parameter of the Scale layer includes a first training sub-parameter (alpha), and the second training parameter of the Scale layer includes a second training sub-parameter (beta).
Each of the first training parameters and second training parameters (alpha, beta) is a vector, and there may be more than one parameter of each kind in the Caffe framework. For example, if a plurality of first training sub-parameters alpha exist in the Caffe framework, all of the alpha in the Caffe framework are obtained.
Specifically, referring to formula (2), the first training sub-parameter (alpha) in the first training parameter is multiplied by the weights of the Convolution layer; that is, the alpha in formula (2) is multiplied by the weights of the Convolution layer to obtain the first fusion result alpha × weights.
In one embodiment, the second training parameter of the Scale layer and the bias parameter of the convolutional layer are added to obtain the second fusion result.
For example, referring to formula (2) again, the second training sub-parameter (beta) in the second training parameter of the Scale layer is added to the bias of the Convolution layer; that is, the beta in formula (2) is added to the bias of the Convolution layer to obtain the second fusion result beta + bias.
In one embodiment, the general purpose processor deletes the Scale layer, changes the weight parameter of the convolutional layer to the first fusion result, and changes the bias parameter of the convolutional layer to the second fusion result.
In a multi-layer neural network, the Scale layer is a network structure that contributes little to model training. For example, in the lightweight convolutional neural network MobileNet, there are a large number of continuous Convolution and Scale layer structures; during forward propagation, the Scale layer does not play a great role in the convolution calculation, but it makes the network structure repetitive and complex. Therefore, after the calculation of the Scale layer is fused into the convolutional layer through steps 304 and 306, the Scale layer can be deleted.
Further, the weight parameter of the convolutional layer may be changed to the first fusion result obtained in step 304; referring to formula (2), the weight parameter of the convolutional layer becomes alpha × weights. The bias parameter of the convolutional layer may be changed to the second fusion result obtained in step 306; referring to formula (2) again, the bias parameter of the convolutional layer becomes beta + bias. Thus, the scaling and shifting process executed by the Scale layer is fused into the Convolution layer, the Scale layer is deleted, and the optimization of the two-layer continuous structure of the Convolution layer and the Scale layer is completed.
In this convolutional neural network optimization method, fusing the data processing of the Scale layer into the convolutional layer can greatly improve network performance without losing network precision; meanwhile, deleting the Scale layer after the fusion simplifies the network structure and increases the network operation speed. As an optional implementation, if the convolutional neural network includes a plurality of Scale layers, the above optimization process is performed for each Scale layer, and the scaling and shifting processes of the plurality of Scale layers are all fused into the Convolution layers, so that the plurality of redundant Scale layers can be deleted, the network structure becomes clearer, and the network performance is greatly improved.
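Steps 304 and 306 can be sketched as follows (function and variable names are ours; the equivalence check assumes a bias-free convolution, modeled as a dot product, so that the fused layer reproduces the Convolution-plus-Scale output exactly):

```python
def fuse_scale_into_conv(weights, bias, alpha, beta):
    # Step 304: first fusion result = alpha * weights
    fused_w = [[alpha * w for w in w_ch] for w_ch in weights]
    # Step 306: second fusion result = beta + bias
    fused_b = beta + bias
    return fused_w, fused_b

# Toy check with a bias-free 1x1 "convolution" (dot product):
weights, bias, alpha, beta = [[2.0, 3.0]], 0.0, 0.5, 1.0
fw, fb = fuse_scale_into_conv(weights, bias, alpha, beta)
x = [1.0, 2.0]
conv_out = sum(w * xi for w, xi in zip(weights[0], x))   # 8.0
scale_out = alpha * conv_out + beta                      # formula (2): 5.0
fused_out = sum(w * xi for w, xi in zip(fw[0], x)) + fb  # 5.0
assert abs(scale_out - fused_out) < 1e-12
```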
In one embodiment, as shown in fig. 6, the step S110, that is, the optimization process of the convolutional neural network, may specifically include:
step 402, the general purpose processor obtains a first configuration parameter and a second configuration parameter.
The first configuration parameter comprises a first training parameter and a second training parameter of a Batch Norm layer of the convolutional neural network corresponding to the learning task of the artificial intelligence processor; the second configuration parameter comprises a first training parameter and a second training parameter of a Scale layer of the convolutional neural network corresponding to the learning task of the artificial intelligence processor.
Specifically, the first training parameter and the second training parameter for performing the calculation of the Batch Norm layer, as well as the first training parameter and the second training parameter for performing the calculation of the Scale layer, may be acquired from the Caffe framework.
Step 404, the general processor fuses the first training parameter of the Batch Norm layer and the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result.
Specifically, the first training parameter of the Batch Norm layer of the convolutional neural network corresponding to the learning task of the artificial intelligence processor obtained in step 402, and the first training parameter of the Scale layer of the convolutional neural network corresponding to the learning task of the artificial intelligence processor and the weight parameter of the convolutional layer may be subjected to fusion calculation to obtain a first fusion result.
In an alternative embodiment, the first training parameter of the Batch Norm layer comprises at least one first training sub-parameter for performing the calculation of the Batch Norm layer, and the first training parameter of the Scale layer comprises at least one first training sub-parameter for performing the calculation of the Scale layer.
Specifically, if the first training parameter of the Batch Norm layer includes a plurality of first training sub-parameters, all the first training sub-parameters of the Batch Norm layer and the weight parameters of the convolutional layer are subjected to fusion calculation. And if the first training parameters of the Scale layer comprise a plurality of first training sub-parameters, performing fusion calculation on all the first training sub-parameters of the Scale layer and the weight parameters of the convolutional layer.
And 406, fusing the second training parameter of the Batch Norm layer and the second training parameter of the Scale layer with the bias parameter of the convolutional layer of the convolutional neural network by the general processor to obtain a second fusion result.
Specifically, the second training parameter of the Batch Norm layer of the convolutional neural network corresponding to the learning task of the artificial intelligence processor obtained in step 402, and the second training parameter of the Scale layer of the convolutional neural network corresponding to the learning task of the artificial intelligence processor and the bias parameter of the convolutional layer may be subjected to fusion calculation to obtain a second fusion result.
In an alternative embodiment, the second training parameter of the Batch Norm layer comprises at least one second training sub-parameter for performing the calculation of the Batch Norm layer, and the second training parameter of the Scale layer comprises at least one second training sub-parameter for performing the calculation of the Scale layer.
Specifically, if the second training parameter of the Batch Norm layer includes a plurality of second training sub-parameters, all the second training sub-parameters of the Batch Norm layer and the bias parameters of the convolutional layer are subjected to the fusion calculation. And if the second training parameters of the Scale layer comprise a plurality of second training subparameters, performing fusion calculation on all the second training subparameters of the Scale layer and the bias parameters of the convolutional layer.
And 408, optimizing the convolutional neural network by the general processor according to the first fusion result and the second fusion result to obtain the optimized convolutional neural network.
In the convolutional neural network optimization method, the calculation processes of the Batch Norm layer and the Scale layer are fused into the convolutional layer, so that the network performance can be greatly improved on the premise of not losing the network precision; meanwhile, the redundant neural network layer is deleted after network fusion is realized, the network structure can be simplified, and the network operation speed is increased.
The method for compiling the learning task of the artificial intelligence processor based on this convolutional neural network optimization method deletes redundant neural network layers in the convolutional neural network, simplifies the network structure, and compiles the simplified convolutional network structure. Since the optimized neural network fuses the Batch Norm layer and the Scale layer into the convolutional layer, both the Scale operator of the Scale layer and the Batch Norm operator of the Batch Norm layer are fused into the convolution operator of the convolutional layer; that is, the number of operators in the fused neural network is reduced. In addition, when the learning task corresponding to the neural network is executed, one data exchange is needed each time the operation of one operator is executed; therefore, when a neural network compiled by this method is executed on a device, the data exchange during processing can be reduced.
In one embodiment, the first training parameter of the Batch Norm layer, the first training parameter of the Scale layer, and the weight parameter of the convolutional layer are multiplied to obtain the first fusion result.
In this case, the three-layer continuous structure consisting of the Convolution layer, the Batch Norm layer and the Scale layer shown in fig. 7 may be optimized into a single Convolution layer; that is, the calculations of the Batch Norm layer and the Scale layer are respectively fused into the Convolution layer, so that the Batch Norm layer and the Scale layer may be deleted. In addition, fig. 7 shows only one positional relationship among the Convolution layer, the Batch Norm layer and the Scale layer in the neural network; the present technical solution also applies when the positions of the Batch Norm layer and the Scale layer in fig. 7 are exchanged.
Specifically, in order to fuse the calculation processes of both the Batch Norm layer and the Scale layer into the Convolution layer, a first training parameter and a second training parameter for performing the calculation of the Batch Norm layer, as well as a first training parameter and a second training parameter for performing the calculation of the Scale layer, may be acquired.
In one embodiment, if the first training parameter of the Batch Norm layer includes a plurality of first training sub-parameters, the plurality of first training sub-parameters are operated to obtain a first intermediate operation result; and fusing the first intermediate operation result, the first training parameter of the Scale layer and the weight parameter of the convolution layer to obtain the first fusion result.
For example, referring to formula (1) and formula (2), the plurality of first training sub-parameters (alpha, var, scale) in the first training parameter of the Batch Norm layer and the first training sub-parameter (denoted alpha′ here) in the first training parameter of the Scale layer are multiplied into the weights of the Convolution layer. That is, the plurality of first training sub-parameters in formula (1) are operated to obtain the first intermediate operation result

alpha / √(var/scale)

which is multiplied by the alpha′ of formula (2) and the weights of the Convolution layer to obtain the first fusion result

(alpha / √(var/scale)) × alpha′ × weights
In one embodiment, the general purpose processor adds the second training parameter of the Batch Norm layer, the second training parameter of the Scale layer, and the bias parameter of the convolutional layer to obtain the second fusion result.
In one embodiment, if the second training parameter of the Batch Norm layer includes a plurality of second training sub-parameters, the plurality of second training sub-parameters are operated to obtain a second intermediate operation result; and fusing the second intermediate operation result and the second training parameter of the Scale layer with the bias parameter of the convolution layer to obtain a second fusion result.
For example, referring to formula (1) again, the plurality of second training sub-parameters (alpha, mean, var, scale, beta) in the second training parameter of the Batch Norm layer and the second training sub-parameter (denoted beta′ here) in the second training parameter of the Scale layer are added to the bias of the Convolution layer. That is, the plurality of second training sub-parameters in formula (1) are operated to obtain the second intermediate operation result

beta − alpha × mean / (scale × √(var/scale))

which is added to the beta′ of formula (2) and the bias of the Convolution layer to obtain the second fusion result

beta − alpha × mean / (scale × √(var/scale)) + beta′ + bias
In one embodiment, the Batch Norm layer and the Scale layer are deleted, the weight parameter of the convolutional layer is changed to the first fusion result, and the bias parameter of the convolutional layer is changed to the second fusion result.
In a multi-layer neural network, the Batch Norm layer and the Scale layer are network structures that contribute little to model training. For example, in a lightweight convolutional neural network there are a large number of continuous Convolution, Batch Norm and Scale layer structures; during forward propagation, the Batch Norm layer and the Scale layer do not play a great role in the convolution calculation, but they make the network structure repetitive and complicated. Therefore, after the calculations of the Batch Norm layer and the Scale layer are fused into the convolutional layer through steps 404 and 406, the Batch Norm layer and the Scale layer can be deleted.
Further, the weight parameter of the convolutional layer may be changed to the first fusion result obtained in step 404; referring to formula (1) and formula (2), the weight parameter of the convolutional layer becomes

(alpha / √(var/scale)) × alpha′ × weights

where alpha′ denotes the first training sub-parameter of the Scale layer. The bias parameter of the convolutional layer may be changed to the second fusion result obtained in step 406; the bias parameter of the convolutional layer becomes

beta − alpha × mean / (scale × √(var/scale)) + beta′ + bias

where beta′ denotes the second training sub-parameter of the Scale layer. Thus, the processing procedures executed by the Batch Norm layer and the Scale layer are fused into the Convolution layer, the Batch Norm layer and the Scale layer are deleted, and the optimization of the three-layer continuous structure of the Convolution layer, the Batch Norm layer and the Scale layer is completed.
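The two fusion computations of steps 404 and 406 can be sketched numerically as follows (all parameter values and names are ours, chosen only for illustration; alpha_s and beta_s stand for the Scale layer's alpha′ and beta′):

```python
import math

# Illustrative scalar parameters (values are ours, not the patent's).
alpha_bn, mean, var, scale, beta_bn = 2.0, 3.0, 4.0, 1.0, 0.5   # Batch Norm layer
alpha_s, beta_s = 0.5, 1.0                                       # Scale layer (alpha', beta')
weights, bias = [1.0, 2.0], 0.0                                  # convolutional layer

# Step 404: first fusion result = (alpha / sqrt(var/scale)) * alpha' * weights
k = alpha_bn / math.sqrt(var / scale)
fused_weights = [k * alpha_s * w for w in weights]

# Step 406: second fusion result
#   = (beta - alpha * mean / (scale * sqrt(var/scale))) + beta' + bias
offset = beta_bn - alpha_bn * mean / (scale * math.sqrt(var / scale))
fused_bias = offset + beta_s + bias

print(fused_weights, fused_bias)  # [0.5, 1.0] -1.5
```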
In the convolutional neural network optimization method, the normalized data processing processes of the Batch Norm layer and the Scale layer are integrated into the convolutional layer, so that the network performance can be greatly improved on the premise of not losing the network precision; meanwhile, the Batch Norm layer and the Scale layer are deleted after the network fusion is realized, the network structure can be simplified, and the network operation speed is increased.
In one embodiment, a convolutional neural network optimization method is provided to implement convolutional neural network optimization in step S110, where the method is executed on a general-purpose processor as shown in fig. 1, and the method includes performing convolutional calculation on input data of a convolutional layer and the first and second fusion results respectively to obtain output results of the convolutional layer. As shown in fig. 8, the method specifically includes the following steps:
step 502, the general processor performs multiplication operation on the input data and the first fusion result to obtain a first operation result.
As an alternative embodiment, in the optimization method of the two-layer convolutional neural network in which the Batch Norm layer is fused into the convolutional layer as shown in fig. 2, referring to formula (1), the input data x of the convolutional layer is multiplied by the first fusion result (alpha / √(var/scale)) × weights to obtain the first operation result x × (alpha / √(var/scale)) × weights.
As an alternative implementation, in the optimization method of the two-layer convolutional neural network in which the Scale layer is fused into the convolutional layer as shown in fig. 4, referring to formula (2), the input data x of the convolutional layer is multiplied by the first fusion result alpha × weights to obtain the first operation result x × alpha × weights.
As an alternative embodiment, in the optimization method of the three-layer convolutional neural network in which the Batch Norm layer and the Scale layer are fused into the convolutional layer as shown in fig. 6, referring to formula (1) and formula (2), the input data x of the convolutional layer is multiplied by the first fusion result (alpha / √(var/scale)) × alpha′ × weights, where alpha′ denotes the first training sub-parameter of the Scale layer, to obtain the first operation result x × (alpha / √(var/scale)) × alpha′ × weights.
And step 504, the general processor performs addition operation on the first operation result and the second fusion result to obtain the output result.
As an alternative embodiment, in the optimization method of the two-layer convolutional neural network in which the Batch Norm layer is fused into the convolutional layer as shown in fig. 2, referring to formula (1), the second fusion result is

beta − alpha × mean / (scale × √(var/scale)) + bias

The first operation result x × (alpha / √(var/scale)) × weights is added to the second fusion result to obtain the output result

x × (alpha / √(var/scale)) × weights + beta − alpha × mean / (scale × √(var/scale)) + bias
As an optional implementation, in the optimization method of the two-layer convolutional neural network in which the Scale layer is fused into the convolutional layer as shown in fig. 4, referring to formula (2), the second fusion result is beta + bias; the first operation result x × alpha × weights is added to the second fusion result beta + bias to obtain the output result x × alpha × weights + beta + bias.
As an alternative embodiment, in the optimization method of the three-layer convolutional neural network in which the Batch Norm layer and the Scale layer are fused into the convolutional layer as shown in fig. 6, referring to formula (1) and formula (2), the second fusion result is

beta − alpha × mean / (scale × √(var/scale)) + beta′ + bias

where beta′ denotes the second training sub-parameter of the Scale layer. The first operation result x × (alpha / √(var/scale)) × alpha′ × weights is added to the second fusion result to obtain the output result

x × (alpha / √(var/scale)) × alpha′ × weights + beta − alpha × mean / (scale × √(var/scale)) + beta′ + bias
In the convolutional neural network optimization method, convolution calculation is performed on the input data of the convolutional layer with the first fusion result and the second fusion result respectively, so that, on the premise that the calculation does not overflow, the optimized network loses no precision and the network operation speed is increased.
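The two-step fused forward computation described above for the Scale-layer case can be sketched in plain Python. The scalar stand-in for the convolution and the variable names (alpha, beta, weight, bias) are illustrative assumptions, not the patent's reference implementation:

```python
# Sketch of the fused forward pass for the Scale-to-convolution case,
# using scalars in place of tensors for clarity (illustrative only).

def fuse_scale_into_conv(weight, bias, alpha, beta):
    """Offline fusion: returns the first and second fusion results."""
    first_fusion = alpha * weight    # replaces the convolutional weight
    second_fusion = beta + bias      # replaces the convolutional bias, as stated in the text
    return first_fusion, second_fusion

def fused_forward(x, first_fusion, second_fusion):
    """Runtime: one multiply and one add instead of conv followed by Scale."""
    first_op = x * first_fusion      # first operation result: x * alpha * weight
    return first_op + second_fusion  # output: x * alpha * weight + beta + bias

w, b = fuse_scale_into_conv(weight=2.0, bias=0.5, alpha=3.0, beta=1.0)
print(fused_forward(x=4.0, first_fusion=w, second_fusion=b))  # 4*3*2 + 1 + 0.5 = 25.5
```

At run time the Scale layer no longer appears; only the single multiply-add against the fused parameters is executed.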
In one embodiment, as shown in fig. 9, the step S110, that is, the optimization process of the convolutional neural network, may specifically include:
At step 602, the general-purpose processor obtains configuration parameters.
Wherein the configuration parameters include a first training parameter and a second training parameter of a redundant neural network layer of the convolutional neural network. The first training parameters include one or more first training sub-parameters and the second training parameters include one or more second training sub-parameters.
And step 604, fusing the first training parameter of the redundant neural network layer of the convolutional neural network and the weight parameter of the convolutional layer of the convolutional neural network by the general processor to obtain a first fusion result.
Specifically, the first training parameter obtained in step 602 and the weight parameter of the convolutional layer may be subjected to fusion calculation to obtain a first fusion result.
As an optional implementation manner, a multiplication operation may be performed on the first training parameter and the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result.
Specifically, if the first training parameter includes a plurality of first training subparameters, all the first training subparameters are multiplied by the weight parameters of the convolutional layer.
And step 606, the general processor fuses the second training parameter of the redundant neural network layer of the convolutional neural network with the bias parameter of the convolutional layer of the convolutional neural network to obtain a second fusion result.
Specifically, the second training parameter of the redundant neural network layer of the convolutional neural network obtained in step 602 and the bias parameter of the convolutional layer of the convolutional neural network may be subjected to corresponding fusion calculation to obtain a second fusion result.
As an alternative implementation, the second training parameter and the bias parameter of the convolutional layer of the convolutional neural network may be added to obtain a second fusion result.
Specifically, if the second training parameters include a plurality of second training subparameters, all the second training subparameters are added to the bias parameters of the convolutional layer.
And step 608, the general processor optimizes the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network.
Specifically, the optimization of the convolutional neural network may be completed according to the first fusion result obtained in step 604 and the second fusion result obtained in step 606.
As an optional implementation, the redundant neural network layer is deleted, the weight parameter of the convolutional layer is changed to the first fusion result, and the bias parameter of the convolutional layer is changed to the second fusion result.
Wherein, the redundant neural network layer refers to a network structure which is deployed in a multilayer neural network and does not contribute much to model inference. For example, in the lightweight convolutional neural network MobileNet, there are a large number of consecutive Convolution, Batch Norm, and Scale layer structures; during forward propagation, the Batch Norm layer and the Scale layer play little role in the convolution calculation, yet they make the network structure repetitive and complex, so the Batch Norm layer and the Scale layer can be regarded as redundant neural network layers. However, the redundant neural network layer is not limited to the Batch Norm layer and the Scale layer.
Further, the weight parameter of the convolutional layer may be changed to the first fusion result obtained in step 604, and the bias parameter of the convolutional layer may be changed to the second fusion result obtained in step 606. In this way, the data processing of the redundant neural network layer is fused into the Convolution layer, the redundant neural network layer is deleted, and the structure optimization of the Convolution layer and the redundant neural network layer is completed.
In the convolutional neural network optimization method, the network performance can be greatly improved on the premise of not losing the network precision by fusing the calculation process of the redundant neural network layer into the convolutional layer; meanwhile, the redundant neural network layer is deleted after network fusion is realized, the network structure can be simplified, and the network operation speed is increased.
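Steps 602 to 608 above can be sketched as follows; the representation of the parameters as plain Python scalars and per-channel lists is an assumption made for illustration only:

```python
# Sketch of steps 602-608: fold the training parameters of a redundant
# layer into the convolutional layer's weight and bias, then drop the layer.

def optimize(conv_weight, conv_bias, first_params, second_params):
    # Step 604: multiply every first training sub-parameter into the weight.
    fused_weight = conv_weight
    for p in first_params:
        fused_weight = [w * p for w in fused_weight]
    # Step 606: add every second training sub-parameter to the bias.
    fused_bias = conv_bias
    for p in second_params:
        fused_bias = fused_bias + p
    # Step 608: the redundant layer is deleted; the convolutional layer
    # keeps the fused weight and bias.
    return fused_weight, fused_bias

w, b = optimize([1.0, 2.0], 0.5, first_params=[2.0, 3.0], second_params=[0.25, 0.25])
print(w, b)  # [6.0, 12.0] 1.0
```

When the first or second training parameter contains several sub-parameters (as in step 602), each sub-parameter is folded in turn, matching the description in steps 604 and 606.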
The method for compiling the learning task of the artificial intelligence processor based on the above convolutional neural network optimization method deletes the redundant neural network layers in the convolutional neural network, simplifies the network structure, and compiles the simplified convolutional network structure. In the optimized neural network, the redundant neural network layers are fused into the convolutional layers, so the operators of the redundant neural network layers are all absorbed into the convolution operators of the convolutional layers; that is, the number of operators in the fused neural network is reduced. Since one data exchange is needed each time the operation of one operator is executed when the learning task corresponding to the neural network runs, a neural network compiled by this method requires less data exchange in the processing when it is executed on a device.
In MobileNet, a new network parameter, such as opt_level, is added to the source code file of Caffe. By setting the parameter value, the network structure of the convolutional neural network is automatically detected and the corresponding convolutional neural network optimization method is automatically invoked, which saves the user's learning cost, improves usability, and preserves the user's right to choose.
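A hypothetical sketch of dispatching on such a parameter is shown below; the opt_level values and the value-to-strategy mapping are invented for illustration, since the text only states that a corresponding optimization method is selected according to the set value:

```python
# Hypothetical dispatch on an opt_level-style configuration parameter.
# The numeric values and their meanings below are assumptions, not the
# patent's actual encoding.

FUSION_STRATEGIES = {
    0: [],                      # no optimization
    1: ["BatchNorm"],           # fuse Batch Norm layers only
    2: ["Scale"],               # fuse Scale layers only
    3: ["BatchNorm", "Scale"],  # fuse both redundant layer types
}

def select_redundant_layers(opt_level, network_layers):
    """Return the layers that would be fused away at this opt_level."""
    targets = FUSION_STRATEGIES.get(opt_level, [])
    return [layer for layer in network_layers if layer in targets]

layers = ["Convolution", "BatchNorm", "Scale", "ReLU"]
print(select_redundant_layers(3, layers))  # ['BatchNorm', 'Scale']
```

The point of the design is that the user sets one configuration value and the framework detects and fuses the redundant layers automatically, rather than the user rewriting the network definition by hand.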
It should be understood that although the various steps in the flow charts of fig. 2-9 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-9 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; the order of their execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 10, there is provided an apparatus for compiling a learning task of an artificial intelligence processor, comprising: a configuration parameter obtaining module 701, a first fusion result obtaining module 702, a second fusion result obtaining module 703 and an optimizing module 704, wherein:
a configuration parameter obtaining module 701, configured to obtain configuration parameters, where the configuration parameters include a first training parameter and a second training parameter of a Batch Norm layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor;
a first fusion result obtaining module 702, configured to fuse the first training parameter of the Batch Norm layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result;
a second fusion result obtaining module 703, configured to fuse the second training parameter of the Batch Norm layer with the bias parameter of the convolutional layer of the convolutional neural network, so as to obtain a second fusion result;
and the optimizing module 704 is configured to optimize the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network, and compile the optimized convolutional neural network to obtain a corresponding binary instruction, so as to allocate the binary instruction to an artificial intelligent processor to execute a corresponding learning task.
In one embodiment, the configuration parameter obtaining module 701 is further configured to obtain a configuration parameter, where the configuration parameter includes a first training parameter of a Scale layer and a second training parameter of the Scale layer.
In one embodiment, the configuration parameter obtaining module 701 is further configured to obtain a first configuration parameter and a second configuration parameter, where the first configuration parameter includes a first training parameter and a second training parameter of the Batch Norm layer; the second configuration parameter includes a first training parameter of the Scale layer and a second training parameter of the Scale layer.
In one embodiment, the first fusion result obtaining module 702 is further configured to fuse the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result.
In one embodiment, the first fusion result obtaining module 702 is further configured to fuse the first training parameter of the Batch Norm layer and the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result.
In one embodiment, the second fusion result obtaining module 703 is further configured to fuse the second training parameter of the Scale layer with the bias parameter of the convolutional layer of the convolutional neural network to obtain a second fusion result.
In one embodiment, the second fusion result obtaining module 703 is further configured to fuse the second training parameter of the Batch Norm layer and the second training parameter of the Scale layer with the bias parameter of the convolutional layer of the convolutional neural network, so as to obtain a second fusion result.
In one embodiment, the optimizing module 704 is further configured to delete the Batch Norm layer, change the weight parameter of the convolutional layer to the first fused result, and change the bias parameter of the convolutional layer to the second fused result.
In one embodiment, the optimization module 704 is further configured to delete the Scale layer, change the weight parameter of the convolutional layer to the first fusion result, and change the bias parameter of the convolutional layer to the second fusion result.
In one embodiment, the optimizing module 704 is further configured to delete the Batch Norm layer and the Scale layer, change the weight parameter of the convolutional layer to the first fusion result, and change the bias parameter of the convolutional layer to the second fusion result.
For the specific limitations of the compiling apparatus, reference may be made to the limitations of the compiling method above, which are not repeated here. The modules in the above apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring configuration parameters, wherein the configuration parameters comprise a first training parameter and a second training parameter of a Batch Norm layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor; fusing the first training parameter of the Batch Norm layer with the weight parameter of the convolution layer of the convolution neural network to obtain a first fusion result; fusing the second training parameter of the Batch Norm layer with the bias parameter of the convolution layer of the convolution neural network to obtain a second fusion result; and optimizing the convolutional neural network according to the first fusion result and the second fusion result to obtain the optimized convolutional neural network.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring configuration parameters, wherein the configuration parameters comprise a first training parameter and a second training parameter of a Scale layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor; fusing the first training parameter of the Scale layer with the weight parameter of the convolution layer of the convolution neural network to obtain a first fusion result; fusing the second training parameter of the Scale layer with the bias parameter of the convolution layer of the convolution neural network to obtain a second fusion result; and compiling the optimized convolutional neural network to obtain a corresponding binary instruction so as to be distributed to an artificial intelligent processor to execute a corresponding learning task.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a first configuration parameter and a second configuration parameter, wherein the first configuration parameter comprises a first training parameter and a second training parameter of a Batch Norm layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor; the second configuration parameters comprise a first training parameter and a second training parameter of a Scale layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor; fusing the first training parameter of the Batch Norm layer and the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result; fusing the second training parameter of the Batch Norm layer and the second training parameter of the Scale layer with the bias parameter of the convolution layer of the convolution neural network to obtain a second fusion result; and compiling the optimized convolutional neural network to obtain a corresponding binary instruction so as to be distributed to an artificial intelligent processor to execute a corresponding learning task.
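The three-layer fusion described in the embodiment above can be sketched as follows. Because formulas (1) and (2) are not reproduced in this text, the Batch Norm semantics (normalization by running mean and variance) and the Scale semantics (alpha·x + beta) are assumed here to follow the conventional Caffe-style fold; the parameter names are illustrative:

```python
import math

# Assumed Caffe-style semantics: BatchNorm computes (x - mean)/sqrt(var + eps),
# Scale computes alpha*x + beta. Both are folded into the conv parameters.

def fold_bn_scale(weight, bias, mean, var, eps, alpha, beta):
    inv_std = 1.0 / math.sqrt(var + eps)
    first_fusion = alpha * inv_std * weight                  # new conv weight
    second_fusion = alpha * (bias - mean) * inv_std + beta   # new conv bias
    return first_fusion, second_fusion

w, b = fold_bn_scale(weight=2.0, bias=1.0, mean=1.0, var=4.0, eps=0.0,
                     alpha=3.0, beta=0.5)

# Under the assumed semantics, the fused forward equals Scale(BN(Conv(x))):
x = 5.0
conv = x * 2.0 + 1.0
bn = (conv - 1.0) / math.sqrt(4.0)
reference = 3.0 * bn + 0.5
assert abs((x * w + b) - reference) < 1e-9
print(w, b)  # 3.0 0.5
```

The equivalence check above is the sense in which the optimization "loses no precision": for any input, the fused convolution produces the same output as the original three-layer sequence, up to floating-point rounding.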
It should be clear that, the steps implemented when the computer program in the embodiment of the present application is executed by the processor are consistent with the execution process of each step of the method in the above embodiments, and specific reference may be made to the above description, and no further description is given here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for learning task compilation for an artificial intelligence processor, the method comprising:
a general processor acquires configuration parameters, wherein the configuration parameters comprise a first training parameter and a second training parameter of a Scale layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor;
the general processor fuses the first training parameter of the Scale layer with the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result;
the general processor fuses the second training parameter of the Scale layer with the bias parameter of the convolutional layer of the convolutional neural network to obtain a second fusion result;
the general processor optimizes the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network;
and the general processor compiles the optimized convolutional neural network to obtain a corresponding binary instruction sequence so as to distribute the binary instruction sequence to the artificial intelligent processor to execute a corresponding learning task.
2. The method according to claim 1, wherein the general purpose processor fuses the first training parameters of the Scale layer with the weight parameters of the convolutional layer of the convolutional neural network to obtain a first fusion result, and comprises:
and the general processor performs multiplication operation on the first training parameter of the Scale layer and the weight parameter of the convolution layer to obtain the first fusion result.
3. The method of claim 1, wherein the general purpose processor fuses the second training parameters of the Scale layer with the bias parameters of the convolutional layer of the convolutional neural network to obtain a second fusion result, comprising:
and the general processor performs addition operation on the second training parameter of the Scale layer and the bias parameter of the convolution layer to obtain a second fusion result.
4. The method of claim 1, wherein the general purpose processor optimizes the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network, comprising:
and deleting the Scale layer by the general processor, changing the weight parameter of the convolution layer into the first fusion result, and changing the bias parameter of the convolution layer into the second fusion result.
5. The method of claim 1, further comprising:
and the general processor performs convolution calculation on the input data of the convolutional layer and the first fusion result and the second fusion result respectively to obtain an output result of the convolutional layer.
6. The method of claim 5, wherein the general purpose processor performs convolution calculations on the input data of the convolutional layer with the first and second fusion results, respectively, to obtain an output result of the convolutional layer, and comprises:
the general processor performs multiplication operation on the input data and the first fusion result to obtain a first operation result;
and the general processor performs addition operation on the first operation result and the second fusion result to obtain the output result.
7. The method of claim 1, wherein the first training parameters of the Scale layer comprise at least one first training sub-parameter for performing convolution calculations of the Scale layer; the second training parameters of the Scale layer comprise at least one second training sub-parameter for performing convolution calculations of the Scale layer.
8. A method for learning task compilation for an artificial intelligence processor, the method comprising:
a general processor acquires configuration parameters, wherein the configuration parameters comprise a first training parameter and a second training parameter of a redundant neural network layer of a convolutional neural network corresponding to a learning task of the artificial intelligence processor;
the general processor fuses the first training parameter and the weight parameter of the convolutional layer of the convolutional neural network to obtain a first fusion result;
the general processor fuses the second training parameters with the bias parameters of the convolutional layer of the convolutional neural network to obtain a second fusion result;
optimizing the convolutional neural network according to the first fusion result and the second fusion result to obtain an optimized convolutional neural network;
and compiling the optimized convolutional neural network to obtain corresponding binary instructions so as to distribute the binary instructions to an artificial intelligent processor to execute corresponding learning tasks.
9. A learning task compilation system for an artificial intelligence processor, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method as claimed in any one of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911296833.6A 2018-12-29 2019-12-16 Learning task compiling method of artificial intelligence processor and related product Pending CN110766145A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018116399274 2018-12-29
CN201811639927.4A CN109766996A (en) 2018-12-29 2018-12-29 Optimization method, device, storage medium and the system of convolutional neural networks

Publications (1)

Publication Number Publication Date
CN110766145A true CN110766145A (en) 2020-02-07

Family

ID=66453189

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811639927.4A Pending CN109766996A (en) 2018-12-29 2018-12-29 Optimization method, device, storage medium and the system of convolutional neural networks
CN201911296833.6A Pending CN110766145A (en) 2018-12-29 2019-12-16 Learning task compiling method of artificial intelligence processor and related product

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811639927.4A Pending CN109766996A (en) 2018-12-29 2018-12-29 Optimization method, device, storage medium and the system of convolutional neural networks

Country Status (1)

Country Link
CN (2) CN109766996A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860773A (en) * 2020-06-30 2020-10-30 北京百度网讯科技有限公司 Processing apparatus and method for information processing
CN112269992A (en) * 2020-06-01 2021-01-26 中国科学院信息工程研究所 Real-time malicious sample detection method based on artificial intelligence processor and electronic device
WO2022224330A1 (en) * 2021-04-20 2022-10-27 日本電気株式会社 Neural network architecture search device and neural network architecture search method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766996A (en) * 2018-12-29 2019-05-17 北京中科寒武纪科技有限公司 Optimization method, device, storage medium and the system of convolutional neural networks
CN110929860B (en) * 2019-11-07 2020-10-23 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
CN115408568B (en) * 2021-05-26 2024-04-05 中科寒武纪科技股份有限公司 Method for fusing operators of neural network and related products

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239315A (en) * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Towards the programming model of neutral net heterogeneous computing platforms
CN109754011A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method, device and Related product based on Caffe
CN109766996A (en) * 2018-12-29 2019-05-17 北京中科寒武纪科技有限公司 Optimization method, device, storage medium and the system of convolutional neural networks
CN110070176A (en) * 2019-04-18 2019-07-30 北京中科寒武纪科技有限公司 The processing method of off-line model, the processing unit of off-line model and Related product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李孝安 et al.: "Introduction to Neural Networks and Neurocomputers", Northwestern Polytechnical University Press, 31 October 1995 *
黄小平 et al.: "Multi-Granularity Short-Vector Parallel Computing Technology on Multi-Core Architectures", Aviation Industry Press, 31 May 2019 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269992A (en) * 2020-06-01 2021-01-26 中国科学院信息工程研究所 Real-time malicious sample detection method based on artificial intelligence processor and electronic device
CN112269992B (en) * 2020-06-01 2023-10-20 中国科学院信息工程研究所 Real-time malicious sample detection method based on artificial intelligent processor and electronic device
CN111860773A (en) * 2020-06-30 2020-10-30 北京百度网讯科技有限公司 Processing apparatus and method for information processing
CN111860773B (en) * 2020-06-30 2023-07-28 北京百度网讯科技有限公司 Processing apparatus and method for information processing
WO2022224330A1 (en) * 2021-04-20 2022-10-27 日本電気株式会社 Neural network architecture search device and neural network architecture search method

Also Published As

Publication number Publication date
CN109766996A (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN110889497B (en) Learning task compiling method of artificial intelligence processor and related product
CN110766145A (en) Learning task compiling method of artificial intelligence processor and related product
CN109086031B (en) Business decision method and device based on rule engine
JáJá Parallel algorithms
CN112199190B (en) Memory allocation method and device, storage medium and electronic equipment
CN111126668B (en) Spark operation time prediction method and device based on graph convolution network
US20230236888A1 (en) Memory allocation method, related device, and computer-readable storage medium
CN114915630B (en) Task allocation method, network training method and device based on Internet of Things equipment
CN110766146B (en) Learning task compiling method of artificial intelligence processor and related product
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN112052027A (en) Method and device for processing AI task
CN112764893A (en) Data processing method and data processing system
CN111831355A (en) Weight precision configuration method, device, equipment and storage medium
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
Chen et al. Modeling design iteration in product design and development and its solution by a novel artificial bee colony algorithm
CN111736463B (en) Adaptive deep learning control method based on operation platform
CN110377769B (en) Modeling platform system, method, server and medium based on graph data structure
CN114021733A (en) Model training optimization method and device, computer equipment and storage medium
CN115587922A (en) Tensor blocking method and device and storage medium
Zykov et al. Application of information processes applicative modelling to virtual machines auto configuration
CN116610456B (en) Memory optimization method based on eager memory reuse algorithm
CN116991560B (en) Parallel scheduling method, device, equipment and storage medium for language model
WO2023207630A1 (en) Task solving method and apparatus therefor
CN113704687B (en) Tensor calculation operation method, device and operation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207
