CN115481717A - Method for operating neural network model, readable medium and electronic device - Google Patents

Method for operating neural network model, readable medium and electronic device

Info

Publication number
CN115481717A
Authority
CN
China
Prior art keywords
layer
size
sub
tensor
data block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211109253.3A
Other languages
Chinese (zh)
Inventor
许礼武
余宗桥
黄敦博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202211109253.3A
Publication of CN115481717A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The application relates to the field of artificial intelligence and discloses a method for operating a neural network model, a readable medium, and an electronic device. The method is applied to an electronic device and includes the following steps: the electronic device predicts the size of the calculation result according to the size of the input data and the model parameters of each calculation layer; determines the sizes of a plurality of sub-results according to the predicted size of the calculation result; determines, based on the size of each sub-result and the model parameters of each calculation layer, the range of the data block in the input data corresponding to each sub-result, and then calculates each data block to obtain the sub-result corresponding to each data block; and determines the calculation result corresponding to the input data from the sub-results corresponding to the data blocks. In this way, the electronic device calculates only part of the input data at a time and does not generate a large amount of intermediate data at once, which reduces the memory occupied while the neural network model is running.

Description

Method for operating neural network model, readable medium and electronic device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an operating method of a neural network model, a readable medium, and an electronic device.
Background
With the rapid development of Artificial Intelligence (AI), convolutional neural network models are being applied more and more widely in the field of artificial intelligence. In the calculation process of a convolutional neural network, all of the data to be calculated is input into the convolutional layers and then computed layer by layer. Because the number of channels in current convolutional neural networks keeps increasing, inputting all data channels simultaneously results in a large data volume, and the whole convolution calculation consumes a large amount of storage and computing resources.
Hardware devices (e.g., computer devices) have limited computing power. During the calculation of a neural network, if there is a large amount of input data to be calculated and each convolutional layer has many weight channels, many intermediate results are generated, which places excessive demands on the hardware. For hardware devices with insufficient performance, the excessive input data and the generated intermediate data may cause data overflow, resulting in calculation errors.
Disclosure of Invention
In view of the above, embodiments of the present application provide an operating method of a neural network model, a readable medium, and an electronic device.
In a first aspect, an embodiment of the present application provides a method for operating a neural network model, applied to an electronic device, where the neural network model includes a plurality of computation layers. The method includes: predicting the size of the calculation result tensor corresponding to the input data tensor according to the size of the input data tensor and the model parameters of each computation layer; determining the sizes of a plurality of sub-result tensors according to the size of the calculation result tensor; determining the range of the data block corresponding to each sub-result tensor in the input data tensor according to the size of each sub-result tensor and the model parameters of each computation layer; calculating each data block according to the model parameters of each computation layer to obtain the sub-result tensor corresponding to each data block; and determining the calculation result tensor corresponding to the input data tensor according to the sub-result tensor corresponding to each data block.
With the method provided by the embodiments of the present application, the electronic device calculates only part of the input data at a time, so it does not generate a large amount of intermediate data at once, which reduces the memory occupied while the neural network model is running. The electronic device does not need to send the model to other computing devices for execution, which reduces the occupation of network resources and also avoids the delay caused by sending the model to other computing devices. Moreover, the neural network model can be deployed and run on the local device, and the final result is written to the external memory only after the calculation of all convolutional layers is completed, which reduces the number of accesses to the external memory. Because the input data is split into a plurality of data blocks, different data blocks can be handed to different compute engines of the processor, and the different compute engines can run in parallel, which improves the operating efficiency.
In one possible implementation of the first aspect, the computation layer is a convolution layer or a deconvolution layer, the convolution layer is configured to perform a convolution operation on the input data tensor, and the deconvolution layer is configured to perform a deconvolution operation on the input data tensor; the model parameters include the stride of the convolution or deconvolution operation, the convolution kernel size, and the amount of padding.
In one possible implementation of the first aspect, determining the range of the data block corresponding to each sub-result tensor in the input data tensor based on the size of each sub-result tensor and the model parameters of each computation layer includes: determining the size of each data block according to the size of each sub-result tensor and the model parameters of each computation layer; determining the size of the overlapping area between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each computation layer; and determining the range of the data block corresponding to each sub-result tensor in the input data tensor according to the size of each data block and the size of the overlapping area between adjacent data blocks.
In one possible implementation of the first aspect, determining the size of the overlapping area between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each computation layer includes: determining, according to the size of the previous sub-result tensor of each sub-result tensor in the different dimension directions and the model parameters of each computation layer, the size of the overlapping area in the different dimension directions between the data block corresponding to the current sub-result tensor and the data block corresponding to the previous sub-result tensor.
In one possible implementation of the first aspect, there are M computation layers in total, and determining the size of the overlapping area between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each computation layer further includes: determining the size of the overlapping area of adjacent data blocks in the input data tensor corresponding to the M-th computation layer according to the size of each sub-result tensor and the model parameters of the M-th computation layer; and taking the input data tensor of the M-th computation layer as the output data of the (M-1)-th computation layer, determining the size of the overlapping area of adjacent data blocks in the input data tensor corresponding to the (M-1)-th computation layer, and so on, until the size of the overlapping area of adjacent data blocks in the input data tensor corresponding to the first computation layer is determined.
In a possible implementation of the first aspect, calculating each data block according to the model parameters of each computation layer to obtain the sub-result tensor corresponding to each data block includes: sending the plurality of data blocks to different compute engines respectively, where the different compute engines calculate the received data blocks according to the model parameters of each computation layer to obtain the sub-result tensor corresponding to each data block.
In one possible implementation of the first aspect, the computing engine is a computing engine of a processor of the electronic device, or the computing engine is a computing engine of a processor of another electronic device.
In a second aspect, the present application provides a readable medium containing instructions that, when executed by a processor of an electronic device, cause the electronic device to implement the method for operating a neural network model provided in the first aspect and any one of its possible implementations.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory configured to store instructions for execution by one or more processors of the electronic device; and a processor, which is one of the processors of the electronic device, configured to execute the instructions so that the electronic device implements the method for operating a neural network model provided in the first aspect and any one of its possible implementations.
In a fourth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes instructions that, when executed by an electronic device, cause the electronic device to implement the method for operating a neural network model provided in the first aspect and any one of its possible implementations.
Drawings
FIG. 1 illustrates an architectural schematic of a convolutional neural network, according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a computation process of a convolution operation, according to some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of a computational process of a deconvolution operation, according to some embodiments of the present application;
FIG. 4 illustrates a schematic diagram of a deployment of a convolutional neural network, according to some embodiments of the present application;
FIG. 5 illustrates a schematic diagram of overlapping regions in a convolution calculation, according to some embodiments of the present application;
FIG. 6 illustrates a schematic diagram of a data block partitioning method, according to some embodiments of the present application;
FIG. 7 illustrates a flow diagram of a method of operation of a neural network model, in accordance with some embodiments of the present application;
FIGS. 8A-8C illustrate a schematic diagram of a computation process for a three-layer convolutional layer, according to some embodiments of the present application;
FIGS. 9A-9C illustrate a schematic diagram of calculating a starting location of a data block, according to some embodiments of the present application;
FIG. 10 illustrates an architectural diagram of an apparatus for operating a neural network model, according to some embodiments of the present application;
FIG. 11 illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, methods of operation of neural network models, readable media, and electronic devices.
For ease of understanding, terms referred to in the embodiments of the present application will be first introduced.
A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure and is a deep learning architecture, where a deep learning architecture refers to learning at multiple levels of abstraction through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network and can be applied to image processing, natural language processing, and computer vision.
The following exemplifies the structure of CNN by taking the application of CNN to image processing as an example.
Fig. 1 illustrates a structural schematic diagram of a CNN model, according to some embodiments of the present application. As shown in fig. 1, the CNN model may include an input layer, a convolutional layer, a pooling layer, a fully-connected layer, and an output layer, wherein the pooling layer is optional.
The input layer is used to perform pre-processing on the input image, such as normalization, mean subtraction, and Principal Component Analysis (PCA) dimensionality reduction.
The convolutional layers may comprise multiple layers of convolution computation; as shown in fig. 1, there may be, for example, 3 convolutional layers, conv1, conv2, and conv3. A convolutional layer may include a number of convolution operators, also known as convolution kernels (kernel), each of which acts as a filter that extracts specific information from the input image. The convolution kernel is essentially a weight matrix whose depth dimension is the same as the depth dimension of the input image. In practical applications, the weight values in the weight matrix need to be obtained through a large amount of training, and each weight matrix formed by the trained weight values can extract information from the input image, thereby helping the convolutional neural network make correct predictions.
In the calculation process of a convolutional layer, the convolution kernel is convolved with a region of the input image of the same size as the kernel (kernel size); after one kernel-sized region has been calculated, the kernel is moved by one step (stride) to calculate the next kernel-sized region, and so on until the data of the whole layer has been calculated. The final result is the feature map of the convolutional layer. It should be understood that, in convolution operations, the feature map calculated by each convolutional layer becomes smaller and smaller; to preserve the size of the feature map and to increase the number of times the pixels at the edge of the image are used by the convolutional layer, padding is added to the edge of the input image.
The pooling layer, which is not shown in fig. 1, is used to reduce the computational load of the convolutional neural network. The purpose of the pooling layer is to reduce the spatial size of the image during image processing. The pooling layer may be introduced periodically after the convolutional layer, specifically, one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. The pooling layer may comprise an average pooling operator and/or a maximum pooling operator for sampling the input image to smaller size images. The average pooling operator may calculate pixel values in the image over a particular range to produce an average. The max pooling operator may take the pixel with the largest value in a particular range as a result of the max pooling.
The fully-connected layer is used to integrate the local information with category discrimination from the convolutional layers or the pooling layers to generate the final output information (the required category information or other related information). The fully-connected layer may include a multi-layer structure, and the parameters of each fully-connected layer may be obtained by pre-training on training data associated with a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and the like.
The output layer maps the inputs of the neurons to the outputs using an activation function. Once the forward propagation of the whole convolutional neural network is finished, back propagation starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network, i.e., the error between the result output by the convolutional neural network through the output layer and the ideal result.
It should be noted that the convolutional neural network shown in fig. 1 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models.
In some embodiments, the neural network model further includes a deconvolutional neural network, which includes an unpooling layer, a rectification layer, and a deconvolution layer (deconv). The deconvolution layer corresponds to a convolution layer in the CNN, and a deconvolution may also be referred to as a transposed convolution.
Convolutional layers, which perform convolution operations, and deconvolution layers, which perform deconvolution operations, may be collectively referred to as computation layers. The following illustrates the calculation processes of the convolution and deconvolution operations with reference to the figures. In figs. 2 and 3, the kernel size is 3 × 3; fig. 2 illustrates an input image that is a 4 × 4 two-dimensional matrix, and fig. 3 illustrates an input image that is a 2 × 2 two-dimensional matrix. It should be understood that since the input image is two-dimensional, the kernel is also two-dimensional.
As shown in fig. 2, when the input image is 4 × 4, the kernel size is 3 × 3, and the stride is 1 in the convolution operation, the convolution kernel first convolves the first 3 × 3 units of data in the input image as shown in fig. 2 (A), and calculation result 1 is obtained. Then, as shown in fig. 2 (B), the kernel shifts one unit to the right (stride equal to 1) and the next 3 × 3 units of data are calculated, giving calculation result 2. By analogy, as shown in fig. 2 (C) and fig. 2 (D), calculation result 3 and calculation result 4 are obtained. Calculation results 1-4 together form the feature map output by the convolutional layer.
As shown in fig. 3, the deconvolution operation has an input image of 2 × 2, a kernel size of 3 × 3, and a stride of 1, and two rows/columns of padding are added on each side of the input image. Following the convolution process shown in fig. 2, the resulting output feature map is 4 × 4.
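For illustration, the sliding-window computation of fig. 2 can be sketched in a few lines of code (an illustrative sketch only, assuming a single-channel two-dimensional input without padding; the function and variable names are chosen for this sketch and do not come from the patent text):

    import numpy as np

    def conv2d(image, kernel, stride=1):
        # Slide the kernel over the image and sum the element-wise products,
        # producing one output value per window position (no padding).
        kh, kw = kernel.shape
        ih, iw = image.shape
        oh = (ih - kh) // stride + 1
        ow = (iw - kw) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                window = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
                out[i, j] = (window * kernel).sum()
        return out

    # A 4 x 4 input and a 3 x 3 kernel with stride 1, as in fig. 2, give a 2 x 2 feature map.
    image = np.arange(16, dtype=float).reshape(4, 4)
    kernel = np.ones((3, 3))
    print(conv2d(image, kernel).shape)  # (2, 2)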
In some embodiments, at the input layer, if the input image is a gray-scale picture, only one feature map is obtained, and if the input image is a color picture, there are feature maps of 3 channels of red, green and blue, i.e. a three-dimensional matrix.
That is, after a convolution operation the output feature map is typically smaller than the input image, and after a deconvolution operation the output feature map is typically larger than the input image. It should be understood that figs. 2 and 3 take as an example a convolution operation without padding and a deconvolution operation with padding; padding may also be applied to the input image in the convolution operation, which is not specifically limited in this application.
Thus, in the calculation process of a computation layer, whether the convolution operation or the deconvolution operation is performed, all of the data to be calculated is input into the computation layer and then calculated. Generally, the amount of data input to the computation layer is large; for example, if the input data tensor is [1,1024,512,3], the input data occupies at least 1 × 1024 × 512 × 3 bytes, i.e., 1.5 Megabytes (MB), and the input data also generates a large amount of intermediate calculation results in the computation layers. If all the input data is fed into the terminal device at once, the data volume is large and a large amount of the terminal device's memory is occupied during the convolution calculation; for some hardware devices with low performance, the excessive input data and the generated intermediate data may cause data overflow, so that calculation errors occur.
In some embodiments, when the electronic device 100 is a terminal device 101 with a small memory, the terminal device 101 may also deploy the neural network model to a cloud server or an edge device for execution, so as to reduce occupation of the memory of the terminal device. As shown in fig. 4, the terminal device 101 may deliver the operation of the neural network model to the cloud server 200, or send the operation to the edge computing device 300, and after the cloud server 200 and/or the edge computing device 300 obtains the operation result of the neural network model, the operation result is returned to the terminal device 101.
The cloud server 200 is an entity that provides cloud services to users by using basic resources in a cloud computing mode, and has a large amount of basic resources (including computing resources, storage resources, and network resources). Edge computing device 300 refers to an edge computing device in an edge environment or a software system running on one or more edge computing devices. The edge computing device 300 is geographically located closer to the terminal device 101, such as edge computing kiosks located on both sides of a road, or county-level edge servers.
However, sending the input data of the neural network model from the terminal device 101 to the cloud server 200 for execution takes a long time and occupies a large amount of network resources. If the model is sent to the edge computing device 300 to be run, the memory of the edge computing device 300 is also typically insufficient to support convolution and pooling operations on large data volumes.
In other embodiments, when the memory of the computing device running the neural network model is insufficient, the input data, the intermediate calculation results, and the like may also be stored in an external memory. When the neural network model is run, the data is read into memory in fragments; for example, for a color image with red, green, and blue input channels, the data of one channel is read and calculated, the corresponding feature map is written to the external memory, and then the data of the next channel is read and calculated. However, if the number of computation layers is large, for example there are multiple convolutional layers, the external memory must be accessed frequently, which may occupy too much bus bandwidth and affect the normal operation of other applications on the device. Moreover, if the data volume of a single channel is large, this method still occupies a large amount of memory when the neural network model is run.
In order to solve the problem that running a neural network model occupies a large amount of memory, the present application provides a method for operating a neural network model, which is applied to a neural network model with a plurality of computation layers, where the plurality of computation layers may be multiple convolutional layers or multiple deconvolution layers. The method determines the size of the output result according to the size of the input data tensor and the model parameters of each computation layer, and determines, according to the size of the output result tensor, how many blocks the output result tensor is to be divided into in each dimension direction, i.e., determines the sizes of a plurality of sub-result tensors. Then, the corresponding region (data block) of each sub-result tensor in the input data tensor is determined; the electronic device 100 then performs the operations of the plurality of computation layers on each data block to obtain each sub-result tensor, and finally combines the sub-result tensors to obtain the output result. The input data tensor and the output result tensor may be a first-order tensor, a second-order tensor, or a multidimensional tensor. Hereinafter, the input data tensor and the output result tensor are simply referred to as the input data and the output result.
In this way, the electronic device 100 only needs to calculate part of the input data at a time and does not generate a large amount of intermediate data at once, which reduces the memory occupied while the neural network model is running. The electronic device 100 does not need to send the model to other computing devices for execution, which reduces the occupation of network resources and also avoids the delay caused by sending the model to other computing devices. Moreover, the neural network model can be deployed and run on the local device, and the final result is written to the external memory only after the calculation of all convolutional layers is completed, which reduces the number of accesses to the external memory. Because the input data is split into a plurality of data blocks, different data blocks can be handed to different compute engines of the processor, and the different compute engines can run in parallel, which improves the operating efficiency.
It can be understood that, since the convolution kernel is applied to the input data window by window, the data involved in each calculation overlaps the data involved in the previous calculation. As shown in fig. 2, the data of the input image involved in obtaining calculation result 1 in fig. 2 (A) overlaps the data involved in obtaining calculation result 2 in fig. 2 (B). Therefore, when the operating method of the neural network model provided in the present application determines the data block in the input data corresponding to each sub-result, the data block corresponding to the first sub-result can be determined according to the model parameters of each computation layer and the size of the first sub-result. The overlapping area, in the different dimension directions, between the data block corresponding to each subsequent sub-result and the data block corresponding to the previous sub-result is then determined, so as to obtain the data block of each sub-result. Specifically, the overlapping area between adjacent data blocks in the input of the last convolutional layer may first be calculated, in the different dimension directions, according to the input size of the previous data block in the different dimension directions, the size of the output result of the previous data block at the last convolutional layer, and the model parameters of each convolutional layer; then, working upward from the last convolutional layer, the size of the overlapping area of adjacent data blocks is determined layer by layer until the input of the first convolutional layer (i.e., the input data) is reached, thereby obtaining the start position in the input data of the data block corresponding to the sub-result. The size of the corresponding data block is then determined according to the size of the sub-result, and the range of the data block corresponding to the sub-result is determined according to the start position of the data block in the input data and the size of the data block. The method for determining the data block corresponding to each sub-result in the input data is described below and is not repeated here.
For example, as shown in FIG. 5, the calculation result consists of the five units A₃0-A₃4. Suppose A₃0-A₃2 are taken as sub-result 1 and A₃3-A₃4 as sub-result 2. Then, according to sub-result 1 and the model parameters of the second convolutional layer, it can be determined that the data participating in the calculation of sub-result 1 in the second convolutional layer should be A₂0-A₂3; according to sub-result 2 and the model parameters of the second convolutional layer, it can be determined that the data participating in the calculation of sub-result 2 in the second convolutional layer should be A₂3-A₂5; that is, there is an overlapping area A₂3.
According to A₂0-A₂3 and the model parameters of the first convolutional layer, it can be determined that the data in the input data participating in the calculation that produces A₂0-A₂3 should be A₁0-A₁5; according to A₂3-A₂5 and the model parameters of the first convolutional layer, it can be determined that the data in the input data participating in the calculation that produces A₂3-A₂5 should be A₁4-A₁8; that is, there is an overlapping area A₁4-A₁5.
That is, when the calculation result is split into a plurality of sub-results, the data blocks in the input data corresponding to the sub-results may overlap. To avoid omitting data, it is therefore necessary to calculate the overlapping area between the data blocks and determine the start position of the next data block according to the overlapping area.
Similarly, for two-dimensional input data, in order to ensure that no data is missed when the convolution operation is performed on each data block, the overlapping area between adjacent data blocks in the different dimensions needs to be determined according to the division of the calculation result. For example, as shown in fig. 6, the input data is 8 × 8 two-dimensional data on which multi-layer convolution calculation is performed. If the electronic device 100 determines, according to the size of the calculation result, to divide the calculation result into 2 sub-results in the height direction and 2 sub-results in the width direction, i.e., the electronic device 100 will obtain 4 sub-results, then the electronic device 100 calculates the 4 corresponding data blocks in the input data. The range of data block 1 can be determined according to the size of sub-result 1 and the model parameters of each layer; for example, the range of data block 1 is as shown in (A) of fig. 6.
When determining the area of the data block 2 corresponding to the sub-result 2, the electronic device 100 calculates the size of the overlapping area between the data block 2 and the previous data block in the two dimensions of the height direction and the width direction. The previous data block of the data block 2 in the width direction is the data block 1, and the size of the overlapping area of the data block 2 and the data block 1 in the width direction can be determined according to the width of the data block 1, the width of the output result of the data block 1 and the value of each layer of convolutional layer model parameters in the width direction. Since the data block 2 is also the first data block in the height direction, the data block 2 corresponds to the same range as the data block 1 in the height direction, and the overlapping area between the data block 2 and the data block 1 in the height direction is 0. For example, when the overlapping area size of the data block 1 and the data block 2 in the width direction is 1, the area range of the data block 2 is as shown in (B) in fig. 6.
When the electronic device 100 determines the area of the data block 3 corresponding to the sub-result 3, if the previous data block of the data block 3 in the height direction is the data block 1, the size of the overlapping area between the data block 3 in the height direction and the data block 1 can be determined according to the height of the data block 1, the height of the output result of the data block 1, and the value of the convolution layer model parameter of each layer in the height direction. Since the data block 3 is also the first data block in the width direction, the data block 3 has the same range in the width direction as the data block 1. For example, when the overlapping area size of the data block 3 and the data block 1 in the height direction is 1, the area range of the data block 3 is as shown in (C) in fig. 6.
When the electronic device 100 determines the region of data block 4 corresponding to sub-result 4, the previous data block of data block 4 in the height direction is data block 2, so the size of the overlapping area between data block 4 and data block 2 in the height direction can be determined according to the height of data block 2, the height of the output result of data block 2, and the values of each convolutional layer's model parameters in the height direction. The previous data block of data block 4 in the width direction is data block 3, so the size of the overlapping area between data block 4 and data block 3 in the width direction can be determined according to the width of data block 3, the width of the output result of data block 3, and the values of each convolutional layer's model parameters in the width direction. For example, when the size of the overlapping area between data block 4 and data block 3 in the width direction is 1 and the size of the overlapping area between data block 4 and data block 2 in the height direction is 1, the region of data block 4 is as shown in (D) of fig. 6.
The electronic device 100 in the embodiments of the present application may be a mobile phone, a tablet computer, a wearable device, an in-vehicle device, a notebook computer, a virtual reality device such as a VR/AR device, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), or a dedicated camera (e.g., a single-lens reflex camera or a compact camera), and the like; the specific type of the electronic device is not limited in this application.
With reference to fig. 7, the flow of the method for operating a neural network model provided in the present application is described in detail below. The method is applied to the electronic device 100 and is described taking the electronic device 100 performing multi-layer convolution and/or deconvolution operations as an example. The method specifically includes:
s710: and determining the size of the output result according to the size of the input data and the model parameters of each calculation layer.
The input data refers to the preprocessed data obtained by the electronic device 100 from the CNN input layer, that is, the data input to the computation layers. A computation layer may be a convolution layer or a deconvolution layer, and the model parameters may be the related parameters required for the convolution operation or the deconvolution operation, such as kernel size and stride. In some embodiments, the input data is also subjected to padding operations.
The electronic device 100 may determine the size of the output result obtained after the input data is calculated by all the calculation layers according to the size of the input data and the model parameters of each calculation layer.
The calculation of the size of the output result is described in detail below. The calculation of the output size in the width direction can refer to the following formula (1), where width(t) is the width of the input data of the t-th layer, out_w(t) is the width of the data obtained after the convolution operation of the t-th layer, kernel_size_w(t) is the size in the width direction of the convolution kernel of the t-th layer convolution operation, pad_w(t) represents the amount of padding (pad) in the width direction of the t-th layer, and stride_w(t) is the stride of the t-th layer convolution operation.
out_w(t) = ⌊(width(t) + 2 × pad_w(t) - kernel_size_w(t)) / stride_w(t)⌋ + 1    (1)
It should be understood that in the embodiments of the present application, the calculation process is described by taking the width as an example, and the calculation process of the height or the depth may refer to the calculation method of the width, which is not described again.
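To make the size prediction of S710 concrete, the following sketch applies formula (1) layer by layer in the width direction (an illustrative sketch, assuming each layer is described by a (kernel_size, stride, pad) triple with pad being the padding added on each side; the helper names are assumptions made for this sketch):

    def output_width(width, kernel_size, stride, pad):
        # Formula (1): width of the feature map produced by one convolution layer,
        # with `pad` units of padding added on each side of the input.
        return (width + 2 * pad - kernel_size) // stride + 1

    def predict_result_width(input_width, layers):
        # Apply formula (1) layer by layer; `layers` is a list of (kernel_size, stride, pad).
        w = input_width
        for kernel_size, stride, pad in layers:
            w = output_width(w, kernel_size, stride, pad)
        return w

    # Three convolutional layers with the parameters of the later example (FIGS. 8A-8C):
    layers = [(3, 2, 1), (3, 1, 1), (3, 2, 1)]
    print(predict_result_width(33, layers))  # 9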
S720: determine the sizes of the plurality of sub-results according to the size of the output result.
According to the size of the output result, the electronic device 100 determines how many sub-results the output result is to be divided into, and thereby determines the number of data blocks, so that the electronic device 100 can calculate the data blocks separately to obtain the corresponding sub-results. For example, if the output result is a two-dimensional matrix with a size of 16 × 16, the electronic device 100 may divide the output result into 4 sub-results according to this size, and then calculate the data blocks of the input data corresponding to the 4 sub-results so as to obtain the 4 sub-results respectively.
It should be understood that when the electronic device 100 divides the sub-results, the division should be consistent within the same dimension direction; for example, for two-dimensional input data, if the output result is divided into 2 blocks in the height direction and 3 blocks in the width direction, the sub-results in the same row should have the same height and the sub-results in the same column should have the same width.
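As a small illustration of this rule, the sketch below splits one output dimension into near-equal segments (a hypothetical helper that is not taken from the patent text; when the size is not evenly divisible, the earlier segments are one unit larger):

    def split_dimension(size, num_blocks):
        # Split `size` units into `num_blocks` contiguous segments of near-equal length.
        base, remainder = divmod(size, num_blocks)
        return [base + 1 if i < remainder else base for i in range(num_blocks)]

    # A 16 x 16 output split into 2 blocks along the height and 2 along the width
    # gives four 8 x 8 sub-results, as in the example above.
    heights = split_dimension(16, 2)   # [8, 8]
    widths = split_dimension(16, 2)    # [8, 8]
    sub_result_sizes = [(h, w) for h in heights for w in widths]
    print(sub_result_sizes)            # [(8, 8), (8, 8), (8, 8), (8, 8)]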
S730: determine the range of the data block corresponding to each sub-result in the input data according to the size of the sub-result.
The electronic device 100 determines the size of each data block according to the size of each sub-result; then determines the overlapping area between each data block and the adjacent data blocks in the different dimension directions according to the size of each data block, the size of the corresponding sub-result, and the model parameters of each computation layer; then determines the start position in the input data of the data block corresponding to the sub-result according to the overlapping area between the data block and the previous data block in the different dimension directions; and further determines the range of the data block according to its start position in the input data and its size.
More specifically, the electronic device 100 determines the size of each data block according to the size of each sub-result and then determines the previous data block of each data block in the different dimension directions. It calculates the overlapping area between the data block and the previous data block in the input of the last convolutional layer, in the different dimension directions, according to the size of the previous data block in the different dimension directions, the size of the sub-result corresponding to the previous data block, and the model parameters of each convolutional layer; then, working upward from the last convolutional layer, it determines the size of the overlapping area of the adjacent data blocks layer by layer until the input of the first convolutional layer (i.e., the input data) is reached. The start position in the input data of the data block corresponding to the sub-result is determined according to the overlapping area between the data block and the previous data block in the different dimension directions, and the range of each data block can then be determined.
The following describes in detail the overlapping area of a data block and a previous data block in different dimension directions when the electronic device 100 performs a convolution operation at a calculation layer, and a calculation process of a start position of the data block.
The calculation of the overlapping area is illustrated below using the overlap in the width direction, overlap_w(t); it should be understood that the calculations in the height and depth directions may refer to the calculation in the width direction. The size overlap_w(t) of the overlapping area of adjacent data blocks in the width direction can be calculated by the following formula (2), where 1 ≤ t ≤ T, t is a positive integer, and T is the number of convolutional layers. w_w(t) is the width of the input data corresponding to the previous data block at the t-th convolutional layer and can be calculated by the following formula (3). pos_w(t+1) is the size of the non-overlapping part between the calculation result output by the previous data block after the t-th convolutional layer and the calculation result output by the next data block after the t-th convolutional layer, and pad_w(t) is the amount of padding in the width direction of the previous data block at the t-th convolutional layer.
overlap_w(t) = w_w(t) - pos_w(t+1) × stride_w(t) + pad_w(t)    (2)
w_w(t) = (out_w(t) - 1) × stride_w(t) + kernel_size_w(t) - pad_w(t)    (3)
According to overlap_w(t) and the end position l_w(t) of the previous data block (the position immediately after its last unit), the start position p_w(t) of the next data block at the t-th convolutional layer in the width direction can be determined, with reference to the following formula (4).
p_w(t) = l_w(t) - overlap_w(t)    (4)
According to out_w(t-1) and overlap_w(t), pos_w(t) can be determined, where out_w(t-1) is equal to w_w(t); pos_w(t) is then substituted into formula (2) above, and so on, so that the start position of each data block at the first convolutional layer can finally be determined.
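The three quantities above can be wrapped in small helpers as follows (an illustrative sketch of formulas (2)-(4) for the convolution case in the width direction; `last_pos` is assumed to be the position immediately after the last unit of the previous data block, and the names are chosen for this sketch only):

    def block_input_width(out_w, kernel_size, stride, pad):
        # Formula (3): width of the input this layer needs to produce `out_w` output units.
        return (out_w - 1) * stride + kernel_size - pad

    def overlap_width(w_w, pos_next, stride, pad):
        # Formula (2): overlap between adjacent data blocks in the input of this layer,
        # given the non-overlapping part `pos_next` of the previous block's output.
        return w_w - pos_next * stride + pad

    def start_position(last_pos, overlap):
        # Formula (4): start of the next data block, counted back from the end of the
        # previous data block by the size of the overlap.
        return last_pos - overlap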
In some embodiments, the electronic device 100 performs a pooling operation on the data block in addition to the convolution operation. The size of the data input to the pooling layer may not be an integer multiple of the kernel_size of the pooling operation; in that case, the pooling layer may also pool the remaining data that is smaller than kernel_size, that is, a rounding-up (ceil) operation is performed on the input data. In this case, the redundant data introduced by ceil also needs to be correspondingly subtracted from the result calculated by formula (3).
The following describes in detail the calculation process of the overlap area of the data block and the previous data block in the different dimension directions and the start position of the data block when the electronic device 100 performs the deconvolution operation.
The calculation of the size of the overlapping area between a data block and the previous data block in the width direction is taken as an example. The size of the data input to the deconvolution operation is determined according to the size of the output result of the current data block and the convolution kernel size kernel_size_w'(t); the calculation in the width direction can specifically refer to the following formula (5).
w_int_w(t) = (out_w'(t) - 1) × stride_int_w + kernel_size_w'(t) - newpad_w'(t)    (5)
In the above formula (5), w_int_w(t) is the intermediate value in the width direction used in calculating the size of the data input to the deconvolution operation of this layer, stride_int_w is equal to 1, and newpad_w'(t) is determined according to the kernel_size in the width direction of the deconvolution operation and the actual amount of padding. newpad_w'(t) can be calculated with reference to the following formula (6).
newpad_w'(t) = kernel_size_w'(t) - pad_w'(t) - 1    (6)
Then, the intermediate value overlap_int_w(t) in the width direction used in calculating the size of the overlapping area is determined with reference to formula (7); the final size of the overlapping area also needs to be determined from overlap_int_w(t).
overlap_int_w(t) = w_int_w(t) - pos'(t) × stride_int_w + newpad_w'(t)    (7)
The w_int_w(t) and overlap_int_w(t) obtained by the above calculations are then adjusted to obtain the width-direction size w_w'(t) of the data actually input to the deconvolution operation and the actual overlap overlap_w'(t). w_w'(t) is obtained from w_int_w(t) with reference to formula (8), and overlap_w'(t) is obtained from overlap_int_w(t) with reference to formula (9), where left is initially equal to 0 and stride_w'(t) is the stride of the t-th layer deconvolution operation. [Formulas (8) and (9) appear only as images in the original document.] In these formulas, (w_int_w(t) - left) // stride_w'(t) denotes the value of w_int_w(t) - left divided by stride_w'(t) and rounded down, stride_w'(t) != 0 denotes that stride_w'(t) is not equal to 0, and (w_int_w(t) - left) % stride_w'(t) denotes the remainder of w_int_w(t) - left divided by stride_w'(t).
Then the intermediate parameter remain_w in the width direction is adjusted; the calculation of remain_w can refer to formula (10). [Formula (10) appears only as an image in the original document.]
A new left is then obtained by adjustment according to this intermediate parameter, with reference to the following formula (11); the adjusted left is used for calculating the deconvolution overlapping area of the next layer.
left(t-1) = remain_w(t) - 1 + stride_w'(t) × (remain_w(t) == 0)    (11)
Finally, with reference to formula (4), the start position of the data block can be determined from the size overlap_w'(t) of the overlapping area between the data block and the previous data block in the width direction.
In addition, in the method for operating a neural network model provided in the present application, a plurality of data blocks are determined from the input data, and the padding calculation of the intermediate blocks is also adjusted. Taking the width direction as an example, the padding on the left side of a data block is denoted pad_l and the padding on the right side is denoted pad_r. pad_r can be calculated with reference to the following formula (12), and pad_l with reference to formula (13), where w_out'(t) is the width of the calculation result output by the data block at the t-th layer and w_pos'(t) is the width of the input data of the data block corresponding to the t-th convolutional layer.
pad_r = kernel_size_w'(t) - (w_out'(t) + pad_l - (w_pos'(t) - 1) × stride_w'(t))    (12)
[Formula (13) appears only as an image in the original document.]
S740: calculate each data block separately to obtain the corresponding sub-results, and combine all the sub-results to obtain the output result.
The electronic device 100 calculates each data block according to the model parameters of each computation layer; specifically, it performs a convolution operation or a deconvolution operation on the first data block according to the model parameters to obtain the corresponding sub-result. The electronic device 100 then combines the sub-results to obtain the final calculation result.
In some embodiments, the electronic device 100 may further hand each data block to a different compute engine of the processor for calculation, and the different compute engines can run in parallel, improving the operating efficiency.
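As a rough illustration of dispatching data blocks to multiple compute engines, the following sketch uses a thread pool as a stand-in for the processor's compute engines (an assumption made purely for illustration; the actual scheduling onto hardware compute engines is not described in this form in the text):

    from concurrent.futures import ThreadPoolExecutor

    import numpy as np

    def compute_block(block):
        # Stand-in for running all computation layers on one data block and
        # returning the corresponding sub-result tensor (here it simply copies the block).
        return np.array(block)

    def compute_all_blocks(blocks, num_engines=4):
        # Hand each data block to a separate worker (standing in for a compute engine);
        # the workers run concurrently and the sub-results are collected in input order.
        with ThreadPoolExecutor(max_workers=num_engines) as pool:
            return list(pool.map(compute_block, blocks))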
In some embodiments, after the electronic device 100 calculates each data block to obtain the final calculation result, the pooling operation is performed on the first data block.
In summary, with the method for operating a neural network model provided in the present application, the electronic device 100 only calculates part of the input data at a time and does not generate a large amount of intermediate data at once, which reduces the memory occupied during the operation of the neural network model. The electronic device 100 does not need to send the model to other computing devices for execution, which reduces the occupation of network resources and also avoids the delay caused by sending the model to other computing devices. Moreover, the neural network model can be deployed and run on the local device, and the final result is written to the external memory only after the calculation of all convolutional layers is completed, which reduces the number of accesses to the external memory. Because the input data is divided into a plurality of data blocks, different data blocks can be sent to different compute engines of the processor, and the different compute engines can run in parallel, which improves the operating efficiency.
For ease of understanding, the following illustrates the calculation process of the start position of a data block in the width direction when the electronic device 100 performs the convolution operation. An example in which the CNN has three convolutional layers (conv1, conv2, and conv3) is described here.
First, the input data is data 0 to data 32, i.e., the input data width is equal to 33. The model parameters of conv1 are stride_w(1)=2 and kernel_size_w(1)=3, the model parameters of conv2 are stride_w(2)=1 and kernel_size_w(2)=3, and the model parameters of conv3 are stride_w(3)=2 and kernel_size_w(3)=3, with padding pad_l = pad_r = 1 on both the left and right sides. It can then be determined from formula (1) that the final output result of the input data is 9 units, S₃0-S₃8. It should be understood that S₃0-S₃8 are used merely for illustration and do not represent specific numerical values; the electronic device 100 does not obtain the specific numerical values of the output result at this stage.
The electronic device 100 presets that the output result is divided into 2 sub-results, where sub-result A is the 4 units S₃0-S₃3 and sub-result B is the 5 units S₃4-S₃8.
The calculation process of the first data block, corresponding to sub-result A, is explained first.
As shown in FIG. 8A, the model parameters of conv3 are stride_w(3)=2 and kernel_size_w(3)=3, with padding pad_l = pad_r = 1 on the left and right sides. Since conv3 outputs S₃0-S₃3, i.e., the output size is 4 units, it can be directly determined by formula (3) above that the first data block corresponds to S₂0-S₂7 in the input data of conv3, and the size of the conv3 input data is 8 units.
As shown in FIG. 8B, the input data of conv3 is the output data of conv2, and the model parameters of conv2 are stride_w(2)=1 and kernel_size_w(2)=3, with padding pad_l = pad_r = 1 on the left and right sides. Since conv2 outputs S₂0-S₂7, it can be determined by formula (3) above that the first data block corresponds to S₁0-S₁8 in the input data of conv2, and the size of this input data is 9 units.
As shown in FIG. 8C, the input data of conv2 is the output data of conv1, and the model parameters of conv1 are stride_w(1)=2 and kernel_size_w(1)=3, with padding pad_l = pad_r = 1 on the left and right sides. Since conv1 outputs S₁0-S₁8, it can be determined by formula (3) above that the size of the first data block is 18 units, i.e., data 0 to data 17.
Based on the above calculation results, the electronic device 100 determines the start position of the second data block, which corresponds to sub-result B.
As shown in fig. 9A, since the first data block outputs S₃0-S₃3 at conv3, the size of the non-overlapping output pos(4) = out(3) = 4 can be determined, and the size of the conv3 input data can be determined as w(3) = 8 according to formula (3) with out(3) = 4, stride(3) = 2, kernel_size(3) = 3, and pad(3) = 1. Then, substituting these values into formula (2), overlap(3) = 1 can be determined. That is, the input data of the second data block corresponding to conv3 overlaps the input data of the first data block corresponding to conv3 by 1 unit; according to formula (4), the start position of the input data of the second data block corresponding to conv3 is shifted forward by one unit from the end of the input data of the first data block, so the start position of the second data block at conv3 is S₂7.
As shown in fig. 9B, since the first data block outputs S₂0-S₂7 at conv2 and the start position of the second data block at conv3 is S₂7, pos(3) = 7 and out(2) = 8 can be determined, and the size of the conv2 input data can be determined as w(2) = 9 according to formula (3) with out(2) = 8, stride(2) = 1, kernel_size(2) = 3, and pad(2) = 1. Then, substituting these values into formula (2), overlap(2) = 3 can be determined. That is, the input data of the second data block corresponding to conv2 overlaps the input data of the first data block corresponding to conv2 by 3 units; according to formula (4), the start position of the input data of the second data block corresponding to conv2 is shifted forward by 3 units from the end of the input data of the first data block, so the start position of the second data block at conv2 is S₁6.
As shown in fig. 9C, since the first data block outputs S₁0-S₁8 at conv1 and the start position of the second data block at conv2 is S₁6, pos(2) = 6 and out(1) = 9 can be determined, and the size of the conv1 input data can be determined as w(1) = 18 according to formula (3) with out(1) = 9, stride(1) = 2, kernel_size(1) = 3, and pad(1) = 1. Then, substituting these values into formula (2), overlap(1) = 7 can be determined. That is, the input data of the second data block corresponding to conv1 overlaps the input data of the first data block corresponding to conv1 by 7 units; according to formula (4), the start position of the input data of the second data block corresponding to conv1 is shifted forward by 7 units from the end of the input data of the first data block, so the start position of the second data block at conv1 is data 11.
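The derivation above can be reproduced end to end with a short standalone trace (an illustrative sketch that walks formulas (2)-(4) from conv3 back to conv1 for this example; since the first data block starts at position 0, its end position equals its width, and the names used here are assumptions of this sketch):

    # (kernel_size, stride, pad) of conv1, conv2 and conv3 in the width direction.
    layers = [(3, 2, 1), (3, 1, 1), (3, 2, 1)]

    def trace_second_block(layers, first_block_outputs, pos_next):
        # Walk from the last layer up to the first layer using formulas (2)-(4).
        # `first_block_outputs[t]` is the output width of the first data block at layer t+1,
        # and `pos_next` is the non-overlapping part of the first block's final output.
        for t in reversed(range(len(layers))):
            kernel_size, stride, pad = layers[t]
            out_w = first_block_outputs[t]
            w_w = (out_w - 1) * stride + kernel_size - pad   # formula (3)
            overlap = w_w - pos_next * stride + pad          # formula (2)
            start = w_w - overlap                            # formula (4); first block starts at 0
            print(f"conv{t + 1}: input width {w_w}, overlap {overlap}, second block starts at {start}")
            pos_next = start

    # Output widths of the first data block at conv1, conv2 and conv3 are 9, 8 and 4;
    # the non-overlapping part of its final output is all 4 units of sub-result A.
    trace_second_block(layers, [9, 8, 4], 4)
    # conv3: input width 8, overlap 1, second block starts at 7
    # conv2: input width 9, overlap 3, second block starts at 6
    # conv1: input width 18, overlap 7, second block starts at 11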
Therefore, the electronic device 100 can reversely derive the start position of the second data block in the input data according to the size of the calculation result of the first data block, so that when the electronic device 100 calculates the second data block after calculating the first data block, the overlapping data between the first data block and the second data block is not missed. Furthermore, the electronic device 100 only needs to calculate part of the input data at a time and does not generate a large amount of intermediate data at once, which reduces the memory occupied during the CNN operation. The electronic device 100 does not need to send the CNN to other computing devices for execution, which reduces the occupation of network resources and avoids the delay caused by sending the CNN to other computing devices.
Moreover, the CNN can be deployed and run on the local device, and the final result is written to the external memory only after the calculation of all convolutional layers is completed, which reduces the number of accesses to the external memory. Because the input data is split into a plurality of data blocks, different data blocks can be handed to different compute engines of the processor, and the different compute engines can run in parallel, which improves the operating efficiency.
In order to solve the problem that running the neural network model occupies a large amount of memory, the present application provides an operation apparatus 1000 of a neural network model, which includes a prediction unit 1010, a determining unit 1020, and a sending unit 1030.
The prediction unit 1010 is configured to predict the size of the calculation result tensor corresponding to the input data tensor according to the size of the input data tensor and the model parameters of each layer of calculation layer;

the determining unit 1020 is configured to determine the sizes of the multiple sub-result tensors according to the size of the calculation result tensor; the determining unit 1020 is further configured to determine the range of the data block corresponding to each sub-result tensor in the input data tensor based on the size of each sub-result tensor and the model parameters of each layer of calculation layer; the determining unit 1020 is further configured to calculate each data block according to the model parameters of each layer of calculation layer to obtain the sub-result tensor corresponding to each data block; and the determining unit 1020 is further configured to determine the calculation result tensor corresponding to the input data tensor according to the sub-result tensor corresponding to each data block.
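The cooperation of these units can be summarized as the following control flow. This is a minimal, hypothetical Python sketch assuming a one-dimensional splitting policy and the standard convolution output-size arithmetic; the function names, the splitting policy, and the helper callbacks (block_range for the backward derivation and forward for the per-block computation) are illustrative assumptions and are not defined by the present application.

```python
# Hypothetical sketch of the apparatus flow: predict the result size, split it
# into sub-results, map each sub-result back to an input data block, compute
# the blocks one by one, and stitch the calculation result back together.
import numpy as np

def run_model_in_blocks(input_tensor, layers, num_blocks, block_range, forward):
    # prediction unit 1010: estimate the final output size layer by layer
    size = input_tensor.shape[-1]
    for layer in layers:
        size = (size + 2 * layer["pad"] - layer["kernel_size"]) // layer["stride"] + 1

    # determining unit 1020: split the predicted result into sub-results
    sub_result_indices = np.array_split(np.arange(size), num_blocks)

    # determining unit 1020: derive each block's input range and compute it
    sub_results = []
    for indices in sub_result_indices:
        start, end = block_range(layers, indices)   # backward derivation (formulas 2-4)
        block = input_tensor[..., start:end]
        sub_results.append(forward(block, layers))  # per-block, layer-by-layer computation

    # determining unit 1020: concatenate sub-results into the result tensor
    return np.concatenate(sub_results, axis=-1)
```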
In some embodiments, the computation layer is a convolution layer or a deconvolution layer, the convolution layer is used for performing convolution operation on the input data tensor, and the deconvolution layer is used for performing deconvolution operation on the input data tensor; the model parameters include the step size of the convolution or deconvolution operation, the convolution kernel size, and the number of filler blocks.
In some embodiments, the determining unit 1020 is further configured to determine a size of each data block according to the size of each sub-result tensor and the model parameters of each layer of the computation layer; determining the size of an overlapping area between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each layer of calculation layer; and determining the range of the data block corresponding to each sub-result tensor in the input data tensor according to the size of each data block and the size of an overlapping area between adjacent data blocks.
In some embodiments, the determining unit 1020 is further configured to determine, according to the size of the previous sub-result tensor in the different dimension direction of each sub-result tensor and the model parameter of each layer of the calculation layer, the size of the overlapping area between the data block corresponding to the current sub-result tensor and the data block corresponding to the previous sub-result tensor in the different dimension direction.
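In other words, the overlap can be tracked independently in each dimension direction (for example, height and width), using that dimension's stride and kernel size. The sketch below assumes the same recurrence as in the one-dimensional example above, applied per dimension; the per-dimension parameter names are assumptions for illustration.

```python
# Hypothetical sketch: the overlap between the data block of the current
# sub-result and the data block of the previous sub-result is accumulated
# independently in the height and width directions across all layers.
def overlap_per_dimension(layers):
    overlap_h = overlap_w = 0   # assumed: no overlap required after the last layer
    for layer in reversed(layers):
        overlap_h = overlap_h * layer["stride_h"] + layer["kernel_h"] - layer["stride_h"]
        overlap_w = overlap_w * layer["stride_w"] + layer["kernel_w"] - layer["stride_w"]
    return overlap_h, overlap_w

# With three 3x3 layers of strides 2/1/2 in both directions, as in the example
# above, this sketch returns (7, 7).
```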
In some embodiments, there are M computation layers in total, and the determining unit 1020 is further configured to determine the size of the overlapping region of adjacent data blocks in the input data tensor corresponding to the M-th computation layer according to the size of each sub-result tensor and the model parameters of the M-th computation layer; and to take the input data tensor of the M-th computation layer as the output data of the (M-1)-th computation layer and determine the size of the overlapping region of adjacent data blocks in the input data tensor corresponding to the (M-1)-th computation layer, and so on, until the size of the overlapping region of adjacent data blocks in the input data tensor corresponding to the first computation layer is determined.
In some embodiments, the sending unit 1030 is configured to send the multiple data blocks to different compute engines, and the different compute engines respectively calculate the received data blocks according to the model parameters of each layer of calculation layer to obtain the sub-result tensor corresponding to each data block.
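A minimal sketch of such dispatching is given below, assuming the data blocks are independent once their overlapping input regions have been included; a thread pool stands in for the hardware compute engines and is purely illustrative.

```python
# Hypothetical sketch of the sending unit 1030: data blocks are handed to
# different compute engines and processed concurrently.
from concurrent.futures import ThreadPoolExecutor

def dispatch_blocks(blocks, layers, forward, num_engines=4):
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        futures = [pool.submit(forward, block, layers) for block in blocks]
        # results are collected in submission order, so the calculation result
        # tensor can later be assembled by simple concatenation of sub-results
        return [future.result() for future in futures]
```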
In some embodiments, the compute engine is a compute engine of a processor of the electronic device, or the compute engine is a compute engine of a processor of another electronic device.
Therefore, the operation apparatus of the neural network model can reversely derive the start position of the second data block in the input data according to the size of the calculation result of the first data block, so that when the second data block is calculated after the first data block, the data in the overlapping portion between the first data block and the second data block is not missed. Furthermore, the operation apparatus of the neural network model only calculates part of the input data at a time and does not generate a large amount of intermediate data at once, thereby reducing memory occupation during CNN operation. The operation apparatus of the neural network model does not need to send the CNN to other computing devices for execution, which reduces occupation of network resources and avoids the time delay caused by sending the CNN to other computing devices. Moreover, the CNN can be deployed and run on the local device, and the final result is written to the external memory only after the calculation of all the convolution layers is completed, which reduces the number of accesses to the external memory. Since the input data is split into a plurality of data blocks, different data blocks can be sent to different compute engines of the processor, and the different compute engines can run in parallel, improving operating efficiency.
Further, fig. 11 illustrates a schematic structural diagram of an electronic device 100 according to some embodiments of the present application. As shown in fig. 11, the electronic device 100 includes one or more processors 101A, an NPU 101B, a system memory 102, a non-volatile memory (NVM) 103, a communication interface 104, an input/output (I/O) device 105, and system control logic 106 for coupling the processor 101A, the system memory 102, the NVM 103, the communication interface 104, and the I/O device 105. Wherein:
the Processor 101A may include one or more Processing units, for example, a Processing module or Processing circuit that may include a Central Processing Unit CPU (Central Processing Unit), a Graphics Processing Unit GPU (Graphics Processing Unit), a Digital Signal Processor DSP (Digital Signal Processor), a microprocessor MCU (Micro-programmed Control Unit), an AI (Artificial Intelligence) Processor, or a Programmable logic device FPGA (Field Programmable Gate Array) may include one or more single-core or multi-core processors.
The neural network processor 101B may be configured to perform inference of the neural network model and execute instructions corresponding to the operation method of the neural network model provided in the embodiment of the present application. The neural network processor 101B may be a stand-alone processor or may be integrated within the processor 101A.
The system memory 102 is a volatile memory, such as random-access memory (RAM) or double data rate synchronous dynamic random access memory (DDR SDRAM). The system memory is used for temporarily storing data and/or instructions; for example, in some embodiments, the system memory 102 may be used to store related instructions of the neural network model, the calculation results of the data blocks, and the like.
The non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as a hard disk drive (HDD), a compact disc (CD), a digital versatile disc (DVD), or a solid-state drive (SSD). In some embodiments, the non-volatile memory 103 may also be a removable storage medium, such as a secure digital (SD) memory card.
In particular, the system memory 102 and the non-volatile memory 103 may each include a temporary copy and a permanent copy of the instructions 107. The instructions 107, when executed by at least one of the processors 101A and/or 101B, cause the electronic device 100 to implement the operation method of the neural network model provided by the embodiments of the present application.
The communication interface 104 may include a transceiver to provide a wired or wireless communication interface for the electronic device 100 to communicate with any other suitable device over one or more networks. In some embodiments, the communication interface 104 may be integrated with other components of the electronic device 100, for example, the communication interface 104 may be integrated in the processor 101A. In some embodiments, the electronic device 100 may communicate with other devices through the communication interface 104, for example, the electronic device 100 may obtain the neural network model to be run from other electronic devices through the communication interface 104.
Input/output (I/O) device 105 may include an input device such as a keyboard, mouse, etc., an output device such as a display, etc., and a user may interact with electronic device 100 through input/output (I/O) device 105.
System control logic 106 may include any suitable interface controllers to provide any suitable interfaces for the other modules of electronic device 100. For example, in some embodiments, system control logic 106 may include one or more memory controllers to provide an interface to system memory 102 and non-volatile memory 103.
In some embodiments, at least one of the processors 101A may be packaged together with logic for one or more controllers of the system control logic 106 to form a system in package (SiP). In other embodiments, at least one of the processors 101A may also be integrated on the same chip with logic for one or more controllers of the system control logic 106 to form a system on chip (SoC).
It is understood that the electronic device 100 may be any electronic device capable of running a neural network model, including but not limited to a mobile phone, a wearable device (e.g., a smart watch, etc.), a tablet, a desktop, a laptop, a handheld computer, a notebook, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR)/Virtual Reality (VR) device, etc., and the embodiments of the present application are not limited thereto.
It is understood that the configuration of electronic device 100 shown in fig. 11 is merely an example, and in other embodiments, electronic device 100 may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer readable media.
Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodological feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not essential, and the combination of functions implemented by these logical units/modules is what solves the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are not closely related to solving the technical problem addressed by the present application, which does not mean that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and specification of this patent, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprises a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (10)

1. An operation method of a neural network model applied to an electronic device, wherein the neural network model comprises a plurality of computing layers, the method comprising:
predicting the size of a calculation result tensor corresponding to the input data tensor according to the size of the input data tensor and the model parameters of each layer of calculation layer;
determining the sizes of a plurality of sub-result tensors according to the size of the calculated result tensor;
determining a range of a corresponding data block of each sub-result tensor in the input data tensor based on a size of the each sub-result tensor and model parameters of the each layer of computation layer;
calculating each data block according to the model parameters of each calculation layer to obtain a sub-result tensor corresponding to each data block;
and determining a calculation result tensor corresponding to the input data tensor according to the sub-result tensor corresponding to each data block.
2. The method of claim 1,
the computation layer is a convolution layer or a deconvolution layer, the convolution layer is used for performing convolution operation on the input data tensor, and the deconvolution layer is used for performing deconvolution operation on the input data tensor;
the model parameters include the step size of the convolution or deconvolution operation, the convolution kernel size, and the number of filler blocks.
3. The method of claim 2, wherein the determining a range of a corresponding data block of each sub-result tensor in the input data tensor based on a size of the each sub-result tensor and model parameters of the each layer of computation layer comprises:
determining the size of each data block according to the size of each sub-result tensor and the model parameters of each layer of calculation layer;
determining the size of an overlapping area between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each layer of calculation layer;
and determining the range of the data block corresponding to each sub-result tensor in the input data tensor according to the size of each data block and the size of the overlapping area between the adjacent data blocks.
4. The method of claim 3, wherein determining the size of the overlap region between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each computed layer comprises:
and determining the size of an overlapping region between a data block corresponding to the current sub-result tensor and a data block corresponding to the previous sub-result tensor in the different dimensionality directions according to the size of the previous sub-result tensor of each sub-result tensor in the different dimensionality directions and the model parameters of each layer of calculation layer.
5. The method of claim 4, wherein the computation layers have M layers, and wherein determining the size of the overlap region between adjacent data blocks according to the size of each sub-result tensor and the model parameters of each computation layer further comprises:
determining the size of the overlapping area of the adjacent data blocks in the input data tensor corresponding to the M-th computation layer according to the size of each sub-result tensor and the model parameters of the M-th computation layer;
and taking the input data tensor of the M-th computation layer as the output data of the (M-1)-th computation layer, and determining the size of the overlapping area of the adjacent data blocks in the input data tensor corresponding to the (M-1)-th computation layer, until the size of the overlapping area of the adjacent data blocks in the input data tensor corresponding to the first computation layer is determined.
6. The method of claim 5, wherein calculating each data block according to the model parameters of each computation layer to obtain the sub-result tensor corresponding to each data block comprises:
and respectively sending the data blocks to different calculation engines, wherein the different calculation engines respectively calculate the received data blocks according to the model parameters of each layer of calculation layer to obtain the sub-result tensor corresponding to each data block.
7. The method of claim 6,
the computing engine is a computing engine of a processor of the electronic device, or the computing engine is a computing engine of a processor of another electronic device.
8. A readable medium containing instructions that, when executed by a processor of an electronic device, cause the electronic device to implement the method of operating a neural network model of any one of claims 1 to 7.
9. An electronic device, comprising:
a memory to store instructions for execution by one or more processors of an electronic device;
and a processor, which is one of the processors of the electronic device, for executing the instructions to cause the electronic device to implement the operation method of the neural network model according to any one of claims 1 to 7.
10. A computer program product, characterized in that it comprises instructions which, when executed, cause a computer to carry out a method of operation of a neural network model according to any one of claims 1 to 7.
CN202211109253.3A 2022-09-13 2022-09-13 Method for operating neural network model, readable medium and electronic device Pending CN115481717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211109253.3A CN115481717A (en) 2022-09-13 2022-09-13 Method for operating neural network model, readable medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211109253.3A CN115481717A (en) 2022-09-13 2022-09-13 Method for operating neural network model, readable medium and electronic device

Publications (1)

Publication Number Publication Date
CN115481717A true CN115481717A (en) 2022-12-16

Family

ID=84392826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211109253.3A Pending CN115481717A (en) 2022-09-13 2022-09-13 Method for operating neural network model, readable medium and electronic device

Country Status (1)

Country Link
CN (1) CN115481717A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700995A (en) * 2023-08-03 2023-09-05 浪潮电子信息产业股份有限公司 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool
CN116700995B (en) * 2023-08-03 2023-11-03 浪潮电子信息产业股份有限公司 Concurrent access method, device, equipment and storage medium for heterogeneous memory pool

Similar Documents

Publication Publication Date Title
US11907830B2 (en) Neural network architecture using control logic determining convolution operation sequence
US20220261615A1 (en) Neural network devices and methods of operating the same
CN109919311B (en) Method for generating instruction sequence, method and device for executing neural network operation
US20210224125A1 (en) Operation Accelerator, Processing Method, and Related Device
US11816559B2 (en) Dilated convolution using systolic array
KR20200091623A (en) Method and device for performing convolution operation on neural network based on Winograd transform
US11675507B2 (en) Method and apparatus for allocating memory space for driving neural network
KR20180050928A (en) Convolutional neural network processing method and apparatus
US20230359876A1 (en) Efficient utilization of processing element array
CN113673701A (en) Method for operating neural network model, readable medium and electronic device
CN107909537B (en) Image processing method based on convolutional neural network and mobile terminal
CN111465943A (en) On-chip computing network
WO2020233709A1 (en) Model compression method, and device
CN111767986A (en) Operation method and device based on neural network
US11568323B2 (en) Electronic device and control method thereof
CN115481717A (en) Method for operating neural network model, readable medium and electronic device
CN112884137A (en) Hardware implementation of neural network
US11501145B1 (en) Memory operation for systolic array
US20220043630A1 (en) Electronic device and control method therefor
CN113630375A (en) Compression apparatus and method using parameters of quadtree method
JP7108702B2 (en) Processing for multiple input datasets
WO2023272432A1 (en) Image processing method and image processing apparatus
CN115294361A (en) Feature extraction method and device
US20220188612A1 (en) Npu device performing convolution operation based on the number of channels and operating method thereof
CN112884138A (en) Hardware implementation of neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination