CN113052292A - Convolutional neural network computing method, device and computer readable storage medium

Convolutional neural network computing method, device and computer readable storage medium

Info

Publication number
CN113052292A
Authority
CN
China
Prior art keywords
weight data
data
external memory
block
convolutional layer
Prior art date
Legal status
Granted
Application number
CN201911376391.6A
Other languages
Chinese (zh)
Other versions
CN113052292B (en)
Inventor
徐兵
张楠赓
Current Assignee
Beijing Sisheng Technology Co ltd
Original Assignee
Canaan Bright Sight Co Ltd
Priority date
Filing date
Publication date
Application filed by Canaan Bright Sight Co Ltd
Priority to CN201911376391.6A
Publication of CN113052292A
Application granted
Publication of CN113052292B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks


Abstract

The invention provides a convolutional neural network computing method, a computing device, and a computer readable storage medium. The method comprises the following steps: determining, for a target convolutional layer of a convolutional neural network, the data volume ratio of the input image data to the weight data; and determining a preset processing mode for the target convolutional layer according to the data volume ratio, so that a computing platform carries out the convolution calculation of the target convolutional layer based on that preset processing mode. By determining, from the data volume ratio, the preset processing mode suited to the target convolutional layer, the method reduces the access bandwidth required of the external memory.

Description

Convolutional neural network computing method, device and computer readable storage medium
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a convolutional neural network computing method and device and a computer readable storage medium.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
For a typical embedded platform, the chip itself (leaving the external memory aside) does not have enough storage space for the input/output feature maps (the intermediate results of the computation) and the very large weight parameters, so frequent data transfers between the chip and the external memory (usually DRAM) are unavoidable.
Such large-scale data exchange between the chip's internal and external memories therefore wastes a large amount of power during convolution calculation.
Disclosure of Invention
Embodiments of the invention provide a convolutional neural network computing method, a computing device, and a computer readable storage medium, which address the prior-art problem that data exchange between the chip's internal memory and the external memory is too frequent.
The embodiments of the invention provide the following schemes.
In a first aspect, a convolutional neural network computing method is provided, including: determining, for a target convolutional layer of a convolutional neural network, the data volume ratio of the input image data to the weight data; and determining a preset processing mode for the target convolutional layer according to the data volume ratio, so that a computing platform carries out the convolution calculation of the target convolutional layer based on that preset processing mode.
In one possible embodiment, if the data volume ratio is greater than a first threshold, determining the preset processing mode of the target convolutional layer comprises: reading the weight data and a first block of the input image data from an external memory, and performing convolution calculation based on the weight data and the first block; thereafter, reading a second block of the input image data from the external memory, and performing convolution calculation based on the weight data and the second block.
In one possible embodiment, the method further comprises: after the convolution calculation based on the weight data and the first block, obtaining an intermediate result of the target convolutional layer corresponding to the first block, and storing the intermediate result in an internal memory of the computing platform; the intermediate result is read from the internal memory and directly participates in the convolution calculation of the layer following the target convolutional layer.
In one possible embodiment, the method further comprises: if the data volume of the weight data is smaller than a preset threshold, caching the weight data in an internal memory of the computing platform after it is read from the external memory, until the convolution calculation of the target convolutional layer is finished; and if the data volume of the weight data is larger than the preset threshold, repeatedly executing the operation of reading the weight data from the external memory.
In one possible embodiment, if the data amount ratio is smaller than a second threshold value, and the second threshold value is smaller than the first threshold value, determining the preset processing mode of the target convolutional layer includes: reading a first part of the weight data from the external memory, sequentially reading each block of the input image data from the external memory for the first part of the weight data, thereby sequentially performing convolution calculation based on the first part of the weight data and each block; after the convolution calculation related to the first part of the weight data is completed, the second part of the weight data is read from the external memory, and each block of the input image data is sequentially read from the external memory for the second part of the weight data, so that the convolution calculation is sequentially performed based on the second part of the weight data and each block.
In one possible embodiment, the method further comprises: the weight data includes four dimensions, wherein a slicing is performed in an output channel number dimension of the weight data to determine a first portion and at least one second portion of the weight data.
In one possible embodiment, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform; sequentially reading each block of input image data of consecutive frames from an external memory; at least a part of the weight data stored in the internal memory is multiplexed and sequentially subjected to convolution calculation with each block of the input image data of the consecutive frames.
In one possible implementation, the number of frames in the consecutive multi-frame input is determined according to the storage space of the external memory.
In a second aspect, a convolutional neural network computing device is provided, including: a ratio determination module for determining a data amount ratio of the input image data and the weight data for a target convolutional layer of the convolutional neural network; and the mode determining module is used for determining the preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
In one possible embodiment, if the data volume ratio is greater than a first threshold, determining the preset processing mode of the target convolutional layer comprises: reading the weight data and a first block of the input image data from an external memory, and performing convolution calculation based on the weight data and the first block; thereafter, reading a second block of the input image data from the external memory, and performing convolution calculation based on the weight data and the second block.
In one possible embodiment, the mode determination module is further configured to: after the convolution calculation based on the weight data and the first block, obtain an intermediate result of the target convolutional layer corresponding to the first block, and store the intermediate result in an internal memory of the computing platform; the intermediate result is read from the internal memory and directly participates in the convolution calculation of the layer following the target convolutional layer.
In one possible embodiment, the mode determination module is further configured to: if the data volume of the weight data is smaller than a preset threshold, cache the weight data in an internal memory of the computing platform after it is read from the external memory, until the convolution calculation of the target convolutional layer is finished; and if the data volume of the weight data is larger than the preset threshold, repeatedly execute the operation of reading the weight data from the external memory.
In one possible embodiment, if the data amount ratio is smaller than a second threshold value, and the second threshold value is smaller than the first threshold value, determining the preset processing mode of the target convolutional layer includes: reading a first part of the weight data from the external memory, sequentially reading each block of the input image data from the external memory for the first part of the weight data, thereby sequentially performing convolution calculation based on the first part of the weight data and each block; after the convolution calculation related to the first part of the weight data is completed, the second part of the weight data is read from the external memory, and each block of the input image data is sequentially read from the external memory for the second part of the weight data, so that the convolution calculation is sequentially performed based on the second part of the weight data and each block.
In one possible embodiment, the apparatus is further configured to: the weight data includes four dimensions, wherein a slicing is performed in an output channel number dimension of the weight data to determine a first portion and at least one second portion of the weight data.
In one possible embodiment, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform; sequentially reading each block of input image data of consecutive frames from an external memory; at least a part of the weight data stored in the internal memory is multiplexed and sequentially subjected to convolution calculation with each block of the input image data of the consecutive frames.
In one possible implementation, the number of frames in the consecutive multi-frame input is determined according to the storage space of the external memory.
In a third aspect, a computing device of a neural network is provided, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform: determining a data volume ratio of the input image data and the weight data for a target convolutional layer of a convolutional neural network; and determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
In a fourth aspect, there is provided a computer readable storage medium storing a program which, when executed by a multicore processor, causes the multicore processor to perform the method of the first aspect.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects: by calculating the data volume ratio of the input image data and the weight data of the target convolutional layer and further determining the preset processing mode suitable for the target convolutional layer according to the data volume ratio, the access bandwidth requirement on an external memory can be reduced.
It should be understood that the above description is only an overview of the technical solutions of the present invention, provided so that the technical means of the present invention can be clearly understood and implemented according to the content of the specification. In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, embodiments are described in detail below with reference to the accompanying figures.
Drawings
The advantages and benefits described herein, as well as other advantages and benefits, will be apparent to those of ordinary skill in the art upon reading the following detailed description of the exemplary embodiments. The drawings are only for purposes of illustrating exemplary embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like elements throughout. In the drawings:
FIG. 1 is a schematic diagram of a computing device of a convolutional neural network;
FIG. 2 is a schematic diagram of a convolutional neural network;
FIG. 3 is a flow chart illustrating a method for computing a convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating splitting of input image data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a preset processing mode according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another preset processing mode according to an embodiment of the present invention;
FIG. 7 is a block diagram of a computing device of a convolutional neural network according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computing device of a convolutional neural network according to another embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the present invention, it is to be understood that terms such as "including" or "having," or the like, are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility of the presence of one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a schematic structural diagram of a computing device for a convolutional neural network. As shown in fig. 1, the computing unit 11 is disposed on the computing platform 10 and mainly performs the neural network calculations, while the internal memory 12 caches the data and results needed by intermediate calculations. The internal memory 12 is usually SRAM; this kind of storage is relatively expensive and is usually not provided in large capacity, otherwise the chip cost becomes too high. The external memory 13 generally refers to relatively low-cost DRAM, DDR, NAND Flash, and the like; although it has the advantage of low cost, it suffers from limited access bandwidth and high access power consumption. Because of these characteristics of internal and external storage, a chip design usually pairs a smaller internal memory with a larger external memory and updates data through internal-external data exchange during calculation, so the volume of that exchange has a large influence on the power consumption of the system.
Fig. 2 is a schematic diagram of a convolutional neural network. As shown in fig. 2, convolutional neural network 200 includes a plurality of convolutional layers. When the convolutional neural network shown in fig. 2 is implemented on the computing chip shown in fig. 1, a layer-by-layer computing method is usually used. For example, the convolution calculation of the Nth layer is performed first and its result is written out to the external memory; that result is then read back in to perform the convolution calculation of the (N+1)th layer.
FIG. 3 shows a flow diagram of a method 300 of computing a convolutional neural network, according to an embodiment of the present invention. As shown in fig. 3, the method 300 may include:
step 301: determining a data volume ratio of the input image data and the weight data for a target convolutional layer of a convolutional neural network;
step 302: and determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
Specifically, the target convolutional layer may be any one of a plurality of convolutional layers, and this embodiment is not particularly limited thereto. The input image data refers to the input feature map of the target convolutional layer, and the weight data is the convolutional kernel data of the target convolutional layer. In the embodiment of the present invention, different preset processing modes may be set in advance for a case where the data amount of the input image data is large and a case where the data amount of the weight data is large, for example, the calculation of the target convolution layer may be split into the block convolution operation for a case where the data amount of the input image data is large. Further, by first calculating a data amount ratio of the input image data and the weight data of the target convolutional layer, a preset processing mode suitable for the target convolutional layer can be determined according to the data amount ratio. For example, when the data amount ratio is too large, a preset processing mode suitable for large-scale input of image data is selected as a preset processing mode suitable for the target convolution layer; when the data volume ratio is too small, a preset processing mode suitable for large-scale weight data is selected as the preset processing mode suitable for the target convolution layer. Thereby, the access bandwidth requirements to the external memory may be reduced.
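As an illustration of steps 301 and 302, the following is a minimal sketch of the mode selection in Python. The threshold values, the shape-based ratio, and the fallback mode for ratios between the two thresholds are all assumptions for illustration; the embodiments only describe the behavior above the first threshold and below the second.

```python
import math

def data_volume_ratio(input_shape, weight_shape):
    """Ratio of input feature-map elements to weight elements for one layer."""
    return math.prod(input_shape) / math.prod(weight_shape)

def choose_processing_mode(ratio, first_threshold=1.0, second_threshold=0.8):
    if ratio > first_threshold:
        return "tile_input"      # block the input feature map, keep weights resident
    if ratio < second_threshold:
        return "split_weights"   # split weights by output channel, stream input blocks
    return "layer_by_layer"      # assumed fallback when neither side dominates

# e.g. the 224 x 224 x 64 input and 12 x 12 x 64 x 128 weights used below:
ratio = data_volume_ratio((224, 224, 64), (12, 12, 64, 128))
print(round(ratio, 2), choose_processing_mode(ratio))   # 2.72 tile_input
```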
For example, taking the common VGG model, Table 1 shows the data sizes of the weight data, the input image data, and the output image data for each convolutional layer of the VGG model.
Table 1:

Convolutional layer | Weight data (KB) | Input image data (KB) | Output image data (KB)
1     |     1.69 |  147.00 | 3136.00
2     |    36.00 | 3136.00 | 3136.00
3     |    72.00 |  784.00 | 1568.00
4     |   144.00 | 1568.00 | 1568.00
5     |   288.00 |  392.00 |  784.00
6     |   576.00 |  784.00 |  784.00
7     |   576.00 |  784.00 |  784.00
8     |  1152.00 |  196.00 |  392.00
9     |  2304.00 |  392.00 |  392.00
10    |  2304.00 |  392.00 |  392.00
11    |  2304.00 |   98.00 |   98.00
12    |  2304.00 |   98.00 |   98.00
13    |  2304.00 |   98.00 |   98.00
Total | 14365.69 | 8869.00 | 13230.00
As can be seen from Table 1, the input image data and the output image data shrink gradually as the convolution proceeds layer by layer, while the weight data grows in the opposite direction: the convolution process gradually reduces the resolution of the feature map while increasing the number of channels. Therefore, in the embodiment of the present invention, when performing the convolution calculation for layers 1 to 7, whose input image data is larger than their weight data, the input image data may be partitioned and its blocks read in one after another, and all calculations involving each block that is read may be carried out as continuously as possible, thereby avoiding repeated reads of the large-scale input image data. Conversely, when performing the convolution calculation for layers 8 to 13, whose input image data is smaller than their weight data, the weight data may be divided into several parts read in one after another, and all calculations involving each part that is read may be carried out as continuously as possible, thereby avoiding repeated reads of the large-scale weight data.
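The ratios driving this split can be checked directly against Table 1; the following short script (values in KB, copied from the table) prints a ratio above 1 for layers 1 to 7 and below 1 for layers 8 to 13:

```python
weights_kb = [1.69, 36, 72, 144, 288, 576, 576, 1152, 2304, 2304, 2304, 2304, 2304]
inputs_kb = [147, 3136, 784, 1568, 392, 784, 784, 196, 392, 392, 98, 98, 98]

for layer, (w, x) in enumerate(zip(weights_kb, inputs_kb), start=1):
    print(f"layer {layer:2d}: input/weight ratio = {x / w:6.2f}")
```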
Based on the convolutional neural network computing method of fig. 3, some embodiments of the present application further provide specific implementations and extension schemes of the method, described below.
In some possible embodiments, if the data volume ratio is greater than a first threshold, determining the preset processing mode of the target convolutional layer includes: reading the weight data and a first block of the input image data from an external memory, and performing convolution calculation based on the weight data and the first block; thereafter, reading a second block of the input image data from the external memory, and performing convolution calculation based on the weight data and the second block.
For example, suppose the first threshold is set to 1, the input feature map of the target convolutional layer has size 224 × 224 × 64, and the weight data has size 12 × 12 × 64 × 128; the data volume ratio of the input feature map to the weight data is then greater than 1. As shown in fig. 4 and fig. 5, the input feature map may be partitioned into, for example, a first block of size 72 × 72 × 64, a second block, ..., an Nth block, and so on. Accordingly, all of the weight data and the first block of the input feature map may be read from the external memory, and convolution calculation performed on the first block and the weight data to obtain the output result for the first block; the second block of the input feature map is then read from the external memory, and the above steps are repeated, so that the output feature map of the target convolutional layer is finally obtained by combining the per-block results.
Alternatively, the size of each partition may be determined by resources such as bandwidth information between the computing platform and external memory, and storage space of internal memory in the computing platform.
Alternatively, the first threshold may be determined by resources such as bandwidth information between the computing platform and the external memory, a storage space of the internal memory, and the like.
It should be understood that, in the present embodiment, when the input image data is large, the input feature map is partitioned and only one block is read at a time for the convolution operation; this scheduling method therefore needs only a small internal memory and never re-reads the large input data, so the access bandwidth required of the external memory is reduced.
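A runnable toy version of this schedule is sketched below, assuming a 1 × 1 convolution so that blocks can be cut along the height axis without overlapping borders (a real kernel would need a halo); the numpy arrays stand in for external memory, and none of the names come from the patent itself.

```python
import numpy as np

H, W, C_IN, C_OUT, N_BLOCKS = 8, 8, 4, 6, 4
x = np.random.rand(H, W, C_IN)    # input feature map, resident in "external memory"
w = np.random.rand(C_IN, C_OUT)   # weight data: read once, kept in internal memory

out_blocks = []
for xb in np.array_split(x, N_BLOCKS, axis=0):  # read one input block at a time
    out_blocks.append(xb @ w)                   # convolve the block with the cached weights
y = np.concatenate(out_blocks, axis=0)          # combine into the output feature map

assert np.allclose(y, x @ w)                    # matches the un-tiled computation
```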
In some possible embodiments, after performing convolution calculation based on the weight data and the first partition, obtaining an intermediate result of the target convolution layer corresponding to the first partition, and storing the intermediate result in an internal memory of the computing platform; the intermediate results are read from the internal memory and directly participate in the convolution calculation of the next layer of the target convolution layer.
For example, the weight data of the Nth layer of the plurality of convolutional layers may be read from the external memory, the first block of the Nth layer's input feature map may be read from the external memory, and the convolution calculation shown in fig. 5 performed on this first block and the Nth layer's weight data to obtain the first block output of the Nth layer. This first block output is then cached in the internal memory of the computing platform, the weight data of the (N+1)th layer is read from the external memory, and the convolution calculation shown in fig. 5 is performed on the Nth layer's first block output and the (N+1)th layer's weight data to obtain the first block output of the (N+1)th layer. The second block of the Nth layer's input feature map is then read from the external memory and the above steps are repeated, so that the output feature map of the (N+1)th layer is obtained directly.
It should be understood that, in this embodiment, using the block output of the Nth layer as the block input of the next layer is equivalent to fusing the convolution calculations of several consecutive layers and performing the blockwise convolution over the fused layers; this reduces the traffic of intermediate data and further lowers the access bandwidth required of the external memory.
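In the same toy setting, the fused schedule looks as follows: each block's layer-N output stays on chip and is consumed immediately by the layer-(N+1) weights, so the intermediate feature map never travels to external memory. Again a 1 × 1 convolution is assumed so blocks need no halo; shapes are illustrative.

```python
import numpy as np

H, W, C_IN, C_MID, C_OUT, N_BLOCKS = 8, 8, 4, 6, 5, 4
x = np.random.rand(H, W, C_IN)
w_n = np.random.rand(C_IN, C_MID)    # layer N weights
w_n1 = np.random.rand(C_MID, C_OUT)  # layer N+1 weights

out_blocks = []
for xb in np.array_split(x, N_BLOCKS, axis=0):
    mid = xb @ w_n                   # layer N block output, held in internal memory
    out_blocks.append(mid @ w_n1)    # fed directly into layer N+1
y = np.concatenate(out_blocks, axis=0)

assert np.allclose(y, (x @ w_n) @ w_n1)   # same result as running the layers separately
```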
In some possible embodiments, the method 300 further comprises: if the data volume of the weight data is smaller than a preset threshold, caching the weight data in an internal memory of the computing platform after it is read from the external memory, until the convolution calculation of the target convolutional layer is finished; and if the data volume of the weight data is larger than the preset threshold, repeatedly executing the operation of reading the weight data from the external memory.
Specifically, the preset threshold is determined by the size of the internal memory. For example, as shown in fig. 4, the same weight data is used for the convolution operations on the first block, the second block, ..., and the Nth block of the target convolutional layer. If the internal memory of the computing platform is large enough to cache the weight data, the weight data can be read once and the cached copy reused across the convolution operations on all the blocks, reducing read-in traffic and thus the access bandwidth required of the external memory. Conversely, if the internal memory of the computing platform cannot hold the weight data, the required weight data must be re-read from the external memory for each block's convolution calculation.
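The effect on weight traffic can be sketched as below, assuming a hypothetical internal-memory budget in bytes (the text only states that the threshold is determined by the size of the internal memory):

```python
def weight_bytes_read(weight_bytes, internal_mem_bytes, n_blocks):
    """External-memory traffic spent on weights for one layer."""
    if weight_bytes <= internal_mem_bytes:
        return weight_bytes              # read once, cached and reused for every block
    return weight_bytes * n_blocks       # re-read for each block's convolution

# e.g. with 1 MB of SRAM and 4 input blocks (weight sizes taken from Table 1):
print(weight_bytes_read(576 * 1024, 1024 * 1024, 4))    # 589824: read once
print(weight_bytes_read(2304 * 1024, 1024 * 1024, 4))   # 9437184: re-read 4 times
```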
In some possible embodiments, if the data amount ratio is smaller than the second threshold, determining the preset processing mode of the target convolutional layer includes: reading a first part of the weight data from the external memory, sequentially reading each block of the input image data from the external memory for the first part of the weight data, thereby sequentially performing convolution calculation based on the first part of the weight data and each block; after the convolution calculation related to the first part of the weight data is completed, the second part of the weight data is read from the external memory, and each block of the input image data is sequentially read from the external memory for the second part of the weight data, so that the convolution calculation is sequentially performed based on the second part of the weight data and each block.
Alternatively, the second threshold may be determined by resources such as bandwidth information between the computing platform and the external memory, a storage space of the internal memory, and the like.
For example, suppose the second threshold is set to 0.8, the input feature map of the target convolutional layer has size 224 × 224 × 64, and the weight data has size 12 × 12 × 64 × 1028; the data volume ratio of the input feature map to the weight data is then smaller than 0.8. As shown in fig. 6, the weight data of the target convolutional layer may be split into several parts, such as a first part and a second part. On this basis, the first part of the weight data is read from the external memory first, then the first block of the input feature map is read from the external memory, and a convolution operation is performed on the first block and the first part of the weight data to obtain the first partial output for the first block; this first partial output is the output data of part of the output channels, e.g. the first partial output in fig. 6 corresponds to output channel 1 and output channel 2. The second block of the input feature map is then read from the external memory and convolved with the first part of the weight data to obtain the first partial output for the second block, and so on for each block of the input feature map, so that the first partial output for the whole input feature map, i.e. the output results of output channel 1 and output channel 2, is obtained by combination. After all calculations for the first part of the weight data are completed, the second part of the weight data is read from the external memory and the above operations are repeated, yielding the second partial output for the input feature map, i.e. the output results of output channel 3 and output channel 4. After these steps have been performed once for each part of the weight data, the partial outputs are combined into the output feature map of the target convolutional layer.
Alternatively, the size of each portion of the weight data may be determined by resources such as bandwidth information between the computing platform and an external memory, and a storage space of an internal memory in the computing platform.
In this embodiment, when the weight data is large, only part of it is read at a time, and it is replaced with another part only after all calculations involving it are complete; repeated reads of large-scale weight data are thereby avoided, further reducing the access bandwidth required of the external memory.
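The following runnable toy mirrors this schedule with numpy: the weights are split along the output-channel axis, and for each weight part every input block is streamed past it. A 1 × 1 convolution keeps the blocks independent; the number of parts and blocks is illustrative.

```python
import numpy as np

H, W, C_IN, C_OUT = 8, 8, 4, 6
x = np.random.rand(H, W, C_IN)
w = np.random.rand(C_IN, C_OUT)

part_outputs = []
for w_part in np.array_split(w, 2, axis=1):     # one part of the output channels
    blocks = [xb @ w_part for xb in np.array_split(x, 4, axis=0)]  # stream every block
    part_outputs.append(np.concatenate(blocks, axis=0))
y = np.concatenate(part_outputs, axis=-1)        # stitch the output channels together

assert np.allclose(y, x @ w)                     # matches the unsplit layer
```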
In some possible embodiments, the method further comprises: the weight data includes four dimensions, wherein a slicing is performed in an output channel number dimension of the weight data to determine a first portion and at least one second portion of the weight data.
Specifically, the four dimensions of the weight data include width, height, number of input channels, and number of output channels. As shown in fig. 6, in the present embodiment, the first part and the at least one second part of the weight data are obtained by performing segmentation based on the dimension of the number of output channels.
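For the full four-dimensional weight tensor the split is the same operation applied to the last axis; a short sketch with the shapes from the example above (splitting into two parts is illustrative):

```python
import numpy as np

w = np.zeros((12, 12, 64, 1028))              # (height, width, input ch., output ch.)
first, second = np.array_split(w, 2, axis=3)  # slice along the output-channel dimension
print(first.shape, second.shape)              # (12, 12, 64, 514) (12, 12, 64, 514)
```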
In some possible embodiments, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform; sequentially reading each block of input image data of consecutive frames from an external memory; at least a part of the weight data stored in the internal memory is multiplexed and sequentially subjected to convolution calculation with each block of the input image data of the consecutive frames.
For example, as shown in fig. 6, the first part of the weight data may be read from the external memory first and cached in the internal memory of the computing platform; the first block, the second block, ..., and the Nth block of the first frame's input feature map are then read from the external memory and convolved one by one to obtain the first partial output for the first frame's input feature map. Next, the first block, the second block, ..., and the Nth block of the second frame's input feature map are read from the external memory and convolved one by one to obtain the first partial output for the second frame's input feature map. The second part of the weight data is then read from the external memory and the above steps are repeated, finally yielding the output feature maps for the input feature maps of both the first and second frames.
It should be understood that this embodiment necessarily introduces some latency, so the number of consecutive frames of input image data can be chosen according to the tolerable latency. In the present embodiment, by multiplexing the weight data across the convolution operations on multiple frames of input image data, the access bandwidth required of the external memory is reduced further.
In some possible embodiments, the number of frames in the consecutive multi-frame input is determined according to the storage space of the external memory.
Specifically, the input image data of the above-described consecutive frames cannot exceed the storage space of the external memory because the storage space of the external memory is limited.
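Counting only weight reads, the saving can be sketched as follows, with hypothetical part sizes and frame counts: without multiplexing, each weight part is re-read for every frame; with multiplexing, it is read once for the whole group of frames.

```python
def weight_reads_bytes(n_parts, part_bytes, n_frames, multiplexed):
    """External-memory weight traffic for a group of consecutive frames."""
    if multiplexed:
        return n_parts * part_bytes          # each part read once, reused for all frames
    return n_parts * part_bytes * n_frames   # each part re-read for every frame

print(weight_reads_bytes(2, 1152 * 1024, 4, multiplexed=False))  # 9437184
print(weight_reads_bytes(2, 1152 * 1024, 4, multiplexed=True))   # 2359296
```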
Based on the aspects of the embodiments described above, it is possible to select a preset processing mode suitable for a target convolution layer according to a data amount ratio of input image data and weight data, and thereby reduce an access bandwidth requirement to an external memory.
Based on the same or similar technical concept, as shown in fig. 7, an embodiment of the present invention further provides a convolutional neural network computing apparatus 700 for performing the convolutional neural network computing method shown in fig. 3, where the apparatus 700 includes:
a ratio determining module 701, configured to determine a data amount ratio between the input image data and the weight data for a target convolutional layer of the convolutional neural network;
a mode determining module 702, configured to determine a preset processing mode of the target convolutional layer according to the data amount ratio, so that the computing platform performs convolutional calculation of the target convolutional layer based on the preset processing mode of the target convolutional layer.
In one possible embodiment, if the data volume ratio is greater than a first threshold, determining the preset processing mode of the target convolutional layer comprises: reading the weight data and a first block of the input image data from an external memory, and performing convolution calculation based on the weight data and the first block; thereafter, reading a second block of the input image data from the external memory, and performing convolution calculation based on the weight data and the second block.
In one possible embodiment, the mode determination module is further configured to: after the convolution calculation based on the weight data and the first block, obtain an intermediate result of the target convolutional layer corresponding to the first block, and store the intermediate result in an internal memory of the computing platform; the intermediate result is read from the internal memory and directly participates in the convolution calculation of the layer following the target convolutional layer.
In one possible embodiment, the mode determination module is further configured to: if the data volume of the weight data is smaller than a preset threshold, cache the weight data in an internal memory of the computing platform after it is read from the external memory, until the convolution calculation of the target convolutional layer is finished; and if the data volume of the weight data is larger than the preset threshold, repeatedly execute the operation of reading the weight data from the external memory.
In one possible embodiment, if the data amount ratio is smaller than a second threshold value, and the second threshold value is smaller than the first threshold value, determining the preset processing mode of the target convolutional layer includes: reading a first part of the weight data from the external memory, sequentially reading each block of the input image data from the external memory for the first part of the weight data, thereby sequentially performing convolution calculation based on the first part of the weight data and each block; after the convolution calculation related to the first part of the weight data is completed, the second part of the weight data is read from the external memory, and each block of the input image data is sequentially read from the external memory for the second part of the weight data, so that the convolution calculation is sequentially performed based on the second part of the weight data and each block.
In one possible embodiment, the apparatus is further configured to: the weight data includes four dimensions, wherein a slicing is performed in an output channel number dimension of the weight data to determine a first portion and at least one second portion of the weight data.
In one possible embodiment, the preset processing mode of the target convolutional layer further includes: reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform; sequentially reading each block of input image data of consecutive frames from an external memory; at least a part of the weight data stored in the internal memory is multiplexed and sequentially subjected to convolution calculation with each block of the input image data of the consecutive frames.
In one possible implementation, the number of frames of consecutive multiframes is determined according to the storage space of the external memory.
Based on the aspects of the embodiments described above, it is possible to select a preset processing mode suitable for a target convolution layer according to a data amount ratio of input image data and weight data, and thereby reduce an access bandwidth requirement to an external memory.
Fig. 8 is a schematic diagram of a convolutional neural network computing device 800, according to an embodiment of the present application, for performing the convolutional neural network computing method shown in fig. 3. The device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform:
step 301: determining a data volume ratio of the input image data and the weight data for a target convolutional layer of a convolutional neural network;
step 302: and determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
An embodiment of the present application also provides a computer-readable storage medium storing a program that, when executed by a multi-core processor, causes the multi-core processor to perform:
step 301: determining a data volume ratio of the input image data and the weight data for a target convolutional layer of a convolutional neural network;
step 302: and determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device, and computer-readable storage medium embodiments, the description is simplified because they are substantially similar to the method embodiments, and reference may be made to some descriptions of the method embodiments for their relevance.
The apparatus, the computer-readable storage medium and the method provided in the embodiment of the present application are in one-to-one correspondence, and therefore, the apparatus, the device and the computer-readable storage medium also have similar beneficial technical effects to those of the corresponding method.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be broken down into multiple steps.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor is the division of aspects, which is for convenience only as the features in such aspects may not be combined to benefit. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (18)

1. A method of computing a convolutional neural network, the method comprising:
determining a data volume ratio of the input image data and the weight data for a target convolutional layer of a convolutional neural network;
and determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
2. The method of claim 1, wherein determining the predetermined processing mode for the target convolutional layer if the data amount ratio is greater than a first threshold comprises:
reading the weight data and a first block of the input image data from an external memory, thereby performing convolution calculation based on the weight data and the first block; and thereafter,
reading a second block of the input image data from the external memory, thereby performing convolution calculation based on the weight data and the second block.
3. The method of claim 2, further comprising:
after convolution calculation is carried out on the basis of the weight data and the first block, an intermediate result of the target convolutional layer corresponding to the first block is obtained, and the intermediate result is stored in an internal memory of the computing platform;
reading the intermediate result from the internal memory and directly participating in the convolution calculation of the next layer of the target convolution layer.
4. The method of claim 2, further comprising:
if the weight data is smaller than a preset threshold value, after the weight data is read from the external memory, caching the weight data in an internal memory of the computing platform until the convolution calculation of the target convolution layer is finished;
and if the weight data is larger than the preset threshold, repeatedly executing the operation of reading the weight data from the external memory.
5. The method of claim 2, wherein if the data volume ratio is less than a second threshold, the second threshold being less than the first threshold, determining the predetermined processing mode for the target convolutional layer comprises:
reading a first portion of the weight data from the external memory, sequentially reading each block of the input image data from the external memory for the first portion of the weight data, thereby sequentially performing convolution calculation based on the first portion of the weight data and each block;
after the convolution calculation related to the first part of the weight data is completed, the second part of the weight data is read from the external memory, and each block of the input image data is sequentially read from the external memory for the second part of the weight data, so that the convolution calculation is sequentially performed based on the second part of the weight data and each block.
6. The method of claim 5, further comprising:
the weight data includes four dimensions, wherein a slicing is performed in an output channel number dimension of the weight data to determine a first portion and at least one second portion of the weight data.
7. The method of claim 1, wherein the preset processing mode of the target convolutional layer further comprises:
reading at least a portion of the weight data from an external memory and caching in an internal memory of the computing platform;
sequentially reading each block of input image data of consecutive frames from an external memory;
multiplexing at least a part of the weight data stored in the internal memory, and sequentially performing convolution calculation with each block of the input image data of the consecutive frames.
8. The method of claim 7, wherein the number of frames of the consecutive multiple frames is determined according to a storage space of the external memory.
9. A convolutional neural network computing device, the device comprising:
a ratio determination module for determining a data amount ratio of the input image data and the weight data for a target convolutional layer of the convolutional neural network;
and the mode determining module is used for determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the convolution calculation of the target convolutional layer is carried out by the calculation platform based on the preset processing mode of the target convolutional layer.
10. The apparatus of claim 9, wherein if the data volume ratio is greater than a first threshold, determining a preset processing mode for the target convolutional layer comprises:
reading the weight data and a first block of the input image data from an external memory, thereby performing convolution calculation based on the weight data and the first block; and thereafter,
reading a second block of the input image data from the external memory, thereby performing convolution calculation based on the weight data and the second block.
11. The apparatus of claim 10, wherein the mode determination module is further configured to:
after convolution calculation is carried out on the basis of the weight data and the first block, an intermediate result of the target convolutional layer corresponding to the first block is obtained, and the intermediate result is stored in an internal memory of the computing platform;
reading the intermediate result from the internal memory and directly participating in the convolution calculation of the next layer of the target convolution layer.
12. The apparatus of claim 10, wherein the mode determination module is further configured to:
if the weight data is smaller than a preset threshold value, after the weight data is read from the external memory, caching the weight data in an internal memory of the computing platform until the convolution calculation of the target convolution layer is finished;
and if the weight data is larger than the preset threshold, repeatedly executing the operation of reading the weight data from the external memory.
13. The apparatus of claim 10, wherein if the data volume ratio is less than a second threshold, the second threshold being less than the first threshold, determining the predetermined processing mode for the target convolutional layer comprises:
reading a first portion of the weight data from the external memory, sequentially reading each block of the input image data from the external memory for the first portion of the weight data, thereby sequentially performing convolution calculation based on the first portion of the weight data and each block;
after the convolution calculation related to the first part of the weight data is completed, the second part of the weight data is read from the external memory, and each block of the input image data is sequentially read from the external memory for the second part of the weight data, so that the convolution calculation is sequentially performed based on the second part of the weight data and each block.
14. The apparatus of claim 13, wherein the apparatus is further configured to:
the weight data includes four dimensions, wherein a slicing is performed in an output channel number dimension of the weight data to determine a first portion and at least one second portion of the weight data.
15. The apparatus of claim 9, wherein the preset processing mode of the target convolutional layer further comprises:
reading at least a portion of the weight data from the external memory and caching it in an internal memory of the computing platform;
sequentially reading each block of the input image data of multiple consecutive frames from the external memory; and
multiplexing the at least a portion of the weight data stored in the internal memory, and sequentially performing convolution calculation with each block of the input image data of the consecutive frames.
16. The apparatus of claim 15, wherein the number of the consecutive frames is determined according to the storage space of the external memory.
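Claims 15-16 amortize a single weight fetch over several buffered video frames. A sketch under stated assumptions, reusing conv_image_blocked from the claim-10 sketch; the capacity check is a hypothetical reading of claim 16, not a formula from the patent.

```python
def frames_that_fit(frame_bytes, external_capacity_bytes):
    """Claim 16: the multiplexing depth is bounded by external-memory capacity."""
    return max(1, external_capacity_bytes // frame_bytes)

def conv_multi_frame(frames, weights, block_h):
    """One weight fetch, reused across every buffered frame (claim 15)."""
    # Weights are read from DRAM once and stay in internal memory throughout;
    # only the image blocks of each frame are streamed from external memory.
    return [conv_image_blocked(f, weights, block_h) for f in frames]
```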
17. A computing device for a neural network, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform:
determining a data volume ratio of the input image data and the weight data for a target convolutional layer of a convolutional neural network;
and determining a preset processing mode of the target convolutional layer according to the data volume ratio, so that the computing platform performs the convolution calculation of the target convolutional layer based on the preset processing mode.
18. A computer-readable storage medium storing a program that, when executed by a multi-core processor, causes the multi-core processor to perform the method of any one of claims 1-8.
CN201911376391.6A 2019-12-27 2019-12-27 Convolutional neural network technique method, device and computer readable storage medium Active CN113052292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911376391.6A CN113052292B (en) 2019-12-27 2019-12-27 Convolutional neural network technique method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113052292A true CN113052292A (en) 2021-06-29
CN113052292B CN113052292B (en) 2024-06-04

Family

ID=76506479

Country Status (1)

Country Link
CN (1) CN113052292B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239829A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 Method for optimizing an artificial neural network
US20190220734A1 (en) * 2016-10-11 2019-07-18 The Research Foundation For The State University Of New York System, Method, and Accelerator to Process Convolutional Neural Network Layers
CN107437110A (en) * 2017-07-11 2017-12-05 中国科学院自动化研究所 Block convolution optimization method and device for convolutional neural networks
CN109791628A (en) * 2017-12-29 2019-05-21 清华大学 Neural network model block compression method, training method, computing device and system
CN110490310A (en) * 2018-05-14 2019-11-22 北京深鉴智能科技有限公司 Neural network data compression and related computation method and device
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 General convolutional neural network accelerator based on a one-dimensional systolic array
CN110188660A (en) * 2019-05-27 2019-08-30 北京字节跳动网络技术有限公司 Method and apparatus for age identification
CN110555847A (en) * 2019-07-31 2019-12-10 瀚博半导体(上海)有限公司 Image processing method and device based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Yichen; Liang Feng; Zhang Guohe; He Ping; Wu Bin; Gao Zhenting: "Design of a convolutional neural network coprocessor based on programmable logic devices", Journal of Xi'an Jiaotong University, vol. 52, no. 7, pages 153-159 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024013892A1 (en) * 2022-07-13 2024-01-18 日本電信電話株式会社 Convolutional neural network inference processing device, convolutional neural network inference processing method, and convolutional neural network inference processing program

Also Published As

Publication number Publication date
CN113052292B (en) 2024-06-04

Similar Documents

Publication Publication Date Title
KR20190049593A (en) Method and apparatus for performing operations in convolutional neural network
JP2019109895A (en) Method and electronic device for performing convolution calculations in a neural network
JP2019109896A (en) Method and electronic device for performing convolution calculations in a neural network
CN108231109B (en) Method, device and system for refreshing Dynamic Random Access Memory (DRAM)
CN111079917B (en) Tensor data block access method and device
US20220147795A1 (en) Neural network tiling method, prediction method, and related apparatus
CN107622020B (en) Data storage method, access method and device
CN108961147B (en) Data processing method and device
CN105739951B (en) Fast GPU-based solution method for L1 minimization problems
CN111401518B (en) Neural network quantization method, device and computer readable storage medium
CN111008701A (en) Data quantization method and device based on neural network and computer readable storage medium
CN110413539B (en) Data processing method and device
CN113032007B (en) Data processing method and device
US11409798B2 (en) Graph processing system including different kinds of memory devices, and operation method thereof
CN113052292A (en) Convolutional neural network technology method, device and computer readable storage medium
CN115687181B (en) Addressing method for memory processing unit
CN102567243A (en) Storage device and refreshing method for same
US8745339B2 (en) Multi-core system and method for processing data in parallel in multi-core system
US20190179756A1 (en) Apparatuses and methods for determining efficient memory partitioning
EP3258388A1 (en) Parallelization techniques for variable selection and predictive models generation and its applications
US11301377B2 (en) Addressing scheme for local memory organization
CN110297714B (en) Method and device for acquiring PageRank based on large-scale graph dataset
CN115034351A (en) Data processing method, convolutional neural network training method and device and FPGA
CN113052291A (en) Data processing method and device
CN118012631B (en) Operator execution method, processing device, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240506

Address after: Room 101, 20th Floor, Building 1, Zone 1, No. 81 Beiqing Road, Haidian District, Beijing, 100094

Applicant after: Beijing Sisheng Technology Co.,Ltd.

Country or region after: China

Address before: Room 206, 2 / F, building C, phase I, Zhongguancun Software Park, No. 8, Dongbei Wangxi Road, Haidian District, Beijing 100094

Applicant before: Canaan Bright Sight Co.,Ltd.

Country or region before: China

GR01 Patent grant