CN116596043B - Convolutional neural network calculation method, system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116596043B
CN116596043B CN202310856405.4A
Authority
CN
China
Prior art keywords
convolution layer
layer
convolution
feature map
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310856405.4A
Other languages
Chinese (zh)
Other versions
CN116596043A (en)
Inventor
丁昊杰
王慧渊
刘玉宣
杨飞
周吉星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Flyslice Technologies Co ltd
Original Assignee
Hangzhou Flyslice Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Flyslice Technologies Co ltd filed Critical Hangzhou Flyslice Technologies Co ltd
Priority to CN202310856405.4A priority Critical patent/CN116596043B/en
Publication of CN116596043A publication Critical patent/CN116596043A/en
Application granted granted Critical
Publication of CN116596043B publication Critical patent/CN116596043B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a convolutional neural network calculation method, a convolutional neural network calculation system, an electronic device, and a storage medium. The method comprises the following steps: acquiring a first network model, wherein the first network model comprises a plurality of first convolution layers, and each first convolution layer corresponds to an input feature map and an output feature map; merging first convolution layers with the same input feature map into a second convolution layer to obtain a second network model; and, according to the second network model, alternately reading the corresponding input feature maps from and storing the corresponding output feature maps in two on-chip caches. The convolutional neural network calculation method provided by the application can improve calculation efficiency and fully exploit the calculation performance of the hardware.

Description

Convolutional neural network calculation method, system, electronic equipment and storage medium
Technical Field
The application relates to the field of deep learning, and in particular to a convolutional neural network calculation method, a convolutional neural network calculation system, an electronic device, and a storage medium.
Background
Neural networks in deep learning currently attract wide attention and are developing rapidly. As a hotspot technology in deep learning, convolutional neural networks are widely used in image and video processing, autonomous driving, and other applications. Convolution is the most computation-intensive part of a convolutional neural network. However, due to limitations of the hardware architecture and the highly variable topology of convolutional neural networks, the computing resources of the hardware cannot always be fully utilized when computing a convolutional neural network, which causes performance loss.
Disclosure of Invention
In view of the above, the present application proposes a convolutional neural network calculation method, a convolutional neural network calculation system, an electronic device, and a storage medium.
The embodiment of the application provides a convolutional neural network calculation method, which comprises the following steps: acquiring a first network model, wherein the first network model comprises a plurality of first convolution layers, and each first convolution layer corresponds to an input feature map and an output feature map; merging at least two first convolution layers having the same input feature map into a second convolution layer to obtain a second network model; and, according to the second network model, alternately reading the corresponding input feature maps from and storing the corresponding output feature maps in two on-chip caches.
Further, in the above convolutional neural network calculation method, alternately reading the corresponding input feature maps from and storing the corresponding output feature maps in two on-chip caches according to the second network model includes: performing calculation according to the second network model, and reading the input feature map of the i-th convolution layer from the first on-chip cache; performing the calculation of the i-th convolution layer based on its input feature map, obtaining an output feature map, and storing the output feature map into the second on-chip cache; the output feature map stored in the second on-chip cache also serves as the input feature map of the (i+1)-th convolution layer, and the output feature map of the (i+1)-th convolution layer is stored in the first on-chip cache.
Further, in the above convolutional neural network calculation method, each first convolution layer corresponds to a weight parameter, and performing the calculation of the i-th convolution layer based on its input feature map includes: reading the input feature map of the i-th convolution layer from the first on-chip cache while reading the weight parameter of the i-th convolution layer from the off-chip cache, and performing the calculation of the i-th convolution layer according to the input feature map and the weight parameter.
Further, in the above convolutional neural network calculation method, merging at least two first convolution layers having the same input feature map into a second convolution layer includes: traversing the first network model and finding the input feature map corresponding to each first convolution layer; taking first convolution layers with the same input feature map as a same-input convolution layer group; and obtaining a second convolution layer based on the same-input convolution layer group.
Further, in the above convolutional neural network calculation method, each convolution layer corresponds to a convolution layer parameter, and the method further includes: if at least two convolution layers with the same convolution layer parameters exist in the same-input convolution layer group, taking those convolution layers as a same-computation convolution layer group; and obtaining a second convolution layer based on the same-computation convolution layer group.
Further, in the above convolutional neural network calculation method, the convolution layer parameter includes an output channel value, and the method further includes: if the sum of the output channel values of a plurality of convolution layers in the same-computation convolution layer group is not larger than the maximum output channel value of the corresponding hardware architecture, taking those convolution layers as a convolution layer group to be merged; and merging all convolution layers in the convolution layer group to be merged into a second convolution layer.
Further, in the above convolutional neural network calculation method, the convolution layer parameter includes an output channel value, and the method further includes: taking the convolution layer whose output feature map is used first before merging as the primary merging convolution layer; and taking the sum of the output channel values of all convolution layers in the convolution layer group to be merged as the output channel value of the second convolution layer.
Another embodiment of the present application also proposes a convolutional neural network computing system, including: an acquisition unit configured to acquire a first network model, wherein the first network model comprises a plurality of first convolution layers and each first convolution layer corresponds to an input feature map and an output feature map; a merging unit configured to merge at least two first convolution layers having the same input feature map into a second convolution layer to obtain a second network model; and a calculation unit configured to alternately read the corresponding input feature maps from the two on-chip caches according to the second network model to perform the convolution layer calculation.
Another embodiment of the present application further provides an electronic device, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface complete communication with each other through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the convolutional neural network calculation method.
Another embodiment of the present application further provides a storage medium, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform operations corresponding to the above convolutional neural network calculation method.
The embodiment of the application has the following beneficial effects:
the embodiment of the application provides a convolutional neural network calculation method that reduces the calculation time of a convolutional neural network and fully exploits the calculation performance of the hardware by optimizing the network model and adding an on-chip cache.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of the present application. Like elements are numbered alike in the various figures.
FIG. 1 is a schematic illustration of a first flow chart of a convolutional neural network calculation method according to some embodiments of the present application;
FIG. 2 illustrates a first network model schematic diagram of a convolutional neural network calculation method of some embodiments of the present application;
FIG. 3 illustrates a second flow diagram of a convolutional neural network calculation method in accordance with some embodiments of the present application;
FIG. 4 is a third flow diagram of a convolutional neural network calculation method according to some embodiments of the present application;
FIG. 5 is a fourth flow chart of a convolutional neural network calculation method of some embodiments of the present application;
FIG. 6 illustrates a second network model schematic diagram of a convolutional neural network calculation method of some embodiments of the present application;
FIG. 7 is a fifth flow chart of a convolutional neural network calculation method of some embodiments of the present application;
FIG. 8 illustrates a schematic diagram of a convolutional neural network computing system in accordance with some embodiments of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.
The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terms "comprises," "comprising," "including," or any other variation thereof are intended to cover the presence of a specific feature, number, step, operation, element, component, or combination of the foregoing used in various embodiments of the present application, and do not exclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the application belong. Terms such as those defined in commonly used dictionaries will be interpreted as having a meaning consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below and features of the embodiments may be combined with each other without conflict.
In general, parallel computing chips such as GPUs, FPGAs, and ASICs are widely used to accelerate convolution operations, but performance is lost because the hardware resources cannot be fully utilized during computation.
Accordingly, in order to solve the above-mentioned problems, the present application proposes a convolutional neural network calculation method suitable for the deep learning field and other fields.
Referring to fig. 1, a flowchart of a convolutional neural network calculation method according to an embodiment of the present application is shown. The method is illustratively applied to a chip that computes convolutional neural networks; for example, the chip may be a GPU, FPGA, ASIC, or the like.
In some embodiments, as shown in fig. 1, a convolutional neural network calculation method includes:
s110, acquiring a first network model, wherein the first network model comprises a plurality of first convolution layers, and each first convolution layer corresponds to an input feature map and an output feature map.
For example, fig. 2 shows a typical network topology found in YOLOv5, where convX is a convolution layer, concatX is a concatenation (splice) layer, and X is the operator number; other operators following the convolution layers (e.g., BatchNorm, Scale, Eltwise, SiLU, etc.) are omitted for convenience of illustration.
Taking the simplified model in fig. 2 as the first network model, the first network model comprises 5 convolution layers (conv0 to conv3 and conv5 in the figure), and each convolution layer has a corresponding input feature map and output feature map. For example, the input feature map corresponding to the first convolution layer conv0 in the figure is input_fm, and its output feature map is conv0_fm; the input feature map corresponding to another first convolution layer, conv2, is conv1_fm, and its output feature map is conv2_fm.
Of course, the above model is only an example; the method of the present application can be applied to other models, and it remains applicable if other operators exist after the current convolution layer and before the next convolution layer.
S210, merging at least two first convolution layers with the same input feature map into a second convolution layer to obtain a second network model.
Specifically, when the input feature maps of a plurality of first convolution layers are the same, at least two of those first convolution layers are merged to obtain a second convolution layer. The model formed by all remaining first convolution layers and/or the merged second convolution layers is taken as the second network model.
For example, if the first convolution layers include five convolution layers A, B, C, D, and E, and the input feature maps corresponding to A, C, and E are the same, at least 2 of the convolution layers A, C, and E may be merged into a second convolution layer M. If all three of A, C, and E are merged, the model formed by the second convolution layer M, the first convolution layer B, and the first convolution layer D is taken as the second network model.
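The merge described above can be pictured as stacking the filter banks of the same-input layers along the output-channel axis. The following sketch is purely illustrative (the dictionary representation, field names, and filter labels are assumptions, not from the patent):

```python
# Hypothetical sketch: two convolution layers that read the same input
# feature map are merged by concatenating their filters along the
# output-channel axis; the merged layer's OC is the sum of the two OCs.

def merge_same_input_layers(layer_a, layer_b):
    """Merge two conv layers with identical input feature map and
    identical layer parameters (pad, stride, KW, KH) into one."""
    assert layer_a["input"] == layer_b["input"], "inputs must match"
    assert layer_a["params"] == layer_b["params"], "parameters must match"
    return {
        "input": layer_a["input"],
        "params": layer_a["params"],
        # concatenated filter banks: OC_merged = OC_a + OC_b
        "weights": layer_a["weights"] + layer_b["weights"],
    }

conv1 = {"input": "conv0_fm",
         "params": {"pad": 1, "stride": 1, "kw": 3, "kh": 3},
         "weights": [f"filter_a{i}" for i in range(64)]}
conv3 = {"input": "conv0_fm",
         "params": {"pad": 1, "stride": 1, "kw": 3, "kh": 3},
         "weights": [f"filter_b{i}" for i in range(64)]}

new_conv1 = merge_same_input_layers(conv1, conv3)
print(len(new_conv1["weights"]))  # 128 output channels after merging
```

Because the two layers share the same input and the same spatial parameters, the merged layer produces, in one pass, exactly the channels the two original layers would have produced in two passes.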
In some embodiments of the convolutional neural network calculation method, as shown in fig. 3, merging the first convolutional layer with at least two identical input feature maps into a second convolutional layer, including:
s211, traversing the first network model, and searching an input feature map corresponding to each first convolution layer.
S212, taking the first convolution layer with the same input characteristic diagram as a same input convolution layer group.
S213, obtaining a second convolution layer based on the same input convolution layer group.
Specifically, the input feature maps corresponding to all first convolution layers within a same-input convolution layer group are the same.
Illustratively, if the first convolution layers include five convolution layers A, B, C, D, and E, and the input feature maps corresponding to convolution layers A, C, and D are the same, the set of A, C, and D may be referred to as a same-input convolution layer group.
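Steps S211-S212 amount to grouping layers by their input feature map. A minimal sketch, assuming a hypothetical list-of-pairs model description (the layer and feature-map names only loosely mirror fig. 2 and are illustrative):

```python
# Illustrative sketch of S211-S212: traverse the model and group first
# convolution layers by their input feature map; any group with two or
# more members is a candidate same-input convolution layer group.
from collections import defaultdict

def find_same_input_groups(layers):
    groups = defaultdict(list)
    for name, input_fm in layers:
        groups[input_fm].append(name)
    return {fm: names for fm, names in groups.items() if len(names) >= 2}

# Hypothetical model: conv1 and conv3 both read conv0_fm.
model = [("conv0", "input_fm"), ("conv1", "conv0_fm"),
         ("conv2", "conv1_fm"), ("conv3", "conv0_fm"),
         ("conv5", "concat4_fm")]
print(find_same_input_groups(model))  # {'conv0_fm': ['conv1', 'conv3']}
```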
Further, as shown in fig. 4, in the convolutional neural network calculation method in some embodiments, each convolutional layer corresponds to a convolutional layer parameter, and further includes:
s214, if at least more than two convolution layers with the same convolution layer parameters exist in the input convolution layer group, the at least more than two convolution layers are regarded as the same calculation convolution layer group.
Specifically, each convolution layer corresponds to a set of convolution layer parameters; merging layers whose parameters differ would cause problems, so only convolution layers with identical parameters are merged. The convolution layer parameters include Pad (padding mode), Stride, KW (kernel width), KH (kernel height), OW (output width), OH (output height), OC (number of output channels), and so on. A group of convolution layers with identical convolution layer parameters is called a same-computation convolution layer group.
S215, a second convolution layer is obtained based on the same calculation convolution layer group.
Specifically, at least two convolution layers in the same-computation convolution layer group are combined to obtain a second convolution layer.
Further, as shown in fig. 5, in the convolutional neural network calculation method of some embodiments, the convolutional layer parameters include output channel values, and further include:
and S216, if the sum of the output channel values of a plurality of convolution layers in the convolution layer group is not larger than the maximum output channel value of the corresponding hardware architecture, taking the plurality of convolution layers as the convolution layer group to be combined.
S217, all convolution layers in the convolution layer group to be merged are merged into a second convolution layer.
Specifically, the number of channels the hardware architecture can compute in parallel is limited. If the number of channels output each time is smaller than the maximum number of channels the hardware can output in parallel, part of the processing elements (PEs) sit idle on every output, which lowers calculation efficiency. When the number of output channels per pass is small, the number of output passes increases and the overall calculation time grows. It is therefore desirable to maximize the number of data channels per output, so that each output pass runs at maximum efficiency.
For example, suppose the same-computation convolution layer group includes convolution layers W, R, and T, the output channel values of the three layers are 64, 64, and 128, respectively, and the maximum output channel value of the hardware architecture is 128. The set of convolution layers W and R is then taken as the convolution layer group to be merged, and merging W and R yields a second convolution layer. If W and R are not merged, only half of the PEs (PE0-PE63) are occupied while the other half (PE64-PE127) sit idle when W and R each output data, so calculation efficiency is only 50%, and outputting W, R, and T takes 3 passes in total. If the first two layers are merged, all PEs (PE0-PE127) are occupied when W and R output data in parallel, calculation efficiency is 100%, and outputting all three layers takes only 2 passes, one fewer than before; that is, one output pass for the merged R or W is saved.
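The pass counts in this example can be checked with a small helper (an illustrative back-of-envelope sketch, not part of the patent):

```python
# With 128 PEs, emitting 64-channel outputs separately wastes half the
# PEs each pass; merging two 64-channel layers into one 128-channel
# output saves a whole output pass.
def output_passes(channel_counts, max_channels):
    """One or more passes per output, each limited to max_channels."""
    return sum(-(-c // max_channels) for c in channel_counts)  # ceil division

unmerged = output_passes([64, 64, 128], 128)   # W, R, T emitted separately
merged   = output_passes([64 + 64, 128], 128)  # W and R merged first
print(unmerged, merged)  # 3 passes vs 2 passes
```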
The output channel values of the convolution layers in a same-computation convolution layer group may be identical or different, which complicates the choice of which layers to merge. The following schemes illustrate how to choose the layers to merge so that overall calculation efficiency is maximized:
in the first scheme, if the convolution layer group includes the convolution layer W, the convolution layer R, the convolution layer M and the convolution layer T, and the output channel values corresponding to the four convolution layers are 64, 64 and 128 respectively, and the maximum output channel value corresponding to the hardware architecture is 128, 2 convolution layers with the channel value of 64 are arbitrarily selected and combined into one second convolution layer.
In the second scheme, if the same-computation convolution layer group includes convolution layers W, R, M, and T, the output channel values of the four layers are all 64, and the maximum output channel value of the hardware architecture is 128, then pairs of convolution layers with a channel value of 64 are selected and merged: for example, W is merged with R and M is merged with T, yielding 2 merged second convolution layers.
In the third scheme, if the same-computation convolution layer group includes convolution layers W, R, M, and T, the output channel values of the four layers are 64, 64, 32, and 32, respectively, and the maximum output channel value of the hardware architecture is 128, then the first combination scheme is: merge W with R, and merge M with T, yielding 2 merged second convolution layers; the second combination scheme is: merge W, M, and T; the third combination scheme is: merge R, M, and T.
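One possible way to realize such schemes automatically is a first-fit-decreasing bin packing over the channel values; the patent does not prescribe this algorithm, so the sketch below is only one assumed realization:

```python
# Assumed greedy realization: pack layers of a same-computation group
# into merge groups whose channel sums do not exceed the hardware's
# maximum output channel value (first-fit decreasing bin packing).
def plan_merges(layers, max_oc):
    """layers: list of (name, output_channels); returns merge groups."""
    bins = []  # each bin: [total_oc, [layer names]]
    for name, oc in sorted(layers, key=lambda x: -x[1]):
        for b in bins:
            if b[0] + oc <= max_oc:  # layer fits into this merge group
                b[0] += oc
                b[1].append(name)
                break
        else:
            bins.append([oc, [name]])  # start a new merge group
    return [b[1] for b in bins]

# Third-scheme example: W=64, R=64, M=32, T=32, max 128.
print(plan_merges([("W", 64), ("R", 64), ("M", 32), ("T", 32)], 128))
```

On this input the greedy plan reproduces the first combination scheme above (W with R, M with T); the other two schemes are alternative packings of the same group.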
Further, in the convolutional neural network calculation method of some embodiments, the convolutional layer parameter includes an output channel value, and further includes:
taking the convolution layer whose output feature map is used first before merging as the primary merging convolution layer;
and taking the sum of the output channel values of all convolution layers in the convolution layer group to be merged as the output channel value of the second convolution layer.
Specifically, in order to preserve the calculation order of the whole model after merging, the convolution layer whose output feature map is used first before merging serves as the primary merging convolution layer, and the other merged layers are secondary merging convolution layers. The remaining parameters of the second convolution layer are the same as those of the primary and secondary merging convolution layers. In the merged model, the calculation order of the content belonging to the primary merging convolution layer is unchanged, while the calculation order of the content belonging to the secondary merging convolution layers is changed to coincide with that of the primary merging convolution layer.
For example, in the first network model of fig. 2, if the output channel values of conv1 and conv3 are both 64 and the maximum output channel value is 128, then conv1 and conv3, which would otherwise be calculated in series, are merged. Since the output feature map of conv1 is used earlier than that of conv3, conv1 serves as the primary merging convolution layer, finally yielding the second network model in fig. 6. The output channel value of new_conv1 obtained by merging conv1 and conv3 is OC_new_conv1 = OC_conv1 + OC_conv3 = 64 + 64 = 128.
S310, according to the second network model, alternately reading the corresponding input feature maps from and storing the corresponding output feature maps in the two on-chip caches.
Specifically, the convolutional neural network comprises a plurality of convolution layers, and the calculation of each convolution layer produces an output feature map from the input feature map data and outputs it. The input feature map is read from one of the two on-chip caches, and the calculated output feature map is stored in the other on-chip cache, so the corresponding feature maps are alternately read and stored.
It should be noted that a single-port on-chip cache has only one interface, so data cannot be read and stored simultaneously, only sequentially. However, each convolution layer reads input and produces output at the same time during calculation, so if the calculation unit reads the input feature map from the on-chip cache, the output feature map can only be stored in the off-chip cache. When feature map data needs to be reused, it must then be read back from the off-chip cache, so the data transfer is slow and calculation efficiency suffers. Moreover, reading feature map data from the off-chip cache occupies the read bandwidth of the weight parameters, further increasing calculation time.
Even if a dual-port on-chip cache is used to read and store data simultaneously during convolution layer calculation, it cannot be guaranteed that data in the on-chip cache will not be overwritten. When an input feature map needs to be used multiple times, the output feature map can still only be stored in the off-chip cache to keep the data from being overwritten, which greatly affects the calculation efficiency of the convolutional neural network.
For this reason, this embodiment uses 2 on-chip caches to alternately read the corresponding input feature maps and store the corresponding output feature maps.
Further, as shown in fig. 7, in the convolutional neural network calculation method of some embodiments, alternately reading the corresponding input feature maps from and storing the corresponding output feature maps in the two on-chip caches according to the second network model includes:
s311, calculating according to the second network model, and reading the input feature map of the ith convolution layer from the first in-chip cache.
S312, performing the calculation of the i-th convolution layer based on its input feature map, obtaining an output feature map, and storing the output feature map into the second on-chip cache.
The output feature map stored in the second on-chip cache also serves as the input feature map of the (i+1)-th convolution layer, and the output feature map of the (i+1)-th convolution layer is stored in the first on-chip cache.
Specifically, the on-chip cache resides inside the chip, so its read/write speed is faster and more efficient. If two on-chip caches are used for alternate reading/storing, reading and writing the output feature map data essentially does not occupy the data transfer bandwidth to the off-chip cache; nearly all of the off-chip bandwidth is left for reading the weight parameters, and on-chip and off-chip reads can proceed simultaneously.
Exemplarily, as shown in fig. 6, the input feature map input_fm of the convolutional layer conv0 is read from the first on-chip buffer memory, and then the calculated output feature map conv0_fm is stored in the second on-chip buffer memory; when the convolution layer new_conv1 is calculated, reading conv0_fm from the second in-chip cache, and storing the calculated conv1_fm into the first in-chip cache; when calculating the convolution layer conv2, conv1_fm is read from the first on-chip cache, and the calculated conv2_fm is stored in the second on-chip cache. In this way, two on-chip caches are used alternately.
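The alternation described above can be sketched in code. The following is a minimal illustration (not part of the patent; `run_network` and all other names are hypothetical) of the ping-pong scheme in which each layer reads from one cache and writes to the other:

```python
# Hypothetical sketch of the alternating (ping-pong) on-chip cache scheme.
# Each list slot stands in for one on-chip cache; a layer reads its input
# feature map from one slot and stores its output into the other.

def run_network(layers, input_fm):
    """Run successive convolution layers, alternating two on-chip caches."""
    caches = [input_fm, None]  # caches[0] / caches[1] model the two on-chip caches
    src = 0                    # index of the cache holding the current input
    for layer in layers:
        dst = 1 - src                      # the other cache receives the output
        caches[dst] = layer(caches[src])   # read from one cache, store to the other
        src = dst                          # the output becomes the next layer's input
    return caches[src]
```

With three toy "layers", the data bounces between the two slots exactly as input_fm, conv0_fm, conv1_fm, and conv2_fm do between the two caches in fig. 6.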
Further, in the convolutional neural network calculation method of some embodiments, each first convolution layer corresponds to a weight parameter, and performing the calculation of the i-th convolution layer based on the input feature map of the i-th convolution layer includes:
reading the input feature map of the i-th convolution layer from the first on-chip cache while reading the weight parameter of the i-th convolution layer from the off-chip cache, and calculating the i-th convolution layer according to the input feature map and the weight parameter.
Specifically, since each convolution layer is computed with its own weight parameter, the weight parameters are stored in the off-chip cache to reduce the on-chip storage requirement. Therefore, when a convolution layer is computed, its weight parameter is read from the off-chip cache at the same time as its input feature map is read from the on-chip cache. When on-chip capacity is insufficient, the output feature map is stored in the off-chip cache instead.
Illustratively, as shown in fig. 6, when convolution layer new_conv1 is computed, conv0_fm is read from the second on-chip cache while the corresponding weight parameter is read from the off-chip cache; the computed conv1_fm is stored in the first on-chip cache, and conv3_fm is stored in the off-chip cache. When convolution layer conv2 is computed, conv1_fm is read from the first on-chip cache while the corresponding weight parameter is read from the off-chip cache, and the computed conv2_fm is stored in the second on-chip cache. When convolution layer conv5 is computed, conv2_fm is read from the second on-chip cache, conv3_fm and the corresponding weight parameter are read from the off-chip cache, and the computed conv5_fm is stored in the first on-chip cache.
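The simultaneous reads can be sketched as two overlapping transfers. This is only an illustrative model, not the patent's hardware implementation; `compute_layer`, `read_on_chip`, `read_off_chip`, and `convolve` are all hypothetical names:

```python
# Hypothetical sketch: while the input feature map of layer i is read from an
# on-chip cache, its weight parameter is fetched from the off-chip cache in
# parallel, so the two transfers overlap rather than serialize.
from concurrent.futures import ThreadPoolExecutor

def compute_layer(i, read_on_chip, read_off_chip, convolve):
    """Overlap the on-chip feature-map read with the off-chip weight read."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fm = pool.submit(read_on_chip, i)        # input feature map, on-chip cache
        weights = pool.submit(read_off_chip, i)  # weight parameter, off-chip cache
        return convolve(fm.result(), weights.result())
```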
The present application provides a convolutional neural network calculation method that, by optimizing the network model and changing how the on-chip caches are used, reduces the calculation time of the convolutional neural network without modifying the hardware architecture, thereby fully exploiting the computational performance of the hardware.
Another embodiment of the present application further provides a convolutional neural network computing system 400. As shown in fig. 8, the system 400 comprises:
an obtaining unit 410, configured to obtain a first network model, wherein the first network model comprises a plurality of first convolution layers, and each first convolution layer corresponds to an input feature map and an output feature map;
a merging unit 420, configured to merge at least two first convolution layers having the same input feature map into a second convolution layer to obtain a second network model;
and a calculating unit 430, configured to alternately read the corresponding input feature maps from the two on-chip caches according to the second network model to perform the calculations of the convolution layers.
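The merging step performed by the merging unit can be sketched as follows. This is a simplified, hypothetical model (the `Layer` class, field names, and `merge_same_input` are illustrative, not the patent's data structures): layers sharing the same input feature map are fused into one second convolution layer by concatenating their filters along the output-channel axis, subject to the hardware's maximum output channel value as described in the claims:

```python
# Hypothetical sketch of merging first convolution layers that share an input
# feature map into a single second convolution layer by concatenating output
# channels, capped by the hardware's maximum output channel value.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Layer:
    input_id: str  # identifies which feature map the layer consumes
    filters: list  # one entry per output channel

def merge_same_input(layers, max_out_channels):
    groups = defaultdict(list)
    for layer in layers:
        groups[layer.input_id].append(layer)  # same-input convolution layer group
    merged = []
    for input_id, group in groups.items():
        total = sum(len(l.filters) for l in group)
        if len(group) >= 2 and total <= max_out_channels:
            # concatenate output channels into a single second convolution layer
            fused = [f for l in group for f in l.filters]
            merged.append(Layer(input_id, fused))
        else:
            merged.extend(group)  # leave unmergeable layers unchanged
    return merged
```

For example, two layers consuming the same feature map with 2 and 1 output channels merge into one layer with 3 output channels, while a layer with a different input is left as-is.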
Another embodiment of the present application further provides an electronic device comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus; the memory stores at least one executable instruction that causes the processor to perform the operations corresponding to the convolutional neural network calculation method described above.
Another embodiment of the present application further provides a storage medium storing at least one executable instruction that causes a processor to perform the operations corresponding to the convolutional neural network calculation method described above.
It will be appreciated that the steps of the present embodiment correspond to those of the convolutional neural network calculation method in the above embodiment, and the optional features of that method apply equally to the present embodiment; they are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flow diagrams and block diagrams in the figures, which illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules or units in various embodiments of the application may be integrated together to form a single part, or the modules may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application.

Claims (8)

1. A method for improving computational efficiency of a convolutional neural network, comprising:
acquiring a first network model, wherein the first network model comprises a plurality of first convolution layers, and each first convolution layer corresponds to an input feature map and an output feature map;
merging at least two first convolution layers having the same input feature map into a second convolution layer to obtain a second network model;
alternately reading the corresponding input feature maps from, and storing the corresponding output feature maps into, two on-chip caches according to the second network model;
calculating according to the second network model, and reading the input feature map of an i-th convolution layer from a first on-chip cache;
performing the calculation of the i-th convolution layer based on the input feature map of the i-th convolution layer, obtaining an output feature map, and storing the output feature map into a second on-chip cache;
wherein the output feature map stored in the second on-chip cache also serves as the input feature map of an (i+1)-th convolution layer, and the output feature map of the (i+1)-th convolution layer is stored in the first on-chip cache;
each first convolution layer corresponds to a weight parameter, and performing the calculation of the i-th convolution layer based on the input feature map of the i-th convolution layer comprises:
reading the input feature map of the i-th convolution layer from the first on-chip cache while reading the weight parameter of the i-th convolution layer from an off-chip cache, and calculating the i-th convolution layer according to the input feature map and the weight parameter.
2. The method for improving computational efficiency of a convolutional neural network according to claim 1, wherein merging at least two first convolution layers having the same input feature map into a second convolution layer comprises:
traversing the first network model and finding the input feature map corresponding to each first convolution layer;
taking the first convolution layers having the same input feature map as a same-input convolution layer group;
and obtaining the second convolution layer based on the same-input convolution layer group.
3. The method for improving the computational efficiency of a convolutional neural network according to claim 2, wherein each convolution layer corresponds to a convolution layer parameter, the method further comprising:
if at least two convolution layers having the same convolution layer parameter exist in the same-input convolution layer group, taking the at least two convolution layers as a same-computation convolution layer group;
and obtaining the second convolution layer based on the same-computation convolution layer group.
4. The method for improving the computational efficiency of a convolutional neural network according to claim 3, wherein the convolution layer parameters comprise output channel values, the method further comprising:
if the sum of the output channel values of a plurality of convolution layers in the same-computation convolution layer group is not greater than the maximum output channel value of the corresponding hardware architecture, taking the plurality of convolution layers as a convolution layer group to be merged;
and merging all convolution layers in the convolution layer group to be merged into the second convolution layer.
5. The method for improving the computational efficiency of a convolutional neural network according to claim 4, wherein the convolution layer parameters comprise output channel values, the method further comprising:
taking the convolution layer whose output feature map is used first before merging as the main merged convolution layer;
and taking the sum of the output channel values of all convolution layers in the convolution layer group to be merged as the output channel value of the second convolution layer.
6. A system for improving the computational efficiency of a convolutional neural network, comprising:
an acquisition unit, configured to acquire a first network model, wherein the first network model comprises a plurality of first convolution layers, and each first convolution layer corresponds to an input feature map and an output feature map;
a merging unit, configured to merge at least two first convolution layers having the same input feature map into a second convolution layer to obtain a second network model;
and a computing unit, configured to alternately read the corresponding input feature maps from, and store the corresponding output feature maps into, two on-chip caches according to the second network model;
wherein calculation is performed according to the second network model, and the input feature map of an i-th convolution layer is read from a first on-chip cache;
the calculation of the i-th convolution layer is performed based on the input feature map of the i-th convolution layer, an output feature map is obtained, and the output feature map is stored into a second on-chip cache;
the output feature map stored in the second on-chip cache also serves as the input feature map of an (i+1)-th convolution layer, and the output feature map of the (i+1)-th convolution layer is stored in the first on-chip cache;
each first convolution layer corresponds to a weight parameter, and performing the calculation of the i-th convolution layer based on the input feature map of the i-th convolution layer comprises:
reading the input feature map of the i-th convolution layer from the first on-chip cache while reading the weight parameter of the i-th convolution layer from an off-chip cache, and calculating the i-th convolution layer according to the input feature map and the weight parameter.
7. A computer, comprising a storage unit storing a computer program and a processing unit, wherein the processing unit performs the steps of the method for improving the computational efficiency of a convolutional neural network according to any one of claims 1 to 5 by calling the computer program stored in the storage unit.
8. A computer-readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor to perform the steps of the method for improving the computational efficiency of a convolutional neural network according to any one of claims 1 to 5.
CN202310856405.4A 2023-07-13 2023-07-13 Convolutional neural network calculation method, system, electronic equipment and storage medium Active CN116596043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310856405.4A CN116596043B (en) 2023-07-13 2023-07-13 Convolutional neural network calculation method, system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310856405.4A CN116596043B (en) 2023-07-13 2023-07-13 Convolutional neural network calculation method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116596043A CN116596043A (en) 2023-08-15
CN116596043B true CN116596043B (en) 2023-10-13

Family

ID=87590283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310856405.4A Active CN116596043B (en) 2023-07-13 2023-07-13 Convolutional neural network calculation method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116596043B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472734A (en) * 2018-10-18 2019-03-15 江苏第二师范学院(江苏省教育科学研究院) A kind of target detection network and its implementation based on FPGA
CN110390382A (en) * 2019-06-20 2019-10-29 东南大学 A kind of convolutional neural networks hardware accelerator with novel feature figure cache module
CN111079923A (en) * 2019-11-08 2020-04-28 中国科学院上海高等研究院 Spark convolution neural network system suitable for edge computing platform and circuit thereof
CN111488964A (en) * 2019-01-29 2020-08-04 北京市商汤科技开发有限公司 Image processing method and device and neural network training method and device
CN113298734A (en) * 2021-06-22 2021-08-24 云南大学 Image restoration method and system based on mixed hole convolution
CN115481732A (en) * 2022-09-21 2022-12-16 北京地平线信息技术有限公司 Method and apparatus for processing feature maps via an artificial intelligence accelerator
CN115965052A (en) * 2021-10-08 2023-04-14 杭州菲数科技有限公司 Convolutional neural network hardware accelerator and acceleration method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489703B2 (en) * 2015-05-20 2019-11-26 Nec Corporation Memory efficiency for convolutional neural networks operating on graphics processing units
US11562115B2 (en) * 2017-01-04 2023-01-24 Stmicroelectronics S.R.L. Configurable accelerator framework including a stream switch having a plurality of unidirectional stream links
CN112581414B (en) * 2019-09-30 2024-04-23 京东方科技集团股份有限公司 Convolutional neural network, image processing method and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Adaptation of Convolution and Batch Normalization Layer for CNN Implementation on FPGA; Tomyslav Sledevic; 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream); full text *
Convolutional Neural Network Compression Method Based on Pruning and Quantization; Sun Yanli; Computer Science; vol. 47, no. 8; 261-266 *

Also Published As

Publication number Publication date
CN116596043A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN112668708B (en) Convolution operation device for improving data utilization rate
WO2021012609A1 (en) Neural network segmentation method, prediction method, and related apparatus
WO2022041188A1 (en) Accelerator for neural network, acceleration method and device, and computer storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN113569705B (en) Scene segmentation point judging method, system, storage medium and electronic equipment
CN115082928A (en) Method for asymmetric double-branch real-time semantic segmentation of network for complex scene
CN111310115A (en) Data processing method, device and chip, electronic equipment and storage medium
CN110135428A (en) Image segmentation processing method and device
CN112966807A (en) Convolutional neural network implementation method based on storage resource limited FPGA
CN116596043B (en) Convolutional neural network calculation method, system, electronic equipment and storage medium
CN114049258A (en) Method, chip and device for image processing and electronic equipment
CN111221827B (en) Database table connection method and device based on graphic processor, computer equipment and storage medium
CN112085652A (en) Image processing method and device, computer storage medium and terminal
CN112200310A (en) Intelligent processor, data processing method and storage medium
CN115439449B (en) Full-field histological image processing method, device, medium and electronic equipment
CN112529064B (en) Efficient real-time semantic segmentation method
CN117501300A (en) Image processing method and image processing apparatus
CN114998172A (en) Image processing method and related system
CN114356512A (en) Data processing method, data processing equipment and computer readable storage medium
CN110705588B (en) Lightweight target detection model and target detection method based on channel rearrangement
CN112001492A (en) Mixed flow type acceleration framework and acceleration method for binary weight Densenet model
CN113569704A (en) Division point judgment method, system, storage medium and electronic device
CN113902088A (en) Method, device and system for searching neural network structure
CN111461144A (en) Method for accelerating convolutional neural network
KR20210038027A (en) Method for Training to Compress Neural Network and Method for Using Compressed Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant