CN107451654B - Acceleration operation method of convolutional neural network, server and storage medium - Google Patents


Publication number
CN107451654B
Authority
CN
China
Prior art keywords: map, split, sub, matrix, preset
Legal status: Active
Application number: CN201710544330.0A
Other languages: Chinese (zh)
Other versions: CN107451654A (en)
Inventor
谌璟
孙庆新
Current Assignee: Shenzhen Autocruis Technology Co ltd
Original Assignee: Shenzhen Autocruis Technology Co ltd
Application filed by Shenzhen Autocruis Technology Co ltd filed Critical Shenzhen Autocruis Technology Co ltd
Priority to CN201710544330.0A
Publication of CN107451654A
Application granted
Publication of CN107451654B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses an acceleration operation method for a convolutional neural network, a server and a storage medium. A map to be split is divided into a preset number of sub-maps and the position information of each sub-map is acquired. A cross-layer operation is then performed on each sub-map: the intermediate result of each layer's operation is stored in internal memory, and the intermediate result is read back from the internal memory to participate in the next layer's operation. The internal memory is thus fully utilized and its resources multiplexed, the interaction time with external memory is reduced, and the operation speed and resource-utilization efficiency of the CNN are greatly improved, allowing the CNN to run at high speed and high efficiency in an embedded terminal.

Description

Acceleration operation method of convolutional neural network, server and storage medium
Technical Field
The invention relates to the field of computer application, in particular to an accelerated operation method of a convolutional neural network, a server and a storage medium.
Background
Machine Learning (ML) and Artificial Intelligence (AI) provide the basic theories, methods and techniques for studying how computers can complete tasks that would otherwise require human intelligence, and how computer software and hardware can simulate certain human behaviors and thought processes. AI is an interdisciplinary science spanning the natural and social sciences, and touches many fields such as philosophy, cognitive science, mathematics, neurophysiology, psychology, computer science, information science and cybernetics. The barriers to theoretical research and applied techniques in the AI field are therefore high, with difficulties such as the need for multidisciplinary cooperation and the complexity of implementation.
In recent years, advances in Deep Learning (DL) have greatly accelerated the technological progress and practical application of ML and AI. Applications with AI as their technical core have penetrated many fields, such as security, education, finance, medical treatment and transportation; typical applications include remote account opening (finance, securities), intelligent security, image recognition, natural language processing and automatic driving. DL uses a multi-hidden-layer, multi-layer perceptron structure that combines low-level features to form more abstract high-level attribute classes or features, thereby discovering distributed feature representations of the data. The Convolutional Neural Network (CNN) is a supervised deep learning model.
Deep learning simulates the way the human brain thinks and processes problems. The human brain has computing neurons on the order of tens of billions, and even a small CNN requires an enormous amount of computation. Almost all deep learning networks therefore run on a Central Processing Unit (CPU), a CPU cluster, a Graphics Processing Unit (GPU) or a GPU cluster, and the required hardware resources, cost and power consumption are huge. To apply a CNN in embedded terminals, it must run on a hardware platform or processor far weaker than a processor cluster, which is currently the biggest technical obstacle limiting the application of CNNs on terminals. For this reason, it is necessary to research methods of multiplexing computing resources on a processor, so that a CNN can run on a low-cost processor with relatively few resources.
The CNN calculation proceeds layer by layer through convolutional layers, pooling layers and fully-connected layers. Inter-layer multiplexing can be adopted to reduce the demand on processor hardware resources: all convolutional layers (or all pooling layers) run sequentially in the same module. When designing such convolutional-layer and pooling-layer modules, the relevant parameters, such as the convolution kernel size and the size and number of the feature maps input and output by each layer of the convolutional neural network, must be designed for the maximum resource occupation over all layers. A CNN has the characteristic that the maps of the lower layers are large in size but small in number, while the maps of the upper layers are small in size but large in number. In the inter-layer multiplexing scheme, the dual-port Random Access Memory (RAM), the internal memory that exchanges data directly with the CPU, must therefore be dimensioned to the large-size, large-quantity standard, which makes inefficient use of computing resources, restricts the number of parallel paths and affects real-time performance.
Disclosure of Invention
The invention mainly aims to provide an acceleration operation method for a convolutional neural network, a server and a storage medium, so as to solve the technical problem that the convolutional neural network of the prior art involves a large amount of calculation and cannot fully utilize hardware resources to accelerate that calculation.
In order to achieve the above object, the present invention provides an accelerated operation method of a convolutional neural network, including the steps of:
obtaining a map to be split;
performing convolution and pooling operations on the map to be split, splitting the map after the convolution and pooling operations into a preset number of sub-maps, and acquiring the position information of each sub-map;
performing a cross-layer operation on each sub-map respectively, obtaining the intermediate result of each layer's operation, and storing the intermediate result in an internal memory;
extracting the intermediate results in the internal memory to participate in the next layer's operation, until a final operation result is obtained;
and splicing the final operation results according to the position information to obtain a spliced result for subsequent network calculation.
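The steps above can be sketched as follows. This is only an illustration of the claimed flow, not the patent's implementation: all function names are hypothetical, and the toy elementwise "layers" are size-preserving so that stitching stays trivial (real convolution and pooling layers change tile sizes, which the later boundary-supplementation sections address).

```python
def split_map(fmap, r, c):
    """Split an R x C map (R divisible by r, C by c) into r x c sub-maps,
    recording each sub-map's (row, col) origin as its position information."""
    tiles = []
    for i in range(0, len(fmap), r):
        for j in range(0, len(fmap[0]), c):
            tiles.append(((i, j), [row[j:j + c] for row in fmap[i:i + r]]))
    return tiles

def cross_layer(tile, layers):
    """Run every layer on one sub-map; each intermediate result stays 'internal'
    and immediately feeds the next layer."""
    for layer in layers:
        tile = [[layer(v) for v in row] for row in tile]
    return tile

def stitch(results, R, C):
    """Splice the per-sub-map results back together by position information."""
    out = [[0] * C for _ in range(R)]
    for (i, j), tile in results:
        for di, row in enumerate(tile):
            for dj, v in enumerate(row):
                out[i + di][j + dj] = v
    return out

fmap = [[4 * i + j for j in range(4)] for i in range(4)]
layers = [lambda v: 2 * v, lambda v: v + 1]   # toy size-preserving "layers"
results = [(pos, cross_layer(t, layers)) for pos, t in split_map(fmap, 2, 2)]
final = stitch(results, 4, 4)
```

Because each tile is processed independently, the per-tile loop is exactly the part that can run in parallel or be batched, as the detailed description later notes.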
Preferably, before the performing the convolution and pooling operation on the map to be split, the method further includes:
determining whether the map to be split needs boundary supplementation; if so, performing boundary supplementation on the map to be split and taking the boundary-supplemented map as the new map to be split.
Preferably, the performing boundary supplementation on the map to be split and taking the map to be split after the boundary supplementation as a new map to be split specifically includes:
acquiring a preset number of layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of boundary supplements, and taking the boundary-supplemented map as the new map to be split; the preset number of layers is the number of layers of the convolutional neural network that the map to be split will span.
Preferably, the obtaining of the preset number of layers and the preset convolution kernel size includes:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
Preferably, the calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size specifically includes:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ ( Π_{t=1}^{i-1} L_t ) × Σ_{j=1}^{N_i} ( K_ij - 1 ) ]
wherein S is the number of boundary supplements and T is the total number of pooling layers within the preset number of cross layers; the array L_t (index t, t = 1~T) represents the pooling multiple of the t-th pooling layer within the preset number of cross layers; the array N_i (index i, i = 1~T) represents the number of convolution layers between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers, where j (j = 1~N_i) indexes the convolution layers between the (i-1)-th and i-th pooling layers and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0.
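Under one plausible reading of this formula (standard receptive-field arithmetic: each K_ij × K_ij convolution consumes K_ij - 1 border pixels at its own scale, and that scale grows by the pooling factor L_t after each pooling layer), the number of boundary supplements could be computed as below. This is a sketch of that reading, not the patent's code, and the function name is illustrative:

```python
def boundary_supplement(L, K):
    """S = sum_i [ (product of pooling factors before segment i) *
                   sum_j (K_ij - 1) ].
    L[i] is the factor of the i-th pooling layer; K[i] lists the kernel
    sizes of the convolution layers between pooling layers i-1 and i."""
    S, scale = 0, 1
    for factor, kernels in zip(L, K):
        S += scale * sum(k - 1 for k in kernels)   # border consumed at this scale
        scale *= factor                            # pooling enlarges the scale
    return S
```

For example, two 3 × 3 convolutions, a 2× pooling, one more 3 × 3 convolution and a second 2× pooling would give S = 1·(2 + 2) + 2·2 = 8.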
Preferably, after the calculating the number of the boundary complements according to the preset number of layers and the preset convolution kernel size by the following formula, the method further includes:
performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements, to obtain a map to be split of an R0 × C0 matrix;
splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;
wherein R0 = R + S and C0 = C + S; R is the number of rows and C the number of columns of the R × C map to be split, and R0 is the number of rows and C0 the number of columns of the R0 × C0 map to be split.
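A minimal sketch of this padding step follows. The patent fixes only the totals R0 = R + S and C0 = C + S; distributing S evenly between the two borders (with the extra unit on the bottom/right when S is odd) and padding with zeros are assumptions made here for illustration:

```python
def pad_map(fmap, S, value=0):
    """Pad an R x C map to (R + S) x (C + S); 'value' fills the new border."""
    before, after = S // 2, S - S // 2
    C = len(fmap[0])
    blank = [value] * (C + S)
    out = [list(blank) for _ in range(before)]        # top border rows
    for row in fmap:
        out.append([value] * before + list(row) + [value] * after)
    out.extend(list(blank) for _ in range(after))     # bottom border rows
    return out
```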
Preferably, splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to the preset splitting mode specifically includes:
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
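The non-uniform case can be sketched as follows; Python slicing naturally produces the remainder-sized sub-maps [R - r*down(R/r)] and [C - c*down(C/c)] in the last row and column of tiles, and the function name is illustrative:

```python
import math

def split_nonuniform(fmap, r, c):
    """Split an R x C map into a x b sub-maps with a = up(R/r), b = up(C/c);
    interior sub-maps are r x c, and the a-th row / b-th column of sub-maps
    take whatever remains."""
    R, C = len(fmap), len(fmap[0])
    a, b = math.ceil(R / r), math.ceil(C / c)   # up(R/r), up(C/c)
    tiles = {}
    for ti in range(a):
        for tj in range(b):
            tiles[(ti, tj)] = [row[tj * c:(tj + 1) * c]
                               for row in fmap[ti * r:(ti + 1) * r]]
    return a, b, tiles
```

With a 5 × 5 map and 2 × 2 sub-maps this yields a 3 × 3 grid of tiles whose last row and column are 1 pixel thin, matching the remainder sizes in the text.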
Preferably, after determining whether the map to be split needs boundary supplementation, the method further includes:
when the convolution and pooling operations do not need boundary supplementation, splitting the map to be split of the R × C matrix into a × b sub-maps according to the preset splitting mode;
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
In addition, to achieve the above object, the present invention further provides an acceleration operation server for a convolutional neural network, including a memory, a processor, and an acceleration operation program of the convolutional neural network stored in the memory and executable on the processor, the program being configured to implement the steps of the acceleration operation method of the convolutional neural network described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon an accelerated operation program of a convolutional neural network, which when executed by a processor, implements the steps of the accelerated operation method of a convolutional neural network as described above.
The acceleration operation method of the convolutional neural network provided by the invention splits a map to be split into a preset number of sub-maps and acquires the position information of each sub-map. A cross-layer operation is performed on each sub-map respectively, the intermediate result of each layer's operation is obtained and stored in an internal memory, and the intermediate result is extracted from the internal memory to participate in the next layer's operation. The internal memory is thus fully utilized and its resources multiplexed, the interaction time with external memory is reduced, and the operation speed and resource-utilization efficiency of the CNN are greatly improved, so that the CNN can run at high speed and high efficiency in an embedded terminal.
Drawings
FIG. 1 is a schematic diagram of an accelerated operation server of a convolutional neural network in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a method for accelerating a convolutional neural network according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for accelerating a convolutional neural network according to the present invention;
FIG. 4 is a flowchart illustrating a method for accelerating a convolutional neural network according to a third embodiment of the present invention;
FIG. 5 is a schematic diagram of a boundary supplement in the acceleration operation method of the convolutional neural network of the present invention;
FIG. 6 is a flowchart illustrating a fourth embodiment of an acceleration operation method of a convolutional neural network according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiment of the invention is mainly as follows: a map to be split is obtained; convolution and pooling operations are performed on it; the map after the convolution and pooling operations is split into a preset number of sub-maps and the position information of each sub-map is acquired; a cross-layer operation is performed on each sub-map respectively, the intermediate result of each layer's operation is obtained and stored in an internal memory, and the intermediate results in the internal memory are extracted to participate in the next layer's operation until the final operation result is obtained; the final operation results are then spliced according to the position information to obtain a spliced result for subsequent network calculation. The technical scheme of the embodiment of the invention solves the technical problem that the convolutional neural network of the prior art involves a large amount of calculation and cannot fully utilize hardware resources to accelerate that calculation.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an acceleration operation server of a convolutional neural network in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the acceleration operation server of the convolutional neural network may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005, wherein the communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display) and an input unit such as a Keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory); the memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the acceleration operation server structure of the convolutional neural network shown in fig. 1 does not constitute a limitation of the acceleration operation server of the convolutional neural network, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an acceleration operation program of a convolutional neural network. The acceleration operation server of the convolutional neural network calls the acceleration operation program of the convolutional neural network stored in the memory 1005 through the processor 1001, and performs the following operations:
obtaining a map to be split;
performing convolution and pooling operations on the map to be split, splitting the map after the convolution and pooling operations into a preset number of sub-maps, and acquiring the position information of each sub-map;
performing a cross-layer operation on each sub-map respectively, obtaining the intermediate result of each layer's operation, and storing the intermediate result in an internal memory;
extracting the intermediate results in the internal memory to participate in the next layer's operation, until a final operation result is obtained;
and splicing the final operation results according to the position information to obtain a spliced result for subsequent network calculation.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
determining whether the map to be split needs boundary supplementation; if so, performing boundary supplementation on the map to be split and taking the boundary-supplemented map as the new map to be split.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
acquiring a preset number of layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of boundary supplements, and taking the boundary-supplemented map as the new map to be split; the preset number of layers is the number of layers of the convolutional neural network that the map to be split will span.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ ( Π_{t=1}^{i-1} L_t ) × Σ_{j=1}^{N_i} ( K_ij - 1 ) ]
wherein S is the number of boundary supplements and T is the total number of pooling layers within the preset number of cross layers; the array L_t (index t, t = 1~T) represents the pooling multiple of the t-th pooling layer within the preset number of cross layers; the array N_i (index i, i = 1~T) represents the number of convolution layers between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers, where j (j = 1~N_i) indexes the convolution layers between the (i-1)-th and i-th pooling layers and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements, to obtain a map to be split of an R0 × C0 matrix;
splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;
wherein R0 = R + S and C0 = C + S; R is the number of rows and C the number of columns of the R × C map to be split, and R0 is the number of rows and C0 the number of columns of the R0 × C0 map to be split.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
when the convolution and pooling operations do not need boundary supplementation, splitting the map to be split of the R × C matrix into a × b sub-maps according to the preset splitting mode;
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
According to the above scheme, a map to be split is obtained; convolution and pooling operations are performed on it; the map after the convolution and pooling operations is split into a preset number of sub-maps and the position information of each sub-map is acquired; a cross-layer operation is performed on each sub-map respectively, the intermediate result of each layer's operation is obtained and stored in the internal memory, and the intermediate results in the internal memory are extracted to participate in the next layer's operation. The internal memory is thus fully utilized and its resources multiplexed, the interaction time with the external memory is reduced, and the operation speed and resource-utilization efficiency of the CNN are greatly improved, so that the CNN can run at high speed and high efficiency in an embedded terminal.
Based on the hardware structure, the embodiment of the acceleration operation method of the convolutional neural network is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of an acceleration operation method of a convolutional neural network according to the present invention.
In a first embodiment, the method for accelerating operation of the convolutional neural network includes the following steps:
step S10, obtaining a map to be split;
It should be noted that a map is the feature data input and output at each layer of a Convolutional Neural Network (CNN), and the map to be split is an input map selected from the CNN. A relatively large map may be chosen as the input map, such as a map of a 256 × 256 matrix, or a map of another specification. The selection of the input map, that is, the map to be split, is flexible: it need not be made solely by map size, and other selection criteria may also be used, which this embodiment does not limit.
Step S20, performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map;
It can be understood that performing convolution and pooling operations on the map to be split means scanning the map according to a preset convolution kernel size and a preset movement pattern, and extracting the corresponding features in combination with preset weights.
It should be understood that the preset weights and the preset convolution kernel size may be obtained through training, may be fixed values determined through long-term experiment or the experience of those skilled in the art, or may be values obtained in other ways, which this embodiment does not limit.
It should be noted that the map to be split after the convolution and pooling operations is split into a preset number of sub-maps, where the preset number may be determined according to the preset number of layers and the splitting mode, or by other parameters or in other ways, which this embodiment does not limit. Because the same convolution kernel is shared within one map to be split, the original position information is retained after the convolution and pooling operations. After the map is split into the preset number of sub-maps, the position information of each sub-map may be stored quickly in the internal memory, or stored in a Non-Volatile Random Access Memory (NVRAM), or in a Double Data Rate SDRAM (DDR SDRAM), or in other ways, which this embodiment does not limit.
In a specific implementation, the map to be split is generally split, after the convolution and pooling operations, into the smallest operation resource units, i.e., the sub-maps. On a hardware platform such as a Field-Programmable Gate Array (FPGA), such small units can be handled flexibly, for example multiplexed and rapidly stored and read using small internal memories; this fully exploits the potential of the hardware platform and turns the small, dispersed internal resources of the FPGA to advantage.
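For illustration only (the embodiment prescribes no concrete code), the splitting of a map into overlapping sub-maps with recorded position information might be sketched in Python as follows; the function name, the overlap handling, and the sizes are assumptions taken from the worked example later in the text:

```python
import numpy as np

def split_map(feature_map, sub_rows, sub_cols, overlap):
    # Split a 2-D feature map into sub-maps of size sub_rows x sub_cols,
    # with `overlap` shared elements between adjacent sub-maps, and
    # record each sub-map's position information as a (row, col) index.
    stride_r = sub_rows - overlap
    stride_c = sub_cols - overlap
    sub_maps = {}
    rows, cols = feature_map.shape
    for i, r0 in enumerate(range(0, rows - overlap, stride_r)):
        for j, c0 in enumerate(range(0, cols - overlap, stride_c)):
            sub_maps[(i, j)] = feature_map[r0:r0 + sub_rows, c0:c0 + sub_cols]
    return sub_maps

m = np.arange(246 * 326).reshape(246, 326)  # a padded 246 x 326 input map
subs = split_map(m, 54, 70, 6)
print(len(subs))  # 25 sub-maps, i.e. a 5 x 5 grid
```

Each sub-map can then be operated on independently, with the `(i, j)` key serving as the stored position information used later for splicing.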
Step S30, performing cross-layer operation on each sub map respectively, acquiring an intermediate result of each layer of operation, and storing the intermediate result into an internal memory;
it can be understood that the cross-layer operation on each sub-map can be performed in parallel; that is, different sub-maps simultaneously cross the preset number of cross-layer layers and simultaneously perform each layer's operation. Of course, the sub-maps may also be processed in batches, i.e., the next batch is operated on only after all sub-maps in the first batch have been processed; multiple sub-maps may also be operated on at a certain layer at the same time with the operation results output synchronously; or the cross-layer operation may be performed in other manners.
It should be noted that the intermediate result refers to a result of convolution operation of each sub map at a certain layer, that is, a calculation result of an intermediate link; the intermediate result of each layer is stored in the internal memory, and the intermediate result is stored at high speed by utilizing the advantage of high reading and writing speed of the internal memory so as to facilitate subsequent operation.
In the specific implementation, an internal memory in a hardware platform such as an FPGA is usually used to store an operation result of an intermediate link, and compared with a double Data Rate SDRAM (DDR SDRAM), the RAM in the hardware platform such as the FPGA has a faster read-write speed, and the control is simpler and more convenient.
Step S40, extracting the intermediate result in the internal memory to participate in the next layer of operation until the final operation result is obtained;
it should be noted that the final operation result refers to a result of the last convolution completed by the last layer of the preset number of cross-layer layers in the cross-layer operation of each sub map; the final operation result is an operation result obtained by the continuous participation of the intermediate result in the operation of the next layer;
it is understood that the final operation result may be a set containing the last operation result of each sub-map; it may also be the final operation result of a single sub-map after crossing the preset number of cross-layer layers; or it may be any other form of result indicating the output of the map to be split after the cross-layer operation, which is not limited in this embodiment.
In a specific implementation, the intermediate result is generally stored in an internal memory so that it can be read and written repeatedly and rapidly during the next layer of operation. The final operation result is generated once the cross-layer operations across all of the preset number of cross-layer layers have completed; it is generally stored in a memory outside the hardware platform such as the FPGA, namely the DDR. Because the final operation result is large and important for subsequent network calculation, it is stored in the DDR to take advantage of the DDR's high working frequency, good compatibility, and persistent storage; of course, it may also be stored in other types of memories, which is not limited in this embodiment.
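A toy Python sketch of steps S30 and S40 follows; the averaging "convolution" and pooling functions are placeholders (the patent's actual weights and kernels are trained values), and a dictionary stands in for the FPGA's internal RAM:

```python
import numpy as np

def conv3x3_valid(x):
    # toy 3x3 averaging convolution in 'valid' mode: loses one outer ring
    r, c = x.shape
    out = np.zeros((r - 2, c - 2))
    for i in range(r - 2):
        for j in range(c - 2):
            out[i, j] = x[i:i + 3, j:j + 3].mean()
    return out

def pool2x2(x):
    # 2x2 average pooling
    r, c = x.shape
    return x[:r // 2 * 2, :c // 2 * 2].reshape(r // 2, 2, c // 2, 2).mean(axis=(1, 3))

def cross_layer(sub_map, layers):
    # Run one sub-map through the preset layers; keep every layer's
    # intermediate result in `internal_ram` (a stand-in for on-chip RAM)
    # and feed it directly into the next layer's operation.
    internal_ram = {}
    data = sub_map
    for depth, layer_fn in enumerate(layers):
        data = layer_fn(data)
        internal_ram[depth] = data
    return data, internal_ram

sub = np.random.rand(54, 70)
result, ram = cross_layer(sub, [conv3x3_valid, pool2x2, conv3x3_valid])
print(result.shape)  # (24, 32): 54x70 -> 52x68 -> 26x34 -> 24x32
```

The loop body shows the key point of the embodiment: each intermediate result is written to fast internal storage and read straight back for the next layer, with no round trip to external memory until the final result is produced.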
And step S50, splicing the final operation result according to the position information to obtain a splicing result for subsequent network calculation.
It should be noted that the splicing operation is performed only after all sub-maps have finished the cross-layer operation across the preset number of cross-layer layers; the sub-maps are then spliced back into their original positions according to the position information to form the output map, i.e., the multi-layer calculation result of the map to be split. Although splitting the map to be split into sub-maps for per-layer cross-layer operation consumes more operation resources than the conventional whole-map cross-layer operation, it greatly increases the speed of the cross-layer operation, relieves timing pressure, and reduces cost.
In a specific implementation, the final operation result is spliced according to the position information. The position information may be an index number corresponding to each sub-map, such as 00, 01, 02, and so on; the index numbers may be stored in advance in the DDR or in another memory, and each sub-map is then spliced into its own position according to its index number to obtain the output map. Of course, other information capable of marking each sub-map may also be used, which is not limited in this embodiment.
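The splicing of step S50 can be sketched as follows, assuming (as an illustration, not the patent's storage layout) that each final sub-map result is keyed by its position index:

```python
import numpy as np

def splice(sub_results, sub_rows, sub_cols):
    # Reassemble the per-sub-map final results into the output map,
    # using each sub-map's stored position index (i, j).
    a = max(i for i, _ in sub_results) + 1
    b = max(j for _, j in sub_results) + 1
    out = np.zeros((a * sub_rows, b * sub_cols))
    for (i, j), block in sub_results.items():
        out[i * sub_rows:(i + 1) * sub_rows,
            j * sub_cols:(j + 1) * sub_cols] = block
    return out

# 25 hypothetical 24 x 32 final sub-map results, keyed by position:
blocks = {(i, j): np.full((24, 32), i * 5 + j) for i in range(5) for j in range(5)}
out = splice(blocks, 24, 32)
print(out.shape)  # (120, 160)
```

The 120 × 160 shape matches the spliced convolution map in the worked example of the embodiment.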
In this embodiment, a map to be split is obtained; convolution and pooling operations are performed on it, the result is split into a preset number of sub-maps, and the position information of each sub-map is acquired; cross-layer operation is performed on each sub-map, the intermediate result of each layer of operation is obtained and stored in an internal memory; and the intermediate result is extracted from the internal memory to participate in the next layer of operation. The internal memory is thus fully utilized and resources are multiplexed, the interaction time with the external memory is reduced, and the operation speed and resource-use efficiency of the CNN are greatly improved, enabling the CNN to run at high speed and high efficiency in an embedded terminal.
Further, as shown in fig. 3, a second embodiment of the acceleration operation method of a convolutional neural network of the present invention is proposed based on the first embodiment, and in this embodiment, before the step S20, the method further includes the steps of:
and step S11, judging whether the map to be split needs to be subjected to boundary supplementation, if so, performing boundary supplementation on the map to be split, and taking the map to be split after boundary supplementation as a new map to be split.
It should be noted that boundary supplementation means supplementing the outer rings of the map to be split, generally by padding its periphery with zeros. The number of supplemented outer rings, or the supplemented rows and columns, is determined by at least one of the preset number of cross-layer layers of the map to be split, the preset per-layer reduction size, the convolution kernel size, and the original size of the map to be split; of course, other parameters may also be used to determine it, which is not limited in this embodiment.
It can be understood that the number of boundary supplements for the map to be split is not necessarily a fixed value and may be configured flexibly according to the actual situation; different maps to be split do not necessarily receive the same number of boundary supplements. The number of boundary supplements, i.e., the total number of supplemented outer rings or supplemented rows and columns, is also affected by the current network stack, so the same map to be split may receive different numbers of boundary supplements under different network frameworks.
In a specific implementation, because the map to be split loses one ring of data, or several rows and/or columns, at each layer of the cross-layer operation, boundary supplementation, i.e., outer-ring supplementation, is performed to protect the data that is still needed. Generally, outer rings of zeros are supplemented around the map to be split, with some extra zeros added according to the actual situation to facilitate the later splitting operation.
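Outer-ring zero supplementation of this kind can be sketched with NumPy's `pad` (an illustrative choice, not the embodiment's hardware mechanism); the sizes reuse the worked example's 240 × 320 input:

```python
import numpy as np

m = np.ones((240, 320))
# supplement 3 rings of zeros on every side, i.e. 6 extra rows and 6 extra columns
padded = np.pad(m, pad_width=3, mode='constant', constant_values=0)
print(padded.shape)  # (246, 326)
```

The supplemented zeros are consumed ring by ring as each convolution layer of the cross-layer operation discards boundary data.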
In this embodiment, whether the map to be split needs boundary supplementation is judged; if so, boundary supplementation is performed on it, and the supplemented map is taken as the new map to be split. The boundary supplementation can prevent the data loss caused by the cross-layer operation of the map to be split, avoid damaging the map that needs protection, and effectively protect the map to be split.
Further, as shown in fig. 4, a third embodiment of the acceleration operation method of the convolutional neural network of the present invention is proposed based on the second embodiment, and in this embodiment, the step S11 specifically includes the following steps:
s12, acquiring a preset number of cross-layer layers and a preset convolution kernel size;
it should be noted that the preset number of cross-layer layers is a preset number of cross-layer layers to be performed on the map to be split, and the preset convolution kernel size may be obtained through training, may also be a fixed value determined through long-term experiments or experience by a person skilled in the art, or may also be a numerical value obtained in other manners, which is not limited in this embodiment.
S13, calculating the number of the boundary supplements according to the preset number of cross-layer layers and the preset convolution kernel size;
it should be noted that the number of the boundary supplements, i.e. the number of turns of the outer ring complement or the rows and columns of the outer ring complement, is determined by at least one of the preset number of layers of the map to be split, the preset reduction size of layers, the convolution kernel size, and the original size of the map to be split; of course, other parameters may be used to determine the number of turns of the outer ring complement or the row and column of the outer ring complement, which is not limited in this embodiment.
It can be understood that the number of boundary supplements of the map to be split can be calculated only by presetting the number of cross-layer layers and the size of the convolution kernel, and certainly, the manner of calculating the number of boundary supplements of the map to be split is flexible and variable, and an algorithm or a calculation module specially designed may also be used for calculation, which is not limited in this embodiment.
In a specific implementation, the number of boundary complements may be calculated according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ (L_1 × L_2 × … × L_{i−1}) × Σ_{j=1}^{N_i} (K_ij − 1) ], where the product L_1 × … × L_{i−1} is taken to be 1 when i = 1.
wherein S is the number of boundary supplements; T represents the total number of pooling layers within the preset number of cross-layer layers; the array L_t represents the multiple of the t-th pooling layer within the preset number of cross-layer layers, with index t, t = 1 to T; the array N_i represents the number of convolution layers between the (i−1)-th pooling layer and the i-th pooling layer, with index i, i = 1 to T; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i−1)-th and i-th pooling layers, where j is the index of the convolution layers between them, j = 1 to N_i, and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0. The sequence of the preset number of cross-layer layers is: first N_1 convolution layers, then one L_1-multiple pooling layer, then N_2 convolution layers, then one L_2-multiple pooling layer, and so on, ending with the T-th, L_T-multiple pooling layer.
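Since the source renders the formula only as an image, the following Python sketch implements the calculation under the stated definitions, assuming each convolution layer in stage i loses K_ij − 1 rows/columns, scaled by the product of the earlier pooling multiples; any convolution layers after the last pooling layer are modelled as a stage whose pooling multiple is 1 (an assumption):

```python
def boundary_supplement(L, N, K):
    # S = sum_{i=1..T} (L_1 * ... * L_{i-1}) * sum_{j=1..N_i} (K_ij - 1)
    # L: pooling multiples L_1..L_T; N: convolution counts N_1..N_T;
    # K: K[i][j] is the kernel size of the j-th conv layer in stage i.
    S, scale = 0, 1
    for i in range(len(L)):
        S += scale * sum(K[i][j] - 1 for j in range(N[i]))
        scale *= L[i]
    return S

# conv(3x3) -> one 2-multiple pooling layer -> conv(3x3), as in the
# worked example; the trailing conv is a stage with pooling multiple 1:
print(boundary_supplement(L=[2, 1], N=[1, 1], K=[[3], [3]]))  # 6
```

The result of 6 agrees with the worked example, where a 240 × 320 input map is supplemented to 246 × 326.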
Performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements to obtain a map to be split of an R0 × C0 matrix;

splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;

when the preset splitting mode is uniform splitting, the preset sub-map size r × c is obtained, where a = R/r and b = C/c; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns;

when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, those in the a-th row and the 1st to (b−1)-th columns have size [R − r·down(R/r)] × c; those in the 1st to (a−1)-th rows and the b-th column have size r × [C − c·down(C/c)]; the sub-map in the a-th row and b-th column has size [R − r·down(R/r)] × [C − c·down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have size r × c;

wherein R0 = R + S and C0 = C + S; R is the number of rows of the map to be split of the R × C matrix, C is its number of columns; R0 is the number of rows of the map to be split of the R0 × C0 matrix, C0 is its number of columns; and S is the number of boundary supplements.
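A small Python sketch of the non-uniform grid sizing, with up(x) and down(x) mapped to `ceil` and `floor`; the function name and the handling of the exact-fit (uniform) case via `or` are assumptions:

```python
from math import ceil, floor

def split_grid(R, C, r, c):
    # Non-uniform splitting into a x b sub-maps: sub-maps in the a-th row
    # and/or b-th column may be smaller than the preset r x c size.
    a, b = ceil(R / r), ceil(C / c)        # a = up(R/r), b = up(C/c)
    last_r = R - r * floor(R / r) or r     # row count of a-th-row sub-maps
    last_c = C - c * floor(C / c) or c     # column count of b-th-column sub-maps
    return a, b, last_r, last_c

print(split_grid(250, 330, 54, 70))  # (5, 5, 34, 50): ragged edges
print(split_grid(240, 320, 48, 64))  # (5, 5, 48, 64): divides evenly
```

When R and C divide evenly by r and c, the result degenerates to the uniform splitting case with a = R/r and b = C/c.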
It should be understood that the preset number of cross-layer layers for the cross-layer operation is not necessarily limited to crossing convolutional layers and then pooling layers; the preset number may also cross other types of layers.
S14, performing boundary supplementation on the map to be split according to the number of the boundary supplementation, and taking the map to be split after the boundary supplementation as the new map to be split; and the preset number of layers is the number of layers of the map to be split spanning the convolutional neural network.
It can be understood that boundary supplementation is performed on the map to be split according to the number of boundary supplements, and the supplemented map is taken as the new map to be split, on which the subsequent convolution and pooling operations are performed; the preset number of cross-layer layers is the number of layers of the convolutional neural network that the map to be split crosses. After the cross-layer operation on the new map consumes the supplemented boundary, the result reduces to the convolution result of the original map to be split without boundary supplementation.
In a specific implementation, there is also a special case in which the map to be split needs no boundary supplementation for the convolution and pooling operations; in that case, the map to be split of the R × C matrix is split directly into a × b sub-maps according to the preset splitting mode. The preset splitting mode, whether uniform or non-uniform, is the same as for a map to be split with boundary supplementation, and is not repeated here.
Based on the method shown in fig. 4, fig. 5 is a schematic diagram of boundary supplementation in the acceleration operation method of the convolutional neural network of the present invention. Referring to fig. 5, boundary supplementation of the map to be split proceeds as follows:
The input map size 240 × 320 and the 3 × 3 convolution kernel size of the convolution layers are obtained. The preset cross-layer sequence and number are obtained: first one convolution layer, then one 2-multiple pooling layer, and finally one convolution layer. The input map is uniformly supplemented with 6 rows and 6 columns of zeros (3 on each side), changing its size from 240 × 320 to 246 × 326. The input map is then uniformly cut into 5 × 5 = 25 sub-maps, each of size 54 × 70, with 6 elements overlapped between the rows and columns of adjacent sub-maps. Each sub-map is independently subjected to the convolution and pooling operations, the peripheral edge data is discarded to obtain 24 × 32 convolution sub-map data, and the 25 convolution sub-maps are then spliced in place to obtain a 120 × 160 convolution map, i.e., the convolution result of the original input map.
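The size arithmetic of this example can be traced with a few lines of Python (a check of the stated dimensions, not part of the patented method):

```python
def conv_valid(rows, cols, k):
    # size after a k x k convolution without padding: loses k - 1 per axis
    return rows - (k - 1), cols - (k - 1)

def pooled(rows, cols, m):
    # size after an m-multiple pooling layer
    return rows // m, cols // m

r, c = conv_valid(54, 70, 3)   # (52, 68)
r, c = pooled(r, c, 2)         # (26, 34)
r, c = conv_valid(r, c, 3)     # (24, 32)
print(r, c)                    # 24 32
print(r * 5, c * 5)            # 120 160: the spliced convolution map
```

Each 54 × 70 sub-map thus yields a 24 × 32 result, and the 5 × 5 grid of results splices to 120 × 160, matching the figure.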
It can be understood that judging whether the splitting mode is uniform or non-uniform, and applying the corresponding mode, realizes the splitting of the map to be split into sub-maps; the splitting conditions allow the sub-map sizes to be calculated quickly, and the flexible, variable splitting process makes splitting convenient and fast, improving the efficiency and speed of the cross-layer operation.
In this embodiment, the preset number of cross-layer layers and the preset convolution kernel size are obtained; the number of boundary supplements is calculated from them; boundary supplementation is performed on the map to be split accordingly, and the supplemented map is taken as the new map to be split. Calculating from the preset number of cross-layer layers and the preset convolution kernel size yields the number of boundary supplements quickly, allows the number of boundary supplements for the cross-layer operation to be adjusted more flexibly and conveniently, optimizes the implementation of the operation on the map to be split, prevents the data loss caused by the cross-layer operation, and effectively protects the map to be split.
Further, as shown in fig. 6, a fourth embodiment of the acceleration operation method of a convolutional neural network of the present invention is proposed based on the third embodiment, and in this embodiment, the step S12 specifically includes the following steps:
s121, acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
In a specific implementation, after the cross-layer operation on the new, boundary-supplemented map to be split consumes the supplemented boundary, the result reduces to the convolution result of the map to be split without boundary supplementation. The preset convolution kernel may be of a common 3 × 3 or 5 × 5 matrix type, or of another type; it may be obtained through repeated training, or one or more kernel types may be chosen from ordinary experiments or experience, which is not limited in this embodiment. The convolution kernel size for the corresponding convolution and pooling operations can then be obtained from the convolution kernel type.
In this embodiment, by obtaining the type of the preset convolution kernel for performing convolution and pooling on the to-be-split map and obtaining the size of the preset convolution kernel according to the type of the preset convolution kernel, the size of the convolution kernel to be used for performing the convolution and pooling can be determined more quickly, and the efficiency and speed of the convolution operation of the to-be-split map are improved.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an accelerated operation program of a convolutional neural network is stored on the computer-readable storage medium, and when executed by a processor, the accelerated operation program of the convolutional neural network implements the following operations:
obtaining a map to be split;
performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map;
performing cross-layer operation on each sub map respectively to obtain an intermediate result of each layer of operation, and storing the intermediate result into an internal memory;
extracting intermediate results in the internal memory to participate in the next layer of operation until a final operation result is obtained;
and splicing the final operation result according to the position information to obtain a splicing result for subsequent network calculation.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
and judging whether the map to be split needs to be subjected to boundary supplementation, if so, performing boundary supplementation on the map to be split, and taking the map to be split after boundary supplementation as a new map to be split.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
acquiring a preset number of layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of the boundary supplementation, and taking the map to be split after the boundary supplementation as the new map to be split; and the preset number of layers is the number of layers of the map to be split spanning the convolutional neural network.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ (L_1 × L_2 × … × L_{i−1}) × Σ_{j=1}^{N_i} (K_ij − 1) ], where the product L_1 × … × L_{i−1} is taken to be 1 when i = 1.
wherein S is the number of boundary supplements; T represents the total number of pooling layers within the preset number of cross-layer layers; the array L_t represents the multiple of the t-th pooling layer within the preset number of cross-layer layers, with index t, t = 1 to T; the array N_i represents the number of convolution layers between the (i−1)-th pooling layer and the i-th pooling layer, with index i, i = 1 to T; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i−1)-th and i-th pooling layers, where j is the index of the convolution layers between them, j = 1 to N_i, and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements to obtain a map to be split of an R0 × C0 matrix;

splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;

wherein R0 = R + S and C0 = C + S; R is the number of rows of the map to be split of the R × C matrix, C is its number of columns; R0 is the number of rows of the map to be split of the R0 × C0 matrix, and C0 is its number of columns.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
when the preset splitting mode is uniform splitting, the preset sub-map size r × c is obtained, where a = R/r and b = C/c; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns;

when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, those in the a-th row and the 1st to (b−1)-th columns have size [R − r·down(R/r)] × c; those in the 1st to (a−1)-th rows and the b-th column have size r × [C − c·down(C/c)]; the sub-map in the a-th row and b-th column has size [R − r·down(R/r)] × [C − c·down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have size r × c;

wherein up(x) represents rounding the real number x up, and down(x) represents rounding the real number x down; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns; a, b, R, C, r and c are all positive integers.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
when the convolution and pooling operations need no boundary supplementation, splitting the map to be split of the R × C matrix into a × b sub-maps according to the preset splitting mode;

when the preset splitting mode is uniform splitting, the preset sub-map size r × c is obtained, where a = R/r and b = C/c; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns;

when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, those in the a-th row and the 1st to (b−1)-th columns have size [R − r·down(R/r)] × c; those in the 1st to (a−1)-th rows and the b-th column have size r × [C − c·down(C/c)]; the sub-map in the a-th row and b-th column has size [R − r·down(R/r)] × [C − c·down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have size r × c;

wherein up(x) represents rounding the real number x up, and down(x) represents rounding the real number x down; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns; a, b, R, C, r and c are all positive integers.
In the embodiment, a map to be split is obtained; performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map; performing cross-layer operation on each sub map respectively to obtain an intermediate result of each layer of operation, and storing the intermediate result into an internal memory; and extracting the intermediate result in the internal memory to participate in the next layer of operation, thereby fully utilizing the internal memory and multiplexing resources, reducing the interaction time with the external memory, greatly improving the operation speed and the resource utilization efficiency of the CNN, and enabling the CNN to operate in the embedded terminal at high speed and high efficiency.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A method for accelerating a convolutional neural network, the method comprising:
the accelerating operation server obtains a map to be split;
judging whether the map to be split needs to be subjected to boundary supplementation, if so, performing boundary supplementation on the map to be split, and taking the map to be split after boundary supplementation as a new map to be split;
performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map;
performing cross-layer operation on each sub map respectively to obtain an intermediate result of each layer of operation, and storing the intermediate result into an internal memory;
extracting intermediate results in the internal memory to participate in the next layer of operation until a final operation result is obtained;
and splicing the final operation result according to the position information to obtain a splicing result for subsequent network calculation.
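Purely as an illustration (not part of the claims), the split, per-sub-map compute, and splice flow of this claim can be sketched with NumPy. The 2x2 split, the identity stand-in for the cross-layer operation, and all function names are the example's own assumptions:

```python
import numpy as np

def split_map(feature_map, a, b):
    """Split a feature map into a*b sub-maps, recording each sub-map's position."""
    sub_maps, positions = [], []
    for i, row_block in enumerate(np.array_split(feature_map, a, axis=0)):
        for j, sub in enumerate(np.array_split(row_block, b, axis=1)):
            sub_maps.append(sub)
            positions.append((i, j))
    return sub_maps, positions

def stitch(sub_maps, positions, a, b):
    """Splice per-sub-map results back together according to the position information."""
    grid = [[None] * b for _ in range(a)]
    for sub, (i, j) in zip(sub_maps, positions):
        grid[i][j] = sub
    return np.block(grid)

fmap = np.arange(16.0).reshape(4, 4)
subs, pos = split_map(fmap, 2, 2)
# Stand-in for the per-sub-map cross-layer operation (identity here).
results = [s for s in subs]
out = stitch(results, pos, 2, 2)
assert np.array_equal(out, fmap)
```

With a real cross-layer operation in place of the identity, each sub-map's intermediate results would stay in internal memory and only the final per-sub-map outputs would be spliced.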
2. The method according to claim 1, wherein the performing boundary supplementation on the map to be split and taking the map to be split after the boundary supplementation as a new map to be split specifically includes:
acquiring a preset number of cross-layer layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of boundary supplements, and taking the map to be split after the boundary supplementation as the new map to be split; wherein the preset number of cross-layer layers is the number of layers of the convolutional neural network spanned by the map to be split.
3. The method of claim 2, wherein the obtaining the predetermined number of cross-layer layers and the predetermined convolution kernel size comprises:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
4. The method of claim 2, wherein calculating the number of boundary supplements according to the preset number of cross-layer layers and the preset convolution kernel size specifically comprises:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
[The formula is published as image FDA0002948671410000021 in the original patent document.]
wherein S is the number of boundary supplements; T is the total number of pooling layers within the preset number of cross-layer layers; the array Lt represents the pooling multiple of the t-th pooling layer within the preset number of cross-layer layers, with index t = 1 to T; the array Ni represents the number of convolution layers between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross-layer layers, with index i = 1 to T; the two-dimensional array Kij represents the convolution kernel size of the j-th convolution layer between the (i-1)-th pooling layer and the i-th pooling layer, where j is the index of the convolution layers between the (i-1)-th and i-th pooling layers, j = 1 to Ni, and the corresponding convolution kernel is a Kij*Kij matrix; S, T, Lt, Ni, and Kij are all integers greater than 0.
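The formula itself is published only as an image in the patent, so it cannot be reproduced here verbatim. Purely to illustrate how a boundary-supplement count can be derived from these symbols, the sketch below accumulates the border pixels consumed by each convolution (K - 1 per kernel), scaled by the pooling already applied; this is a standard receptive-field-style computation, not necessarily the patent's exact formula:

```python
def boundary_supplement(L, N, K):
    """Illustrative count of extra border pixels needed so sub-maps can be
    convolved and pooled independently across T pooling stages.

    L[i]    : pooling multiple of the (i+1)-th pooling layer (length T)
    N[i]    : number of conv layers between pooling layers i and i+1
    K[i][j] : kernel size of the j-th conv layer in stage i; each conv
              consumes K-1 border pixels at the current scale
    """
    s, scale = 0, 1
    for i in range(len(L)):
        for j in range(N[i]):
            s += scale * (K[i][j] - 1)
        scale *= L[i]  # later stages operate on a downsampled map
    return s

# Two stages: conv3x3, conv3x3, pool2 -> conv3x3, pool2
print(boundary_supplement(L=[2, 2], N=[2, 1], K=[[3, 3], [3]]))  # → 8
```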
5. The method of claim 4, wherein after calculating the number of boundary supplements according to the preset number of cross-layer layers and the preset convolution kernel size, the method further comprises:
performing boundary supplementation on the map to be split of the R*C matrix according to the number of boundary supplements to obtain a map to be split of an R0*C0 matrix;
splitting the map to be split of the R0*C0 matrix into a*b sub-maps according to a preset splitting mode;
wherein R0 = R + S and C0 = C + S; R is the number of rows of the map to be split of the R*C matrix, C is the number of columns of the map to be split of the R*C matrix, R0 is the number of rows of the map to be split of the R0*C0 matrix, and C0 is the number of columns of the map to be split of the R0*C0 matrix.
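A minimal sketch of this supplementation step, assuming an even split of the S supplementary pixels between the two borders (the claim only fixes the totals R0 = R + S and C0 = C + S, not the distribution):

```python
import numpy as np

R, C, S = 6, 8, 2
fmap = np.ones((R, C))
# Distribute the S extra rows/columns across the borders (even split assumed).
pad_before, pad_after = S // 2, S - S // 2
padded = np.pad(fmap, ((pad_before, pad_after), (pad_before, pad_after)))
assert padded.shape == (R + S, C + S)  # the R0 x C0 map to be split
```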
6. The method of claim 5, wherein splitting the map to be split of the R0*C0 matrix into a*b sub-maps according to the preset splitting mode specifically comprises:
when the preset splitting mode is uniform splitting, obtaining preset sub-map sizes r and c, wherein a = R/r and b = C/c; R is the number of rows of the map to be split of the R*C matrix, and C is the number of columns of the map to be split of the R*C matrix; r is the number of rows of each r*c sub-map, and c is the number of columns of each r*c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a*b sub-maps, the sub-maps in the a-th row (other than the b-th column) have the size [R - r*down(R/r)] * c; the sub-maps in the 1st to (a-1)-th rows of the b-th column have the size r * [C - c*down(C/c)]; the sub-map in the a-th row and the b-th column has the size [R - r*down(R/r)] * [C - c*down(C/c)]; the sub-maps other than those in the a-th row and/or the b-th column have the size r*c;
wherein up(x) denotes rounding the real number x up to the nearest integer and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the map to be split of the R*C matrix; r is the number of rows and c the number of columns of each r*c sub-map; and a, b, R, C, r, and c are all positive integers.
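As an illustration of the size arithmetic in this claim (the function and variable names are the example's own), the non-uniform splitting rule assigns the remainders R - r*down(R/r) and C - c*down(C/c) to the last row and column of sub-maps:

```python
import math

def submap_sizes(R, C, r, c):
    """Return an a x b grid of (rows, cols) for non-uniform splitting:
    every sub-map is r x c except those in the a-th row and/or b-th
    column, which take the remainder dimensions."""
    a, b = math.ceil(R / r), math.ceil(C / c)
    last_r = R - r * (R // r) or r  # remainder rows in the a-th row (r if exact)
    last_c = C - c * (C // c) or c  # remainder cols in the b-th column (c if exact)
    return [[(last_r if i == a - 1 else r,
              last_c if j == b - 1 else c)
             for j in range(b)] for i in range(a)]

sizes = submap_sizes(R=10, C=10, r=4, c=4)
# a = b = 3: interior sub-maps are 4x4; the last row/column take the 2-pixel remainder.
assert sizes[0][0] == (4, 4)
assert sizes[2][2] == (2, 2)
```

Note that the row heights of any column sum back to R (4 + 4 + 2 = 10 here), which is what allows the final results to be spliced losslessly by position.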
7. The method of claim 6, wherein after determining whether the map to be split requires boundary supplementation, the method further comprises:
when the convolution and pooling operation does not require boundary supplementation, splitting the map to be split of the R*C matrix into a*b sub-maps according to the preset splitting mode;
when the preset splitting mode is uniform splitting, obtaining preset sub-map sizes r and c, wherein a = R/r and b = C/c; R is the number of rows of the map to be split of the R*C matrix, and C is the number of columns of the map to be split of the R*C matrix; r is the number of rows of each r*c sub-map, and c is the number of columns of each r*c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a*b sub-maps, the sub-maps in the a-th row (other than the b-th column) have the size [R - r*down(R/r)] * c; the sub-maps in the 1st to (a-1)-th rows of the b-th column have the size r * [C - c*down(C/c)]; the sub-map in the a-th row and the b-th column has the size [R - r*down(R/r)] * [C - c*down(C/c)]; the sub-maps other than those in the a-th row and/or the b-th column have the size r*c;
wherein up(x) denotes rounding the real number x up to the nearest integer and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the map to be split of the R*C matrix; r is the number of rows and c the number of columns of each r*c sub-map; and a, b, R, C, r, and c are all positive integers.
8. An accelerated operation server of a convolutional neural network, comprising: a memory, a processor, and an accelerated operation program of a convolutional neural network stored on the memory and executable on the processor, the accelerated operation program of the convolutional neural network being configured to implement the steps of the accelerated operation method of the convolutional neural network according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an accelerated operation program of a convolutional neural network, which when executed by a processor, realizes the steps of the accelerated operation method of a convolutional neural network as set forth in any one of claims 1 to 7.
CN201710544330.0A 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium Active CN107451654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710544330.0A CN107451654B (en) 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710544330.0A CN107451654B (en) 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium

Publications (2)

Publication Number Publication Date
CN107451654A CN107451654A (en) 2017-12-08
CN107451654B true CN107451654B (en) 2021-05-18

Family

ID=60488325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710544330.0A Active CN107451654B (en) 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium

Country Status (1)

Country Link
CN (1) CN107451654B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
CN107844828B (en) * 2017-12-18 2021-07-30 南京地平线机器人技术有限公司 Convolution calculation method in neural network and electronic device
CN108122031B (en) * 2017-12-20 2020-12-15 杭州国芯科技股份有限公司 Low-power consumption neural network accelerator device
CN108074211B (en) * 2017-12-26 2021-03-16 浙江芯昇电子技术有限公司 Image processing device and method
US11874898B2 (en) 2018-01-15 2024-01-16 Shenzhen Corerain Technologies Co., Ltd. Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal
CN109313723B (en) * 2018-01-15 2022-03-15 深圳鲲云信息科技有限公司 Artificial intelligence convolution processing method and device, readable storage medium and terminal
CN109313663B (en) * 2018-01-15 2023-03-31 深圳鲲云信息科技有限公司 Artificial intelligence calculation auxiliary processing device, method, storage medium and terminal
US11663002B2 (en) 2018-02-13 2023-05-30 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
EP3651075B1 (en) 2018-02-13 2021-10-27 Shanghai Cambricon Information Technology Co., Ltd Computation device and method
CN116991226A (en) 2018-02-14 2023-11-03 上海寒武纪信息科技有限公司 Control device, method and equipment of processor
CN108364320B (en) * 2018-03-29 2021-12-21 深圳市自行科技有限公司 Camera calibration method, terminal device and computer readable storage medium
CN110321999B (en) * 2018-03-30 2021-10-01 赛灵思电子科技(北京)有限公司 Neural network computational graph optimization method
CN110321064A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Computing platform realization method and system for neural network
CN110321998B (en) * 2018-03-31 2022-06-14 赛灵思公司 Convolutional neural network implementation method and device, acceleration equipment and storage medium
CN108647776A (en) * 2018-05-08 2018-10-12 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks convolution expansion process circuit and method
EP3624020A4 (en) 2018-05-18 2021-05-05 Shanghai Cambricon Information Technology Co., Ltd Computing method and related product
CN110533666B (en) * 2018-05-25 2022-09-23 杭州海康威视数字技术股份有限公司 Method for obtaining data block size, method and device for processing data
WO2020001438A1 (en) 2018-06-27 2020-01-02 上海寒武纪信息科技有限公司 On-chip code breakpoint debugging method, on-chip processor, and chip breakpoint debugging system
CN110865792B (en) * 2018-08-28 2021-03-19 中科寒武纪科技股份有限公司 Data preprocessing method and device, computer equipment and storage medium
WO2020042739A1 (en) * 2018-08-28 2020-03-05 中科寒武纪科技股份有限公司 Data preprocessing method and apparatus, computer device, and storage medium
CN112732601A (en) * 2018-08-28 2021-04-30 中科寒武纪科技股份有限公司 Data preprocessing method and device, computer equipment and storage medium
WO2020062392A1 (en) 2018-09-28 2020-04-02 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109146065B (en) * 2018-09-30 2021-06-08 中国人民解放军战略支援部队信息工程大学 Convolution operation method and device for two-dimensional data
CN109615059B (en) * 2018-11-06 2020-12-25 海南大学 Edge filling and filter expansion operation method and system in convolutional neural network
CN111385462A (en) 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109509475B (en) * 2018-12-28 2021-11-23 出门问问信息科技有限公司 Voice recognition method and device, electronic equipment and computer readable storage medium
CN109886390B (en) * 2019-01-10 2023-11-24 平安科技(深圳)有限公司 Convolutional neural network model optimization method, device, computer equipment and storage medium
CN109886400B (en) * 2019-02-19 2020-11-27 合肥工业大学 Convolution neural network hardware accelerator system based on convolution kernel splitting and calculation method thereof
CN111832737B (en) 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related product
US20200334522A1 (en) 2019-04-18 2020-10-22 Cambricon Technologies Corporation Limited Data processing method and related products
US11676029B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
CN112085184B (en) 2019-06-12 2024-03-29 上海寒武纪信息科技有限公司 Quantization parameter adjustment method and device and related product
CN113807489B (en) * 2020-06-17 2024-04-02 安徽寒武纪信息科技有限公司 Method for performing deconvolution operation, board card and computing device thereof
CN111738424B (en) * 2020-06-29 2023-12-26 湖南国科微电子股份有限公司 Neural network processing method and device, electronic equipment and storage medium
CN114070675A (en) * 2020-08-05 2022-02-18 展讯半导体(南京)有限公司 AI network model matching method and device, storage medium and user equipment
CN112862100B (en) * 2021-01-29 2022-02-08 网易有道信息技术(北京)有限公司 Method and apparatus for optimizing neural network model inference

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN106447030A (en) * 2016-08-30 2017-02-22 深圳市诺比邻科技有限公司 Computing resource optimization method and system of convolutional neural network
CN106709441A (en) * 2016-12-16 2017-05-24 北京工业大学 Convolution theorem based face verification accelerating method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Acceleration of Convolutional Neural Networks Based on Pre-decision; Lin Huihuang; China Masters' Theses Full-text Database (Electronic Journal); 2017-03-15; full text *

Also Published As

Publication number Publication date
CN107451654A (en) 2017-12-08

Similar Documents

Publication Publication Date Title
CN107451654B (en) Acceleration operation method of convolutional neural network, server and storage medium
CN110546611B (en) Reducing power consumption in a neural network processor by skipping processing operations
CN107437110B (en) Block convolution optimization method and device of convolutional neural network
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
DE112020003127T5 (en) Extension of dynamic processing element array
US11348004B2 (en) Method of managing data representation for deep learning, method of processing data for deep learning and deep learning system performing the same
US11586903B2 (en) Method and system of controlling computing operations based on early-stop in deep neural network
CN109843401B (en) AI object behavior model optimization method and device
CN113822209B (en) Hyperspectral image recognition method and device, electronic equipment and readable storage medium
CN111465943A (en) On-chip computing network
CN111325664A (en) Style migration method and device, storage medium and electronic equipment
US11144291B1 (en) Loop-oriented neural network compilation
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
CN113449859A (en) Data processing method and device
CN111709415B (en) Target detection method, device, computer equipment and storage medium
DE112020005789T5 (en) HIERARCHICAL PARTITIONING OF OPERATORS
DE102022128165A1 (en) DATA PATH CIRCUIT DESIGN USING REINFORCEMENT LEARNING
CN110009644B (en) Method and device for segmenting line pixels of feature map
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN110837567A (en) Method and system for embedding knowledge graph
CN111639523B (en) Target detection method, device, computer equipment and storage medium
DE112020006070T5 (en) HARDWARE ACCELERATOR WITH RECONFIGURABLE INSTRUCTION SET
DE102021107510A1 (en) TRAINING OF A NEURAL NETWORK UNDER MEMORY RESTRICTION
KR20210014561A (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN103955443A (en) Ant colony algorithm optimization method based on GPU (Graphic Processing Unit) acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant