CN107451654B - Acceleration operation method of convolutional neural network, server and storage medium - Google Patents


Publication number
CN107451654B
Authority
CN
China
Prior art keywords: map, split, sub, matrix, preset
Legal status: Active
Application number: CN201710544330.0A
Other languages: Chinese (zh)
Other versions: CN107451654A (en)
Inventor
谌璟
孙庆新
Current Assignee: Shenzhen Autocruis Technology Co ltd
Original Assignee: Shenzhen Autocruis Technology Co ltd
Application filed by Shenzhen Autocruis Technology Co ltd filed Critical Shenzhen Autocruis Technology Co ltd
Priority to CN201710544330.0A
Publication of CN107451654A
Application granted
Publication of CN107451654B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses an acceleration operation method for a convolutional neural network, a server and a storage medium. A map to be split is divided into a preset number of sub-maps and the position information of each sub-map is acquired. A cross-layer operation is then performed on each sub-map: the intermediate result of each layer's operation is stored in internal memory, and the intermediate result is read back from the internal memory to participate in the next layer's operation. The internal memory is thus fully utilized and its resources multiplexed, the interaction time with external memory is reduced, and the operation speed and resource-utilization efficiency of the CNN are greatly improved, allowing the CNN to run at high speed and high efficiency in an embedded terminal.

Description

Acceleration operation method of convolutional neural network, server and storage medium
Technical Field
The invention relates to the field of computer application, in particular to an accelerated operation method of a convolutional neural network, a server and a storage medium.
Background
Machine Learning (ML) and Artificial Intelligence (AI) provide the basic theories, methods and techniques for studying how computers can complete tasks that would otherwise require human intelligence, and how computer software and hardware can simulate certain human behaviors and thought processes. AI is an interdisciplinary science spanning the natural and social sciences, and touches many fields such as philosophy, cognitive science, mathematics, neurophysiology, psychology, computer science, information science and cybernetics. The barriers to theoretical research and applied techniques in the AI field are therefore high, with difficulties such as the need for multidisciplinary cooperation and the complexity of implementation.
In recent years, advances in Deep Learning (DL) have greatly accelerated the technological progress and practical application of ML and AI. Applications with AI as their technical core have penetrated many fields, such as security, education, finance, medical treatment and transportation; typical applications include remote account opening (finance, securities), intelligent security, image recognition, natural language processing and automatic driving. DL uses a multi-hidden-layer, multi-layer perceptron structure that combines low-level features to form more abstract high-level attribute classes or features, thereby discovering distributed feature representations of the data. The Convolutional Neural Network (CNN) is a supervised deep learning model.
Deep learning simulates the way the human brain thinks and processes problems. The human brain has computing neurons on the order of tens of billions, and even a small CNN requires an enormous amount of computation. Almost all deep learning networks therefore run on a Central Processing Unit (CPU), a CPU cluster, a Graphics Processing Unit (GPU) or a GPU cluster, and the required hardware resources, cost and power consumption are huge. To apply a CNN in embedded terminals, it must run on a hardware platform or processor far weaker than a processor cluster, which is currently the biggest technical obstacle limiting the application of CNNs on terminals. For this reason, it is necessary to research methods of multiplexing computing resources on a processor, so that a CNN can run on a low-cost processor with relatively few resources.
The CNN calculation proceeds layer by layer through convolutional layers, pooling layers and fully-connected layers. Inter-layer multiplexing can be adopted to reduce the demand on processor hardware resources: all convolutional layers (or all pooling layers) run sequentially in the same module. When designing such convolutional-layer and pooling-layer modules, the relevant parameters, such as the convolution kernel size and the size and number of the feature maps input and output by each layer of the convolutional neural network, must be designed for the maximum resource occupation over all layers. A CNN has the characteristic that the maps of the lower layers are large in size but small in number, while the maps of the upper layers are small in size but large in number. In the inter-layer multiplexing scheme, the dual-port Random Access Memory (RAM), the internal memory that exchanges data directly with the CPU, must therefore be dimensioned to the large-size, large-quantity standard, which makes inefficient use of computing resources, restricts the number of parallel paths and affects real-time performance.
Disclosure of Invention
The invention mainly aims to provide an acceleration operation method for a convolutional neural network, a server and a storage medium, so as to solve the technical problem that the convolutional neural network of the prior art involves a large amount of calculation and cannot fully utilize hardware resources to accelerate that calculation.
In order to achieve the above object, the present invention provides an accelerated operation method of a convolutional neural network, including the steps of:
obtaining a map to be split;
performing convolution and pooling operations on the map to be split, splitting the map after the convolution and pooling operations into a preset number of sub-maps, and acquiring the position information of each sub-map;
performing a cross-layer operation on each sub-map respectively, obtaining the intermediate result of each layer's operation, and storing the intermediate result in an internal memory;
extracting the intermediate results in the internal memory to participate in the next layer's operation, until a final operation result is obtained;
and splicing the final operation results according to the position information to obtain a spliced result for subsequent network calculation.
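The steps above can be sketched as follows. This is only an illustration of the claimed flow, not the patent's implementation: all function names are hypothetical, and the toy elementwise "layers" are size-preserving so that stitching stays trivial (real convolution and pooling layers change tile sizes, which the later boundary-supplementation sections address).

```python
def split_map(fmap, r, c):
    """Split an R x C map (R divisible by r, C by c) into r x c sub-maps,
    recording each sub-map's (row, col) origin as its position information."""
    tiles = []
    for i in range(0, len(fmap), r):
        for j in range(0, len(fmap[0]), c):
            tiles.append(((i, j), [row[j:j + c] for row in fmap[i:i + r]]))
    return tiles

def cross_layer(tile, layers):
    """Run every layer on one sub-map; each intermediate result stays 'internal'
    and immediately feeds the next layer."""
    for layer in layers:
        tile = [[layer(v) for v in row] for row in tile]
    return tile

def stitch(results, R, C):
    """Splice the per-sub-map results back together by position information."""
    out = [[0] * C for _ in range(R)]
    for (i, j), tile in results:
        for di, row in enumerate(tile):
            for dj, v in enumerate(row):
                out[i + di][j + dj] = v
    return out

fmap = [[4 * i + j for j in range(4)] for i in range(4)]
layers = [lambda v: 2 * v, lambda v: v + 1]   # toy size-preserving "layers"
results = [(pos, cross_layer(t, layers)) for pos, t in split_map(fmap, 2, 2)]
final = stitch(results, 4, 4)
```

Because each tile is processed independently, the per-tile loop is exactly the part that can run in parallel or be batched, as the detailed description later notes.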
Preferably, before the performing the convolution and pooling operation on the map to be split, the method further includes:
determining whether the map to be split needs boundary supplementation; if so, performing boundary supplementation on the map to be split and taking the boundary-supplemented map as the new map to be split.
Preferably, the performing boundary supplementation on the map to be split and taking the map to be split after the boundary supplementation as a new map to be split specifically includes:
acquiring a preset number of layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of boundary supplements, and taking the boundary-supplemented map as the new map to be split; the preset number of layers is the number of layers of the convolutional neural network that the map to be split will span.
Preferably, the obtaining of the preset number of layers and the preset convolution kernel size includes:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
Preferably, the calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size specifically includes:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ ( Π_{t=1}^{i-1} L_t ) × Σ_{j=1}^{N_i} ( K_ij - 1 ) ]
wherein S is the number of boundary supplements and T is the total number of pooling layers within the preset number of cross layers; the array L_t (index t, t = 1~T) represents the pooling multiple of the t-th pooling layer within the preset number of cross layers; the array N_i (index i, i = 1~T) represents the number of convolution layers between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers, where j (j = 1~N_i) indexes the convolution layers between the (i-1)-th and i-th pooling layers and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0.
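Under one plausible reading of this formula (standard receptive-field arithmetic: each K_ij × K_ij convolution consumes K_ij - 1 border pixels at its own scale, and that scale grows by the pooling factor L_t after each pooling layer), the number of boundary supplements could be computed as below. This is a sketch of that reading, not the patent's code, and the function name is illustrative:

```python
def boundary_supplement(L, K):
    """S = sum_i [ (product of pooling factors before segment i) *
                   sum_j (K_ij - 1) ].
    L[i] is the factor of the i-th pooling layer; K[i] lists the kernel
    sizes of the convolution layers between pooling layers i-1 and i."""
    S, scale = 0, 1
    for factor, kernels in zip(L, K):
        S += scale * sum(k - 1 for k in kernels)   # border consumed at this scale
        scale *= factor                            # pooling enlarges the scale
    return S
```

For example, two 3 × 3 convolutions, a 2× pooling, one more 3 × 3 convolution and a second 2× pooling would give S = 1·(2 + 2) + 2·2 = 8.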
Preferably, after the calculating the number of the boundary complements according to the preset number of layers and the preset convolution kernel size by the following formula, the method further includes:
performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements, to obtain a map to be split of an R0 × C0 matrix;
splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;
wherein R0 = R + S and C0 = C + S; R is the number of rows and C the number of columns of the R × C map to be split, and R0 is the number of rows and C0 the number of columns of the R0 × C0 map to be split.
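A minimal sketch of this padding step follows. The patent fixes only the totals R0 = R + S and C0 = C + S; distributing S evenly between the two borders (with the extra unit on the bottom/right when S is odd) and padding with zeros are assumptions made here for illustration:

```python
def pad_map(fmap, S, value=0):
    """Pad an R x C map to (R + S) x (C + S); 'value' fills the new border."""
    before, after = S // 2, S - S // 2
    C = len(fmap[0])
    blank = [value] * (C + S)
    out = [list(blank) for _ in range(before)]        # top border rows
    for row in fmap:
        out.append([value] * before + list(row) + [value] * after)
    out.extend(list(blank) for _ in range(after))     # bottom border rows
    return out
```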
Preferably, splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to the preset splitting mode specifically includes:
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
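The non-uniform case can be sketched as follows; Python slicing naturally produces the remainder-sized sub-maps [R - r*down(R/r)] and [C - c*down(C/c)] in the last row and column of tiles, and the function name is illustrative:

```python
import math

def split_nonuniform(fmap, r, c):
    """Split an R x C map into a x b sub-maps with a = up(R/r), b = up(C/c);
    interior sub-maps are r x c, and the a-th row / b-th column of sub-maps
    take whatever remains."""
    R, C = len(fmap), len(fmap[0])
    a, b = math.ceil(R / r), math.ceil(C / c)   # up(R/r), up(C/c)
    tiles = {}
    for ti in range(a):
        for tj in range(b):
            tiles[(ti, tj)] = [row[tj * c:(tj + 1) * c]
                               for row in fmap[ti * r:(ti + 1) * r]]
    return a, b, tiles
```

With a 5 × 5 map and 2 × 2 sub-maps this yields a 3 × 3 grid of tiles whose last row and column are 1 pixel thin, matching the remainder sizes in the text.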
Preferably, after determining whether the map to be split needs boundary supplementation, the method further includes:
when the convolution and pooling operations do not need boundary supplementation, splitting the map to be split of the R × C matrix into a × b sub-maps according to the preset splitting mode;
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
In addition, to achieve the above object, the present invention further provides an acceleration operation server for a convolutional neural network, including a memory, a processor, and an acceleration operation program of the convolutional neural network stored in the memory and executable on the processor, the program being configured to implement the steps of the acceleration operation method of the convolutional neural network described above.
Further, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon an accelerated operation program of a convolutional neural network, which when executed by a processor, implements the steps of the accelerated operation method of a convolutional neural network as described above.
The acceleration operation method of the convolutional neural network provided by the invention splits a map to be split into a preset number of sub-maps and acquires the position information of each sub-map. A cross-layer operation is performed on each sub-map respectively, the intermediate result of each layer's operation is obtained and stored in an internal memory, and the intermediate result is extracted from the internal memory to participate in the next layer's operation. The internal memory is thus fully utilized and its resources multiplexed, the interaction time with external memory is reduced, and the operation speed and resource-utilization efficiency of the CNN are greatly improved, so that the CNN can run at high speed and high efficiency in an embedded terminal.
Drawings
FIG. 1 is a schematic diagram of an accelerated operation server of a convolutional neural network in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a method for accelerating a convolutional neural network according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for accelerating a convolutional neural network according to the present invention;
FIG. 4 is a flowchart illustrating a method for accelerating a convolutional neural network according to a third embodiment of the present invention;
FIG. 5 is a schematic diagram of a boundary supplement in the acceleration operation method of the convolutional neural network of the present invention;
FIG. 6 is a flowchart illustrating a fourth embodiment of an acceleration operation method of a convolutional neural network according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiment of the invention is mainly as follows: a map to be split is obtained; convolution and pooling operations are performed on it; the map after the convolution and pooling operations is split into a preset number of sub-maps and the position information of each sub-map is acquired; a cross-layer operation is performed on each sub-map respectively, the intermediate result of each layer's operation is obtained and stored in an internal memory, and the intermediate results in the internal memory are extracted to participate in the next layer's operation until the final operation result is obtained; the final operation results are then spliced according to the position information to obtain a spliced result for subsequent network calculation. The technical scheme of the embodiment of the invention solves the technical problem that the convolutional neural network of the prior art involves a large amount of calculation and cannot fully utilize hardware resources to accelerate that calculation.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an acceleration operation server of a convolutional neural network in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the acceleration operation server of the convolutional neural network may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005, wherein the communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display) and an input unit such as a Keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory); the memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the acceleration operation server structure of the convolutional neural network shown in fig. 1 does not constitute a limitation of the acceleration operation server of the convolutional neural network, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an acceleration operation program of a convolutional neural network. The acceleration operation server of the convolutional neural network calls the acceleration operation program of the convolutional neural network stored in the memory 1005 through the processor 1001, and performs the following operations:
obtaining a map to be split;
performing convolution and pooling operations on the map to be split, splitting the map after the convolution and pooling operations into a preset number of sub-maps, and acquiring the position information of each sub-map;
performing a cross-layer operation on each sub-map respectively, obtaining the intermediate result of each layer's operation, and storing the intermediate result in an internal memory;
extracting the intermediate results in the internal memory to participate in the next layer's operation, until a final operation result is obtained;
and splicing the final operation results according to the position information to obtain a spliced result for subsequent network calculation.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
determining whether the map to be split needs boundary supplementation; if so, performing boundary supplementation on the map to be split and taking the boundary-supplemented map as the new map to be split.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
acquiring a preset number of layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of boundary supplements, and taking the boundary-supplemented map as the new map to be split; the preset number of layers is the number of layers of the convolutional neural network that the map to be split will span.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ ( Π_{t=1}^{i-1} L_t ) × Σ_{j=1}^{N_i} ( K_ij - 1 ) ]
wherein S is the number of boundary supplements and T is the total number of pooling layers within the preset number of cross layers; the array L_t (index t, t = 1~T) represents the pooling multiple of the t-th pooling layer within the preset number of cross layers; the array N_i (index i, i = 1~T) represents the number of convolution layers between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross layers, where j (j = 1~N_i) indexes the convolution layers between the (i-1)-th and i-th pooling layers and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements, to obtain a map to be split of an R0 × C0 matrix;
splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;
wherein R0 = R + S and C0 = C + S; R is the number of rows and C the number of columns of the R × C map to be split, and R0 is the number of rows and C0 the number of columns of the R0 × C0 map to be split.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
Further, the processor 1001 may call the acceleration operation program of the convolutional neural network stored in the memory 1005, and further perform the following operations:
when the convolution and pooling operations do not need boundary supplementation, splitting the map to be split of the R × C matrix into a × b sub-maps according to the preset splitting mode;
when the preset splitting mode is uniform splitting, obtaining the preset size r × c of the sub-maps, where a = R/r and b = C/c; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of each r × c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, the sub-maps in the a-th row (columns 1 to b-1) have the size [R - r*down(R/r)] × c; the sub-maps in the b-th column (rows 1 to a-1) have the size r × [C - c*down(C/c)]; the sub-map in the a-th row and b-th column has the size [R - r*down(R/r)] × [C - c*down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have the size r × c;
wherein up(x) denotes rounding the real number x up and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the R × C map to be split; r is the number of rows and c the number of columns of an r × c sub-map; and a, b, R, C, r and c are positive integers.
According to the above scheme, a map to be split is obtained; convolution and pooling operations are performed on it; the map after the convolution and pooling operations is split into a preset number of sub-maps and the position information of each sub-map is acquired; a cross-layer operation is performed on each sub-map respectively, the intermediate result of each layer's operation is obtained and stored in the internal memory, and the intermediate results in the internal memory are extracted to participate in the next layer's operation. The internal memory is thus fully utilized and its resources multiplexed, the interaction time with the external memory is reduced, and the operation speed and resource-utilization efficiency of the CNN are greatly improved, so that the CNN can run at high speed and high efficiency in an embedded terminal.
Based on the hardware structure, the embodiment of the acceleration operation method of the convolutional neural network is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of an acceleration operation method of a convolutional neural network according to the present invention.
In a first embodiment, the method for accelerating operation of the convolutional neural network includes the following steps:
step S10, obtaining a map to be split;
It should be noted that a map is the feature data input and output at each layer of a Convolutional Neural Network (CNN), and the map to be split is an input map selected from the CNN. A relatively large map may be chosen as the input map, such as a map of a 256 × 256 matrix, or a map of another specification. The selection of the input map, that is, the map to be split, is flexible: it need not be made solely by map size, and other selection criteria may also be used, which this embodiment does not limit.
Step S20, performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map;
It can be understood that performing convolution and pooling operations on the map to be split means scanning the map according to a preset convolution kernel size and a preset movement pattern, and extracting the corresponding features in combination with preset weights.
It should be understood that the preset weights and the preset convolution kernel size may be obtained through training, may be fixed values determined through long-term experiment or the experience of those skilled in the art, or may be values obtained in other ways, which this embodiment does not limit.
It should be noted that the map to be split after the convolution and pooling operations is split into a preset number of sub-maps, where the preset number may be determined according to the preset number of layers and the splitting mode, or by other parameters or in other ways, which this embodiment does not limit. Because the same convolution kernel is shared within one map to be split, the original position information is retained after the convolution and pooling operations. After the map is split into the preset number of sub-maps, the position information of each sub-map may be stored quickly in the internal memory, or stored in a Non-Volatile Random Access Memory (NVRAM), or in a Double Data Rate SDRAM (DDR SDRAM), or in other ways, which this embodiment does not limit.
In a specific implementation, the map to be split is generally split, after the convolution and pooling operations, into the smallest operation resource units, i.e., the sub-maps. On a hardware platform such as a Field-Programmable Gate Array (FPGA), such small units can be handled flexibly, for example multiplexed and rapidly stored and read using small internal memories; this fully exploits the potential of the hardware platform and turns the small, dispersed internal resources of the FPGA to advantage.
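For illustration only (the embodiment prescribes no concrete code), the splitting of a map into overlapping sub-maps with recorded position information might be sketched in Python as follows; the function name, the overlap handling, and the sizes are assumptions taken from the worked example later in the text:

```python
import numpy as np

def split_map(feature_map, sub_rows, sub_cols, overlap):
    # Split a 2-D feature map into sub-maps of size sub_rows x sub_cols,
    # with `overlap` shared elements between adjacent sub-maps, and
    # record each sub-map's position information as a (row, col) index.
    stride_r = sub_rows - overlap
    stride_c = sub_cols - overlap
    sub_maps = {}
    rows, cols = feature_map.shape
    for i, r0 in enumerate(range(0, rows - overlap, stride_r)):
        for j, c0 in enumerate(range(0, cols - overlap, stride_c)):
            sub_maps[(i, j)] = feature_map[r0:r0 + sub_rows, c0:c0 + sub_cols]
    return sub_maps

m = np.arange(246 * 326).reshape(246, 326)  # a padded 246 x 326 input map
subs = split_map(m, 54, 70, 6)
print(len(subs))  # 25 sub-maps, i.e. a 5 x 5 grid
```

Each sub-map can then be operated on independently, with the `(i, j)` key serving as the stored position information used later for splicing.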
Step S30, performing cross-layer operation on each sub map respectively, acquiring an intermediate result of each layer of operation, and storing the intermediate result into an internal memory;
it can be understood that the cross-layer operation on each sub-map can be performed in parallel; that is, different sub-maps simultaneously cross the preset number of cross-layer layers and simultaneously perform each layer's operation. Of course, the sub-maps may also be processed in batches, i.e., the next batch is operated on only after all sub-maps in the first batch have been processed; multiple sub-maps may also be operated on at a certain layer at the same time with the operation results output synchronously; or the cross-layer operation may be performed in other manners.
It should be noted that the intermediate result refers to a result of convolution operation of each sub map at a certain layer, that is, a calculation result of an intermediate link; the intermediate result of each layer is stored in the internal memory, and the intermediate result is stored at high speed by utilizing the advantage of high reading and writing speed of the internal memory so as to facilitate subsequent operation.
In the specific implementation, an internal memory in a hardware platform such as an FPGA is usually used to store an operation result of an intermediate link, and compared with a double Data Rate SDRAM (DDR SDRAM), the RAM in the hardware platform such as the FPGA has a faster read-write speed, and the control is simpler and more convenient.
Step S40, extracting the intermediate result in the internal memory to participate in the next layer of operation until the final operation result is obtained;
it should be noted that the final operation result refers to a result of the last convolution completed by the last layer of the preset number of cross-layer layers in the cross-layer operation of each sub map; the final operation result is an operation result obtained by the continuous participation of the intermediate result in the operation of the next layer;
it is understood that the final operation result may be a set containing the last operation result of each sub-map; it may also be the final operation result of a single sub-map after crossing the preset number of cross-layer layers; or it may be any other form of result indicating the output of the map to be split after the cross-layer operation, which is not limited in this embodiment.
In a specific implementation, the intermediate result is generally stored in an internal memory so that it can be read and written repeatedly and rapidly during the next layer of operation. The final operation result is generated once the cross-layer operations across all of the preset number of cross-layer layers have completed; it is generally stored in a memory outside the hardware platform such as the FPGA, namely the DDR. Because the final operation result is large and important for subsequent network calculation, it is stored in the DDR to take advantage of the DDR's high working frequency, good compatibility, and persistent storage; of course, it may also be stored in other types of memories, which is not limited in this embodiment.
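A toy Python sketch of steps S30 and S40 follows; the averaging "convolution" and pooling functions are placeholders (the patent's actual weights and kernels are trained values), and a dictionary stands in for the FPGA's internal RAM:

```python
import numpy as np

def conv3x3_valid(x):
    # toy 3x3 averaging convolution in 'valid' mode: loses one outer ring
    r, c = x.shape
    out = np.zeros((r - 2, c - 2))
    for i in range(r - 2):
        for j in range(c - 2):
            out[i, j] = x[i:i + 3, j:j + 3].mean()
    return out

def pool2x2(x):
    # 2x2 average pooling
    r, c = x.shape
    return x[:r // 2 * 2, :c // 2 * 2].reshape(r // 2, 2, c // 2, 2).mean(axis=(1, 3))

def cross_layer(sub_map, layers):
    # Run one sub-map through the preset layers; keep every layer's
    # intermediate result in `internal_ram` (a stand-in for on-chip RAM)
    # and feed it directly into the next layer's operation.
    internal_ram = {}
    data = sub_map
    for depth, layer_fn in enumerate(layers):
        data = layer_fn(data)
        internal_ram[depth] = data
    return data, internal_ram

sub = np.random.rand(54, 70)
result, ram = cross_layer(sub, [conv3x3_valid, pool2x2, conv3x3_valid])
print(result.shape)  # (24, 32): 54x70 -> 52x68 -> 26x34 -> 24x32
```

The loop body shows the key point of the embodiment: each intermediate result is written to fast internal storage and read straight back for the next layer, with no round trip to external memory until the final result is produced.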
And step S50, splicing the final operation result according to the position information to obtain a splicing result for subsequent network calculation.
It should be noted that the splicing operation is performed only after all sub-maps have finished the cross-layer operation across the preset number of cross-layer layers; the sub-maps are then spliced back into their original positions according to the position information to form the output map, i.e., the multi-layer calculation result of the map to be split. Although splitting the map to be split into sub-maps for per-layer cross-layer operation consumes more operation resources than the conventional whole-map cross-layer operation, it greatly increases the speed of the cross-layer operation, relieves timing pressure, and reduces cost.
In a specific implementation, the final operation result is spliced according to the position information. The position information may be an index number corresponding to each sub-map, such as 00, 01, 02, and so on; the index numbers may be stored in advance in the DDR or in another memory, and each sub-map is then spliced into its own position according to its index number to obtain the output map. Of course, other information capable of marking each sub-map may also be used, which is not limited in this embodiment.
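The splicing of step S50 can be sketched as follows, assuming (as an illustration, not the patent's storage layout) that each final sub-map result is keyed by its position index:

```python
import numpy as np

def splice(sub_results, sub_rows, sub_cols):
    # Reassemble the per-sub-map final results into the output map,
    # using each sub-map's stored position index (i, j).
    a = max(i for i, _ in sub_results) + 1
    b = max(j for _, j in sub_results) + 1
    out = np.zeros((a * sub_rows, b * sub_cols))
    for (i, j), block in sub_results.items():
        out[i * sub_rows:(i + 1) * sub_rows,
            j * sub_cols:(j + 1) * sub_cols] = block
    return out

# 25 hypothetical 24 x 32 final sub-map results, keyed by position:
blocks = {(i, j): np.full((24, 32), i * 5 + j) for i in range(5) for j in range(5)}
out = splice(blocks, 24, 32)
print(out.shape)  # (120, 160)
```

The 120 × 160 shape matches the spliced convolution map in the worked example of the embodiment.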
In this embodiment, a map to be split is obtained; convolution and pooling operations are performed on it, the result is split into a preset number of sub-maps, and the position information of each sub-map is acquired; cross-layer operation is performed on each sub-map, the intermediate result of each layer of operation is obtained and stored in an internal memory; and the intermediate result is extracted from the internal memory to participate in the next layer of operation. The internal memory is thus fully utilized and resources are multiplexed, the interaction time with the external memory is reduced, and the operation speed and resource-use efficiency of the CNN are greatly improved, enabling the CNN to run at high speed and high efficiency in an embedded terminal.
Further, as shown in fig. 3, a second embodiment of the acceleration operation method of a convolutional neural network of the present invention is proposed based on the first embodiment, and in this embodiment, before the step S20, the method further includes the steps of:
and step S11, judging whether the map to be split needs to be subjected to boundary supplementation, if so, performing boundary supplementation on the map to be split, and taking the map to be split after boundary supplementation as a new map to be split.
It should be noted that boundary supplementation means supplementing the outer rings of the map to be split, generally by padding its periphery with zeros. The number of supplemented outer rings, or the supplemented rows and columns, is determined by at least one of the preset number of cross-layer layers of the map to be split, the preset per-layer reduction size, the convolution kernel size, and the original size of the map to be split; of course, other parameters may also be used to determine it, which is not limited in this embodiment.
It can be understood that the number of boundary supplements for the map to be split is not necessarily a fixed value and may be configured flexibly according to the actual situation; different maps to be split do not necessarily receive the same number of boundary supplements. The number of boundary supplements, i.e., the total number of supplemented outer rings or supplemented rows and columns, is also affected by the current network stack, so the same map to be split may receive different numbers of boundary supplements under different network frameworks.
In a specific implementation, because the map to be split loses one ring of data, or several rows and/or columns, at each layer of the cross-layer operation, boundary supplementation, i.e., outer-ring supplementation, is performed to protect the data that is still needed. Generally, outer rings of zeros are supplemented around the map to be split, with some extra zeros added according to the actual situation to facilitate the later splitting operation.
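Outer-ring zero supplementation of this kind can be sketched with NumPy's `pad` (an illustrative choice, not the embodiment's hardware mechanism); the sizes reuse the worked example's 240 × 320 input:

```python
import numpy as np

m = np.ones((240, 320))
# supplement 3 rings of zeros on every side, i.e. 6 extra rows and 6 extra columns
padded = np.pad(m, pad_width=3, mode='constant', constant_values=0)
print(padded.shape)  # (246, 326)
```

The supplemented zeros are consumed ring by ring as each convolution layer of the cross-layer operation discards boundary data.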
In this embodiment, whether the map to be split needs boundary supplementation is judged; if so, boundary supplementation is performed on it, and the supplemented map is taken as the new map to be split. The boundary supplementation can prevent the data loss caused by the cross-layer operation of the map to be split, avoid damaging the map that needs protection, and effectively protect the map to be split.
Further, as shown in fig. 4, a third embodiment of the acceleration operation method of the convolutional neural network of the present invention is proposed based on the second embodiment, and in this embodiment, the step S11 specifically includes the following steps:
s12, acquiring a preset number of cross-layer layers and a preset convolution kernel size;
it should be noted that the preset number of cross-layer layers is a preset number of cross-layer layers to be performed on the map to be split, and the preset convolution kernel size may be obtained through training, may also be a fixed value determined through long-term experiments or experience by a person skilled in the art, or may also be a numerical value obtained in other manners, which is not limited in this embodiment.
S13, calculating the number of the boundary supplements according to the preset number of cross-layer layers and the preset convolution kernel size;
it should be noted that the number of the boundary supplements, i.e. the number of turns of the outer ring complement or the rows and columns of the outer ring complement, is determined by at least one of the preset number of layers of the map to be split, the preset reduction size of layers, the convolution kernel size, and the original size of the map to be split; of course, other parameters may be used to determine the number of turns of the outer ring complement or the row and column of the outer ring complement, which is not limited in this embodiment.
It can be understood that the number of boundary supplements of the map to be split can be calculated only by presetting the number of cross-layer layers and the size of the convolution kernel, and certainly, the manner of calculating the number of boundary supplements of the map to be split is flexible and variable, and an algorithm or a calculation module specially designed may also be used for calculation, which is not limited in this embodiment.
In a specific implementation, the number of boundary complements may be calculated according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ (L_1 × L_2 × … × L_{i−1}) × Σ_{j=1}^{N_i} (K_ij − 1) ], where the product L_1 × … × L_{i−1} is taken to be 1 when i = 1.
wherein S is the number of boundary supplements; T represents the total number of pooling layers within the preset number of cross-layer layers; the array L_t represents the multiple of the t-th pooling layer within the preset number of cross-layer layers, with index t, t = 1 to T; the array N_i represents the number of convolution layers between the (i−1)-th pooling layer and the i-th pooling layer, with index i, i = 1 to T; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i−1)-th and i-th pooling layers, where j is the index of the convolution layers between them, j = 1 to N_i, and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0. The sequence of the preset number of cross-layer layers is: first N_1 convolution layers, then one L_1-multiple pooling layer, then N_2 convolution layers, then one L_2-multiple pooling layer, and so on, ending with the T-th, L_T-multiple pooling layer.
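Since the source renders the formula only as an image, the following Python sketch implements the calculation under the stated definitions, assuming each convolution layer in stage i loses K_ij − 1 rows/columns, scaled by the product of the earlier pooling multiples; any convolution layers after the last pooling layer are modelled as a stage whose pooling multiple is 1 (an assumption):

```python
def boundary_supplement(L, N, K):
    # S = sum_{i=1..T} (L_1 * ... * L_{i-1}) * sum_{j=1..N_i} (K_ij - 1)
    # L: pooling multiples L_1..L_T; N: convolution counts N_1..N_T;
    # K: K[i][j] is the kernel size of the j-th conv layer in stage i.
    S, scale = 0, 1
    for i in range(len(L)):
        S += scale * sum(K[i][j] - 1 for j in range(N[i]))
        scale *= L[i]
    return S

# conv(3x3) -> one 2-multiple pooling layer -> conv(3x3), as in the
# worked example; the trailing conv is a stage with pooling multiple 1:
print(boundary_supplement(L=[2, 1], N=[1, 1], K=[[3], [3]]))  # 6
```

The result of 6 agrees with the worked example, where a 240 × 320 input map is supplemented to 246 × 326.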
Performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements to obtain a map to be split of an R0 × C0 matrix;

splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;

when the preset splitting mode is uniform splitting, the preset sub-map size r × c is obtained, where a = R/r and b = C/c; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns;

when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, those in the a-th row and the 1st to (b−1)-th columns have size [R − r·down(R/r)] × c; those in the 1st to (a−1)-th rows and the b-th column have size r × [C − c·down(C/c)]; the sub-map in the a-th row and b-th column has size [R − r·down(R/r)] × [C − c·down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have size r × c;

wherein R0 = R + S and C0 = C + S; R is the number of rows of the map to be split of the R × C matrix, C is its number of columns; R0 is the number of rows of the map to be split of the R0 × C0 matrix, C0 is its number of columns; and S is the number of boundary supplements.
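A small Python sketch of the non-uniform grid sizing, with up(x) and down(x) mapped to `ceil` and `floor`; the function name and the handling of the exact-fit (uniform) case via `or` are assumptions:

```python
from math import ceil, floor

def split_grid(R, C, r, c):
    # Non-uniform splitting into a x b sub-maps: sub-maps in the a-th row
    # and/or b-th column may be smaller than the preset r x c size.
    a, b = ceil(R / r), ceil(C / c)        # a = up(R/r), b = up(C/c)
    last_r = R - r * floor(R / r) or r     # row count of a-th-row sub-maps
    last_c = C - c * floor(C / c) or c     # column count of b-th-column sub-maps
    return a, b, last_r, last_c

print(split_grid(250, 330, 54, 70))  # (5, 5, 34, 50): ragged edges
print(split_grid(240, 320, 48, 64))  # (5, 5, 48, 64): divides evenly
```

When R and C divide evenly by r and c, the result degenerates to the uniform splitting case with a = R/r and b = C/c.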
It should be understood that the preset number of cross-layer layers for the cross-layer operation is not necessarily limited to crossing convolutional layers and then pooling layers; the preset number may also cross other types of layers.
S14, performing boundary supplementation on the map to be split according to the number of the boundary supplementation, and taking the map to be split after the boundary supplementation as the new map to be split; and the preset number of layers is the number of layers of the map to be split spanning the convolutional neural network.
It can be understood that boundary supplementation is performed on the map to be split according to the number of boundary supplements, and the supplemented map is taken as the new map to be split, on which the subsequent convolution and pooling operations are performed; the preset number of cross-layer layers is the number of layers of the convolutional neural network that the map to be split crosses. After the cross-layer operation on the new map consumes the supplemented boundary, the result reduces to the convolution result of the original map to be split without boundary supplementation.
In a specific implementation, there is also a special case in which the map to be split needs no boundary supplementation for the convolution and pooling operations; in that case, the map to be split of the R × C matrix is split directly into a × b sub-maps according to the preset splitting mode. The preset splitting mode, whether uniform or non-uniform, is the same as for a map to be split with boundary supplementation, and is not repeated here.
Based on the method shown in fig. 4, fig. 5 is a schematic diagram of boundary supplementation in the acceleration operation method of the convolutional neural network of the present invention. Referring to fig. 5, boundary supplementation of the map to be split proceeds as follows:
The input map size 240 × 320 and the 3 × 3 convolution kernel size of the convolution layers are obtained. The preset cross-layer sequence and number are obtained: first one convolution layer, then one 2-multiple pooling layer, and finally one convolution layer. The input map is uniformly supplemented with 6 rows and 6 columns of zeros (3 on each side), changing its size from 240 × 320 to 246 × 326. The input map is then uniformly cut into 5 × 5 = 25 sub-maps, each of size 54 × 70, with 6 elements overlapped between the rows and columns of adjacent sub-maps. Each sub-map is independently subjected to the convolution and pooling operations, the peripheral edge data is discarded to obtain 24 × 32 convolution sub-map data, and the 25 convolution sub-maps are then spliced in place to obtain a 120 × 160 convolution map, i.e., the convolution result of the original input map.
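The size arithmetic of this example can be traced with a few lines of Python (a check of the stated dimensions, not part of the patented method):

```python
def conv_valid(rows, cols, k):
    # size after a k x k convolution without padding: loses k - 1 per axis
    return rows - (k - 1), cols - (k - 1)

def pooled(rows, cols, m):
    # size after an m-multiple pooling layer
    return rows // m, cols // m

r, c = conv_valid(54, 70, 3)   # (52, 68)
r, c = pooled(r, c, 2)         # (26, 34)
r, c = conv_valid(r, c, 3)     # (24, 32)
print(r, c)                    # 24 32
print(r * 5, c * 5)            # 120 160: the spliced convolution map
```

Each 54 × 70 sub-map thus yields a 24 × 32 result, and the 5 × 5 grid of results splices to 120 × 160, matching the figure.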
It can be understood that judging whether the splitting mode is uniform or non-uniform, and applying the corresponding mode, realizes the splitting of the map to be split into sub-maps; the splitting conditions allow the sub-map sizes to be calculated quickly, and the flexible, variable splitting process makes splitting convenient and fast, improving the efficiency and speed of the cross-layer operation.
In this embodiment, the preset number of cross-layer layers and the preset convolution kernel size are obtained; the number of boundary supplements is calculated from them; boundary supplementation is performed on the map to be split accordingly, and the supplemented map is taken as the new map to be split. Calculating from the preset number of cross-layer layers and the preset convolution kernel size yields the number of boundary supplements quickly, allows the number of boundary supplements for the cross-layer operation to be adjusted more flexibly and conveniently, optimizes the implementation of the operation on the map to be split, prevents the data loss caused by the cross-layer operation, and effectively protects the map to be split.
Further, as shown in fig. 6, a fourth embodiment of the acceleration operation method of a convolutional neural network of the present invention is proposed based on the third embodiment, and in this embodiment, the step S12 specifically includes the following steps:
s121, acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
In a specific implementation, after the cross-layer operation on the new, boundary-supplemented map to be split consumes the supplemented boundary, the result reduces to the convolution result of the map to be split without boundary supplementation. The preset convolution kernel may be of a common 3 × 3 or 5 × 5 matrix type, or of another type; it may be obtained through repeated training, or one or more kernel types may be chosen from ordinary experiments or experience, which is not limited in this embodiment. The convolution kernel size for the corresponding convolution and pooling operations can then be obtained from the convolution kernel type.
In this embodiment, by obtaining the type of the preset convolution kernel for performing convolution and pooling on the to-be-split map and obtaining the size of the preset convolution kernel according to the type of the preset convolution kernel, the size of the convolution kernel to be used for performing the convolution and pooling can be determined more quickly, and the efficiency and speed of the convolution operation of the to-be-split map are improved.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an accelerated operation program of a convolutional neural network is stored on the computer-readable storage medium, and when executed by a processor, the accelerated operation program of the convolutional neural network implements the following operations:
obtaining a map to be split;
performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map;
performing cross-layer operation on each sub map respectively to obtain an intermediate result of each layer of operation, and storing the intermediate result into an internal memory;
extracting intermediate results in the internal memory to participate in the next layer of operation until a final operation result is obtained;
and splicing the final operation result according to the position information to obtain a splicing result for subsequent network calculation.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
and judging whether the map to be split needs to be subjected to boundary supplementation, if so, performing boundary supplementation on the map to be split, and taking the map to be split after boundary supplementation as a new map to be split.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
acquiring a preset number of layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of the boundary supplementation, and taking the map to be split after the boundary supplementation as the new map to be split; and the preset number of layers is the number of layers of the map to be split spanning the convolutional neural network.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
S = Σ_{i=1}^{T} [ (L_1 × L_2 × … × L_{i−1}) × Σ_{j=1}^{N_i} (K_ij − 1) ], where the product L_1 × … × L_{i−1} is taken to be 1 when i = 1.
wherein S is the number of boundary supplements; T represents the total number of pooling layers within the preset number of cross-layer layers; the array L_t represents the multiple of the t-th pooling layer within the preset number of cross-layer layers, with index t, t = 1 to T; the array N_i represents the number of convolution layers between the (i−1)-th pooling layer and the i-th pooling layer, with index i, i = 1 to T; the two-dimensional array K_ij represents the convolution kernel size of the j-th convolution layer between the (i−1)-th and i-th pooling layers, where j is the index of the convolution layers between them, j = 1 to N_i, and the corresponding convolution kernel is a K_ij × K_ij matrix; S, T, L_t, N_i and K_ij are all integers greater than 0.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
performing boundary supplementation on the map to be split of the R × C matrix according to the number of boundary supplements to obtain a map to be split of an R0 × C0 matrix;

splitting the map to be split of the R0 × C0 matrix into a × b sub-maps according to a preset splitting mode;

wherein R0 = R + S and C0 = C + S; R is the number of rows of the map to be split of the R × C matrix, C is its number of columns; R0 is the number of rows of the map to be split of the R0 × C0 matrix, and C0 is its number of columns.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
when the preset splitting mode is uniform splitting, the preset sub-map size r × c is obtained, where a = R/r and b = C/c; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns;

when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, those in the a-th row and the 1st to (b−1)-th columns have size [R − r·down(R/r)] × c; those in the 1st to (a−1)-th rows and the b-th column have size r × [C − c·down(C/c)]; the sub-map in the a-th row and b-th column has size [R − r·down(R/r)] × [C − c·down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have size r × c;

wherein up(x) represents rounding the real number x up, and down(x) represents rounding the real number x down; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns; a, b, R, C, r and c are all positive integers.
Further, the acceleration operation program of the convolutional neural network further realizes the following operations when executed by a processor:
when the convolution and pooling operations need no boundary supplementation, splitting the map to be split of the R × C matrix into a × b sub-maps according to the preset splitting mode;

when the preset splitting mode is uniform splitting, the preset sub-map size r × c is obtained, where a = R/r and b = C/c; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns;

when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a × b sub-maps, those in the a-th row and the 1st to (b−1)-th columns have size [R − r·down(R/r)] × c; those in the 1st to (a−1)-th rows and the b-th column have size r × [C − c·down(C/c)]; the sub-map in the a-th row and b-th column has size [R − r·down(R/r)] × [C − c·down(C/c)]; all sub-maps other than those in the a-th row and/or the b-th column have size r × c;

wherein up(x) represents rounding the real number x up, and down(x) represents rounding the real number x down; R is the number of rows of the map to be split of the R × C matrix, and C is its number of columns; r is the number of rows of each r × c matrix sub-map, and c is its number of columns; a, b, R, C, r and c are all positive integers.
In the embodiment, a map to be split is obtained; performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map; performing cross-layer operation on each sub map respectively to obtain an intermediate result of each layer of operation, and storing the intermediate result into an internal memory; and extracting the intermediate result in the internal memory to participate in the next layer of operation, thereby fully utilizing the internal memory and multiplexing resources, reducing the interaction time with the external memory, greatly improving the operation speed and the resource utilization efficiency of the CNN, and enabling the CNN to operate in the embedded terminal at high speed and high efficiency.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A method for accelerating a convolutional neural network, the method comprising:
the accelerating operation server obtains a map to be split;
judging whether the map to be split needs to be subjected to boundary supplementation, if so, performing boundary supplementation on the map to be split, and taking the map to be split after boundary supplementation as a new map to be split;
performing convolution and pooling operation on the map to be split, splitting the map to be split after the convolution and pooling operation into a preset number of sub-maps, and acquiring position information of each sub-map;
performing cross-layer operation on each sub map respectively to obtain an intermediate result of each layer of operation, and storing the intermediate result into an internal memory;
extracting intermediate results in the internal memory to participate in the next layer of operation until a final operation result is obtained;
and splicing the final operation result according to the position information to obtain a splicing result for subsequent network calculation.
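Purely as an illustration (not part of the claims), the split, per-sub-map compute, and splice flow of this claim can be sketched with NumPy. The 2x2 split, the identity stand-in for the cross-layer operation, and all function names are the example's own assumptions:

```python
import numpy as np

def split_map(feature_map, a, b):
    """Split a feature map into a*b sub-maps, recording each sub-map's position."""
    sub_maps, positions = [], []
    for i, row_block in enumerate(np.array_split(feature_map, a, axis=0)):
        for j, sub in enumerate(np.array_split(row_block, b, axis=1)):
            sub_maps.append(sub)
            positions.append((i, j))
    return sub_maps, positions

def stitch(sub_maps, positions, a, b):
    """Splice per-sub-map results back together according to the position information."""
    grid = [[None] * b for _ in range(a)]
    for sub, (i, j) in zip(sub_maps, positions):
        grid[i][j] = sub
    return np.block(grid)

fmap = np.arange(16.0).reshape(4, 4)
subs, pos = split_map(fmap, 2, 2)
# Stand-in for the per-sub-map cross-layer operation (identity here).
results = [s for s in subs]
out = stitch(results, pos, 2, 2)
assert np.array_equal(out, fmap)
```

With a real cross-layer operation in place of the identity, each sub-map's intermediate results would stay in internal memory and only the final per-sub-map outputs would be spliced.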
2. The method according to claim 1, wherein the performing boundary supplementation on the map to be split and taking the map to be split after the boundary supplementation as a new map to be split specifically includes:
acquiring a preset number of cross-layer layers and a preset convolution kernel size;
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size;
performing boundary supplementation on the map to be split according to the number of boundary supplements, and taking the map to be split after the boundary supplementation as the new map to be split; wherein the preset number of cross-layer layers is the number of layers of the convolutional neural network spanned by the map to be split.
3. The method of claim 2, wherein the obtaining the predetermined number of cross-layer layers and the predetermined convolution kernel size comprises:
and acquiring the type of a preset convolution kernel of the to-be-split map for convolution and pooling, and acquiring the size of the preset convolution kernel according to the type of the preset convolution kernel.
4. The method of claim 2, wherein calculating the number of boundary supplements according to the preset number of cross-layer layers and the preset convolution kernel size specifically comprises:
calculating the number of the boundary supplements according to the preset number of layers and the preset convolution kernel size by the following formula:
[The formula is published as image FDA0002948671410000021 in the original patent document.]
wherein S is the number of boundary supplements; T is the total number of pooling layers within the preset number of cross-layer layers; the array Lt represents the pooling multiple of the t-th pooling layer within the preset number of cross-layer layers, with index t = 1 to T; the array Ni represents the number of convolution layers between the (i-1)-th pooling layer and the i-th pooling layer within the preset number of cross-layer layers, with index i = 1 to T; the two-dimensional array Kij represents the convolution kernel size of the j-th convolution layer between the (i-1)-th pooling layer and the i-th pooling layer, where j is the index of the convolution layers between the (i-1)-th and i-th pooling layers, j = 1 to Ni, and the corresponding convolution kernel is a Kij*Kij matrix; S, T, Lt, Ni, and Kij are all integers greater than 0.
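The formula itself is published only as an image in the patent, so it cannot be reproduced here verbatim. Purely to illustrate how a boundary-supplement count can be derived from these symbols, the sketch below accumulates the border pixels consumed by each convolution (K - 1 per kernel), scaled by the pooling already applied; this is a standard receptive-field-style computation, not necessarily the patent's exact formula:

```python
def boundary_supplement(L, N, K):
    """Illustrative count of extra border pixels needed so sub-maps can be
    convolved and pooled independently across T pooling stages.

    L[i]    : pooling multiple of the (i+1)-th pooling layer (length T)
    N[i]    : number of conv layers between pooling layers i and i+1
    K[i][j] : kernel size of the j-th conv layer in stage i; each conv
              consumes K-1 border pixels at the current scale
    """
    s, scale = 0, 1
    for i in range(len(L)):
        for j in range(N[i]):
            s += scale * (K[i][j] - 1)
        scale *= L[i]  # later stages operate on a downsampled map
    return s

# Two stages: conv3x3, conv3x3, pool2 -> conv3x3, pool2
print(boundary_supplement(L=[2, 2], N=[2, 1], K=[[3, 3], [3]]))  # → 8
```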
5. The method of claim 4, wherein after calculating the number of boundary supplements according to the preset number of cross-layer layers and the preset convolution kernel size, the method further comprises:
performing boundary supplementation on the map to be split of the R*C matrix according to the number of boundary supplements to obtain a map to be split of an R0*C0 matrix;
splitting the map to be split of the R0*C0 matrix into a*b sub-maps according to a preset splitting mode;
wherein R0 = R + S and C0 = C + S; R is the number of rows of the map to be split of the R*C matrix, C is the number of columns of the map to be split of the R*C matrix, R0 is the number of rows of the map to be split of the R0*C0 matrix, and C0 is the number of columns of the map to be split of the R0*C0 matrix.
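A minimal sketch of this supplementation step, assuming an even split of the S supplementary pixels between the two borders (the claim only fixes the totals R0 = R + S and C0 = C + S, not the distribution):

```python
import numpy as np

R, C, S = 6, 8, 2
fmap = np.ones((R, C))
# Distribute the S extra rows/columns across the borders (even split assumed).
pad_before, pad_after = S // 2, S - S // 2
padded = np.pad(fmap, ((pad_before, pad_after), (pad_before, pad_after)))
assert padded.shape == (R + S, C + S)  # the R0 x C0 map to be split
```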
6. The method of claim 5, wherein splitting the map to be split of the R0*C0 matrix into a*b sub-maps according to the preset splitting mode specifically comprises:
when the preset splitting mode is uniform splitting, obtaining preset sub-map sizes r and c, wherein a = R/r and b = C/c; R is the number of rows of the map to be split of the R*C matrix, and C is the number of columns of the map to be split of the R*C matrix; r is the number of rows of each r*c sub-map, and c is the number of columns of each r*c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a*b sub-maps, the sub-maps in the a-th row (other than the b-th column) have the size [R - r*down(R/r)] * c; the sub-maps in the 1st to (a-1)-th rows of the b-th column have the size r * [C - c*down(C/c)]; the sub-map in the a-th row and the b-th column has the size [R - r*down(R/r)] * [C - c*down(C/c)]; the sub-maps other than those in the a-th row and/or the b-th column have the size r*c;
wherein up(x) denotes rounding the real number x up to the nearest integer and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the map to be split of the R*C matrix; r is the number of rows and c the number of columns of each r*c sub-map; and a, b, R, C, r, and c are all positive integers.
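As an illustration of the size arithmetic in this claim (the function and variable names are the example's own), the non-uniform splitting rule assigns the remainders R - r*down(R/r) and C - c*down(C/c) to the last row and column of sub-maps:

```python
import math

def submap_sizes(R, C, r, c):
    """Return an a x b grid of (rows, cols) for non-uniform splitting:
    every sub-map is r x c except those in the a-th row and/or b-th
    column, which take the remainder dimensions."""
    a, b = math.ceil(R / r), math.ceil(C / c)
    last_r = R - r * (R // r) or r  # remainder rows in the a-th row (r if exact)
    last_c = C - c * (C // c) or c  # remainder cols in the b-th column (c if exact)
    return [[(last_r if i == a - 1 else r,
              last_c if j == b - 1 else c)
             for j in range(b)] for i in range(a)]

sizes = submap_sizes(R=10, C=10, r=4, c=4)
# a = b = 3: interior sub-maps are 4x4; the last row/column take the 2-pixel remainder.
assert sizes[0][0] == (4, 4)
assert sizes[2][2] == (2, 2)
```

Note that the row heights of any column sum back to R (4 + 4 + 2 = 10 here), which is what allows the final results to be spliced losslessly by position.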
7. The method of claim 6, wherein after determining whether the map to be split requires boundary supplementation, the method further comprises:
when the convolution and pooling operation does not require boundary supplementation, splitting the map to be split of the R*C matrix into a*b sub-maps according to the preset splitting mode;
when the preset splitting mode is uniform splitting, obtaining preset sub-map sizes r and c, wherein a = R/r and b = C/c; R is the number of rows of the map to be split of the R*C matrix, and C is the number of columns of the map to be split of the R*C matrix; r is the number of rows of each r*c sub-map, and c is the number of columns of each r*c sub-map;
when the preset splitting mode is non-uniform splitting, a = up(R/r) and b = up(C/c); among the a*b sub-maps, the sub-maps in the a-th row (other than the b-th column) have the size [R - r*down(R/r)] * c; the sub-maps in the 1st to (a-1)-th rows of the b-th column have the size r * [C - c*down(C/c)]; the sub-map in the a-th row and the b-th column has the size [R - r*down(R/r)] * [C - c*down(C/c)]; the sub-maps other than those in the a-th row and/or the b-th column have the size r*c;
wherein up(x) denotes rounding the real number x up to the nearest integer and down(x) denotes rounding it down; R is the number of rows and C the number of columns of the map to be split of the R*C matrix; r is the number of rows and c the number of columns of each r*c sub-map; and a, b, R, C, r, and c are all positive integers.
8. An accelerated operation server of a convolutional neural network, comprising: a memory, a processor, and an accelerated operation program of a convolutional neural network stored on the memory and executable on the processor, the accelerated operation program of the convolutional neural network being configured to implement the steps of the accelerated operation method of the convolutional neural network according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an accelerated operation program of a convolutional neural network, which when executed by a processor, realizes the steps of the accelerated operation method of a convolutional neural network as set forth in any one of claims 1 to 7.
CN201710544330.0A 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium Active CN107451654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710544330.0A CN107451654B (en) 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710544330.0A CN107451654B (en) 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium

Publications (2)

Publication Number Publication Date
CN107451654A CN107451654A (en) 2017-12-08
CN107451654B true CN107451654B (en) 2021-05-18

Family

ID=60488325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710544330.0A Active CN107451654B (en) 2017-07-05 2017-07-05 Acceleration operation method of convolutional neural network, server and storage medium

Country Status (1)

Country Link
CN (1) CN107451654B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11437032B2 (en) 2017-09-29 2022-09-06 Shanghai Cambricon Information Technology Co., Ltd Image processing apparatus and method
CN107844828B (en) * 2017-12-18 2021-07-30 南京地平线机器人技术有限公司 Convolution calculation method in neural network and electronic device
CN108122031B (en) * 2017-12-20 2020-12-15 杭州国芯科技股份有限公司 Low-power consumption neural network accelerator device
CN108074211B (en) * 2017-12-26 2021-03-16 浙江芯昇电子技术有限公司 Image processing device and method
US11874898B2 (en) 2018-01-15 2024-01-16 Shenzhen Corerain Technologies Co., Ltd. Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal
CN109313723B (en) * 2018-01-15 2022-03-15 深圳鲲云信息科技有限公司 Artificial intelligence convolution processing method and device, readable storage medium and terminal
CN109313663B (en) * 2018-01-15 2023-03-31 深圳鲲云信息科技有限公司 Artificial intelligence calculation auxiliary processing device, method, storage medium and terminal
US11663002B2 (en) 2018-02-13 2023-05-30 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
EP3651075B1 (en) 2018-02-13 2021-10-27 Shanghai Cambricon Information Technology Co., Ltd Computation device and method
CN116991226A (en) 2018-02-14 2023-11-03 上海寒武纪信息科技有限公司 Control device, method and equipment of processor
CN108364320B (en) * 2018-03-29 2021-12-21 深圳市自行科技有限公司 Camera calibration method, terminal device and computer readable storage medium
CN110321999B (en) * 2018-03-30 2021-10-01 赛灵思电子科技(北京)有限公司 Neural network computational graph optimization method
CN110321064A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Computing platform realization method and system for neural network
CN110321998B (en) * 2018-03-31 2022-06-14 赛灵思公司 Convolutional neural network implementation method and device, acceleration equipment and storage medium
CN108647776A (en) * 2018-05-08 2018-10-12 济南浪潮高新科技投资发展有限公司 A kind of convolutional neural networks convolution expansion process circuit and method
EP3624020A4 (en) 2018-05-18 2021-05-05 Shanghai Cambricon Information Technology Co., Ltd Computing method and related product
CN110533666B (en) * 2018-05-25 2022-09-23 杭州海康威视数字技术股份有限公司 Method for obtaining data block size, method and device for processing data
WO2020001438A1 (en) 2018-06-27 2020-01-02 上海寒武纪信息科技有限公司 On-chip code breakpoint debugging method, on-chip processor, and chip breakpoint debugging system
CN110865792B (en) * 2018-08-28 2021-03-19 中科寒武纪科技股份有限公司 Data preprocessing method and device, computer equipment and storage medium
WO2020042739A1 (en) * 2018-08-28 2020-03-05 中科寒武纪科技股份有限公司 Data preprocessing method and apparatus, computer device, and storage medium
CN112732601A (en) * 2018-08-28 2021-04-30 中科寒武纪科技股份有限公司 Data preprocessing method and device, computer equipment and storage medium
WO2020062392A1 (en) 2018-09-28 2020-04-02 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109146065B (en) * 2018-09-30 2021-06-08 中国人民解放军战略支援部队信息工程大学 Convolution operation method and device for two-dimensional data
CN109615059B (en) * 2018-11-06 2020-12-25 海南大学 Edge filling and filter expansion operation method and system in convolutional neural network
CN111385462A (en) 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109509475B (en) * 2018-12-28 2021-11-23 出门问问信息科技有限公司 Voice recognition method and device, electronic equipment and computer readable storage medium
CN109886390B (en) * 2019-01-10 2023-11-24 平安科技(深圳)有限公司 Convolutional neural network model optimization method, device, computer equipment and storage medium
CN109886400B (en) * 2019-02-19 2020-11-27 合肥工业大学 Convolution neural network hardware accelerator system based on convolution kernel splitting and calculation method thereof
CN111832737B (en) 2019-04-18 2024-01-09 中科寒武纪科技股份有限公司 Data processing method and related product
US20200334522A1 (en) 2019-04-18 2020-10-22 Cambricon Technologies Corporation Limited Data processing method and related products
US11676029B2 (en) 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products
CN112085184B (en) 2019-06-12 2024-03-29 上海寒武纪信息科技有限公司 Quantization parameter adjustment method and device and related product
CN113807489B (en) * 2020-06-17 2024-04-02 安徽寒武纪信息科技有限公司 Method for performing deconvolution operation, board card and computing device thereof
CN111738424B (en) * 2020-06-29 2023-12-26 湖南国科微电子股份有限公司 Neural network processing method and device, electronic equipment and storage medium
CN114070675A (en) * 2020-08-05 2022-02-18 展讯半导体(南京)有限公司 AI network model matching method and device, storage medium and user equipment
CN112862100B (en) * 2021-01-29 2022-02-08 网易有道信息技术(北京)有限公司 Method and apparatus for optimizing neural network model inference

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Degree of depth convolutional neural networks implementation method based on FPGA
CN106447030A (en) * 2016-08-30 2017-02-22 深圳市诺比邻科技有限公司 Computing resource optimization method and system of convolutional neural network
CN106709441A (en) * 2016-12-16 2017-05-24 北京工业大学 Convolution theorem based face verification accelerating method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Acceleration of Convolutional Neural Networks Based on Pre-decision; Lin Huihuang; China Masters' Theses Full-text Database (Electronic Journal); 2017-03-15; full text *

Also Published As

Publication number Publication date
CN107451654A (en) 2017-12-08

Similar Documents

Publication Publication Date Title
CN107451654B (en) Acceleration operation method of convolutional neural network, server and storage medium
CN110546611B (en) Reducing power consumption in a neural network processor by skipping processing operations
CN107437110B (en) Block convolution optimization method and device of convolutional neural network
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
DE112020003127T5 (en) Extension of dynamic processing element array
US11348004B2 (en) Method of managing data representation for deep learning, method of processing data for deep learning and deep learning system performing the same
US11586903B2 (en) Method and system of controlling computing operations based on early-stop in deep neural network
CN109843401B (en) AI object behavior model optimization method and device
CN113822209B (en) Hyperspectral image recognition method and device, electronic equipment and readable storage medium
CN111465943A (en) On-chip computing network
CN111325664A (en) Style migration method and device, storage medium and electronic equipment
US11144291B1 (en) Loop-oriented neural network compilation
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
CN113449859A (en) Data processing method and device
CN111709415B (en) Target detection method, device, computer equipment and storage medium
DE112020005789T5 (en) HIERARCHICAL PARTITIONING OF OPERATORS
DE102022128165A1 (en) DATA PATH CIRCUIT DESIGN USING REINFORCEMENT LEARNING
CN110009644B (en) Method and device for segmenting line pixels of feature map
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN110837567A (en) Method and system for embedding knowledge graph
CN111639523B (en) Target detection method, device, computer equipment and storage medium
DE112020006070T5 (en) HARDWARE ACCELERATOR WITH RECONFIGURABLE INSTRUCTION SET
DE102021107510A1 (en) TRAINING OF A NEURAL NETWORK UNDER MEMORY RESTRICTION
KR20210014561A (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN103955443A (en) Ant colony algorithm optimization method based on GPU (Graphic Processing Unit) acceleration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant