CN113743587A - Convolutional neural network pooling calculation method, system and storage medium - Google Patents

Info

Publication number
CN113743587A
Authority
CN
China
Prior art keywords
pooling
data
row
line
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111056544.6A
Other languages
Chinese (zh)
Other versions
CN113743587B (en)
Inventor
徐天赐
景璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111056544.6A priority Critical patent/CN113743587B/en
Publication of CN113743587A publication Critical patent/CN113743587A/en
Priority to PCT/CN2022/078265 priority patent/WO2023035557A1/en
Application granted granted Critical
Publication of CN113743587B publication Critical patent/CN113743587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The invention discloses a convolutional neural network pooling calculation method, system and storage medium. The method comprises: receiving data output by the convolutional layer in the form of one-dimensional row vectors and storing it in a pooling calculation unit; establishing a row pooling cache, reading the data for each row pooling calculation from the row vector input data, and sequentially performing row pooling calculation on each row vector input datum based on a row vector sliding window to obtain row pooling output data, which are stored in the row pooling cache; and in response to the row pooling output data in the row pooling cache reaching the data size required for column pooling calculation, reading the data for each column pooling calculation from the row pooling cache and sequentially performing column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data. The scheme of the invention achieves continuous loading of input feature map data concurrently with the pooling calculation, reduces repeated data loading, increases the calculation speed, and can be flexibly adapted to different types of convolutional neural network models.

Description

Convolutional neural network pooling calculation method, system and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a convolutional neural network pooling computing method, a convolutional neural network pooling computing system and a storage medium.
Background
A convolutional neural network is a type of artificial neural network widely applied in fields such as image classification, object recognition, behavior recognition, speech recognition, natural language processing and document classification. The structure of a convolutional neural network generally consists of an input layer, a number of hidden layers and an output layer. The input layer receives multi-dimensional input data (such as color images), the output layer outputs the recognition result, and the hidden layers perform the neural network computation. The neural network computation operators include convolution, activation functions, pooling, batch normalization, fully connected computation and the like. The input of the first hidden layer is the multi-dimensional data supplied through the input layer and its output is a feature map; the input of each subsequent hidden layer is the feature map of the previous layer and its output is the feature map of the current layer.
In recent years, with the growth of computing power and the development of convolutional neural network structures, the recognition accuracy of convolutional neural networks has improved greatly, but at the same time networks have become deeper, network structures more complex, and the amount of computation ever larger, so heterogeneous computing devices such as GPUs (Graphics Processing Units), FPGAs (Field Programmable Gate Arrays) and ASICs (Application Specific Integrated Circuits) are required to accelerate convolutional neural network inference.
Pooling is a common computation in a convolutional neural network. It divides the feature map into pooling regions of size N × N, performs a reduction over the data of each pooling region and produces an output value, thereby reducing the scale of the image data. Pooling methods are mainly divided into maximum pooling and average pooling: maximum pooling takes the maximum value in the pooling region as the output, while average pooling takes the average of all data in the pooling region as the output.
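As a concrete illustration of the two pooling types just described, the following Python sketch (function and variable names are illustrative, not from the patent) divides a feature map into non-overlapping N × N regions and reduces each region to its maximum or its average:

```python
def pool2d(fmap, n, mode="max"):
    """Divide fmap (H x W, with H and W divisible by n) into non-overlapping
    n x n regions and reduce each region to a single value."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h, n):
        row = []
        for j in range(0, w, n):
            # gather the n x n pooling region starting at (i, j)
            region = [fmap[i + di][j + dj] for di in range(n) for dj in range(n)]
            row.append(max(region) if mode == "max" else sum(region) / len(region))
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
assert pool2d(fmap, 2, "max") == [[7, 8], [9, 6]]
assert pool2d(fmap, 2, "avg") == [[4.0, 5.0], [4.5, 3.0]]
```

A 4 × 4 map pooled with N = 2 thus shrinks to 2 × 2, which is the data-scale reduction the paragraph describes.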
A conventional convolutional neural network pooling calculation unit generally caches all data of a pooling region and performs pooling by either repeatedly loading or cyclically updating the region data. In the reloading mode, the pooling-region data of the input feature map is reloaded each time a pooling calculation starts, even though the pooling regions corresponding to two adjacent output values overlap to a large extent. In the cyclic updating mode, only the non-overlapping part of the input data introduced by the movement of the pooling region is updated during the calculation. Although cyclic updating avoids repeated loading, in both modes the pooling-region data loaded for each calculation is two-dimensional, non-contiguously stored data, so the optimal access bandwidth of the memory cannot be exploited and the addressing complexity increases. In addition, this style of calculation is sensitive to the size of the pooling region: pooling calculations of different sizes typically require different hardware circuits or instruction flows.
Therefore, a pooling optimization scheme is needed that removes the requirement that pooling-layer data in a deep neural network be completely loaded before pooling, so as to improve the pooling capability of the pooling layer in deep neural networks used by artificial intelligence services such as face recognition, speech recognition and natural language processing, and to provide faster deep neural network computation.
Disclosure of Invention
In view of the above, the present invention provides a convolutional neural network pooling calculation method, system and storage medium, in which the pooling calculation is divided into two steps, row pooling calculation and column pooling calculation, performed as a parallel pipeline. Less data is loaded repeatedly, the calculation speed is increased, input feature map data is loaded continuously while pooling proceeds without interruption, and different types of convolutional neural network models can be accommodated by flexibly setting the row vector output size, calculation type, sliding window size and the like.
In view of the foregoing, an aspect of the embodiments of the present invention provides a convolutional neural network pooling calculation method, including:
receiving data output by the convolutional layer in the form of one-dimensional row vectors and storing it in a pooling calculation unit; establishing a row pooling cache, reading the data for each row pooling calculation from the row vector input data, and sequentially performing row pooling calculation on each row vector input datum based on a row vector sliding window to obtain row pooling output data and storing it in the row pooling cache;
and in response to the row pooling output data in the row pooling cache reaching the data size required for column pooling calculation, reading the data for each column pooling calculation from the row pooling cache and sequentially performing column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
In some embodiments of the invention, receiving data output by the convolutional layer in the form of one-dimensional row vectors and storing it in a pooling calculation unit further comprises:
establishing an edge data cache and storing the edge data of the row vector input data in the edge data cache.
In some embodiments of the present invention, reading the data for each row pooling calculation from the row vector input data further comprises:
in response to the data read for the current row pooling calculation containing edge data of the previous row vector input data, reading the edge data for the current pooling calculation from the edge data cache.
In some embodiments of the invention, the method further comprises:
in response to the calculation on the edge data of the previous row vector input data being completed, storing the edge data of the row vector input data of the current row pooling calculation in the edge data cache.
In some embodiments of the present invention, reading the data for each column pooling calculation from the row pooling output data comprises:
reading the row pooling output data for each column pooling calculation from the row pooling output data and storing it in the row pooling cache in the form of an input feature map.
In some embodiments of the present invention, sequentially performing column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data comprises:
in response to the number of read row pooling output data reaching the preset number of row input feature map data, performing column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the current column pooling output data.
In some embodiments of the present invention, in response to the number of read row pooling output data reaching the preset number of row input feature map data, performing column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the current column pooling output data further comprises:
storing the row pooling output data read for the current column pooling calculation in the row pooling cache for use in the next column pooling calculation, to obtain the next column pooling output data.
Yet another aspect of the present invention further provides a convolutional neural network pooling computing system, including:
an input module configured to receive the data output by the convolutional layer in the form of one-dimensional row vectors and store it in the pooling calculation unit;
a row pooling calculation module configured to establish a row pooling cache, read the data for each row pooling calculation from the row vector input data, and sequentially perform row pooling calculation on each row vector input datum based on a row vector sliding window to obtain row pooling output data and store it in the row pooling cache;
a column pooling calculation module configured to, in response to row pooling output data in the row pooling cache satisfying a required data size for column pooling calculation, read data for each column pooling calculation from the row pooling cache, and sequentially perform column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
In another aspect of the embodiments of the present invention, there is provided a computer device, comprising: at least one processor; and a memory storing a computer program executable on the processor, the computer program, when executed by the processor, implementing the steps of the method above.
In another aspect of the embodiments of the present invention, there is further provided a computer storage medium storing a computer program which, when executed by a processor, implements the above method steps.
The invention has the following beneficial technical effects: by dividing the pooling calculation into two steps, row pooling calculation and column pooling calculation, the two steps proceed as a parallel pipeline; less data is loaded repeatedly, the calculation speed is increased, input feature map data is loaded continuously while pooling proceeds without interruption, and different types of convolutional neural network models can be accommodated by flexibly setting the row vector output size, calculation type, sliding window size and the like.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of an embodiment of a convolutional neural network pooling calculation method provided by the present invention;
FIG. 2 is a diagram illustrating an embodiment of a column pooling calculation process provided by the present invention;
FIG. 3 is a diagram illustrating an embodiment of a row pooling output data reading and updating process provided by the present invention;
FIG. 4 is a diagram of one embodiment of a convolutional neural network pooling computing system provided by the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a computer device provided in the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present invention;
FIG. 7 is a diagram of an embodiment of a convolutional neural network pooling computing system provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters of the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and subsequent embodiments will not repeat this note.
To address the handling of convolutional layer output data in the pooling layer of a deep neural network, the invention establishes two caches to realize fast transfer of data during pooling. Instead of the original scheme of loading the entire pooling data before performing the pooling calculation, pooling is performed in a pipelined fashion in which loading, calculation and output proceed simultaneously. Row pooling and column pooling thus finish essentially in step with each other, and the GPU or CPU memory occupied during pooling is extremely low. The solution of the present invention can be widely applied to AI applications built on deep neural networks, including neural networks involving pooling operations such as face recognition, speech recognition and natural language processing; it requires that the convolutional layer feeding the pooling layer produce one-dimensional output data.
In particular, in a first aspect of embodiments of the present invention, embodiments of a convolutional neural network pooling calculation method are presented. As shown in fig. 1, it includes the following steps:
s101, receiving data input in a one-dimensional row vector form output by the convolutional layer and storing the data input in a pooling computing unit;
s103, establishing a line pooling cache, reading data of each line pooling calculation from line vector input data, and sequentially performing line pooling calculation on each line vector input data based on a line vector sliding window to obtain line pooling output data and storing the line pooling output data in the line pooling cache;
and S105, responding to the fact that the row pooling output data in the row pooling cache meet the required data size of column pooling calculation, reading the data of each column pooling calculation from the row pooling cache, and sequentially performing column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
In this embodiment, in step S101, the pooling layer receives the output data of the convolutional layer, which in this embodiment is one-dimensional data.
In step S103, a data space of a certain size, which need not be large, is created in GPU memory or host memory to store the data produced by the row pooling calculation. The size of the row pooling cache is determined by the size of the column pooling sliding window. For example, if the column pooling sliding window is 3 rows by 6 columns, the row pooling cache can be set to 5 rows by 6 columns, and a space is then created in GPU memory or host memory according to the bit width of each element. In this embodiment of the invention the output data is int8 integer data: each value occupies 8 bits and the data range is [-128, 127], so the space created according to the column pooling sliding window size is 5 × 6 × 8 = 240 bits = 30 bytes, and a data space of only 32 bytes is needed; a larger or smaller data space can be created as required. For example, when the data type of the convolutional layer output is fp32 floating point, each element occupies 4 bytes; for the same sliding window size the row pooling cache should be 120 bytes and the allocated space 128 bytes. The size of the row pooling cache is thus set flexibly according to the data type and data length of the convolutional layer output and the size of the column pooling sliding window.
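The buffer-size arithmetic in this paragraph can be reproduced with a short, hypothetical helper; the 32-byte allocation granularity is an assumption read off the int8 example (30 bytes padded to 32, 120 bytes padded to 128), and the names are illustrative:

```python
def row_pool_buffer_bytes(rows, cols, elem_bits, align=32):
    """Return (raw size, allocated size) in bytes of the row pooling cache.

    rows x cols is the cache shape derived from the column pooling sliding
    window; elem_bits is the bit width of one element; align is the assumed
    allocation granularity in bytes."""
    raw = rows * cols * elem_bits // 8
    padded = -(-raw // align) * align  # round up to the allocation granularity
    return raw, padded

# int8 example from the text: 5 x 6 x 8 bits = 240 bits = 30 bytes -> 32 bytes
assert row_pool_buffer_bytes(5, 6, 8) == (30, 32)
# fp32 example: 5 x 6 x 32 bits = 120 bytes -> 128 bytes allocated
assert row_pool_buffer_bytes(5, 6, 32) == (120, 128)
```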
In step S105, once the number of rows of row pooling output stored in the row pooling cache equals the column pooling sliding window size, data of that size is read from the row pooling cache, the column pooling calculation is performed, and the column pooling result is output. The row pooling calculation continuously writes data to the row pooling cache; each time one new row of pooled output arrives, the column pooling calculation reads and processes the cached data one row further down and deletes the oldest row. For example, if the current column pooling used rows 1, 2 and 3, the row pooling calculation then writes a new row as row 4, and the next column pooling uses rows 2, 3 and 4 while the first row is deleted. (The row numbers 1, 2, 3 here are illustrative; in practice they are mapped addresses in the cache.) Pooling proceeds according to this flow, which ensures that the column pooling calculation finishes essentially as soon as row pooling ends, greatly improving the computational efficiency of the pooling layer.
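The row-by-row update of the row pooling cache described here can be sketched as a streaming loop; this is a software simplification using Python's `deque` in place of the patent's address-mapped cache, with illustrative names:

```python
from collections import deque

def column_pool_stream(row_outputs, window_rows, op=max):
    """Consume row pooling output rows one at a time; once window_rows rows
    are buffered, emit one column-pooled row, then let the oldest row drop."""
    buf = deque(maxlen=window_rows)  # maxlen evicts the oldest row on append
    for row in row_outputs:
        buf.append(row)
        if len(buf) == window_rows:
            # reduce each column of the buffered rows with op
            yield [op(col) for col in zip(*buf)]

rows = [[1, 2], [3, 1], [2, 5], [0, 4]]
assert list(column_pool_stream(rows, 3)) == [[3, 5], [3, 5]]
```

Each new row of row pooling output immediately yields one row of column pooling output, which is the "load while calculating" pipelining the paragraph describes.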
Specifically, assume the input feature map has size H × W. When performing the pooling calculation, data in the input feature map is fed to the pooling calculation unit as fixed-size one-dimensional row vectors. The row vector input size can be set flexibly, for example to 8; the following describes the row pooling and column pooling process of the embodiment of the invention with row vector input data of size 8.
The row pooling calculation unit reads the row vector input data; the size of the row vector sliding window is set in advance, for example to 4, and each datum in the row vector is processed in turn based on the sliding window to obtain the row pooling output data, of which there are 8 in this case.
The row pooling output data is read sequentially as the input of the column pooling calculation. The column vector sliding window is set to the same size as the row vector sliding window, 4. The column pooling calculation unit waits until 4 rows of row pooling output (i.e., the row vector data for the column calculation) are available, then begins reading them sequentially to perform the column pooling calculation and obtain the column pooling output data. The size of the row vector data used for the column pooling calculation need not equal the size of the row vector data used for the row pooling calculation; it can be set flexibly and is generally based on the input feature map size.
It should be noted that both the row pooling calculation and the column pooling calculation may compute either the maximum or the average along their dimension. If the pooling type is maximum pooling of size N × N, then the row pooling calculation and the column pooling calculation are both one-dimensional maximum calculations of size N; if the pooling type is average pooling of size N × N, they are both one-dimensional average calculations of size N.
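The claim that an N × N pooling decomposes into a size-N row pass followed by a size-N column pass can be checked with a small Python sketch (stride 1, names illustrative): a separable row-then-column pooling is compared against pooling each 2-D window directly.

```python
def pool1d(vec, n, op):
    # stride-1 one-dimensional pooling with a window of size n
    return [op(vec[i:i + n]) for i in range(len(vec) - n + 1)]

def separable_pool(fmap, n, op):
    # step 1: 1-D pooling along every row
    rows = [pool1d(r, n, op) for r in fmap]
    # step 2: 1-D pooling along every column of the row result
    cols = [pool1d(list(c), n, op) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def direct_pool(fmap, n, op):
    # reference: reduce each n x n window of the 2-D map in one shot
    h, w = len(fmap), len(fmap[0])
    return [[op([v for r in fmap[i:i + n] for v in r[j:j + n]])
             for j in range(w - n + 1)] for i in range(h - n + 1)]

mean = lambda xs: sum(xs) / len(xs)
fmap = [[1, 3, 2, 4], [5, 7, 6, 8], [9, 2, 1, 0], [3, 4, 5, 6]]
assert separable_pool(fmap, 2, max) == direct_pool(fmap, 2, max)
assert separable_pool(fmap, 2, mean) == direct_pool(fmap, 2, mean)
```

The decomposition works for both types because the maximum of maxima over rows equals the 2-D maximum, and the mean of equal-sized row means equals the 2-D mean.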
Further, as shown in fig. 7 and fig. 3: in fig. 7, the output data of a convolutional layer of the neural network of the invention is fed as input data to the pooling layer. Since one-dimensional data is more convenient to transmit between layers or operators in a deep neural network, the convolutional layers in the invention all produce one-dimensional output data; that is, the data entering the pooling layer for row pooling in the row pooling process shown in fig. 7 is one-dimensional. As shown in the pooling calculation of the first row of data in fig. 7, the pooling sliding window (shaded) has size 3 and the number of row pooling data is 6. Following the pooling principle, either the maximum value (max-pooling) or the mean value (mean-pooling) of the data in the shaded area is taken as the first output datum. After the first pooling step, the sliding window is moved to the right by the step size, which is 1 in this embodiment, and the second output datum is sampled. This is repeated until the right edge of the sliding window reaches the edge of the first row of data; with a travel of 4 steps the number of output data is 4, and pooling of the one-dimensional data is complete. The output one-dimensional data of length 4 is the row pooling output; the same row pooling process is then applied to the one-dimensional input data of the second row to obtain the second row pooling output, and so on for the one-dimensional data output by the convolutional layer. When the row pooling output satisfies the data size of the column pooling calculation, the column pooling of the row pooling output proceeds as shown in the right-hand portion of fig. 7. Fig. 7 shows a column pooling calculation window of 3 rows and 4 columns; as with the row pooling calculation, the maximum or average of the data within the window position (shaded) is taken as the output to obtain the column pooling output data.
In some embodiments of the present invention, the row pooling calculation can be performed in parallel: because one-dimensional data is easy to transmit in deep network computation, the one-dimensional data passed from the convolutional layer can be processed in parallel on a GPU using multiple physical operation units. The size of the one-dimensional data merely needs to match the size of the data storage space of the physical operation unit.
The above embodiments show that the pooling calculation unit loads data in row-vectorized form. During the row pooling calculation the row vector data is processed in parallel in a one-dimensional sliding-window pipeline to produce the intermediate results of the row pooling calculation; these enter the column pooling calculation in pipelined fashion, where the row vectors are again processed in parallel. This guarantees that the intermediate results flow continuously into the column sliding window of the column pooling calculation, reduces repeated loading of data, increases the calculation speed, achieves continuous loading of the input feature map data concurrently with the pooling calculation, and, by flexibly setting the row vector output size, calculation type, sliding window size and the like, improves the flexibility of adapting to different convolutional neural networks.
In some embodiments, inputting the input feature map data in the form of one-dimensional row vectors and storing it in the pooling calculation unit further comprises:
establishing an edge data cache and storing the edge data of the row vector input data in the edge data cache.
Specifically, the edge data is determined by the sliding window size and generally amounts to the sliding window size minus 1. For example, if the row vector input data has size 8 and the sliding window has size 4, the last 3 data of the row vector input data are the edge data.
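The edge-data rule can be written as a one-line Python sketch (names illustrative):

```python
def edge_data(vec, window):
    # edge size = sliding-window size - 1: the tail values still needed by
    # the first window positions computed over the next row vector
    return vec[-(window - 1):]

# row vector of size 8, sliding window of size 4 -> last 3 data are edge data
assert edge_data([10, 11, 12, 13, 14, 15, 16, 17], 4) == [15, 16, 17]
```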
In some embodiments, reading the data for each row pooling calculation from the row vector input data further comprises:
in response to the data read for the current row pooling calculation containing edge data of the previous row vector input data, reading the edge data for the current pooling calculation from the edge data cache.
In some embodiments, the method further comprises:
in response to the calculation on the edge data of the previous row vector input data being completed, storing the edge data of the row vector input data of the current row pooling calculation in the edge data cache.
In some embodiments, reading the data for each column pooling calculation from the row pooling output data comprises:
reading the row pooling output data for each column pooling calculation from the row pooling output data and storing it in the row pooling cache in the form of an input feature map.
In some embodiments, sequentially performing column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data comprises:
in response to the number of read row pooling output data reaching the preset number of row input feature map data, performing column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the current column pooling output data.
In some embodiments, in response to the number of read row pooling output data reaching the preset number of row input feature map data, performing column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the current column pooling output data further comprises:
storing the row pooling output data read for the current column pooling calculation in the row pooling cache for use in the next column pooling calculation, to obtain the next column pooling output data.
Several embodiments of the present invention are described below with reference to specific examples.
For example, the input feature map data size is 16 × 16, the row vector data size for the row pooling calculation is 8, the row vector sliding window and the column vector sliding window are both of size 4, and the edge data size is 3.
Row vector data is continuously fed from the input feature map to the pooling calculation unit. The row pooling unit reads each row vector in turn and performs the row pooling calculation on it based on the row vector sliding window, which covers the address range of the row pooling input data used for each row pooling operation; as the window moves, the pooling calculation of each row of input data is completed. In this embodiment the row vector sliding window size is set to 4, so each output uses 4 data, and the first 3 outputs of each row vector all use edge data of the previous row vector. The edge data of the previous row vector must therefore be placed in the edge data cache for use by the current row vector; once the current row vector reaches its 4th output, the previous row vector is no longer needed, and the edge data of the current row vector is placed in the edge data cache for use by the next row vector.
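The interplay between chunked row vectors and the edge data cache can be sketched as follows; this is a simplified software model of one feature-map row, not the patent's hardware pipeline, and the names are illustrative:

```python
def row_pool_stream(chunks, window, op=max):
    """Row-pool one feature-map row delivered as fixed-size chunks, carrying
    the last (window - 1) values between chunks in an edge-data cache."""
    edge = []  # edge-data cache
    out = []
    for chunk in chunks:
        data = edge + chunk              # prepend the previous chunk's edge
        out.extend(op(data[i:i + window])
                   for i in range(len(data) - window + 1))
        edge = data[-(window - 1):]      # cache the edge for the next chunk
    return out

# chunked processing matches pooling the whole row at once
chunks = [[1, 5, 2, 4, 3, 7, 0, 6], [2, 9, 1, 8, 4, 3, 6, 5]]
full = [x for c in chunks for x in c]
expected = [max(full[i:i + 4]) for i in range(len(full) - 3)]
assert row_pool_stream(chunks, 4) == expected
```

With window size 4, the first chunk of a row yields 5 outputs and every later 8-wide chunk yields 8, since 3 cached edge values complete its first window positions.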
Meanwhile, the column pooling calculation proceeds in parallel. The first column pooling calculation, however, cannot start until 3 rows of data are present in the row pooling cache. The row width used by the column pooling calculation generally equals the width of each row of the input feature map data, here 16, so the column pooling calculation starts once 3 × 16 data elements are present in the row pooling cache.
The column pooling computing unit reads the row pooling output data row by row, placing the newly read row in the last window position while the other 3 rows come from the row pooling cache; fig. 2 shows this reading and updating process. Column pooling is then performed on the 4 rows of row pooling output data based on the column vector sliding window, and after the calculation the newly read row is stored into the row pooling cache, overwriting the data of the oldest calculation. As shown in fig. 3, the column pooling calculation is implemented by the column vector sliding window: the window covers the address range of the column pooling input data (i.e., the row pooling output data) used by each column pooling operation, and as the window moves, one row vector of pooling output data is obtained.
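The read-and-update behavior above can be sketched as follows. This is a hypothetical Python model (assuming max pooling and a window of 4): three previously computed rows sit in the row pooling cache, the newly read row takes the last window position, a columnwise reduction is performed, and the new row then evicts the oldest cached row.

```python
from collections import deque

WINDOW = 4

def column_pool(new_row, row_cache):
    """row_cache: deque holding the last WINDOW-1 row-pooling output rows."""
    window_rows = list(row_cache) + [new_row]      # new row in the last position
    out = [max(col) for col in zip(*window_rows)]  # columnwise max over 4 rows
    row_cache.append(new_row)                      # evict oldest, keep newest 3
    return out

# Three cached rows of row pooling output plus one newly read row:
cache = deque([[1, 5, 2], [4, 0, 3], [2, 2, 2]], maxlen=WINDOW - 1)
result = column_pool([0, 9, 1], cache)
# result is the elementwise max over the 4 rows
```

Using a `deque` with `maxlen=3` mirrors the cache update in fig. 2: appending the new row automatically discards the oldest one.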
Through the above embodiments, the pooling computing unit loads data continuously in row vector form. During the row pooling calculation, the row vectors are processed in a one-dimensional sliding-window pipeline to produce the intermediate results of the row pooling calculation, and these intermediate results are then column-pooled in the same pipelined, parallel fashion. Only a small amount of data needs to be reloaded during the row pooling calculation, and the intermediate results flow continuously into the column sliding window of the column pooling calculation, which increases the calculation speed and allows the input feature map data to be loaded and pooled without interruption. Flexible settings of the row vector size, the calculation type, the sliding window size and the like also improve the flexibility of adapting to different convolutional neural networks.
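The two-stage scheme works because, for max pooling, a k × k two-dimensional window decomposes into a one-dimensional row pass followed by a one-dimensional column pass: the maximum over a k × k block equals the columnwise maximum of k rowwise maxima. A stride-1, window-4 sketch (parameters assumed from the patent's example; not the patent's implementation) checks this equivalence against a direct 2-D pooling:

```python
K = 4  # pooling window size (assumed)

def pool_2d_separable(fmap):
    # Stage 1: row pooling (1-D sliding window along each row)
    rows = [[max(r[i:i + K]) for i in range(len(r) - K + 1)] for r in fmap]
    # Stage 2: column pooling over K consecutive row-pooled rows
    return [[max(rows[j + d][i] for d in range(K))
             for i in range(len(rows[0]))]
            for j in range(len(rows) - K + 1)]

def pool_2d_direct(fmap):
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[j + dj][i + di] for dj in range(K) for di in range(K))
             for i in range(w - K + 1)]
            for j in range(h - K + 1)]

# 16 x 16 feature map with arbitrary deterministic values:
fmap = [[(r * 7 + c * 3) % 11 for c in range(16)] for r in range(16)]
assert pool_2d_separable(fmap) == pool_2d_direct(fmap)
```

The same decomposition holds for min and (with sum/divide) average pooling, which is why the calculation type can be made configurable without changing the two-stage dataflow.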
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, the present invention further provides a convolutional neural network pooling computing system, comprising:
an input module 110 configured to receive data input in the form of one-dimensional row vectors from the convolutional layer output and store to the pooling computing unit;
a row pooling calculation module 120 configured to establish a row pooling cache, read the data of each row pooling calculation from the row vector input data, and sequentially perform the row pooling calculation on each row vector input data based on a row vector sliding window to obtain row pooling output data stored in the row pooling cache;
a column pooling calculation module 130 configured to, in response to the row pooling output data in the row pooling cache satisfying a required data size for column pooling calculation, read data of each column pooling calculation from the row pooling cache, and sequentially perform column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
In some embodiments, the input module is further configured to store the edge data of the row vector input data into an edge data cache.
In some embodiments, reading data from the row vector input data for each row pooling calculation further comprises:
in response to the data read for the current row pooling calculation containing edge data of the previous row vector input data, reading the edge data for the current pooling calculation from the edge data cache.
In some embodiments, the column pooling calculation module is further configured to:
in response to completion of the calculation using the edge data of the previous row vector input data, storing the edge data of the row vector input data of the current row pooling calculation into the edge data cache.
In some embodiments, reading data for each column pooling calculation from the row pooling output data comprises:
reading the row pooling output data for each column pooling calculation from the row pooling output data, and storing it into the row pooling cache in the form of an input feature map.
In some embodiments, sequentially performing column pooling calculations on the row pooled output data based on a column vector sliding window to obtain column pooled output data, comprises:
in response to the amount of row pooling output data read reaching the preset number of data in a row of the input feature map, performing the column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the column pooling output data.
In some embodiments, in response to the amount of row pooling output data read reaching the preset number of data in a row of the input feature map, performing the column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the column pooling output data further includes:
storing the row pooling output data used in the current column pooling calculation into the row pooling cache, for use in the next column pooling calculation to obtain the next column pooling output data.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 5, an embodiment of the present invention further provides a computer device 20, which includes a processor 210 and a memory 220; the memory 220 stores a computer program 221 executable on the processor, and the processor 210 performs the following method steps when executing the program:
inputting and storing input feature map data into a pooling computing unit in a one-dimensional row vector form;
reading data of each row pooling calculation from the row vector input data, and sequentially performing the row pooling calculation on the row vector input data based on the row vector sliding window to obtain row pooling output data;
and reading data of each column pooling calculation from the row pooling output data, and sequentially performing column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
In some embodiments, inputting and storing the input feature map data in the form of one-dimensional row vectors to the pooling computing unit further comprises:
storing edge data of the row vector input data to an edge data buffer.
In some embodiments, reading data from the row vector input data for each row pooling calculation further comprises:
in response to the data read for the current row pooling calculation containing edge data of the previous row vector input data, reading the edge data for the current pooling calculation from the edge data cache.
In some embodiments, the steps further comprise:
in response to completion of the calculation using the edge data of the previous row vector input data, storing the edge data of the row vector input data of the current row pooling calculation into the edge data cache.
In some embodiments, reading data for each column pooling calculation from the row pooling output data comprises:
reading the row pooling output data for each column pooling calculation from the row pooling output data, and storing it into the row pooling cache in the form of an input feature map.
In some embodiments, sequentially performing column pooling calculations on the row pooled output data based on a column vector sliding window to obtain column pooled output data, comprises:
in response to the amount of row pooling output data read reaching the preset number of data in a row of the input feature map, performing the column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the column pooling output data.
In some embodiments, in response to the amount of row pooling output data read reaching the preset number of data in a row of the input feature map, performing the column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the column pooling output data further includes:
storing the row pooling output data used in the current column pooling calculation into the row pooling cache, for use in the next column pooling calculation to obtain the next column pooling output data.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 6, an embodiment of the present invention further provides a computer storage medium 30, and the computer storage medium 30 stores a computer program 310 which executes the above method when executed by a processor.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer storage medium, such as a read-only memory, a magnetic disk or an optical disk.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only, and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments described above exist that are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A convolutional neural network pooling calculation method is characterized by comprising the following steps:
receiving data input in a one-dimensional row vector form from the convolutional layer output and storing the data input in a pooling computing unit;
establishing a row pooling cache, reading the data of each row pooling calculation from the row vector input data, and sequentially performing the row pooling calculation on each row vector input data based on a row vector sliding window to obtain row pooling output data stored in the row pooling cache;
in response to the row pooling output data in the row pooling cache satisfying the data size required for the column pooling calculation, reading the data of each column pooling calculation from the row pooling cache, and sequentially performing the column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
2. The method of claim 1, wherein receiving and storing data input in the form of one-dimensional row vectors from convolutional layer output to a pooling computation unit, further comprises:
establishing an edge data cache, and storing the edge data of the row vector input data into the edge data cache.
3. The method of claim 2, wherein reading data from the row vector input data for each row pooling calculation further comprises:
in response to the data read for the current row pooling calculation containing edge data of the previous row vector input data, reading the edge data for the current pooling calculation from the edge data cache.
4. The method of claim 3, further comprising:
in response to completion of the calculation using the edge data of the previous row vector input data, storing the edge data of the row vector input data of the current row pooling calculation into the edge data cache.
5. The method of claim 1, wherein reading data from the row pooled output data for each column pooled calculation comprises:
reading the row pooling output data for each column pooling calculation from the row pooling output data, and storing it into the row pooling cache in the form of an input feature map.
6. The method of claim 5, wherein performing column pooling calculations on the row-pooled output data sequentially based on a column vector sliding window to obtain column-pooled output data comprises:
in response to the amount of row pooling output data read reaching the preset number of data in a row of the input feature map, performing the column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the column pooling output data.
7. The method of claim 6, wherein, in response to the amount of row pooling output data read reaching the preset number of data in a row of the input feature map, performing the column pooling calculation on the currently read data and the data in the row pooling cache based on the column vector sliding window to obtain the current column pooling output data further comprises:
storing the row pooling output data used in the current column pooling calculation into the row pooling cache, for use in the next column pooling calculation to obtain the next column pooling output data.
8. A convolutional neural network pooled computing system, comprising:
an input module configured to receive data input in the form of one-dimensional row vectors from the convolutional layer output and store to the pooling computing unit;
a row pooling calculation module configured to establish a row pooling cache, read the data of each row pooling calculation from the row vector input data, and sequentially perform the row pooling calculation on each row vector input data based on a row vector sliding window to obtain row pooling output data stored in the row pooling cache;
a column pooling calculation module configured to, in response to row pooling output data in the row pooling cache satisfying a required data size for column pooling calculation, read data for each column pooling calculation from the row pooling cache, and sequentially perform column pooling calculation on the row pooling output data based on a column vector sliding window to obtain column pooling output data.
9. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any of claims 1-7.
10. A computer storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the steps of the method according to any of claims 1-7.
CN202111056544.6A 2021-09-09 2021-09-09 Convolutional neural network pooling calculation method, system and storage medium Active CN113743587B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111056544.6A CN113743587B (en) 2021-09-09 2021-09-09 Convolutional neural network pooling calculation method, system and storage medium
PCT/CN2022/078265 WO2023035557A1 (en) 2021-09-09 2022-02-28 Convolutional neural network pooling calculation method and system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111056544.6A CN113743587B (en) 2021-09-09 2021-09-09 Convolutional neural network pooling calculation method, system and storage medium

Publications (2)

Publication Number Publication Date
CN113743587A true CN113743587A (en) 2021-12-03
CN113743587B CN113743587B (en) 2024-02-13

Family

ID=78737555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111056544.6A Active CN113743587B (en) 2021-09-09 2021-09-09 Convolutional neural network pooling calculation method, system and storage medium

Country Status (2)

Country Link
CN (1) CN113743587B (en)
WO (1) WO2023035557A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049885A (en) * 2022-08-16 2022-09-13 之江实验室 Storage and calculation integrated convolutional neural network image classification device and method
WO2023035557A1 (en) * 2021-09-09 2023-03-16 苏州浪潮智能科技有限公司 Convolutional neural network pooling calculation method and system, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763612A (en) * 2018-04-02 2018-11-06 复旦大学 A kind of pond layer of neural network accelerates the method and circuit of operation
CN109214250A (en) * 2017-07-05 2019-01-15 中南大学 A kind of static gesture identification method based on multiple dimensioned convolutional neural networks
CN111931918A (en) * 2020-09-24 2020-11-13 深圳佑驾创新科技有限公司 Neural network accelerator
CN112100514A (en) * 2020-08-31 2020-12-18 浙江工业大学 Social network service platform friend recommendation method based on global attention mechanism representation learning
WO2020258529A1 (en) * 2019-06-28 2020-12-30 东南大学 Bnrp-based configurable parallel general convolutional neural network accelerator
CN113361695A (en) * 2021-06-30 2021-09-07 南方电网数字电网研究院有限公司 Convolutional neural network accelerator
WO2023035557A1 (en) * 2021-09-09 2023-03-16 苏州浪潮智能科技有限公司 Convolutional neural network pooling calculation method and system, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11507831B2 (en) * 2020-02-24 2022-11-22 Stmicroelectronics International N.V. Pooling unit for deep learning acceleration



Also Published As

Publication number Publication date
WO2023035557A1 (en) 2023-03-16
CN113743587B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN110458279B (en) FPGA-based binary neural network acceleration method and system
CN108133270B (en) Convolutional neural network acceleration method and device
US20230229931A1 (en) Neural processing apparatus and method with neural network pool processing
CN111401406B (en) Neural network training method, video frame processing method and related equipment
US11989638B2 (en) Convolutional neural network accelerating device and method with input data conversion
CN111310904A (en) Apparatus and method for performing convolutional neural network training
CN113743587A (en) Convolutional neural network pooling calculation method, system and storage medium
CN111414994A (en) FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
US20220083857A1 (en) Convolutional neural network operation method and device
CN108717571B (en) Acceleration method and device for artificial intelligence
CN111582465B (en) Convolutional neural network acceleration processing system and method based on FPGA and terminal
US20190311266A1 (en) Device and method for artificial neural network operation
CN108875914B (en) Method and device for preprocessing and post-processing neural network data
CN112668708A (en) Convolution operation device for improving data utilization rate
CN112005251A (en) Arithmetic processing device
KR20230081697A (en) Method and apparatus for accelerating dilatational convolution calculation
JP2023541350A (en) Table convolution and acceleration
CN114764615A (en) Convolution operation implementation method, data processing method and device
CN109685208B (en) Method and device for thinning and combing acceleration of data of neural network processor
CN109598335B (en) Two-dimensional convolution pulse array structure and implementation method
CN115238863A (en) Hardware acceleration method, system and application of convolutional neural network convolutional layer
CN114003201A (en) Matrix transformation method and device and convolutional neural network accelerator
CN110738317A (en) FPGA-based deformable convolution network operation method, device and system
CN111882028B (en) Convolution operation device for convolution neural network
CN112905526B (en) FPGA implementation method for multiple types of convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant