CN113379046B - Acceleration calculation method for convolutional neural network, storage medium and computer equipment

Info

Publication number
CN113379046B
CN113379046B (application CN202010158212.8A)
Authority
CN
China
Prior art keywords
convolution
elements
data
frame data
reuse
Prior art date
Legal status
Active
Application number
CN202010158212.8A
Other languages
Chinese (zh)
Other versions
CN113379046A (en)
Inventor
陈伟光
王峥
喻之斌
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202010158212.8A
Publication of CN113379046A
Application granted
Publication of CN113379046B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an acceleration calculation method for a convolutional neural network, which comprises the following steps: acquiring a plurality of frames of input data to be processed, wherein at least one group of adjacent frame data exists in the plurality of frames of input data, and the adjacent frame data comprises previous frame data and next frame data; sequentially performing convolution calculation on the plurality of frames of input data, wherein, when the convolution calculation is performed on the next frame data of the adjacent frame data, it is judged whether a time reuse element exists in the convolution window of the next frame data located at the same position as a convolution window of the previous frame data, a time reuse element being an element that is identical to, and located at the same position in the convolution window as, an element of the previous frame data; and, if the time reuse element exists, using the convolution value of the same element located at the same position as the time reuse element in the convolution window of the previous frame data as the convolution value of the time reuse element. By identifying the time reuse elements and the spatial reuse elements in advance, repeated calculation is avoided and the amount of computation is reduced.

Description

Acceleration calculation method for convolutional neural network, storage medium and computer equipment
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an acceleration calculation method for a convolutional neural network, a computer-readable storage medium and computer equipment.
Background
With the advent of the big-data era, deep convolutional neural networks, which have more hidden layers, possess more complex network structures and stronger feature-learning and feature-expression capabilities than traditional machine learning methods. Since their introduction, deep convolutional neural networks have achieved remarkable results in computer vision, speech recognition and natural language processing. To improve accuracy, ever deeper network structures have been designed, but the number of parameters and the amount of computation have increased sharply, placing high demands on the data bandwidth and computing power of the hardware platform.
Modern neural network accelerators increase computing power mainly by raising the operating frequency and adding computing units. Such accelerators already face problems such as low utilization of the computing units and poor scalability, and raising the operating frequency or adding computing units inevitably increases system power consumption.
Disclosure of Invention
(I) Technical problem solved by the invention
The technical problem solved by the invention is how to reduce the actual amount of computation so as to reduce system power consumption.
(II) Technical solution adopted by the invention
An acceleration computing method of a convolutional neural network, the acceleration computing method comprising:
acquiring a plurality of frames of input data to be processed, wherein at least one group of adjacent frame data exists in the plurality of frames of input data, and the adjacent frame data comprises previous frame data and next frame data;
sequentially performing convolution calculation on the plurality of frames of input data, wherein, when the convolution calculation is performed on the next frame data of the adjacent frame data, it is judged whether a time reuse element exists in the convolution window of the next frame data located at the same position as a convolution window of the previous frame data of the adjacent frame data, a time reuse element being an element that is identical to, and located at the same position in the convolution window as, an element of the previous frame data;
if the time reuse element exists, the convolution value of the same element located at the same position as the time reuse element in the convolution window of the previous frame data is used as the convolution value of the time reuse element.
Preferably, the acceleration calculation method further includes:
judging whether all elements in the convolution window of the next frame data are time reuse elements, and if not, performing convolution calculation on the other elements in the convolution window of the next frame data except the time reuse elements.
Preferably, the specific method for performing convolution calculation on the previous frame data of the adjacent frame data comprises the following steps:
judging whether a spatial reuse element exists in the N-th convolution window of the previous frame data, wherein 1 < N ≤ M, M is the index of the last convolution window of the previous frame data, and a spatial reuse element is an element of the N-th convolution window that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows;
if the spatial reuse element exists, the convolution value of the same element located at the same position as the spatial reuse element in the previous N-1 convolution windows is used as the convolution value of the spatial reuse element.
Preferably, the acceleration calculation method further includes:
judging whether all elements of the N-th convolution window are spatial reuse elements, and if not, performing convolution calculation on the other elements in the N-th convolution window except the spatial reuse elements.
Preferably, the acceleration calculation method further includes:
judging whether all elements in the N-th convolution window of the next frame data are time reuse elements, wherein 1 < N ≤ M, and M is the index of the last convolution window of the next frame data;
if not, judging whether a spatial reuse element exists among the elements of the N-th convolution window of the next frame data other than the time reuse elements, wherein a spatial reuse element is an element that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows of the next frame data;
if the spatial reuse element exists, the convolution value of the same element located at the same position as the spatial reuse element in the previous N-1 convolution windows is used as the convolution value of the spatial reuse element.
Preferably, the specific method for performing convolution calculation on each frame of input data, other than the adjacent frame data, among the plurality of frames of input data includes:
judging whether a spatial reuse element exists in the N-th convolution window of the frame of input data, wherein 1 < N ≤ M, M is the index of the last convolution window of the frame of input data, and a spatial reuse element is an element that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows;
if the spatial reuse element exists, the convolution value of the same element located at the same position as the spatial reuse element in the previous N-1 convolution windows of the frame of input data is taken as the convolution value of the spatial reuse element.
Preferably, the acceleration calculation method further includes:
judging whether all elements of the N-th convolution window are spatial reuse elements, and if not, performing convolution calculation on the other elements in the N-th convolution window except the spatial reuse elements.
Preferably, the specific method for acquiring the plurality of frames of input data to be processed comprises the following steps:
extracting frames from the original video data according to a preset frame rate to obtain a plurality of picture data;
and respectively preprocessing the plurality of picture data to generate a plurality of frames of input data of the convolutional neural network.
The invention also discloses a computer-readable storage medium storing an acceleration calculation program of a convolutional neural network, wherein the acceleration calculation method of the convolutional neural network described above is implemented when the acceleration calculation program is executed by a processor.
The invention also discloses a computer device comprising a computer-readable storage medium, a processor, and an acceleration calculation program of a convolutional neural network stored in the computer-readable storage medium, wherein the acceleration calculation method of the convolutional neural network described above is implemented when the acceleration calculation program is executed by the processor.
(III) Beneficial effects
The invention discloses an acceleration calculation method for a convolutional neural network which, compared with the traditional calculation method, has the following technical effects:
(1) The method can exploit temporal correlation and spatial correlation simultaneously, and can therefore eliminate more unnecessary calculation.
(2) The method can also achieve acceleration for a single input, which is not possible when only temporal correlation is used.
(3) Parallelism in the time dimension is increased, the weights are multiplexed more extensively, and the throughput is higher.
(4) The method has good scalability: the parallelism of the three dimensions can be chosen according to the physical resources available on the chip, so that those resources are fully utilized. In conclusion, the method can effectively accelerate the operation of a convolutional neural network and thereby reduce energy consumption.
Drawings
FIG. 1 is a flow chart of a method of accelerating computation of a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a calculation process in a time dimension according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a calculation process in a spatial dimension according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method of accelerating computation of a convolutional neural network according to another embodiment of the present invention;
FIG. 5 is a diagram of a space-time multiplexing architecture according to an embodiment of the present invention;
FIG. 6 is a functional block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Before the various embodiments of the present application are described in detail, the inventive concept is briefly explained. In conventional convolutional neural network computation, a convolution window of a given size slides over the image data to be processed; the elements selected by each convolution window are point-multiplied, one by one, with the corresponding weights of the convolution kernel, and the point multiplication results are added to complete the operation of that convolution window. Every frame of data is computed in this way until the convolution calculation of all the input data is finished. When the data volume is large, this convolution process produces an enormous amount of computation and places high demands on the data bandwidth and computing power of the operation platform. During research it was found that consecutive frames of picture data contain a large amount of identical data, i.e. identical data appear in the convolution windows at the same positions of two adjacent frames. Based on this, when the convolution operation is performed on each frame of data, the present application first judges whether the data in the convolution window already appeared in the corresponding convolution window of the previous frame of data; if so, the point multiplication is not performed again for the data that already appeared, and the point multiplication value already computed for the previous frame of data is used directly as the point multiplication value for the current frame of data. A large number of multiplication operations are thereby saved, repeated calculation is avoided and computing resources are conserved.
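For concreteness, the following is a minimal sketch, in Python with NumPy (the patent itself prescribes no particular language or library), of the conventional sliding-window convolution described above, in which every window performs its full set of point multiplications:

    import numpy as np

    def conv2d_direct(frame, kernel):
        """Conventional sliding-window convolution (stride 1, no padding):
        every element of every window is point-multiplied with its weight."""
        kh, kw = kernel.shape
        oh, ow = frame.shape[0] - kh + 1, frame.shape[1] - kw + 1
        out = np.zeros((oh, ow), dtype=np.float64)
        for i in range(oh):
            for j in range(ow):
                window = frame[i:i + kh, j:j + kw]
                out[i, j] = np.sum(window * kernel)  # kh*kw multiplications per window
        return out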
Specifically, as shown in FIG. 1, the acceleration calculation method of the convolutional neural network of the present application includes the following steps:
step S10: and acquiring a plurality of frames of input data to be processed, wherein at least one group of adjacent frame data exists in the plurality of frames of input data, and the adjacent frame data comprises the previous frame data and the next frame data.
As a preferred embodiment, the specific method for acquiring the plurality of frames of input data to be processed is as follows: frames are extracted from the original video data at a preset frame rate to obtain a plurality of pieces of picture data, and each piece of picture data is preprocessed to generate one frame of input data of the convolutional neural network. The preprocessing includes scaling and normalization, both of which are prior art and are not described here. Further, the adjacent frame data comprises previous frame data and next frame data, which are the feature data extracted from two adjacent frames of pictures respectively. Because video images are generally continuous, two adjacent frames of pictures are highly similar, so repeated data exist between the previous frame data and the next frame data.
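As an illustration of this acquisition step, the sketch below uses OpenCV to sample, resize and scale frames from a video file; the sampling stride, the input size and the simple division-by-255 normalization are illustrative assumptions rather than values fixed by the patent:

    import cv2
    import numpy as np

    def extract_input_frames(video_path, frame_stride=5, size=(224, 224)):
        """Sample frames from a video, then resize and scale each one into
        CNN input data. The stride, input size and normalization used here
        are illustrative choices, not values fixed by the patent."""
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, img = cap.read()
            if not ok:
                break
            if idx % frame_stride == 0:
                img = cv2.resize(img, size)
                frames.append(img.astype(np.float32) / 255.0)  # scale to [0, 1]
            idx += 1
        cap.release()
        return frames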
Step S20: and carrying out convolution calculation on the plurality of frames of input data in sequence, wherein when carrying out convolution calculation on the next frame data in the adjacent frame data, judging whether time reuse elements exist in a convolution window of the next frame data and the previous frame data in the adjacent frame data at the same position, and the time reuse elements are elements which are in the same position and the same in the convolution window.
Specifically, a conventional convolution calculation method is used to process each frame of input data in sequence. When the next frame data is processed, the previous frame data has already been convolved. Therefore, in order to reduce the amount of computation for the next frame data, it is judged whether a time reuse element exists in the convolution window of the next frame data located at the same position as a convolution window of the previous frame data of the adjacent frame data, a time reuse element being an element that is identical to, and located at the same position in the convolution window as, an element of the previous frame data. For example, as shown in FIG. 2, the previous frame data and the next frame data each contain 9 elements. The first convolution window of the previous frame data contains the four elements x0, y0, g0 and h0, and the first convolution window of the next frame data contains the four elements x1, y1, g1 and h1. Here x1 and x0 occupy the same position in the convolution window; if x1 equals x0, then x1 is a time reuse element. Similarly, y1 and y0 occupy the same position in the convolution window; if y1 equals y0, then y1 is a time reuse element. In the same way, this judgment is made in advance for every element of every convolution window of the next frame data in order to identify the time reuse elements. It should be noted that FIG. 2 is merely an example; in other embodiments the convolution window may have a different size, i.e. each convolution window may select a different number of elements.
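A minimal sketch of this identification step (the function name and array representation are assumptions; the patent describes the judgment, not an implementation): the time reuse elements of a window can be marked by an element-wise comparison with the window at the same position in the previous frame data:

    import numpy as np

    def time_reuse_mask(prev_window, curr_window):
        """Mark the time reuse elements of one window: positions whose value
        equals the value at the same position in the previous frame's window
        (compare FIG. 2). Positions marked True need no new multiplication."""
        return np.equal(prev_window, curr_window)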
Step S30: if the time reuse element exists, the convolution value of the same element which is in the same position as the time reuse element in the convolution window of the previous frame data is used as the convolution value of the time reuse element.
Illustratively, as shown in FIG. 2, assume that the convolution kernel contains the four weights a, b, c and d. With the conventional convolution calculation method, the result of the first convolution window of the previous frame data is O11 = a*x0 + b*y0 + c*g0 + d*h0, and the result of the first convolution window of the next frame data is O21 = a*x1 + b*y1 + c*g1 + d*h1. With the calculation method of the present application, the result of the first convolution window of the next frame data is rewritten as O21 = O11 + a*(x1-x0) + b*(y1-y0) + c*(g1-g0) + d*(h1-h0). Suppose that in the first convolution window of the next frame data x1 and y1 are time reuse elements while g1 and h1 are not; then the terms a*(x1-x0) and b*(y1-y0) are both zero, i.e. the point multiplications x1*a and y1*b are skipped and the already computed point multiplication values x0*a and y0*b are used directly. For the convolution calculation of the first convolution window of the next frame data, two multiplication operations are therefore saved, reducing the overall amount of computation.
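The incremental formula above translates directly into code. The sketch below (the function name, argument layout and use of NumPy arrays are assumptions) updates the previous frame's window result only at positions whose element changed, so every time reuse element costs no multiplication:

    import numpy as np

    def window_conv_temporal_reuse(prev_window, curr_window, kernel, prev_value):
        """Compute O21 from O11 by the correction
        O21 = O11 + sum_k w_k * (x1_k - x0_k),
        skipping every term whose element is a time reuse element (x1_k == x0_k),
        so only the changed positions cost a multiplication."""
        value = prev_value
        for w, x0, x1 in zip(kernel.ravel(), prev_window.ravel(), curr_window.ravel()):
            if x1 != x0:                  # not a time reuse element
                value += w * (x1 - x0)    # one multiplication per changed element
        return value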
Further, the acceleration method also includes: judging whether all elements in the convolution window of the next frame data are time reuse elements, and if not, performing convolution calculation on the other elements in that convolution window except the time reuse elements. Illustratively, as shown in FIG. 2 and following the description above, suppose that in the first convolution window of the next frame data x1 and y1 are time reuse elements while g1 and h1 are not, i.e. the terms c*(g1-g0) and d*(h1-h0) are not zero; a normal convolution operation is then performed for g1 and h1. In this way the complete calculation of each convolution window is still obtained, while data reuse in the time dimension speeds up the calculation, reduces the amount of computation and lowers the energy consumption of the system.
As a preferred embodiment, in order to further reduce the amount of computation, the specific method for performing convolution calculation on the previous frame data of the adjacent frame data includes the following steps:
Step S11: judging whether a spatial reuse element exists in the N-th convolution window of the previous frame data, wherein 1 < N ≤ M, M is the index of the last convolution window of the previous frame data, and a spatial reuse element is an element of the N-th convolution window that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows.
Illustratively, as shown in FIG. 3, the first convolution window of the previous frame data contains the four elements x0, y0, g0 and h0, and the second convolution window contains y0, z0, h0 and I0. The element y0 of the second convolution window occupies the same position in the window as x0 in the first convolution window; if y0 equals x0, then y0 is a spatial reuse element. Likewise, z0 occupies the same position in the window as y0 in the first convolution window; if z0 equals y0, then z0 is a spatial reuse element. In the same way, this judgment is made in advance for every element of the second convolution window of the previous frame data in order to identify the spatial reuse elements.
As another example, the third convolution window of the previous frame data contains the four elements g0, h0, m0 and n0. The element g0 occupies the same position in the window as x0 in the first window and as y0 in the second window; if g0 equals at least one of x0 and y0, then g0 is a spatial reuse element. Likewise, h0 occupies the same position in the window as y0 in the first window and as z0 in the second window; if h0 equals at least one of y0 and z0, then h0 is a spatial reuse element. In the same way, this judgment is made in advance for every element of the third convolution window of the previous frame data in order to identify the spatial reuse elements.
Step S12: if the spatial reuse element exists, the convolution value of the same element which is positioned at the same position as the spatial reuse element in the first N-1 convolution windows is used as the convolution value of the spatial reuse element.
By way of example, with the conventional convolution calculation method, the result of the first convolution window is O11 = a*x0 + b*y0 + c*g0 + d*h0 and the result of the second convolution window is O12 = a*y0 + b*z0 + c*h0 + d*I0. With the calculation method of the present application, the result of the second convolution window is rewritten as O12 = O11 + a*(y0-x0) + b*(z0-y0) + c*(h0-g0) + d*(I0-h0). Suppose that in the second convolution window y0 and z0 are spatial reuse elements while h0 and I0 are not; then the terms a*(y0-x0) and b*(z0-y0) are both zero, i.e. the point multiplications y0*a and z0*b are skipped and the already computed point multiplication values x0*a and y0*b are used directly. For the convolution calculation of the second convolution window, two multiplication operations are therefore saved, reducing the overall amount of computation. Similarly, in other embodiments the same calculation method can be applied to the third convolution window, the fourth convolution window, ..., the N-th convolution window, saving computation in the same way.
Further, the acceleration method also includes: judging whether all elements of the N-th convolution window are spatial reuse elements, and if not, performing convolution calculation on the other elements in the N-th convolution window except the spatial reuse elements. Illustratively, as shown in FIG. 3 and following the description above, suppose that in the second convolution window y0 and z0 are spatial reuse elements while h0 and I0 are not, i.e. the terms c*(h0-g0) and d*(I0-h0) are not zero; a normal convolution operation is then performed for h0 and I0 respectively. In this way the complete calculation of the second convolution window is still obtained, while data reuse in the spatial dimension speeds up the calculation, reduces the amount of computation and reduces the energy consumption of the system.
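Putting the spatial-dimension reuse together over a whole frame gives the following sketch. It only compares each window with the window immediately to its left (the patent allows comparison with any of the previous N-1 windows), and the dense NumPy representation is an implementation assumption:

    import numpy as np

    def conv2d_spatial_reuse(frame, kernel):
        """Convolution over one frame in which each window after the first of a
        row is obtained by correcting the value of the window to its left:
        partial products are recomputed only at positions whose element differs
        from the element at the same kernel position in that neighbouring window."""
        kh, kw = kernel.shape
        oh, ow = frame.shape[0] - kh + 1, frame.shape[1] - kw + 1
        out = np.zeros((oh, ow), dtype=np.float64)
        for i in range(oh):
            for j in range(ow):
                curr = frame[i:i + kh, j:j + kw]
                if j == 0:
                    out[i, j] = np.sum(curr * kernel)          # full computation
                else:
                    prev = frame[i:i + kh, j - 1:j - 1 + kw]   # window to the left
                    value = out[i, j - 1]
                    for w, p, c in zip(kernel.ravel(), prev.ravel(), curr.ravel()):
                        if c != p:                             # not a spatial reuse element
                            value += w * (c - p)
                    out[i, j] = value
        return out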
Further, the specific method for performing convolution calculation on each frame of input data other than the adjacent frame data comprises the following steps: judging whether a spatial reuse element exists in the N-th convolution window of the frame of input data, wherein 1 < N ≤ M, M is the index of the last convolution window of the frame of input data, and a spatial reuse element is an element that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows; if the spatial reuse element exists, the convolution value of the same element located at the same position as the spatial reuse element in the previous N-1 convolution windows of the frame of input data is taken as the convolution value of the spatial reuse element. The specific procedure can refer to step S11 and step S12 and is not repeated here.
In another embodiment, in order to combine the time dimension and the space dimension more closely in the calculation process, as shown in FIG. 4, the acceleration method further includes:
step S40: and judging whether all elements in the N-th convolution window of the next frame of data are time reuse elements, wherein 1<N is less than or equal to M, and M is the last convolution window of the next frame of data.
Specifically, the judging method in step S40 can refer to the description of step S20 above, and is not repeated here.
Step S50: if not, judging whether a spatial reuse element exists among the elements of the N-th convolution window of the next frame data other than the time reuse elements, wherein a spatial reuse element is an element that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows of the next frame data.
Specifically, the judging method in step S50 can refer to step S11 and is not repeated here. Illustratively, the second convolution window of the next frame data contains the four elements y1, z1, h1 and I1. Assume that y1 and z1 are time reuse elements while h1 and I1 are not; then it is further judged, for h1 and I1, whether they are spatial reuse elements, again with reference to the method of step S11.
Step S60: if the spatial reuse element exists, the convolution value of the same element which is positioned at the same position as the spatial reuse element in the first N-1 convolution windows is used as the convolution value of the spatial reuse element.
Illustratively, for the second convolution window of the next frame data, assume that h1 is a spatial reuse element and I1 is not. The element g1, which is located at the same position as h1 in the first convolution window of the next frame data, has the convolution value c*g1, and this value is taken as the convolution value of h1, while a normal point multiplication is performed for the element I1. Thus, for the second convolution window of the next frame data, no repeated point multiplication is performed for the two time reuse elements y1 and z1 or for the spatial reuse element h1; only the element I1 requires a normal point multiplication. Compared with the traditional convolution calculation method, three point multiplications are saved, which greatly reduces the amount of calculation and saves system resources.
The acceleration calculation method of the convolutional neural network in this embodiment identifies the time reuse elements and the spatial reuse elements in advance, thereby avoiding repeated calculation of these elements and reducing the amount of computation.
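The combined use of the two kinds of reuse can be sketched per window as follows. Here the per-position products of the previous frame's window and of an earlier window of the current frame are assumed to have been cached (prev_frame_products and earlier_products are hypothetical names); the patent does not fix such an implementation:

    import numpy as np

    def window_conv_spacetime_reuse(kernel, curr_win,
                                    prev_frame_win, prev_frame_products,
                                    earlier_win, earlier_products):
        """Assemble one window's convolution value position by position:
        reuse the cached product of the previous frame's window where the
        element is unchanged (time reuse), otherwise reuse the cached product
        of an earlier window of the current frame at the same kernel position
        (space reuse), and only multiply for the remaining elements.
        The caches are assumed to hold w_k * element_k for each position."""
        k = kernel.ravel()
        c = curr_win.ravel()
        t = prev_frame_win.ravel()
        s = earlier_win.ravel()
        products = np.empty(k.size, dtype=np.float64)
        for i in range(k.size):
            if c[i] == t[i]:                      # time reuse element
                products[i] = prev_frame_products[i]
            elif c[i] == s[i]:                    # spatial reuse element
                products[i] = earlier_products[i]
            else:                                 # normal point multiplication
                products[i] = k[i] * c[i]
        return products.sum(), products           # value, plus cache for later windows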
As a preferred embodiment, in order to illustrate more vividly how the time dimension and the space dimension are combined in the acceleration calculation method of this embodiment, a space-time multiplexing architecture model is designed, as shown in FIG. 5. The architecture is a three-dimensional array of computing units (PEs); for simplicity of representation, the parallelism of the time dimension in the figure is 4, the parallelism of the space dimension is 4, and the parallelism of the channel dimension is 2. This means that the architecture can process 4 input pictures simultaneously to exploit temporal correlation, that 4 sliding windows of each input picture can be processed simultaneously to exploit spatial correlation, and that each sliding window is point-multiplied with 2 sets of weights simultaneously to increase output parallelism. Within one time slice, i.e. one two-dimensional plane of the array, the PEs of each column share the same sliding window and the PEs of each row share the same weights, so that both the sliding windows and the weights are multiplexed. The calculation methods of the time dimension and the space dimension have been described above and are not repeated here.
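As a purely software illustration of that mapping (the sizes T=4, S=4, C=2 follow the figure, while the kernel size and the dense einsum evaluation are assumptions; the patent describes a hardware PE array):

    import numpy as np

    # Schematic software view of the PE-array mapping: 4 frames in parallel
    # (time), 4 sliding windows per frame (space), 2 weight sets per window
    # (output channel). Kernel size and random data are placeholders.
    T, S, C, K = 4, 4, 2, 3
    windows = np.random.rand(T, S, K, K)   # the windows currently mapped onto the array
    kernels = np.random.rand(C, K, K)      # weights shared along each row of PEs

    # Each (t, s, c) triple corresponds to one PE; within one time slice the
    # PEs of a column share the window and the PEs of a row share the weights.
    outputs = np.einsum('tsij,cij->tsc', windows, kernels)
    print(outputs.shape)                   # (4, 4, 2): one partial result per PE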
The application also discloses a computer readable storage medium, wherein the computer readable storage medium stores an acceleration calculation program of the convolutional neural network, and the acceleration calculation method of the convolutional neural network is realized when the acceleration calculation program of the convolutional neural network is executed by a processor.
The application also discloses a computer device which, at the hardware level and as shown in FIG. 6, comprises a processor 12, an internal bus 13, a network interface 14 and a computer-readable storage medium 11. The processor 12 reads the corresponding computer program from the computer-readable storage medium and then runs it, forming the corresponding processing apparatus at the logical level. Of course, besides software implementations, one or more embodiments of the present specification do not exclude other implementations, such as logic devices or combinations of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices. The computer-readable storage medium 11 stores an acceleration calculation program of a convolutional neural network which, when executed by the processor, implements the acceleration calculation method of a convolutional neural network described above.
Computer-readable storage media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
While certain embodiments have been shown and described, it would be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

Claims (4)

1. An acceleration computing method of a convolutional neural network, characterized in that the acceleration computing method comprises:
acquiring a plurality of frames of input data to be processed, wherein at least one group of adjacent frame data exists in the frames of input data, and the adjacent frame data comprises a previous frame data and a subsequent frame data;
sequentially performing convolution calculation on the plurality of frames of input data, wherein, when the convolution calculation is performed on the next frame data of the adjacent frame data, it is judged whether a time reuse element exists in the convolution window of the next frame data located at the same position as a convolution window of the previous frame data of the adjacent frame data, a time reuse element being an element that is identical to, and located at the same position in the convolution window as, an element of the previous frame data, wherein a convolution window is a pixel area that corresponds to the convolution kernel and contains a plurality of elements, and a time reuse element is one of two identical elements located at the same position of the same pixel area in two adjacent frames;
if the time reuse element exists, the convolution value of the same element which is positioned at the same position as the time reuse element in the convolution window of the previous frame data is used as the convolution value of the time reuse element;
the specific method for carrying out convolution calculation on the previous frame data in the adjacent frame data comprises the following steps:
judging whether a space reuse element exists in the N-th convolution window of the previous frame data, wherein 1 < N ≤ M, M is the index of the last convolution window of the previous frame data, the space reuse element is an element of the N-th convolution window that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows, i.e. a space reuse element is one of two identical elements in two adjacent pixel areas that correspond to the same weight of the convolution kernel;
if the space reuse element exists, the convolution value of the element which is positioned at the same position as the space reuse element in the first N-1 convolution windows is used as the convolution value of the space reuse element;
the acceleration calculation method further includes: judging whether all elements in a convolution window of the next frame of data are time reuse elements, if not, carrying out convolution calculation on other elements except the time reuse elements in the convolution window of the next frame of data;
the acceleration calculation method further includes: judging whether all elements of the N-th convolution window are spatial reuse elements, if not, carrying out convolution calculation on other elements except the spatial reuse elements in the N-th convolution window;
the acceleration calculation method further includes: judging whether all elements in the N-th convolution window of the next frame data are time reuse elements, wherein 1 < N ≤ M, and M is the index of the last convolution window of the next frame data;
if not, judging whether a spatial reuse element exists in an N-th convolution window of the later frame data except for a time reuse element, wherein the spatial reuse element is an element which is the same as at least one convolution window in the former N-1 convolution windows of the later frame data and is positioned at the same position;
if the space reuse element exists, the convolution value of the element which is positioned at the same position as the space reuse element in the first N-1 convolution windows is used as the convolution value of the space reuse element;
the specific method for carrying out convolution calculation on each frame of input data except adjacent frame data in the plurality of frames of input data comprises the following steps:
judging whether a space reuse element exists in the N-th convolution window of each frame of input data, wherein 1 < N ≤ M, M is the index of the last convolution window of each frame of input data, and the space reuse element is an element that is identical to, and located at the same position as, an element in at least one of the previous N-1 convolution windows;
if the space reuse element exists, the convolution value of the same element which is positioned at the same position as the space reuse element in the first N-1 convolution windows of the input data of each frame is used as the convolution value of the space reuse element;
the acceleration calculation method further includes:
and judging whether all elements of the N-th convolution window are spatial reuse elements, and if not, carrying out convolution calculation on other elements except the spatial reuse elements in the N-th convolution window.
2. The acceleration computing method of a convolutional neural network according to claim 1, wherein the specific method for acquiring a plurality of frames of input data to be processed comprises:
extracting frames from the original video data according to a preset frame rate to obtain a plurality of picture data;
and respectively preprocessing the plurality of picture data to generate a plurality of frames of input data of the convolutional neural network.
3. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an acceleration calculation program of a convolutional neural network, which when executed by a processor, implements the acceleration calculation method of a convolutional neural network according to any one of claims 1 to 2.
4. A computer device comprising a computer readable storage medium, a processor and an acceleration calculation program of a convolutional neural network stored in the computer readable storage medium, which when executed by the processor, implements the acceleration calculation method of a convolutional neural network according to any one of claims 1 to 2.
CN202010158212.8A 2020-03-09 2020-03-09 Acceleration calculation method for convolutional neural network, storage medium and computer equipment Active CN113379046B (en)

Priority Applications (1)

Application Number: CN202010158212.8A (granted as CN113379046B); Priority date: 2020-03-09; Filing date: 2020-03-09; Title: Acceleration calculation method for convolutional neural network, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number: CN202010158212.8A (granted as CN113379046B); Priority date: 2020-03-09; Filing date: 2020-03-09; Title: Acceleration calculation method for convolutional neural network, storage medium and computer equipment

Publications (2)

Publication Number / Publication Date:
CN113379046A (en): 2021-09-10
CN113379046B (en): 2023-07-11

Family

ID=77568550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010158212.8A Active CN113379046B (en) 2020-03-09 2020-03-09 Acceleration calculation method for convolutional neural network, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN113379046B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765247A (en) * 2018-05-15 2018-11-06 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2332072A1 (en) * 2008-09-10 2011-06-15 Co-operative Research Centre For Advanced Automotive Technology Ltd. Method and device for computing matrices for discrete fourier transform (dft) coefficients
CN108537330B (en) * 2018-03-09 2020-09-01 中国科学院自动化研究所 Convolution computing device and method applied to neural network
US20190295228A1 (en) * 2018-03-21 2019-09-26 Nvidia Corporation Image in-painting for irregular holes using partial convolutions
CN109726803B (en) * 2019-01-10 2021-06-29 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN110070178B (en) * 2019-04-25 2021-05-14 北京交通大学 Convolutional neural network computing device and method
CN110705687B (en) * 2019-09-05 2020-11-03 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN110659627A (en) * 2019-10-08 2020-01-07 山东浪潮人工智能研究院有限公司 Intelligent video monitoring method based on video segmentation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765247A (en) * 2018-05-15 2018-11-06 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of a convolutional neural network accelerator based on a fast filtering algorithm; Wang Wei, et al.; Journal of Electronics & Information Technology; Vol. 41, No. 11, pp. 2578-2584 *

Also Published As

Publication number Publication date
CN113379046A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN108765247B (en) Image processing method, device, storage medium and equipment
WO2021098362A1 (en) Video classification model construction method and apparatus, video classification method and apparatus, and device and medium
CN111091045A (en) Sign language identification method based on space-time attention mechanism
CN109522902B (en) Extraction of space-time feature representations
US20220083857A1 (en) Convolutional neural network operation method and device
CN110189260B (en) Image noise reduction method based on multi-scale parallel gated neural network
CN112016682B (en) Video characterization learning and pre-training method and device, electronic equipment and storage medium
US20220036167A1 (en) Sorting method, operation method and operation apparatus for convolutional neural network
CN113689517B (en) Image texture synthesis method and system for multi-scale channel attention network
CN117499658A (en) Generating video frames using neural networks
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
WO2022007265A1 (en) Dilated convolution acceleration calculation method and apparatus
CN113792621B (en) FPGA-based target detection accelerator design method
CN112164008A (en) Training method of image data enhancement network, and training device, medium, and apparatus thereof
CN111738276A (en) Image processing method, device and equipment based on multi-core convolutional neural network
CN117273084A (en) Calculation method and device of neural network model, electronic equipment and storage medium
CN115797835A (en) Non-supervision video target segmentation algorithm based on heterogeneous Transformer
CN111340173A (en) Method and system for training generation countermeasure network for high-dimensional data and electronic equipment
CN113379046B (en) Acceleration calculation method for convolutional neural network, storage medium and computer equipment
CN111542837B (en) Three-dimensional convolutional neural network computing device and related products
CN112529064B (en) Efficient real-time semantic segmentation method
CN109191016B (en) Gauss-Jordan factor table method for fast solving node impedance matrix of power system
CN112905954A (en) CNN model convolution operation accelerated calculation method using FPGA BRAM
Liu Lightweight single image super-resolution by channel split residual convolution
CN113298225A (en) Data processing method, audio noise reduction method and neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant