CN111898743A - CNN acceleration method and accelerator - Google Patents

CNN acceleration method and accelerator

Info

Publication number
CN111898743A
Authority
CN
China
Prior art keywords
group
feature vector
denotes
index
eigenvector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010784854.9A
Other languages
Chinese (zh)
Inventor
陈乔乔
刘洪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiutian Ruixin Technology Co ltd
Original Assignee
Shenzhen Jiutian Ruixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiutian Ruixin Technology Co ltd filed Critical Shenzhen Jiutian Ruixin Technology Co ltd
Publication of CN111898743A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a CNN (convolutional neural network) acceleration method and accelerator, relating to the technical field of convolutional neural networks and mainly addressing the technical problems of complex control, repeated reading and writing, and poor expansibility in conventional CNN accelerators. The CNN acceleration method comprises the following steps: inputting initial data and reading it in sequence to obtain a first feature vector group; multiplying and accumulating the convolution kernel with the first feature vector group to obtain a second feature vector group; performing partial-sum accumulation on the second feature vector group to obtain a third feature vector group; and classifying the third feature vector group to obtain a classification result. The invention performs no redundant read-write operations, its data access pattern is very friendly to off-chip DDR, and no read-write efficiency problem exists. The invention is also easy to scale: for high compute requirements, the clock frequency for reading and writing data on the interface side can be raised, or a multi-bank scheme adopted; each port is completely independent of the others, without any dependency.

Description

CNN acceleration method and accelerator
Technical Field
The invention relates to the technical field of convolutional neural networks, in particular to a CNN (convolutional neural network) acceleration method and accelerator.
Background
Convolutional Neural Networks (CNNs) are a class of feed-forward neural networks that include convolution calculations and have a deep structure; they are among the representative algorithms of deep learning. A CNN's artificial neurons respond to surrounding units within part of the coverage range, giving excellent performance on large-scale image processing.
Conventional CNN accelerators and algorithms suffer from complex control, repeated read-write operations, various limitations, and poor expansibility. The present invention therefore optimizes the existing technical solutions.
Disclosure of Invention
One purpose of the present invention is to provide a CNN acceleration method and accelerator that solve the technical problems of complex control, repeated reading and writing, and poor expansibility of CNNs in the prior art. Advantageous effects achievable in preferred embodiments of the present invention are described in detail below.
In order to achieve the purpose, the invention provides the following technical scheme:
the invention relates to a CNN acceleration method, which comprises the following steps:
inputting initial data, and reading the initial data in sequence to obtain a first feature vector group; specifically, scanning the initial data in the order of the channel direction, the horizontal direction, and then the vertical direction to obtain the first feature vector group;
multiplying and accumulating the convolution kernel with the first feature vector group to obtain a second feature vector group; specifically:
ofm_t[h1][w1][m1][i1][j1] = \sum_{k=0}^{C_K - 1} ifm[h][w][k] \times kernel[m][i][j][k]
wherein ofm_t denotes the second feature vector group, ifm denotes the first feature vector group, and kernel denotes the convolution kernel;
h1 denotes the vertical direction index of the second feature vector group, h denotes the vertical direction index of the first feature vector group, and H denotes the maximum value of the vertical direction index of the first feature vector group;
w1 denotes the horizontal direction index of the second feature vector group, w denotes the horizontal direction index of the first feature vector group, and W denotes the maximum value of the horizontal direction index of the first feature vector group;
m1 denotes the channel direction index of the second feature vector group, m denotes the group number index of the convolution kernels, and M denotes the maximum value of the channel direction index of the second feature vector group;
i denotes the vertical direction index of the convolution kernel, and H_K denotes the maximum value of the vertical direction index of the convolution kernel;
j denotes the horizontal direction index of the convolution kernel, and W_K denotes the maximum value of the horizontal direction index of the convolution kernel;
k denotes the channel direction index of the convolution kernel, and C_K denotes the maximum value of the channel direction index of the convolution kernel;
i1 denotes the row index of the second feature vector group, and j1 denotes the column index of the second feature vector group;
accumulating the partial sums of the second feature vector group to obtain a third feature vector group; specifically:
shift-accumulating the second feature vector group to obtain the third feature vector group, expressed as:
ofm_F[h2][w2][m2] = \sum_{i=0}^{H_{K1} - 1} \sum_{j=0}^{W_{K1} - 1} ofm_t[h2 \cdot s + i][w2 \cdot s + j][m2][i][j]
wherein ofm_F denotes the third feature vector group and ofm_t denotes the second feature vector group;
h2 denotes the vertical direction index of the third feature vector group, h1 denotes the vertical direction index of the second feature vector group, and H1 denotes the maximum value of the vertical direction index of the second feature vector group;
w2 denotes the horizontal direction index of the third feature vector group, w1 denotes the horizontal direction index of the second feature vector group, and W1 denotes the maximum value of the horizontal direction index of the second feature vector group;
m2 denotes the channel direction index of the third feature vector group, m1 denotes the channel direction index of the second feature vector group, and M1 denotes the maximum value of the channel direction index of the second feature vector group;
H_K1 denotes the maximum value in the row direction of the second feature vector group, and W_K1 denotes the maximum value in the column direction of the second feature vector group;
s denotes the window stride, set according to actual requirements;
the accumulating of the partial sums of the second feature vector group to obtain the third feature vector group further includes:
storing the partial sums of one window in the row direction in a register;
storing the partial sums of all windows in the row direction in an on-chip RAM;
classifying the third feature vector group to obtain a classification result; specifically:
substituting the third feature vector group into the softmax() function and performing classification to obtain the classification result.
Further, in the step of multiplying and accumulating the convolution kernel with the first feature vector group to obtain the second feature vector group:
the convolution kernel is a constant vector.
Further, in the step of multiplying and accumulating the convolution kernel with the first feature vector group to obtain the second feature vector group:
there is a plurality of convolution kernels.
The present invention also includes a computer-readable storage medium having stored thereon a computer program which, when executed, performs the CNN acceleration method as described above.
The invention also includes a CNN accelerator comprising: a processor, and a memory coupled to the processor, the memory having a computer program stored therein, the computer program, when executed by the processor, performing the CNN acceleration method as described above.
The CNN acceleration method and the CNN accelerator provided by the invention at least have the following beneficial technical effects:
the whole control of the invention is simple, only initial data is needed to be scanned in sequence from the channel direction, the horizontal direction and the vertical direction on the input layer, and no complex control is needed, such as window division (s is 1, s is 2, etc.); in addition, the invention can read the feature vector of the initial data once, the feature vector of the output layer can be output on the corresponding channel, and no redundant read-write operation exists. The data reading and writing of the invention is very friendly to the off-chip DDR, and the problem of reading and writing efficiency does not exist. The invention is easy to expand, and can improve the interface side data reading clock or adopt a multi-bank mode for high calculation force requirement; and each port is completely independent of each other without any dependence.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a CNN acceleration method according to the present invention;
FIG. 2 is a schematic diagram of a second feature vector set according to the present invention;
FIG. 3 is a schematic diagram of a first feature vector set obtained by the present invention;
FIG. 4 is a schematic diagram of the structure of the convolution kernel of the present invention;
FIG. 5 is a schematic diagram of the present invention in which partial sum accumulation is performed;
fig. 6 is a schematic structural diagram of a CNN accelerator according to the present invention.
In fig. 6: 1 - processor; 2 - memory.
Detailed Description
To make the objects, aspects, and advantages of the present invention more apparent, various exemplary embodiments are described below with reference to the accompanying drawings, which form a part hereof and in which the exemplary embodiments are shown by way of illustration. Unless otherwise specified, like numerals in different drawings represent the same or similar elements. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure; they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims. Other embodiments may be used, and structural and functional modifications may be made, without departing from the scope and spirit of the present disclosure. Detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description with unnecessary detail.
Referring to fig. 1, the present invention provides an embodiment of a CNN acceleration method, which includes:
S1: inputting initial data, and reading the initial data in sequence to obtain a first feature vector group;
S2: multiplying and accumulating the convolution kernel and the first feature vector group to obtain a second feature vector group;
S3: accumulating the partial sums of the second feature vector group to obtain a third feature vector group;
S4: classifying the third feature vector group to obtain a classification result.
The CNN itself comprises three layers connected in sequence: an input layer, a convolutional layer, and an output layer. Initial data, preferably a picture, is input at the input layer and read in sequence to obtain a first feature vector group, which is sent to the convolutional layer. In the convolutional layer, the convolution kernel and the first feature vector group are multiplied and accumulated to obtain a second feature vector group, whose partial sums are accumulated to obtain a third feature vector group, which is sent to the output layer; the third feature vector group is then classified to obtain a classification result. The final classification result can distinguish various objects on the picture, such as cats, dogs, characters, flowers, and birds.
The CNN acceleration method provided by the invention is an implementation based on vector x vector operations followed by partial-sum accumulation. The partial sums are formed by taking the result of each vector x vector product of the input feature vector group and accumulating, on the corresponding channel, its contribution to the final output feature vector group; this differs from the partial-sum handling of other schemes. The invention is therefore optimal in DDR efficiency, internal control logic, energy consumption, and computing-power scalability.
Referring specifically to fig. 3, S1 (inputting initial data and reading it in sequence to obtain a first feature vector group) comprises:
scanning the initial data sequentially in the order of the channel direction, the horizontal direction, and the vertical direction to obtain the first feature vector group.
It should be noted that the initial data (the feature map in the figure) is scanned sequentially in the order of channel direction first (the C direction of the coordinate system in fig. 3), horizontal direction second (the W direction), and vertical direction last (the H direction) to obtain the first feature vector group. The feature vectors of each sub-datum in the initial data are scanned sequentially from top to bottom and from left to right, and the feature vectors of all sub-data are assembled into the first feature vector group. For example, if the initial data is a picture, the information of each point on the picture is read sequentially from top to bottom and from left to right; the information of one point across all channel directions forms one feature vector, and traversing the whole picture yields a number of different feature vectors. That is, the picture is scanned in CWH order and converted into a first feature vector group composed of single feature vectors with different coordinates.
the feature vectors are scanned starting from when H is 0:
vector 0 includes the following feature vectors: H0W0C0, H0W0C1, H0W0c2.. H0W0 Cn-1;
vector 1 includes the following feature vectors: H0W1C0, H0W1C1, H0W1C2.. H0W1 Cn-1;
the vectors 2, 3, 4. H0Wn-1C0, H0Wn-1C1.... H0Wn-1 Cn-1;
then, starting to scan the feature vectors when H is 1:
vector 0 includes the following feature vectors: H1W0C0, H1W0C1, h1w0c2.. H1W0 Cn-1;
vector 1 includes the following feature vectors: H1W1C0, H1W1C1, h1w1c2.. H1W1 Cn-1;
the vectors 2, 3, 4. H1Wn-1C0, H1Wn-1C1.... H1Wn-1 Cn-1;
and then, when the H is 2 and the H is 3.
In the input layer, the invention only needs to scan the feature vectors of the initial data in the order of the channel direction, the horizontal direction, and the vertical direction; no complex control is needed, which keeps the control simple.
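For illustration only (this sketch is not part of the original disclosure; the dimensions and names are assumptions), reading the channel direction first, then the horizontal direction, then the vertical direction is exactly a row-major flatten of an (H, W, C) array, so the first feature vector group can be modeled in NumPy as:

    import numpy as np

    H, W, C = 4, 4, 8                                    # illustrative picture dimensions
    feature_map = np.arange(H * W * C).reshape(H, W, C)  # initial data, indexed [h][w][c]

    # C direction first, then W, then H: each (h, w) point contributes one
    # feature vector holding all of its channels, scanned top-to-bottom,
    # left-to-right.
    first_group = feature_map.reshape(H * W, C)

    # vector 0 is H0W0C0..H0W0Cn-1, vector 1 is H0W1C0..H0W1Cn-1, and the
    # first vector of the next row starts at index W.
    assert np.array_equal(first_group[0], feature_map[0, 0, :])
    assert np.array_equal(first_group[1], feature_map[0, 1, :])
    assert np.array_equal(first_group[W], feature_map[1, 0, :])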
Referring specifically to figs. 2 and 4, in S2, the convolution kernel is multiplied and accumulated with the first feature vector group to obtain the second feature vector group, wherein:
the convolution kernel is a constant vector, and there may be a plurality of convolution kernels.
It should be noted that the convolution kernel uses constant vectors that have been trained and finalized on publicly available mass data. For example, the convolution kernels can be extracted from a published deep-learning model, such as a YOLOv3 model trained on the public large-scale image library ImageNet.
As shown in fig. 4, a single convolution kernel (taking kernel_group0 in fig. 4 as an example) is itself also a vector group, and its vectors are likewise read by scanning in the order of channel direction (C direction) first, horizontal direction (W direction) second, and vertical direction (H direction) last.
Also taking the kernel_group0 convolution kernel in fig. 4 as an example, scanning in the CWH direction yields the following vector group:
when H = 0,
vector 0 includes: H0W0C0, H0W0C1, ..., H0W0Cn-1,
vector 1 includes: H0W1C0, H0W1C1, ..., H0W1Cn-1,
and vectors 2, 3, 4, ... follow in the same way, up to H0Wn-1C0, H0Wn-1C1, ..., H0Wn-1Cn-1;
scanning then continues likewise for H = 1, 2, 3, and so on.
If there are multiple convolution kernels, their number in the present invention is M, denoted kernel_group0, kernel_group1, kernel_group2, ..., kernel_groupM-1.
As shown in fig. 2, after the convolution kernel and the first feature vector group are multiplied and accumulated, the second feature vector group is obtained and sent to the register.
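The core operation here is one dot product over the channel direction per combination of input position, kernel position, and kernel group. A minimal sketch of that single multiply-accumulate (the names and sizes below are illustrative assumptions, not from the patent):

    import numpy as np

    C_K = 8                               # channel depth (illustrative)
    ifm_vec = np.random.rand(C_K)         # first-group vector at position (h, w)
    kernel_vec = np.random.rand(C_K)      # kernel vector at position (m, i, j)

    # One multiply-accumulate over k yields one partial sum ofm_t[h][w][m][i][j].
    partial_sum = float(np.dot(ifm_vec, kernel_vec))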
S2: multiplying and accumulating the convolution kernel with the first feature vector group to obtain the second feature vector group, specifically:
ofm_t[h1][w1][m1][i1][j1] = \sum_{k=0}^{C_K - 1} ifm[h][w][k] \times kernel[m][i][j][k], where h1 = h, w1 = w, m1 = m, i1 = i, j1 = j
The simplified expression is:
ofm_t[h][w][m][i][j] = \sum_{k=0}^{C_K - 1} ifm[h][w][k] \times kernel[m][i][j][k]
wherein ofm_t denotes the second feature vector group, ifm denotes the first feature vector group, and kernel denotes the convolution kernel;
h1 denotes the vertical direction index of the second feature vector group, h denotes the vertical direction index of the first feature vector group, and H denotes the maximum value of the vertical direction index of the first feature vector group;
w1 denotes the horizontal direction index of the second feature vector group, w denotes the horizontal direction index of the first feature vector group, and W denotes the maximum value of the horizontal direction index of the first feature vector group;
m1 denotes the channel direction index of the second feature vector group, m denotes the group number index of the convolution kernels, and M denotes the maximum value of the channel direction index of the second feature vector group, where m1 equals the group number index m of the convolution kernel;
i denotes the vertical direction index of the convolution kernel, and H_K denotes the maximum value of the vertical direction index of the convolution kernel;
j denotes the horizontal direction index of the convolution kernel, and W_K denotes the maximum value of the horizontal direction index of the convolution kernel;
k denotes the channel direction index of the convolution kernel, and C_K denotes the maximum value of the channel direction index of the convolution kernel, where the channel direction index of the convolution kernel equals the channel direction index of the first feature vector group;
i1 denotes the row index of the second feature vector group, and j1 denotes the column index of the second feature vector group.
The specific algorithm is as follows:
for (h = 0; h < H; h++)
  for (w = 0; w < W; w++)
    for (m = 0; m < M; m++)
      for (i = 0; i < H_K; i++)
        for (j = 0; j < W_K; j++)
          for (k = 0; k < C_K; k++)
            ofm_t[h][w][m][i][j] += ifm[h][w][k] * kernel[m][i][j][k];
wherein ofm_t denotes a feature vector of the second feature vector group of the present invention, ifm denotes a feature vector of the first feature vector group, and kernel denotes the convolution kernel; h denotes the vertical direction index of the second feature vector group, w its horizontal direction index, and m its channel direction index, where m equals the group number index of the convolution kernel; i denotes the vertical direction index of the kernel, j its horizontal direction index, and k its channel direction index, where k equals the channel direction index of the first feature vector group.
Note that, compared with the standard convolution formula, ofm_t represents the partial sums of the output feature vector group (ofm): relative to the final ofm, i x j partial sums remain to be added, i.e., as many as the w x h size of the convolution kernel. It can also be seen from the above formula that ifm requires no jump-address reading, only sequential reading. The standard convolution formula is as follows:
ofm[h0][w0][m0] = \sum_{i=0}^{H_K - 1} \sum_{j=0}^{W_K - 1} \sum_{k=0}^{C_K - 1} ifm[h0 \cdot s + i][w0 \cdot s + j][k] \times kernel[m0][i][j][k]
wherein ofm denotes the output feature vector group, ifm denotes the first feature vector group (also called the input feature vector group), and kernel denotes the convolution kernel;
h0 denotes the vertical direction index of the output feature vector group, h denotes the vertical direction index of the input feature vector group, and H denotes the maximum value of the vertical direction index of the input feature vector group;
w0 denotes the horizontal direction index of the output feature vector group, w denotes the horizontal direction index of the input feature vector group, and W denotes the maximum value of the horizontal direction index of the input feature vector group;
m0 denotes the channel direction index of the output feature vector group, m denotes the group number index of the convolution kernels, and M denotes the maximum value of the channel direction index of the output feature vector group;
i denotes the vertical direction index of the kernel, and H_K denotes the maximum value of the vertical direction index of the convolution kernel;
j denotes the horizontal direction index of the kernel, and W_K denotes the maximum value of the horizontal direction index of the convolution kernel;
k denotes the channel direction index of the kernel, and C_K denotes the maximum value of the channel direction index of the convolution kernel, where k is the same as the channel direction index of the input feature vector group;
s denotes the window stride.
The specific algorithm is as follows:
for (h0 = 0; h0 <= (H - H_K)/s; h0++)
  for (w0 = 0; w0 <= (W - W_K)/s; w0++)
    for (m0 = 0; m0 < M; m0++)
      for (i = 0; i < H_K; i++)
        for (j = 0; j < W_K; j++)
          for (k = 0; k < C_K; k++)
            ofm[h0][w0][m0] += ifm[h0*s + i][w0*s + j][k] * kernel[m0][i][j][k];
wherein ofm denotes the output feature vector group (output feature map), ifm denotes the input feature vector group (input feature map), and kernel denotes the convolution kernel; h denotes the vertical direction index of the output feature vector group, w its horizontal direction index, and m its channel direction index, where m equals the group number index of the convolution kernel; i denotes the vertical direction index of the kernel, j its horizontal direction index, and k its channel direction index, where k equals the channel direction index of the input feature vector group; s denotes the window stride.
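As a cross-reference, the standard formula can be run directly; the following NumPy sketch (array shapes and names are assumptions for illustration, not taken from the patent) implements it with valid padding and stride s:

    import numpy as np

    def conv_direct(ifm, kernel, s=1):
        # ofm[h0][w0][m0] = sum over i, j, k of
        #   ifm[h0*s + i][w0*s + j][k] * kernel[m0][i][j][k]
        H, W, C = ifm.shape             # input feature map, H x W x C
        M, HK, WK, CK = kernel.shape    # M kernel groups, each H_K x W_K x C_K
        assert C == CK                  # kernel depth equals input channel count
        H0 = (H - HK) // s + 1          # output height (valid padding)
        W0 = (W - WK) // s + 1          # output width
        ofm = np.zeros((H0, W0, M))
        for h0 in range(H0):
            for w0 in range(W0):
                for m0 in range(M):
                    window = ifm[h0 * s:h0 * s + HK, w0 * s:w0 * s + WK, :]
                    ofm[h0, w0, m0] = np.sum(window * kernel[m0])
        return ofm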
S3: accumulating the partial sums of the second feature vector group to obtain a third feature vector group, comprising:
shift-accumulating the second feature vector group to obtain the third feature vector group, expressed as:
ofm_F[h2][w2][m2] = \sum_{i=0}^{H_{K1} - 1} \sum_{j=0}^{W_{K1} - 1} ofm_t[h2 \cdot s + i][w2 \cdot s + j][m2][i][j]
The simplified expression is:
ofm_F[h2][w2][m2] = \sum_{i,j} ofm_t[h2 \cdot s + i][w2 \cdot s + j][m2][i][j]
wherein ofm_F denotes the third feature vector group and ofm_t denotes the second feature vector group;
h2 denotes the vertical direction index of the third feature vector group, h1 denotes the vertical direction index of the second feature vector group, and H1 denotes the maximum value of the vertical direction index of the second feature vector group;
w2 denotes the horizontal direction index of the third feature vector group, w1 denotes the horizontal direction index of the second feature vector group, and W1 denotes the maximum value of the horizontal direction index of the second feature vector group;
m2 denotes the channel direction index of the third feature vector group, m1 denotes the channel direction index of the second feature vector group, and M1 denotes the maximum value of the channel direction index of the second feature vector group;
H_K1 denotes the maximum value in the row direction of the second feature vector group, and W_K1 denotes the maximum value in the column direction of the second feature vector group;
s denotes the window stride, set according to actual requirements.
The specific algorithm is as follows:
for (h2 = 0; h2 <= (H1 - H_K1)/s; h2++)
  for (w2 = 0; w2 <= (W1 - W_K1)/s; w2++)
    for (m2 = 0; m2 < M1; m2++)
      for (i = 0; i < H_K1; i++)
        for (j = 0; j < W_K1; j++)
          ofm_F[h2][w2][m2] += ofm_t[h2*s + i][w2*s + j][m2][i][j];
For example, with s = 1, m = 0, and a kernel of w = h = 3, the shift-accumulation process of step S3 operates as shown in the following table:
ofm_F[0][0][0]=ofm_t[0][0][0][0][0]+ofm_t[0][1][0][0][1]+ofm_t[0][2][0][0][2]+ofm_t[1][0][0][1][0]+ofm_t[1][1][0][1][1]+ofm_t[1][2][0][1][2]+ofm_t[2][0][0][2][0]+ofm_t[2][1][0][2][1]+ofm_t[2][2][0][2][2]
[Table: decomposition of the partial sums ofm_t into the final result ofm_F. The leftmost column gives the row index of ofm_t, the second column the point index within each row, columns 3 to 11 the vector x vector partial sums, and the last column ofm_F; identically shaded entries are the ones accumulated together.]
The table above shows the specific decomposition by which the final result ofm_F is obtained from the partial sums ofm_t: the leftmost column gives the row index of ofm_t; the second column gives the index of each point on each row (00 is the 0th point of row 0, 01 is the 1st point of row 0, and so on); and columns 3 through 11 hold the partial sums obtained by multiplying the input first feature vector group by each feature vector of the convolution kernel (i.e., the vector x vector results).
To obtain the final result ofm_F, i.e., the last column, it is only necessary to add up the vector x vector results marked with the same shading. Clearly, the accumulation divides into partial-sum accumulation within a row and partial-sum accumulation between rows: each row contains 3 identically shaded blocks to be added, over 3 rows in total, so 9 shaded blocks in all are accumulated (diagonally), yielding the final ofm_F.
Therefore, as shown in the above table and in conjunction with fig. 5, the process of accumulating the partial sums of the second feature vector group further comprises two steps:
S31: storing the partial sums of one window in the row direction in a register;
S32: storing the partial sums of all windows in the row direction in on-chip RAM.
If the on-chip RAM capacity is insufficient, the partial sums of all windows in the row direction can be moved into and out of the chip.
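The two-stage scheme (stage 1: vector x vector partial sums; stage 2: shift accumulation) can be sketched end to end as follows. This is a minimal model for verification only, assuming stride s and valid padding; the names and shapes are illustrative, not from the patent:

    import numpy as np

    def conv_partial_sum(ifm, kernel, s=1):
        H, W, C = ifm.shape
        M, HK, WK, CK = kernel.shape
        assert C == CK
        # Stage 1: ofm_t[h][w][m][i][j] = sum_k ifm[h][w][k] * kernel[m][i][j][k];
        # the input is visited strictly sequentially, one vector at a time.
        ofm_t = np.einsum('hwk,mijk->hwmij', ifm, kernel)
        # Stage 2: shift-accumulate each window's partial sums into ofm_F.
        H2 = (H - HK) // s + 1
        W2 = (W - WK) // s + 1
        ofm_F = np.zeros((H2, W2, M))
        for h2 in range(H2):
            for w2 in range(W2):
                for i in range(HK):
                    for j in range(WK):
                        ofm_F[h2, w2, :] += ofm_t[h2 * s + i, w2 * s + j, :, i, j]
        return ofm_F

    # Cross-check against the standard convolution on random data.
    rng = np.random.default_rng(0)
    ifm = rng.random((6, 6, 4))        # H x W x C
    kernel = rng.random((2, 3, 3, 4))  # M x H_K x W_K x C_K
    ref = np.zeros((4, 4, 2))
    for h0 in range(4):
        for w0 in range(4):
            for m0 in range(2):
                ref[h0, w0, m0] = np.sum(ifm[h0:h0 + 3, w0:w0 + 3, :] * kernel[m0])
    assert np.allclose(conv_partial_sum(ifm, kernel), ref)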
The invention performs the multiply-accumulate calculation and the partial-sum accumulation directly on the initial data and the convolution kernels, so data reading and writing are very friendly to off-chip DDR and no read-write efficiency problem arises. The design is easy to scale to high compute requirements: the interface-side data read clock can be raised, or a multi-bank mode adopted, and each port is completely independent of the others, without any dependency.
S4: classifying the third feature vector group to obtain a classification result; the specific implementation is as follows:
the third feature vector group is substituted into the softmax() function and classified to obtain the result.
Softmax () function:
S_i = \frac{e^{V_i}}{\sum_j e^{V_j}}
where V_i is an element of the i-th feature vector of the third feature vector group, V_j is an element of the j-th feature vector of the third feature vector group, j runs over the number of feature vectors in the third feature vector group, and S_i is the classification probability of the picture.
Note that pictures whose final results S_i fall within the same range, or take the same numerical value, are classified into one class. For example, if the probability range of a picture class is 0.3-0.5 and S_i is 0.4, the picture belongs to that class. The end result is therefore a categorization of the input pictures.
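A numerically stable sketch of this classification step (the class labels are illustrative examples taken from the description above, and the scores are made up):

    import numpy as np

    def softmax(v):
        # S_i = exp(V_i) / sum_j exp(V_j); subtracting max(v) avoids overflow
        e = np.exp(v - np.max(v))
        return e / np.sum(e)

    classes = ["cat", "dog", "character", "flower", "bird"]  # example labels
    scores = np.array([1.2, 3.4, 0.5, -0.7, 2.1])            # third-group values
    probs = softmax(scores)                                  # sums to 1.0
    print(classes[int(np.argmax(probs))])                    # prints "dog"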
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed, performs the CNN acceleration method described above.
Referring to fig. 6, the present invention also includes a CNN accelerator, which includes: a processor, and a memory coupled to the processor, the memory having a computer program stored therein, the computer program, when executed by the processor, performing the CNN acceleration method described above.
It should be noted that the processor itself includes registers, and the memory includes RAM. Therefore, the invention optimizes the CNN algorithm and improves the performance efficiency on hardware.
After reading the above description, it will be apparent to one skilled in the art that the features described herein can be implemented by a method, a data processing system, or a computer program product. Accordingly, these features may be embodied entirely in hardware, entirely in software, or in a combination of hardware and software. They may also take the form of a computer program product stored on one or more computer-readable storage media having computer-readable program code segments or instructions embodied therein. The readable storage medium stores various types of data to support operation of the device and may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as an electrostatic hard disk, static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), optical storage, magnetic storage, flash memory, a magnetic or optical disk, and/or combinations thereof.
Specific application methods of the invention are as follows:
First, a garbage classification application of the invention comprises the following steps:
The garbage classification results comprise 5 types: plastic bottles, books, paper boxes, waste batteries, and kitchen garbage. A picture is input into the invention and scanned in the CWH order to obtain a first feature vector group. The convolution kernels are two kernels of 7 × 16 and 6 × 32; the first feature vector group and the convolution kernels are multiplied and accumulated to obtain a second feature vector group (the calculation of formula 1). Partial-sum accumulation (the calculation of formula 2) is then performed on the second feature vector group to obtain a third feature vector group. Finally, the third feature vector group is classified by the softmax() function to obtain a probability value, which is mapped to the corresponding garbage class.
Second, a license plate recognition application of the invention comprises the following steps:
Chinese license plate numbers involve the Chinese characters of 31 provinces, 26 letters, and 10 digits. A license plate picture is captured by a camera or other acquisition device and input into the invention. The license plate picture is scanned in the CWH order to obtain a first feature vector group; the first feature vector group and the convolution kernel are multiplied and accumulated to obtain a second feature vector group (the calculation of formula 1); partial-sum accumulation (the calculation of formula 2) is then performed on the second feature vector group to obtain a third feature vector group; and the third feature vector group is classified by the softmax() function to obtain probability values, which are mapped to the corresponding Chinese characters, letters, and digits to form a readable license plate number, facilitating the subsequent work of traffic police officers.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (5)

1. A CNN acceleration method, comprising:
inputting initial data, and reading the initial data in sequence to obtain a first feature vector group; specifically, scanning the initial data in the order of the channel direction, the horizontal direction, and then the vertical direction to obtain the first feature vector group;
multiplying and accumulating the convolution kernel with the first feature vector group to obtain a second feature vector group; specifically:
ofm_t[h1][w1][m1][i1][j1] = \sum_{k=0}^{C_K - 1} ifm[h][w][k] \times kernel[m][i][j][k]
wherein ofm_t denotes the second feature vector group, ifm denotes the first feature vector group, and kernel denotes the convolution kernel;
h1 denotes the vertical direction index of the second feature vector group, h denotes the vertical direction index of the first feature vector group, and H denotes the maximum value of the vertical direction index of the first feature vector group;
w1 denotes the horizontal direction index of the second feature vector group, w denotes the horizontal direction index of the first feature vector group, and W denotes the maximum value of the horizontal direction index of the first feature vector group;
m1 denotes the channel direction index of the second feature vector group, m denotes the group number index of the convolution kernels, and M denotes the maximum value of the channel direction index of the second feature vector group;
i denotes the vertical direction index of the convolution kernel, and H_K denotes the maximum value of the vertical direction index of the convolution kernel;
j denotes the horizontal direction index of the convolution kernel, and W_K denotes the maximum value of the horizontal direction index of the convolution kernel;
k denotes the channel direction index of the convolution kernel, and C_K denotes the maximum value of the channel direction index of the convolution kernel;
i1 denotes the row index of the second feature vector group, and j1 denotes the column index of the second feature vector group;
accumulating the partial sums of the second feature vector group to obtain a third feature vector group; specifically:
shift-accumulating the second feature vector group to obtain the third feature vector group, expressed as:
ofm_F[h2][w2][m2] = \sum_{i=0}^{H_{K1} - 1} \sum_{j=0}^{W_{K1} - 1} ofm_t[h2 \cdot s + i][w2 \cdot s + j][m2][i][j]
wherein ofm_F denotes the third feature vector group and ofm_t denotes the second feature vector group;
h2 denotes the vertical direction index of the third feature vector group, h1 denotes the vertical direction index of the second feature vector group, and H1 denotes the maximum value of the vertical direction index of the second feature vector group;
w2 denotes the horizontal direction index of the third feature vector group, w1 denotes the horizontal direction index of the second feature vector group, and W1 denotes the maximum value of the horizontal direction index of the second feature vector group;
m2 denotes the channel direction index of the third feature vector group, m1 denotes the channel direction index of the second feature vector group, and M1 denotes the maximum value of the channel direction index of the second feature vector group;
H_K1 denotes the maximum value in the row direction of the second feature vector group, and W_K1 denotes the maximum value in the column direction of the second feature vector group;
s denotes the window stride, set according to actual requirements;
the accumulating of the partial sums of the second feature vector group to obtain the third feature vector group further comprises:
storing the partial sums of one window in the row direction in a register;
storing the partial sums of all windows in the row direction in an on-chip RAM;
classifying the third feature vector group to obtain a classification result; specifically:
substituting the third feature vector group into the softmax() function and performing classification to obtain the classification result.
2. The CNN acceleration method according to claim 1, wherein, in the multiplying and accumulating of the convolution kernel with the first feature vector group to obtain the second feature vector group:
the convolution kernel is a constant vector.
3. The CNN acceleration method according to claim 2, wherein, in the multiplying and accumulating of the convolution kernel with the first feature vector group to obtain the second feature vector group:
there is a plurality of convolution kernels.
4. A computer-readable storage medium, having stored thereon a computer program which, when executed, performs the CNN acceleration method according to any one of claims 1-3.
5. A CNN accelerator, comprising: a processor, and a memory coupled to the processor, the memory having stored therein a computer program that, when executed by the processor, performs the CNN acceleration method of any one of claims 1-3.
CN202010784854.9A 2020-06-02 2020-08-06 CNN acceleration method and accelerator Pending CN111898743A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010487589 2020-06-02
CN2020104875898 2020-06-02

Publications (1)

Publication Number Publication Date
CN111898743A true CN111898743A (en) 2020-11-06

Family

ID=73246567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010784854.9A Pending CN111898743A (en) 2020-06-02 2020-08-06 CNN acceleration method and accelerator

Country Status (1)

Country Link
CN (1) CN111898743A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024040421A1 (en) * 2022-08-23 2024-02-29 Intel Corporation Fractional-bit quantization and deployment of convolutional neural network models


Similar Documents

Publication Publication Date Title
Wilkinson et al. Semantic and verbatim word spotting using deep neural networks
Zheng et al. SIFT meets CNN: A decade survey of instance retrieval
Cakir et al. Online supervised hashing
KR102305568B1 (en) Finding k extreme values in constant processing time
CN101446962B (en) Data conversion method, device thereof and data processing system
WO2013129580A1 (en) Approximate nearest neighbor search device, approximate nearest neighbor search method, and program
CN111914085A (en) Text fine-grained emotion classification method, system, device and storage medium
US11829376B2 (en) Technologies for refining stochastic similarity search candidates
US20190080226A1 (en) Method of designing neural network system
CN106570173B (en) Spark-based high-dimensional sparse text data clustering method
CN114419381A (en) Semantic segmentation method and road ponding detection method and device applying same
CN114399649B (en) Rapid multi-view semi-supervised learning method and system based on learning graph
CN115390565A (en) Unmanned ship dynamic path planning method and system based on improved D-star algorithm
CN111898743A (en) CNN acceleration method and accelerator
Djenouri et al. Deep learning based decomposition for visual navigation in industrial platforms
CN103119606B (en) A kind of clustering method of large-scale image data and device
CN113255892A (en) Method and device for searching decoupled network structure and readable storage medium
US11989553B2 (en) Technologies for performing sparse lifting and procrustean orthogonal sparse hashing using column read-enabled memory
CN116150694A (en) Dynamic graph anomaly detection method
Song et al. Prada: Point cloud recognition acceleration via dynamic approximation
CN109614581A (en) The Non-negative Matrix Factorization clustering method locally learnt based on antithesis
Hamid et al. Supervised learning of salient 2D views of 3D models
CN112733807A (en) Face comparison graph convolution neural network training method and device
CN116777727B (en) Integrated memory chip, image processing method, electronic device and storage medium
US20230334289A1 (en) Deep neural network accelerator with memory having two-level topology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination