CN109978161A - A general convolution-pooling synchronous processing convolution kernel system - Google Patents
A general convolution-pooling synchronous processing convolution kernel system
- Publication number
- CN109978161A (application CN201910268608.5A, also referenced as CN201910268608A)
- Authority
- CN
- China
- Prior art keywords
- input
- convolution
- image
- convolution kernel
- pooling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The invention discloses a general convolution-pooling synchronous processing convolution kernel system, belonging to the field of convolutional neural network acceleration in machine learning. Existing machine learning methods implemented in software suffer from limited computing power and high cost. The present invention implements machine learning in hardware and accelerates convolutional neural networks by processing convolution and pooling synchronously, so that machine learning can be realized quickly, efficiently, and with low power consumption, without any loss of accuracy. The convolution kernels commonly used in existing convolutional neural networks are of fixed size and cannot adapt to varied design needs, whereas the convolution kernel of the present invention allows parameters such as kernel size and stride to be changed, so it can adapt to the design requirements of a wide range of situations.
Description
Technical field
The invention belongs to the field of convolutional neural network acceleration techniques in machine learning.
Technical background
Artificial Intelligence is a major direction of development in the current era and is widely applied in numerous areas such as computing, medicine, biology, and mechanical engineering. Machine Learning, as one of its important branches, has attracted extensive attention and developed rapidly in recent years. By training repeatedly on large numbers of data samples it can achieve excellent results, and it is widely used in fields such as image recognition, object tracking, and speech processing. The Convolutional Neural Network (CNN) is one of the important methods of machine learning and has drawn many researchers to its study. LeNet, AlexNet, and VGG are among its more representative models, and they perform outstandingly in practical applications.
Machine learning arose in the 1950s. After more than a decade of development, its progress stalled from the 1960s to the late 1970s because the computing power of computers at that time was limited. From the late 1970s onward, as computer processing power improved, machine learning entered a second wave of enthusiasm. Today, with the development of computers and big data, machine learning methods have achieved unprecedented growth. However, as data volumes keep increasing and network depths keep growing, the processing power of CPUs can no longer keep up with this development, and GPUs have emerged in their place. Yet although GPUs offer a certain improvement in computing power, their capability is still limited and their cost is high. The current trend is therefore shifting toward implementing machine learning algorithms in hardware, which is fast, low in power consumption, and highly efficient, giving it a bright outlook. Accelerating convolutional neural networks with hardware is likewise an important direction of future development.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a hardware acceleration scheme for convolutional neural networks that accelerates them through synchronous convolution-pooling processing without any loss of accuracy.
A general convolution-pooling synchronous processing convolution kernel system consists of nine processing units, 12 input ports, and 3 output ports. Each processing unit comprises: a weight register, an image register, a memory unit, a multiplication unit, an addition unit, an activation function unit, a pooling unit, a first counting unit, and a second counting unit.
The 12 input ports are respectively: the clock input port clk, the reset port rstn, the weight/bias valid signal port wren, the weight/bias input port wb, the image-input valid signal port pren, the image input port p, the image width input port npx (pixels per row), the image height input port npy (pixels per column), the convolution kernel size input port nc, the stride input port st, the padding size input port pa, and the pooling type input port po.
The 3 output ports are respectively: the convolution result output port r, the result valid signal port rl, and the convolution done signal d.
1) The clock input port clk feeds the system an alternating high/low level signal of constant period for timing. The reset port rstn applies a high-level reset signal to the system, under which each processing unit carries out convolution-pooling synchronous processing.
After the reset completes, the parameters of the convolution and the image are loaded into the system through the image width input port npx, the image height input port npy, the kernel size input port nc, the stride input port st, the padding size input port pa, and the pooling type input port po.
After the system receives a high level on the weight/bias valid signal port wren, the convolution kernel weights and the bias value arriving on the weight/bias input port wb are stored in the weight register. Once the weights and bias have been loaded, the weight/bias valid signal returns to an inactive low level.
2) After the system receives a high level on the image-input valid signal port pren, the image pixel values received on the image input port p are stored in the image register. One pixel of the image is received per clock cycle, and the value in the image register is updated to the pixel just received. Pixel values range from -1 to 1. When all image pixels have been input, the image-input valid signal returns to an inactive low level.
3) As each pixel is received, the pixel value in the image register is multiplied by every weight of the convolution kernel held in the weight register, and the multiplication unit sends an indication signal xd to the counter.
4) When the first counting unit receives the indication signal xd, it determines the position coordinates x, y of the pixel being processed by the multiplication unit from the image width npx and image height npy:
y = [n / npx], x = n - y × npx
where x is the abscissa of the pixel, y its ordinate, n the ordinal number of the pixel counted by the counter, npx the number of pixels per image row, npy the number of pixels per image column, and [ ] denotes rounding down. npx and npy are integers in the range 0 to 1024.
The resulting pixel coordinates x, y are then sent to the memory unit.
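The coordinate recovery in step 4) can be sketched in software as follows. This is an illustrative model only, assuming row-major input order and 0-based pixel counting (the patent shows the formula as an image, so the exact convention is an interpretation):

```python
# Software model of the first counting unit: recover the (x, y) position of
# the n-th pixel from the row width npx. Assumes row-major order and
# 0-based counting of n.
def pixel_coords(n, npx):
    y = n // npx       # "[ ]" in the patent denotes rounding down
    x = n - y * npx    # equivalently n % npx
    return x, y
```

For a 28 × 28 image, pixel 29 lands at column 1 of row 1, consistent with one pixel arriving per clock cycle.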
5) According to the ordinal number of each weight, the padding size obtained from the padding size input port pa, and the pixel coordinates x, y, the memory unit stores the results of the multiplication unit as follows:
ram[m][y + pa][x + pa] = w_m × p_xy
where ram denotes the memory and [][][] its three-dimensional coordinates, w_m is the convolution kernel weight with ordinal m (its value ranging from -1 to 1), p_xy is the pixel value with abscissa x and ordinate y, and pa is the padding size, an integer in the range 0 to 5.
Once all pixels of the whole image have been input and processed in row-column order, the memory unit sends an indication signal cd to the second counting unit.
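The write pattern of step 5) can be modelled as below. The memory layout (nc² planes of size (npy + 2·pa) × (npx + 2·pa), initialized to zero so the padded border contributes nothing) is an assumption consistent with the storage formula, not a statement of the actual RTL:

```python
# Sketch of the memory-unit write: each incoming pixel p (at position x, y)
# is multiplied by every kernel weight w_m in parallel, and each product is
# stored at ram[m][y + pa][x + pa], leaving the zero padding untouched.
def store_products(ram, weights, p, x, y, pa):
    for m, w in enumerate(weights):
        ram[m][y + pa][x + pa] = w * p

# Example sizing: a 3x3 kernel (9 weights) over a 4x4 image with padding 1.
npx = npy = 4
pa = 1
nc = 3
ram = [[[0.0] * (npx + 2 * pa) for _ in range(npy + 2 * pa)]
       for _ in range(nc * nc)]
```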
6) When the second counting unit receives the indication signal cd, it computes the positions of the products required for each convolution from the stride obtained on the stride input port st and the image width and height obtained from npx and npy. The next window position is obtained by advancing the previous window coordinates by the step size, wrapping to the next row when the window would exceed the padded image width.
Here cx and cy are the abscissa and ordinate of the convolution window, cx' and cy' the coordinates of the window at the previous moment, npx and npy the pixels per row and per column of the image, pa the padding size, nc the convolution kernel size, and st the stride. The stride st is an integer in the range 1 to 11.
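The window-advance rule of step 6) is shown only as a figure in the patent; the following is a reconstruction under the stated symbols (advance right by st, wrap to the next row when the nc-wide window would leave the padded width npx + 2·pa):

```python
# Illustrative model of how the second counting unit could advance the
# convolution window. The exact update rule is a reconstruction.
def next_window(cx, cy, st, nc, npx, pa):
    cx2 = cx + st
    if cx2 + nc > npx + 2 * pa:   # window would leave the padded image
        return 0, cy + st         # wrap to the start of the next row band
    return cx2, cy
```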
7) The memory unit retrieves all product data for the four adjacent convolution windows being computed; the number of products needed per window is
n = nc²
where n is the number of products required and nc the convolution kernel size, an integer in the range 1 to 11.
The addition unit then performs the corresponding additions and adds the bias, yielding four results: among the four adjacent windows, a1 is the calculation result of the window at the upper left, a2 of the window at the upper right, a3 of the window at the lower left, and a4 of the window at the lower right. cx and cy are the abscissa and ordinate of each group (the four windows of a group being processed together), st is the stride, b is the bias with values from -1 to 1, and nc is the convolution kernel size.
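The accumulation over one window, and over the four adjacent windows of a group, can be sketched as below. The mapping from weight ordinal m to window offset (row-major, m = dy·nc + dx) is an assumption, since the patent gives the summation only as a figure:

```python
# Sum the nc*nc stored products inside one window at (cx, cy) and add the
# bias b; then evaluate the four adjacent windows of a group, offset by the
# stride st in each axis.
def conv_at(ram, cx, cy, nc, b):
    s = b
    for dy in range(nc):
        for dx in range(nc):
            m = dy * nc + dx          # assumed row-major weight ordinal
            s += ram[m][cy + dy][cx + dx]
    return s

def four_adjacent(ram, cx, cy, st, nc, b):
    a1 = conv_at(ram, cx,      cy,      nc, b)  # upper left
    a2 = conv_at(ram, cx + st, cy,      nc, b)  # upper right
    a3 = conv_at(ram, cx,      cy + st, nc, b)  # lower left
    a4 = conv_at(ram, cx + st, cy + st, nc, b)  # lower right
    return a1, a2, a3, a4
```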
8) The four results are passed through the ReLU activation function: each accumulated result a_i (i = 1 to 4) is compared with 0; values greater than 0 are kept as-is, and values less than or equal to 0 are replaced by 0, i.e.
m_i = max(a_i, 0)
where m_i is the calculation result after the activation function.
9) The four results are fed into the pooling unit, which pools them according to the pooling type received on the pooling type input port po. The type po takes one of two values: 0 for max pooling and 1 for average pooling.
For max pooling, the maximum of the four results is output; for average pooling, their average is output:
r = max(m1, m2, m3, m4) when po = 0, or r = (m1 + m2 + m3 + m4) / 4 when po = 1
where r is the pooling result.
The pooling result is output on the convolution result output port r, and at the same time a result valid signal is output on the result valid signal port rl.
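Steps 8) and 9) together reduce the four window results to one output. A minimal software model, using the code values 0 (max) and 1 (average) that the patent assigns to po:

```python
# Model of the activation and pooling units: apply ReLU to the four window
# results, then reduce them with max pooling (po = 0) or average pooling
# (po = 1).
def relu(a):
    return a if a > 0 else 0.0

def pool4(a1, a2, a3, a4, po):
    m = [relu(a) for a in (a1, a2, a3, a4)]
    return max(m) if po == 0 else sum(m) / 4
```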
10) Based on the size of the image, once all convolutions have been completed the second counting unit outputs a done signal on output port d, the reset signal goes low, the convolution of the current picture ends, and the system prepares to receive the next picture.
Completion is judged from the window coordinates: processing is finished when the convolution windows have covered the padded image in both dimensions. Here cx and cy are the abscissa and ordinate of each group of windows, npx the pixels per image row, npy the number of rows, pa the padding size, nc the convolution kernel size, and st the stride.
Beneficial effects of the invention:
When realizing the LeNet-5 convolutional neural network with an input image of size 28 × 28, a padding size of 1, a 3 × 3 convolution kernel, and a stride of 1, this method needs 980 clock cycles from the start of image input to the computation of all convolution results: 784 cycles to input the image, and 196 cycles to process all results after the image has been input. A conventional design in which convolution and pooling are realized separately needs 1568 clock cycles from the start of image input to the computation of all results: 784 cycles to input the image, and 784 cycles to process all results afterwards. Compared with the conventional method, this design is therefore 37.5% faster over the whole process and 75% faster in the data processing that follows image input. Moreover, a conventional convolution kernel is of fixed size and cannot adapt to varied design needs, whereas the kernel of this design allows parameters such as kernel size and stride to be changed and can adapt to the design requirements of a wide range of situations.
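The quoted cycle counts and speedups follow directly from the stated figures and can be checked arithmetically:

```python
# Verify the cycle counts for a 28x28 input: one cycle per input pixel,
# plus 196 post-input cycles for the fused design versus 784 for the
# separate convolution-then-pooling baseline.
npx = npy = 28
input_cycles = npx * npy              # 784 cycles to stream the image in
fused_total = input_cycles + 196      # 980 cycles total, fused design
baseline_total = input_cycles + 784   # 1568 cycles total, separate design
overall_speedup = 1 - fused_total / baseline_total   # 0.375 -> 37.5%
post_input_speedup = 1 - 196 / 784                   # 0.75  -> 75%
```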
Description of the drawings
Fig. 1 is a flow chart of the general convolution-pooling synchronous processing convolution kernel of the invention;
Fig. 2 is a schematic diagram of its algorithm;
Fig. 3 is a structural diagram of its hardware system;
Fig. 4 is a schematic diagram of its internal system structure;
Fig. 5 is a schematic diagram of its timing simulation.
Specific embodiment
To make the design scheme of the invention clearer, the embodiments of the present invention are described in detail below with reference to the drawings.
The present invention is a hardware-accelerated realization of the forward propagation (inference) of a convolutional neural network, covering its convolutional layers and pooling layers. The formula of a convolutional layer is:
a^l = σ(Σ_{i∈M} a^{l-1}_i * W^l_i + b^l)
where l is the layer number, a^l the output tensor, * denotes convolution, b the bias, M the number of sub-matrices, and σ the activation function, usually ReLU.
The formula of a pooling layer is
a^l = pool(a^{l-1})
where pool denotes the process of shrinking the input tensor according to the pooling region size and the pooling criterion.
The overall design idea of the invention, shown in Fig. 2, is as follows:
First, every weight is multiplied with each input pixel synchronously as the pixel arrives.
When the image input finishes, the results of four adjacent convolutions are completed at the same time; the pooling operation (taking their mean or their maximum) can therefore be applied to these four results directly, achieving the goal of performing convolution synchronously with pooling.
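The fused idea above can be expressed end to end as a software reference. This is a behavioural sketch in plain Python under assumed conventions (zero padding, row-major scan, pooling over 2 × 2 groups of windows offset by the stride), not the patent's hardware pipeline:

```python
# Behavioural reference of fused convolution + ReLU + pooling: evaluate the
# four adjacent window results of each group and pool them immediately.
def conv2d_relu_pool(img, w, b, st=1, pa=1, po=0):
    nc = len(w)
    h, wd = len(img), len(img[0])
    # Zero-pad the image by pa on every side.
    padded = [[0.0] * (wd + 2 * pa) for _ in range(h + 2 * pa)]
    for y in range(h):
        for x in range(wd):
            padded[y + pa][x + pa] = img[y][x]

    def conv_at(cx, cy):
        s = b
        for dy in range(nc):
            for dx in range(nc):
                s += w[dy][dx] * padded[cy + dy][cx + dx]
        return max(s, 0.0)            # ReLU

    out = []
    cy = 0
    # Boundary handling below is an interpretation of the patent's figures.
    while cy + st + nc <= h + 2 * pa:
        row = []
        cx = 0
        while cx + st + nc <= wd + 2 * pa:
            m = [conv_at(cx, cy), conv_at(cx + st, cy),
                 conv_at(cx, cy + st), conv_at(cx + st, cy + st)]
            row.append(max(m) if po == 0 else sum(m) / 4)
            cx += 2 * st
        out.append(row)
        cy += 2 * st
    return out
```

On a 4 × 4 image of ones with a 3 × 3 kernel of ones, zero bias, stride 1, and padding 1, every pooled group contains a fully interior window summing to 9, so max pooling yields a 2 × 2 output of 9s.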
Fig. 4 shows the internal structure of the hardware system of the invention. After the system receives the weight valid signal on the wren port, the convolution kernel weights and bias value arriving on the wb port are stored in the weight register. Thereafter the upstream unit sends the pixel valid signal to the system through the pren port; on this signal the system stores the received pixel in the image register and, at the same time, multiplies the received pixel value by each weight of the convolution kernel in the multiplication unit. When multiplication starts, the multiplication unit sends the indication signal xd to counting unit 1, which determines the position coordinates x, y of the pixel from the width and height of the input image in pixels, and the products are stored at the corresponding memory positions according to the weight ordinal and the padding size. When storage completes, the memory unit sends the signal cd to counting unit 2, which begins computing the positions of the products required for convolution; at the same time, all product data for the four adjacent convolution windows (upper/lower, left/right) are fetched, and the corresponding additions, together with the bias, are carried out in the addition unit, yielding four results. These four results next pass through the ReLU activation function: each is compared with 0, values greater than 0 are kept, and values below 0 are set to 0. The four results are then fed into the pooling unit which, depending on the pooling type on input port po, outputs the maximum of the four results for max pooling or their average for average pooling, asserting the result valid signal rl while the result is output. Meanwhile, counting unit 2 tracks the size of the image; once the computation shows that all convolutions are complete, it outputs the done signal on output port d, the convolution of the current picture ends, and the system prepares to receive the next picture.
Fig. 5 is the circuit simulation of the hardware system of the invention. clk is the clock signal, rstn the reset signal, npx the number of pixels per image row, npy the number of pixels per image column, nc the convolution kernel size, st the stride, pa the padding size, po the pooling type, wren the weight/bias valid signal, wb the weight and bias, pren the image-input valid signal, p the pixel, y1 to yn the multiplication units, xd the valid signal of the first counting unit, x and y the pixel abscissa and ordinate, ram the memory unit, cd the valid signal of the second counting unit, cx and cy the window abscissa and ordinate, m1 to m4 the addition units, r the final convolution result, rl the result valid signal, and d the convolution done signal.
The specific implementation of the invention proceeds according to steps 1) through 10) of the system description given above.
Claims (1)
1. a kind of general convolution-pond synchronization process convolution kernel system, which is characterized in that the system by nine processing units,
12 input ports and 3 output port compositions;Processing unit include: weight register, image register, memory cell,
Multiplication unit, addition unit, activation primitive unit, pond unit, the first counting unit and the second counting unit;
12 input ports are respectively input end of clock mouth clk, reseting port rstn, the effective signal port of input weight biasing
Wren, weight bias input end mouth wb, image input useful signal port pren, image input port p, picture traverse pixel number
Measure input port npx, image length pixel quantity input port npy, convolution kernel size input port nc, step sizes input terminal
Mouth st, filling size input port pa and pond type input port po;
3 output ports are respectively that convolution results output port r, output result useful signal port rl and convolution complete signal d;
1) input end of clock mouth clk is used for timing with the alternating low and high level signal input system of constant time length;By reset terminal
To the reset signal of system input high level, each processing unit carries out at convolution-pond synchronization under the signal designation by mouthful rstn
Reason;
System passes through picture traverse pixel quantity input port npx, image length pixel quantity input port after the completion of reset
Npy, convolution kernel size input port nc, step sizes input port st, filling size input port pa and the input of pond type
Port po is by the parameters input system of convolution sum image;
System is biased after effective signal port wren receives the effective high RST of weight by input weight receiving, will be by weight
In the convolution kernel weighted value and offset value deposit weight register of bias input end mouth wb input, convolution kernel weight and biasing
After input, input weight biasing useful signal becomes invalid low signal;
2) system receives after image inputs effective high RST from image input useful signal port pren, and system will pass through image
In the image pixel numerical value deposit image register that input port p is received, each clock cycle receives a picture of image
Element, while the numerical value of image register being updated to the pixel number received at this time;The value range of pixel number be -1~
1, after image pixel end of input, image input useful signal becomes invalid low signal;
It 3), will be in the convolution kernel in the image pixel numerical value and weight register in image register while receiving pixel
Each weighted value is multiplied, and multiplication unit sends indication signal xd to counter;
4) after the first counting unit receives indication signal xd, according to from picture traverse pixel quantity npx and image length pixel
The image length and width pixel quantity that quantity npy is obtained judges position coordinates x, y of multiplication unit pixel calculated, calculation formula
It is as follows:
Wherein, x represents the abscissa of this pixel, and y represents the ordinate of this pixel, and n represents the ordinal number for the pixel that counter counts obtain,
The number of pixels of the every row of npx representative image, the number of pixels of npy representative image each column, [], which represents, to be rounded;Npx's and npy takes
Being worth range is 0~1024, and is integer;
Then gained location of pixels coordinate x, y are sent to memory cell;
5) memory cell according to the ordinal number of weight, from the filling size input port pa filling size obtained and pixel
The calculated result of multiplication unit is stored in memory cell by position coordinates x, y, and storage mode is as follows:
Ram [m] [y+pa] [x+pa]=wm×pxy
Wherein, ram represents memory, and ram represents [] [] [] three-dimensional coordinate of memory, wmRepresentation repeated order number is the convolution of m
Core weight, value range are -1~1, pxyPixel number of the abscissa as x ordinate as y is represented, pa represents filling magnitude numerical value,
The value range for filling size pa is 0~5, and is integer;
After all pixels of the whole image have been input into the system and calculated in row-column order, the memory cell sends an indication signal cd to the second counting unit;
6) After the second counting unit receives the indication signal cd, it calculates the positions of the products required for the convolution according to the stride value obtained from the stride input port st and the image width and height pixel counts obtained from the image width pixel count npx and the image height pixel count npy. The calculation formula is as follows:
Wherein, cx represents the abscissa of the convolution kernel, cy represents the ordinate of the convolution kernel, cx' represents the abscissa of the convolution kernel at the previous moment, cy' represents the ordinate of the convolution kernel at the previous moment, npx represents the number of pixels in each row of the image, npy represents the number of pixels in each column of the image, pa represents the padding size value, nc represents the convolution kernel size, and st represents the stride; the value range of the stride st is 1 to 11, and it is an integer;
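Since the stepping formula itself is not reproduced in the extracted text, the following is only one plausible sketch of how the window corner (cx, cy) could advance by the stride st in raster order over the padded image:

```python
def next_window(cx, cy, st, nc, npx, pa):
    """Advance the top-left corner of an nc x nc convolution window by
    stride st over a padded image of width npx + 2*pa, wrapping to the
    next row band when the window would run past the right edge.
    A sketch of one plausible stepping rule, not the patent's exact formula."""
    cx += st
    if cx + nc > npx + 2 * pa:   # window would overrun the padded row
        cx = 0                   # wrap to the left edge
        cy += st                 # and move down by one stride
    return cx, cy
```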
7) The memory cell takes out all the product data of the four adjacent convolution kernels used in the calculation; the number of products required for each convolution kernel is:
n = nc²
Wherein, n represents the number of products required and nc represents the convolution kernel size; the value range of the convolution kernel size nc is 1 to 11, and it is an integer;
7) The corresponding addition operations are carried out in the addition unit, with the bias added at the same time, yielding four calculated results. The calculation formula is as follows:
Wherein, a1 represents the calculated result of the convolution kernel located at the upper left among the four adjacent convolution kernels, a2 represents that of the kernel at the upper right, a3 represents that of the kernel at the lower left, and a4 represents that of the kernel at the lower right. With four convolution kernels as one group, cx represents the abscissa of each group and cy its ordinate; st represents the stride; b is the weight bias, with value range -1 to 1; nc is the convolution kernel size;
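Assuming each a_i accumulates the nc × nc stored products of one window and adds the bias b (the exact formula is not reproduced in the extracted text), a sketch for a single kernel plane:

```python
def conv_at(ram_m, cx, cy, nc, b):
    """Accumulate the nc*nc stored products of one convolution window
    whose top-left corner is (cx, cy), then add the weight bias b.
    ram_m is the padded product plane for one kernel (see step 5)."""
    total = b
    for dy in range(nc):
        for dx in range(nc):
            total += ram_m[cy + dy][cx + dx]
    return total
```

Evaluating this at the four adjacent window positions yields a1 through a4.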
8) The four results are passed through the activation function ReLU; specifically, each accumulated result ai (i = 1 to 4) is compared with 0: a value greater than 0 takes its own value, and a value less than or equal to 0 takes 0, i.e.
mi = max(ai, 0)
Wherein, mi represents the calculated result after the activation function;
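The ReLU step as described reduces to a single comparison; a minimal sketch:

```python
def relu(a):
    """ReLU as described in step 8: a value greater than 0 passes
    through unchanged; a value less than or equal to 0 becomes 0."""
    return a if a > 0 else 0
```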
9) The four results are sent into the pooling unit, which performs pooling according to the pooling type input through the pooling type input port po. The pooling type po can take two values, 0 and 1: 0 represents maximum pooling and 1 represents mean pooling. For maximum pooling, the maximum of the four results is output; for mean pooling, the average of the four results is calculated and output. Specifically:
r = max(m1, m2, m3, m4) for maximum pooling, or r = (m1 + m2 + m3 + m4)/4 for mean pooling
Wherein, r represents the pooling result.
The pooling result is output from the convolution result output port r, while a result valid signal is output from the output result valid signal port rl;
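A sketch of the pooling selection driven by the po port, as described:

```python
def pool(m1, m2, m3, m4, po):
    """Pool the four adjacent kernel results: po == 0 selects maximum
    pooling, po == 1 selects mean pooling, matching the po input port."""
    if po == 0:
        return max(m1, m2, m3, m4)
    return (m1 + m2 + m3 + m4) / 4
```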
10) After the second counting unit judges from the size of the image that all convolutions have been completed, it outputs a completion signal to the output port d, the reset signal switches to low level, the convolution of one picture ends, and the system prepares to receive the next picture.
The method for judging that processing is complete is as follows: both of the following conditions must hold.
Wherein, cx represents the abscissa of each group of convolution kernels, cy represents the ordinate of each group of convolution kernels, npx represents the number of pixels in each row of the image, npy represents the number of pixel rows of the image, pa represents the padding size, nc represents the convolution kernel size, and st represents the stride.
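The completion test is described only in words here (the two inequalities are not reproduced in the extracted text); one plausible reading, with names matching the step above:

```python
def done(cx, cy, npx, npy, pa, nc, st):
    """Sketch of the completion test: processing ends once the next
    stride would push the nc x nc window past both the padded width
    and the padded height. One plausible reading, not the patent's
    exact inequalities."""
    past_right = cx + st + nc > npx + 2 * pa
    past_bottom = cy + st + nc > npy + 2 * pa
    return past_right and past_bottom
```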
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910177153 | 2019-03-08 | ||
CN2019101771536 | 2019-03-08 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109978161A true CN109978161A (en) | 2019-07-05 |
CN109978161B CN109978161B (en) | 2022-03-04 |
Family
ID=67082846
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910268608.5A Active CN109978161B (en) | 2019-03-08 | 2019-04-04 | Universal convolution-pooling synchronous processing convolution kernel system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109978161B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103905686A (en) * | 2012-12-24 | 2014-07-02 | 三星电子株式会社 | Image scanning apparatus, image compensation method and computer-readable recording medium |
CN107894189A (en) * | 2017-10-31 | 2018-04-10 | 北京艾克利特光电科技有限公司 | A kind of EOTS and its method for automatic tracking of target point automatic tracing |
CN108154229A (en) * | 2018-01-10 | 2018-06-12 | 西安电子科技大学 | Accelerate the image processing method of convolutional neural networks frame based on FPGA |
IN201811023855A (en) * | 2018-06-26 | 2018-07-13 | Hcl Technologies Ltd | |
CN108764467A (en) * | 2018-04-04 | 2018-11-06 | 北京大学深圳研究生院 | For convolutional neural networks convolution algorithm and full connection computing circuit |
CN108804973A (en) * | 2017-04-27 | 2018-11-13 | 上海鲲云信息科技有限公司 | The hardware structure and its execution method of algorithm of target detection based on deep learning |
CN108848326A (en) * | 2018-06-13 | 2018-11-20 | 吉林大学 | A kind of high dynamic range MCP detector front end reading circuit and its reading method |
CN109284824A (en) * | 2018-09-04 | 2019-01-29 | 复旦大学 | A kind of device for being used to accelerate the operation of convolution sum pond based on Reconfiguration Technologies |
CN109416756A (en) * | 2018-01-15 | 2019-03-01 | 深圳鲲云信息科技有限公司 | Acoustic convolver and its applied artificial intelligence process device |
Non-Patent Citations (6)
Title |
---|
FRANCESCO CONTI et al.: "A Ultra-Low-Energy Convolution Engine for Fast Brain-Inspired Vision in Multicore Clusters", 《DOI:10.7873/DATE.2015.0404》 * |
LI DU et al.: "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", 《DOI:10.1109/TCSI.2017.2735490》 * |
MOHAMMED ALAWAD et al.: "Stochastic-Based Deep Convolutional Networks with Reconfigurable Logic Fabric", 《IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS》 * |
俞健: "Hardware Design of a Multi-core DSP Image Processing System", 《China Masters' Theses Full-text Database, Information Science and Technology》 * |
姬梦飞 et al.: "Design of a Low-Power Data Conversion System", 《Integrated Circuit Applications》 * |
邱宇: "FPGA-based Acceleration of the AlexNet Forward Network", 《China Masters' Theses Full-text Database, Information Science and Technology》 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11487288B2 (en) | 2017-03-23 | 2022-11-01 | Tesla, Inc. | Data synthesis for autonomous control systems |
US11403069B2 (en) | 2017-07-24 | 2022-08-02 | Tesla, Inc. | Accelerated mathematical engine |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US11681649B2 (en) | 2017-07-24 | 2023-06-20 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11797304B2 (en) | 2018-02-01 | 2023-10-24 | Tesla, Inc. | Instruction set architecture for a vector computational unit |
US11734562B2 (en) | 2018-06-20 | 2023-08-22 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11841434B2 (en) | 2018-07-20 | 2023-12-12 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11893774B2 (en) | 2018-10-11 | 2024-02-06 | Tesla, Inc. | Systems and methods for training machine models with augmented data |
US11665108B2 (en) | 2018-10-25 | 2023-05-30 | Tesla, Inc. | QoS manager for system on a chip communications |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11908171B2 (en) | 2018-12-04 | 2024-02-20 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11748620B2 (en) | 2019-02-01 | 2023-09-05 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US11790664B2 (en) | 2019-02-19 | 2023-10-17 | Tesla, Inc. | Estimating object properties using visual image data |
Also Published As
Publication number | Publication date |
---|---|
CN109978161B (en) | 2022-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109978161A (en) | A kind of general convolution-pond synchronization process convolution kernel system | |
CN109993297A (en) | A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing | |
CN111062472A (en) | Sparse neural network accelerator based on structured pruning and acceleration method thereof | |
CN107239823A (en) | A kind of apparatus and method for realizing sparse neural network | |
CN101706741B (en) | Method for partitioning dynamic tasks of CPU and GPU based on load balance | |
CN105528191B (en) | Data accumulation apparatus and method, and digital signal processing device | |
CN109063825A (en) | Convolutional neural networks accelerator | |
CN109086244A (en) | Matrix convolution vectorization implementation method based on vector processor | |
CN110119809A (en) | The asymmetric quantization of multiplication and accumulating operation in deep learning processing | |
CN111242289A (en) | Convolutional neural network acceleration system and method with expandable scale | |
CN110348574A (en) | A kind of general convolutional neural networks accelerating structure and design method based on ZYNQ | |
CN112465110B (en) | Hardware accelerator for convolution neural network calculation optimization | |
CN111445012A (en) | FPGA-based packet convolution hardware accelerator and method thereof | |
CN107886167A (en) | Neural network computing device and method | |
CN106951926A (en) | The deep learning systems approach and device of a kind of mixed architecture | |
CN110390385A (en) | A kind of general convolutional neural networks accelerator of configurable parallel based on BNRP | |
CN106951395A (en) | Towards the parallel convolution operations method and device of compression convolutional neural networks | |
CN110210610A (en) | Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment | |
CN109740739A (en) | Neural computing device, neural computing method and Related product | |
CN110263925A (en) | A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA | |
CN109191364A (en) | Accelerate the hardware structure of artificial intelligence process device | |
CN112200300B (en) | Convolutional neural network operation method and device | |
CN109901814A (en) | Customized floating number and its calculation method and hardware configuration | |
CN108764467A (en) | For convolutional neural networks convolution algorithm and full connection computing circuit | |
CN106959937A (en) | A kind of vectorization implementation method of warp product matrix towards GPDSP |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||