CN109978161A - General-purpose convolution kernel system with synchronous convolution-pooling processing - Google Patents


Info

Publication number
CN109978161A
Authority
CN
China
Prior art keywords
input
convolution
image
convolution kernel
pooling
Legal status
Granted
Application number
CN201910268608.5A
Other languages
Chinese (zh)
Other versions
CN109978161B (en)
Inventor
张宝林
姬梦飞
常玉春
李东泽
丁宁
戴加海
慕雨松
蒋佳奇
马玉美
郭玉萍
孙畅
宫浩然
王若溪
李捷菲
Current Assignee
Jilin University
Original Assignee
Jilin University
Application filed by Jilin University
Publication of CN109978161A
Application granted
Publication of CN109978161B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The invention discloses a general-purpose convolution kernel system with synchronous convolution-pooling processing, in the field of convolutional neural network acceleration for machine learning. Existing machine learning methods are implemented in software, which suffers from limited computing capability and high cost. The present invention implements machine learning in hardware. It accelerates convolutional neural networks by performing convolution and pooling synchronously, so that machine learning can be carried out quickly, efficiently and with low power consumption, without changing accuracy. The convolution kernels commonly used in existing convolutional neural networks have a fixed size and cannot meet varied design requirements. The convolution kernel of the present invention allows parameters such as kernel size and stride to be changed, so it can adapt to design requirements in a wide range of situations.

Description

General-purpose convolution kernel system with synchronous convolution-pooling processing
Technical field
The invention belongs to the field of convolutional neural network acceleration in machine learning.
Technical background
Artificial intelligence (AI) is a major direction of development in the current era and is widely applied in fields such as computing, medicine, biology and mechanical engineering. Machine learning is one of its important branches and has attracted extensive attention and developed rapidly in recent years. By training repeatedly on large numbers of data samples, it can achieve the desired performance. It is widely used in image recognition, object tracking, speech processing and other fields. The convolutional neural network (CNN) is one of the important methods of machine learning and has attracted many researchers. LeNet, AlexNet and VGG are among its most representative models and perform well in practical applications.
Machine learning emerged in the 1950s. After more than a decade of development, it entered a period of stagnation from the mid-1960s to the late 1970s because computers of that time had limited computing power. From the late 1970s onward, as computer processing power improved, machine learning saw a second surge. Today, with the development of computers and big data, machine learning methods are advancing at an unprecedented pace. However, as data volumes keep growing and networks become ever deeper, CPU processing capacity can no longer keep up. GPUs emerged in response, but although they offer greater computing capability, that capability is still limited and the cost is high. The current trend therefore leans increasingly toward implementing machine learning algorithms in hardware. Such implementations are fast, power-efficient and efficient, which gives them bright prospects. Accelerating convolutional neural networks in hardware is also an important future direction.
Summary of the invention
The purpose of the present invention is to address the deficiencies of the prior art. It proposes a hardware acceleration scheme for convolutional neural networks that accelerates them through synchronous convolution-pooling processing without changing accuracy.
A general-purpose convolution kernel system with synchronous convolution-pooling processing consists of nine processing units, 12 input ports and 3 output ports. The processing units are a weight register, an image register, a memory unit, a multiplication unit, an addition unit, an activation function unit, a pooling unit, a first counting unit and a second counting unit.
The 12 input ports are: clock input port clk, reset port rstn, weight/bias input valid signal port wren, weight/bias input port wb, image input valid signal port pren, image input port p, image width pixel-count input port npx, image length pixel-count input port npy, convolution kernel size input port nc, stride input port st, padding size input port pa and pooling type input port po.
The 3 output ports are: convolution result output port r, output result valid signal port rl and convolution done signal d.
1) The clock input port clk supplies the system with a timing signal that alternates between low and high levels at a constant period. A high-level reset signal is applied to the system through reset port rstn; while this signal is asserted, each processing unit performs synchronous convolution-pooling processing.
After reset is complete, the convolution and image parameters are input to the system through the image width pixel-count input port npx, image length pixel-count input port npy, convolution kernel size input port nc, stride input port st, padding size input port pa and pooling type input port po.
When the weight/bias input valid signal port wren receives a high (valid) signal, the system stores the convolution kernel weights and the bias value input through the weight/bias input port wb in the weight register. After the kernel weights and bias have been input, the weight/bias input valid signal goes low (invalid).
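Step 1 (parameter and weight loading) can be mirrored in software as a minimal sketch. The class and function names (`ConvPoolParams`, `load_weight_register`) are hypothetical. The value ranges are the ones the text states for each port. The order in which values arrive on `wb` (all nc × nc weights row by row, then the bias) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass

# Allowed ranges as stated in the text; po = 0 selects max pooling, 1 mean pooling.
PARAM_RANGES = {
    "npx": (0, 1024),  # pixels per image row
    "npy": (0, 1024),  # pixels per image column
    "nc": (1, 11),     # convolution kernel size
    "st": (1, 11),     # stride
    "pa": (0, 5),      # padding size
    "po": (0, 1),      # pooling type
}


@dataclass(frozen=True)
class ConvPoolParams:
    """Hypothetical software stand-in for the six parameter ports."""
    npx: int
    npy: int
    nc: int
    st: int
    pa: int
    po: int

    def __post_init__(self) -> None:
        # Reject any value the ports are not specified to accept.
        for name, (lo, hi) in PARAM_RANGES.items():
            value = getattr(self, name)
            if not isinstance(value, int) or not lo <= value <= hi:
                raise ValueError(f"{name}={value!r} is outside [{lo}, {hi}]")


def load_weight_register(wb_stream, nc):
    """Split the wb stream into nc*nc kernel weights plus one bias.

    Hypothetical ordering: all weights first (row-major), then the bias.
    """
    values = list(wb_stream)
    if len(values) != nc * nc + 1:
        raise ValueError("expected nc*nc weights followed by one bias")
    return values[:-1], values[-1]
```

Frozen dataclasses still run `__post_init__`, so an out-of-range parameter is rejected at construction time. This loosely mirrors the parameters being latched once after reset.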
2) When the system receives a high (valid) signal on the image input valid signal port pren, it stores the pixel values received through image input port p in the image register. One image pixel is received per clock cycle, and the image register is updated to the pixel value received at that moment. Pixel values range from -1 to 1. After all image pixels have been input, the image input valid signal goes low (invalid).
3) As each pixel is received, the pixel value in the image register is multiplied by every weight of the convolution kernel in the weight register, and the multiplication unit sends indication signal xd to the counter.
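Step 3 amounts to broadcasting one pixel across the whole kernel. The sketch below uses a hypothetical `multiply_pixel` helper to show the nc × nc products produced for a single incoming pixel. In hardware these would presumably be computed in parallel within one clock cycle; here they are simply a list.

```python
from typing import List


def multiply_pixel(pixel: float, weights: List[float]) -> List[float]:
    """Multiply one incoming pixel by every kernel weight (step 3 sketch).

    weights holds the nc*nc kernel weights in ordinal order m = 0, 1, ...,
    so the returned list is indexed by the same weight ordinal m.
    """
    if not -1.0 <= pixel <= 1.0:
        raise ValueError("pixel values are specified in [-1, 1]")
    return [w * pixel for w in weights]
```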
4) After the first counting unit receives indication signal xd, it determines the position coordinates x, y of the pixel being processed by the multiplication unit. It uses the image width and length pixel counts obtained from npx and npy, with the following formula:
Where x is the abscissa and y the ordinate of the pixel, n is the pixel ordinal counted by the counter, npx is the number of pixels per image row, npy is the number of pixels per image column, and [ ] denotes taking the integer part. npx and npy are integers from 0 to 1024.
The resulting pixel coordinates x, y are then sent to the memory unit.
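The formula for x and y is not reproduced in this text. A plausible reading is y = [n / npx] and x = n - npx * [n / npx], assuming the counter value n starts at 0 and pixels arrive row by row. The hypothetical `pixel_position` function below implements that assumption. If the patent's counter starts at 1, its indexing would differ by one.

```python
from typing import Tuple


def pixel_position(n: int, npx: int, npy: int) -> Tuple[int, int]:
    """Map the running pixel ordinal n to (x, y), assuming raster order and n from 0."""
    if not 0 <= n < npx * npy:
        raise ValueError("pixel ordinal out of range for this image")
    y = n // npx        # row index: integer part of n / npx
    x = n - npx * y     # column index: remainder within the row
    return x, y
```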
5) The memory unit stores each result of the multiplication unit in memory according to the weight ordinal, the padding size obtained from padding size input port pa, and the pixel position coordinates x, y, as follows:
$\mathrm{ram}[m][y+pa][x+pa] = w_m \times p_{xy}$
Where ram denotes the memory and [ ][ ][ ] its three-dimensional coordinates; $w_m$ is the convolution kernel weight with ordinal m, with values from -1 to 1; $p_{xy}$ is the value of the pixel with abscissa x and ordinate y; and pa is the padding size, an integer from 0 to 5.
After all pixels of the image have been input row by row and the computation is complete, the memory unit sends indication signal cd to the second counting unit.
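The storage rule in step 5 keeps one padded plane per kernel weight. Below is a minimal sketch in which nested lists stand in for the memory unit; the helper names are hypothetical. Cells in the padding border are never written, so they remain zero. That is exactly what zero padding contributes to the later sums.

```python
from typing import List

Memory = List[List[List[float]]]


def make_memory(nc: int, npx: int, npy: int, pa: int) -> Memory:
    """Zero-initialised ram[m][y][x]: one padded plane per kernel weight ordinal m."""
    return [[[0.0] * (npx + 2 * pa) for _ in range(npy + 2 * pa)]
            for _ in range(nc * nc)]


def store_products(ram: Memory, x: int, y: int, pa: int, products: List[float]) -> None:
    """Apply ram[m][y + pa][x + pa] = w_m * p_xy for every weight ordinal m."""
    for m, product in enumerate(products):
        ram[m][y + pa][x + pa] = product
```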
6) After the second counting unit receives indication signal cd, it computes the positions of the products needed for convolution. It uses the stride obtained from stride input port st and the image width and length pixel counts obtained from npx and npy, with the following formula:
Where cx and cy are the abscissa and ordinate of the convolution kernel; cx′ and cy′ are the abscissa and ordinate of the kernel at the previous moment; npx is the number of pixels per image row; npy is the number of pixels per image column; pa is the padding size; nc is the convolution kernel size; and st is the stride. The stride st is an integer from 1 to 11.
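The update formula for (cx, cy) is likewise missing here. The generator below is a sketch under stated assumptions, not the patent's formula:

- A group consists of four kernels at (cx, cy), (cx + st, cy), (cx, cy + st) and (cx + st, cy + st) in padded coordinates, so successive groups are 2 · st apart.
- A group that does not fit inside the padded image is skipped.

The loop exit stands in for the completion test of step 10. For the 28 × 28, padding 1, 3 × 3, stride 1 case discussed under the beneficial effects, it yields 196 groups. That matches the 196 post-input clock cycles quoted there.

```python
from typing import Iterator, Tuple


def group_positions(npx: int, npy: int, nc: int, st: int, pa: int) -> Iterator[Tuple[int, int]]:
    """Yield padded coordinates (cx, cy) of the upper-left kernel of each 2x2 kernel group.

    Assumption: consecutive groups are 2 * st apart, and any group whose
    lower-right kernel would leave the padded image is skipped.
    """
    width = npx + 2 * pa
    height = npy + 2 * pa
    cx, cy = 0, 0
    # Exiting this loop plays the role of the completion check (signal d).
    while cy + st + nc <= height:
        if cx + st + nc <= width:      # right-hand kernels still fit in this row
            yield cx, cy
            cx += 2 * st               # next group: cx = cx' + 2 * st
        else:                          # wrap to the next pair of kernel rows
            cx, cy = 0, cy + 2 * st
```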
7) The memory unit fetches all product data for the four adjacent convolution kernels to be computed. The number of products required by each convolution kernel is:
$n = nc^2$
Where n is the number of products required and nc is the convolution kernel size, an integer from 1 to 11.
The corresponding additions are then performed in the addition unit and the bias is added, giving four results, computed as follows:
Where $a_1$, $a_2$, $a_3$ and $a_4$ are the results of the upper-left, upper-right, lower-left and lower-right kernels of the four adjacent convolution kernels. With four convolution kernels taken as one group, cx and cy are the abscissa and ordinate of each group. st is the stride, b is the weight bias (with values from -1 to 1) and nc is the convolution kernel size.
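The addition formula of step 7 is also missing from this text. The storage rule is ram[m][y + pa][x + pa] = w_m × p_xy. One consistent reading is that each kernel's sum collects, for every weight ordinal m, the product stored at the image position that weight covers. The sketch assumes m = i · nc + j for the weight in kernel row i and column j (row-major order). `group_sums` is a hypothetical name.

```python
from typing import List, Tuple

Memory = List[List[List[float]]]


def group_sums(ram: Memory, cx: int, cy: int, nc: int, st: int,
               bias: float) -> Tuple[float, float, float, float]:
    """Return (a1, a2, a3, a4) for the group whose upper-left kernel is at padded (cx, cy)."""
    def one_kernel(left: int, top: int) -> float:
        total = bias
        for i in range(nc):
            for j in range(nc):
                # Weight ordinal m = i * nc + j covers padded pixel (left + j, top + i).
                total += ram[i * nc + j][top + i][left + j]
        return total

    return (one_kernel(cx, cy),            # a1: upper left
            one_kernel(cx + st, cy),       # a2: upper right
            one_kernel(cx, cy + st),       # a3: lower left
            one_kernel(cx + st, cy + st))  # a4: lower right
```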
8) The four results are passed through the ReLU activation function. Each sum $a_i$ (i = 1 to 4) is compared with 0: values greater than 0 are kept, and values less than or equal to 0 become 0:
$m_i = \max(a_i, 0), \quad i = 1, \dots, 4$
Where $m_i$ is the result after the activation function.
9) The four results are sent to the pooling unit, which pools them according to the pooling type input on pooling type input port po. po takes one of two values: 0 selects max pooling and 1 selects mean pooling.
For max pooling, the maximum of the four results is output; for mean pooling, their average is output. Specifically:
$r = \max(m_1, m_2, m_3, m_4)$ if po = 0, and $r = \frac{1}{4}(m_1 + m_2 + m_3 + m_4)$ if po = 1,
Where r is the pooling result.
The pooling result is output on convolution result output port r, and at the same time the result valid signal is output on port rl.
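Steps 8 and 9 reduce to a few lines. The sketch below shows the ReLU and pooling stages. The function names are hypothetical; the po encoding is the one the text gives (0 for max pooling, 1 for mean pooling).

```python
from typing import List, Sequence


def relu(sums: Sequence[float]) -> List[float]:
    """Step 8: m_i = a_i if a_i > 0, otherwise 0."""
    return [a if a > 0 else 0.0 for a in sums]


def pool(activations: Sequence[float], po: int) -> float:
    """Step 9: po = 0 gives max pooling, po = 1 gives mean pooling."""
    if po == 0:
        return max(activations)
    if po == 1:
        return sum(activations) / len(activations)
    raise ValueError("po must be 0 (max pooling) or 1 (mean pooling)")
```

For example, sums of -0.5, 0.25, 1.0 and 0.0 become 0.0, 0.25, 1.0 and 0.0 after ReLU. These pool to 1.0 with po = 0 and to 0.3125 with po = 1.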
10) The second counting unit uses the image size to determine when all convolutions have been computed. At that point it outputs a done signal on output port d and the reset signal goes low. Convolution of the current image then ends, and the system is ready to receive the next image.
Completion is judged as follows:
and
Where cx and cy are the abscissa and ordinate of each kernel group, npx is the number of pixels per image row, npy is the number of pixel rows, pa is the padding size, nc is the convolution kernel size and st is the stride.
Beneficial effects of the present invention:
Consider a LeNet-5 convolutional neural network with a 28 × 28 input image, padding 1, a 3 × 3 convolution kernel and stride 1. This method needs 980 clock cycles from the start of image input until all convolution results have been computed. Of these, 784 cycles are spent inputting the image and 196 cycles run from the end of image input until all results have been output. The conventional approach performs convolution and pooling separately and needs 1568 clock cycles over the same span: 784 cycles to input the image and 784 cycles from the end of image input until all results have been output. Compared with the conventional method, this design therefore cuts the total processing time by 37.5% and the processing time after image input by 75%. In addition, common convolution kernels have a fixed size and cannot meet varied design requirements. The kernel of this design allows parameters such as kernel size and stride to be changed, so it can adapt to design requirements in a wide range of situations.
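The cycle figures above can be reproduced with simple arithmetic. The sketch assumes, as the numbers suggest, one input pixel per clock. It also assumes one pooled output per clock for this design and one convolution output per clock for the conventional design. `cycle_counts` is a hypothetical helper.

```python
def cycle_counts(npx: int, npy: int, pa: int, nc: int, st: int):
    """Return (synchronous, conventional) clock-cycle totals under the stated assumptions."""
    input_cycles = npx * npy                      # one pixel per clock
    conv_w = (npx + 2 * pa - nc) // st + 1        # convolution outputs per row
    conv_h = (npy + 2 * pa - nc) // st + 1        # convolution outputs per column
    pooled = (conv_w // 2) * (conv_h // 2)        # one 2x2 pooled output per group
    return input_cycles + pooled, input_cycles + conv_w * conv_h


sync, conv = cycle_counts(npx=28, npy=28, pa=1, nc=3, st=1)       # (980, 1568)
total_saving = 1 - sync / conv                                    # 0.375
post_input_saving = 1 - (sync - 28 * 28) / (conv - 28 * 28)       # 0.75
```

Under these assumptions the quoted percentages are reductions in processing time, not increases in throughput.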
Brief description of the drawings
Fig. 1 is a flow chart of the general-purpose convolution kernel with synchronous convolution-pooling processing of the present invention;
Fig. 2 is a schematic diagram of the algorithm of this convolution kernel;
Fig. 3 is a diagram of the hardware system structure of this convolution kernel;
Fig. 4 is a schematic diagram of the internal system structure of this convolution kernel;
Fig. 5 is a schematic diagram of the timing simulation of this convolution kernel.
Specific embodiment
To make the design of the invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings.
The present invention is a hardware accelerator for the forward-propagation (inference) process of a convolutional neural network. It covers the network's convolutional and pooling layers. The convolutional layer is given by:
Where l is the layer index, $a^l$ is the output tensor, * denotes convolution, b is the bias, M is the number of submatrices, and σ is the activation function, usually ReLU.
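The convolutional-layer formula itself does not survive in this text. A standard way of writing it with the symbols defined above is shown below. The kernel symbol W and the intermediate z are introduced here as assumptions and are not defined in the source.

```latex
a^{l} = \sigma\left(z^{l}\right) = \sigma\left(\sum_{k=1}^{M} a_{k}^{l-1} * W_{k}^{l} + b^{l}\right)
```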
The pooling layer is given by:
$a^l = \mathrm{pool}(a^{l-1})$
Where pool denotes the operation that shrinks the input tensor according to the pooling window size and the pooling criterion.
The overall design idea of the invention is shown in Fig. 2 and works as follows.
First, the products of every weight with each input pixel are computed synchronously as the pixel arrives.
After the image has been input, four adjacent convolutions are computed simultaneously. An averaging or maximum pooling operation can therefore be applied to these four results directly, so that convolution and pooling are carried out synchronously.
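The premise of this design is that evaluating four adjacent convolutions and pooling them at once gives the same result as building the whole feature map first and pooling it afterwards. The self-contained sketch below checks that equivalence for a 2 × 2 pooling window with stride 2, an assumption consistent with the four-kernel groups. All function names are hypothetical. The sketch models the arithmetic only, not the cycle-level hardware behaviour.

```python
import random
from typing import Callable, List

Grid = List[List[float]]


def pad(img: Grid, pa: int) -> Grid:
    """Zero-pad the image by pa pixels on every side."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w + 2 * pa) for _ in range(h + 2 * pa)]
    for y in range(h):
        for x in range(w):
            out[y + pa][x + pa] = img[y][x]
    return out


def kernel_at(padded: Grid, k: Grid, bias: float, left: int, top: int) -> float:
    """One convolution (CNN-style cross-correlation) at padded position (left, top)."""
    nc = len(k)
    return bias + sum(k[i][j] * padded[top + i][left + j]
                      for i in range(nc) for j in range(nc))


def reducer(po: int) -> Callable[[List[float]], float]:
    """po = 0: max pooling; po = 1: mean pooling."""
    return max if po == 0 else (lambda vals: sum(vals) / len(vals))


def conv_then_pool(img: Grid, k: Grid, bias: float, st: int, pa: int, po: int) -> Grid:
    """Conventional order: whole ReLU feature map first, then 2x2/stride-2 pooling."""
    p, nc = pad(img, pa), len(k)
    out_h = (len(p) - nc) // st + 1
    out_w = (len(p[0]) - nc) // st + 1
    fmap = [[max(kernel_at(p, k, bias, ox * st, oy * st), 0.0)
             for ox in range(out_w)] for oy in range(out_h)]
    red = reducer(po)
    return [[red([fmap[2 * i][2 * j], fmap[2 * i][2 * j + 1],
                  fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1]])
             for j in range(out_w // 2)] for i in range(out_h // 2)]


def synchronous(img: Grid, k: Grid, bias: float, st: int, pa: int, po: int) -> Grid:
    """Grouped order: four adjacent kernels, ReLU and pooling at once; no full feature map."""
    p, nc = pad(img, pa), len(k)
    red = reducer(po)
    rows, cy = [], 0
    while cy + st + nc <= len(p):
        row, cx = [], 0
        while cx + st + nc <= len(p[0]):
            group = [kernel_at(p, k, bias, cx + dx, cy + dy)
                     for dy in (0, st) for dx in (0, st)]   # a1, a2, a3, a4
            row.append(red([max(a, 0.0) for a in group]))
            cx += 2 * st
        rows.append(row)
        cy += 2 * st
    return rows
```

Both functions visit the same kernel positions in the same order, so the outputs match exactly, not just approximately. The grouped version never stores the full feature map.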
Fig. 4 shows the internal structure of the hardware system of the present invention. After receiving the weight valid signal on the wren port, the system stores the convolution kernel weights and bias input on the wb port in the weight register. The upstream unit then sends the input-pixel valid signal to the system through the pren port. In response, the system stores each received pixel in the image register and, through the multiplication unit, multiplies the pixel value by each weight of the convolution kernel. Once multiplication starts, the multiplication unit sends indication signal xd to counting unit 1. Counting unit 1 determines the pixel's position coordinates x, y from the width and length (in pixels) of the input image. Each product is then stored at the corresponding memory location according to the weight ordinal and the padding size. When storage is complete, the storage unit sends signal cd to counting unit 2, which starts computing the positions of the products required for convolution. All product data for the four adjacent convolution kernels (upper/lower, left/right) are fetched. The corresponding additions are performed in the addition unit and the bias is added, giving four results. These four results then pass through the ReLU activation function: each sum is compared with 0, values greater than 0 are kept, and all other values become 0. Next, the four results are sent to the pooling unit, which follows the pooling type input on port po. For max pooling it outputs the maximum of the four results; for mean pooling it outputs their average. The result valid signal rl is asserted together with the output. Meanwhile, counting unit 2 uses the image size to determine when all convolutions have been completed. It then outputs the done signal on port d, ending the convolution of the current image and preparing to receive the next image.
Fig. 5 is a circuit simulation diagram of the hardware system of the present invention. The signals are:
clk: clock signal; rstn: reset signal
npx: number of pixels per image row; npy: number of pixels per image column
nc: convolution kernel size; st: stride; pa: padding size; po: pooling type
wren: weight/bias valid signal; wb: weights and bias
pren: image input valid signal; p: pixel
y1 to yn: multiplication units
xd: first counting unit valid signal; x and y: pixel abscissa and ordinate
ram: memory unit
cd: second counting unit valid signal; cx and cy: convolution kernel abscissa and ordinate
m1 to m4: addition units
r: final convolution result; rl: result valid signal; d: convolution done signal

Claims (1)

1. A general-purpose convolution kernel system with synchronous convolution-pooling processing, characterized in that the system consists of nine processing units, 12 input ports and 3 output ports; the processing units comprise a weight register, an image register, a memory unit, a multiplication unit, an addition unit, an activation function unit, a pooling unit, a first counting unit and a second counting unit;
the 12 input ports are: clock input port clk, reset port rstn, weight/bias input valid signal port wren, weight/bias input port wb, image input valid signal port pren, image input port p, image width pixel-count input port npx, image length pixel-count input port npy, convolution kernel size input port nc, stride input port st, padding size input port pa and pooling type input port po;
the 3 output ports are: convolution result output port r, output result valid signal port rl and convolution done signal d;
1) the clock input port clk supplies the system with a timing signal that alternates between low and high levels at a constant period; a high-level reset signal is input to the system through reset port rstn, and while this signal is asserted each processing unit performs synchronous convolution-pooling processing;
after reset is complete, the convolution and image parameters are input to the system through the image width pixel-count input port npx, image length pixel-count input port npy, convolution kernel size input port nc, stride input port st, padding size input port pa and pooling type input port po;
when the weight/bias input valid signal port wren receives a high (valid) signal, the system stores the convolution kernel weights and bias value input through weight/bias input port wb in the weight register; after the kernel weights and bias have been input, the weight/bias input valid signal goes low (invalid);
2) when the system receives a high (valid) signal on the image input valid signal port pren, it stores the pixel values received through image input port p in the image register; one image pixel is received per clock cycle, and the image register is updated to the pixel value received at that moment; pixel values range from -1 to 1; after all image pixels have been input, the image input valid signal goes low (invalid);
3) as each pixel is received, the pixel value in the image register is multiplied by each weight of the convolution kernel in the weight register, and the multiplication unit sends indication signal xd to the counter;
4) after the first counting unit receives indication signal xd, according to from picture traverse pixel quantity npx and image length pixel The image length and width pixel quantity that quantity npy is obtained judges position coordinates x, y of multiplication unit pixel calculated, calculation formula It is as follows:
Wherein, x represents the abscissa of this pixel, and y represents the ordinate of this pixel, and n represents the ordinal number for the pixel that counter counts obtain, The number of pixels of the every row of npx representative image, the number of pixels of npy representative image each column, [], which represents, to be rounded;Npx's and npy takes Being worth range is 0~1024, and is integer;
The resulting pixel coordinates x, y are then sent to the memory unit;
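The coordinate formula itself is not reproduced in this text. Assuming the counter value n starts at 0 and pixels arrive row by row, one consistent reading, with [ ] as the integer part, is y = [n / npx] and x = n - y * npx. A hypothetical helper under that assumption:

```python
from typing import Tuple


def pixel_coordinates(n: int, npx: int) -> Tuple[int, int]:
    """Map the counter value n (assumed 0-based, raster order) to pixel coordinates (x, y)."""
    y = n // npx        # [n / npx]: number of complete rows already received
    x = n - y * npx     # position within the current row
    return x, y
```

With npx = 4, for instance, the sixth pixel (n = 5) lands at (1, 1).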
5) The memory unit stores the results of the multiplication unit according to the weight ordinal, the padding size obtained from the padding-size input port pa and the pixel coordinates x, y, as follows:
$\mathrm{ram}[m][y+pa][x+pa] = w_m \times p_{xy}$
Here, ram denotes the memory and [ ][ ][ ] its three-dimensional address; $w_m$ is the convolution kernel weight with ordinal m, in the range -1 to 1; $p_{xy}$ is the value of the pixel at abscissa x and ordinate y; and pa is the padding size, an integer from 0 to 5;
After all pixels of the image have been input in row and column order and processed, the memory unit sends the indication signal cd to the second counting unit;
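The storage rule ram[m][y + pa][x + pa] = w_m × p_xy places each product at its pixel position shifted by the padding size, with one memory plane per weight. The sketch below assumes that cells in the padding border are never written and stay 0, which models zero padding; the function name is hypothetical.

```python
from typing import List


def store_products(image: List[List[float]], weights: List[float], pa: int) -> List[List[List[float]]]:
    """Build ram[m][y + pa][x + pa] = w_m * p_xy for every weight m and pixel (x, y).

    Cells inside the padding border are never written and keep the value 0
    (assumed zero padding).
    """
    npy, npx = len(image), len(image[0])
    ram = [[[0.0] * (npx + 2 * pa) for _ in range(npy + 2 * pa)] for _ in weights]
    for y in range(npy):            # rows in arrival order
        for x in range(npx):        # pixels within a row
            for m, w in enumerate(weights):
                ram[m][y + pa][x + pa] = w * image[y][x]
    return ram
```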
6) After the second counting unit receives the indication signal cd, it calculates the positions of the products needed for the convolution from the stride obtained on the stride input port st and the image width and length pixel counts obtained from npx and npy. The calculation formula is as follows:
Here, cx and cy are the abscissa and ordinate of the convolution kernel, cx' and cy' are the abscissa and ordinate of the kernel at the previous moment, npx is the number of pixels in each image row, npy is the number of pixels in each image column, pa is the padding size, nc is the convolution kernel size and st is the stride; the stride st is an integer from 1 to 11;
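The exact update formula for cx and cy is not reproduced in this text. As a sketch, assuming a conventional sliding window over the padded image (width npx + 2pa, length npy + 2pa) that advances by st along a row and wraps to the next row of windows when it would cross the right edge, the visited positions can be generated as follows. The helper name is hypothetical, and it enumerates single windows rather than the groups of four used in later steps.

```python
from typing import Iterator, Tuple


def window_origins(npx: int, npy: int, nc: int, st: int, pa: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left corner (cx, cy) of every nc x nc window in the padded image."""
    width, height = npx + 2 * pa, npy + 2 * pa
    if nc > width:
        return
    cx, cy = 0, 0
    while cy + nc <= height:
        yield cx, cy
        cx += st                    # next position derived from the previous one (cx')
        if cx + nc > width:         # window would cross the right edge of the padded image
            cx, cy = 0, cy + st     # wrap to the next row of windows (cy = cy' + st)
```

The number of positions per dimension is (npx + 2pa - nc) // st + 1, the usual convolution output size.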
7) The memory unit retrieves all product data for the four adjacent convolution-kernel positions to be calculated. The number of products needed for each kernel position is:
$n = nc^2$
Here, n is the number of products required and nc is the convolution kernel size, an integer from 1 to 11;
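Because each memory plane m already holds w_m times every (padded) pixel, the nc² products for one window can be read directly, one from each plane, with no further multiplication. A minimal sketch, assuming weights are numbered row-major within the kernel (m = i * nc + j for kernel row i and column j) and that (cx, cy) is the window's top-left corner in padded coordinates; the function name is hypothetical.

```python
from typing import List


def window_products(ram: List[List[List[float]]], cx: int, cy: int, nc: int) -> List[float]:
    """Collect the nc*nc stored products for the window with top-left corner (cx, cy).

    Assumes weight ordinal m = i * nc + j, so plane m supplies the product for
    kernel row i, column j, read at padded position (cy + i, cx + j).
    """
    return [ram[i * nc + j][cy + i][cx + j] for i in range(nc) for j in range(nc)]
```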
7) The addition unit performs the corresponding additions and adds the bias, giving four results. The calculation formula is as follows:
Here, $a_1$, $a_2$, $a_3$ and $a_4$ are the results of the upper-left, upper-right, lower-left and lower-right kernel positions among the four adjacent positions, which are treated as one group; cx and cy are the abscissa and ordinate of each group; st is the stride; b is the weight bias, in the range -1 to 1; and nc is the convolution kernel size;
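The formula for a1 to a4 is not reproduced in this text. The sketch below shows one plausible reading, assuming a group consists of the windows at (cx, cy), (cx + st, cy), (cx, cy + st) and (cx + st, cy + st), row-major weight numbering, and the memory layout of step 5); both function names are hypothetical.

```python
from typing import List, Tuple


def conv_at(ram: List[List[List[float]]], cx: int, cy: int, nc: int, b: float) -> float:
    """Sum the nc*nc stored products for the window at padded (cx, cy) and add the bias b."""
    # Assumes weight ordinal m = i * nc + j (row-major kernel layout).
    return sum(ram[i * nc + j][cy + i][cx + j] for i in range(nc) for j in range(nc)) + b


def group_results(ram: List[List[List[float]]], cx: int, cy: int, nc: int, st: int,
                  b: float) -> Tuple[float, float, float, float]:
    """Return (a1, a2, a3, a4): upper-left, upper-right, lower-left and lower-right windows."""
    a1 = conv_at(ram, cx, cy, nc, b)
    a2 = conv_at(ram, cx + st, cy, nc, b)
    a3 = conv_at(ram, cx, cy + st, nc, b)
    a4 = conv_at(ram, cx + st, cy + st, nc, b)
    return a1, a2, a3, a4
```

With all weights equal to 1, every plane equals the padded image, so each result is simply the window sum plus the bias.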
8) The four results are passed through the ReLU activation function: each accumulated result $a_i$ (i = 1 to 4) is compared with 0, and a value greater than 0 is kept unchanged while a value less than or equal to 0 is replaced by 0:
$m_i = \begin{cases} a_i, & a_i > 0 \\ 0, & a_i \le 0 \end{cases}$
Here, $m_i$ is the result after the activation function;
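The activation step is an element-wise ReLU over the four sums. A minimal sketch (the function name is hypothetical):

```python
from typing import List, Sequence


def relu(results: Sequence[float]) -> List[float]:
    """m_i = a_i if a_i > 0, else 0, applied to each accumulated result."""
    return [a if a > 0 else 0.0 for a in results]
```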
9) The four results are sent to the pooling unit, which pools them according to the pooling type received on the pooling-type input port po. po can take two values: 0 selects max pooling and 1 selects mean (average) pooling;
For max pooling, the maximum of the four results is output; for mean pooling, their average is output. Specifically:
$r = \max(m_1, m_2, m_3, m_4)$ when po = 0, and $r = \frac{m_1 + m_2 + m_3 + m_4}{4}$ when po = 1
Here, r is the pooling result;
The pooling result is output on the convolution-result output port r, and at the same time an output-valid signal is asserted on the output-result valid signal port rl;
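A sketch of the pooling unit's selection logic, using the po encoding stated above (0 for max, 1 for mean); the function and constant names are hypothetical.

```python
from typing import Sequence

MAX_POOL, MEAN_POOL = 0, 1  # values accepted on port po


def pool(m: Sequence[float], po: int) -> float:
    """Reduce the four activated results m_1..m_4 to the single output r."""
    if len(m) != 4:
        raise ValueError("pooling operates on exactly four results")
    if po == MAX_POOL:
        return max(m)
    if po == MEAN_POOL:
        return sum(m) / 4
    raise ValueError("po must be 0 (max) or 1 (mean)")
```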
10) When the second counting unit determines from the image size that all convolutions are complete, it outputs a completion signal on output port d, the reset signal goes low, the convolution of the current image ends, and the system is ready to receive the next image;
Completion of processing is judged as follows:
And
Here, cx and cy are the abscissa and ordinate of each group of kernel positions, npx is the number of pixels in each image row, npy is the number of pixel rows, pa is the padding size, nc is the convolution kernel size and st is the stride.
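When verifying a hardware implementation, a compact software reference for the whole flow can be useful. The sketch below is an approximation under stated assumptions, not the patented circuit: it uses zero padding, row-major kernel weights, groups of four windows at (cx, cy), (cx + st, cy), (cx, cy + st) and (cx + st, cy + st), and assumes successive groups advance by 2·st so that they do not overlap. Processing is treated as complete when the next group would no longer fit inside the padded image, which stands in for the completion conditions above. The function name is hypothetical.

```python
from typing import List


def conv_pool_reference(image: List[List[float]], weights: List[float], b: float,
                        nc: int, st: int, pa: int, po: int) -> List[List[float]]:
    """Zero padding, stride-st convolution with bias, ReLU, then max (po=0) or
    mean (po=1) pooling over groups of four adjacent windows."""
    npy, npx = len(image), len(image[0])
    width, height = npx + 2 * pa, npy + 2 * pa
    padded = [[0.0] * width for _ in range(height)]
    for y in range(npy):
        for x in range(npx):
            padded[y + pa][x + pa] = image[y][x]

    def conv_relu(cx: int, cy: int) -> float:
        # Assumed row-major weights: weights[i * nc + j] pairs with padded[cy + i][cx + j].
        acc = sum(weights[i * nc + j] * padded[cy + i][cx + j]
                  for i in range(nc) for j in range(nc))
        return max(acc + b, 0.0)  # add bias, then ReLU

    out: List[List[float]] = []
    cy = 0
    # A group needs windows at cy and cy + st, so the lower window must still fit.
    while cy + st + nc <= height:
        row: List[float] = []
        cx = 0
        while cx + st + nc <= width:
            m = [conv_relu(cx, cy), conv_relu(cx + st, cy),
                 conv_relu(cx, cy + st), conv_relu(cx + st, cy + st)]
            row.append(max(m) if po == 0 else sum(m) / 4)
            cx += 2 * st  # next group starts past both columns of windows
        if row:
            out.append(row)
        cy += 2 * st
    return out  # loops exhausted: corresponds to the completion signal on port d
```

Under these assumptions the result equals a stride-st convolution followed by 2×2 pooling with stride 2, with any odd trailing row or column of convolution outputs dropped.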
CN201910268608.5A 2019-03-08 2019-04-04 Universal convolution-pooling synchronous processing convolution kernel system Active CN109978161B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910177153 2019-03-08
CN2019101771536 2019-03-08

Publications (2)

Publication Number Publication Date
CN109978161A true CN109978161A (en) 2019-07-05
CN109978161B CN109978161B (en) 2022-03-04

Family

ID=67082846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910268608.5A Active CN109978161B (en) 2019-03-08 2019-04-04 Universal convolution-pooling synchronous processing convolution kernel system

Country Status (1)

Country Link
CN (1) CN109978161B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103905686A (en) * 2012-12-24 2014-07-02 三星电子株式会社 Image scanning apparatus, image compensation method and computer-readable recording medium
CN107894189A (en) * 2017-10-31 2018-04-10 北京艾克利特光电科技有限公司 A kind of EOTS and its method for automatic tracking of target point automatic tracing
CN108154229A (en) * 2018-01-10 2018-06-12 西安电子科技大学 Accelerate the image processing method of convolutional neural networks frame based on FPGA
IN201811023855A (en) * 2018-06-26 2018-07-13 Hcl Technologies Ltd
CN108764467A (en) * 2018-04-04 2018-11-06 北京大学深圳研究生院 For convolutional neural networks convolution algorithm and full connection computing circuit
CN108804973A (en) * 2017-04-27 2018-11-13 上海鲲云信息科技有限公司 The hardware structure and its execution method of algorithm of target detection based on deep learning
CN108848326A (en) * 2018-06-13 2018-11-20 吉林大学 A kind of high dynamic range MCP detector front end reading circuit and its reading method
CN109284824A (en) * 2018-09-04 2019-01-29 复旦大学 A kind of device for being used to accelerate the operation of convolution sum pond based on Reconfiguration Technologies
CN109416756A (en) * 2018-01-15 2019-03-01 深圳鲲云信息科技有限公司 Acoustic convolver and its applied artificial intelligence process device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
FRANCESCO CONTI et al.: "A Ultra-Low-Energy Convolution Engine for Fast Brain-Inspired Vision in Multicore Clusters", DOI:10.7873/DATE.2015.0404 *
LI DU et al.: "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things", DOI:10.1109/TCSI.2017.2735490 *
MOHAMMED ALAWAD et al.: "Stochastic-Based Deep Convolutional Networks with Reconfigurable Logic Fabric", IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS *
YU JIAN: "Hardware Design of a Multi-core DSP Image Processing System", China Master's Theses Full-text Database, Information Science and Technology *
JI MENGFEI et al.: "Design of a Low-Power Data Conversion System", Application of IC *
QIU YU: "FPGA-Based Acceleration of the Alexnet Forward Network", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11487288B2 (en) 2017-03-23 2022-11-01 Tesla, Inc. Data synthesis for autonomous control systems
US11403069B2 (en) 2017-07-24 2022-08-02 Tesla, Inc. Accelerated mathematical engine
US11409692B2 (en) 2017-07-24 2022-08-09 Tesla, Inc. Vector computational unit
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
US11681649B2 (en) 2017-07-24 2023-06-20 Tesla, Inc. Computational array microprocessor system using non-consecutive data formatting
US11561791B2 (en) 2018-02-01 2023-01-24 Tesla, Inc. Vector computational unit receiving data elements in parallel from a last row of a computational array
US11797304B2 (en) 2018-02-01 2023-10-24 Tesla, Inc. Instruction set architecture for a vector computational unit
US11734562B2 (en) 2018-06-20 2023-08-22 Tesla, Inc. Data pipeline and deep learning system for autonomous driving
US11841434B2 (en) 2018-07-20 2023-12-12 Tesla, Inc. Annotation cross-labeling for autonomous control systems
US11636333B2 (en) 2018-07-26 2023-04-25 Tesla, Inc. Optimizing neural network structures for embedded systems
US11562231B2 (en) 2018-09-03 2023-01-24 Tesla, Inc. Neural networks for embedded devices
US11893774B2 (en) 2018-10-11 2024-02-06 Tesla, Inc. Systems and methods for training machine models with augmented data
US11665108B2 (en) 2018-10-25 2023-05-30 Tesla, Inc. QoS manager for system on a chip communications
US11816585B2 (en) 2018-12-03 2023-11-14 Tesla, Inc. Machine learning models operating at different frequencies for autonomous vehicles
US11537811B2 (en) 2018-12-04 2022-12-27 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11908171B2 (en) 2018-12-04 2024-02-20 Tesla, Inc. Enhanced object detection for autonomous vehicles based on field view
US11610117B2 (en) 2018-12-27 2023-03-21 Tesla, Inc. System and method for adapting a neural network model on a hardware platform
US11748620B2 (en) 2019-02-01 2023-09-05 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11567514B2 (en) 2019-02-11 2023-01-31 Tesla, Inc. Autonomous and user controlled vehicle summon to a target
US11790664B2 (en) 2019-02-19 2023-10-17 Tesla, Inc. Estimating object properties using visual image data

Also Published As

Publication number Publication date
CN109978161B (en) 2022-03-04

Similar Documents

Publication Publication Date Title
CN109978161A (en) A kind of general convolution-pond synchronization process convolution kernel system
CN109993297A (en) A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN111062472A (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN107239823A (en) A kind of apparatus and method for realizing sparse neural network
CN101706741B (en) Method for partitioning dynamic tasks of CPU and GPU based on load balance
CN105528191B (en) Data accumulation apparatus and method, and digital signal processing device
CN109063825A (en) Convolutional neural networks accelerator
CN109086244A (en) Matrix convolution vectorization implementation method based on vector processor
CN110119809A (en) The asymmetric quantization of multiplication and accumulating operation in deep learning processing
CN111242289A (en) Convolutional neural network acceleration system and method with expandable scale
CN110348574A (en) A kind of general convolutional neural networks accelerating structure and design method based on ZYNQ
CN112465110B (en) Hardware accelerator for convolution neural network calculation optimization
CN111445012A (en) FPGA-based packet convolution hardware accelerator and method thereof
CN107886167A (en) Neural network computing device and method
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
CN110390385A (en) A kind of general convolutional neural networks accelerator of configurable parallel based on BNRP
CN106951395A (en) Towards the parallel convolution operations method and device of compression convolutional neural networks
CN110210610A (en) Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment
CN109740739A (en) Neural computing device, neural computing method and Related product
CN110263925A (en) A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA
CN109191364A (en) Accelerate the hardware structure of artificial intelligence process device
CN112200300B (en) Convolutional neural network operation method and device
CN109901814A (en) Customized floating number and its calculation method and hardware configuration
CN108764467A (en) For convolutional neural networks convolution algorithm and full connection computing circuit
CN106959937A (en) A kind of vectorization implementation method of warp product matrix towards GPDSP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant