CN106250103A - Data-reuse system for cyclic convolution computation in convolutional neural networks - Google Patents

Data-reuse system for cyclic convolution computation in convolutional neural networks

Info

Publication number
CN106250103A
CN106250103A (application CN201610633040.9A)
Authority
CN
China
Prior art keywords
data
array
convolution
module
reusing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610633040.9A
Other languages
Chinese (zh)
Inventor
刘波
朱智洋
陈壮
阮星
龚宇
曹鹏
杨军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201610633040.9A
Publication of CN106250103A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3867 - Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 - Complex mathematical operations
    • G06F 17/15 - Correlation function computation including computation of convolution operations
    • G06F 17/153 - Multidimensional correlation or convolution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30098 - Register arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a data-reuse system for cyclic convolution computation in convolutional neural networks, oriented toward coarse-grained reconfigurable systems. It comprises four parts: a master controller and interconnect control module, an input data-reuse module, a convolution loop computing array, and a data transmission path. A convolution loop computation is, in essence, the multiplication of multiple two-dimensional input data matrices with multiple two-dimensional template matrices; these matrices are typically large, and their multiplication occupies most of the total convolution time. The present invention uses a coarse-grained reconfigurable array to carry out the convolution. After receiving a convolution request instruction, it uses a circular register-rotation scheme to fully exploit the reusability of the input data in the convolution loop, raising the data utilization rate and reducing memory-bandwidth pressure. The designed array elements are configurable, so convolutions of different loop scales and strides can all be executed.

Description

Data-reuse system for cyclic convolution computation in convolutional neural networks
Technical field
The present invention relates to the field of embedded reconfigurable design, and in particular to a data-reuse system for cyclic convolution computation in convolutional neural networks, oriented toward coarse-grained reconfigurable systems. It can be used in high-performance reconfigurable systems to carry out convolution operations with large loop counts for convolutional neural networks, reusing existing data as far as possible to raise computation speed and reduce the bandwidth pressure of data reads.
Background art
A reconfigurable processor architecture is a good application-acceleration platform: because the hardware structure can be reorganized according to the dataflow graph of the program, reconfigurable arrays have been shown to deliver good performance improvements for scientific computing and multimedia applications.
Convolution has wide uses in image processing; it is required in image filtering, image enhancement, image analysis, and similar tasks. Image convolution is essentially a matrix operation, characterized by a large operand count and a high data-reuse rate, and computing image convolution purely in software can hardly meet real-time requirements.
As a kind of feedforward neural network, a convolutional neural network can learn automatically from large amounts of labeled data and extract complex features from them. Its advantage is that it can recognize visual patterns directly from pixel images with only minor preprocessing of the input image, and it also recognizes highly varied objects well; at the same time, its recognition ability is not easily affected by image distortion or simple geometric transformations. As an important direction of multilayer neural network research, convolutional neural networks have been a research focus for many years.
Place the convolution template at the upper-left corner of the image grid, so that it coincides with the upper-left sub-matrix of the image. Multiply the coinciding elements pairwise and sum all the products to obtain the first result point. Then shift the template one column to the right to obtain the second result point. Proceeding in this way, once the template has traversed the whole image grid, the convolution of one image frame is complete. The reusability of the data is very high, but the traditional approach of caching or reading directly from external memory is limited by the data-read bandwidth and has no configurable array for completing multi-layer convolution loops, so it is inefficient.
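The template traversal described above can be sketched in a few lines (a minimal software illustration with assumed names, not the patented hardware; `stride` generalizes the one-column shift):

```python
def conv2d(image, kernel, stride=1):
    """Slide the template over the image grid; multiply the coinciding
    elements pairwise and sum them to get each result point."""
    K = len(kernel)                      # square K*K template
    H, W = len(image), len(image[0])
    out = []
    for i in range(0, H - K + 1, stride):
        row = []
        for j in range(0, W - K + 1, stride):
            acc = 0
            for di in range(K):
                for dj in range(K):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)              # one result point
        out.append(row)
    return out

image = [[r * 4 + c for c in range(4)] for r in range(4)]
ones = [[1] * 3 for _ in range(3)]
print(conv2d(image, ones))  # [[45, 54], [81, 90]]
```

Note how each step to the right reuses K-1 of the K columns just read; this overlap is exactly the reuse the invention exploits.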
Summary of the invention
Objective of the invention: in view of the problems and shortcomings of the prior art, the present invention provides a data-reuse system for cyclic convolution computation in convolutional neural networks, oriented toward coarse-grained reconfigurable systems, which can meet the demands of accelerating large numbers of convolution computations, reduce the pressure on bandwidth, and make the convolution array configurable. The trade-off between the computing performance of a convolutional neural network and its consumption of hardware resources is what a coarse-grained reconfigurable implementation must balance. The design objective of a convolutional neural network based on a reconfigurable processing array is, on the premise of meeting the application's performance requirements, to make full use of the computing and storage resources the reconfigurable array provides, to use an input-image data-reuse structure that exploits the high reuse rate in cyclic convolution, and, with the added configurability of the coarse-grained reconfigurable array, to complete the convolution under limited data-read bandwidth and computing resources, reaching a good compromise.
Technical scheme: a data-reuse system for cyclic convolution computation in convolutional neural networks, oriented toward coarse-grained reconfigurable systems, comprising a master controller and interconnect control module, an input data-reuse module, a convolution loop computing array, and a data transmission path.
The master controller and interconnect control module receives external convolution requests, loads the configuration information of the computing array, returns computation results, monitors the loop execution state, and controls the data transfer between the external memory and the input data-reuse module.
The input data-reuse module is the data-reuse module connected between the external input-data memory and the cyclic convolution computing array; it performs input data reuse. The upper half of the module is a set of FIFOs, one per column of the image matrix (image-matrix-width many); the lower half is the same number of shift registers. The FIFOs continuously load input data from the external memory, each corresponding to one column of the convolution computation. When the shift registers advance by the convolution stride, the FIFOs replace one column of the shift registers, and a convolution operation is completed, achieving the effect of data reuse. The shift registers take the neighborhood data supplied and refreshed by the FIFOs of the upper half. Because the shift registers use a circular (ring) addressing mode, data arriving from the FIFOs always replace the oldest data in the ring shift register; the data are then transferred to the computing array to complete the convolution.
The module is realized in the following steps:
In one transfer, S 32-bit data words (1 ≤ S < maximum image-matrix width) are input into the FIFOs. When the convolution has consumed the data in one register, the FIFOs transfer their data to the shift registers. The shift registers need to refresh one column of K 32-bit words (1 ≤ K < maximum image-matrix width, where K is the kernel-matrix width of this convolution); together with the original K-1 columns of data, the shift registers transfer K*K words to the convolution computing matrix, then continue to shift backward by the stride, again refreshing one column each time, thereby realizing input data reuse.
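A software analogue of this ring-addressed column replacement may help: the deque below stands in for the ring of shift registers, so each step hands a K*K patch to the computing matrix while reusing K-1 columns and loading only one new column (illustrative names and sizes, not the hardware design):

```python
from collections import deque

K = 3                                               # kernel width
image = [[r * 5 + c for c in range(5)] for r in range(5)]  # toy 5x5 input

# Prime the "shift registers" with the first K columns (K rows tall).
window = deque(([image[r][j] for r in range(K)] for j in range(K)), maxlen=K)

patches = []
for j in range(K, len(image[0]) + 1):
    # K*K patch handed to the computing matrix: K-1 reused columns + 1 fresh one.
    patches.append([col[:] for col in window])
    if j < len(image[0]):
        window.append([image[r][j] for r in range(K)])  # ring addressing:
                                                        # the oldest column drops out

print(len(patches))  # 3 horizontal positions for a 5-wide image with K = 3
```

Only one column crosses the memory interface per step, which is the bandwidth saving the FIFO/shift-register arrangement is designed to deliver.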
The cyclic convolution computing array obtains the required input data from the input data-reuse module, completes the convolution computation, and sends the data out after the computation finishes.
The data transmission path is the data transmission channel between the master controller and interface control module, the cyclic convolution computing array, and the input data-reuse module.
Further, the master controller and interconnect control module comprises a main controller and a connection controller. The connection controller performs prefetch judgment and data-reuse configuration control. The prefetch judgment decides whether the data required by the upcoming convolution are ready in place: if so, the cyclic convolution computing array executes the convolution loop; if not, it waits for the data. Data in the buffer are read from the external memory; the present invention reads them by direct memory access. When external data input is needed, the master controller sends a read command to the external memory and afterwards no longer controls the memory read; the connection controller sends a halt signal to the master controller, which relinquishes the address bus, the data bus, and the relevant control bus. Thus whenever the data of the input data-reuse module need updating, the data in external memory are read directly through the connection controller.
The cyclic convolution computing array comprises an array configuration module, storage processing units, and computation processing units. Working with the data-reuse module, the array configuration module configures the computing array according to the convolution scale and stride so as to use all the computing resources the array can provide; after each computation the array is reconfigured and the computation processing units are adjusted to the new computation scale before the next convolution is carried out.
The configuration controller of the convolution processing array loads configuration information via the interface control module, and the computing array is then arranged according to the loop scale and stride of the cyclic convolution. The convolved image-matrix size can vary anywhere from 1 up to the maximum image-matrix width, and the computing array can be reconfigured for each convolution; even when the kernel is small, the convolution array can still use the whole convolution computing matrix, shortening the total convolution time.
The store instructions of the storage computing unit are tightly associated with the data-reuse module. Driven by the loop-control components, it takes an address from the address queue or computes one directly in the address-generation unit, sends a read request to the data-reuse module, and writes the returned data into the data queue; under the control of the loop-end components, it reads the data in the shift registers.
The computation processing units realize the computing and selection functions in the data flow. The loop subscripts continuously fetch data from the register file and pass them to the computation processing unit array, which operates according to fixed connection relations; the results of the operations are stored at the designated locations.
The cyclic convolution computing array runs as a continuous pipeline. The loop is mapped onto the array by the array configuration module, which configures the initial value, final value, and step value of the loop-control variables; execution of the loop program needs no external control, and the computing array units form pipeline links between them, completing the scheduling of the cyclic convolution on the pipeline.
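The configured loop parameters (initial value, final value, step value) can be pictured as a small descriptor that drives the pipeline without external control, roughly as follows (illustrative names only, not the patent's register layout):

```python
from dataclasses import dataclass

@dataclass
class LoopConfig:
    init: int    # initial value of the loop-control variable
    final: int   # final value (exclusive)
    step: int    # step value, i.e. the configured stride

    def positions(self):
        """Positions the pipelined loop visits once configured."""
        return list(range(self.init, self.final, self.step))

# A 16-wide image with a 3-wide kernel and stride 1 yields 14 positions per row.
cfg = LoopConfig(init=0, final=16 - 3 + 1, step=1)
print(len(cfg.positions()))  # 14
```

Once such a descriptor is written into the array, each unit can advance its own subscript locally, which is what lets the units chain into a pipeline.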
Brief description of the drawings
Fig. 1 is the architecture diagram of the coarse-grained reconfigurable array for convolution computation in the embodiment of the present invention;
Fig. 2 is the hardware diagram of the data round-robin scheduling of the input data-reuse module in the embodiment of the present invention;
Fig. 3 is the block diagram of the storage processing unit in the coarse-grained reconfigurable convolution computing array in the embodiment of the present invention;
Fig. 4 is the block diagram of the computation processing unit of the coarse-grained reconfigurable convolution computing array in the embodiment of the present invention;
Fig. 5 is the flow chart of cyclic convolution execution on the reconfigurable array in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is further elucidated below in conjunction with specific embodiments. It should be understood that these embodiments merely illustrate the present invention and do not limit its scope; after reading the present disclosure, modifications by those skilled in the art to the various equivalent forms of the invention all fall within the scope defined by the claims of this application.
The data-reuse system for cyclic convolution computation in convolutional neural networks oriented toward coarse-grained reconfigurable systems comprises a master controller and interconnect control module, an input data-reuse module, a convolution loop computing array, and a data transmission path.
The master controller and interconnect control module receives external convolution requests, loads the configuration information of the computing array, returns computation results, monitors the loop execution state, and controls the data transfer between the external memory and the input data-reuse module.
The input data-reuse module is the data-reuse module connected between the external input-data memory and the cyclic convolution computing array; the upper half of the module is image-matrix-width many FIFOs, and the lower half the same number of shift registers.
The cyclic convolution computing array obtains the required input data from the input data-reuse module, completes the convolution computation, and sends the data out after the computation finishes.
The data transmission path is the data transmission channel between the master controller and interface control module, the cyclic convolution computing array, and the input data-reuse module.
The master controller and interconnect control module comprises a main controller and a connection controller. The connection controller performs prefetch judgment and data-reuse configuration control; the prefetch judgment decides whether the data required by the upcoming convolution are ready in place: if so, the cyclic convolution computing array executes the convolution loop; if not, it waits for the data. Data in the buffer are read from the external memory; the present invention reads them by direct memory access. When external data input is needed, the master controller sends a read command to the external memory and afterwards no longer controls the memory read; the connection controller sends a halt signal to the master controller, which relinquishes the address bus, the data bus, and the relevant control bus. Thus whenever the data of the input data-reuse module need updating, the data in external memory are read directly through the connection controller.
Fig. 1 shows the concrete computing-array diagram and the coarse-grained reconfigurable array diagram of the data stream. The configurable PE units occupy the main part, since the reconfigurable array is the concrete part that carries out the convolution computation; the remaining parts mainly transmit the start and end instructions. As can be seen in Fig. 1, in the configurable array the storage processing units connect directly to the input data-reuse module (Fig. 2). According to the stride and kernel-size values, the input data-reuse module transmits the data stream needed by the convolution to the computation processing units, and the router routes the configured data stream through the interconnection network to each computation processing unit. Meanwhile the connection controller, once one convolution computation is complete, passes the data and status out; the computation processing units are then reconfigured and a new computation starts.
The data round-robin scheduling hardware of the input data-reuse module is shown in Fig. 2. Taking a kernel of size K*K as an example (K is the kernel width), FIFOs are added between the external memory and the shift registers. In one transfer, S 32-bit data words are input into the FIFOs. When the convolution has consumed the data in one register, the FIFOs transfer their data to the shift registers; the shift registers refresh one column of K 32-bit words, which, together with the original K-1 columns, gives the K*K words transferred to the convolution computing matrix. In this way an input-image data-reuse structure is formed, providing support for high-efficiency convolution.
Fig. 3 corresponds to the block diagram of the storage processing unit. When the input channel receives an address signal, it corresponds to the position of a storage processing unit in the array; these storage processing units generate the addresses of the corresponding data, the generated addresses select the usable data in the input-image data-reuse module, and the data are then output to the computation processing units. Loop control generates the addresses corresponding to the operation data, and at the end of the convolution the computed information is transferred synchronously to the external memory. When the loop-judgment structure finds the data not yet arrived or insufficient, the current operation ends, the information is passed to the external memory, and the data are updated.
Fig. 4 corresponds to the structure diagram of the computation processing unit. On receiving input data, the computation processing unit uses its internal multipliers and adders to complete the convolution operation. After each operation completes, the computation processing units required for the next operation are reconfigured according to the configuration controller, realizing configurable control, so that when the outer loop size or the stride changes, the computation can still be completed well.
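The multiplier-and-adder datapath inside a computation processing unit amounts to a multiply-accumulate over the K*K patch; a minimal software sketch of that operation (illustrative only, not the hardware netlist):

```python
def mac_unit(patch, kernel):
    """Multiply coinciding elements and accumulate, as the PE's
    internal multipliers and adders do for one result point."""
    acc = 0
    for a, b in zip(patch, kernel):   # flattened K*K patch and kernel
        acc += a * b
    return acc

print(mac_unit([1, 2, 3, 4], [5, 6, 7, 8]))  # 5 + 12 + 21 + 32 = 70
```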
In conjunction with Fig. 1 and Fig. 2, the concrete steps of the convolution loop computation are shown in Fig. 5 and comprise the following steps:
1) When the coarse-grained reconfigurable array system is needed to complete a large number of convolutions, a request is first sent to the convolution control system; when the main processor receives the request, it sends an instruction to the connection processing unit;
2) The connection processing unit first judges whether the data required in the input data-reuse module are in place; if not, it sends a wait signal and at the same time transfers data to the buffer by direct memory access;
3) Once the data are ready, the waiting operation instruction is notified and loop control starts. The configuration control unit in the convolution loop computing array configures the array, the memory-access configuration module in the computing array locates the position of the data to be computed, and the computing array then performs the convolution on the data at that position, pipelining backward in sequence;
4) The Y FIFO buffers (Y is the maximum image-matrix width) continuously refresh the used data in the registers by direct memory reads, so that by the time a position is revisited the data update is complete and the computation proceeds uninterrupted, without every convolution having to access data in external memory;
5) The connection controller controls the completion of the loop; when the computation completes, the final data are output to the external memory, and the concrete convolution array finishes.
When actually performing a convolution with a large loop count under limited computing resources, applying the data-reuse method together with a configurable reconfigurable array and completing the convolution in a pipeline improves operating efficiency and speed. A comparative test was set up with contrast verification system A and contrast verification system B. Contrast verification system A is a traditional reconfigurable system supporting neither array configuration nor reuse; contrast verification system B is the reconfigurable system proposed by the present invention, with data prefetch and reuse. Choosing a 16x16 input data matrix, a 3x3 convolution matrix, and a stride of 1, ten input data sets and ten convolution weight matrices were convolved simultaneously. The test results show that contrast verification system B achieves an average performance improvement of 1.76x over contrast verification system A.
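The scale of this test can be cross-checked with standard sliding-window arithmetic (a quick calculation, not a figure quoted from the patent):

```python
H = W = 16   # input data matrix
K = 3        # convolution matrix
S = 1        # stride

out = (H - K) // S + 1        # output positions per dimension
macs_per_pair = out * out * K * K  # multiply-accumulates per image/kernel pair

print(out, macs_per_pair)  # 14 positions per side, 1764 MACs per pair
```

With ten inputs and ten weight matrices convolved simultaneously, the reuse structure spares most of the 8 overlapping reads per 3x3 window, which is consistent with the bandwidth-bound speedup reported.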

Claims (5)

1. A data-reuse system for cyclic convolution computation in convolutional neural networks, oriented toward coarse-grained reconfigurable systems, characterized in that it comprises a master controller and interconnect control module, an input data-reuse module, a convolution loop computing array, and a data transmission path;
the master controller and interconnect control module receives external convolution requests, loads the configuration information of the computing array, returns computation results, monitors the loop execution state, and controls the data transfer between the external memory and the input data-reuse module;
the input data-reuse module is the data-reuse module connected between the external input-data memory and the cyclic convolution computing array, the upper half of the module being image-matrix-width many FIFOs and the lower half the same number of shift registers;
the cyclic convolution computing array obtains the required input data from the input data-reuse module, completes the convolution computation, and sends the data out after the computation finishes.
2. The system as claimed in claim 1, characterized in that: the data transmission path is the data transmission channel between the master controller and interface control module, the cyclic convolution computing array, and the input data-reuse module.
3. The data-reuse system for cyclic convolution computation in convolutional neural networks oriented toward coarse-grained reconfigurable systems as claimed in claim 1, characterized in that: the master controller and interconnect control module comprises a main controller and a connection controller; the connection controller performs prefetch judgment and data-reuse configuration control, the prefetch judgment deciding whether the data required by the upcoming convolution are ready in place, the cyclic convolution computing array executing the convolution loop if they are and waiting for the data otherwise; data in the buffer are read from the external memory by direct memory access; when external data input is needed, the master controller sends a read command to the external memory and afterwards no longer controls the memory read; the connection controller sends a halt signal to the master controller, which relinquishes the address bus, the data bus, and the relevant control bus, so that whenever the data of the input data-reuse module need updating, the data in external memory are read directly through the connection controller.
4. The data-reuse system for cyclic convolution computation in convolutional neural networks oriented toward coarse-grained reconfigurable systems as claimed in claim 1, characterized in that: the cyclic convolution computing array comprises an array configuration module, storage processing units, and computation processing units; working with the input data-reuse module, the array configuration module configures the computing array according to the convolution scale and stride so as to use the computing resources the array can provide; after each computation the array is reconfigured, the computation processing units are adjusted to the computation scale, and the next convolution is carried out; the cyclic convolution computing array runs as a continuous pipeline, the loop being mapped onto the array by the configuration module, which configures the initial value, final value, and step value of the loop-control variables; execution of the loop program needs no external control, and the computing array units form pipeline links between them, completing the scheduling of the cyclic convolution on the pipeline.
5. The data-reuse system for cyclic convolution computation in convolutional neural networks oriented toward coarse-grained reconfigurable systems as claimed in claim 1, characterized in that the input data-reuse module is realized in the following steps:
in one transfer, S 32-bit data words are input into the FIFOs; when the convolution has consumed the data in one register, the FIFOs transfer their data to the shift registers; the shift registers refresh one column of K 32-bit words, which, together with the original K-1 columns, gives the K*K words the shift registers transfer to the convolution computing matrix; shifting then continues backward by the stride, again refreshing one column each time, thereby realizing input data reuse.
CN201610633040.9A 2016-08-04 2016-08-04 Data-reuse system for cyclic convolution computation in convolutional neural networks Pending CN106250103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610633040.9A CN106250103A (en) 2016-08-04 2016-08-04 Data-reuse system for cyclic convolution computation in convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610633040.9A CN106250103A (en) 2016-08-04 2016-08-04 Data-reuse system for cyclic convolution computation in convolutional neural networks

Publications (1)

Publication Number Publication Date
CN106250103A 2016-12-21

Family

ID=58079364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610633040.9A Pending CN106250103A (en) Data-reuse system for cyclic convolution computation in convolutional neural networks

Country Status (1)

Country Link
CN (1) CN106250103A (en)

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106775599A (en) * 2017-01-09 2017-05-31 南京工业大学 Many computing unit coarseness reconfigurable systems and method of recurrent neural network
CN106844294A (en) * 2016-12-29 2017-06-13 华为机器有限公司 Convolution algorithm chip and communication equipment
CN107103754A (en) * 2017-05-10 2017-08-29 华南师范大学 A kind of road traffic condition Forecasting Methodology and system
CN107229598A (en) * 2017-04-21 2017-10-03 东南大学 A kind of low power consumption voltage towards convolutional neural networks is adjustable convolution computing module
CN107590085A (en) * 2017-08-18 2018-01-16 浙江大学 A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN107635138A (en) * 2017-10-19 2018-01-26 珠海格力电器股份有限公司 Image processing apparatus
CN107832262A (en) * 2017-10-19 2018-03-23 珠海格力电器股份有限公司 Convolution algorithm method and device
CN107862650A (en) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 The method of speed-up computation two dimensional image CNN convolution
CN108009126A (en) * 2017-12-15 2018-05-08 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108182471A (en) * 2018-01-24 2018-06-19 上海岳芯电子科技有限公司 A kind of convolutional neural networks reasoning accelerator and method
CN108198125A (en) * 2017-12-29 2018-06-22 深圳云天励飞技术有限公司 A kind of image processing method and device
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework
WO2018137177A1 (en) * 2017-01-25 2018-08-02 北京大学 Method for convolution operation based on nor flash array
CN108564524A (en) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 Convolution computation optimization method for visual images
CN108595379A (en) * 2018-05-08 2018-09-28 济南浪潮高新科技投资发展有限公司 Parallelized convolution operation method and system based on multi-level cache
CN108596331A (en) * 2018-04-16 2018-09-28 浙江大学 Optimization method for cellular neural network hardware architecture
CN108665063A (en) * 2018-05-18 2018-10-16 南京大学 Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN108681984A (en) * 2018-07-26 2018-10-19 珠海市微半导体有限公司 Acceleration circuit of 3*3 convolution algorithm
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 Arithmetic device for neural networks, chip, apparatus and related method
CN108717571A (en) * 2018-06-01 2018-10-30 阿依瓦(北京)技术有限公司 Acceleration method and device for artificial intelligence
CN108764182A (en) * 2018-06-01 2018-11-06 阿依瓦(北京)技术有限公司 Optimized acceleration method and device for artificial intelligence
WO2018232615A1 (en) * 2017-06-21 2018-12-27 华为技术有限公司 Signal processing method and device
CN109272112A (en) * 2018-07-03 2019-01-25 北京中科睿芯科技有限公司 Data reuse instruction mapping method, system and device for neural network
CN109284475A (en) * 2018-09-20 2019-01-29 郑州云海信息技术有限公司 Matrix convolution computing module and matrix convolution calculation method
CN109375952A (en) * 2018-09-29 2019-02-22 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN109460813A (en) * 2018-09-10 2019-03-12 中国科学院深圳先进技术研究院 Acceleration method, device, equipment and storage medium for convolutional neural network computation
CN109711533A (en) * 2018-12-20 2019-05-03 西安电子科技大学 Convolutional neural network module based on FPGA
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 Pooling processing method and system applied to convolutional neural networks
CN109816093A (en) * 2018-12-17 2019-05-28 北京理工大学 Single-path convolution implementation method
CN109992541A (en) * 2017-12-29 2019-07-09 深圳云天励飞技术有限公司 Data transfer method, related product and computer storage medium
CN110069444A (en) * 2019-06-03 2019-07-30 南京宁麒智能计算芯片研究院有限公司 Computing unit, array, module, hardware system and implementation method
CN110325963A (en) * 2017-02-28 2019-10-11 微软技术许可有限责任公司 Multi-function unit for programmable hardware nodes for neural network processing
CN110377874A (en) * 2019-07-23 2019-10-25 江苏鼎速网络科技有限公司 Convolution operation method and system
CN110383237A (en) * 2017-02-28 2019-10-25 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN110413561A (en) * 2018-04-28 2019-11-05 北京中科寒武纪科技有限公司 Data acceleration processing system
WO2019231254A1 (en) * 2018-05-30 2019-12-05 Samsung Electronics Co., Ltd. Processor, electronics apparatus and control method thereof
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolutional neural network hardware computing device and method
WO2020051751A1 (en) * 2018-09-10 2020-03-19 中国科学院深圳先进技术研究院 Convolution neural network computing acceleration method and apparatus, device, and storage medium
CN111045958A (en) * 2018-10-11 2020-04-21 展讯通信(上海)有限公司 Acceleration engine and processor
WO2020077565A1 (en) * 2018-10-17 2020-04-23 北京比特大陆科技有限公司 Data processing method and apparatus, electronic device, and computer readable storage medium
CN111095242A (en) * 2017-07-24 2020-05-01 特斯拉公司 Vector calculation unit
CN111176727A (en) * 2017-07-20 2020-05-19 上海寒武纪信息科技有限公司 Computing device and computing method
CN111291880A (en) * 2017-10-30 2020-06-16 上海寒武纪信息科技有限公司 Computing device and computing method
CN111465924A (en) * 2017-12-12 2020-07-28 特斯拉公司 System and method for converting matrix input to vectorized input for a matrix processor
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
CN111523642A (en) * 2020-04-10 2020-08-11 厦门星宸科技有限公司 Data reuse method, operation method, device and chip for convolution operation
CN109800867B (en) * 2018-12-17 2020-09-29 北京理工大学 Data calling method based on FPGA off-chip memory
CN111859797A (en) * 2020-07-14 2020-10-30 Oppo广东移动通信有限公司 Data processing method, device and storage medium
WO2021007037A1 (en) * 2019-07-09 2021-01-14 MemryX Inc. Matrix data reuse techniques in processing systems
US10928456B2 (en) 2017-08-17 2021-02-23 Samsung Electronics Co., Ltd. Method and apparatus for estimating state of battery
CN112992248A (en) * 2021-03-12 2021-06-18 西安交通大学深圳研究院 PE (provider edge) calculation unit structure of FIFO (first in first out) -based variable-length cyclic shift register
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
WO2022179075A1 (en) * 2021-02-26 2022-09-01 成都商汤科技有限公司 Data processing method and apparatus, computer device and storage medium
US11694074B2 (en) 2018-09-07 2023-07-04 Samsung Electronics Co., Ltd. Integrated circuit that extracts data, neural network processor including the integrated circuit, and neural network device
CN116842307A (en) * 2023-08-28 2023-10-03 腾讯科技(深圳)有限公司 Data processing method, device, equipment, chip and storage medium
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001090927A1 (en) * 2000-05-19 2001-11-29 Philipson Lars H G Method and device in a convolution process
CN102208005A (en) * 2011-05-30 2011-10-05 华中科技大学 2-dimensional (2-D) convolver
CN104077233A (en) * 2014-06-18 2014-10-01 百度在线网络技术(北京)有限公司 Single-channel convolution layer and multi-channel convolution layer handling method and device
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dou Yong et al., "Coarse-grained reconfigurable array architecture supporting automatic loop pipelining", Science in China Series E: Information Sciences *
Lu Zhijian, "Research on parallel architecture of convolutional neural networks based on FPGA", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (92)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844294B (en) * 2016-12-29 2019-05-03 华为机器有限公司 Convolution operation chip and communication device
CN106844294A (en) * 2016-12-29 2017-06-13 华为机器有限公司 Convolution operation chip and communication device
CN106775599A (en) * 2017-01-09 2017-05-31 南京工业大学 Multi-computing-unit coarse-grained reconfigurable system and method for recurrent neural networks
WO2018137177A1 (en) * 2017-01-25 2018-08-02 北京大学 Method for convolution operation based on NOR flash array
US11309026B2 (en) 2017-01-25 2022-04-19 Peking University Convolution operation method based on NOR flash array
CN110325963A (en) * 2017-02-28 2019-10-11 微软技术许可有限责任公司 The multi-functional unit for programmable hardware node for Processing with Neural Network
CN110383237A (en) * 2017-02-28 2019-10-25 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
US11663450B2 (en) 2017-02-28 2023-05-30 Microsoft Technology Licensing, Llc Neural network processing with chained instructions
CN110383237B (en) * 2017-02-28 2023-05-26 德克萨斯仪器股份有限公司 Reconfigurable matrix multiplier system and method
CN110325963B (en) * 2017-02-28 2023-05-23 微软技术许可有限责任公司 Multifunctional unit for programmable hardware nodes for neural network processing
CN107229598A (en) * 2017-04-21 2017-10-03 东南大学 Low-power voltage-adjustable convolution computing module for convolutional neural networks
CN107103754A (en) * 2017-05-10 2017-08-29 华南师范大学 Road traffic condition prediction method and system
WO2018232615A1 (en) * 2017-06-21 2018-12-27 华为技术有限公司 Signal processing method and device
CN111176727A (en) * 2017-07-20 2020-05-19 上海寒武纪信息科技有限公司 Computing device and computing method
CN111176727B (en) * 2017-07-20 2022-05-31 上海寒武纪信息科技有限公司 Computing device and computing method
CN111221578A (en) * 2017-07-20 2020-06-02 上海寒武纪信息科技有限公司 Computing device and computing method
CN111221578B (en) * 2017-07-20 2022-07-15 上海寒武纪信息科技有限公司 Computing device and computing method
CN111095242A (en) * 2017-07-24 2020-05-01 特斯拉公司 Vector calculation unit
US11893393B2 (en) 2017-07-24 2024-02-06 Tesla, Inc. Computational array microprocessor system with hardware arbiter managing memory requests
CN111095242B (en) * 2017-07-24 2024-03-22 特斯拉公司 Vector calculation unit
US10928456B2 (en) 2017-08-17 2021-02-23 Samsung Electronics Co., Ltd. Method and apparatus for estimating state of battery
CN107590085B (en) * 2017-08-18 2018-05-29 浙江大学 Dynamically reconfigurable array data path with multi-level cache and control method therefor
CN107590085A (en) * 2017-08-18 2018-01-16 浙江大学 Dynamically reconfigurable array data path with multi-level cache and control method therefor
CN107832262A (en) * 2017-10-19 2018-03-23 珠海格力电器股份有限公司 Convolution operation method and device
CN107635138A (en) * 2017-10-19 2018-01-26 珠海格力电器股份有限公司 Image processing apparatus
CN111291880A (en) * 2017-10-30 2020-06-16 上海寒武纪信息科技有限公司 Computing device and computing method
CN111291880B (en) * 2017-10-30 2024-05-14 上海寒武纪信息科技有限公司 Computing device and computing method
US11537857B2 (en) 2017-11-01 2022-12-27 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
US11734554B2 (en) 2017-11-01 2023-08-22 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 Pooling processing method and system applied to convolutional neural networks
CN107862650B (en) * 2017-11-29 2021-07-06 中科亿海微电子科技(苏州)有限公司 Method for accelerating calculation of CNN convolution of two-dimensional image
CN107862650A (en) * 2017-11-29 2018-03-30 中科亿海微电子科技(苏州)有限公司 Method for accelerating calculation of CNN convolution of two-dimensional image
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 Arithmetic device for neural networks, chip, apparatus and related method
CN111465924A (en) * 2017-12-12 2020-07-28 特斯拉公司 System and method for converting matrix input to vectorized input for a matrix processor
CN111465924B (en) * 2017-12-12 2023-11-17 特斯拉公司 System and method for converting matrix input into vectorized input for matrix processor
CN108009126A (en) * 2017-12-15 2018-05-08 北京中科寒武纪科技有限公司 Computation method and related product
CN108198125B (en) * 2017-12-29 2021-10-08 深圳云天励飞技术有限公司 Image processing method and device
CN109992541A (en) * 2017-12-29 2019-07-09 深圳云天励飞技术有限公司 Data transfer method, related product and computer storage medium
CN108198125A (en) * 2017-12-29 2018-06-22 深圳云天励飞技术有限公司 Image processing method and device
CN108182471A (en) * 2018-01-24 2018-06-19 上海岳芯电子科技有限公司 Convolutional neural network inference accelerator and method
CN108182471B (en) * 2018-01-24 2022-02-15 上海岳芯电子科技有限公司 Convolutional neural network reasoning accelerator and method
CN108241890B (en) * 2018-01-29 2021-11-23 清华大学 Reconfigurable neural network acceleration method and architecture
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 Reconfigurable neural network acceleration method and architecture
CN108596331A (en) * 2018-04-16 2018-09-28 浙江大学 Optimization method for cellular neural network hardware architecture
CN108564524A (en) * 2018-04-24 2018-09-21 开放智能机器(上海)有限公司 Convolution computation optimization method for visual images
CN110413561A (en) * 2018-04-28 2019-11-05 北京中科寒武纪科技有限公司 Data acceleration processing system
CN110413561B (en) * 2018-04-28 2021-03-30 中科寒武纪科技股份有限公司 Data acceleration processing system
CN108595379A (en) * 2018-05-08 2018-09-28 济南浪潮高新科技投资发展有限公司 Parallelized convolution operation method and system based on multi-level cache
CN108665063B (en) * 2018-05-18 2022-03-18 南京大学 Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
CN108665063A (en) * 2018-05-18 2018-10-16 南京大学 Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
WO2019231254A1 (en) * 2018-05-30 2019-12-05 Samsung Electronics Co., Ltd. Processor, electronics apparatus and control method thereof
US11244027B2 (en) 2018-05-30 2022-02-08 Samsung Electronics Co., Ltd. Processor, electronics apparatus and control method thereof
CN108717571B (en) * 2018-06-01 2020-09-15 阿依瓦(北京)技术有限公司 Acceleration method and device for artificial intelligence
CN108764182B (en) * 2018-06-01 2020-12-08 阿依瓦(北京)技术有限公司 Optimized acceleration method and device for artificial intelligence
CN108717571A (en) * 2018-06-01 2018-10-30 阿依瓦(北京)技术有限公司 Acceleration method and device for artificial intelligence
CN108764182A (en) * 2018-06-01 2018-11-06 阿依瓦(北京)技术有限公司 Optimized acceleration method and device for artificial intelligence
CN109272112B (en) * 2018-07-03 2021-08-27 北京中科睿芯科技集团有限公司 Data reuse instruction mapping method, system and device for neural network
CN109272112A (en) * 2018-07-03 2019-01-25 北京中科睿芯科技有限公司 Data reuse instruction mapping method, system and device for neural network
CN108681984A (en) * 2018-07-26 2018-10-19 珠海市微半导体有限公司 Acceleration circuit of 3*3 convolution algorithm
CN108681984B (en) * 2018-07-26 2023-08-15 珠海一微半导体股份有限公司 Acceleration circuit of 3*3 convolution algorithm
US11694074B2 (en) 2018-09-07 2023-07-04 Samsung Electronics Co., Ltd. Integrated circuit that extracts data, neural network processor including the integrated circuit, and neural network device
CN109460813A (en) * 2018-09-10 2019-03-12 中国科学院深圳先进技术研究院 Acceleration method, device, equipment and storage medium for convolutional neural network computation
WO2020051751A1 (en) * 2018-09-10 2020-03-19 中国科学院深圳先进技术研究院 Convolution neural network computing acceleration method and apparatus, device, and storage medium
CN109284475A (en) * 2018-09-20 2019-01-29 郑州云海信息技术有限公司 Matrix convolution computing module and matrix convolution calculation method
CN109284475B (en) * 2018-09-20 2021-10-29 郑州云海信息技术有限公司 Matrix convolution calculating device and matrix convolution calculating method
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines
CN109375952B (en) * 2018-09-29 2021-01-26 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN109375952A (en) * 2018-09-29 2019-02-22 北京字节跳动网络技术有限公司 Method and apparatus for storing data
CN111045958A (en) * 2018-10-11 2020-04-21 展讯通信(上海)有限公司 Acceleration engine and processor
CN111045958B (en) * 2018-10-11 2022-09-16 展讯通信(上海)有限公司 Acceleration engine and processor
WO2020077565A1 (en) * 2018-10-17 2020-04-23 北京比特大陆科技有限公司 Data processing method and apparatus, electronic device, and computer readable storage medium
CN109800867B (en) * 2018-12-17 2020-09-29 北京理工大学 Data calling method based on FPGA off-chip memory
CN109816093B (en) * 2018-12-17 2020-12-04 北京理工大学 Single-path convolution implementation method
CN109816093A (en) * 2018-12-17 2019-05-28 北京理工大学 Single-path convolution implementation method
CN109711533A (en) * 2018-12-20 2019-05-03 西安电子科技大学 Convolutional neural network module based on FPGA
CN109711533B (en) * 2018-12-20 2023-04-28 西安电子科技大学 Convolutional neural network acceleration system based on FPGA
CN110069444A (en) * 2019-06-03 2019-07-30 南京宁麒智能计算芯片研究院有限公司 Computing unit, array, module, hardware system and implementation method
WO2021007037A1 (en) * 2019-07-09 2021-01-14 MemryX Inc. Matrix data reuse techniques in processing systems
US11537535B2 (en) 2019-07-09 2022-12-27 Memryx Incorporated Non-volatile memory based processors and dataflow techniques
CN110377874B (en) * 2019-07-23 2023-05-02 江苏鼎速网络科技有限公司 Convolution operation method and system
CN110377874A (en) * 2019-07-23 2019-10-25 江苏鼎速网络科技有限公司 Convolution operation method and system
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolutional neural network hardware computing device and method
CN111523642B (en) * 2020-04-10 2023-03-28 星宸科技股份有限公司 Data reuse method, operation method, device and chip for convolution operation
CN111523642A (en) * 2020-04-10 2020-08-11 厦门星宸科技有限公司 Data reuse method, operation method, device and chip for convolution operation
CN111859797A (en) * 2020-07-14 2020-10-30 Oppo广东移动通信有限公司 Data processing method, device and storage medium
WO2022179075A1 (en) * 2021-02-26 2022-09-01 成都商汤科技有限公司 Data processing method and apparatus, computer device and storage medium
CN112992248A (en) * 2021-03-12 2021-06-18 西安交通大学深圳研究院 PE (provider edge) calculation unit structure of FIFO (first in first out) -based variable-length cyclic shift register
CN114780910A (en) * 2022-06-16 2022-07-22 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
CN114780910B (en) * 2022-06-16 2022-09-06 千芯半导体科技(北京)有限公司 Hardware system and calculation method for sparse convolution calculation
CN116842307A (en) * 2023-08-28 2023-10-03 腾讯科技(深圳)有限公司 Data processing method, device, equipment, chip and storage medium
CN116842307B (en) * 2023-08-28 2023-11-28 腾讯科技(深圳)有限公司 Data processing method, device, equipment, chip and storage medium

Similar Documents

Publication Publication Date Title
CN106250103A (en) System for data reuse in convolutional neural network cyclic convolution computation
JP7430203B2 (en) System and method for matrix multiplication instructions using floating point operations with specified bias
CN111291880B (en) Computing device and computing method
CN108268943B (en) Hardware accelerator engine
CN109376861B (en) Apparatus and method for performing full connectivity layer neural network training
CN104899182B (en) Matrix multiplication acceleration method supporting variable-size partitioned blocks
JP6960700B2 (en) Multicast Network On-Chip Convolutional Neural Network Hardware Accelerator and Its Behavior
CN108108809B (en) Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
CN108416436B (en) Method and system for neural network partitioning using multi-core processing module
CN104054108B (en) Dynamically configurable pipeline preprocessor
CN103221918B (en) IC cluster processing equipment with separate data/address bus and messaging bus
CA3051990A1 (en) Accelerated deep learning
US11544525B2 (en) Systems and methods for artificial intelligence with a flexible hardware processing framework
CN109740748B (en) Convolutional neural network accelerator based on FPGA
CN105468568B (en) Efficient coarse-grained reconfigurable computing system
CN109711533A (en) Convolutional neural network module based on FPGA
CN106294278B (en) Adaptive hardware pre-configuration controller for dynamically reconfigurable array computing systems
CN105912501A (en) SM4-128 encryption algorithm implementation method and system based on large-scale coarse-grained reconfigurable processor
WO2018057294A1 (en) Combined world-space pipeline shader stages
CN115136123A (en) Tile subsystem and method for automated data flow and data processing within an integrated circuit architecture
CN110991619A (en) Neural network processor, chip and electronic equipment
CN109657794A (en) Instruction-queue-based distributed deep neural network performance modeling method
CN102446342B (en) Reconfigurable binary arithmetical unit, reconfigurable binary image processing system and basic morphological algorithm implementation method thereof
CN115860066A (en) Batch-processing-based neural network inference pipeline multiplexing method
CN110503179A (en) Calculation method and related product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2016-12-21)