CN108170640A - Neural network operation device and operation method using same - Google Patents

Neural network operation device and operation method using same

Info

Publication number
CN108170640A
CN108170640A (application CN201711452014.7A)
Authority
CN
China
Prior art keywords
operation unit
data
unit group
operation
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711452014.7A
Other languages
Chinese (zh)
Other versions
CN108170640B (en)
Inventor
周聖元
陈云霁
陈天石
刘少礼
郭崎
杜子东
刘道福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201711452014.7A
Publication of CN108170640A
Application granted
Publication of CN108170640B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/153 Multidimensional correlation or convolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

The disclosure provides a neural network operation device and method. The device includes: an operation section for performing convolution operations, comprising multiple operation unit groups distributed in an array of X rows and Y columns, with data transferred between the operation unit groups in an S-shaped direction and/or an inverse S-shaped direction, X and Y each being a positive integer; and a cache for transmitting data to the operation unit groups and receiving the data after the operation unit groups complete their operations. By completing data transfer among the operation units along S-shaped and inverse S-shaped paths, the device effectively accelerates neural network operations while reducing the repeated reading of weights and partial sums and the memory-access power consumption caused by repeated accesses.

Description

Neural network operation device and operation method using same
Technical field
The present disclosure relates to the field of computers, and further to the field of artificial intelligence.
Background art
Deep neural networks are the foundation of many current artificial intelligence applications, and have found breakthrough uses in speech recognition, image processing, data analysis, advertisement recommendation systems, autonomous driving, and other areas, bringing deep neural networks into many aspects of daily life. However, the enormous amount of computation required by deep neural networks has always constrained their faster development and wider application. When accelerator designs are considered for speeding up deep neural network operations, the enormous computational load inevitably brings a very large energy overhead, which likewise restricts the further widespread application of accelerators.
A common existing approach is to use a general-purpose processor (CPU), which supports neural network algorithms by executing general-purpose instructions using general-purpose register files and general-purpose functional units. One disadvantage of this approach is that the computational performance of a single general-purpose processor is relatively low and cannot meet the performance requirements of neural network operations; and when multiple general-purpose processors execute in parallel, the communication between them in turn becomes a performance bottleneck. Another known approach is to use a graphics processing unit (GPU), which supports the above algorithms by executing general-purpose SIMD instructions using general-purpose register files and general-purpose stream processing units. Since the GPU is a device specialized for graphics, image, and scientific computation, its on-chip cache is small, so off-chip bandwidth becomes the main performance bottleneck, bringing a huge power overhead.
Summary of the disclosure
(1) Technical problem to be solved
In view of this, the present disclosure aims to provide a reconfigurable S-shaped operation device and operation method so as to solve at least some of the technical problems described above.
(2) Technical solution
According to one aspect of the present disclosure, a neural network operation device for performing convolution operations is provided, comprising:
an operation section for performing the convolution operations, comprising multiple operation unit groups distributed in an array of X rows and Y columns, with data transferred between the operation unit groups in an S-shaped direction and/or an inverse S-shaped direction, X and Y each being a positive integer; and
a cache for transmitting data to the operation unit groups and receiving the data after the operation unit groups complete their operations.
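For illustration only, the following is a minimal Python sketch of the S-shaped ("snake") ordering described above, in which even rows are traversed left to right and odd rows right to left, so that the last element of one row hands data to the first element of the next; the function names are illustrative and do not come from the disclosure.

    def s_shaped_order(X, Y):
        # Visit an X-row, Y-column array in snake order: even rows run
        # left-to-right, odd rows right-to-left.
        order = []
        for row in range(X):
            cols = range(Y) if row % 2 == 0 else reversed(range(Y))
            for col in cols:
                order.append((row, col))
        return order

    def inverse_s_shaped_order(X, Y):
        # The inverse S-shaped direction is the same path walked backwards.
        return list(reversed(s_shaped_order(X, Y)))

    print(s_shaped_order(2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]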
In a further embodiment, the device further includes a control section for controlling the operation section and the cache, so that the two cooperate to complete the required functions.
In a further embodiment, each operation unit group includes multiple operation units distributed in an array of M rows and N columns, with data transferred between operation units in an S-shaped direction and/or an inverse S-shaped direction, M and N each being a positive integer.
In a further embodiment, each operation unit includes: two or more multipliers; two or more adders; and at least one internal storage section provided in the operation unit, the internal storage section being connected to the multipliers and/or adders.
In a further embodiment, each operation unit also includes two selectors for skipping the multipliers and adders in the operation unit: when the operation unit needs to perform computation, the selectors select the adder result as the output of the operation unit; or, when the operation unit does not need to perform computation, the selectors output the input data directly.
In a further embodiment, each operation unit group can also broadcast data to the cache independently and, under the control of the control section, select different output channels, so as to work in series or in parallel.
According to another aspect of the present disclosure, a method of performing convolution operations using any of the neural network operation devices described above is provided, including: setting a convolution kernel whose size is larger than the number of operation units in one operation unit group; and combining multiple operation unit groups into one operation unit cluster, so that the operation unit groups within the operation unit cluster transfer data and operate according to a serial mode of operation, while data is transferred and operated on between operation unit clusters according to a parallel mode of operation.
In a further embodiment, the method further includes: sending the weight data corresponding to an output feature map into the internal storage section of each operation unit; feeding the neurons to be operated on into the operation units for multiplication and addition; and passing each addition result along the S-shaped or inverse S-shaped direction to the next operation unit for further operation.
In a further embodiment, when the number of operation units in one operation unit group equals the convolution kernel size, an activation operation is further applied to the operation result.
In a further embodiment, when the number of operation units in one operation unit group is smaller than the convolution kernel size, the operation result is treated as temporary data and sent into the next operation unit group to continue the operation.
(3) Advantageous effects
(1) The neural network operation device of the present disclosure uses S-shaped and inverse S-shaped paths to transfer data among the operation units, and combines this with the weight-sharing property of neural networks, so that while effectively accelerating neural network operations it also reduces the repeated reading of weights and partial sums and the memory-access power consumption caused by repeated accesses.
(2) The neural network operation device of the present disclosure has multiple operation unit groups that support parallel computation, so that the operation unit groups can read and share the same set of neuron data while computing the data of multiple output feature maps at the same time, improving the utilization of neuron data and the efficiency of operation.
(3) The neural network operation device of the present disclosure can combine multiple operation unit groups, adjusting the transfer paths of operand data and result data under the control of the control section. During computation this accommodates different weight scales within the same operation network, further broadening the applicability of the operation section, raising the utilization of the operation units in the device, and accelerating neural network operation.
Description of the drawings
Fig. 1 is a schematic diagram of the neural network operation device of one embodiment of the disclosure.
Fig. 2 is a schematic diagram of the data flow direction of the neural network operation device of one embodiment of the disclosure.
Fig. 3 is a schematic diagram of the data flow direction of the neural network operation device of another embodiment of the disclosure.
Fig. 4 is a schematic diagram of an operation unit group in Fig. 1.
Fig. 5 is a schematic diagram of the operation performed within one operation unit group of Fig. 1.
Fig. 6 is a schematic diagram of three operation unit groups combined into one operation unit cluster.
Fig. 7 is a schematic diagram of one operation unit in Fig. 1.
Specific embodiments
To make the purpose, technical solution, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
The primary structure of the disclosure, shown in Fig. 1, is broadly divided into an operation section and a storage section. The operation section is used to complete the operations and contains multiple operation unit groups; each operation unit group contains multiple operation units and two or more arithmetic logic units (ALUs). The storage section is used to preserve data and includes an external storage section and internal storage sections: the external storage section, located outside the operation units, can be divided into several regions used respectively to hold input data, output data, and temporary buffers, while the internal storage sections, located inside the operation section, hold the data awaiting operation. In a preferred arrangement the device also includes a control section that controls its various parts so that they cooperate to complete the required functions.
The operation section includes X*Y operation unit groups (X and Y being arbitrary positive integers), arranged as a two-dimensional array of X rows and Y columns, with data transferred between operation unit groups in the S-shaped or inverse S-shaped direction. Each operation unit group can broadcast data to the cache and, under the control of the control section, select different output channels, so that the operation unit groups can work in series or in parallel. That is, when working in series, each operation unit group receives the data transferred from the operation unit group on its left/right and, after operating on it, transfers its output data to the operation unit group on its right/left; the last operation unit group sends the final result through the cache into the storage module for preservation, with the data flow direction shown in Fig. 2. The operation unit groups can also work in parallel: the initial data is delivered to each operation unit group along the original S-shaped path, the groups share the operand data and each carries out its own operation; every operation unit group then transfers its own operation result directly into the cache to be buffered and arranged, and when the operations are finished the data in the cache is exported into the storage module for preservation, with the data flow direction shown in Fig. 3.
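A minimal functional sketch (Python, with illustrative names not taken from the disclosure) of the two working modes just described: in series (Fig. 2), each group transforms the data and hands it to the next group along the S-shaped path, and only the last group's result is preserved; in parallel (Fig. 3), all groups share the same operand data and each writes its own result to the cache.

    def run_serial(groups, data):
        # Serial mode (Fig. 2): data snakes through every group; only the
        # final group's output is preserved.
        for group in groups:
            data = group(data)
        return [data]

    def run_parallel(groups, data):
        # Parallel mode (Fig. 3): every group reads the shared data and each
        # buffers its own result in the cache.
        return [group(data) for group in groups]

    groups = [lambda x, k=k: x + k for k in range(3)]  # stand-ins for unit groups
    print(run_serial(groups, 0))    # [3]
    print(run_parallel(groups, 0))  # [0, 1, 2]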
As shown in Fig. 4, each operation unit group includes M*N operation units (M and N being positive integers, preferably M=N=3 or M=N=5), arranged as a two-dimensional array of M rows and N columns, with data transferred between operation units in the S-shaped or inverse S-shaped direction. Each operation unit includes two or more multipliers (the first multiplier, second multiplier, and so on being denoted "X1", "X2", etc.), two or more adders (denoted "+0", "+1", etc.), and one internal storage unit. Each time, a multiplier in the operation unit multiplies the data read in from outside by data in the internal storage unit and feeds the product into an adder. The adder adds the data transferred along the S-shaped or inverse S-shaped path to the multiplier's product, and the result is transferred along the S-shaped or inverse S-shaped path into the adder of the next operation unit. The even-numbered adders (the zeroth, second, and so on) receive the data transferred along the S-shaped direction, perform the addition, and pass the result onward in the S-shaped direction; the odd-numbered adders (the first, third, and so on) receive the data transferred along the inverse S-shaped direction and pass their results onward along the inverse S-shaped path. When the operation reaches the last operation unit, the result can either be passed back along the inverse S-shaped path to continue the operation, or be transferred to the storage unit for preservation.
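Sketched below, under the assumptions of this paragraph, is the multiply-accumulate step of a single operation unit (Python; names are illustrative, not from the disclosure): the weight is held in the unit's internal storage, the neuron arrives from outside, and the partial sum arrives along the S-shaped or inverse S-shaped chain, so chaining the units reduces a kernel to a running sum of products.

    def unit_step(weight, neuron, partial_sum=0.0):
        # Multiplier ("X1"): external neuron times the stored weight;
        # adder ("+0"/"+1"): incoming partial sum plus the product,
        # forwarded to the next unit along the chain.
        return partial_sum + weight * neuron

    # Chaining 9 units over a 3x3 kernel:
    weights = [1, 0, 1, 0, 1, 0, 1, 0, 1]   # contents of internal storage
    neurons = [0, 1, 2, 3, 4, 5, 6, 7, 8]   # neurons fed in from outside
    acc = 0.0
    for w, n in zip(weights, neurons):
        acc = unit_step(w, n, acc)
    print(acc)  # 20.0: one partial result for one output position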
As shown in Fig. 5, a weight datum is denoted W(o,i,x,y), meaning that the datum corresponds to the o-th output feature map, the i-th input feature map, and the position at row x, column y. Suppose the number of operation units is 3*3 and the kernel size is 3*3. The first group of weight data corresponding to the first output feature map and the second output feature map is sent into the internal storage section of each operation unit, as shown in Fig. 5. The neurons to be operated on are fetched from the storage section and fed into the operation units to be multiplied; each product is then fed into an adder for addition. For the data awaiting addition, "+0" of operation unit 0 and "+1" of operation unit 8 can either fetch the data directly from the storage unit for the addition, or be initialized to 0 so that the product is added to 0. The addition results of the operation units are then transferred in the prescribed directions: the addition result of "+0" of operation unit 0 is passed along the S-shaped path to the input of "+0" of operation unit 1, and the addition result of "+0" of operation unit 2 is passed along the S-shaped path to the input of "+0" of operation unit 3; the addition result of "+1" of operation unit 6 is passed along the inverse S-shaped path to the input of "+1" of operation unit 5, and the addition result of "+1" of operation unit 5 is passed along the inverse S-shaped path to the input of "+1" of operation unit 4. Next, the second group of neuron data is sent to the data processing module; in each operation unit it is multiplied with the weights and added to the previously transferred partial sums, and the results continue to be transferred in the assigned directions until all operations are complete. The operation result of "+1" of operation unit 0 and the operation result of "+0" of operation unit 8 can be written directly back to the designated location in the storage section. If the scale of the kernel exceeds the number of operation units, the result may be a temporary partial sum, which is stored in the temporary storage region of the storage unit; after the weight data has been updated according to the control instruction, the partial sum is sent to the inputs of "+0" of operation unit 0 and "+1" of operation unit 8 to continue the addition. If the result obtained is final and an activation operation is required, the result is input into the ALU for the activation operation and then written back to the storage section; otherwise it is written directly into the storage section for preservation. In this way, the weight-sharing property of convolutional neural networks can be fully exploited, avoiding the memory-access power consumption brought by repeatedly reading the weights. Meanwhile, the same set of neuron data is read once while the data of two output feature maps is computed simultaneously, improving the utilization of neuron data. In addition, multiple operation units can operate in parallel, greatly increasing the operation speed.
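The dataflow of this example can be modeled functionally as follows (a Python sketch assuming NumPy; names are illustrative). Each sliding window of neurons is read once and shared, and each kernel accumulates its own products, matching how the two output feature maps are computed from one read of the neuron data.

    import numpy as np

    def conv2d_shared_neurons(inp, kernels):
        # Convolve one input feature map with several kernels while reading
        # each window of neurons once, as the unit groups share neuron data.
        K = kernels[0].shape[0]
        H, W = inp.shape
        outs = [np.zeros((H - K + 1, W - K + 1)) for _ in kernels]
        for y in range(H - K + 1):
            for x in range(W - K + 1):
                window = inp[y:y + K, x:x + K]       # neurons read once
                for o, kern in enumerate(kernels):   # one MAC chain per map
                    outs[o][y, x] = np.sum(window * kern)
        return outs

    inp = np.arange(25.0).reshape(5, 5)
    k0, k1 = np.eye(3), np.ones((3, 3))
    out0, out1 = conv2d_shared_neurons(inp, [k0, k1])
    print(out0.shape, out1.shape)  # (3, 3) (3, 3)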
Under the control of the control section, this operation device can combine operation units to form an operation unit cluster, so as to adapt to layers of different scales within the same network model. That is, when the convolution kernel is larger than the number of operation units in one operation unit group, the control device can control the direction of data transfer and combine multiple operation unit groups into one operation unit cluster, so that the operation unit groups within the cluster transfer data and operate according to a serial mode of operation while data is transferred and operated on between clusters according to a parallel mode of operation. In other words, each operation unit group in an operation unit cluster completes the multiplications and additions of its data in the original order of operation (S-shaped or inverse S-shaped); the data in the cluster is then passed in turn to the adjacent operation unit group within the cluster for further operation; and when the operation is finished, the result is exported into the cache through the output path of the last operation unit group in the cluster.
Take the AlexNet network as an example. The kernel size of its first convolutional layer is 11*11, that of the second convolutional layer is 5*5, and that of the third convolutional layer is 3*3. We therefore initially configure each operation unit group to contain 3*3 operation units, i.e. M=N=3, with 15 operation unit groups in total, i.e. X=3 and Y=5. When processing the third convolutional layer (kernel 3*3), each operation unit group handles one kernel and the groups operate in parallel, each exporting its own operation result into the cache. When processing the second convolutional layer (kernel 5*5), every three operation unit groups are combined into one operation unit cluster, giving 5 clusters in all; data is transferred sequentially within each cluster while the clusters operate on their data in parallel. When processing the first convolutional layer (kernel 11*11), all the operation units can complete the operation in sequence. The control section controls the direction of data transfer, achieving dynamic combination and adjustment, so that layers of different scales within the same network are accommodated and the utilization of the operation device is improved. Fig. 6 shows three operation unit groups combined into one operation unit cluster: the original input data and intermediate results are transferred in turn along the S-shaped path to the operation units on the left/right, and each cluster, acting as one basic unit, obtains its operation result and outputs it into the cache. When the operations are finished, the data in the cache is output to the storage section.
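The grouping rule in this example can be expressed as a small calculation (Python sketch; names are illustrative): a cluster must chain enough M*N groups to cover kernel*kernel units, and the remaining groups form further clusters that run in parallel.

    import math

    def cluster_layout(kernel, M=3, N=3, num_groups=15):
        # Groups chained per cluster so the cluster holds at least
        # kernel*kernel operation units; clusters then run in parallel.
        groups_per_cluster = max(1, math.ceil(kernel * kernel / (M * N)))
        return groups_per_cluster, num_groups // groups_per_cluster

    print(cluster_layout(3))   # (1, 15): every group works independently
    print(cluster_layout(5))   # (3, 5): three groups per cluster, 5 clusters
    print(cluster_layout(11))  # (14, 1): one cluster; the text simply chains all 15 groups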
Preferably, as shown in Fig. 7, each operation unit also contains two selectors for skipping the multipliers and adders in that operation unit. When the operation unit needs to perform computation, the selectors select the adder result as the output of the operation unit; when the operation unit does not need to perform computation, the selectors output the input data directly. For example, when the scale of the convolution kernel is smaller than the number of operation units in an operation unit group, the surplus operation units can simply be skipped: their selectors pass the input straight through to the output without performing the multiply-add operation. Likewise, when multiple operation unit groups are combined and the total number of operation units after combination exceeds what the convolution kernel requires, the surplus operation units can output the input data directly through their selectors, while the other operation units output their adder results as the final results.
Specifically, when M=N=3 there are 9 operation units in an operation unit group. When the convolution kernel to be processed is 2*2, only 4 operation units are needed: these 4 units send the input data in turn into their multipliers and adders, and their selectors output the adder results as the units' results, while the other 5 operation units need not operate and their selectors output the input data directly as their results. When the convolution kernel to be processed is 5*5, three operation unit groups must be combined into one large operation unit group. The kernel then needs only 5*5=25 operation units, while the combined group contains 3*3*3=27 operation units, so two operation units are idle; these two units output the input data directly through their selectors without passing through the multipliers and adders.
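A minimal sketch of the selector behavior under these assumptions (Python; names are illustrative): an active unit outputs its adder result, while an idle unit's selector passes the incoming data straight through, so surplus units in the chain leave the partial sum untouched.

    def unit_output(active, weight, neuron, partial_sum):
        # Selector: adder result if the unit computes, otherwise the input
        # data is forwarded unchanged (multiplier and adder skipped).
        return partial_sum + weight * neuron if active else partial_sum

    # A 3x3 group (9 units) processing a 2x2 kernel: 4 active units, 5 idle.
    active = [True] * 4 + [False] * 5
    weights = [1, 2, 3, 4] + [0] * 5
    neurons = [10, 20, 30, 40] + [0] * 5
    acc = 0.0
    for a, w, n in zip(active, weights, neurons):
        acc = unit_output(a, w, n, acc)
    print(acc)  # 300.0 = 1*10 + 2*20 + 3*30 + 4*40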
In some embodiments a chip is disclosed, which includes the above neural network operation device.
In some embodiments a chip packaging structure is disclosed, which includes the above chip.
In some embodiments a board card is disclosed, which includes the above chip packaging structure.
In some embodiments an electronic device is disclosed, which includes the above board card.
The electronic device may include a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, webcam, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, ship, and/or automobile; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and/or range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound instrument, and/or electrocardiograph.
It should be understood that the disclosed related devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
Each functional unit/module may be hardware; for example, the hardware may be a circuit, including a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include, but are not limited to, physical devices, which include but are not limited to transistors, memristors, and so on. The computing module in the computing device may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. The storage unit may be any appropriate magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, or HMC.
The specific embodiments described above further explain the purpose, technical solution, and advantageous effects of the present disclosure in detail. It should be understood that the foregoing are merely specific embodiments of the present disclosure and do not limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (10)

1. A neural network operation device for performing convolution operations, characterized by comprising:
an operation section for performing the convolution operations, comprising multiple operation unit groups distributed in an array of X rows and Y columns, with data transferred between operation unit groups in an S-shaped direction and/or an inverse S-shaped direction, X and Y each being a positive integer, wherein each operation unit group comprises multiple operation units distributed in an array of M rows and N columns, with data transferred between operation units in an S-shaped direction and/or an inverse S-shaped direction, M and N each being a positive integer; and
a cache for transmitting data to the operation unit groups and receiving the data after the operation unit groups complete their operations.
2. The neural network operation device according to claim 1, characterized by further comprising a control section for controlling the operation section and the cache so that the two cooperate to complete the required functions.
3. The neural network operation device according to claim 1, characterized in that each operation unit comprises:
two or more multipliers;
two or more adders; and
at least one internal storage section provided in the operation unit, the internal storage section being connected to the multipliers and/or the adders.
4. The neural network operation device according to claim 3, characterized in that each operation unit further comprises two selectors for skipping the multipliers and adders in the operation unit:
when the operation unit needs to perform computation, the selectors select the adder result as the output of the operation unit;
or, when the operation unit does not need to perform computation, the selectors output the input data directly.
5. The neural network operation device according to claim 1, characterized in that each operation unit group is further configured to broadcast data to the cache independently and, under the control of the control section, to select different output channels so as to work in series or in parallel.
6. A method of performing convolution operations using the neural network operation device of any one of claims 1-5, characterized by comprising:
setting a convolution kernel whose size is larger than the number of operation units in one operation unit group; and
combining multiple operation unit groups into one operation unit cluster, so that the operation unit groups within the operation unit cluster transfer data and operate according to a serial mode of operation, while data is transferred and operated on between operation unit clusters according to a parallel mode of operation.
7. The method according to claim 6, characterized by comprising:
sending the weight data corresponding to an output feature map into the internal storage section of each operation unit;
feeding the neurons to be operated on into the operation units for multiplication and addition; and
passing each addition result along the S-shaped or inverse S-shaped direction to the next operation unit for further operation.
8. The method according to claim 7, characterized in that when the number of operation units in one operation unit group equals the convolution kernel size, an activation operation is further applied to the operation result.
9. The method according to claim 8, characterized in that when the number of operation units in one operation unit group is smaller than the convolution kernel size, the operation result is treated as temporary data and sent into the next operation unit group to continue the operation.
10. The method according to claim 6, characterized in that transferring and operating on data between operation unit clusters according to the parallel mode of operation comprises:
each operation unit group in one operation unit cluster completing the multiplications and additions of its data according to the S-shaped or inverse S-shaped order of operation;
the data in the operation unit cluster being passed in turn to the adjacent operation unit group within the cluster for operation, until the operation is finished; and
exporting the result into the cache through the output path of the last operation unit group in the operation unit cluster.
CN201711452014.7A 2017-10-17 2017-10-17 Neural network operation device and operation method using same Active CN108170640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711452014.7A CN108170640B (en) 2017-10-17 2017-10-17 Neural network operation device and operation method using same

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710967772.6A CN107632965B (en) 2017-10-17 2017-10-17 Reconfigurable S-shaped operation device and operation method
CN201711452014.7A CN108170640B (en) 2017-10-17 2017-10-17 Neural network operation device and operation method using same

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710967772.6A Division CN107632965B (en) 2017-10-17 2017-10-17 Reconfigurable S-shaped operation device and operation method

Publications (2)

Publication Number Publication Date
CN108170640A true CN108170640A (en) 2018-06-15
CN108170640B CN108170640B (en) 2020-06-09

Family

ID=61105558

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201711452014.7A Active CN108170640B (en) 2017-10-17 2017-10-17 Neural network operation device and operation method using same
CN201710967772.6A Active CN107632965B (en) 2017-10-17 2017-10-17 Reconfigurable S-shaped operation device and operation method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710967772.6A Active CN107632965B (en) 2017-10-17 2017-10-17 Reconfigurable S-shaped operation device and operation method

Country Status (1)

Country Link
CN (2) CN108170640B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111024108A (en) * 2019-12-20 2020-04-17 中国科学院计算技术研究所 Intelligent route planning display device
CN111290787A (en) * 2019-06-19 2020-06-16 锐迪科(重庆)微电子科技有限公司 Arithmetic device and arithmetic method
CN114004343A (en) * 2021-12-31 2022-02-01 之江实验室 Method and device for obtaining shortest path based on memristor pulse coupling neural network

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764468A (en) * 2018-05-03 2018-11-06 中国科学院计算技术研究所 Artificial neural network processor for intelligent recognition
CN111078623B (en) * 2018-10-18 2022-03-29 上海寒武纪信息科技有限公司 Network-on-chip processing system and network-on-chip data processing method
CN109583580B (en) * 2018-11-30 2021-08-03 上海寒武纪信息科技有限公司 Operation method, device and related product
CN110096308B (en) * 2019-04-24 2022-02-25 北京探境科技有限公司 Parallel storage operation device and method thereof
CN111832717B (en) * 2020-06-24 2021-09-28 上海西井信息科技有限公司 Chip and processing device for convolution calculation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402415A (en) * 2011-10-21 2012-04-04 清华大学 Device and method for buffering data in dynamic reconfigurable array
CN102646262A (en) * 2012-02-28 2012-08-22 西安交通大学 Reconfigurable visual preprocessor and visual processing system
CN103019656A (en) * 2012-12-04 2013-04-03 中国科学院半导体研究所 Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system
US20160085721A1 (en) * 2014-09-22 2016-03-24 International Business Machines Corporation Reconfigurable array processor for pattern matching
CN106951395A (en) * 2017-02-13 2017-07-14 上海客鹭信息技术有限公司 Towards the parallel convolution operations method and device of compression convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402415A (en) * 2011-10-21 2012-04-04 清华大学 Device and method for buffering data in dynamic reconfigurable array
CN102646262A (en) * 2012-02-28 2012-08-22 西安交通大学 Reconfigurable visual preprocessor and visual processing system
CN103019656A (en) * 2012-12-04 2013-04-03 中国科学院半导体研究所 Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system
US20160085721A1 (en) * 2014-09-22 2016-03-24 International Business Machines Corporation Reconfigurable array processor for pattern matching
CN106951395A (en) * 2017-02-13 2017-07-14 上海客鹭信息技术有限公司 Towards the parallel convolution operations method and device of compression convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUNJI CHEN et al.: "DaDianNao: A Machine-Learning Supercomputer", 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture *
尹勇生: "Research on reconfigurable multi-pipeline computing systems", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology series *
方睿 et al.: "Design of an FPGA parallel acceleration scheme for convolutional neural networks", Computer Engineering and Applications *
陈云霁: "From artificial intelligence to neural network processors", Leadership Science Forum *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111290787A (en) * 2019-06-19 2020-06-16 锐迪科(重庆)微电子科技有限公司 Arithmetic device and arithmetic method
CN111024108A (en) * 2019-12-20 2020-04-17 中国科学院计算技术研究所 Intelligent route planning display device
CN114004343A (en) * 2021-12-31 2022-02-01 之江实验室 Method and device for obtaining shortest path based on memristor pulse coupling neural network
CN114004343B (en) * 2021-12-31 2022-10-14 之江实验室 Shortest path obtaining method and device based on memristor pulse coupling neural network

Also Published As

Publication number Publication date
CN107632965B (en) 2019-11-29
CN108170640B (en) 2020-06-09
CN107632965A (en) 2018-01-26

Similar Documents

Publication Publication Date Title
CN107632965B (en) Reconfigurable S-shaped operation device and operation method
US11656910B2 (en) Data sharing system and data sharing method therefor
CN108733348B (en) Fused vector multiplier and method for performing operation using the same
CN108241890B (en) Reconfigurable neural network acceleration method and architecture
CN105930902B (en) Neural network processing method and system
CN109189474A (en) Neural network processing device and method for executing vector addition instruction
CN110502330A (en) Processor and processing method
CN110245752A (en) Fully connected operation method and device
CN108205700A (en) Neural network computing device and method
CN112612521A (en) Apparatus and method for performing matrix multiplication operation
CN109032670A (en) Neural network processing device and method for executing vector copy instructions
CN111461311A (en) Convolutional neural network operation acceleration method and device based on many-core processor
CN110276447A (en) Computing device and method
CN107451097B (en) High-performance implementation method of multi-dimensional FFT on domestic Shenwei 26010 multi-core processor
CN110163350A (en) Computing device and method
CN109754062A (en) Execution method of convolution extended instruction and related product
CN110909872A (en) Integrated circuit chip device and related product
CN109389213B (en) Storage device and method, data processing device and method, and electronic device
TW201931216A Integrated circuit chip device and related products, comprising a compression mapping circuit for compressing data and a main processing circuit for executing each successive operation in the neural network operation
CN109389209A (en) Processing unit and processing method
CN110472734A (en) Computing device and related product
CN108960415A (en) Processing unit and processing system
TWI768168B (en) Integrated circuit chip device and related products
CN115081600A (en) Conversion unit for executing Winograd convolution, integrated circuit device and board card
TW201937412A Integrated circuit chip device and related product with a small amount of computation and low power consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant