CN108170640A - Neural network operation device and operation method using same - Google Patents
Neural network operation device and operation method using same
- Publication number
- CN108170640A (application number CN201711452014.7A)
- Authority
- CN
- China
- Prior art keywords
- operation unit
- data
- unit group
- operation
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
- Advance Control (AREA)
Abstract
The disclosure provides a neural network operation device and method. The device includes: an operation section for completing convolution operations, comprising multiple operation unit groups distributed in an array of X rows and Y columns, with data transferred between operation unit groups in an S-shaped and/or inverse S-shaped direction, where X and Y are each positive integers; and a cache for transferring data to the operation unit groups and receiving the data produced by the operation unit groups. By completing data transfer between operation units along S-shaped and inverse S-shaped paths, the device effectively accelerates neural network operations while reducing the repeated reading of weights and partial sums and the memory-access power consumption caused by repeated accesses.
Description
Technical field
The present disclosure relates to the field of computers, and more particularly to the field of artificial intelligence.
Background
Deep neural networks are the foundation of many current artificial intelligence applications and have achieved breakthrough results in speech recognition, image processing, data analysis, advertisement recommendation, autonomous driving, and other areas, bringing deep neural networks into many aspects of daily life. However, the enormous amount of computation they require has always constrained their faster development and wider application. When accelerator designs are considered for speeding up deep neural network operations, the huge amount of computation inevitably brings a large energy overhead, which likewise restricts the further widespread adoption of accelerators.
A common existing approach is to use a general-purpose processor (CPU), executing general-purpose instructions with general-purpose register files and functional units to support neural network algorithms. One disadvantage of this approach is that the operational performance of a single general-purpose processor is relatively low and cannot meet the performance requirements of neural network operations; and when multiple general-purpose processors execute in parallel, the communication between them in turn becomes a performance bottleneck. Another known approach is to use a graphics processor (GPU), executing general SIMD instructions with general-purpose register files and stream processing units to support the above algorithms. Since the GPU is a device specialized for graphics, image, and scientific computation, its on-chip cache is small, so off-chip bandwidth becomes the main performance bottleneck and brings a huge power overhead.
Summary of the invention
(1) Technical problem to be solved
In view of this, the present disclosure aims to provide a reconfigurable S-shaped operation device and operation method that solve at least part of the technical problems described above.
(2) Technical solution
According to one aspect of the disclosure, a neural network operation device for performing convolution operations is provided, including:
an operation section for completing the convolution operations, comprising multiple operation unit groups distributed in an array of X rows and Y columns, with data transferred between operation unit groups in an S-shaped and/or inverse S-shaped direction, where X and Y are each positive integers; and
a cache for transferring data to the operation unit groups and receiving the data produced by the operation unit groups.
In a further embodiment, the device also includes a control section for controlling the operation section and the cache, so that the two cooperate to complete the required functions.
In a further embodiment, each operation unit group includes multiple operation units distributed in an array of M rows and N columns, with data transferred between operation units in an S-shaped and/or inverse S-shaped direction, where M and N are each positive integers.
In a further embodiment, each operation unit includes: two or more multipliers; two or more adders; and at least one internal storage section provided in the operation unit and connected to the multipliers and/or adders.
In a further embodiment, each operation unit also includes two selectors for skipping the multipliers and adders in the operation unit: when the operation unit needs to perform operations, the selector selects the result of the adder as the output of the operation unit; when the operation unit does not need to perform operations, the selector outputs the input data directly.
In a further embodiment, each operation unit group can also broadcast data to the cache individually and, under the control of the control section, select different output channels, so as to realize serial or parallel working.
According to another aspect of the disclosure, a method of performing convolution operations using any of the neural network operation devices described above is provided, including: setting a convolution kernel whose size is larger than the number of operation units in one operation unit group; and combining multiple operation unit groups into one operation unit family, so that the operation unit groups within an operation unit family transfer data and operate in a serial manner, while the transfer and operation of data between operation unit families proceed in a parallel manner.
In a further embodiment, the method further includes: sending the weight data corresponding to an output feature map into the internal storage section of each operation unit; sending the neurons awaiting operation into the operation units for multiplication and addition; and passing the result of the addition to the next operation unit along the S-shaped or inverse S-shaped direction for further operation.
In a further embodiment, when the number of operation units in an operation unit group equals the convolution kernel size, an activation operation is further applied to the operation result.
In a further embodiment, when the number of operation units in an operation unit group is smaller than the convolution kernel size, the operation result is treated as temporary data and sent into the next operation unit group to continue the operation.
(3) Advantageous effects
(1) The neural network operation device of the disclosure completes data transfer between operation units along S-shaped and inverse S-shaped paths and exploits the weight-sharing property of neural networks, so that it effectively accelerates neural network operations while reducing the repeated reading of weights and partial sums and the memory-access power consumption caused by repeated accesses.
(2) The neural network operation device of the disclosure has multiple operation unit groups that support parallel computation, so that each operation unit group can read and share the same group of neuron data while computing the data of multiple output feature maps simultaneously, improving the utilization of the neuron data and the efficiency of the operation.
(3) The neural network operation device of the disclosure can combine multiple operation unit groups and, under the control of the control section, adjust the transfer paths of the operation data and result data. It can therefore accommodate different weight scales within the same operation network during computation, further widening the range of application of the operation section, improving the utilization of the operation units in the device, and accelerating the operation speed of the neural network.
Description of the drawings
Fig. 1 is a schematic diagram of the neural network operation device of one embodiment of the disclosure.
Fig. 2 is a schematic diagram of the data flow direction of the neural network operation device of one embodiment of the disclosure.
Fig. 3 is a schematic diagram of the data flow direction of the neural network operation device of another embodiment of the disclosure.
Fig. 4 is a schematic diagram of an operation unit group in Fig. 1.
Fig. 5 is a schematic diagram of the operation of one operation unit group in Fig. 1.
Fig. 6 is a schematic diagram of three operation unit groups combined into one operation unit family.
Fig. 7 is a schematic diagram of one operation unit in Fig. 1.
Specific embodiments
To make the purpose, technical solution, and advantages of the disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
The main structure of the disclosure is shown in Fig. 1 and is broadly divided into an operation section and a storage section. The operation section completes the operations and comprises multiple operation unit groups; each operation unit group contains multiple operation units and two or more arithmetic logic units (ALUs). The storage section preserves data and includes an external storage section and internal storage sections. The external storage section lies outside the operation units and can be divided into multiple regions used respectively to preserve input data, output data, and temporary caches; the internal storage sections lie inside the operation section and preserve the data awaiting operation. Preferably, the device also includes a control section for controlling the various parts of the device so that they cooperate to complete the required functions.
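As a rough model of this hierarchy, the sketch below (Python; all class and field names are our own illustrative choices, not terms from the disclosure) builds an X*Y array of operation unit groups, each holding M*N operation units with their internal storage:

```python
# Minimal structural sketch of the hierarchy described above.
# All names (Device, OperationUnitGroup, ...) are illustrative, not from the patent.
from dataclasses import dataclass, field
from typing import List

@dataclass
class OperationUnit:
    # internal storage section holding the weights awaiting operation
    weights: List[float] = field(default_factory=list)

@dataclass
class OperationUnitGroup:
    m: int  # rows of operation units
    n: int  # columns of operation units
    units: List[OperationUnit] = field(init=False)

    def __post_init__(self):
        self.units = [OperationUnit() for _ in range(self.m * self.n)]

@dataclass
class Device:
    x: int  # rows of operation unit groups
    y: int  # columns of operation unit groups
    m: int
    n: int
    groups: List[OperationUnitGroup] = field(init=False)

    def __post_init__(self):
        self.groups = [OperationUnitGroup(self.m, self.n) for _ in range(self.x * self.y)]

device = Device(x=3, y=5, m=3, n=3)  # 15 groups of 3*3 units, as in the AlexNet example below
```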
The operation section includes X*Y operation unit groups (X and Y are arbitrary positive integers) arranged as a two-dimensional array of X rows and Y columns, with data transferred between operation unit groups in an S-shaped or inverse S-shaped direction. Each operation unit group can broadcast data to the cache and, under the control of the control section, select different output channels, so that the operation unit groups can work in series or in parallel. That is, in serial working each operation unit group receives the data transferred from the operation unit group on its left/right side and, after operating on it, transfers its output data to the operation unit group on its right/left side; the last operation unit group passes the final result through the cache into the storage module for preservation. This data flow direction is shown in Fig. 2. The operation unit groups can also work in parallel: the initial data is transferred into each operation unit group along the original S-shaped path, the operation unit groups share the operation data and perform their operations, and each operation unit group transfers its own operation result directly into the cache, where it is buffered and arranged; when the operations are finished, the data in the cache is exported into the storage module for preservation. This data flow direction is shown in Fig. 3.
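The S-shaped path itself is just a boustrophedon traversal. The following sketch (a hypothetical helper, not part of the disclosure) lists the positions of an X-row, Y-column array in S-shaped order; reversing the list gives the inverse S-shaped order:

```python
def s_shaped_order(rows: int, cols: int):
    """Visit an array in S-shaped (boustrophedon) order:
    left-to-right on even rows, right-to-left on odd rows."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order

path = s_shaped_order(3, 5)           # S-shaped path over 3 rows, 5 columns of groups
inverse_path = list(reversed(path))   # inverse S-shaped path
print(path[:6])  # [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 4)]
```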
As shown in Fig. 4, each operation unit group includes M*N operation units (M and N are positive integers, preferably M=N=3 or M=N=5) arranged as a two-dimensional array of M rows and N columns, with data transferred between operation units in an S-shaped or inverse S-shaped direction. Each operation unit includes two or more multipliers (the first multiplier, the second multiplier, and so on, denoted "X1", "X2", and so on), two or more adders (denoted "+0", "+1", and so on), and an internal storage unit. Each time, the multipliers of an operation unit multiply the data read in from outside by the data in the internal storage unit and send the products into the adders. An adder adds the data transferred along the S-shaped or inverse S-shaped path to the product from the multiplier, and the result is transferred along the S-shaped or inverse S-shaped path into the adder of the next operation unit. The even-numbered adders (the zeroth, the second, ...) receive the data transferred along the S-shaped direction, perform the addition, and pass the result on along the S-shaped direction; the odd-numbered adders (the first, the third, ...) receive the data transferred along the inverse S-shaped direction and pass their results on along the inverse S-shape. When the operation reaches the last operation unit, the operation result can either be passed back along the inverse S-shape to continue the operation or be transferred to the storage unit for preservation.
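Functionally, the two adder chains compute two running sums that travel in opposite directions. The sketch below is a simplified software model under our own assumptions (one "+0"/"+1" adder pair per unit, one weight per chain resident in each unit's internal storage); it is not a cycle-accurate description of the hardware:

```python
def dual_chain_pass(neurons, w_map0, w_map1):
    """One pass over a chain of len(neurons) operation units.
    Unit k holds w_map0[k] and w_map1[k] in its internal storage and
    receives neuron k. The '+0' partial sum travels unit 0 -> unit n-1
    (S-shaped); the '+1' partial sum travels unit n-1 -> unit 0
    (inverse S-shaped). The final sums emerge at opposite chain ends."""
    n = len(neurons)
    psum0 = 0.0  # enters '+0' of unit 0 (initialized to 0 if no pending data)
    psum1 = 0.0  # enters '+1' of unit n-1
    for k in range(n):              # S-shaped traversal
        psum0 += neurons[k] * w_map0[k]
    for k in range(n - 1, -1, -1):  # inverse S-shaped traversal
        psum1 += neurons[k] * w_map1[k]
    return psum0, psum1  # results at '+0' of unit n-1 and '+1' of unit 0
```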
As shown in Fig. 5, first let W_{o,i,x,y} denote a weight datum, meaning that the datum corresponds to the o-th output feature map and the i-th input feature map, at the position in row x, column y. Assume, without loss of generality, that the number of operation units is 3*3 and the kernel size is 3*3. The first group of weight data, corresponding to the first and second output feature maps, is then sent into the internal storage section of each operation unit, as shown in Fig. 5. The neurons awaiting operation are taken out of the storage section and sent into the operation units to be multiplied, and the products are sent into the adders for addition. If there is data pending addition, the adders "+0" of operation unit 0 and "+1" of operation unit 8 can fetch it directly from the storage unit and perform the addition; otherwise they can be initialized to 0, so that the products are added to 0. The addition results of the operation units are then transferred in the prescribed directions: for example, the result of "+0" of operation unit 0 is passed along the S-shape to the input of "+0" of operation unit 1, and the result of "+0" of operation unit 2 is passed along the S-shape to the input of "+0" of operation unit 3; the result of "+1" of operation unit 6 is passed along the inverse S-shape to the input of "+1" of operation unit 5, and the result of "+1" of operation unit 5 is passed along the inverse S-shape to the input of "+1" of operation unit 4. Next, the second group of neuron data is sent in; each operation unit multiplies it by its weights, adds the product to the partial sum transferred earlier, and again passes the result on along the assigned direction, until all the operations are complete. The operation result of "+1" of operation unit 0 and the operation result of "+0" of operation unit 8 can be written directly back to the designated position in the storage section. If the scale of the kernel is larger than the number of operation units, the result may be a temporary partial sum; it is then stored in the temporary storage region of the storage unit and, after the weight data has been updated according to the control instructions, sent to the inputs of "+0" of operation unit 0 and "+1" of operation unit 8 to continue the addition. If what is obtained is a final result and an activation operation is required, the result is input into the ALU for the activation operation and then written back to the storage section; otherwise it is written directly to the storage section for preservation. In this way the weight-sharing property of convolutional neural networks can be fully exploited, avoiding the memory-access power consumption caused by repeated reads of the weights. Meanwhile, the same group of neuron data is read once while the data of two output feature maps is computed simultaneously, improving the utilization of the neuron data. In addition, multiple operation units can operate in parallel, greatly accelerating the operation speed.
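Arithmetically, the scheme just described amounts to computing, for each output position, two dot products over the same window of neuron data, one per output feature map. The sketch below is a plain reference implementation of that arithmetic (assuming unit stride and no padding), not of the hardware dataflow:

```python
import numpy as np

def conv_two_output_maps(inputs, weights):
    """Reference arithmetic for the Fig. 5 scheme: each read of a shared
    window of neuron data yields one output pixel for each of two maps.
    inputs:  array of shape (C, H, W), the input feature maps
    weights: array of shape (2, C, K, K), W[o, i, x, y] for o = 0, 1
    """
    k = weights.shape[-1]
    h_out = inputs.shape[1] - k + 1
    w_out = inputs.shape[2] - k + 1
    out = np.zeros((2, h_out, w_out))
    for r in range(h_out):
        for s in range(w_out):
            window = inputs[:, r:r + k, s:s + k]        # neurons read once...
            out[0, r, s] = np.sum(window * weights[0])  # ...used for map 0 ('+0' chain)
            out[1, r, s] = np.sum(window * weights[1])  # ...and for map 1 ('+1' chain)
    return out
```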
Under the control of the control section, this operation device can combine operation units to form an operation unit family, enabling it to adapt to layers of different scale within the same network model. That is, when the convolution kernel is larger than the number of operation units in one operation unit group, the control device can control the transfer direction of the data and combine multiple operation unit groups into one operation unit family, so that the operation unit groups within the operation unit family transfer data and operate in a serial manner, while the transfer and operation of data between operation unit families proceed in a parallel manner. In other words, each operation unit group in an operation unit family completes the multiplications and additions of its data in the original order of operation (S-shaped or inverse S-shaped); the data in the operation unit family is then passed in turn to the adjacent operation unit group in the same family for further operation; and when the operation is finished, the result is exported into the cache through the output path of the last operation unit group in the operation unit family.
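A functional sketch of the serial mode within one family (our own simplification: each group is reduced to the partial result it contributes to the running sum):

```python
def family_serial_pass(psum_in, group_contributions):
    """Serial operation within one operation unit family: each group adds
    its locally computed partial result to the running sum and passes it
    to the adjacent group; the last group's output path feeds the cache."""
    psum = psum_in
    for contribution in group_contributions:  # groups visited in S-shaped order
        psum += contribution
    return psum                               # exported to the cache

# Three 3*3 groups jointly covering a 5*5 kernel: 25 products split 9 + 9 + 7.
result = family_serial_pass(0.0, [9.0, 9.0, 7.0])
print(result)  # 25.0
```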
Take the AlexNet network as an example: the kernel size of the first convolutional layer is 11*11, that of the second convolutional layer is 5*5, and that of the third convolutional layer is 3*3. We initially configure each operation unit group to contain 3*3 operation units, i.e. M=N=3, with 15 operation unit groups in total, i.e. X=3, Y=5. When processing the third convolutional layer (3*3 kernel), each operation unit group handles one kernel, and the operation unit groups operate in parallel, each exporting its own operation result into the cache. When processing the second convolutional layer (5*5 kernel), every three operation unit groups are combined into one operation unit family, giving 5 operation unit families in total; data is transferred in order within each operation unit family, and the families operate on data in parallel. When processing the first convolutional layer (11*11 kernel), all the operation units can complete the operation in sequence. The control section controls the direction of data transfer, achieving dynamic combination and adjustment, so that layers of different scales within the same network are accommodated and the utilization of the operation device is improved. Fig. 6 shows the schematic diagram of three operation unit groups combined into one operation unit family: the original input data and intermediate results are transferred in turn along the S-shape to the operation unit on the left/right; each operation unit family, acting as one basic unit, then obtains its operation result and outputs it into the cache. When the operations are finished, the data in the cache is output to the storage section.
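The grouping rule in this example can be stated directly: a k*k kernel needs ceil(k*k / (M*N)) operation unit groups chained serially into one family, and the remaining groups form further families that run in parallel. A small sketch (the helper name is ours) reproduces the three AlexNet cases:

```python
import math

def family_size(kernel, m=3, n=3, total_groups=15):
    """Operation unit groups chained per family for a kernel of size
    kernel*kernel, and how many such families then run in parallel."""
    per_family = math.ceil(kernel * kernel / (m * n))
    return per_family, total_groups // per_family

for k in (3, 5, 11):
    print(k, family_size(k))
# 3  -> (1, 15): every group works alone, 15 groups in parallel
# 5  -> (3, 5):  three groups per family, 5 families in parallel
# 11 -> (14, 1): nearly all groups chained into a single family
```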
Preferably, each operation unit also contains two selectors for skipping the multipliers and adders in the operation unit, as shown in Fig. 7. When the operation unit needs to perform operations, the selector selects the result of the adder as the output of the operation unit; when the operation unit does not need to perform operations, the selector outputs the input data directly. For example, when the scale of the convolution kernel is smaller than the number of operation units in an operation unit group, the surplus operation units can be skipped directly: the selector passes the input straight through to the output, without any multiply-add operation. Similarly, when multiple operation unit groups are combined and the total number of operation units after combination exceeds the size of the convolution kernel, the surplus operation units output the input data directly through the selector, while the other operation units output the result of the adder as the final result.
Specifically, when M=N=3, there are 9 operation units in the operation unit group. When the convolution kernel to be processed is 2*2, only 4 operation units need to be used: these 4 operation units send the input data into the multipliers and adders in turn to perform the operation and then, through the selector, output the adder result as the result of the operation unit; the other 5 operation units do not need to operate, so their selectors output the input data directly as the result of the operation unit. When the convolution kernel to be processed is 5*5, 3 operation unit groups need to be combined into one big operation unit group. The kernel then needs only 5*5=25 operation units, while the combined group has 3*3*3=27 operation units in total, leaving two idle operation units; these two operation units output the input data directly through their selectors, without passing through the multipliers or adders.
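The selector behavior can be modeled as a simple pass-through switch (an illustrative sketch only; in the device the selectors are the hardware multiplexers of Fig. 7):

```python
def unit_output(active, neuron, weight, psum_in):
    """Output of one operation unit through its selector.
    active=True:  the selector picks the adder result (multiply-accumulate);
    active=False: the unit is surplus for the current kernel and the
    selector forwards the incoming data unchanged."""
    if active:
        return psum_in + neuron * weight  # multiplier, then adder
    return psum_in                        # bypass: no multiply-add performed

# 2*2 kernel on a 3*3 group: units 0-3 compute, units 4-8 are bypassed.
psum = 0.0
neurons = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 0.5, 0.5, 0.5]
for k in range(9):
    if k < 4:
        psum = unit_output(True, neurons[k], weights[k], psum)
    else:
        psum = unit_output(False, 0.0, 0.0, psum)
print(psum)  # 5.0 = 0.5 * (1 + 2 + 3 + 4)
```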
In some embodiments, a chip is disclosed that includes the neural network operation device described above.
In some embodiments, a chip packaging structure is disclosed that includes the above chip.
In some embodiments, a board card is disclosed that includes the above chip packaging structure.
In some embodiments, an electronic device is disclosed that includes the above board card.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, webcam, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and/or range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound scanner, and/or electrocardiograph.
It should be understood that the disclosed apparatus and methods may be implemented in other ways. For example, the device embodiments described above are merely schematic; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
Each functional unit/module may be hardware; for example, the hardware may be a circuit, including a digital circuit, an analog circuit, and so on. Physical implementations of hardware structures include, but are not limited to, physical devices, which include, but are not limited to, transistors, memristors, and so on. The computing module in the computing device may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and so on. The storage unit may be any appropriate magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, and so on.
The particular embodiments described above further explain the purpose, technical solutions, and advantageous effects of the disclosure in detail. It should be understood that the above are merely specific embodiments of the disclosure and do not limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the disclosure shall be included within the scope of protection of the disclosure.
Claims (10)
1. A neural network operation device for performing convolution operations, characterized by including:
an operation section for completing the convolution operations, comprising multiple operation unit groups distributed in an array of X rows and Y columns, with data transferred between operation unit groups in an S-shaped and/or inverse S-shaped direction, where X and Y are each positive integers, and each operation unit group including multiple operation units distributed in an array of M rows and N columns, with data transferred between operation units in an S-shaped and/or inverse S-shaped direction, where M and N are each positive integers; and
a cache for transferring data to the operation unit groups and receiving the data produced by the operation unit groups.
2. The neural network operation device according to claim 1, characterized by further including a control section for controlling the operation section and the cache, so that the two cooperate to complete the required functions.
3. The neural network operation device according to claim 1, characterized in that each operation unit includes:
two or more multipliers;
two or more adders; and
at least one internal storage section provided in the operation unit and connected to the multipliers and/or adders.
4. The neural network operation device according to claim 3, characterized in that each operation unit also includes two selectors for skipping the multipliers and adders in the operation unit:
when the operation unit needs to perform operations, the selector selects the result of the adder as the output of the operation unit; or
when the operation unit does not need to perform operations, the selector outputs the input data directly.
5. The neural network operation device according to claim 1, characterized in that each operation unit group can also broadcast data to the cache individually and, under the control of the control section, select different output channels, so as to realize serial or parallel working.
6. A method of performing convolution operations using the neural network operation device of any one of claims 1 to 5, characterized by including:
setting a convolution kernel whose size is larger than the number of operation units in one operation unit group; and
combining multiple operation unit groups into one operation unit family, so that the operation unit groups within an operation unit family transfer data and operate in a serial manner, while the transfer and operation of data between operation unit families proceed in a parallel manner.
7. The method according to claim 6, characterized by including:
sending the weight data corresponding to an output feature map into the internal storage section of each operation unit;
sending the neurons awaiting operation into the operation units for multiplication and addition; and
passing the result of the addition to the next operation unit along the S-shaped or inverse S-shaped direction for further operation.
8. The method according to claim 7, characterized in that when the number of operation units in an operation unit group equals the convolution kernel size, an activation operation is further applied to the operation result.
9. The method according to claim 8, characterized in that when the number of operation units in an operation unit group is smaller than the convolution kernel size, the operation result is treated as temporary data and sent into the next operation unit group to continue the operation.
10. The method according to claim 6, characterized in that carrying out the transfer and operation of data between operation unit families in a parallel manner includes:
each operation unit group in an operation unit family completing the multiplications and additions of its data in the S-shaped or inverse S-shaped order of operation;
the data in the operation unit family being passed in turn to the adjacent operation unit group in the same family for operation, until the operation is finished; and
the result being exported into the cache through the output path of the last operation unit group in the operation unit family.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711452014.7A CN108170640B (en) | 2017-10-17 | 2017-10-17 | Neural network operation device and operation method using same |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710967772.6A CN107632965B (en) | 2017-10-17 | 2017-10-17 | Reconfigurable S-shaped operation device and operation method |
CN201711452014.7A CN108170640B (en) | 2017-10-17 | 2017-10-17 | Neural network operation device and operation method using same |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710967772.6A Division CN107632965B (en) | Reconfigurable S-shaped operation device and operation method | 2017-10-17 | 2017-10-17 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108170640A true CN108170640A (en) | 2018-06-15 |
CN108170640B CN108170640B (en) | 2020-06-09 |
Family
ID=61105558
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711452014.7A Active CN108170640B (en) | 2017-10-17 | 2017-10-17 | Neural network operation device and operation method using same |
CN201710967772.6A Active CN107632965B (en) | 2017-10-17 | 2017-10-17 | Reconfigurable S-shaped operation device and operation method |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710967772.6A Active CN107632965B (en) | 2017-10-17 | 2017-10-17 | Reconfigurable S-shaped operation device and operation method |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN108170640B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111024108A (en) * | 2019-12-20 | 2020-04-17 | 中国科学院计算技术研究所 | Intelligent route planning display device |
CN111290787A (en) * | 2019-06-19 | 2020-06-16 | 锐迪科(重庆)微电子科技有限公司 | Arithmetic device and arithmetic method |
CN114004343A (en) * | 2021-12-31 | 2022-02-01 | 之江实验室 | Method and device for obtaining shortest path based on memristor pulse coupling neural network |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764468A (en) * | 2018-05-03 | 2018-11-06 | 中国科学院计算技术研究所 | Artificial neural network processor for intelligent recognition |
CN111078623B (en) * | 2018-10-18 | 2022-03-29 | 上海寒武纪信息科技有限公司 | Network-on-chip processing system and network-on-chip data processing method |
CN109583580B (en) * | 2018-11-30 | 2021-08-03 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN110096308B (en) * | 2019-04-24 | 2022-02-25 | 北京探境科技有限公司 | Parallel storage operation device and method thereof |
CN111832717B (en) * | 2020-06-24 | 2021-09-28 | 上海西井信息科技有限公司 | Chip and processing device for convolution calculation |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102402415A (en) * | 2011-10-21 | 2012-04-04 | 清华大学 | Device and method for buffering data in dynamic reconfigurable array |
CN102646262A (en) * | 2012-02-28 | 2012-08-22 | 西安交通大学 | Reconfigurable visual preprocessor and visual processing system |
CN103019656A (en) * | 2012-12-04 | 2013-04-03 | 中国科学院半导体研究所 | Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system |
US20160085721A1 (en) * | 2014-09-22 | 2016-03-24 | International Business Machines Corporation | Reconfigurable array processor for pattern matching |
CN106951395A (en) * | 2017-02-13 | 2017-07-14 | 上海客鹭信息技术有限公司 | Towards the parallel convolution operations method and device of compression convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
YUNJI CHEN et al.: "DaDianNao: A Machine-Learning Supercomputer", 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture *
YIN YONGSHENG: "Research on Reconfigurable Multi-Pipeline Computing Systems", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *
FANG RUI et al.: "Design of an FPGA Parallel Acceleration Scheme for Convolutional Neural Networks", Computer Engineering and Applications *
CHEN YUNJI: "From Artificial Intelligence to Neural Network Processors", Leadership Science Forum *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111290787A (en) * | 2019-06-19 | 2020-06-16 | 锐迪科(重庆)微电子科技有限公司 | Arithmetic device and arithmetic method |
CN111024108A (en) * | 2019-12-20 | 2020-04-17 | 中国科学院计算技术研究所 | Intelligent route planning display device |
CN114004343A (en) * | 2021-12-31 | 2022-02-01 | 之江实验室 | Method and device for obtaining shortest path based on memristor pulse coupling neural network |
CN114004343B (en) * | 2021-12-31 | 2022-10-14 | 之江实验室 | Shortest path obtaining method and device based on memristor pulse coupling neural network |
Also Published As
Publication number | Publication date |
---|---|
CN107632965B (en) | 2019-11-29 |
CN108170640B (en) | 2020-06-09 |
CN107632965A (en) | 2018-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107632965B (en) | Reconfigurable S-shaped operation device and operation method | |
US11656910B2 (en) | Data sharing system and data sharing method therefor | |
CN108733348B (en) | Fused vector multiplier and method for performing operation using the same | |
CN108241890B (en) | Reconfigurable neural network acceleration method and architecture | |
CN105930902B (en) | Neural network processing method and system | |
CN109189474A (en) | Neural network processing device and method for executing a vector addition instruction | |
CN110502330A (en) | Processor and processing method | |
CN110245752A (en) | Fully connected operation method and device | |
CN108205700A (en) | Neural network computing device and method | |
CN112612521A (en) | Apparatus and method for performing matrix multiplication operation | |
CN109032670A (en) | Neural network processing device and method for executing a vector copy instruction | |
CN111461311A (en) | Convolutional neural network operation acceleration method and device based on many-core processor | |
CN110276447A (en) | Computing device and method | |
CN107451097B (en) | High-performance implementation method of multi-dimensional FFT on domestic Shenwei 26010 multi-core processor | |
CN110163350A (en) | Computing device and method | |
CN109754062A (en) | Execution method of a convolution extension instruction and related products | |
CN110909872A (en) | Integrated circuit chip device and related product | |
CN109389213B (en) | Storage device and method, data processing device and method, and electronic device | |
TW201931216A (en) | Integrated circuit chip device and related products, comprising a compression mapping circuit for compressing data and a main processing circuit for executing the successive operations of the neural network operation | |
CN109389209A (en) | Processing unit and processing method | |
CN110472734A (en) | Computing device and related products | |
CN108960415A (en) | Processing unit and processing system | |
TWI768168B (en) | Integrated circuit chip device and related products | |
CN115081600A (en) | Conversion unit for executing Winograd convolution, integrated circuit device and board card | |
TW201937412A (en) | Integrated circuit chip device and related products, with the advantages of a small amount of computation and low power consumption |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |