CN108241484A - Neural network computing device and method based on high-bandwidth memory - Google Patents

Neural network computing device and method based on high-bandwidth memory

Info

Publication number
CN108241484A
CN108241484A (application CN201611221798.8A; granted as CN108241484B)
Authority
CN
China
Prior art keywords
memory
high bandwidth
neural
buffer
bandwidth memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611221798.8A
Other languages
Chinese (zh)
Other versions
CN108241484B (en)
Inventor
陈天石 (Chen Tianshi)
李韦 (Li Wei)
郭崎 (Guo Qi)
陈云霁 (Chen Yunji)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201611221798.8A priority Critical patent/CN108241484B/en
Priority to PCT/CN2017/111333 priority patent/WO2018121118A1/en
Priority to TW106141858A priority patent/TWI736716B/en
Publication of CN108241484A publication Critical patent/CN108241484A/en
Application granted granted Critical
Publication of CN108241484B publication Critical patent/CN108241484B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575 Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Neurology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Memory System (AREA)
  • Dram (AREA)
  • Semiconductor Memories (AREA)

Abstract

The present invention provides a neural network computing device and method based on high-bandwidth memory. The neural network computing device includes at least one high-bandwidth memory, each comprising multiple stacked memory dies, and a neural network accelerator electrically connected to the high-bandwidth memory; data are exchanged between the neural network accelerator and the high-bandwidth memory, and the accelerator performs neural network computations. The invention can greatly increase memory bandwidth: with high-bandwidth memory serving as the memory of the neural network computing device, input data and operational parameters can be exchanged between the buffer and the memory faster, so that I/O time is greatly shortened. Because the high-bandwidth memory is a stacked structure, it occupies no lateral planar space; the area of the neural network computing device can therefore be greatly reduced, to roughly 5% of that of the prior art, and the power consumption of the neural network computing device is also reduced.

Description

Neural network computing device and method based on high-bandwidth memory
Technical field
The present invention relates to the application of high-performance memory in the field of neural network computing, and in particular to a neural network computing device and method based on high-bandwidth memory.
Background art
Artificial intelligence is currently developing rapidly, and machine learning is affecting every aspect of people's lives. As an important component of the machine learning field, neural networks are a research hotspot for both industry and academia. Because of the huge data volumes involved in neural network computation, how to accelerate the execution of neural network algorithms has become a major problem to be solved, and dedicated neural network computing devices have emerged accordingly.
The dynamic random access memory (DRAM) used in current neural network computing device architectures is mostly GDDR4 or GDDR5. However, once bandwidth, performance, power consumption and area are all taken into account, GDDR4 and GDDR5 cannot fully meet the needs of neural network computing devices, and their technological development has entered a bottleneck period: each additional 1 GB/s of bandwidth brings more power consumption, which is neither a wise, efficient nor worthwhile choice for designers or consumers. Meanwhile, GDDR4 and GDDR5 also suffer from the serious problem that their area is difficult to reduce. GDDR4 and GDDR5 will therefore gradually hinder the sustained growth of neural network computing device performance.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main purpose of the present invention is to provide a neural network computing device and method that address the bandwidth, energy consumption and area bottlenecks faced by the neural network computing devices described above.
(2) Technical solution
The present invention provides a neural network computing device based on high-bandwidth memory, comprising: at least one high-bandwidth memory, each high-bandwidth memory comprising multiple stacked memory dies; and a neural network accelerator electrically connected to the high-bandwidth memory, wherein data are exchanged between the neural network accelerator and the high-bandwidth memory and the accelerator performs neural network computations.
Preferably, the neural network accelerator comprises a memory interface, an HBM memory control module, a buffer, a buffer control module and a neural processing unit. The high-bandwidth memory and the buffer exchange data through the memory interface and the HBM memory control module, and the HBM memory control module performs clock synchronization and bit-width matching between the high-bandwidth memory and the buffer. The buffer exchanges data with the neural processing unit through the buffer control module, and the neural processing unit performs the neural network computation.
Preferably, the memory interface transmits data from the high-bandwidth memory to the HBM memory control module and transmits data from the HBM memory control module to the high-bandwidth memory. The HBM memory control module synchronizes the clocks of the high-bandwidth memory and the buffer, converts the data bandwidth received from the memory interface into a bandwidth matching the buffer and transmits the bandwidth-matched data to the buffer, and converts the data bandwidth of the buffer into a bandwidth matching the high-bandwidth memory and transmits the bandwidth-matched data to the memory interface.
Preferably, the device further comprises a package substrate, an interposer and a logic die. The interposer is formed on the package substrate; the logic die and the neural network accelerator are formed on the interposer; the high-bandwidth memory is formed on the logic die; and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
Preferably, the device further comprises a package substrate and a logic die. The neural network accelerator is formed on the package substrate; the logic die is formed on the neural network accelerator; the high-bandwidth memory is formed on the logic die; and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
Preferably, the multiple DRAM dies are stacked by a micro-bump bonding process; the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding; and the logic die and the neural network accelerator are formed on the interposer by micro-bump bonding. The DRAM dies, the logic die and the interposer have through-holes opened by a through-silicon via (TSV) process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
Preferably, the multiple DRAM dies are stacked by a micro-bump bonding process; the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding; the logic die is formed on the neural network accelerator by micro-bump bonding; and the neural network accelerator is formed on the package substrate by micro-bump bonding. The DRAM dies and the logic die have through-holes opened by a through-silicon via process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
The present invention also provides a neural network computing method based on high-bandwidth memory, which performs neural network computation using the above neural network computing device and comprises: transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator; transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator; transferring the input data and operational parameters stored in the buffer to the neural processing unit, where the neural processing unit processes them to obtain the output data of the current neural network computation and stores the output data to the buffer; and transmitting the output data of the buffer to the high-bandwidth memory.
Preferably, transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises: transmitting the input data through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the input data have been transmitted to the HBM memory control module; and the HBM memory control module converting the bit width of the input data into a bit width matching the buffer and transmitting the bit-width-matched input data to the buffer. Transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises: transmitting the operational parameters through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the operational parameters have been transmitted to the HBM memory control module; and the HBM memory control module converting the bit width of the operational parameters into a bit width matching the buffer and transmitting the bit-width-matched operational parameters to the buffer.
Preferably, transmitting the output data of the buffer to the high-bandwidth memory comprises: the HBM memory control module converting the bit width of the output data into a bit width matching the high-bandwidth memory and transmitting the bit-width-matched output data to the high-bandwidth memory.
(3) Advantageous effects
It can be seen from the above technical solution that the neural network computing device and method based on high-bandwidth memory of the present invention have the following advantageous effects:
(1) During neural network computation, the high-bandwidth memory serving as the memory of the neural network computing device can exchange input data and operational parameters between the buffer and the memory faster, which greatly shortens I/O time;
(2) The neural network accelerator, using a stacked high-bandwidth memory structure together with an HBM memory control module, can greatly increase memory bandwidth, to more than twice that of the prior art, so that computational performance is greatly improved;
(3) Because the high-bandwidth memory is a stacked structure, it occupies no lateral planar space, so the area of the neural network computing device can be greatly reduced, to roughly 5% of that of the prior art;
(4) The power consumption of the neural network computing device is reduced, and the micro-bump bonding and through-silicon via interconnections increase data transmission bandwidth and speed.
Description of the drawings
Fig. 1 is an overall structural diagram of a neural network computing device based on high-bandwidth memory according to an embodiment of the present invention;
Fig. 2 is a sectional view of the neural network computing device shown in Fig. 1;
Fig. 3 is an overall architecture diagram of the neural network accelerator with an HBM memory control module according to an embodiment of the present invention;
Fig. 4 is a sectional view of a neural network computing device based on high-bandwidth memory according to another embodiment of the present invention;
Fig. 5 is a flowchart of a neural network computing method based on high-bandwidth memory according to an embodiment of the present invention.
Symbol description
101, 201, 401: package substrate; 102, 202: interposer; 103, 203, 403: logic die; 104: high-bandwidth memory; 105, 205, 402: neural network accelerator; 204: HBM memory control module; 206, 406: through-hole; 207, 407: micro solder ball; 208, 405: DRAM die; 301: memory interface; 302: package structure; 303: control unit; 304, 404: HBM memory control module; 305: buffer; 306: buffer control module; 307: neural processing unit (NFU).
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
High-Bandwidth Memory (HBM), a new type of low-power memory chip, offers the excellent characteristics of ultra-wide data access, low power consumption and small area. An embodiment of the present invention proposes a neural network computing device based on high-bandwidth memory. Referring to Fig. 1, the computing device includes: a package substrate 101, an interposer 102, a logic die 103, a high-bandwidth memory 104 (stacked memory) and a neural network accelerator 105. Specifically:
The package substrate 101 carries the other components of the neural network computing device described above and is electrically connected to a host device, such as a computer, a mobile phone or various embedded devices.
The interposer 102 is formed on the package substrate 101 and carries the logic die 103 and the neural network accelerator 105.
The logic die 103 is formed on the interposer 102 and connects the interposer 102 to the high-bandwidth memory 104, providing one layer of packaging for the high-bandwidth memory 104.
The high-bandwidth memory 104 is formed on the logic die 103 and comprises multiple memory dies stacked in the direction perpendicular to the package substrate 101.
The neural network accelerator 105 is also formed on the interposer 102 and performs neural network computation; it can perform an entire neural network computation or basic neural network operations such as convolution. The neural network accelerator 105 is electrically connected to the logic die 103 through the interposer 102 and exchanges data with the high-bandwidth memory 104.
Referring to Fig. 2, which shows a sectional view of the neural network computing device of this embodiment in the direction perpendicular to the package substrate, the device is a 2.5D memory architecture. The high-bandwidth memory includes four dynamic random access memory (DRAM) dies 208, stacked using a micro-bump (μ-bump) bonding process, with micro solder balls 207 formed between adjacent DRAM dies 208. The bottom DRAM die 208 of the high-bandwidth memory is formed on the logic die 203 by micro-bump bonding; the logic die 203 and the neural network accelerator 205 are formed on the interposer 202 by micro-bump bonding; and the interposer 202 is formed on the package substrate 201 by a flip-chip bonding process. Through-holes 206 are opened in the DRAM dies 208 using a through-silicon via (TSV) process, and through-holes are likewise opened in the logic die 203 and the interposer 202. Wires laid out through these through-holes and micro solder balls electrically connect the DRAM dies 208 to the logic die 203, and the logic die 203 is electrically connected to the neural network accelerator 205 by wires through the through-holes of the interposer 202, interconnecting the high-bandwidth memory and the neural network accelerator 205. Under the control of the HBM memory control module 204 of the neural network accelerator 205, data are transmitted between the neural network accelerator 205 and the high-bandwidth memory.
Each channel of an existing GDDR memory is 32 bits wide, so a 16-channel memory bus is 512 bits wide. In this embodiment, the high-bandwidth memory can include four DRAM dies, each with two 128-bit channels, so the high-bandwidth memory can provide a bit width of 1024 bits, twice that of the GDDR memory above.
The above is only an illustrative description of the present invention, and the disclosure is not limited thereto. For example, the neural network computing device may have multiple logic dies and, correspondingly, multiple high-bandwidth memories, and each high-bandwidth memory may have more than four DRAM dies; the quantities of these components can be set according to actual demand.
For example, the neural network computing device may include four logic dies and four high-bandwidth memories, each high-bandwidth memory including four DRAM dies with two 128-bit channels each. Each high-bandwidth memory can then provide a bit width of 1024 bits, and the four high-bandwidth memories can provide a bit width of 4096 bits, eight times that of the GDDR memory above.
Each high-bandwidth memory may also include eight DRAM dies, each with two 128-bit channels; each high-bandwidth memory can then provide a bit width of 2048 bits, and four high-bandwidth memories can provide a bit width of 8192 bits, 16 times that of the GDDR memory above. The sketch below works this arithmetic through.
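As a quick check on the bit widths quoted above, the following sketch (a non-authoritative illustration; the function name and default arguments are ours) reproduces the arithmetic for each configuration:

```python
def bus_width(stacks, dies_per_stack, channels_per_die=2, channel_bits=128):
    """Aggregate bit width of `stacks` HBM stacks, per the embodiment's figures."""
    return stacks * dies_per_stack * channels_per_die * channel_bits

GDDR_BUS = 16 * 32                  # 16 channels x 32 bits = 512 bits
print(bus_width(1, 4), "bits")      # 1024: one stack of 4 dies, 2x GDDR
print(bus_width(4, 4), "bits")      # 4096: four stacks of 4 dies, 8x GDDR
print(bus_width(4, 8), "bits")      # 8192: four stacks of 8 dies, 16x GDDR
```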
Referring to Fig. 3, which shows the overall architecture of the neural network accelerator with an HBM memory control module of this embodiment, the neural network accelerator includes: a memory interface 301, a control unit 303 (control processor), an HBM memory control module 304 (HBM controller), a buffer 305, a buffer control module 306 (buffer controller) and a neural processing unit 307 (NFU). The control unit 303, HBM memory control module 304, buffer 305, buffer control module 306 and neural processing unit 307 are packaged together to form a package structure 302.
The memory interface 301, as the interface between the neural network accelerator and the high-bandwidth memory, is electrically connected by wires to the logic die and to the DRAM dies of the high-bandwidth memory; it receives the data transmitted from the high-bandwidth memory and transmits data to the high-bandwidth memory.
The HBM memory control module 304 controls the data transmission between the high-bandwidth memory and the buffer, including reconciling their data bandwidths and synchronizing the clocks of the high-bandwidth memory and the buffer. The HBM memory control module 304 synchronizes the clocks of the high-bandwidth memory and the buffer, converts the data bandwidth of the high-bandwidth memory received by the memory interface 301 into a bandwidth matching the buffer and transmits the bandwidth-matched data to the buffer, and converts the bandwidth of the buffer's data into a bandwidth matching the high-bandwidth memory and transmits the bandwidth-matched data through the memory interface 301 to the high-bandwidth memory. A sketch of this bit-width matching follows.
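The following is a minimal sketch of the bit-width matching described above, assuming a 1024-bit HBM word and a 256-bit buffer word (both widths, and the function names, are our assumptions rather than anything the patent fixes):

```python
HBM_BITS = 1024   # assumed HBM word width (four dies x two 128-bit channels)
BUF_BITS = 256    # assumed buffer word width

def hbm_to_buffer(hbm_word: int) -> list[int]:
    """Split one HBM-wide word into buffer-width words, lowest bits first."""
    mask = (1 << BUF_BITS) - 1
    return [(hbm_word >> (i * BUF_BITS)) & mask
            for i in range(HBM_BITS // BUF_BITS)]

def buffer_to_hbm(buf_words: list[int]) -> int:
    """Reassemble buffer-width words into one HBM-wide word."""
    word = 0
    for i, w in enumerate(buf_words):
        word |= w << (i * BUF_BITS)
    return word

word = (1 << 1000) | 0xABCD
assert buffer_to_hbm(hbm_to_buffer(word)) == word   # round trip preserves data
```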
The buffer 305 is the internal storage unit of the neural network accelerator; it receives the bandwidth-matched data transmitted by the HBM memory control module 304 and transfers its stored data to the HBM memory control module 304.
The buffer control module 306 controls the data interaction between the buffer 305 and the neural processing unit 307: it transfers the data stored in the buffer 305 to the neural processing unit 307, the neural processing unit 307 performs the neural network computation, and the buffer control module 306 transfers the computation results of the neural processing unit 307 back to the buffer 305.
The control unit 303 decodes instructions and sends control signals to the HBM memory control module 304, the buffer 305, the buffer control module 306 and the neural processing unit 307, coordinating and scheduling these modules to work together to realize the computing functions of the neural network accelerator; a toy sketch of this dispatch role follows.
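The patent does not disclose the accelerator's instruction set, but the coordinating role of the control unit can be pictured as a decode-and-dispatch loop like the toy sketch below (the opcodes and handlers are invented for illustration):

```python
def run(program, handlers):
    """Control-unit sketch: decode each instruction and dispatch it to a module."""
    for op, *args in program:
        handlers[op](*args)   # hand off to HBM controller, buffer controller, or NFU

# Toy print-based stand-ins for the modules the control unit coordinates.
handlers = {
    "LOAD":    lambda addr, n: print(f"HBM control module: load {n} words from {addr:#x}"),
    "COMPUTE": lambda layer:   print(f"buffer control module -> NFU: run layer {layer}"),
    "STORE":   lambda addr, n: print(f"HBM control module: store {n} words to {addr:#x}"),
}
run([("LOAD", 0x0, 4), ("COMPUTE", 0), ("STORE", 0x1000, 4)], handlers)
```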
It can be seen that the neural network computing device of the present invention, using a stacked high-bandwidth memory structure and a neural network accelerator with an HBM memory control module, can greatly increase memory bandwidth, to more than twice that of the prior art, greatly improving computational performance. During neural network computation, the high-bandwidth memory, serving as the memory of the neural network computing device, can exchange input data and operational parameters between the buffer and the memory faster, greatly shortening I/O time. Because the high-bandwidth memory is a stacked structure, it occupies no lateral planar space, so the area of the neural network computing device can be greatly reduced, to roughly 5% of that of the prior art, while the power consumption of the device is also reduced. Moreover, with the micro-bump bonding and TSV interconnections between the DRAM dies, and with the interposer used for data exchange between the neural network accelerator and the high-bandwidth memory, the transmission bandwidth and speed between the DRAM dies, and between the neural network accelerator and the high-bandwidth memory, are further improved.
Another embodiment of the present invention proposes a neural network computing device based on high-bandwidth memory; referring to Fig. 4, features identical to the above embodiment are not described again. This neural network computing device is a 3D memory architecture and comprises, from bottom to top, a stacked package substrate 401, neural network accelerator 402, logic die 403 and high-bandwidth memory. The high-bandwidth memory includes four DRAM dies 405, stacked using a micro-bump bonding process, with micro solder balls 407 formed between adjacent DRAM dies 405. The bottom DRAM die 405 of the high-bandwidth memory is formed on the logic die 403 by micro-bump bonding, the logic die 403 is formed on the neural network accelerator 402 by micro-bump bonding, and the neural network accelerator 402 is formed on the package substrate 401 by micro-bump bonding. Through-holes 406 are opened in the DRAM dies 405 using a TSV process, and through-holes are likewise opened in the logic die 403. Wires laid out through these through-holes and micro solder balls electrically connect the DRAM dies 405 to the logic die 403 and the neural network accelerator 402, realizing the vertical interconnection of the high-bandwidth memory and the neural network accelerator 402. Under the control of the HBM memory control module 404 of the neural network accelerator 402, data are transmitted between the neural network accelerator 402 and the high-bandwidth memory.
It can be seen that in this neural network computing device, because the high-bandwidth memory is stacked directly on the neural network accelerator, the area of the device can be further reduced relative to the 2.5D memory architecture, which is particularly conducive to miniaturizing the device. In addition, the distance between the high-bandwidth memory and the neural network accelerator is shorter, meaning shorter wiring between them, which can further improve signal transmission quality and speed.
Yet another embodiment of the present invention provides a neural network computing method that performs neural network computation using the above neural network computing device based on high-bandwidth memory. Referring to Fig. 5, the method includes:
Step S1: writing the operational parameters of the neural network computation into the high-bandwidth memory.
The high-bandwidth memory is connected to an external memory, such as an external disk, through an external access unit. The external access unit writes the operational parameters at an externally specified address into the high-bandwidth memory; the operational parameters include weights, a bias table, a function table and the like.
Step S2: transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator, which can specifically include:
Sub-step S21: the HBM memory control module addresses according to the starting address of the input data; if the address hits, then, at the high bandwidth provided by the high-bandwidth memory, one bit-width's worth of data starting from the starting address is transmitted through the memory interface to the HBM memory control module per transfer, until all the input data have been transmitted to the HBM memory control module. For example, if the high-bandwidth memory provides a bit width of 1024 bits and the stored input data total 4096 bits, the high-bandwidth memory transmits 1024 bits of input data to the HBM memory control module per transfer, and all the input data are transferred after four transfers.
Sub-step S22: the HBM memory control module converts the bit width of the input data into a bit width matching the buffer and transmits the bit-width-matched input data to the buffer.
The input data can be the input neuron vector of the current neural network computation. A sketch of sub-step S21 follows.
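The following is a minimal sketch of sub-step S21's burst behaviour, assuming the 1024-bit bus of the worked example (the function name and the miss handling are our assumptions; the patent only describes the address-hit case):

```python
HBM_BITS = 1024   # assumed HBM bus width, as in the worked example above

def transfer_to_hbm_controller(total_bits, start_addr, hit=True):
    """S21 sketch: stream data to the HBM memory control module in bus-wide bursts."""
    if not hit:
        raise LookupError(f"address {start_addr:#x} missed")   # assumed behaviour
    bursts = -(-total_bits // HBM_BITS)                        # ceiling division
    for i in range(bursts):
        addr = start_addr + i * (HBM_BITS // 8)                # byte-addressed
        print(f"burst {i + 1}/{bursts}: {HBM_BITS} bits from {addr:#x}")

transfer_to_hbm_controller(4096, 0x0)   # four bursts, matching the example
```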
Step S3: transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator. Similarly to step S2, this step can specifically include:
Sub-step S31: the HBM memory control module addresses according to the starting address of the operational parameters; if the address hits, then, at the high bandwidth provided by the high-bandwidth memory, one bit-width's worth of data starting from the starting address is transmitted through the memory interface to the HBM memory control module per transfer, until all the operational parameters have been transmitted to the HBM memory control module.
Sub-step S32: the HBM memory control module converts the bit width of the operational parameters into a bit width matching the buffer and transmits the bit-width-matched operational parameters to the buffer.
Step S4: the buffer control module transfers the input data and operational parameters stored in the buffer to the neural processing unit; the neural processing unit processes the input data and operational parameters to obtain the output data of the current neural network computation; and the buffer control module stores the output data to the buffer.
While the neural processing unit processes the input data and operational parameters, if the neural network computation produces intermediate data, the buffer control module stores the intermediate data in the buffer and the neural processing unit continues computing; when the intermediate data are needed in the computation, the buffer control module returns them to the neural processing unit, which continues the computation using the intermediate data to obtain the output data of the neural network computation. The output data can be an output neuron vector.
Step S5: the output data in the buffer are transmitted to the high-bandwidth memory, and the output data are transmitted to the external memory through the external access unit. Specifically, the HBM memory control module converts the bit width of the output data into a bit width matching the high-bandwidth memory and transmits the bit-width-matched output data to the high-bandwidth memory, and the external access unit transmits the output data stored in the high-bandwidth memory to the external memory.
At this point the result of the current neural network computation has been obtained; if the next neural network computation is to proceed, execution can return to step S2 to obtain the result of the next neural network computation. The overall flow is sketched below.
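Putting steps S1 to S5 together, one computation round moves data HBM -> buffer -> NFU -> buffer -> HBM. The following self-contained sketch mirrors that flow; the classes, the dictionary standing in for addressed memory, and the dot-product stand-in for the NFU are all illustrative assumptions:

```python
class HBM:
    """Toy high-bandwidth memory: a dictionary standing in for addressed storage."""
    def __init__(self):
        self.mem = {}

class Accelerator:
    """Toy accelerator: buffer plus load/store (HBM ctrl) and compute (NFU) roles."""
    def __init__(self):
        self.buffer = {}

    def load(self, hbm, key):          # S2/S3: HBM -> buffer
        self.buffer[key] = hbm.mem[key]

    def store(self, hbm, key):         # S5: buffer -> HBM
        hbm.mem[key] = self.buffer[key]

    def nfu(self):                     # S4: a single toy neuron (dot product)
        x, w = self.buffer["inputs"], self.buffer["weights"]
        self.buffer["out"] = sum(a * b for a, b in zip(x, w))

hbm = HBM()
hbm.mem.update(inputs=[1, 2, 3], weights=[4, 5, 6])   # S1: write parameters to HBM
acc = Accelerator()
acc.load(hbm, "inputs"); acc.load(hbm, "weights")     # S2, S3
acc.nfu()                                             # S4
acc.store(hbm, "out")                                 # S5
print(hbm.mem["out"])                                 # 32
```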
It can be seen that the neural network computing method of the present invention, using a stacked high-bandwidth memory structure and a neural network accelerator with an HBM memory control module, can greatly increase memory bandwidth, greatly improve computational performance, and improve signal transmission bandwidth and speed.
It should be noted that implementations not shown or described in the drawings or in the text of the specification are in forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements are not limited to the specific structures and shapes mentioned in the embodiments, and those of ordinary skill in the art may simply change or replace them. Examples of parameters with particular values may be given herein, but these parameters need not be exactly equal to the corresponding values and may approximate them within acceptable error margins or design constraints. Directional terms mentioned in the embodiments, such as "upper", "lower", "front", "rear", "left" and "right", refer only to the directions in the drawings and are not intended to limit the scope of protection of the present invention. The above embodiments can, based on considerations of design and reliability, be mixed and matched with one another or with other embodiments; that is, technical features in different embodiments can be freely combined to form further embodiments.
The specific embodiments described above further explain the objectives, technical solutions and advantageous effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A neural network computing device based on high-bandwidth memory, characterized by comprising:
at least one high-bandwidth memory, each high-bandwidth memory comprising multiple stacked memory dies; and
a neural network accelerator electrically connected to the high-bandwidth memory, wherein data are exchanged between the neural network accelerator and the high-bandwidth memory and the neural network accelerator performs neural network computation.
2. The neural network computing device of claim 1, characterized in that the neural network accelerator comprises: a memory interface, an HBM memory control module, a buffer, a buffer control module and a neural processing unit;
data are exchanged between the high-bandwidth memory and the buffer through the memory interface and the HBM memory control module, and the HBM memory control module performs clock synchronization and bit-width matching between the high-bandwidth memory and the buffer; and
the buffer exchanges data with the neural processing unit through the buffer control module, and the neural processing unit performs the neural network computation.
3. The neural network computing device of claim 2, characterized in that:
the memory interface transmits data from the high-bandwidth memory to the HBM memory control module and transmits data from the HBM memory control module to the high-bandwidth memory; and
the HBM memory control module synchronizes the clocks of the high-bandwidth memory and the buffer, converts the data bandwidth transmitted by the memory interface into a bandwidth matching the buffer and transmits the bandwidth-matched data to the buffer, and converts the data bandwidth of the buffer into a bandwidth matching the high-bandwidth memory and transmits the bandwidth-matched data to the memory interface.
4. The neural network computing device of claim 1, characterized by further comprising: a package substrate, an interposer and a logic die;
the interposer is formed on the package substrate;
the logic die and the neural network accelerator are formed on the interposer; and
the high-bandwidth memory is formed on the logic die, and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
5. The neural network computing device of claim 1, characterized by further comprising: a package substrate and a logic die;
the neural network accelerator is formed on the package substrate;
the logic die is formed on the neural network accelerator; and
the high-bandwidth memory is formed on the logic die, and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
6. The neural network computing device of claim 4, characterized in that the multiple DRAM dies are stacked by a micro-bump bonding process, the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding, and the logic die and the neural network accelerator are formed on the interposer by micro-bump bonding; and
the DRAM dies, the logic die and the interposer have through-holes opened by a through-silicon via process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
7. The neural network computing device of claim 5, characterized in that:
the multiple DRAM dies are stacked by a micro-bump bonding process, the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding, the logic die is formed on the neural network accelerator by micro-bump bonding, and the neural network accelerator is formed on the package substrate by micro-bump bonding; and
the DRAM dies and the logic die have through-holes opened by a through-silicon via process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
8. A neural network computing method based on high-bandwidth memory, characterized by performing neural network computation using the neural network computing device based on high-bandwidth memory of any one of the preceding claims, comprising:
transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator;
transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator;
transferring the input data and operational parameters stored in the buffer to the neural processing unit, the neural processing unit processing the input data and operational parameters to obtain the output data of the current neural network computation, and storing the output data to the buffer; and
transmitting the output data of the buffer to the high-bandwidth memory.
9. The neural network computing method of claim 8, characterized in that transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises:
transmitting the input data through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the input data have been transmitted to the HBM memory control module; and
the HBM memory control module converting the bit width of the input data into a bit width matching the buffer and transmitting the bit-width-matched input data to the buffer; and
transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises:
transmitting the operational parameters through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the operational parameters have been transmitted to the HBM memory control module; and
the HBM memory control module converting the bit width of the operational parameters into a bit width matching the buffer and transmitting the bit-width-matched operational parameters to the buffer.
10. The neural network computing method of claim 8, characterized in that transmitting the output data of the buffer to the high-bandwidth memory comprises:
the HBM memory control module converting the bit width of the output data into a bit width matching the high-bandwidth memory and transmitting the bit-width-matched output data to the high-bandwidth memory.
CN201611221798.8A 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory Active CN108241484B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201611221798.8A CN108241484B (en) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory
PCT/CN2017/111333 WO2018121118A1 (en) 2016-12-26 2017-11-16 Calculating apparatus and method
TW106141858A TWI736716B (en) 2016-12-26 2017-11-30 Device and method for neural network computation based on high bandwidth storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611221798.8A CN108241484B (en) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory

Publications (2)

Publication Number Publication Date
CN108241484A 2018-07-03
CN108241484B CN108241484B (en) 2021-10-15

Family

Family ID: 62702396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611221798.8A Active CN108241484B (en) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory

Country Status (2)

Country Link
CN (1) CN108241484B (en)
TW (1) TWI736716B (en)



Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201331855A (en) * 2012-01-19 2013-08-01 Univ Nat Taipei Technology High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes
US9747546B2 (en) * 2015-05-21 2017-08-29 Google Inc. Neural network processor
CN105956659B (en) * 2016-05-11 2019-11-22 北京比特大陆科技有限公司 Data processing equipment and system, server

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5862396A (en) * 1996-08-02 1999-01-19 Nec Corporation Memory LSI with arithmetic logic processing capability, main memory system using the same, and method of controlling main memory system
US7360162B2 (en) * 1997-11-13 2008-04-15 Sun Microsystems, Inc. Color quality and packet shaping features for displaying an application on various client devices
EP1014274A1 (en) * 1998-06-16 2000-06-28 Joint-Stock Company Research Centre "Module" Neuroprocessor, device for calculating saturation functions, calculation device and adder
CN1605066A (en) * 2001-12-14 2005-04-06 皇家飞利浦电子股份有限公司 Data processing system
US20030229770A1 (en) * 2002-06-07 2003-12-11 Jeddeloh Joseph M. Memory hub with internal cache and/or memory access prediction
CN101667451A (en) * 2009-09-11 2010-03-10 西安电子科技大学 Data buffer of high-speed data exchange interface and data buffer control method thereof
CN103946812A (en) * 2011-09-30 2014-07-23 英特尔公司 Apparatus and method for implementing a multi-level memory hierarchy
CN103988060A (en) * 2011-11-15 2014-08-13 罗伯特·博世有限公司 Converter arrangement for capturing sound waves and/or pressure waves by means of fiber-optic sensor
CN104115129A (en) * 2011-12-21 2014-10-22 英特尔公司 System and method for intelligently flushing data from a processor into a memory subsystem
US20150143082A1 (en) * 2012-05-24 2015-05-21 Roger Smith Dynamically Erectable Computer System
CN104541257A (en) * 2012-08-06 2015-04-22 先进微装置公司 Stacked memory device with metadata management
US20140071778A1 (en) * 2012-09-11 2014-03-13 International Business Machines Corporation Memory device refresh
CN102983998A (en) * 2012-11-22 2013-03-20 北京中创信测科技股份有限公司 Novel data acquiring system SuperCAP
CN104049909A (en) * 2013-03-15 2014-09-17 国际商业机器公司 Dual asynchronous and synchronous memory system
CN106030553A (en) * 2013-04-30 2016-10-12 惠普发展公司,有限责任合伙企业 Memory network
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN105789139A (en) * 2016-03-31 2016-07-20 上海新储集成电路有限公司 Method for preparing neural network chip
CN106126481A (en) * 2016-06-29 2016-11-16 华为技术有限公司 A kind of computing engines and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN Xiaoxing, "Design and Implementation of a High-Speed, Large-Capacity Solid-State Storage System", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738316A (en) * 2018-07-20 2020-01-31 北京三星通信技术研究有限公司 Operation method and device based on neural network and electronic equipment
CN110738316B (en) * 2018-07-20 2024-05-14 北京三星通信技术研究有限公司 Operation method and device based on neural network and electronic equipment
US11138135B2 (en) 2018-09-20 2021-10-05 Samsung Electronics Co., Ltd. Scale-out high bandwidth memory system
US12032497B2 (en) 2018-09-20 2024-07-09 Samsung Electronics Co., Ltd. Scale-out high bandwidth memory system
WO2020061924A1 (en) * 2018-09-27 2020-04-02 华为技术有限公司 Operation accelerator and data processing method
CN112703511A (en) * 2018-09-27 2021-04-23 华为技术有限公司 Operation accelerator and data processing method
CN112703511B (en) * 2018-09-27 2023-08-25 华为技术有限公司 Operation accelerator and data processing method
CN111952298A (en) * 2019-05-17 2020-11-17 芯盟科技有限公司 Neural network intelligent chip and forming method thereof
CN111952298B (en) * 2019-05-17 2023-12-29 芯盟科技有限公司 Neural network intelligent chip and forming method thereof
CN112446475A (en) * 2019-09-03 2021-03-05 芯盟科技有限公司 Neural network intelligent chip and forming method thereof

Also Published As

Publication number Publication date
CN108241484B (en) 2021-10-15
TWI736716B (en) 2021-08-21
TW201824097A (en) 2018-07-01

Similar Documents

Publication Publication Date Title
CN108241484A (en) Neural network computing device and method based on high-bandwidth memory
TWI767489B (en) High capacity memory module including wafer-section memory circuit
US20200411064A1 (en) Flexible memory system with a controller and a stack of memory
CN111554680B (en) Unified Integrated Circuit System
WO2018121118A1 (en) Calculating apparatus and method
CN108847263A (en) System in package memory modules with embedded memory
JP7349812B2 (en) memory system
Roullard et al. Evaluation of 3D interconnect routing and stacking strategy to optimize high speed signal transmission for memory on logic
Clermidy et al. 3D embedded multi-core: Some perspectives
US20230051480A1 (en) Signal routing between memory die and logic die for mode based operations
Su et al. 3D-MiM (MUST-in-MUST) technology for advanced system integration
CN113688065A (en) Near memory computing module and method, near memory computing network and construction method
CN113626374A (en) Stacking chip
Drucker et al. The open domain-specific architecture
CN216118778U (en) Stacking chip
Vivet et al. Interconnect challenges for 3D multi-cores: From 3D network-on-chip to cache interconnects
Clermidy et al. 3D stacking for multi-core architectures: From WIDEIO to distributed caches
CN111952298B (en) Neural network intelligent chip and forming method thereof
CN113421879A (en) Cache content addressable memory and memory chip package structure
CN117915670B (en) Integrated chip structure for memory and calculation
TWI814179B (en) A multi-core chip, an integrated circuit device, a board card, and a process method thereof
CN112420089B (en) Storage device, connection method and device, and computer-readable storage medium
Raghavan Five emerging DRAM interfaces you should know for your next design
Effiong et al. Design Exploration Framework for 3D-NoC Multicore Systems under Process Variability at RTL level
CN112446475A (en) Neural network intelligent chip and forming method thereof

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant