CN108241484A - Neural network computing device and method based on high-bandwidth memory - Google Patents

Neural network computing device and method based on high-bandwidth memory

Info

Publication number
CN108241484A
CN108241484A (application CN201611221798.8A; granted as CN108241484B)
Authority
CN
China
Prior art keywords
memory
high bandwidth
neural
buffer
bandwidth memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611221798.8A
Other languages
Chinese (zh)
Other versions
CN108241484B (en)
Inventor
陈天石 (Chen Tianshi)
李韦 (Li Wei)
郭崎 (Guo Qi)
陈云霁 (Chen Yunji)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201611221798.8A priority Critical patent/CN108241484B/en
Priority to PCT/CN2017/111333 priority patent/WO2018121118A1/en
Priority to TW106141858A priority patent/TWI736716B/en
Publication of CN108241484A publication Critical patent/CN108241484A/en
Application granted granted Critical
Publication of CN108241484B publication Critical patent/CN108241484B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575 Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Neurology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Memory System (AREA)
  • Dram (AREA)
  • Semiconductor Memories (AREA)

Abstract

The present invention provides a neural network computing device and method based on high-bandwidth memory. The neural network computing device includes at least one high-bandwidth memory, each comprising multiple stacked memory dies, and a neural network accelerator electrically connected to the high-bandwidth memory; data are exchanged between the neural network accelerator and the high-bandwidth memory, and the accelerator performs neural network computations. The invention can greatly increase memory bandwidth: with high-bandwidth memory serving as the memory of the neural network computing device, input data and operational parameters can be exchanged between the buffer and the memory faster, so that I/O time is greatly shortened. Because the high-bandwidth memory is a stacked structure, it occupies no lateral planar space; the area of the neural network computing device can therefore be greatly reduced, to roughly 5% of that of the prior art, and the power consumption of the neural network computing device is also reduced.

Description

Neural network computing device and method based on high-bandwidth memory
Technical field
The present invention relates to the application of high-performance memory in the field of neural network computing, and in particular to a neural network computing device and method based on high-bandwidth memory.
Background art
Artificial intelligence is currently developing rapidly, and machine learning is affecting every aspect of people's lives. As an important component of the machine learning field, neural networks are a research hotspot for both industry and academia. Because of the huge data volumes involved in neural network computation, how to accelerate the execution of neural network algorithms has become a major problem to be solved, and dedicated neural network computing devices have emerged accordingly.
The dynamic random access memory (DRAM) used in current neural network computing device architectures is mostly GDDR4 or GDDR5. However, once bandwidth, performance, power consumption and area are all taken into account, GDDR4 and GDDR5 cannot fully meet the needs of neural network computing devices, and their technological development has entered a bottleneck period: each additional 1 GB/s of bandwidth brings more power consumption, which is neither a wise, efficient nor worthwhile choice for designers or consumers. Meanwhile, GDDR4 and GDDR5 also suffer from the serious problem that their area is difficult to reduce. GDDR4 and GDDR5 will therefore gradually hinder the sustained growth of neural network computing device performance.
Summary of the invention
(1) Technical problem to be solved
In view of this, the main purpose of the present invention is to provide a neural network computing device and method that address the bandwidth, energy consumption and area bottlenecks faced by the neural network computing devices described above.
(2) Technical solution
The present invention provides a neural network computing device based on high-bandwidth memory, comprising: at least one high-bandwidth memory, each high-bandwidth memory comprising multiple stacked memory dies; and a neural network accelerator electrically connected to the high-bandwidth memory, wherein data are exchanged between the neural network accelerator and the high-bandwidth memory and the accelerator performs neural network computations.
Preferably, the neural network accelerator comprises a memory interface, an HBM memory control module, a buffer, a buffer control module and a neural processing unit. The high-bandwidth memory and the buffer exchange data through the memory interface and the HBM memory control module, and the HBM memory control module performs clock synchronization and bit-width matching between the high-bandwidth memory and the buffer. The buffer exchanges data with the neural processing unit through the buffer control module, and the neural processing unit performs the neural network computation.
Preferably, the memory interface transmits data from the high-bandwidth memory to the HBM memory control module and transmits data from the HBM memory control module to the high-bandwidth memory. The HBM memory control module synchronizes the clocks of the high-bandwidth memory and the buffer, converts the data bandwidth received from the memory interface into a bandwidth matching the buffer and transmits the bandwidth-matched data to the buffer, and converts the data bandwidth of the buffer into a bandwidth matching the high-bandwidth memory and transmits the bandwidth-matched data to the memory interface.
Preferably, the device further comprises a package substrate, an interposer and a logic die. The interposer is formed on the package substrate; the logic die and the neural network accelerator are formed on the interposer; the high-bandwidth memory is formed on the logic die; and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
Preferably, the device further comprises a package substrate and a logic die. The neural network accelerator is formed on the package substrate; the logic die is formed on the neural network accelerator; the high-bandwidth memory is formed on the logic die; and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
Preferably, the multiple DRAM dies are stacked by a micro-bump bonding process; the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding; and the logic die and the neural network accelerator are formed on the interposer by micro-bump bonding. The DRAM dies, the logic die and the interposer have through-holes opened by a through-silicon via (TSV) process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
Preferably, the multiple DRAM dies are stacked by a micro-bump bonding process; the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding; the logic die is formed on the neural network accelerator by micro-bump bonding; and the neural network accelerator is formed on the package substrate by micro-bump bonding. The DRAM dies and the logic die have through-holes opened by a through-silicon via process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
The present invention also provides a neural network computing method based on high-bandwidth memory, which performs neural network computation using the above neural network computing device and comprises: transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator; transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator; transferring the input data and operational parameters stored in the buffer to the neural processing unit, where the neural processing unit processes them to obtain the output data of the current neural network computation and stores the output data to the buffer; and transmitting the output data of the buffer to the high-bandwidth memory.
Preferably, transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises: transmitting the input data through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the input data have been transmitted to the HBM memory control module; and the HBM memory control module converting the bit width of the input data into a bit width matching the buffer and transmitting the bit-width-matched input data to the buffer. Transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises: transmitting the operational parameters through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the operational parameters have been transmitted to the HBM memory control module; and the HBM memory control module converting the bit width of the operational parameters into a bit width matching the buffer and transmitting the bit-width-matched operational parameters to the buffer.
Preferably, transmitting the output data of the buffer to the high-bandwidth memory comprises: the HBM memory control module converting the bit width of the output data into a bit width matching the high-bandwidth memory and transmitting the bit-width-matched output data to the high-bandwidth memory.
(3) Advantageous effects
It can be seen from the above technical solution that the neural network computing device and method based on high-bandwidth memory of the present invention have the following advantageous effects:
(1) During neural network computation, the high-bandwidth memory serving as the memory of the neural network computing device can exchange input data and operational parameters between the buffer and the memory faster, which greatly shortens I/O time;
(2) The neural network accelerator, using a stacked high-bandwidth memory structure together with an HBM memory control module, can greatly increase memory bandwidth, to more than twice that of the prior art, so that computational performance is greatly improved;
(3) Because the high-bandwidth memory is a stacked structure, it occupies no lateral planar space, so the area of the neural network computing device can be greatly reduced, to roughly 5% of that of the prior art;
(4) The power consumption of the neural network computing device is reduced, and the micro-bump bonding and through-silicon via interconnections increase data transmission bandwidth and speed.
Description of the drawings
Fig. 1 is an overall structural diagram of a neural network computing device based on high-bandwidth memory according to an embodiment of the present invention;
Fig. 2 is a sectional view of the neural network computing device shown in Fig. 1;
Fig. 3 is an overall architecture diagram of the neural network accelerator with an HBM memory control module according to an embodiment of the present invention;
Fig. 4 is a sectional view of a neural network computing device based on high-bandwidth memory according to another embodiment of the present invention;
Fig. 5 is a flowchart of a neural network computing method based on high-bandwidth memory according to an embodiment of the present invention.
Symbol description
101, 201, 401: package substrate; 102, 202: interposer; 103, 203, 403: logic die; 104: high-bandwidth memory; 105, 205, 402: neural network accelerator; 204: HBM memory control module; 206, 406: through-hole; 207, 407: micro solder ball; 208, 405: DRAM die; 301: memory interface; 302: package structure; 303: control unit; 304, 404: HBM memory control module; 305: buffer; 306: buffer control module; 307: neural processing unit (NFU).
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
High-Bandwidth Memory (HBM), a new type of low-power memory chip, offers the excellent characteristics of ultra-wide data access, low power consumption and small area. An embodiment of the present invention proposes a neural network computing device based on high-bandwidth memory. Referring to Fig. 1, the computing device includes: a package substrate 101, an interposer 102, a logic die 103, a high-bandwidth memory 104 (stacked memory) and a neural network accelerator 105. Specifically:
The package substrate 101 carries the other components of the neural network computing device described above and is electrically connected to a host device, such as a computer, a mobile phone or various embedded devices.
The interposer 102 is formed on the package substrate 101 and carries the logic die 103 and the neural network accelerator 105.
The logic die 103 is formed on the interposer 102 and connects the interposer 102 to the high-bandwidth memory 104, providing one layer of packaging for the high-bandwidth memory 104.
The high-bandwidth memory 104 is formed on the logic die 103 and comprises multiple memory dies stacked in the direction perpendicular to the package substrate 101.
The neural network accelerator 105 is also formed on the interposer 102 and performs neural network computation; it can perform an entire neural network computation or basic neural network operations such as convolution. The neural network accelerator 105 is electrically connected to the logic die 103 through the interposer 102 and exchanges data with the high-bandwidth memory 104.
Referring to Fig. 2, which shows a sectional view of the neural network computing device of this embodiment in the direction perpendicular to the package substrate, the device is a 2.5D memory architecture. The high-bandwidth memory includes four dynamic random access memory (DRAM) dies 208, stacked using a micro-bump (μ-bump) bonding process, with micro solder balls 207 formed between adjacent DRAM dies 208. The bottom DRAM die 208 of the high-bandwidth memory is formed on the logic die 203 by micro-bump bonding; the logic die 203 and the neural network accelerator 205 are formed on the interposer 202 by micro-bump bonding; and the interposer 202 is formed on the package substrate 201 by a flip-chip bonding process. Through-holes 206 are opened in the DRAM dies 208 using a through-silicon via (TSV) process, and through-holes are likewise opened in the logic die 203 and the interposer 202. Wires laid out through these through-holes and micro solder balls electrically connect the DRAM dies 208 to the logic die 203, and the logic die 203 is electrically connected to the neural network accelerator 205 by wires through the through-holes of the interposer 202, interconnecting the high-bandwidth memory and the neural network accelerator 205. Under the control of the HBM memory control module 204 of the neural network accelerator 205, data are transmitted between the neural network accelerator 205 and the high-bandwidth memory.
Each channel of an existing GDDR memory is 32 bits wide, so a 16-channel memory bus is 512 bits wide. In this embodiment, the high-bandwidth memory can include four DRAM dies, each with two 128-bit channels, so the high-bandwidth memory can provide a bit width of 1024 bits, twice that of the GDDR memory above.
The above is only an illustrative description of the present invention, and the disclosure is not limited thereto. For example, the neural network computing device may have multiple logic dies and, correspondingly, multiple high-bandwidth memories, and each high-bandwidth memory may have more than four DRAM dies; the quantities of these components can be set according to actual demand.
For example, the neural network computing device may include four logic dies and four high-bandwidth memories, each high-bandwidth memory including four DRAM dies with two 128-bit channels each. Each high-bandwidth memory can then provide a bit width of 1024 bits, and the four high-bandwidth memories can provide a bit width of 4096 bits, eight times that of the GDDR memory above.
Each high-bandwidth memory may also include eight DRAM dies, each with two 128-bit channels; each high-bandwidth memory can then provide a bit width of 2048 bits, and four high-bandwidth memories can provide a bit width of 8192 bits, 16 times that of the GDDR memory above. The sketch below works this arithmetic through.
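As a quick check on the bit widths quoted above, the following sketch (a non-authoritative illustration; the function name and default arguments are ours) reproduces the arithmetic for each configuration:

```python
def bus_width(stacks, dies_per_stack, channels_per_die=2, channel_bits=128):
    """Aggregate bit width of `stacks` HBM stacks, per the embodiment's figures."""
    return stacks * dies_per_stack * channels_per_die * channel_bits

GDDR_BUS = 16 * 32                  # 16 channels x 32 bits = 512 bits
print(bus_width(1, 4), "bits")      # 1024: one stack of 4 dies, 2x GDDR
print(bus_width(4, 4), "bits")      # 4096: four stacks of 4 dies, 8x GDDR
print(bus_width(4, 8), "bits")      # 8192: four stacks of 8 dies, 16x GDDR
```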
Referring to Fig. 3, which shows the overall architecture of the neural network accelerator with an HBM memory control module of this embodiment, the neural network accelerator includes: a memory interface 301, a control unit 303 (control processor), an HBM memory control module 304 (HBM controller), a buffer 305, a buffer control module 306 (buffer controller) and a neural processing unit 307 (NFU). The control unit 303, HBM memory control module 304, buffer 305, buffer control module 306 and neural processing unit 307 are packaged together to form a package structure 302.
The memory interface 301, as the interface between the neural network accelerator and the high-bandwidth memory, is electrically connected by wires to the logic die and to the DRAM dies of the high-bandwidth memory; it receives the data transmitted from the high-bandwidth memory and transmits data to the high-bandwidth memory.
The HBM memory control module 304 controls the data transmission between the high-bandwidth memory and the buffer, including reconciling their data bandwidths and synchronizing the clocks of the high-bandwidth memory and the buffer. The HBM memory control module 304 synchronizes the clocks of the high-bandwidth memory and the buffer, converts the data bandwidth of the high-bandwidth memory received by the memory interface 301 into a bandwidth matching the buffer and transmits the bandwidth-matched data to the buffer, and converts the bandwidth of the buffer's data into a bandwidth matching the high-bandwidth memory and transmits the bandwidth-matched data through the memory interface 301 to the high-bandwidth memory. A sketch of this bit-width matching follows.
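The following is a minimal sketch of the bit-width matching described above, assuming a 1024-bit HBM word and a 256-bit buffer word (both widths, and the function names, are our assumptions rather than anything the patent fixes):

```python
HBM_BITS = 1024   # assumed HBM word width (four dies x two 128-bit channels)
BUF_BITS = 256    # assumed buffer word width

def hbm_to_buffer(hbm_word: int) -> list[int]:
    """Split one HBM-wide word into buffer-width words, lowest bits first."""
    mask = (1 << BUF_BITS) - 1
    return [(hbm_word >> (i * BUF_BITS)) & mask
            for i in range(HBM_BITS // BUF_BITS)]

def buffer_to_hbm(buf_words: list[int]) -> int:
    """Reassemble buffer-width words into one HBM-wide word."""
    word = 0
    for i, w in enumerate(buf_words):
        word |= w << (i * BUF_BITS)
    return word

word = (1 << 1000) | 0xABCD
assert buffer_to_hbm(hbm_to_buffer(word)) == word   # round trip preserves data
```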
The buffer 305 is the internal storage unit of the neural network accelerator; it receives the bandwidth-matched data transmitted by the HBM memory control module 304 and transfers its stored data to the HBM memory control module 304.
The buffer control module 306 controls the data interaction between the buffer 305 and the neural processing unit 307: it transfers the data stored in the buffer 305 to the neural processing unit 307, the neural processing unit 307 performs the neural network computation, and the buffer control module 306 transfers the computation results of the neural processing unit 307 back to the buffer 305.
The control unit 303 decodes instructions and sends control signals to the HBM memory control module 304, the buffer 305, the buffer control module 306 and the neural processing unit 307, coordinating and scheduling these modules to work together to realize the computing functions of the neural network accelerator; a toy sketch of this dispatch role follows.
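The patent does not disclose the accelerator's instruction set, but the coordinating role of the control unit can be pictured as a decode-and-dispatch loop like the toy sketch below (the opcodes and handlers are invented for illustration):

```python
def run(program, handlers):
    """Control-unit sketch: decode each instruction and dispatch it to a module."""
    for op, *args in program:
        handlers[op](*args)   # hand off to HBM controller, buffer controller, or NFU

# Toy print-based stand-ins for the modules the control unit coordinates.
handlers = {
    "LOAD":    lambda addr, n: print(f"HBM control module: load {n} words from {addr:#x}"),
    "COMPUTE": lambda layer:   print(f"buffer control module -> NFU: run layer {layer}"),
    "STORE":   lambda addr, n: print(f"HBM control module: store {n} words to {addr:#x}"),
}
run([("LOAD", 0x0, 4), ("COMPUTE", 0), ("STORE", 0x1000, 4)], handlers)
```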
It can be seen that the neural network computing device of the present invention, using a stacked high-bandwidth memory structure and a neural network accelerator with an HBM memory control module, can greatly increase memory bandwidth, to more than twice that of the prior art, greatly improving computational performance. During neural network computation, the high-bandwidth memory, serving as the memory of the neural network computing device, can exchange input data and operational parameters between the buffer and the memory faster, greatly shortening I/O time. Because the high-bandwidth memory is a stacked structure, it occupies no lateral planar space, so the area of the neural network computing device can be greatly reduced, to roughly 5% of that of the prior art, while the power consumption of the device is also reduced. Moreover, with the micro-bump bonding and TSV interconnections between the DRAM dies, and with the interposer used for data exchange between the neural network accelerator and the high-bandwidth memory, the transmission bandwidth and speed between the DRAM dies, and between the neural network accelerator and the high-bandwidth memory, are further improved.
Another embodiment of the present invention proposes a neural network computing device based on high-bandwidth memory; referring to Fig. 4, features identical to the above embodiment are not described again. This neural network computing device is a 3D memory architecture and comprises, from bottom to top, a stacked package substrate 401, neural network accelerator 402, logic die 403 and high-bandwidth memory. The high-bandwidth memory includes four DRAM dies 405, stacked using a micro-bump bonding process, with micro solder balls 407 formed between adjacent DRAM dies 405. The bottom DRAM die 405 of the high-bandwidth memory is formed on the logic die 403 by micro-bump bonding, the logic die 403 is formed on the neural network accelerator 402 by micro-bump bonding, and the neural network accelerator 402 is formed on the package substrate 401 by micro-bump bonding. Through-holes 406 are opened in the DRAM dies 405 using a TSV process, and through-holes are likewise opened in the logic die 403. Wires laid out through these through-holes and micro solder balls electrically connect the DRAM dies 405 to the logic die 403 and the neural network accelerator 402, realizing the vertical interconnection of the high-bandwidth memory and the neural network accelerator 402. Under the control of the HBM memory control module 404 of the neural network accelerator 402, data are transmitted between the neural network accelerator 402 and the high-bandwidth memory.
It can be seen that in this neural network computing device, because the high-bandwidth memory is stacked directly on the neural network accelerator, the area of the device can be further reduced relative to the 2.5D memory architecture, which is particularly conducive to miniaturizing the device. In addition, the distance between the high-bandwidth memory and the neural network accelerator is shorter, meaning shorter wiring between them, which can further improve signal transmission quality and speed.
Yet another embodiment of the present invention provides a neural network computing method that performs neural network computation using the above neural network computing device based on high-bandwidth memory. Referring to Fig. 5, the method includes:
Step S1: writing the operational parameters of the neural network computation into the high-bandwidth memory.
The high-bandwidth memory is connected to an external memory, such as an external disk, through an external access unit. The external access unit writes the operational parameters at an externally specified address into the high-bandwidth memory; the operational parameters include weights, a bias table, a function table and the like.
Step S2: transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator, which can specifically include:
Sub-step S21: the HBM memory control module addresses according to the starting address of the input data; if the address hits, then, at the high bandwidth provided by the high-bandwidth memory, one bit-width's worth of data starting from the starting address is transmitted through the memory interface to the HBM memory control module per transfer, until all the input data have been transmitted to the HBM memory control module. For example, if the high-bandwidth memory provides a bit width of 1024 bits and the stored input data total 4096 bits, the high-bandwidth memory transmits 1024 bits of input data to the HBM memory control module per transfer, and all the input data are transferred after four transfers.
Sub-step S22: the HBM memory control module converts the bit width of the input data into a bit width matching the buffer and transmits the bit-width-matched input data to the buffer.
The input data can be the input neuron vector of the current neural network computation. A sketch of sub-step S21 follows.
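The following is a minimal sketch of sub-step S21's burst behaviour, assuming the 1024-bit bus of the worked example (the function name and the miss handling are our assumptions; the patent only describes the address-hit case):

```python
HBM_BITS = 1024   # assumed HBM bus width, as in the worked example above

def transfer_to_hbm_controller(total_bits, start_addr, hit=True):
    """S21 sketch: stream data to the HBM memory control module in bus-wide bursts."""
    if not hit:
        raise LookupError(f"address {start_addr:#x} missed")   # assumed behaviour
    bursts = -(-total_bits // HBM_BITS)                        # ceiling division
    for i in range(bursts):
        addr = start_addr + i * (HBM_BITS // 8)                # byte-addressed
        print(f"burst {i + 1}/{bursts}: {HBM_BITS} bits from {addr:#x}")

transfer_to_hbm_controller(4096, 0x0)   # four bursts, matching the example
```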
Step S3: transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator. Similarly to step S2, this step can specifically include:
Sub-step S31: the HBM memory control module addresses according to the starting address of the operational parameters; if the address hits, then, at the high bandwidth provided by the high-bandwidth memory, one bit-width's worth of data starting from the starting address is transmitted through the memory interface to the HBM memory control module per transfer, until all the operational parameters have been transmitted to the HBM memory control module.
Sub-step S32: the HBM memory control module converts the bit width of the operational parameters into a bit width matching the buffer and transmits the bit-width-matched operational parameters to the buffer.
Step S4: the buffer control module transfers the input data and operational parameters stored in the buffer to the neural processing unit; the neural processing unit processes the input data and operational parameters to obtain the output data of the current neural network computation; and the buffer control module stores the output data to the buffer.
While the neural processing unit processes the input data and operational parameters, if the neural network computation produces intermediate data, the buffer control module stores the intermediate data in the buffer and the neural processing unit continues computing; when the intermediate data are needed in the computation, the buffer control module returns them to the neural processing unit, which continues the computation using the intermediate data to obtain the output data of the neural network computation. The output data can be an output neuron vector.
Step S5: the output data in the buffer are transmitted to the high-bandwidth memory, and the output data are transmitted to the external memory through the external access unit. Specifically, the HBM memory control module converts the bit width of the output data into a bit width matching the high-bandwidth memory and transmits the bit-width-matched output data to the high-bandwidth memory, and the external access unit transmits the output data stored in the high-bandwidth memory to the external memory.
At this point the result of the current neural network computation has been obtained; if the next neural network computation is to proceed, execution can return to step S2 to obtain the result of the next neural network computation. The overall flow is sketched below.
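Putting steps S1 to S5 together, one computation round moves data HBM -> buffer -> NFU -> buffer -> HBM. The following self-contained sketch mirrors that flow; the classes, the dictionary standing in for addressed memory, and the dot-product stand-in for the NFU are all illustrative assumptions:

```python
class HBM:
    """Toy high-bandwidth memory: a dictionary standing in for addressed storage."""
    def __init__(self):
        self.mem = {}

class Accelerator:
    """Toy accelerator: buffer plus load/store (HBM ctrl) and compute (NFU) roles."""
    def __init__(self):
        self.buffer = {}

    def load(self, hbm, key):          # S2/S3: HBM -> buffer
        self.buffer[key] = hbm.mem[key]

    def store(self, hbm, key):         # S5: buffer -> HBM
        hbm.mem[key] = self.buffer[key]

    def nfu(self):                     # S4: a single toy neuron (dot product)
        x, w = self.buffer["inputs"], self.buffer["weights"]
        self.buffer["out"] = sum(a * b for a, b in zip(x, w))

hbm = HBM()
hbm.mem.update(inputs=[1, 2, 3], weights=[4, 5, 6])   # S1: write parameters to HBM
acc = Accelerator()
acc.load(hbm, "inputs"); acc.load(hbm, "weights")     # S2, S3
acc.nfu()                                             # S4
acc.store(hbm, "out")                                 # S5
print(hbm.mem["out"])                                 # 32
```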
It can be seen that the neural network computing method of the present invention, using a stacked high-bandwidth memory structure and a neural network accelerator with an HBM memory control module, can greatly increase memory bandwidth, greatly improve computational performance, and improve signal transmission bandwidth and speed.
It should be noted that implementations not shown or described in the drawings or in the text of the specification are in forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of the elements are not limited to the specific structures and shapes mentioned in the embodiments, and those of ordinary skill in the art may simply change or replace them. Examples of parameters with particular values may be given herein, but these parameters need not be exactly equal to the corresponding values and may approximate them within acceptable error margins or design constraints. Directional terms mentioned in the embodiments, such as "upper", "lower", "front", "rear", "left" and "right", refer only to the directions in the drawings and are not intended to limit the scope of protection of the present invention. The above embodiments can, based on considerations of design and reliability, be mixed and matched with one another or with other embodiments; that is, technical features in different embodiments can be freely combined to form further embodiments.
The specific embodiments described above further explain the objectives, technical solutions and advantageous effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A neural network computing device based on high-bandwidth memory, characterized by comprising:
at least one high-bandwidth memory, each high-bandwidth memory comprising multiple stacked memory dies; and
a neural network accelerator electrically connected to the high-bandwidth memory, wherein data are exchanged between the neural network accelerator and the high-bandwidth memory and the neural network accelerator performs neural network computation.
2. The neural network computing device of claim 1, characterized in that the neural network accelerator comprises: a memory interface, an HBM memory control module, a buffer, a buffer control module and a neural processing unit;
data are exchanged between the high-bandwidth memory and the buffer through the memory interface and the HBM memory control module, and the HBM memory control module performs clock synchronization and bit-width matching between the high-bandwidth memory and the buffer; and
the buffer exchanges data with the neural processing unit through the buffer control module, and the neural processing unit performs the neural network computation.
3. The neural network computing device of claim 2, characterized in that:
the memory interface transmits data from the high-bandwidth memory to the HBM memory control module and transmits data from the HBM memory control module to the high-bandwidth memory; and
the HBM memory control module synchronizes the clocks of the high-bandwidth memory and the buffer, converts the data bandwidth transmitted by the memory interface into a bandwidth matching the buffer and transmits the bandwidth-matched data to the buffer, and converts the data bandwidth of the buffer into a bandwidth matching the high-bandwidth memory and transmits the bandwidth-matched data to the memory interface.
4. The neural network computing device of claim 1, characterized by further comprising: a package substrate, an interposer and a logic die;
the interposer is formed on the package substrate;
the logic die and the neural network accelerator are formed on the interposer; and
the high-bandwidth memory is formed on the logic die, and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
5. The neural network computing device of claim 1, characterized by further comprising: a package substrate and a logic die;
the neural network accelerator is formed on the package substrate;
the logic die is formed on the neural network accelerator; and
the high-bandwidth memory is formed on the logic die, and the multiple stacked memory dies are multiple DRAM dies stacked in the direction perpendicular to the package substrate.
6. The neural network computing device of claim 4, characterized in that the multiple DRAM dies are stacked by a micro-bump bonding process, the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding, and the logic die and the neural network accelerator are formed on the interposer by micro-bump bonding; and
the DRAM dies, the logic die and the interposer have through-holes opened by a through-silicon via process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
7. The neural network computing device of claim 5, characterized in that:
the multiple DRAM dies are stacked by a micro-bump bonding process, the bottom DRAM die of the high-bandwidth memory is formed on the logic die by micro-bump bonding, the logic die is formed on the neural network accelerator by micro-bump bonding, and the neural network accelerator is formed on the package substrate by micro-bump bonding; and
the DRAM dies and the logic die have through-holes opened by a through-silicon via process, and the DRAM dies are electrically connected to the neural network accelerator through the logic die by wires arranged in the through-holes.
8. A neural network computing method based on high-bandwidth memory, characterized by performing neural network computation using the neural network computing device based on high-bandwidth memory of any one of the preceding claims, comprising:
transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator;
transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator;
transferring the input data and operational parameters stored in the buffer to the neural processing unit, the neural processing unit processing the input data and operational parameters to obtain the output data of the current neural network computation, and storing the output data to the buffer; and
transmitting the output data of the buffer to the high-bandwidth memory.
9. The neural network computing method of claim 8, characterized in that transmitting the input data of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises:
transmitting the input data through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the input data have been transmitted to the HBM memory control module; and
the HBM memory control module converting the bit width of the input data into a bit width matching the buffer and transmitting the bit-width-matched input data to the buffer; and
transmitting the operational parameters of the current neural network computation from the high-bandwidth memory to the buffer of the neural network accelerator comprises:
transmitting the operational parameters through the memory interface to the HBM memory control module in units of the bit width of the high-bandwidth memory, until all the operational parameters have been transmitted to the HBM memory control module; and
the HBM memory control module converting the bit width of the operational parameters into a bit width matching the buffer and transmitting the bit-width-matched operational parameters to the buffer.
10. The neural network computing method of claim 8, characterized in that transmitting the output data of the buffer to the high-bandwidth memory comprises:
the HBM memory control module converting the bit width of the output data into a bit width matching the high-bandwidth memory and transmitting the bit-width-matched output data to the high-bandwidth memory.
CN201611221798.8A 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory Active CN108241484B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201611221798.8A CN108241484B (en) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory
PCT/CN2017/111333 WO2018121118A1 (en) 2016-12-26 2017-11-16 Calculating apparatus and method
TW106141858A TWI736716B (en) 2016-12-26 2017-11-30 Device and method for neural network computation based on high bandwidth storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611221798.8A CN108241484B (en) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory

Publications (2)

Publication Number Publication Date
CN108241484A 2018-07-03
CN108241484B CN108241484B (en) 2021-10-15

Family

Family ID: 62702396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611221798.8A Active CN108241484B (en) 2016-12-26 2016-12-26 Neural network computing device and method based on high-bandwidth memory

Country Status (2)

Country Link
CN (1) CN108241484B (en)
TW (1) TWI736716B (en)



Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201331855A (en) * 2012-01-19 2013-08-01 Univ Nat Taipei Technology High-speed hardware-based back-propagation feedback type artificial neural network with free feedback nodes
US9747546B2 (en) * 2015-05-21 2017-08-29 Google Inc. Neural network processor
CN105956659B (en) * 2016-05-11 2019-11-22 北京比特大陆科技有限公司 Data processing equipment and system, server

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5862396A (en) * 1996-08-02 1999-01-19 Nec Corporation Memory LSI with arithmetic logic processing capability, main memory system using the same, and method of controlling main memory system
US7360162B2 (en) * 1997-11-13 2008-04-15 Sun Microsystems, Inc. Color quality and packet shaping features for displaying an application on various client devices
EP1014274A1 (en) * 1998-06-16 2000-06-28 Joint-Stock Company Research Centre "Module" Neuroprocessor, device for calculating saturation functions, calculation device and adder
CN1605066A (en) * 2001-12-14 2005-04-06 皇家飞利浦电子股份有限公司 Data processing system
US20030229770A1 (en) * 2002-06-07 2003-12-11 Jeddeloh Joseph M. Memory hub with internal cache and/or memory access prediction
CN101667451A (en) * 2009-09-11 2010-03-10 西安电子科技大学 Data buffer of high-speed data exchange interface and data buffer control method thereof
CN103946812A (en) * 2011-09-30 2014-07-23 英特尔公司 Apparatus and method for implementing a multi-level memory hierarchy
CN103988060A (en) * 2011-11-15 2014-08-13 罗伯特·博世有限公司 Converter arrangement for capturing sound waves and/or pressure waves by means of fiber-optic sensor
CN104115129A (en) * 2011-12-21 2014-10-22 英特尔公司 System and method for intelligently flushing data from a processor into a memory subsystem
US20150143082A1 (en) * 2012-05-24 2015-05-21 Roger Smith Dynamically Erectable Computer System
CN104541257A (en) * 2012-08-06 2015-04-22 先进微装置公司 Stacked memory device with metadata management
US20140071778A1 (en) * 2012-09-11 2014-03-13 International Business Machines Corporation Memory device refresh
CN102983998A (en) * 2012-11-22 2013-03-20 北京中创信测科技股份有限公司 Novel data acquiring system SuperCAP
CN104049909A (en) * 2013-03-15 2014-09-17 国际商业机器公司 Dual asynchronous and synchronous memory system
CN106030553A (en) * 2013-04-30 2016-10-12 惠普发展公司,有限责任合伙企业 Memory network
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor
CN105789139A (en) * 2016-03-31 2016-07-20 上海新储集成电路有限公司 Method for preparing neural network chip
CN106126481A (en) * 2016-06-29 2016-11-16 华为技术有限公司 A kind of computing engines and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FAN Xiaoxing, "Design and Implementation of a High-Speed, Large-Capacity Solid-State Storage System", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738316A (en) * 2018-07-20 2020-01-31 北京三星通信技术研究有限公司 Operation method and device based on neural network and electronic equipment
CN110738316B (en) * 2018-07-20 2024-05-14 北京三星通信技术研究有限公司 Operation method and device based on neural network and electronic equipment
US11138135B2 (en) 2018-09-20 2021-10-05 Samsung Electronics Co., Ltd. Scale-out high bandwidth memory system
US12032497B2 (en) 2018-09-20 2024-07-09 Samsung Electronics Co., Ltd. Scale-out high bandwidth memory system
WO2020061924A1 (en) * 2018-09-27 2020-04-02 华为技术有限公司 Operation accelerator and data processing method
CN112703511A (en) * 2018-09-27 2021-04-23 华为技术有限公司 Operation accelerator and data processing method
CN112703511B (en) * 2018-09-27 2023-08-25 华为技术有限公司 Operation accelerator and data processing method
CN111952298A (en) * 2019-05-17 2020-11-17 芯盟科技有限公司 Neural network intelligent chip and forming method thereof
CN111952298B (en) * 2019-05-17 2023-12-29 芯盟科技有限公司 Neural network intelligent chip and forming method thereof
CN112446475A (en) * 2019-09-03 2021-03-05 芯盟科技有限公司 Neural network intelligent chip and forming method thereof

Also Published As

Publication number Publication date
CN108241484B (en) 2021-10-15
TWI736716B (en) 2021-08-21
TW201824097A (en) 2018-07-01

Similar Documents

Publication Publication Date Title
CN108241484A (en) Neural network computing device and method based on high-bandwidth memory
TWI767489B (en) High capacity memory module including wafer-section memory circuit
US20200411064A1 (en) Flexible memory system with a controller and a stack of memory
CN111554680B (en) Unified Integrated Circuit System
WO2018121118A1 (en) Calculating apparatus and method
CN108847263A (en) System in package memory modules with embedded memory
JP7349812B2 (en) memory system
Roullard et al. Evaluation of 3D interconnect routing and stacking strategy to optimize high speed signal transmission for memory on logic
Clermidy et al. 3D embedded multi-core: Some perspectives
US20230051480A1 (en) Signal routing between memory die and logic die for mode based operations
Su et al. 3D-MiM (MUST-in-MUST) technology for advanced system integration
CN113688065A (en) Near memory computing module and method, near memory computing network and construction method
CN113626374A (en) Stacking chip
Drucker et al. The open domain-specific architecture
CN216118778U (en) Stacking chip
Vivet et al. Interconnect challenges for 3D multi-cores: From 3D network-on-chip to cache interconnects
Clermidy et al. 3D stacking for multi-core architectures: From WIDEIO to distributed caches
CN111952298B (en) Neural network intelligent chip and forming method thereof
CN113421879A (en) Cache content addressable memory and memory chip package structure
CN117915670B (en) Integrated chip structure for memory and calculation
TWI814179B (en) A multi-core chip, an integrated circuit device, a board card, and a process method thereof
CN112420089B (en) Storage device, connection method and device, and computer-readable storage medium
Raghavan Five emerging DRAM interfaces you should know for your next design
Effiong et al. Design Exploration Framework for 3D-NoC Multicore Systems under Process Variability at RTL level
CN112446475A (en) Neural network intelligent chip and forming method thereof

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant