CN108536642A

CN108536642A - Big data operation acceleration system and chip

Info

Publication number: CN108536642A
Application number: CN201810609496.0A
Authority: CN
Inventors: 桂文明; 杨英; 吴旭峰; 杨存永
Original assignee: Beijing Bitmain Technology Co Ltd
Current assignee: Bitmain Technologies Inc; Beijing Bitmain Technology Co Ltd
Priority date: 2018-06-13
Filing date: 2018-06-13
Publication date: 2018-09-14

Abstract

The present invention provides a big data operation acceleration system and chip. Multiple cores are arranged in the chip, each core performs computation and storage control functions, and each core is connected to at least one storage unit outside the chip. With this technical solution, each core can read both the storage units connected to itself and the storage units connected to the other cores, so every core effectively has a large-capacity memory. This reduces the number of times data is moved into or out of memory from external storage space and speeds up data processing. In addition, because the cores can operate either independently or cooperatively, data processing is accelerated further.

Description

Big data operation acceleration system and chip

Technical field

The present invention relates to the field of integrated circuits, and more particularly to a big data operation acceleration system and chip.

Background

An ASIC (Application Specific Integrated Circuit) is an integrated circuit designed and manufactured to meet the requirements of a specific user and the needs of a specific electronic system. Because an ASIC targets a specific user's needs, in mass production it offers smaller size, lower power consumption, higher reliability, better performance, stronger confidentiality, and lower cost than a general-purpose integrated circuit.

With the development of science and technology, more and more fields, such as artificial intelligence and secure computing, involve specific computations with a large computational load. For such computations, an ASIC chip can exploit its characteristics of fast operation and low power consumption. In these computation-heavy fields, N operation chips usually need to be controlled to work simultaneously in order to improve data processing speed and capacity. As data precision keeps rising, fields such as artificial intelligence and secure computing must process ever larger volumes of data. For example, a photo today is typically 3-7 MB, but as the precision of digital cameras and video cameras increases, a photo can reach 10 MB or more, and a 30-minute video can exceed 1 GB. These fields also demand high computing speed and low latency, so improving computing speed and response time has always been a goal of chip design. The memory paired with an ASIC chip is generally 64 MB or 128 MB. When the data to be processed reaches 512 MB or more, the ASIC chip has to access the data through that memory many times, repeatedly moving data into or out of memory from external storage space, which reduces processing speed.
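The memory-size argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the patent: the function name `memory_passes` is hypothetical, and it assumes the simplest case in which every byte of the data set is loaded into the on-board memory exactly once.

```python
import math


def memory_passes(data_mb: float, memory_mb: float) -> int:
    """Number of times the memory must be refilled to stream `data_mb` of data
    through a memory of `memory_mb` (assumes each byte is loaded exactly once)."""
    if data_mb < 0 or memory_mb <= 0:
        raise ValueError("data size must be >= 0 and memory size must be > 0")
    return math.ceil(data_mb / memory_mb)


# The sizes quoted in the background section.
passes_64 = memory_passes(512, 64)
passes_128 = memory_passes(512, 128)
```

Under that assumption, a 512 MB data set needs 8 refills of a 64 MB memory and 4 refills of a 128 MB memory; algorithms that revisit data would need more.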

Summary of the invention

The present invention provides a big data operation acceleration system and chip. Multiple cores are arranged in the chip, each core performs computation and storage control functions, and each core is connected to at least one storage unit outside the chip. Each core can thus read both the storage units connected to itself and the storage units connected to the other cores, so every core has a large-capacity memory. This reduces the number of times data is moved into or out of memory from external storage space and speeds up data processing. In addition, because the cores can operate either independently or cooperatively, data processing is accelerated further.

To achieve the above objectives, the present invention provides the following technical solutions:

According to a first aspect of the invention, a big data operation acceleration system is provided, comprising at least one operation chip and multiple storage units. The chip includes N cores, where N is a positive integer greater than or equal to 4, and each core includes a storage control unit and a computing unit. Each storage control unit is connected to at least one storage unit through a bus. The N cores are interconnected through a bus. The chip includes a UART control unit for sending data or instructions from outside the chip to a core or a storage unit, and for obtaining data or instructions from a core or a storage unit.

Preferably, the at least one storage unit is a DDR memory cell array.

Preferably, the storage control unit is used to control data read and write operations on the at least one storage unit connected to it.

Preferably, the data read and write operation operates on part or all of the storage space of the at least one storage unit.

Preferably, the computing unit is used to perform calculations on the acquired data.

Preferably, the acquired data may be part or all of the data in the at least one storage unit connected to the core in which the computing unit is located, part or all of the data in the at least one storage unit connected to another core, or a combination of the two.

Preferably, where the acquired data is part or all of the data in the at least one storage unit connected to the core in which the computing unit is located, specifically:

The computing unit obtains part or all of the data in the at least one storage unit through the storage control unit connected to it.

Preferably, where the acquired data is part or all of the data in the at least one storage unit connected to another core, specifically:

The computing unit sends a get-data command over the bus to the other computing unit. The other computing unit sends a get-data command to the storage control unit connected to it, and that storage control unit obtains part or all of the data from the at least one storage unit connected to it and sends the obtained data to the other computing unit. The other computing unit then sends the obtained data to the requesting computing unit.

Preferably, the computing unit can perform one or more of cryptographic operations and convolution calculations.

Preferably, the computing units each perform an independent operation, and each computing unit calculates its own result.

Preferably, the computing units can perform cooperative operation, in which each computing unit performs its operation based on the calculation results of other computing units.

Preferably, the storage control unit is used to obtain data from the computing unit and store the obtained data in the at least one connected storage unit.

Preferably, the UART control unit receives an external command to initialize and configure the at least one storage unit and to apply unified addressing across the multiple storage units.

Preferably, the multiple storage units are read and written through the UART control unit.

Preferably, data is broadcast to the multiple computing units through the UART control unit.

Preferably, the multiple computing units can send their calculation results outward over a serial bus through the UART control unit.

Preferably, the UART control unit includes a UART interface, a first AXI unit, a second AXI unit, AHB interfaces, data proofreading units, and a data generating unit.

Preferably, the UART interface is used to obtain data or instructions from outside the chip.

Preferably, the data generating unit is used to generate a DAG or network model parameters.

Preferably, the first AXI unit includes a first interface (M0), a second interface (M1), N third interfaces (S0 to S7), and a fourth interface (S8). The first interface (M0) and the second interface (M1) are connected to the data generating unit and the UART interface, the N third interfaces (S0 to S7) are connected to N data proofreading units, respectively, and the fourth interface (S8) is connected to the second AXI unit.

Preferably, the second AXI unit includes one first interface (M) and N second interfaces (S). The first interface (M) receives the data sent by the first AXI unit, and the N second interfaces (S) are connected to N AHB interfaces, respectively.

Preferably, the first AXI unit is used to send 512-bit data or commands to the at least one storage unit.

Preferably, the second AXI unit is used to send 32-bit data or commands to the at least one storage unit.

Preferably, format conversion is performed between the fourth interface (S8) of the first AXI unit and the first interface (M) of the second AXI unit.

Preferably, the AHB interface and the data proofreading unit are connected to the at least one storage unit.

Preferably, the data proofreading unit is used to check the data stored in the at least one storage unit.

According to a second aspect of the invention, a big data operation acceleration chip is provided. The chip includes N cores, where N is a positive integer greater than or equal to 4, and each of the N cores is connected to at least one storage unit through a bus. The N cores are interconnected through a bus. The chip is characterized in that it includes a UART control unit, and the UART control unit includes a UART interface, a first AXI unit, a second AXI unit, AHB interfaces, data proofreading units, and a data generating unit.

In the embodiments of the present invention, multiple cores are arranged in the chip, each core performs computation and storage control functions, and each core is connected to at least one storage unit outside the chip. Each core can thus read both the storage units connected to itself and the storage units connected to the other cores, so every core has a large-capacity memory. This reduces the number of times data is moved into or out of memory from external storage space and speeds up data processing. In addition, because the cores can operate either independently or cooperatively, data processing is accelerated further.

Description of the drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some exemplary embodiments; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the structure of the big data operation acceleration system with an 8-core chip according to the first embodiment;

Fig. 2 is a schematic diagram of the structure of the big data operation acceleration system with an 8-core chip according to the second embodiment;

Fig. 3 is a schematic diagram of the structure of the big data operation acceleration system with a 6-core chip according to the third embodiment;

Fig. 4 is a schematic diagram of the internal connections and data flow of the big data operation acceleration system according to the fourth embodiment;

Fig. 5 is a schematic diagram of the data structure according to the present invention;

Fig. 6 is a schematic diagram of the structure of the big data operation acceleration system with an 8-core chip according to the fifth embodiment.

Detailed description

Exemplary embodiments of the present invention are described below with reference to the drawings. It should be understood that these embodiments are provided only to enable those skilled in the art to better understand and thereby implement the invention, and do not limit its scope in any way. Rather, they are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art.

Furthermore, it should be noted that the directions up, down, left, and right in the drawings are only illustrations for the specific embodiments. Those skilled in the art may, according to actual needs, change the orientation of part or all of the components shown in the drawings without affecting the function of any component or of the system as a whole, and technical solutions with such changed orientation still fall within the scope of protection of the present invention.

A multi-core chip is a multiprocessing system embodied on a single large-scale integrated semiconductor chip. Typically, two or more chip cores are embodied on the multi-core chip and interconnected by a bus (which may also be formed on the same multi-core chip). From two to many chip cores may be embodied on the same multi-core chip, and the upper limit on the number of cores is set only by manufacturing capability and performance constraints. Multi-core chips can have applications that include specialized arithmetic and/or logic operations performed in multimedia and signal processing algorithms, such as video encoding and decoding, 2D/3D graphics, audio and speech processing, image processing, telephony, speech recognition and sound synthesis, and encryption.

Although only ASICs are mentioned in the background, the specific wiring implementation in the embodiments can also be applied to multi-core CPUs, GPUs, FPGAs, and the like. In the embodiments, the multiple cores may be identical cores or different cores.

[embodiment 1]

For convenience of explanation, the chip with 8 cores shown in Fig. 1 is used as the example below. Those skilled in the art will understand that 8 cores is only an illustrative choice: the number of cores can be N, where N is a positive integer greater than or equal to 4, for example 6, 10, or 12. In this embodiment, the multiple cores may be identical cores or different cores.

Fig. 1 is a schematic diagram of the structure of the big data operation acceleration system with an 8-core chip according to the first embodiment. As shown in Fig. 1, the system includes at least one 8-core operation chip and multiple storage units. The 8-core operation chip includes 8 cores (10 ... 17), each of which includes a storage control unit (110 ... 117) and a computing unit (120 ... 127). Each storage control unit (110 ... 117) is connected to 4 storage units (200, 201, 202, 203 ... 270, 271, 272, 273) through a data bus. The 8 cores (10 ... 17) are interconnected through a bus interface unit (30) and can send data or commands to one another. A data exchange control unit is connected to the 8 cores (10 ... 17) through the bus interface unit (30), and to the 32 storage units (200, 201, 202, 203 ... 270, 271, 272, 273) through a bus.

The data exchange control unit may be implemented with various protocols, such as UART, SPI, PCIE, SERDES, or USB. In this embodiment the data exchange control unit is a UART (Universal Asynchronous Receiver/Transmitter) control unit (40). A universal asynchronous receiver/transmitter, usually called a UART, is an asynchronous transceiver that converts the data to be transmitted between serial and parallel form, and it is usually integrated into the connections of various communication interfaces. UART is used here only as an example; other protocols may also be used. The UART control unit (40) is connected to the 8 cores (10 ... 17) through the bus interface unit (30), and to the 32 storage units (200, 201, 202, 203 ... 270, 271, 272, 273) through a bus.

The storage unit may be a high-speed external memory such as DDR, SDDR, DDR2, DDR3, DDR4, GDDR5, GDDR6, HMC, or HBM. DDR-series memory is preferred here. DDR (Double Data Rate) memory is double data rate synchronous dynamic random access memory. DDR uses a synchronous circuit so that the main steps of address specification, data transfer, and output are executed independently while remaining fully synchronized with the CPU. DDR also uses DLL (Delay Locked Loop) technology, which provides a data filtering signal: when data is valid, the storage controller can use this signal to locate the data precisely, output once every 16 times, and resynchronize data coming from different memory modules. The frequency of DDR memory can be expressed either as a working frequency or as an equivalent frequency. The working frequency is the actual operating frequency of the memory chips, but because DDR memory transfers data on both the rising and falling edges of the clock pulse, the equivalent data transfer frequency is twice the working frequency. DDR2 (Double Data Rate 2) memory is a newer generation memory standard developed by JEDEC (Joint Electron Device Engineering Council); in each clock, DDR2 memory can read and write data at 4 times the speed of the external bus and can run at 4 times the speed of the internal control bus. DDR3, DDR4, GDDR5, GDDR6, HMC, and HBM memories are all prior art and are not described in detail here. The storage control units (110 ... 117) are each connected to 4 storage units (200, 201, 202, 203 ... 270, 271, 272, 273) through a DDR bus. In this embodiment each storage unit is a memory of at least 512 MB, and its capacity may be 1 GB, 2 GB, or more.
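The relation between working frequency and equivalent frequency described above is simple enough to state as code. This is a minimal sketch: the function names are hypothetical, and the 64-bit bus width used in the bandwidth helper is an illustrative assumption that does not come from the patent.

```python
def ddr_equivalent_frequency_mhz(working_frequency_mhz: float) -> float:
    """DDR transfers data on both clock edges, so the equivalent (data-rate)
    frequency is twice the working frequency."""
    return 2 * working_frequency_mhz


def peak_bandwidth_mb_per_s(working_frequency_mhz: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in MB/s for an assumed bus width
    (64 bits is an illustrative value, not taken from the patent)."""
    transfers_per_us = ddr_equivalent_frequency_mhz(working_frequency_mhz)
    return transfers_per_us * bus_width_bits / 8


equivalent = ddr_equivalent_frequency_mhz(200)  # 200 MHz working frequency
bandwidth = peak_bandwidth_mb_per_s(200)        # with the assumed 64-bit bus
```

For a 200 MHz working frequency this gives an equivalent frequency of 400 MHz, and a theoretical peak of 3200 MB/s under the assumed bus width.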

The 32 storage units (200, 201, 202, 203 ... 270, 271, 272, 273) store raw data or data to be processed, for example image data, video data, or data to be encrypted. They can also store the calculation results of the computing units (120 ... 127). The 32 storage units are addressed uniformly: the addresses of their storage spaces are contiguous and do not overlap.
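One way to picture unified addressing over 32 storage units is as a single flat address space that is decoded into a core, a storage unit within that core, and an offset. The sketch below is illustrative only. The patent states that the addresses are contiguous and distinct but does not give the layout, so the linear ordering, the 512 MB unit size, and the names `decode_address` and `reference_numeral` are all assumptions.

```python
UNIT_SIZE = 512 * 1024 * 1024   # assumed: every storage unit is exactly 512 MB
UNITS_PER_CORE = 4              # first embodiment: 4 storage units per core
NUM_CORES = 8                   # first embodiment: 8 cores
TOTAL_SIZE = UNIT_SIZE * UNITS_PER_CORE * NUM_CORES


def decode_address(global_address: int):
    """Map a unified address to (core index, unit index within core, offset).

    Assumes units are laid out back to back in core order; the patent only
    says the address ranges are contiguous and do not overlap.
    """
    if not 0 <= global_address < TOTAL_SIZE:
        raise ValueError("address outside the unified address space")
    unit_index, offset = divmod(global_address, UNIT_SIZE)
    core_index, unit_in_core = divmod(unit_index, UNITS_PER_CORE)
    return core_index, unit_in_core, offset


def reference_numeral(core_index: int, unit_in_core: int) -> int:
    """Reference numeral used in Fig. 1 for a storage unit (200 ... 273)."""
    return 200 + 10 * core_index + unit_in_core


core, unit, offset = decode_address(5 * UNIT_SIZE + 7)
```

With this assumed layout, the address `5 * UNIT_SIZE + 7` falls at offset 7 of the second storage unit of the second core, which Fig. 1 labels 211.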

The storage control units (110 ... 117) control the reading and storing of data in the storage units (200, 201, 202, 203 ... 270, 271, 272, 273) connected to them. For example, the storage control unit (110) can read part or all of the raw data or data to be processed from the storage units (200, 201, 202, 203) connected to it and send the data it has read to the computing unit (120) connected to the storage control unit (110).

A computing unit (120 ... 127) sends a get-data command to the storage control unit (110 ... 117) connected to it. The storage control unit obtains the data from the storage units (200, 201, 202, 203 ... 270, 271, 272, 273) it controls according to the command and sends the obtained data to the computing unit. A computing unit can also send a get-data command to another computing unit. That computing unit forwards the command to its own storage control unit, which obtains part or all of the raw data or data to be processed from the storage units it controls and returns it to that computing unit, which in turn sends it to the requesting computing unit.

For example, the computing unit (120) sends a get-data command to the storage control unit (110) connected to it. The storage control unit (110) obtains part or all of the raw data or data to be processed from the storage units (200, 201, 202, 203) it controls according to the command and sends the obtained data to the computing unit (120). The computing unit (120) can also send a get-data command to the computing units (121 ... 127). Each of the computing units (121 ... 127) sends a get-data command to its own storage control unit (111 ... 117), which obtains part or all of the raw data or data to be processed from the storage units (210, 211, 212, 213 ... 270, 271, 272, 273) it controls and sends it to the computing unit (121 ... 127). The computing units (121 ... 127) then send the obtained data to the computing unit (120).
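The local and remote read paths described above can be modelled in a few lines. This is a toy software sketch under stated assumptions, not the hardware design: the class names, the dictionary standing in for the bus interface unit (30), and the tiny 64-byte memories are all hypothetical.

```python
class StorageControlUnit:
    """Toy model of a storage control unit with its attached storage."""

    def __init__(self, size: int):
        self.memory = bytearray(size)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.memory[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        self.memory[offset:offset + len(data)] = data


class Core:
    """Toy model of a core: a computing unit plus a storage control unit."""

    def __init__(self, core_id: int, bus: dict, size: int = 64):
        self.core_id = core_id
        self.bus = bus                      # stands in for the bus interface unit
        self.storage_control = StorageControlUnit(size)
        bus[core_id] = self

    def handle_get(self, offset: int, length: int) -> bytes:
        # The computing unit asks its own storage control unit for the data.
        return self.storage_control.read(offset, length)

    def get_data(self, target_id: int, offset: int, length: int) -> bytes:
        if target_id == self.core_id:
            return self.handle_get(offset, length)          # local path
        # Remote path: the request goes over the bus to the other core's
        # computing unit, which reads through its own storage control unit.
        return self.bus[target_id].handle_get(offset, length)


bus = {}
cores = [Core(core_id, bus) for core_id in range(8)]
cores[1].storage_control.write(4, b"abc")
remote = cores[0].get_data(1, 4, 3)   # core 0 reads data attached to core 1
```

The point of the sketch is that a core never touches another core's storage directly; every remote read is served by the remote core's own storage control unit.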

A computing unit (120 ... 127) can also send a write-data command to the storage control unit (110 ... 117) connected to it. The storage control unit then writes the calculation result of the computing unit into the storage units (200, 201, 202, 203 ... 270, 271, 272, 273) it controls according to the command.

A computing unit (120 ... 127) obtains part or all of the raw data or data to be processed from the storage units connected to its own core and from the storage units connected to other cores (200, 201, 202, 203 ... 270, 271, 272, 273), performs calculations on the data, and obtains an operation result. The computing unit can perform SHA256 operations, convolution calculations, and the like.

For example, the computing unit (120) obtains part or all of the raw data or data to be processed from the storage units (200, 201, 202, 203) connected to its own core (10) and, through the other computing units (121 ... 127), from the storage units (210, 211, 212, 213 ... 270, 271, 272, 273) connected to the other cores (11 ... 17). It performs calculations on the data, obtains an operation result, and writes the result to the storage units (200, 201, 202, 203).

The computing units (120 ... 127) can each perform the same operation independently, with each computing unit calculating its own result, for example applying a cryptographic operation such as SHA256 to the data. The computing units (120 ... 127) can also operate cooperatively: the first computing unit (120) sends its calculation result to the second computing unit (121), which performs a second calculation based on that result and other parameters, and so on. An example is data processing for a neural network, which consists of multiple layers, where each computing unit can perform the calculation for one layer.
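The two operating modes can be contrasted in software. The sketch below is illustrative only: the function names are hypothetical, the sequential loops stand in for computing units that would run in parallel hardware, and the three arithmetic "layers" are placeholders for real neural network layers.

```python
import hashlib


def independent_operation(blocks):
    """Independent mode: every computing unit applies the same operation
    (here SHA256) to its own block and produces its own result."""
    return [hashlib.sha256(block).hexdigest() for block in blocks]


def cooperative_operation(layers, value):
    """Cooperative mode: each computing unit runs one stage and hands its
    result to the next unit, like the layers of a neural network."""
    for layer in layers:
        value = layer(value)
    return value


digests = independent_operation([b"abc", b"def"])

# Placeholder "layers"; a real design would use convolution or similar.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
pipeline_result = cooperative_operation(layers, 5)   # ((5 + 1) * 2) - 3
```

In the independent mode the results do not depend on one another, so the units need no communication; in the cooperative mode the output of unit k is the input of unit k+1, which is why the cores are interconnected by a bus.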

The bus may use a centralized-arbitration bus structure or a ring-topology bus structure. Bus technology is common in this field and is therefore not described in detail here.

The UART control unit (40) controls the exchange of data and commands between the outside and the storage control units (110 ... 117) and computing units (120 ... 127) inside the chip, and between the outside and the storage units (200, 201, 202, 203 ... 270, 271, 272, 273). A bus connects the UART control unit (40) to the storage units. This is described in detail with reference to Fig. 4 below. "Outside" here may refer to an external host, an external CPU, an external controller, an external function chip, an external GPU, or another chip identical to this one.

The bus interface unit (30) connects the 8 cores (10 ... 17) and the UART control unit (40), so that data or commands can be transmitted among the units.

Fig. 2 is a schematic diagram of the structure of the big data operation acceleration system with an 8-core chip according to the second embodiment. As Fig. 2 shows, it differs from the first embodiment in the number of storage units connected to each core. The working mode and operating principle are the same and are not described again here.

Fig. 3 is a schematic diagram of the structure of the big data operation acceleration system with a 6-core chip according to the third embodiment. As Fig. 3 shows, it differs from the first embodiment in the number of cores in the chip and in the number of storage units connected to each core. The working mode and operating principle are the same and are not described again here.

Fig. 4 is a schematic diagram of the internal connections and data flow of the big data operation acceleration system according to the fourth embodiment. As Fig. 4 shows, the chip includes 8 cores (core0 ... core7), each connected to one DDR storage unit (DDR0 ... DDR7), and a UART control unit (40). The UART control unit (40) includes a UART interface (401), a first AXI (Advanced eXtensible Interface) unit (402), a second AXI unit (403), AHB interfaces (404), data proofreading units (405), and a data generating unit (406). As before, each core includes a storage control unit and a computing unit.

UART (Universal Asynchronous Receiver/Transmitter) is a serial bus protocol and one of the most widely used protocols in low-rate communication. Its circuitry is simple and inexpensive, but it supports only one-to-one communication.

AXI (Advanced eXtensible Interface) is a bus protocol. It is the most important part of the AMBA (Advanced Microcontroller Bus Architecture) 3.0 protocol proposed by ARM, and is an on-chip bus oriented toward high performance, high bandwidth, and low latency. Its address/control and data phases are separate; it supports unaligned data transfers; in a burst transfer only the start address is needed; it has separate read and write data channels; it supports outstanding transactions and out-of-order access; and it makes timing closure easier. AXI is a newer high-performance protocol in AMBA. AXI technology enriches the existing AMBA standard and meets the needs of very-high-performance and complex system-on-chip (SoC) designs.

AHB (Advanced High Performance Bus) is an advanced high-performance bus. Like USB (Universal Serial Bus), it is a bus interface. AHB is mainly used to connect high-performance modules (such as CPU, DMA, and DSP) and serves as the on-chip system bus of an SoC. Its characteristics include: single-clock-edge operation; non-tri-state implementation; support for burst transfers; support for split transfers; support for multiple masters; configurable bus width of 32 to 128 bits; and support for byte, half-word, and word transfers. An AHB system consists of three parts: masters, slaves, and infrastructure. Every transfer on the AHB bus is initiated by a master and responded to by a slave. The infrastructure consists of an arbiter, a master-to-slave multiplexer, a slave-to-master multiplexer, a decoder, a dummy slave, and a dummy master. This design gives the whole system a clear structure and improves the generality of the system and the portability of its functional modules.

The UART interface is used to obtain data or instructions from outside the chip, and the data generating unit is used to generate a DAG or network model parameters. The first AXI unit includes a first interface (M0), a second interface (M1), 8 third interfaces (S0 to S7), and a fourth interface (S8). The first interface (M0) and the second interface (M1) are connected to the data generating unit and the UART interface; the 8 third interfaces (S0 to S7) are connected to 8 data proofreading units, respectively; and the fourth interface (S8) is connected to the first interface (M) of the second AXI unit. The second AXI unit includes one first interface (M) and 8 second interfaces (S). The first interface (M) receives the data sent from the fourth interface (S8) of the first AXI unit, and the 8 second interfaces (S) are connected to 8 AHB interfaces, respectively. The AHB interfaces and the data proofreading units are connected to the at least one storage unit.

The first AXI unit receives data or commands through the first interface (M0) and the second interface (M1), converts them, and sends them through the 8 third interfaces (S0 to S7) to the data proofreading units. The data proofreading units check the data stored in the at least one storage unit. The first AXI unit is used to send 512-bit data or commands to the at least one storage unit.

The second AXI unit receives data through its first interface (M) from the fourth interface (S8) of the first AXI unit, converts the 512-bit data or commands into 32-bit data or commands, and sends the 32-bit data or commands to the AHB interfaces (404) through the 8 second interfaces (S).
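The 512-bit to 32-bit format conversion can be pictured as splitting one wide word into sixteen narrow ones. The following is a minimal sketch under assumptions: the patent does not specify the word ordering, so least-significant-word-first is an arbitrary choice, and the function names are hypothetical.

```python
WIDE_BITS = 512
NARROW_BITS = 32
WORDS_PER_BEAT = WIDE_BITS // NARROW_BITS   # 16 narrow words per wide word
NARROW_MASK = (1 << NARROW_BITS) - 1


def split_512_to_32(wide_word: int) -> list:
    """Split one 512-bit value into sixteen 32-bit words.

    Least-significant word first is an assumption; the patent does not
    state the ordering used by the second AXI unit.
    """
    if not 0 <= wide_word < (1 << WIDE_BITS):
        raise ValueError("value does not fit in 512 bits")
    return [(wide_word >> (NARROW_BITS * i)) & NARROW_MASK
            for i in range(WORDS_PER_BEAT)]


def join_32_to_512(words) -> int:
    """Inverse conversion, used on the read path back toward the host."""
    if len(words) != WORDS_PER_BEAT:
        raise ValueError("expected exactly 16 words")
    wide_word = 0
    for i, word in enumerate(words):
        wide_word |= (word & NARROW_MASK) << (NARROW_BITS * i)
    return wide_word


wide = int.from_bytes(bytes(range(64)), "little")   # sample 512-bit payload
narrow = split_512_to_32(wide)
```

Splitting and rejoining are exact inverses, so the conversion loses no information in either direction.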

The external host can initialize and configure the DDR parameters through the UART control unit and apply unified addressing to the multiple DDR chips. The external host sends the addressing command through the UART interface (401), the first AXI unit (402), the second AXI unit (403), and the AHB interfaces (404) to the DDR storage units DDR0 ... DDR7, which allocate addresses according to the addressing command.

The external host can write data to the DDR storage units DDR0 ... DDR7 through the UART control unit. The host sends the data and its storage address through the UART interface (401), the first AXI unit (402), and the data proofreading units (405) to the DDR storage units DDR0 ... DDR7, which store the data at that address. The first AXI unit (402) receives the data sent by the UART interface (401) through the M0 interface and forwards it through the S0 ... S7 interfaces to the data proofreading units (405), each of which sends the data to the DDR storage unit connected to it. Over the reverse path, the external host can also read the data stored in the DDR storage units DDR0 ... DDR7.

The data generating unit (406) can generate data and send the generated data and its storage address through the UART interface (401), the first AXI unit (402), and the data proofreading units (405) to the DDR storage units DDR0 ... DDR7 for storage. The first AXI unit (402) receives the data generated by the data generating unit (406) through the M1 interface and forwards it through the S0 ... S7 interfaces to the data proofreading units (405), each of which sends the data to the DDR storage unit connected to it.

When needed, the data proofreading units (405) can compare the 512-bit data written to the DDR storage units DDR0 ... DDR7 against the data that was sent and rewrite it if an error is found.
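The compare-and-rewrite behaviour of the data proofreading unit resembles a write, read back, compare loop. The sketch below is an assumption-laden software analogy: the patent does not describe how the comparison is done or how many retries are allowed, so the function names, the retry limit, and the `FlakyWriter` test helper are all hypothetical.

```python
def plain_write(memory: bytearray, address: int, data: bytes) -> None:
    """Store `data` into `memory` starting at `address`."""
    memory[address:address + len(data)] = data


class FlakyWriter:
    """Hypothetical writer that corrupts the first byte on its first call,
    used only to exercise the retry path."""

    def __init__(self):
        self.calls = 0

    def __call__(self, memory: bytearray, address: int, data: bytes) -> None:
        self.calls += 1
        plain_write(memory, address, data)
        if self.calls == 1:
            memory[address] ^= 0xFF   # simulate a single write error


def write_with_check(memory, address, data, write_fn=plain_write, max_attempts=3):
    """Write, read back, compare, and rewrite on mismatch.

    Returns the number of attempts used. The retry limit is an assumption.
    """
    for attempt in range(1, max_attempts + 1):
        write_fn(memory, address, data)
        if bytes(memory[address:address + len(data)]) == data:
            return attempt
    raise IOError("data still mismatched after %d attempts" % max_attempts)


memory = bytearray(128)
block = bytes(range(64))                       # one 512-bit block
clean_attempts = write_with_check(memory, 0, block)
retry_attempts = write_with_check(memory, 64, block, write_fn=FlakyWriter())
```

A clean write succeeds on the first attempt, while the simulated single error is repaired by one rewrite.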

The external host can broadcast data through the UART interface (401) to the computing units in kernels core0 ... core7; each computing unit can write its calculation result over the universal serial bus to the external host through the UART interface (401).

When the system is applied in the field of artificial intelligence, the UART control unit (40) stores the image data or video data sent by the external host in the DDR storage units DDR0 ... DDR7, and the data generating unit (406) generates the mathematical model of the neural network and writes the model parameters into the storage units DDR0 ... DDR7. The DDR control unit in each kernel accesses the data and sends it to the computing unit for calculation; the computing unit can store the calculation result in the storage unit, or feed the calculation result back to the external host through the UART control unit (40).

When the chip is applied to cryptocurrencies such as Ethereum, the UART control unit (40) stores the block information sent by the external host in the DDR storage units DDR0 ... DDR7, and the data generating unit (406) generates the DAG of the Ethereum algorithm and writes the DAG into the storage units DDR0 ... DDR7. The DDR control unit in each kernel accesses the data and sends it to the computing unit for the proof-of-work computation, and the computing unit feeds the calculation result back to the external host through the UART control unit (40).

Fig. 5 is a schematic diagram of the data structure according to the present invention. The data mentioned here may be command data, numeric data, character data and other kinds of data. The data format includes a valid bit (valid), a destination address (dst id), a source address (src id) and a data field (data). A kernel can determine from the valid bit whether a packet is a command or a value; here it can be assumed that 0 represents a value and 1 represents a command. A kernel can determine the destination address, the source address and the data type from the data structure. For example, in Fig. 1, when kernel 50 sends a data read command to kernel 10, the valid bit is 1, the destination address is the address of kernel 10, the source address is the address of kernel 50, and the data field holds the read command together with the data type, the data address, or the like. When kernel 10 sends the data back to kernel 50, the valid bit is 0, the destination address is the address of kernel 50, the source address is the address of kernel 10, and the data field holds the data that was read.

In terms of instruction execution timing, this embodiment uses a traditional six-stage pipeline whose stages are instruction fetch, decode, execute, memory access, alignment and write-back. In terms of instruction set architecture, a reduced instruction set architecture can be adopted. Following the general design approach of reduced instruction set architectures, the instructions of the present invention can be divided by function into register-register instructions, register-immediate instructions, jump instructions, memory access instructions, control instructions and inter-core communication instructions.
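The packet format of Fig. 5 and the read exchange between kernel 50 and kernel 10 can be modelled with a short sketch. Field widths and the encoding of the data field are not given in the text, so the Python types, the `("read", address)` tuple and the helper names `read_request` and `read_reply` are assumptions made for illustration.

```python
from dataclasses import dataclass

VALUE = 0    # valid bit 0: the packet carries a value
COMMAND = 1  # valid bit 1: the packet carries a command


@dataclass(frozen=True)
class Packet:
    valid: int     # 0 = value, 1 = command (convention assumed in the text)
    dst_id: int    # destination kernel address
    src_id: int    # source kernel address
    data: object   # command details or payload


def read_request(src_id, dst_id, address):
    """Build a data read command from kernel src_id to kernel dst_id."""
    return Packet(COMMAND, dst_id, src_id, ("read", address))


def read_reply(request, payload):
    """Build the value packet that answers a read command.

    Source and destination are swapped relative to the request.
    """
    return Packet(VALUE, request.src_id, request.dst_id, payload)


# Usage: kernel 50 asks kernel 10 for data, kernel 10 answers.
request = read_request(src_id=50, dst_id=10, address=0x100)
reply = read_reply(request, payload=b"\x2a")
```

The reply simply mirrors the addresses of the request, which is why a receiving kernel needs only the valid bit to tell a command from returned data.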

[embodiment 2]

Fig. 6 is a schematic diagram of the structure of the big data operation acceleration system with an 8-core chip according to the fifth embodiment. Compared with embodiment 1, the data generating unit is separated from the UART control unit and provided as an independent module; here the data generating unit is specifically a GAD generating unit. This is because some data is relatively large, such as the GAD data in Ethereum or neural network model parameters. The data generating unit still sends data through the first AXI unit 402 in the UART unit. Since the principle is the same as in embodiment 1, it is not described in detail here.

Using the description provided herein, the embodiments can be implemented as a machine, process or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware or hardware.

Any resulting program(s), having computer-readable program code, can be embodied on one or more computer-usable media, such as resident memory devices, smart cards or other removable memory devices, or transmitting devices, thereby making a computer program product and an article of manufacture according to the embodiments. As such, the terms "article of manufacture" and "computer program product" as used herein are intended to cover a computer program that exists permanently or temporarily on any non-transitory computer-usable medium.

As noted above, memory/storage devices include, but are not limited to, disks, optical discs, removable memory devices (such as smart cards, subscriber identity modules (SIMs) and wireless identity modules (WIMs)), semiconductor memories (such as random access memory (RAM), read-only memory (ROM) and programmable read-only memory (PROM)), and the like. Transmission media include, but are not limited to, transmission via wireless communication networks, the Internet, intranets, telephone/modem-based network communication, hard-wired/cabled communication networks, satellite communication, and other stationary or mobile network systems/communication links.

Although specific example embodiments have been disclosed, those skilled in the art will understand that changes can be made to the specific example embodiments without departing from the spirit and scope of the present invention.

The present invention has been described above based on embodiments with reference to the accompanying drawings, but the present invention is not limited to the above embodiments. Schemes obtained by appropriately combining or replacing parts of the embodiments and variations according to layout needs and the like are also included within the scope of the present invention. In addition, the combinations and processing order of the embodiments may be suitably rearranged based on the knowledge of those skilled in the art, or modifications such as various design changes may be applied to the embodiments; embodiments to which such modifications have been applied may also fall within the scope of the present invention.

Although various concepts of the present invention have been described in detail, those skilled in the art will understand that various modifications and alternatives to those concepts can be realized within the spirit of the overall teaching of this disclosure. Using ordinary skill, those skilled in the art can implement the invention set forth in the claims without undue experimentation. It should be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the present invention, which is determined by the full scope of the appended claims and their equivalents.

Claims

1. A big data operation acceleration system, comprising at least one operation chip and a plurality of storage units; the chip comprises N kernel cores, where N is a positive integer greater than or equal to 4, and each kernel core comprises a storage control unit and a computing unit; each storage control unit is connected to at least one storage unit by a bus; the N kernel cores are interconnected by a bus; the chip comprises a data exchange control unit for sending data or instructions from outside the chip to the kernel cores or the storage units, and for obtaining data or instructions from the kernel cores or the storage units.

2. The system according to claim 1, characterized in that the at least one storage unit is a DDR-series memory.

3. The system according to claim 1, characterized in that the storage control unit is configured to control data read and write operations on the at least one storage unit connected to the storage control unit.

4. The system according to claim 3, characterized in that the data read and write operations are operations on part or all of the storage space of the at least one storage unit.

5. The system according to claim 1, characterized in that the computing unit is configured to perform calculation on the obtained data.

6. The system according to claim 5, characterized in that the obtained data may be part or all of the data of the at least one storage unit connected to the kernel containing the computing unit, or part or all of the data of the at least one storage unit connected to another kernel, or a combination of part or all of the data of the at least one storage unit connected to the kernel containing the computing unit and part or all of the data of the at least one storage unit connected to another kernel.

7. The system according to claim 6, characterized in that obtaining part or all of the data of the at least one storage unit connected to the kernel containing the computing unit specifically comprises:

the computing unit obtains part or all of the data of the at least one storage unit through the storage control unit connected to it.

8. The system according to claim 6, characterized in that obtaining part or all of the data of the at least one storage unit connected to another kernel specifically comprises:

the computing unit sends a data acquisition command to another computing unit over the bus; the other computing unit sends the data acquisition command to the storage control unit connected to the other computing unit; the storage control unit connected to the other computing unit obtains part or all of the data from the at least one storage unit connected to it and sends the obtained data to the other computing unit; and the other computing unit sends the obtained data to the computing unit.

9. The system according to claim 5, characterized in that the computing unit can perform one or more of encryption operations and convolution calculations.

10. The system according to claim 5, characterized in that the computing units each perform independent operations, and each computing unit calculates its own result.

11. The system according to claim 5, characterized in that the computing units can perform cooperative operations, each computing unit performing its operation according to the calculation results of other computing units.

12. The system according to claim 1, characterized in that the storage control unit is configured to obtain data from the computing unit and store the obtained data in the at least one storage unit connected to it.

13. The system according to claim 1, characterized in that the data exchange control unit receives an external command to initialize the configuration of the at least one storage unit and to apply unified addressing to the plurality of storage units.

14. The system according to claim 1, characterized in that the plurality of storage units are read and written through the data exchange control unit.

15. The system according to claim 1, characterized in that data is broadcast to the plurality of computing units through the data exchange control unit.

16. The system according to claim 1, characterized in that the plurality of computing units can transmit calculation results outward over a universal serial bus through the data exchange control unit.

17. The system according to claim 1, characterized in that the data exchange control unit comprises a data interface, a first AXI unit, a second AXI unit, an AHB interface, a data proofreading unit and a data generating unit.

18. The system according to claim 17, characterized in that the data interface is configured to obtain data or instructions from outside the chip.

19. The system according to claim 17, characterized in that the data generating unit is configured to generate GAD or network model parameters.

20. The system according to claim 17, characterized in that the first AXI unit comprises a first interface (M0), a second interface (M1), N third interfaces (S0 to S7) and a fourth interface (S8); the first interface (M0) and the second interface (M1) are connected to the data generating unit and the data interface respectively, the N third interfaces (S0 to S7) are respectively connected to N data proofreading units, and the fourth interface is connected to the second AXI unit.

21. The system according to claim 20, characterized in that the second AXI unit comprises one first interface (M) and N second interfaces (S); the first interface (M) receives the data sent by the first AXI unit, and the N second interfaces (S) are respectively connected to N AHB interfaces.

22. The system according to claim 20 or 21, characterized in that the first AXI unit is configured to send 512-bit data or commands to the at least one storage unit.

23. The system according to claim 20 or 21, characterized in that the second AXI unit is configured to send 32-bit data or commands to the at least one storage unit.

24. The system according to claim 20 or 21, characterized in that format conversion is performed through the fourth interface (S8) of the first AXI unit and the first interface (M) of the second AXI unit.

25. The system according to claim 20 or 21, characterized in that the AHB interface and the data proofreading unit are connected to the at least one storage unit.

26. The system according to claim 17, characterized in that the data proofreading unit is configured to proofread the data stored in the at least one storage unit.

27. The system according to any one of claims 1 to 21, characterized in that the data exchange control unit is a UART control unit.

28. The system according to any one of claims 17 to 21, characterized in that the data interface is a UART interface.

29. A big data operation acceleration chip, the chip comprising N kernel cores, where N is a positive integer greater than or equal to 4, each of the N kernels being externally connected to at least one memory unit by a bus, and the N kernel cores being interconnected by a bus; characterized in that the chip comprises a UART control unit, and the UART control unit comprises a UART interface, a first AXI unit, a second AXI unit, an AHB interface, a data proofreading unit and a data generating unit.

30. The chip according to claim 29, characterized in that the UART interface is configured to obtain data or instructions from outside the chip.

31. The chip according to claim 29, characterized in that the data generating unit is configured to generate GAD or network model parameters.

32. The chip according to claim 29, characterized in that the first AXI unit comprises a first interface (M0), a second interface (M1), N third interfaces (S0 to S7) and a fourth interface (S8); the first interface (M0) and the second interface (M1) are connected to the data generating unit and the UART interface respectively, the N third interfaces (S0 to S7) are respectively connected to N data proofreading units, and the fourth interface is connected to the second AXI unit.

33. The chip according to claim 32, characterized in that the second AXI unit comprises one first interface (M) and N second interfaces (S); the first interface (M) receives the data sent by the first AXI unit, and the N second interfaces (S) are respectively connected to N AHB interfaces.

34. The chip according to claim 32 or 33, characterized in that the first AXI unit is configured to send 512-bit data or commands to the at least one memory unit.

35. The chip according to claim 32 or 33, characterized in that the second AXI unit is configured to send 32-bit data or commands to the at least one memory unit.

36. The chip according to claim 32 or 33, characterized in that format conversion is performed through the fourth interface (S8) of the first AXI unit and the first interface (M) of the second AXI unit.

37. The chip according to claim 32 or 33, characterized in that the AHB interface and the data proofreading unit are connected to the at least one memory unit.

38. The chip according to claim 29, characterized in that the data proofreading unit is configured to proofread the data stored in the at least one memory unit.