CN209560543U

CN209560543U - Big data operation chip

Info

Publication number: CN209560543U
Application number: CN201821766507.8U
Authority: CN
Inventors: 秦强
Original assignee: Beijing Bitmain Technology Co Ltd
Current assignee: Bitmain Technologies Inc; Beijing Bitmain Technology Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-10-29
Anticipated expiration: 2028-10-30

Abstract

An embodiment of the utility model provides a big data operation chip. The operation chip includes N cores, N data channels and at least one storage unit, where N is an integer greater than or equal to 4. Each data channel includes a transmission interface and a receiving interface; the cores and the data channels are in one-to-one correspondence, and each core sends and receives data through its data channel. The operation chip exchanges data with the outside of the chip through the transmission interface and the receiving interface. Each of the N cores is connected to the at least one storage unit. Because multiple ASIC chips share storage units, the number of storage units is reduced, the connecting lines between ASIC operation chips are reduced, the system structure is simplified, and the cost of the ASIC chips is lowered.

Description

Big data operation chip

Technical field

The utility model relates to the field of integrated circuits, and more particularly to a big data operation chip.

Background technique

An ASIC (Application Specific Integrated Circuit) is an integrated circuit designed and manufactured to meet the requirements of a specific user and the needs of a specific electronic system. An ASIC is oriented to the needs of specific users; in mass production, compared with a general-purpose integrated circuit, it offers smaller volume, lower power consumption, higher reliability, higher performance, better confidentiality and lower cost.

With the development of science and technology, more and more fields, such as artificial intelligence and secure computing, involve specific calculations with a very large amount of computation. For such workloads an ASIC chip can bring its characteristics of fast operation and low power consumption into play. In these fields, in order to improve data processing speed and capacity, it is usually necessary to control N operation chips working at the same time. As data precision keeps increasing, ever larger amounts of data have to be processed. For example, a photo is now typically 3-7 MB, but as the precision of digital cameras and video cameras increases, a photo can reach 10 MB or more, and a 30-minute video may amount to more than 1 GB of data. These fields also require fast calculation and low latency, so improving calculation speed and response time has always been a goal of chip design. The memory paired with an ASIC chip is generally 64 MB or 128 MB; when the data to be processed is 512 MB or more, the ASIC chip has to access memory many times and repeatedly move data into or out of memory from external storage space, which reduces the processing speed. In addition, storing this growing volume of data generally requires configuring multiple storage units for each ASIC chip. For example, one ASIC chip may be configured with 4 blocks of 2 GB memory, so that N operation chips working at the same time need 4N blocks of 2 GB memory. However, when multiple operation chips work at the same time, the amount of data stored does not exceed 2 GB, which wastes storage units and raises the system cost.

When designing systems that process large amounts of interrelated data, the prior art faces two problems: 1. the need for a significant performance improvement; 2. in a distributed system, the problem of data dependence, that is, data processed in one subsystem must be presented to every other subsystem for confirmation and reprocessing. The time consumed by data processing is generally reduced in two ways: one is to speed up the clock of the processing logic; the other is to increase the number of blocks that process data concurrently.

Under process limitations, the achievable increase in clock rate is very limited, so increasing concurrency is the more effective way to improve performance. Increasing concurrency, however, generally raises the required data bandwidth correspondingly. In a typical system the data bandwidth depends on the bandwidth provided by DDR, and DDR bandwidth does not scale linearly. Suppose the initial system contains one group of DDR providing a bandwidth of 1x. A 2x bandwidth increase can be achieved by instantiating two groups of DDR, but a bandwidth increase of 16x or more cannot be achieved simply by instantiating 16 groups of DDR in one system, because of physical size limits.

If multiple ASIC chips need to cooperate, the data cannot simply be distributed to multiple unconnected systems for processing, because the data are all related: every piece of data completed in one processing unit must be confirmed and reprocessed in the other processing units. Improving the data transmission rate between multiple ASIC chips is therefore also a multi-system interconnection problem that must be solved.

Utility model content

The purpose of the embodiments of the utility model is to provide a way of connecting distributed storage through high-speed interfaces, so that multiple homogeneous systems can process large amounts of related data concurrently. The embodiments provide a big data operation acceleration system in which the external memory of the chip is eliminated and the storage unit is placed inside the ASIC chip, which reduces the time the ASIC chip spends reading data from outside and speeds up chip operation. Multiple ASIC chips share storage units, which reduces the number of storage units, reduces the connecting lines between ASIC operation chips, simplifies the system structure and lowers the cost of the ASIC chips. In addition, data is transmitted between the operation chips using serdes interface technology, which increases the data transmission rate between the ASIC chips.

In order to achieve the above objectives, the utility model embodiment provides the following technical solutions:

According to a first aspect of the embodiments of the utility model, a big data operation chip is provided. The operation chip includes N cores, N data channels (lanes) and at least one storage unit, where N is an integer greater than or equal to 4. Each data channel (lane) includes a transmission interface (tx) and a receiving interface (rx); the cores and the data channels (lanes) are in one-to-one correspondence, and each core sends and receives data through its data channel (lane). The operation chip exchanges data with the outside of the chip through the transmission interface (tx) and the receiving interface (rx). Each of the N cores is connected to the at least one storage unit.

According to a second aspect of the embodiments of the utility model, a big data operation chip is provided. The operation chip includes N cores and at least one built-in storage unit, characterized in that the storage unit includes multiple memories and at least one storage control unit, and each memory includes at least two storage subunits and a storage control subunit. Each storage control subunit is connected through an interface to each of the at least one storage control unit. The storage control subunit controls data reading or storing for the at least two storage subunits; the at least one storage control unit controls data reading or storing for the multiple memories.

In the embodiments of the utility model, multiple cores are arranged in the big data operation chip, each core performs operation and storage control functions, and inside the chip each core is connected to at least one storage unit. Each core can thus read data both from the storage unit it is connected to and from the storage units connected to the cores of other operation chips, so each core has access to a large memory. This reduces the number of times data is moved into or out of memory from external storage space and speeds up data processing. Because the cores can operate independently or cooperatively, data processing is accelerated further. Multiple ASIC chips share storage units, which reduces the number of storage units, reduces the connecting lines between ASIC operation chips, simplifies the system structure and lowers the cost of the ASIC chips.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of the utility model or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some exemplary embodiments; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the structure of a big data operation acceleration system with M ASIC chips according to the first embodiment;

Fig. 2 is a schematic structural diagram of an operation chip with 4 cores;

Fig. 3 is a schematic structural diagram of a data channel (lane);

Fig. 4 is a schematic structural diagram of a first embodiment of the storage unit;

Fig. 5 is a schematic structural diagram of a second embodiment of the storage unit;

Fig. 6 is a schematic diagram of the data transmission process in the big data operation acceleration system;

Fig. 7 is a signal flow diagram of an operation chip with 4 cores according to the first embodiment;

Fig. 8 is a schematic diagram of a data structure according to the utility model.

The above drawings show specific embodiments of the disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to explain the concept of the disclosure to those skilled in the art by reference to specific embodiments.

Specific embodiment

Exemplary embodiments are described in detail here, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure; they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims. The exemplary embodiments of the utility model are explained below with reference to the drawings. These embodiments are provided only so that those skilled in the art can better understand and implement the utility model, not to limit its scope in any way; they are provided so that the disclosure is thorough and complete and fully conveys its scope to those skilled in the art.

It should also be noted that the up, down, left and right directions in the drawings are only illustrations for the specific embodiments. Those skilled in the art may change the orientation of some or all of the components shown in the drawings according to actual needs without affecting the function of each component or of the system as a whole; technical solutions with such changed orientations still fall within the protection scope of the utility model.

A multi-core chip is a multiprocessing system embodied on a single large-scale integrated semiconductor chip. Typically, two or more chip cores are embodied on a multi-core chip and interconnected by a bus (which may also be formed on the same multi-core chip). Anywhere from two to many chip cores can be embodied on the same multi-core chip; the upper limit on the number of cores is set only by manufacturing capability and performance constraints. Multi-core chips have applications that perform specialized arithmetic and/or logical operations in multimedia and signal processing algorithms (such as video encoding and decoding, 2D/3D graphics, audio and speech processing, image processing, telephony, speech recognition and sound synthesis, and encryption).

Although only ASICs are mentioned in the background, the specific wiring implementation in the embodiments can also be applied to multi-core CPU, GPU, FPGA and similar chips. In the embodiments, the multiple cores may be identical cores or different cores.

Fig. 1 is a schematic diagram of the structure of a big data operation acceleration system with M ASIC chips according to the first embodiment. As shown in Fig. 1, the big data operation acceleration system includes M ASIC operation chips, where M is a positive integer greater than or equal to 2, for example 6, 10 or 12. Each operation chip includes multiple cores (core0, core1, core2, core3) and 4 data channels (lane0, lane1, lane2, lane3). Each data channel (lane) includes a transmission interface (tx) and a receiving interface (rx), and the cores and data channels (lanes) are in one-to-one correspondence. For example, core0 of operation chip 10 has data channel lane0, which has a transmission interface (lane0tx) and a receiving interface (lane0rx). The transmission interface (lane0tx) is used by core0 to send data or control instructions out of operation chip 10; the receiving interface (lane0rx) is used to deliver to core0 the data or control instructions sent from outside operation chip 10. The M operation chips are connected to one another through the transmission interfaces (tx) and receiving interfaces (rx) so that data or control instructions can be transmitted, and the M operation chips form a closed loop. A storage unit is provided in each operation chip, and all 4 cores of the operation chip are connected to it. The storage units of the M operation chips store data in a distributed manner: a core of an operation chip can obtain data from the storage unit of its own chip or from the storage units of other operation chips. Because the 4 cores of an operation chip are all connected to the storage unit, the storage unit also serves for data exchange among those 4 cores. Those skilled in the art will understand that 4 cores are chosen here only as an example; the number of cores may be N, where N is a positive integer greater than or equal to 4, for example 6, 10 or 12. In this embodiment the cores may be identical or different.

The transmission interface (lane tx) and receiving interface (lane rx) of each data channel (lane) are serdes interfaces, and the operation chips communicate with one another through serdes interfaces. Serdes is short for SERializer/DESerializer. It is a mainstream time-division multiplexing (TDM), point-to-point (P2P) serial communication technology: at the transmitting end, multiple low-speed parallel signals are converted into a high-speed serial signal, which passes through the transmission medium (optical cable or copper wire) and is converted back into low-speed parallel signals at the receiving end. This point-to-point serial communication technology makes full use of the channel capacity of the transmission medium, reduces the number of transmission channels and device pins required, and raises the signal transmission speed, which greatly reduces communication cost. Other communication interfaces, such as SSI or UART, may of course be used in place of the serdes interface. Data and control instructions are transmitted between chips through the serdes interface.
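The parallel-to-serial conversion described above can be illustrated with a small sketch. It is only a software model under stated assumptions: the function names `serialize` and `deserialize`, the MSB-first bit order and the bus width parameter are hypothetical and do not come from the utility model, and real serdes hardware also performs clock recovery and line coding (for example 8b/10b), which are not modelled here.

```python
def serialize(words, width):
    """Flatten parallel words of `width` bits into one serial bit stream.

    Bit order (most significant bit first) is an assumption for illustration.
    """
    bits = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError("word does not fit in the parallel bus width")
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits, width):
    """Rebuild the parallel words from the serial bit stream."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length is not a multiple of the bus width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words
```

A round trip such as `deserialize(serialize([3, 200, 17], 8), 8)` returns the original words, which is the property the transmitting and receiving ends rely on.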

Fig. 2 is a schematic structural diagram of a first embodiment of an operation chip with 4 cores. Those skilled in the art will understand that 4 cores are chosen here only as an example; the number of cores of the operation chip may be N, where N is a positive integer greater than or equal to 2, for example 6, 10 or 12. In this embodiment the cores of the operation chip may have the same function or different functions.

The 4-core operation chip (1) includes 4 cores (core0, core1, core2, core3), 4 data channels (lane0, lane1, lane2, lane3), at least one storage unit and a data exchange control unit; specifically, the data exchange control unit is a UART control unit. Each data channel (lane) includes one transmission interface (lane tx) and one receiving interface (lane rx).

Core0 of operation chip (1) is connected to the transmission interface (lane0tx) and receiving interface (lane0rx) of its data channel. The transmission interface (lane0tx) is used by core0 to send data or control instructions to the operation chip connected to operation chip (1); the receiving interface (lane0rx) is used to deliver to core0 the data or control instructions sent by the operation chip connected to operation chip (1). Similarly, core1 of operation chip (1) is connected to the transmission interface (lane1tx) and receiving interface (lane1rx) of its data channel; core2 is connected to the transmission interface (lane2tx) and receiving interface (lane2rx) of its data channel; and core3 is connected to the transmission interface (lane3tx) and receiving interface (lane3rx) of its data channel. The transmission interface (lane tx) and receiving interface (lane rx) of each data channel (lane) are serdes interfaces.

The data exchange control unit is connected by a bus to the storage unit and to the 4 cores (core0, core1, core2, core3); the bus is not marked in Fig. 2. The data exchange control unit can be implemented with any of several protocols, such as UART, SPI, PCIE, SERDES or USB. In this embodiment it is a UART (Universal Asynchronous Receiver/Transmitter) control unit. A UART is an asynchronous receiver/transmitter that converts the data to be transmitted between serial and parallel form, and it is usually integrated into the connections of various communication interfaces. UART is used here only as an example; other protocols can also be used. The UART control unit receives external data and, according to the address of the external data, sends it to a core (core0, core1, core2, core3) or to the storage unit. The UART control unit can also receive an external control instruction and send the control instruction to a core (core0, core1, core2, core3) or to the storage unit. It can further be used by the operation chip to send internal or external control instructions to other operation chips, to receive control instructions from other chips, and to feed operation results, intermediate data and the like back to the outside. Internal data or internal control instructions are data or control instructions generated by the chip itself; external data or external control instructions are data or control instructions generated outside the chip, for example those sent by an external host or an external network.
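The address-based dispatch performed by the control unit might be sketched as follows. The text does not specify an address map, so the layout used here (addresses 0 to 3 select a core, addresses from `STORAGE_BASE` upward select the storage unit) and the names `ControlUnit`, `receive_external` and `STORAGE_BASE` are purely hypothetical.

```python
from dataclasses import dataclass, field

NUM_CORES = 4          # core0 .. core3, as in the 4-core example
STORAGE_BASE = 0x1000  # hypothetical: first address that maps to the storage unit


@dataclass
class ControlUnit:
    """Toy model of the data exchange (UART) control unit."""
    core_inbox: list = field(
        default_factory=lambda: [[] for _ in range(NUM_CORES)])
    storage: dict = field(default_factory=dict)

    def receive_external(self, address, payload):
        """Route external data to a core or to the storage unit by address."""
        if 0 <= address < NUM_CORES:
            self.core_inbox[address].append(payload)
            return f"core{address}"
        if address >= STORAGE_BASE:
            self.storage[address - STORAGE_BASE] = payload
            return "storage"
        raise ValueError("address is not mapped in this sketch")
```

For example, `ControlUnit().receive_external(2, b"cmd")` delivers the payload to the inbox of core2, while an address of `0x1004` writes to offset 4 of the storage unit.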

The main function of the cores (core0, core1, core2, core3) is to execute external or internal control instructions, perform data calculation and control the storage of data. The cores of the operation chip are all connected to the storage unit and read data from or write data to the storage unit of the operation chip; data exchange among the cores of the operation chip is realized through the storage unit. The cores can also send control commands to the storage unit of the operation chip. According to instructions, a core can write data to or read data from the storage unit of another operation chip, or send control instructions to it, through the serdes interface. A core can likewise send data to, read data from, or send control instructions to the cores of other operation chips through the serdes interface.

Fig. 3 is a schematic structural diagram of a first embodiment of a data channel (lane). The data channel (lane) includes a receiving interface, a transmission interface, a receive address judging unit, a send address judging unit and multiple registers. One end of the receive address judging unit is connected to the receiving interface, and its other end is connected to the core through a register. One end of the send address judging unit is connected to the transmission interface (tx), and its other end is connected to the core through a register. The receive address judging unit and the send address judging unit are connected to each other through a register. The receiving interface receives a data frame or control instruction sent by the adjacent operation chip on the side to which it is connected and passes it to the receive address judging unit. The receive address judging unit sends the data frame or control instruction to the core and, at the same time, to the send address judging unit. The send address judging unit passes the data frame or control instruction to the transmission interface (tx), which sends it to the adjacent operation chip on the other side. When the core generates a data frame or control instruction, it sends it to the send address judging unit, which passes it to the transmission interface; the transmission interface sends it to the receiving interface of the adjacent operation chip. The registers temporarily hold data frames or control instructions.
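The frame paths through one lane can be modelled in a few lines. This is a behavioural sketch only: the class name `Lane`, its method names and the use of queues to stand in for the registers are assumptions made for illustration, and the address checks performed by the judging units are left out.

```python
from collections import deque


class Lane:
    """Toy model of one data channel: rx side, tx side and the registers between."""

    def __init__(self):
        self.to_core = deque()  # register between receive address judging unit and core
        self.to_tx = deque()    # register feeding the send address judging unit

    def on_receive(self, frame):
        """A frame arriving on rx goes to the core and, at the same time, toward tx."""
        self.to_core.append(frame)
        self.to_tx.append(frame)

    def core_send(self, frame):
        """A frame generated by the local core goes to the send side only."""
        self.to_tx.append(frame)

    def transmit(self):
        """Pop the next frame for the tx interface, or None if nothing is pending."""
        return self.to_tx.popleft() if self.to_tx else None
```

Calling `on_receive("f1")` followed by `core_send("f2")` leaves `"f1"` queued for the local core and both frames queued, in that order, for transmission to the neighbouring chip.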

Fig. 4 is a schematic structural diagram of a first embodiment of the storage unit. Each operation chip contains N cores that need to access data randomly and concurrently. If N reaches the order of 64 or more, the memory bandwidth required of the operation chip becomes very high, and even GDDR can hardly provide it. This embodiment of the utility model therefore provides high bandwidth by means of an SRAM array and large MUX routing. The system shown in Fig. 4 consists of two levels of storage control units, which relieves congestion in the implementation. The storage unit (40) includes 8 memories (410 ... 417), which are connected to a storage control unit (420); the storage control unit controls data reading or storing for the memories. Each memory (410 ... 417) includes at least two storage subunits and one storage control subunit. The storage control subunit is connected to the storage control unit through an interface and controls data reading or storing for the at least two storage subunits. The storage subunits are SRAM memories.
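The two-level control structure can be sketched as a two-step address decode: the storage control unit selects one of the 8 memories, and that memory's control subunit selects a storage subunit and an offset. The sizes (`SUBUNITS_PER_MEMORY`, `WORDS_PER_SUBUNIT`), the flat address layout and the class names are assumptions for illustration; the text only states that each memory has at least two SRAM subunits.

```python
NUM_MEMORIES = 8            # memories 410 ... 417
SUBUNITS_PER_MEMORY = 2     # assumption: the text says "at least two"
WORDS_PER_SUBUNIT = 1024    # assumption: size of one SRAM subunit


class Memory:
    """One memory: a storage control subunit in front of several SRAM subunits."""

    def __init__(self):
        self.subunits = [[0] * WORDS_PER_SUBUNIT
                         for _ in range(SUBUNITS_PER_MEMORY)]

    def _locate(self, local_addr):
        # Second-level decode: (subunit index, offset inside the subunit).
        return divmod(local_addr, WORDS_PER_SUBUNIT)

    def write(self, local_addr, value):
        sub, offset = self._locate(local_addr)
        self.subunits[sub][offset] = value

    def read(self, local_addr):
        sub, offset = self._locate(local_addr)
        return self.subunits[sub][offset]


class StorageControlUnit:
    """First-level controller that routes each access to one memory."""

    def __init__(self):
        self.memories = [Memory() for _ in range(NUM_MEMORIES)]

    def _route(self, addr):
        words_per_memory = SUBUNITS_PER_MEMORY * WORDS_PER_SUBUNIT
        if not 0 <= addr < NUM_MEMORIES * words_per_memory:
            raise IndexError("address outside the storage unit")
        # First-level decode: (memory index, address local to that memory).
        return divmod(addr, words_per_memory)

    def write(self, addr, value):
        mem, local_addr = self._route(addr)
        self.memories[mem].write(local_addr, value)

    def read(self, addr):
        mem, local_addr = self._route(addr)
        return self.memories[mem].read(local_addr)
```

Splitting the decode this way means accesses that fall in different memories never contend for the same control subunit, which is one plausible reading of how the two levels relieve congestion.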

Fig. 5 is a schematic structural diagram of a second embodiment of the storage unit. As shown in Fig. 5, multiple storage control units (420, 421, 422, 423) can be provided in the storage unit. Each core is connected to each of the storage control units (420, 421, 422, 423), and each storage control unit is connected to each memory (410 ... 417). The structure of the memories is the same as in Fig. 4 and is not described again here.

A core sends the data it generates to at least one storage control unit, which sends the data to a storage control subunit; the storage control subunit stores the data in a storage subunit. When a core of the operation chip receives a data acquisition command sent by another operation chip, it determines from the data address whether the data is stored in the storage unit of its own operation chip. If so, it sends a data read command to the at least one storage control unit, which forwards the command to the corresponding storage control subunit. The storage control subunit obtains the data from the storage subunit and returns it to the at least one storage control unit, which sends it to the core. The core sends the obtained data to the send address judging unit, which passes it to the transmission interface (tx); the transmission interface sends the data to the adjacent operation chip.

When the big data operation acceleration system is applied in the field of artificial intelligence, the UART control unit of an operation chip stores the image data or video data sent by the external host into the storage unit through a core. The operation chip generates the mathematical model of the neural network; alternatively, the external host can store the mathematical model into the storage unit through the UART control unit, from where each operation chip reads it. The first-layer mathematical model of the neural network runs on operation chip (1): its cores read data from the storage unit of their own operation chip and/or the storage units of other operation chips, perform the operation, and store the result through the serdes interface into at least one storage unit of the other operation chips, or into the storage unit of their own operation chip. Operation chip (1) then sends a control instruction to the next operation chip (2) through the UART control unit or the serdes interface to start its operation. The second-layer mathematical model of the neural network runs on the next operation chip (2), whose cores likewise read data from the storage unit of their own operation chip and/or the storage units of other operation chips, perform the operation, and store the result through the serdes interface into at least one storage unit of the other operation chips, or into the storage unit of their own operation chip. Each chip thus executes one layer of the neural network, obtaining its data through the serdes interface from the storage units of other operation chips or from its own storage unit, until the last layer of the neural network has produced the operation result. An operation chip then obtains the operation result from its local storage unit or from the storage units of other operation chips and feeds it back to the external host through the UART control unit.
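The layer-per-chip flow described above can be sketched as a simple pipeline. This is a sequential software model, not the hardware behaviour: the function name `run_pipeline`, the dictionary keys and the choice to write each result directly into the next chip's storage unit are assumptions, and each "layer" is just a Python callable standing in for one layer of the network.

```python
def run_pipeline(layers, input_data):
    """Run one layer per chip; layers[i] is the function executed by chip i."""
    if not layers:
        raise ValueError("at least one layer (chip) is required")
    # One storage unit per chip, modelled as a dictionary.
    storage = [dict() for _ in layers]
    # The host loads the input into the first chip through its control unit.
    storage[0]["input"] = input_data
    for chip_id, layer in enumerate(layers):
        # The chip reads its input from its own storage unit and computes.
        result = layer(storage[chip_id]["input"])
        storage[chip_id]["output"] = result
        next_chip = chip_id + 1
        if next_chip < len(layers):
            # Assumption: the result is written into the next chip's storage
            # unit (over serdes in the real system) before that chip starts.
            storage[next_chip]["input"] = result
    # The last chip's output is what would be fed back to the host.
    return storage[-1]["output"]
```

With three toy layers that double each element, add one to each element and sum the elements, the input `[1, 2, 3]` passes through `[2, 4, 6]` and `[3, 5, 7]` to the final result 15.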

When the big data operation acceleration system is applied in the field of cryptocurrency, the UART control unit of operation chip (1) stores the block information sent by the external host into at least one of the storage units of the operation chips. The external host may send a control instruction to each of the M operation chips through the UART control units of operation chips (1 ... M) to start data operation, whereupon the M operation chips begin operating. Alternatively, the external host may send the control instruction to the UART control unit (130) of operation chip (1), and operation chip (1) then sends control instructions to the other M-1 operation chips in turn, whereupon the M operation chips begin operating. As a further alternative, the external host may send the control instruction to the UART control unit of operation chip (1); the first operation chip (1) sends a control instruction to the second operation chip (2), the second operation chip (2) sends one to the third operation chip (3), the third operation chip (3) sends one to the fourth operation chip (4), and so on, until the M operation chips have begun operating. The M operation chips obtain data through the serdes interface from the storage units of other operation chips or from their own storage units and perform the proof-of-work operation simultaneously. Operation chip (1) obtains the operation result from the storage unit and feeds it back to the external host through the UART control unit.

Fig. 6 is a schematic diagram of a first embodiment of the data transmission process in the big data operation acceleration system. Each operation chip completes 1/n of the work; because of data dependence, once an operation chip has finished the data it is responsible for, it must transmit its result to all the other chips. In the example, operation chip n-1 is the source operation chip of a data frame, and the data is sent to operation chip 0 through lane1tx. In operation chip 0 the data frame is propagated along two paths: one path delivers it to the core of operation chip 0, and the other forwards it to the lane1tx channel of operation chip 0, so that the data frame is sent on to operation chip 1.

Source ID mechanism: each data frame carries the ID of its source operation chip. Whenever the data frame reaches a new operation chip, that operation chip checks the operation chip ID in the data frame. If the ID equals the ID of the next operation chip connected to this operation chip, the data frame is not forwarded again; the life cycle of the data frame ends here and it no longer occupies bandwidth. The check of the operation chip ID in the data frame can be performed in the core or in the receive address judging unit.
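The source ID rule can be checked with a short simulation of a frame travelling around the closed loop. The sketch assumes chips numbered 0 to n-1 with chip i forwarding to chip (i + 1) mod n; the function name `broadcast` is hypothetical.

```python
def broadcast(num_chips, source_id):
    """Return the chips that receive a frame sent by `source_id`, in order.

    Assumes a one-directional ring in which chip i forwards to chip (i + 1) % n.
    """
    if num_chips < 2:
        raise ValueError("a ring needs at least two chips")
    if not 0 <= source_id < num_chips:
        raise ValueError("unknown source chip")
    delivered = []
    current = (source_id + 1) % num_chips  # first chip to receive the frame
    while True:
        delivered.append(current)          # path 1: deliver to the local core
        next_chip = (current + 1) % num_chips
        if next_chip == source_id:
            break                          # source ID matches next chip: stop forwarding
        current = next_chip                # path 2: forward to the next chip
    return delivered
```

For a ring of 4 chips with chip 3 as the source, the frame is delivered to chips 0, 1 and 2 and is dropped at chip 2, so every other chip sees it exactly once and it never returns to its source.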

Fig. 7 is a signal flow diagram of the operation chip with 4 kernels according to the first embodiment. The UART control unit (130) obtains external data or control instructions from outside the chip and transfers them to the kernel core (110) connected to the UART control unit. According to the data address, the kernel core (110) either transfers the external data to the storage unit (120) of this chip for storage, or sends the data over the signal channel lane to the kernel core of the other chip that corresponds to the data address, and that kernel core stores the data in its local storage unit. According to the address of an external control instruction, the instruction is either executed by the kernel core of this operation chip or sent over the signal channel lane to the kernel core of the other chip corresponding to the control instruction address for execution. If the kernel core of this operation chip needs data, it can obtain the data from the local storage unit or from the storage unit of another operation chip. To obtain data from the storage unit of another operation chip, the kernel core (110) broadcasts a data acquisition control instruction to the connected operation chip through the serdes interface (150) to which it is connected. The connected operation chip splits the data acquisition control instruction into two paths: one is sent to its kernel core and the other is forwarded to the next chip. If the connected operation chip determines that the data is stored in its local storage unit, its kernel core reads the data from the storage unit and sends it through the serdes interface to the operation unit that issued the data acquisition control instruction. Control instructions between operation chips can, of course, also be sent through the UART control unit.

When the kernel core feeds an operation result or intermediate data back to the outside according to an external or internal control instruction, it obtains the operation result or intermediate data from the storage unit of this operation chip, or through the serdes interface from the storage units of other operation chips, and sends it to the outside through the UART control unit. "Outside" here may refer to an external host, an external network, an external platform or the like. The external host can initialize and configure the storage unit parameters through the UART control unit and apply unified addressing to the multiple storage units.
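A minimal sketch of the remote read described above, under stated assumptions: the chips form a one-way ring, unified addressing gives each storage unit a contiguous address range, and the request travels from chip to chip until one finds the address in its local storage unit. The class `Chip`, the function `remote_read` and the address layout are hypothetical and are not taken from the utility model.

```python
from typing import Dict, List, Optional


class Chip:
    """One operation chip with a local storage unit in a unified address space."""

    def __init__(self, chip_id: int, base: int, size: int) -> None:
        self.chip_id = chip_id
        self.base = base                    # first global address held locally (assumed layout)
        self.size = size                    # number of addresses held locally
        self.memory: Dict[int, int] = {}    # local offset -> stored value

    def owns(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

    def read_local(self, addr: int) -> int:
        # Unwritten locations read as 0 in this sketch.
        return self.memory.get(addr - self.base, 0)


def remote_read(ring: List[Chip], requester: int, addr: int) -> Optional[int]:
    """Read global address `addr` for the chip at ring position `requester`."""
    if ring[requester].owns(addr):
        return ring[requester].read_local(addr)
    # Forward the request around the ring until a chip holds the address.
    for step in range(1, len(ring)):
        chip = ring[(requester + step) % len(ring)]
        if chip.owns(addr):
            return chip.read_local(addr)
    return None   # no storage unit covers this address
```

The return path over the serdes interface is collapsed into the function's return value here; a fuller model would send a reply frame back to the requester.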

Of course, the kernel core performs calculations on the acquired data and stores the calculation results in the storage unit. A private storage region and a shared storage region are provided in each storage unit. The private storage region stores the temporary operation results of one operation chip, that is, intermediate results that this operation chip will continue to use and that other operation chips will not use. The shared storage region stores the operation results of the operation chip that are used by other operation chips or that need to be fed back to the outside.
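The split into private and shared regions can be sketched as follows. This is an illustration only: the text does not say how the regions are laid out or whether access is enforced, so the boundary offset, the access check and the names `StorageUnit`, `region_of`, `read` and `write` are all assumptions.

```python
from typing import Dict


class StorageUnit:
    """Storage unit with a private region followed by a shared region (assumed layout)."""

    def __init__(self, owner_id: int, private_size: int, shared_size: int) -> None:
        self.owner_id = owner_id
        self.private_size = private_size
        self.total_size = private_size + shared_size
        self.cells: Dict[int, int] = {}

    def region_of(self, offset: int) -> str:
        if not 0 <= offset < self.total_size:
            raise IndexError("offset outside the storage unit")
        return "private" if offset < self.private_size else "shared"

    def _check(self, chip_id: int, offset: int) -> None:
        # Assumed policy: only the owning chip may touch its private region.
        if self.region_of(offset) == "private" and chip_id != self.owner_id:
            raise PermissionError("private region belongs to another chip")

    def write(self, chip_id: int, offset: int, value: int) -> None:
        self._check(chip_id, offset)
        self.cells[offset] = value

    def read(self, chip_id: int, offset: int) -> int:
        self._check(chip_id, offset)
        return self.cells.get(offset, 0)
```

Intermediate results would be written at offsets below `private_size`, while results meant for other chips or for feedback to the outside would go to the shared region.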

In the embodiment of the utility model, multiple kernel cores are provided in the chip, each kernel core performs operation and storage control functions, and each kernel core is connected to at least one storage unit inside the chip. Each kernel can thus read both the storage unit connected to itself and the storage units connected to other kernels, so that each kernel has access to a large memory capacity. This reduces the number of times data is moved into or out of memory from external storage space and speeds up data processing. Because the multiple kernels can operate independently or cooperatively, data processing is accelerated further.

Fig. 8 is a schematic diagram of the data structure according to the utility model. The data referred to here includes command data, numerical data, character data and other kinds of data. The data format comprises a valid bit valid, a destination address dstid, a source address srcid and the data data. A kernel can determine from the valid bit whether the data packet is a command or a value; here it may be assumed that 0 represents a value and 1 represents a command. A kernel can determine the destination address, source address and data type from the data structure. For example, in Fig. 1, when kernel 50 sends a data read command to kernel 10, the valid bit is 1, the destination address is the address of kernel 10, the source address is the address of kernel 50, and the data field carries the read command together with the data type, data address or the like. When kernel 10 sends the data to kernel 50, the valid bit is 0, the destination address is the address of kernel 50, the source address is the address of kernel 10, and the data field carries the data that was read. In terms of instruction timing, the present embodiment uses a traditional six-stage pipeline: instruction fetch, decode, execute, memory access, alignment and write-back. In terms of instruction set architecture, a reduced instruction set architecture may be adopted. Following the general design method of reduced instruction set architectures, the instruction set of the utility model can be divided by function into register-register instructions, register-immediate instructions, jump instructions, memory access instructions, control instructions and inter-core communication instructions.
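The frame layout of Fig. 8 can be sketched in Python as shown below. The field order follows the text (valid, dstid, srcid, data), but the field widths are not given in the source; the one-byte valid flag, one-byte addresses and 32-bit data field are assumptions, and `LaneFrame` is a hypothetical name.

```python
import struct
from dataclasses import dataclass

# Assumed widths: 1-byte valid flag, 1-byte dstid, 1-byte srcid, 4-byte data.
_LAYOUT = struct.Struct(">BBBI")

VALUE, COMMAND = 0, 1   # meaning of the valid bit as assumed in the text


@dataclass(frozen=True)
class LaneFrame:
    valid: int   # 0 = numerical value, 1 = command
    dstid: int   # destination kernel address
    srcid: int   # source kernel address
    data: int    # command word, data address or payload value

    def pack(self) -> bytes:
        return _LAYOUT.pack(self.valid, self.dstid, self.srcid, self.data)

    @classmethod
    def unpack(cls, raw: bytes) -> "LaneFrame":
        return cls(*_LAYOUT.unpack(raw))

    def is_command(self) -> bool:
        return self.valid == COMMAND
```

For the example in the text, the read request from kernel 50 to kernel 10 would be `LaneFrame(COMMAND, 10, 50, address)` and the reply would be `LaneFrame(VALUE, 50, 10, value)`.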

Using the description provided herein, the embodiments can be implemented as a machine, process or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware or any combination thereof.

Any resulting program or programs (having computer-readable program code) may be embodied on one or more computer-usable media, such as resident storage devices, smart cards or other removable storage devices, or transmission devices, thereby making a computer program product and article of manufacture according to the embodiments. As such, the terms "article of manufacture" and "computer program product" as used herein are intended to cover a computer program that exists, permanently or temporarily, on any non-transitory computer-usable medium.

As noted above, memory/storage devices include, but are not limited to, magnetic disks, optical discs, removable storage devices (such as smart cards, subscriber identity modules (SIM) and wireless identity modules (WIM)), semiconductor memories (such as random access memory (RAM), read-only memory (ROM) and programmable read-only memory (PROM)) and the like. Transmission media include, but are not limited to, transmission via wireless communication networks, the Internet, intranets, telephone/modem-based network communication, hard-wired/cabled communication networks, satellite communication, and other fixed or mobile network systems/communication links.

Although specific example embodiments have been disclosed, those skilled in the art will appreciate that the specific example embodiments can be changed without departing from the spirit and scope of the utility model.

The utility model has been described above on the basis of embodiments with reference to the drawings, but the utility model is not limited to the above embodiments. Schemes obtained by appropriately combining or substituting parts of the embodiments and variations according to layout needs and the like are also included in the scope of the utility model. In addition, the combinations and processing order of the embodiments may be suitably rearranged on the basis of the knowledge of those skilled in the art, or modifications such as various design changes may be applied to the embodiments, and embodiments to which such modifications are applied may also be included in the scope of the utility model.

Although the utility model has described various concepts in detail, those skilled in the art will appreciate that various modifications and substitutions of those concepts can be realized within the spirit of the overall teaching disclosed by the utility model. Those skilled in the art can, using ordinary skill and without undue experimentation, implement the utility model set forth in the claims. It should be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the utility model, which is determined by the full scope of the appended claims and their equivalents.

Claims

1. A big data operation chip, characterized in that the operation chip comprises N kernel cores, N data channel lanes and at least one storage unit, where N is an integer greater than or equal to 4;

the data channel lane comprises a transmission interface (tx) and a receiving interface (rx); the kernel cores correspond one-to-one with the data channel lanes, and each kernel core sends and receives data through its data channel lane; the operation chip exchanges data with the outside of the chip through the transmission interface (tx) and the receiving interface (rx);

each of the N kernel cores is connected to the at least one storage unit.

2. The chip according to claim 1, characterized in that the transmission interface (tx) and the receiving interface (rx) of the operation chip are serdes interfaces.

3. The chip according to claim 1 or 2, characterized in that the data channel lane further comprises a receiving address judging unit and a sending address judging unit; one end of the receiving address judging unit is connected to the receiving interface (rx) and the other end is connected to the kernel core; one end of the sending address judging unit is connected to the transmission interface (tx) and the other end is connected to the kernel core; the receiving address judging unit and the sending address judging unit are connected to each other.

4. The chip according to claim 3, characterized in that the receiving interface (rx) receives a data frame and sends the data frame to the receiving address judging unit; the receiving address judging unit sends the data frame to the kernel core and at the same time sends the data frame to the sending address judging unit; the sending address judging unit receives the data frame and sends it to the transmission interface (tx), and the transmission interface (tx) sends the data frame out.

5. The chip according to claim 3, characterized in that the kernel core generates a data frame and sends the data frame to the sending address judging unit; the sending address judging unit sends the data frame to the transmission interface (tx), and the transmission interface sends the data frame out.

6. The chip according to claim 3, characterized in that the receiving address judging unit and the sending address judging unit are connected to each other through a first-in first-out (FIFO) memory.

7. The chip according to claim 1 or 2, characterized in that the storage unit comprises multiple memories, and the multiple memories are connected to at least one storage control unit; the at least one storage control unit is used to control data reading from or data storage in the multiple memories.

8. The chip according to claim 7, characterized in that the memory comprises at least two storage subunits and a storage control subunit; the storage control subunit is connected through an interface to each of the at least one storage control unit, and the storage control subunit is used to control data reading from or data storage in the at least two storage subunits.

9. The chip according to claim 8, characterized in that the storage subunits are SRAM memories.

10. The chip according to claim 1 or 2, characterized in that the operation chip is not connected to an external storage unit.

11. The chip according to claim 1 or 2, characterized in that the operation chip further comprises a UART control unit (130) connected to an external host and used to receive external data or control instructions.

12. The chip according to claim 6, characterized in that the N kernel cores are connected to each of the at least one storage control unit; data is read from and written to the multiple memories according to the operation instructions of the N kernel cores.

13. The chip according to claim 12, characterized in that the kernel core sends the generated data to the at least one storage control unit, the at least one storage control unit sends the data to the storage control subunit, and the storage control subunit stores the data in the storage subunits.

14. The chip according to claim 1 or 2, characterized in that the operation chip is used to perform one or more of encryption operations and convolution calculations.

15. The chip according to claim 11, characterized in that the at least one UART control unit (130) receives external instructions to initialize and configure the storage unit of the operation chip, and unified addressing is applied to the storage units of the operation chip.

16. The chip according to claim 11, characterized in that the operation chip can transmit calculation results to the outside through the at least one UART control unit (130).

17. The chip according to claim 1 or 2, characterized in that the kernel core is used for data calculation and data storage control.

18. A big data operation chip, the operation chip comprising N kernel cores and at least one built-in storage unit, where N is an integer greater than or equal to 4, characterized in that the storage unit comprises multiple memories and at least one storage control unit, and each memory comprises at least two storage subunits and a storage control subunit; each storage control subunit is connected through an interface to each of the at least one storage control unit; the storage control subunit is used to control data reading from or data storage in the at least two storage subunits; the at least one storage control unit is used to control data reading from or data storage in the multiple memories.

19. The chip according to claim 18, characterized in that each of the N kernel cores is connected to the at least one storage unit.

20. The chip according to claim 19, characterized in that the storage subunits are SRAM memories.

21. The chip according to claim 19, characterized in that the N kernel cores are connected to each of the at least one storage control unit; data is read from and written to the multiple memories according to the operation instructions of the N kernel cores.

22. The chip according to claim 19, characterized in that the kernel core sends the generated data to the at least one storage control unit, the at least one storage control unit sends the data to the storage control subunit, and the storage control subunit stores the data in the storage subunits.