WO2020087276A1 - Big data operation acceleration system and chip - Google Patents
Big data operation acceleration system and chip
- Publication number
- WO2020087276A1 (application PCT/CN2018/112688)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- core
- chip
- unit
- storage
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
- G06F15/17312—Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path problem congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17368—Indirect interconnection networks non hierarchical topologies
- G06F15/17375—One dimensional, e.g. linear array, ring
Definitions
- Embodiments of the present invention relate to the field of integrated circuits, and in particular, to a big data operation acceleration system and chip.
- ASIC Application Specific Integrated Circuits
- ASICs are characterized by being designed to meet the needs of specific users.
- Compared with general-purpose integrated circuits, ASICs have the advantages of smaller size, lower power consumption, improved reliability, improved performance, enhanced confidentiality, and lower cost.
- The embodiments of the present invention provide a big data operation acceleration system and a chip in which two or more ASIC operation chips are each connected to two or more storage units through a bus, and the operation chips exchange data through the storage units. This not only reduces the number of storage units but also reduces the connection lines between ASIC operation chips and simplifies the system structure. Each ASIC operation chip is connected to multiple storage units separately, so no conflicts arise when the bus mode is used, and there is no need to provide a cache for each ASIC operation chip.
- A big data operation acceleration system including two or more operation chips and two or more storage units, wherein:
- the arithmetic chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); the at least one first data interface (130) and the two or more second data interfaces (150, 151, 152, 153) are each connected to the routing unit, and the routing unit is connected to the at least two cores (110, 111, 112, 113);
- the storage unit (20) includes two or more memories, a routing unit (230), and two or more third data interfaces (250, 251, 252, 253); the two or more third data interfaces (250, 251, 252, 253) are each connected to the routing unit through a bus, and the routing unit is connected to the two or more memories;
- the second data interfaces (150, 151, 152, 153) of the arithmetic chip are connected to the third data interfaces (250, 251, 252, 253) of the storage unit through a bus.
- A big data operation acceleration system including two or more operation chips and two or more storage units, wherein:
- the arithmetic chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); each second data interface is connected to one core, the at least two cores are connected to the routing unit, and the at least one first data interface (130) is connected to one core (110);
- the storage unit (20) includes two or more memories, a routing unit (230), and two or more third data interfaces (250, 251, 252, 253); the two or more third data interfaces (250, 251, 252, 253) are each connected to the routing unit through a bus, and the routing unit is connected to the two or more memories;
- the second data interfaces (150, 151, 152, 153) of the arithmetic chip are connected to the third data interfaces (250, 251, 252, 253) of the storage unit through a bus.
- A big data operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); the at least one first data interface (130) and the two or more second data interfaces (150, 151, 152, 153) are each connected to the routing unit, and the routing unit is connected to the at least two cores (110, 111, 112, 113); the second data interface and the third data interface are serdes interfaces; the second data interfaces (150, 151, 152, 153) of the arithmetic chip are connected to the storage unit through a bus.
- A big data operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); each second data interface is connected to one core, the at least two cores are connected to the routing unit, and the at least one first data interface (130) is connected to one core (110); the second data interface and the third data interface are serdes interfaces; the second data interfaces (150, 151, 152, 153) are connected to the storage unit through a bus.
- By connecting multiple operation chips in the big data operation acceleration system to each storage unit, the embodiments of the present invention achieve the technical effect of reducing the number of storage units and reduce the connection cost between ASIC operation chips.
- The system structure is simplified. Each ASIC computing chip is connected to multiple storage units separately, so no conflicts arise when the bus mode is used, and there is no need to provide a cache for each ASIC computing chip.
- FIG. 1 illustrates a first embodiment of a schematic structural diagram of a big data operation acceleration system having 4 operation chips and 4 storage units;
- FIG. 2a illustrates a first embodiment of a schematic structural diagram of an arithmetic chip with 4 cores;
- FIG. 2b illustrates a schematic diagram of a signal flow of an arithmetic chip with 4 cores in the first embodiment;
- FIG. 3a illustrates a second embodiment of a schematic structural diagram of an arithmetic chip with 4 cores;
- FIG. 3b illustrates a schematic diagram of a signal flow of an arithmetic chip with 4 cores in the second embodiment;
- FIG. 4a illustrates a third embodiment of a schematic structural diagram of a storage unit corresponding to an arithmetic chip having 4 cores;
- FIG. 4b illustrates a schematic diagram of a signal flow of a storage unit corresponding to an arithmetic chip having 4 cores in the third embodiment;
- FIG. 5 illustrates a schematic diagram of a connection structure of a big data operation acceleration system with 4 operation chips and 4 storage units;
- FIG. 6 illustrates a schematic diagram of the data structure according to this embodiment.
- Multi-core chips are multi-processing systems embodied on a single large-scale integrated semiconductor chip.
- Two or more chip cores may be embodied on a multi-core chip, interconnected by a bus (which may also be formed on the same multi-core chip).
- Multi-core chips can be applied to the special arithmetic and/or logical operations performed in multimedia and signal processing algorithms (such as video encoding/decoding, 2D/3D graphics, audio and voice processing, image processing, telephony, voice recognition and voice synthesis, and encryption processing).
- Although only ASICs are mentioned in the background art, the specific wiring implementation in the embodiments can also be applied to CPUs, GPUs, FPGAs, and other devices built from multi-core chips.
- multiple cores may be the same core or different cores.
- the number of operation chips may be N, where N is a positive integer greater than or equal to 2, for example, 6, 10, 12, and so on.
- the number of storage units may be M, where M is a positive integer greater than or equal to 2, for example, 6, 9, 12, etc.
- N and M may be equal or unequal.
- a plurality of arithmetic chips may be the same arithmetic chip or different arithmetic chips.
- FIG. 1 is a first embodiment of a schematic structural diagram of a big data operation acceleration system having 4 operation chips and 4 storage units.
- The big data computing acceleration system includes 4 computing chips (10, 11, 12, 13) and 4 storage units (20, 21, 22, 23); each computing chip is connected to all storage units through a bus, and the arithmetic chips exchange data through the storage units. The arithmetic chips do not exchange data directly with one another; control instructions are sent between the arithmetic chips.
- Each storage unit is provided with a dedicated storage area and a shared storage area. The dedicated storage area stores the temporary calculation results of one arithmetic chip, that is, intermediate calculation results that this arithmetic chip continues to use and that other arithmetic chips will not use. The shared storage area stores the data operation results of the arithmetic chips.
- These data operation results are used by other arithmetic chips or need to be fed back to the outside.
- Alternatively, the storage unit may be left undivided.
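The split between dedicated and shared areas can be sketched as a small data structure. This is a toy model under stated assumptions, not the patent's implementation: the class name `StorageUnit`, its methods, and the use of dictionaries keyed by chip number are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    """Toy model of one storage unit (names and layout are assumptions)."""
    # chip_id -> {key: value}; temporary results private to one chip
    dedicated: dict = field(default_factory=dict)
    # key -> value; results that other chips or the outside may read
    shared: dict = field(default_factory=dict)

    def write_dedicated(self, chip_id, key, value):
        # Intermediate result that only the writing chip will use again.
        self.dedicated.setdefault(chip_id, {})[key] = value

    def read_dedicated(self, chip_id, key):
        # A chip only looks inside its own dedicated area.
        return self.dedicated[chip_id][key]

    def write_shared(self, key, value):
        # Operation result made available to other chips.
        self.shared[key] = value

    def read_shared(self, key):
        return self.shared[key]


unit = StorageUnit()
unit.write_dedicated(10, "partial_sum", 7)      # chip 10 keeps a temporary value
unit.write_shared("layer1_output", [1, 2, 3])   # chip 10 publishes a result
```

Any chip can then call `unit.read_shared("layer1_output")`, while `partial_sum` stays visible only through chip 10's dedicated area.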
- the storage unit here may be a high-speed external memory such as DDR, SDDR, DDR2, DDR3, DDR4, GDDR5, GDDR6, HMC, HBM, etc.
- The storage unit is preferably DDR-series memory. DDR (Double Data Rate) memory is double data rate synchronous dynamic random access memory.
- DDR uses a synchronization circuit so that the main steps of address specification and of data transfer and output are executed independently while remaining fully synchronized with the CPU;
- DDR uses DLL (delay-locked loop) technology to provide a data filter signal. When the data is valid, the memory controller can use this data filter signal to locate the data precisely, output it once every 16 outputs, and resynchronize data from different memory modules.
- the frequency of DDR memory can be expressed in two ways: operating frequency and equivalent frequency.
- The operating frequency is the actual operating frequency of the memory particles. Because DDR memory can transmit data on both the rising and falling edges of the clock pulse, the equivalent frequency of data transmission is twice the operating frequency.
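The relationship between operating and equivalent frequency can be checked with a short calculation. The function names and the 64-bit bus width below are assumptions chosen for illustration; the source only states the factor of two.

```python
def equivalent_frequency_mhz(operating_frequency_mhz: float) -> float:
    # DDR transfers data on both clock edges: two transfers per clock cycle.
    return 2 * operating_frequency_mhz


def peak_bandwidth_mb_per_s(operating_frequency_mhz: float,
                            bus_width_bits: int = 64) -> float:
    # Millions of transfers per second times bytes moved per transfer.
    # The 64-bit bus width is an assumed, typical module width.
    return equivalent_frequency_mhz(operating_frequency_mhz) * bus_width_bits / 8


print(equivalent_frequency_mhz(200))   # 400
print(peak_bandwidth_mb_per_s(200))    # 3200.0
```

With a 200 MHz operating frequency, the equivalent frequency is 400 MHz and, under the assumed 64-bit bus, the peak transfer rate is 3200 MB/s.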
- DDR2 (Double Data Rate 2) memory is a new-generation memory technology standard developed by JEDEC (the Joint Electron Device Engineering Council). DDR2 memory can read and write data at 4 times the speed of the external bus per clock and can run at 4 times the speed of the internal control bus.
- DDR3, DDR4, GDDR5, GDDR6, HMC, HBM memory are all existing technologies, and will not be described in detail here.
- In this embodiment, the ASIC operation chips are each connected to the 4 storage units through a bus, and the operation chips exchange data through the storage units. This not only reduces the number of storage units but also reduces the connection lines between ASIC operation chips.
- The system structure is simplified. Each ASIC computing chip is connected to multiple storage units separately, so no conflicts arise when the bus mode is used, and there is no need to provide a cache for each ASIC computing chip.
- FIG. 2a illustrates a first embodiment of a schematic structural diagram of an arithmetic chip with 4 cores.
- The number of cores of the arithmetic chip may be Q, where Q is a positive integer greater than or equal to 2, for example, 6, 10, 12, and so on.
- the core of the arithmetic chip may be a core with the same function or a core with different functions.
- The 4-core operation chip (10) includes 4 cores (110, 111, 112, 113), a routing unit (120), a data exchange control unit (130), and 4 serdes interfaces (150, 151, 152, 153).
- The data exchange control unit and the four serdes interfaces are each connected to the routing unit through the bus, and the routing unit is connected to each core.
- the data exchange control unit can be implemented using multiple protocols, such as UART, SPI, PCIE, SERDES, USB, etc.
- In this embodiment, the data exchange control unit is a UART (Universal Asynchronous Receiver/Transmitter) control unit (130). A UART is an asynchronous transceiver.
- A UART converts the data to be transmitted between serial and parallel form.
- A UART is usually integrated into the connections of various communication interfaces. The UART protocol is used here only as an example; other protocols can also be used.
- the UART control unit (130) can receive external data or control commands, send control commands to other chips, receive control commands from other chips, and feed back calculation results or intermediate data to the outside.
- Serdes is short for SERializer/DESerializer. It is a mainstream time-division multiplexing (TDM), point-to-point (P2P) serial communication technology: multiple low-speed parallel signals are converted into a high-speed serial signal at the transmitting end, sent over the transmission medium (optical cable or copper wire), and converted back into low-speed parallel signals at the receiving end.
- Other communication interfaces, for example SSI or UART, can also be used instead of the serdes interface. Data and control commands are transmitted between the chip and the storage unit through the serdes interface and the transmission line.
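The parallel-to-serial conversion performed by a serdes link can be illustrated with a minimal bit-level sketch. The function names, the 8-bit word width, and the most-significant-bit-first ordering are assumptions; a real serdes also handles line coding and clock recovery, which are omitted here.

```python
def serialize(words, width=8):
    """Flatten parallel words into a single serial bit stream (MSB first)."""
    bits = []
    for word in words:
        for i in reversed(range(width)):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits, width=8):
    """Rebuild parallel words from the serial bit stream."""
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


data = [0xA5, 0x3C, 0xFF]
stream = serialize(data)          # what travels over the single serial lane
restored = deserialize(stream)    # what the receiving end hands back in parallel
```

The round trip returns the original words, which is the property the transmitting and receiving ends of the link rely on.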
- The main functions of the core are to execute external or internal control instructions, perform data calculation, and control data storage.
- The routing unit sends data or control instructions to the cores (110, 111, 112, 113) and accepts data or control instructions sent by the cores (110, 111, 112, 113), thereby implementing communication between the cores.
- The routing unit and the UART control unit (130) accept external control instructions and send them to each core (110, 111, 112, 113); the UART control unit (130) accepts external data and sends it to a core (110, 111, 112, 113) or a storage unit according to the external data address.
- Internal data or internal control commands are data or control commands generated by the chip itself; external data or external control commands are those generated outside the chip, such as data or control instructions sent by an external host or an external network.
- FIG. 2b illustrates a schematic diagram of a signal flow of an arithmetic chip with four cores in the first embodiment.
- The UART interface (130) obtains data or control instructions from outside the chip. According to the address of the data or control instruction, the routing unit (120) sends it either to a core or, through a serdes interface, to the storage unit connected to that serdes interface. If the destination address of an external control instruction points to another chip, the routing unit sends the control instruction to the UART control unit (130), which sends it to the other chip.
- The UART interface (130) sends the operation result to the outside according to an external or internal control instruction.
- The operation result can be obtained from a core of the operation chip or, through a serdes interface, from the storage unit connected to that serdes interface.
- the external mentioned here may refer to an external host, an external network, an external platform, or the like.
- the external host can initialize and configure the storage unit parameters through the UART control unit, and uniformly address multiple storage particles.
- A core can send the routing unit a control instruction to read or write data.
- The control instruction carries the data address, and the routing unit reads data from or writes data to the storage unit through the serdes interface according to that address.
- A core may also send data or control instructions to other cores through the routing unit according to the address, and obtain data or control instructions from other cores through the routing unit.
- The core performs calculations on the acquired data and stores the calculation result in the storage unit.
- Each storage unit is provided with a dedicated storage area and a shared storage area. The dedicated storage area stores the temporary calculation results of one arithmetic chip, that is, intermediate calculation results that this arithmetic chip continues to use and that other arithmetic chips will not use. The shared storage area stores the data operation results of the arithmetic chips, which are used by other arithmetic chips or need to be fed back to the outside. If a control command generated by a core is used to control the operation of other chips, the routing unit sends the control command to the UART control unit (130), and the UART control unit (130) sends it to the other chips. If a control command generated by a core is used to control a storage unit, the routing unit sends the control command to the storage unit through the serdes interface.
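The routing decisions of the first embodiment can be summarized in a small sketch. The `Message` type, the `"core"`/`"storage"`/`"chip"` destination tags, and the pairing of storage units 20 to 23 with serdes interfaces 150 to 153 are hypothetical; the source only says that the routing unit forwards according to the address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    dest_kind: str    # "core", "storage", or "chip" (hypothetical tags)
    dest_id: int
    payload: object


class RoutingUnit:
    """Sketch of routing unit (120) in the first embodiment."""

    def __init__(self, chip_id, core_ids, serdes_by_storage):
        self.chip_id = chip_id
        self.core_ids = set(core_ids)
        self.serdes_by_storage = dict(serdes_by_storage)

    def route(self, msg):
        """Return (port_kind, port_id) that the message is forwarded to."""
        if msg.dest_kind == "core" and msg.dest_id in self.core_ids:
            return ("core", msg.dest_id)             # core on this chip
        if msg.dest_kind == "storage":
            # Storage units are reached through the matching serdes interface.
            return ("serdes", self.serdes_by_storage[msg.dest_id])
        if msg.dest_kind == "chip" and msg.dest_id != self.chip_id:
            return ("uart", 130)                     # other chips via the UART unit
        raise ValueError(f"cannot route {msg!r}")


router = RoutingUnit(
    chip_id=10,
    core_ids=[110, 111, 112, 113],
    serdes_by_storage={20: 150, 21: 151, 22: 152, 23: 153},  # assumed pairing
)
```

For example, `router.route(Message("storage", 22, b"data"))` selects serdes interface 152, while a control instruction addressed to chip 11 is handed to the UART control unit.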
- FIG. 3a illustrates a second embodiment of a schematic structural diagram of an arithmetic chip with 4 cores.
- The 4-core operation chip includes 4 cores (110, 111, 112, 113), a routing unit (120), a UART control unit (130), and 4 serdes interfaces (150, 151, 152, 153). Each serdes interface is connected to one core, the 4 cores are connected to the routing unit, and the UART control unit (130) is connected to the core (110).
- FIG. 3b illustrates a schematic signal flow diagram of an arithmetic chip with 4 cores in the second embodiment.
- the UART control unit (130) is used to acquire external data or control instructions of the chip, and transmit the external data or control instructions to the core (110) connected to the UART control unit.
- The core (110) transmits the external data or control instructions to the routing unit (120), and the routing unit sends them to the core (111, 112, 113) corresponding to their address. If the destination address of the data or control instruction is a core of this arithmetic chip, the routing unit sends the data or control instruction to that core (110, 111, 112, 113).
- The core (111, 112, 113) then sends the data or control instructions to the corresponding storage unit through its serdes interface (151, 152, 153).
- The core (110) can also send data or control commands directly to the corresponding storage unit through the serdes interface (150) connected to it.
- The routing unit stores the serdes interface corresponding to every storage unit address. If the destination of the data or control command is another arithmetic chip, the data is sent by a core (111, 112, 113) to the corresponding storage unit through its serdes interface (151, 152, 153), and the control command is sent to the other arithmetic chip through the UART control unit.
- When a core feeds back an operation result or intermediate data to the outside according to an external or internal control instruction, the core obtains the operation result or intermediate data from the storage unit through its serdes interface and sends it to the routing unit; the routing unit sends it to the core (110) connected to the UART control unit, and it is finally sent to the outside through the UART control unit. If the operation result or intermediate data is obtained through the serdes interface of the core connected to the UART control unit, it is sent to the outside directly through the UART control unit.
- the external mentioned here may refer to an external host, an external network, an external platform, or the like. The external host can initialize and configure the storage unit parameters through the UART control unit, and address multiple storage units uniformly.
- A core can send control instructions to the routing unit.
- The routing unit sends the control instructions to other cores, other chips, or storage units according to the address of the control instructions. After receiving the control instructions, the other cores, other chips, or storage units perform the corresponding operations.
- When a core sends control commands or data to other cores, they are forwarded directly through the routing unit.
- When a core sends control commands to other chips, they are sent via the UART control unit.
- When a core sends control commands to a storage unit, the routing unit looks up the serdes interface corresponding to the address and sends the control command to the core attached to that serdes interface; that core then passes the command to its serdes interface, and the serdes interface sends the control command to the storage unit.
- When a core sends data to a storage unit, the routing unit likewise looks up the serdes interface corresponding to the address and sends the control instruction to the core attached to that serdes interface; that core passes the data to its serdes interface, and the serdes interface sends the data to the storage unit. Other chips then acquire the data through the storage unit.
- When a core obtains data from a storage unit, the control instruction carries the data address. The routing unit looks up the serdes interface corresponding to that address and sends the control instruction to the core attached to that serdes interface; that core passes it to the serdes interface, which sends a read control instruction carrying the destination address and the source address to the storage unit. After the serdes interface obtains the data from the storage unit, it sends the data to its core. The core sends a data packet including the source address and the destination address to the routing unit, and the routing unit sends the data packet to the corresponding core according to the destination address.
- When a core finds that the destination address is its own address, it takes the data for processing. A core can also send data or commands to other cores through the routing unit and obtain data or commands from other cores through the routing unit. The core performs calculations on the acquired data and stores the calculation result in the storage unit.
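The read path of the second embodiment, where each serdes interface hangs off one core, can be traced with a short sketch. The two lookup tables and the function `read_path` are hypothetical; in particular, the pairing of storage units with serdes interfaces and of serdes interfaces with cores is assumed from the reference numerals, and the shortcut for a core reading through its own serdes interface is an assumption.

```python
# Assumed pairings, following the order of the reference numerals.
SERDES_BY_STORAGE = {20: 150, 21: 151, 22: 152, 23: 153}
CORE_BY_SERDES = {150: 110, 151: 111, 152: 112, 153: 113}


def read_path(requesting_core, storage_unit):
    """Return (request_hops, response_hops) for a read in the second embodiment."""
    serdes = SERDES_BY_STORAGE[storage_unit]    # lookup done by the routing unit
    owner = CORE_BY_SERDES[serdes]              # core attached to that serdes
    request = [f"core {requesting_core}"]
    if owner != requesting_core:
        # The request must cross the routing unit to reach the owning core.
        request += ["routing unit 120", f"core {owner}"]
    request += [f"serdes {serdes}", f"storage unit {storage_unit}"]
    # The data packet returns along the same hops in reverse order.
    return request, list(reversed(request))


request, response = read_path(111, 22)
```

Here core 111 reads from storage unit 22, so the request passes through the routing unit to core 112 and out over serdes interface 152, and the data packet comes back the same way.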
- Each storage unit is provided with a dedicated storage area and a shared storage area. The dedicated storage area stores the temporary calculation results of one arithmetic chip, that is, intermediate calculation results that this arithmetic chip continues to use and that other arithmetic chips do not use. The shared storage area stores the data operation results of the arithmetic chips, which are used by other arithmetic chips or need to be fed back to the outside.
- FIG. 4a illustrates a third embodiment of a schematic structural diagram of a storage unit corresponding to an arithmetic chip having 4 cores.
- the storage unit (20) includes C memories.
- C is a positive integer greater than or equal to 2, for example, 6, 10, 12, etc.
- Each memory (240, 241, 242, 243) includes a storage controller (220, 221, 222, 223) and storage particles (210, 211, 212, 213); the storage controller writes data to or reads data from the storage particles according to instructions, and the storage particles store the data.
- the storage unit (20) further includes a routing unit (230) and four serdes interfaces (250, 251, 252, 253). The four serdes interfaces are connected to the routing unit through the bus, and the routing unit is connected to each memory.
- FIG. 4b illustrates a schematic diagram of a signal flow of a storage unit corresponding to an arithmetic chip with 4 cores in the third embodiment.
- the storage unit (20) accepts the control instruction through the serdes interface (250, 251, 252, 253) and sends the control instruction to the routing unit (230).
- The routing unit sends the control instruction to the corresponding memory (240, 241, 242, 243) according to the address in the control instruction, and the storage controller (220, 221, 222, 223) performs the related operation according to the control instruction. Examples include addressing multiple storage particles uniformly according to the initial memory configuration parameters, resetting the storage particles according to a reset instruction, and executing write or read instructions.
- When the serdes interfaces (250, 251, 252, 253) accept a data acquisition instruction sent by an arithmetic chip, the instruction carries the address of the data to be acquired. The routing unit sends the data acquisition instruction to the memory according to that address, the storage controller obtains the data from the storage particles, and the data is sent through the serdes interface, according to the source address, to the computing chip that needs it.
- When the serdes interface receives a write data command and data sent by an arithmetic chip, the command carries the address of the data to be written. The routing unit sends the write data command and data to the memory according to that address, and the storage controller writes the data to the storage particles according to the write data instruction.
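The read and write handling inside a storage unit can be sketched as follows. This is an illustration under assumptions: the class `StorageUnitModel`, the fixed number of words per memory, and the rule that consecutive address ranges map to consecutive memories are all hypothetical ways of modeling the uniform addressing the text mentions.

```python
class StorageUnitModel:
    """Sketch of routing unit (230) plus C memories with uniform addressing."""

    def __init__(self, num_memories=4, words_per_memory=1024):
        self.words_per_memory = words_per_memory
        self.memories = [[0] * words_per_memory for _ in range(num_memories)]

    def _locate(self, address):
        # Uniform addressing (assumed scheme): address -> (memory index, offset).
        index, offset = divmod(address, self.words_per_memory)
        if not 0 <= index < len(self.memories):
            raise IndexError(f"address {address} is outside the storage unit")
        return index, offset

    def write(self, address, value):
        # The routing unit picks the memory; its controller writes the word.
        index, offset = self._locate(address)
        self.memories[index][offset] = value

    def read(self, address, source):
        # The source address says which requester the data goes back to.
        index, offset = self._locate(address)
        return source, self.memories[index][offset]


su = StorageUnitModel()
su.write(1030, 99)                 # lands in the second memory at offset 6
reply = su.read(1030, source=10)   # data is returned to requester 10
```

The returned tuple pairs the requester with the data, mirroring how the storage unit uses the source address to send the data back to the chip that asked for it.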
- Each storage unit is provided with a dedicated storage area and a shared storage area. The dedicated storage area stores the temporary calculation results of one arithmetic chip, that is, intermediate calculation results that this arithmetic chip continues to use and that other arithmetic chips do not use. The shared storage area stores the data operation results of the arithmetic chips, which are used by other arithmetic chips or need to be fed back to the outside.
- FIG. 5 illustrates a schematic diagram of a connection structure of a big data operation acceleration system with 4 operation chips and 4 storage units.
- The system has 4 arithmetic chips (10, 11, 12, 13) and 4 storage units (20, 21, 22, 23).
- the structure of the arithmetic chip may be the chip structure disclosed in the first embodiment and the second embodiment.
- The arithmetic chip may also have an equivalent modified chip structure made by those skilled in the art on the basis of the first and second embodiments; such a chip structure is also within the scope of protection of this embodiment.
- the structure of the storage unit may be the structure of the storage unit disclosed in the third embodiment.
- The storage unit may also have an equivalently improved storage unit structure made by those skilled in the art on the basis of the third embodiment; such a structure is also within the scope of protection of this embodiment.
- the UART control unit (130) of the operation chip (10) is connected to an external host, and the UART control unit (130) of each chip (10, 11, 12, 13) is connected through a bus.
- Each serdes interface (150, 151, 152, 153) of a chip (10, 11, 12, 13) is connected to a serdes interface (250, 251, 252, 253) of one storage unit (20, 21, 22, 23), so that each operation chip is connected to all storage units through a bus.
- The operation chips exchange data through the storage units, and data is not exchanged directly between the operation chips.
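The full chip-to-storage wiring of FIG. 5 can be enumerated with a few lines of code. The function `build_links` and the specific pairing rule (interface j of chip i connects to interface i of storage unit j) are assumptions; the source states only that every chip connects to every storage unit through its serdes interfaces.

```python
def build_links(chips, units, chip_if_base=150, unit_if_base=250):
    """List (chip, chip_interface, unit, unit_interface) for every connection."""
    links = []
    for i, chip in enumerate(chips):
        for j, unit in enumerate(units):
            # Assumed rule: chip i uses its j-th serdes interface to reach
            # unit j, which receives it on its i-th serdes interface.
            links.append((chip, chip_if_base + j, unit, unit_if_base + i))
    return links


links = build_links([10, 11, 12, 13], [20, 21, 22, 23])
print(len(links))  # 16
```

With 4 chips and 4 storage units this gives 16 point-to-point links, and no serdes interface on either side is used twice.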
- the internal and external signal flows of the arithmetic chip and the storage unit have been described in detail in the first, second, and third embodiments, and will not be described again here.
- the system is applied in the field of artificial intelligence.
- The UART control unit (130) of the arithmetic chip (10) stores the picture data or video data sent by the external host in the storage units (20, 21, 22, 23) through the serdes interfaces (150, 151, 152, 153).
- The arithmetic chips (10, 11, 12, 13) generate a mathematical model of the neural network; the model can also be stored in the storage units (20, 21, 22, 23) by the external host through the serdes interfaces (150, 151, 152, 153) and read by each arithmetic chip (10, 11, 12, 13).
- The first layer of the neural network's mathematical model runs on the arithmetic chip (10): the arithmetic chip (10) reads data from the storage units (20, 21, 22, 23) through the serdes interfaces, performs the operation, and stores the operation result through the serdes interfaces in at least one of the storage units (20, 21, 22, 23).
- The arithmetic chip (10) sends a control instruction to the arithmetic chip (11) through the UART control unit (130), starting the arithmetic chip (11) to perform its operation.
- The second layer of the neural network's mathematical model runs on the arithmetic chip (11): the arithmetic chip (11) reads data from the storage units (20, 21, 22, 23) through the serdes interfaces, performs the operation, and stores the operation result through the serdes interfaces in at least one of the storage units (20, 21, 22, 23). Each chip executes one layer of the neural network, obtaining data from the storage units (20, 21, 22, 23) through the serdes interfaces for its operation, until the final layer of the neural network produces the operation result.
- The arithmetic chip (10) obtains the operation result from the storage units (20, 21, 22, 23) through the serdes interface, and feeds it back to the external host through the UART control unit (130).
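
The layer-per-chip flow described above can be sketched as a small simulation. This is only an illustrative model, not the patented implementation: the `Storage` dictionary stands in for the storage units (20, 21, 22, 23), the function `run_layered_pipeline` and the toy layers `scale` and `relu` are hypothetical names, and the UART start signal between chips is modelled simply by the loop advancing to the next layer.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the shared storage units (20, 21, 22, 23).
Storage = Dict[str, List[float]]
Layer = Callable[[List[float]], List[float]]


def run_layered_pipeline(storage: Storage, layers: List[Layer],
                         input_key: str = "input") -> List[float]:
    """Run one layer per chip; chips exchange data only through `storage`."""
    key = input_key
    for chip_index, layer in enumerate(layers):
        data = storage[key]             # read over the (modelled) serdes interface
        result = layer(data)            # this chip's layer computation
        key = f"layer{chip_index}_out"
        storage[key] = result           # write the result back to shared storage
        # In the described system the chip would now start the next chip
        # over UART; here the loop simply moves on to the next layer.
    return storage[key]                 # the first chip reads back the final result


def scale(xs: List[float]) -> List[float]:
    return [2.0 * x for x in xs]


def relu(xs: List[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


storage: Storage = {"input": [-1.0, 0.5, 2.0]}
final = run_layered_pipeline(storage, [scale, relu])
```

Because every intermediate result is written to `storage`, no layer function ever receives data directly from another layer, which mirrors the statement that data is not exchanged directly between chips.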
- The system is applied in the field of encrypted digital currency. The UART control unit (130) of the arithmetic chip (10) stores the block information sent by the external host in at least one of the storage units (20, 21, 22, 23).
- The external host sends control instructions to the four arithmetic chips (10, 11, 12, 13) through the UART control unit (130) of each arithmetic chip (10, 11, 12, 13), and the four arithmetic chips (10, 11, 12, 13) start operation.
- The external host can also send a control instruction to the UART control unit (130) of one arithmetic chip (10) to start data calculation; the arithmetic chip (10) then sends control instructions to the other three arithmetic chips (11, 12, 13) in sequence, and the four arithmetic chips (10, 11, 12, 13) start operation.
- The external host may also send a control instruction to the UART control unit (130) of the first arithmetic chip (10) to start data calculation; the first arithmetic chip (10) sends a control instruction to the second arithmetic chip (11), the second arithmetic chip (11) sends a control instruction to the third arithmetic chip (12), and the third arithmetic chip (12) sends a control instruction to the fourth arithmetic chip (13), so that the four arithmetic chips (10, 11, 12, 13) start operation.
- The four arithmetic chips (10, 11, 12, 13) read the block information data from the storage units through the serdes interface and simultaneously perform the proof-of-work calculation. The arithmetic chip (10) obtains the operation result from the storage units (20, 21, 22, 23) and feeds it back to the external host through the UART control unit (130).
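
As a rough illustration of several chips working on the same block information at once, the sketch below splits a nonce search across four simulated chips. The text does not specify the hash function or how work is divided, so the double SHA-256 hash, the interleaved nonce assignment, and the names `pow_hash` and `search` are all assumptions made for this example.

```python
import hashlib
from typing import Optional, Tuple


def pow_hash(block: bytes, nonce: int) -> int:
    """Double SHA-256 of block + nonce, as an integer (assumed hash scheme)."""
    payload = block + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return int.from_bytes(digest, "big")


def search(block: bytes, target: int, num_chips: int = 4,
           max_rounds: int = 1_000_000) -> Optional[Tuple[int, int]]:
    """Simulate `num_chips` chips testing interleaved nonces.

    Chip c tests nonces c, c + num_chips, c + 2 * num_chips, ...
    Returns (chip, nonce) for the first hash below `target`, or None.
    """
    for round_index in range(max_rounds):
        for chip in range(num_chips):
            nonce = round_index * num_chips + chip
            if pow_hash(block, nonce) < target:
                return chip, nonce
    return None


block_info = b"example block information"   # placeholder for data read from storage
target = 1 << (256 - 12)                     # easy target: 12 leading zero bits
found = search(block_info, target)
```

On real hardware the four searches would run concurrently; the nested loop here only emulates that by giving each simulated chip one attempt per round.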
- The number of arithmetic chips may equal the number of storage units; in that case, each storage unit and each arithmetic chip has as many second data interfaces as there are storage units.
- The number of arithmetic chips and the number of storage units may also be unequal. In that case, the number of second data interfaces of each storage unit is the number of arithmetic chips, and the number of second data interfaces of each arithmetic chip is the number of storage units. For example, with four arithmetic chips and five storage units, five second data interfaces are provided on each arithmetic chip and four second data interfaces are provided on each storage unit.
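
The interface-count rule above follows from every arithmetic chip being connected to every storage unit. A minimal sketch, in which the function name `interface_counts` and the dictionary keys are illustrative choices rather than terms from the text:

```python
from typing import Dict


def interface_counts(num_chips: int, num_storage_units: int) -> Dict[str, int]:
    """Second data interfaces needed when every chip links to every storage unit."""
    if num_chips < 1 or num_storage_units < 1:
        raise ValueError("need at least one chip and one storage unit")
    return {
        "per_chip": num_storage_units,          # one interface per storage unit
        "per_storage_unit": num_chips,          # one interface per chip
        "links": num_chips * num_storage_units  # total chip-to-storage connections
    }


example = interface_counts(num_chips=4, num_storage_units=5)
```

For the four-chip, five-storage-unit example in the text this gives five interfaces per chip and four per storage unit.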
- The bus may use a centralized arbitration bus structure or a ring topology bus structure.
- Bus technology is common in the field, so it is not described in detail here.
- The data mentioned here includes various kinds of data, such as command data, numeric data, and character data.
- The data format specifically includes a valid bit (valid), a destination address (dst id), a source address (src id), and the data itself (data).
- The core can determine from the valid bit whether a data packet is a command or a value. Here, it can be assumed that 0 represents a value and 1 represents a command.
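
The packet layout described above (valid bit, dst id, src id, data) can be sketched as a packed word. The text gives no field widths or bit ordering, so the 8-bit addresses, 32-bit data field, and the ordering with the valid bit in the most significant position are assumptions, as are the names `Packet`, `pack`, and `unpack`.

```python
from dataclasses import dataclass

# Assumed field widths; the source does not specify them.
DST_BITS, SRC_BITS, DATA_BITS = 8, 8, 32


def _mask(bits: int) -> int:
    return (1 << bits) - 1


@dataclass(frozen=True)
class Packet:
    valid: int    # 0 = value, 1 = command (the convention assumed in the text)
    dst_id: int   # destination address
    src_id: int   # source address
    data: int     # payload

    def pack(self) -> int:
        """Pack as [valid | dst_id | src_id | data], most significant first."""
        word = self.valid & 1
        word = (word << DST_BITS) | (self.dst_id & _mask(DST_BITS))
        word = (word << SRC_BITS) | (self.src_id & _mask(SRC_BITS))
        word = (word << DATA_BITS) | (self.data & _mask(DATA_BITS))
        return word

    @classmethod
    def unpack(cls, word: int) -> "Packet":
        data = word & _mask(DATA_BITS)
        word >>= DATA_BITS
        src_id = word & _mask(SRC_BITS)
        word >>= SRC_BITS
        dst_id = word & _mask(DST_BITS)
        word >>= DST_BITS
        return cls(valid=word & 1, dst_id=dst_id, src_id=src_id, data=data)

    def is_command(self) -> bool:
        return self.valid == 1


command = Packet(valid=1, dst_id=3, src_id=0, data=0xDEADBEEF)
decoded = Packet.unpack(command.pack())
```

A receiving core would call something like `is_command()` first and then interpret `data` either as an instruction or as a numeric value.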
- The core determines the destination address, source address, and data type according to the data structure. In terms of instruction timing, this embodiment adopts a traditional six-stage pipeline: instruction fetch, decode, execute, memory access, alignment, and write-back.
- By function, the instruction set of the present invention can be divided into register-register instructions, register-immediate instructions, jump instructions, memory access instructions, control instructions, and inter-core communication instructions.
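
To make the six-stage pipeline timing concrete, the sketch below models an idealized pipeline in which one instruction enters per cycle and nothing stalls. The absence of hazards and stalls is an assumption for illustration only; the stage identifiers and the helpers `stage_at` and `total_cycles` are hypothetical names.

```python
from typing import Optional

# The six stages named in the text, in order.
STAGES = ["fetch", "decode", "execute", "memory_access", "align", "write_back"]


def stage_at(instruction_index: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction i at a given cycle in an ideal pipeline.

    Instruction i enters fetch at cycle i and advances one stage per cycle.
    Returns None if the instruction has not started or has already retired.
    """
    offset = cycle - instruction_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def total_cycles(num_instructions: int) -> int:
    """Cycles needed to retire n instructions with no stalls."""
    if num_instructions <= 0:
        return 0
    return num_instructions + len(STAGES) - 1


cycles_for_four = total_cycles(4)
```

Under these assumptions, four back-to-back instructions take nine cycles instead of the twenty-four that strictly sequential execution of six stages each would need.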
- The embodiments can be implemented as a machine, process, or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware, or any combination thereof.
- Any generated program(s) can be embodied on one or more computer-usable media, such as resident storage devices, smart cards or other removable storage devices, or transmission devices, thereby producing computer program products and articles of manufacture according to the embodiments.
- The terms “article of manufacture” and “computer program product” as used herein are intended to cover computer programs that are permanently or temporarily present on any non-transitory medium that can be used by computers.
- Memory/storage devices include, but are not limited to, magnetic disks, optical disks, removable storage devices (such as smart cards, subscriber identity modules (SIM), wireless identification modules (WIM)), semiconductor memories (such as random access memory (RAM), read only memory (ROM), programmable read only memory (PROM)), etc.
- Transmission media include, but are not limited to, transmission via wireless communication networks, the Internet, intranets, telephone/modem-based network communications, hard-wired/cable communications networks, satellite communications, and other fixed or mobile network systems/communication links.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Advance Control (AREA)
Abstract
The present invention relates to a big data operation acceleration system and a chip. A plurality of cores are provided in the chip, each core performs operation and storage control functions, and at least one storage unit is connected to each core outside the chip. With the technical solution of the present invention, each core reads the storage unit connected to it as well as the storage units connected to other cores, so that each core can have a large-capacity memory. This reduces the number of times data is moved in from external storage space or moved out of memory, and speeds up data processing. At the same time, because the plurality of cores can operate independently or cooperatively, data processing is accelerated further.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/112688 WO2020087276A1 (fr) | 2018-10-30 | 2018-10-30 | Big data operation acceleration system and chip |
CN201880002364.XA CN109564562B (zh) | 2018-10-30 | 2018-10-30 | Big data operation acceleration system and chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/112688 WO2020087276A1 (fr) | 2018-10-30 | 2018-10-30 | Big data operation acceleration system and chip |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020087276A1 (fr) | 2020-05-07 |
Family
ID=65872661
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/112688 WO2020087276A1 (fr) | 2018-10-30 | 2018-10-30 | Big data operation acceleration system and chip |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109564562B (fr) |
WO (1) | WO2020087276A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
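CN112214448A (zh) * | 2020-10-10 | 2021-01-12 | 中科声龙科技发展(北京)有限公司 | Data dynamic reconfiguration circuit and method for heterogeneously integrated proof-of-work operation chip |
CN114691591A (zh) * | 2020-12-31 | 2022-07-01 | 中科寒武纪科技股份有限公司 | Circuit, method and system for inter-chip communication |
CN118330446A (zh) * | 2024-06-13 | 2024-07-12 | 电子科技大学 | Aging prediction method for cross-chiplet ASIC chips |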
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114003552B (zh) * | 2021-12-30 | 2022-03-29 | 中科声龙科技发展(北京)有限公司 | Proof-of-work operation method, proof-of-work chip and host computer |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102314377A (zh) * | 2010-06-30 | 2012-01-11 | 国际商业机器公司 | Accelerator and method thereof for supporting virtual machine migration |
CN103634945A (zh) * | 2013-11-21 | 2014-03-12 | 安徽海聚信息科技有限责任公司 | SoC-based high-performance cloud terminal |
CN105183683A (zh) * | 2015-08-31 | 2015-12-23 | 浪潮(北京)电子信息产业有限公司 | Multi-FPGA chip accelerator card |
CN108536642A (zh) * | 2018-06-13 | 2018-09-14 | 北京比特大陆科技有限公司 | Big data operation acceleration system and chip |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7593457B2 (en) * | 2004-01-30 | 2009-09-22 | Broadcom Corporation | Transceiver system and method having a transmit clock signal phase that is phase-locked with a receive clock signal phase |
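CN105550140B (zh) * | 2014-11-03 | 2018-11-09 | 联想(北京)有限公司 | Electronic device and data processing method |
CN107451075B (zh) * | 2017-09-22 | 2023-06-20 | 北京算能科技有限公司 | Data processing chip and system, and data storage, forwarding and read processing methods |
CN209784995U (zh) * | 2018-10-30 | 2019-12-13 | 北京比特大陆科技有限公司 | Big data operation acceleration system and chip |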
- 2018
  - 2018-10-30 CN CN201880002364.XA patent/CN109564562B/zh active Active
  - 2018-10-30 WO PCT/CN2018/112688 patent/WO2020087276A1/fr active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
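CN112214448A (zh) * | 2020-10-10 | 2021-01-12 | 中科声龙科技发展(北京)有限公司 | Data dynamic reconfiguration circuit and method for heterogeneously integrated proof-of-work operation chip |
CN112214448B (zh) * | 2020-10-10 | 2024-04-09 | 声龙(新加坡)私人有限公司 | Data dynamic reconfiguration circuit and method for heterogeneously integrated proof-of-work operation chip |
CN114691591A (zh) * | 2020-12-31 | 2022-07-01 | 中科寒武纪科技股份有限公司 | Circuit, method and system for inter-chip communication |
CN118330446A (zh) * | 2024-06-13 | 2024-07-12 | 电子科技大学 | Aging prediction method for cross-chiplet ASIC chips |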
Also Published As
Publication number | Publication date |
---|---|
CN109564562B (zh) | 2022-05-13 |
CN109564562A (zh) | 2019-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108536642B (zh) | | Big data operation acceleration system and chip |
WO2020087276A1 (fr) | | Big data operation acceleration system and chip |
JP6974270B2 (ja) | | Intelligent high-bandwidth memory system and logic die therefor |
JP4768386B2 (ja) | | System and apparatus having an interface device capable of data communication with an external device |
US7155554B2 (en) | | Methods and apparatuses for generating a single request for block transactions over a communication fabric |
CN106951379A (zh) | | High-performance DDR controller based on the AXI protocol and data transmission method |
KR20040062717A (ko) | | Memory module device suitable for high-frequency operation |
US7277975B2 (en) | | Methods and apparatuses for decoupling a request from one or more solicited responses |
EP2985699B1 (fr) | | Memory access method and memory system |
CN112817907B (zh) | | Interconnected bare-die expansion microsystem and expansion method thereof |
CN209149287U (zh) | | Big data operation acceleration system |
CN106844263B (zh) | | Configurable multiprocessor computer system and implementation method |
CN108256643A (zh) | | HMC-based neural network operation device and method |
CN209784995U (zh) | | Big data operation acceleration system and chip |
CN209560543U (zh) | | Big data operation chip |
WO2020087275A1 (fr) | | Method for a big data operation acceleration system to perform operations |
CN115129657B (zh) | | Programmable logic resource expansion device and server |
WO2020087278A1 (fr) | | Big data computing acceleration system and method |
JP2003050788A (ja) | | Apparatus and method for distributing signals from a high-level data link controller to multiple digital signal processor cores |
CN208298179U (zh) | | Big data operation acceleration system and chip |
WO2020087239A1 (fr) | | Big data computing acceleration system |
CN209543343U (zh) | | Big data operation acceleration system |
US11789884B2 (en) | | Bus system and method for operating a bus system |
WO2021213076A1 (fr) | | Method and device for constructing a communication topology structure based on multiple processing nodes |
WO2020087243A1 (fr) | | Big data computing chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18938534; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.10.2021) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18938534; Country of ref document: EP; Kind code of ref document: A1 |