WO2022121278A1 - Chip, Data Movement Method, and Electronic Device - Google Patents
- Publication number: WO2022121278A1
- Application number: PCT/CN2021/101547
- Authority: WO (WIPO, PCT)
- Prior art keywords: data, memory, DMA controller, cache, mentioned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7825—Globally asynchronous, locally synchronous, e.g. network on chip
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present application relates to computer technology, and in particular to a chip, a data movement method, and an electronic device.
- When data must be moved between a first storage space and a second storage space within a memory partition inside the chip, the processing core first reads the data from the first storage space and holds it inside the core. The processing core then writes the held data into the second storage space.
- The present application discloses a chip, which includes:
- At least one processing core and at least one memory partition. Each memory partition includes a cache system, a memory system, and a direct memory access (DMA) controller. The DMA controller is connected to both the cache system and the memory system and performs data movement between different storage spaces within the memory partition.
- A first processing core among the at least one processing core is configured to send a data movement instruction to at least one first DMA controller. The at least one first DMA controller is included in at least one first memory partition.
- The at least one first DMA controller is configured to perform data movement between different storage spaces within the at least one first memory partition, based on the data movement instruction.
- The cache system includes a multi-level cache. The DMA controller's data movement between the storage space of the cache system and the storage space of the memory system includes data movement between the storage space of the last level cache and the storage space of the memory system.
- The last level cache supports three working modes:
  - In the first working mode, the entire storage space of the last level cache is configured as cache memory.
  - In the second working mode, the entire storage space of the last level cache is configured as scratchpad memory (SPM).
  - In the third working mode, part of the storage space of the last level cache is configured as cache memory and the remaining part is configured as SPM.
- The memory partition further includes a mode configurator, which configures the working mode of the last level cache based on user configuration information.
- The at least one processing core and the DMA controller access each other through a main on-chip network. Alternatively, the DMA controller, the cache system, and the memory system access each other through a sub on-chip network.
- Data movement by the DMA controller between different storage spaces within the memory partition includes at least one of the following:
  - data movement between different storage spaces in the cache system;
  - data movement between different storage spaces in the memory system;
  - data movement between the storage space of the cache system and the storage space of the memory system.
- All or some of the different storage spaces in the memory partitions use a unified memory architecture (UMA).
- Sending the data movement instruction from the first processing core to the at least one first DMA controller includes the first processing core broadcasting the instruction to at least one second DMA controller. Each second DMA controller is included in a first memory partition whose different storage spaces all use UMA.
- The data movement instruction includes a data movement type, a data length, a source storage address, and a destination storage address.
- The data movement instruction includes four fields:
  - the first field indicates the data movement type and the data length;
  - the second field indicates the low part of the source storage address;
  - the third field indicates the high part of the source storage address and the high part of the destination storage address;
  - the fourth field indicates the low part of the destination storage address.
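The four-field layout above can be illustrated with a small packing sketch. The source gives no field widths, so every numeric choice here is an assumption:

- each field is taken to be 32 bits;
- addresses are taken to be 48 bits, split into a 32-bit low part and a 16-bit high part;
- the first field is split into a 2-bit data movement type and a 30-bit data length.

The names `DataMoveInstruction`, `encode`, and `decode` are hypothetical.

```python
from typing import NamedTuple, Tuple

# All widths below are illustrative assumptions, not taken from the application.
FIELD_MASK = 0xFFFF_FFFF  # each of the four fields is assumed to be 32 bits
LOW_BITS = 32             # low part of an (assumed 48-bit) address
HIGH_BITS = 16            # high part of an (assumed 48-bit) address
LENGTH_BITS = 30          # data length sub-field of the first field
TYPE_BITS = 2             # data movement type sub-field (four flow directions)

Fields = Tuple[int, int, int, int]


class DataMoveInstruction(NamedTuple):
    move_type: int
    length: int
    src_addr: int
    dst_addr: int


def encode(instr: DataMoveInstruction) -> Fields:
    """Pack the instruction into the four hypothetical 32-bit fields."""
    if not 0 <= instr.move_type < (1 << TYPE_BITS):
        raise ValueError("move_type does not fit in the type sub-field")
    if not 0 <= instr.length < (1 << LENGTH_BITS):
        raise ValueError("length does not fit in the length sub-field")
    for addr in (instr.src_addr, instr.dst_addr):
        if not 0 <= addr < (1 << (LOW_BITS + HIGH_BITS)):
            raise ValueError("address wider than 48 bits")
    field1 = (instr.move_type << LENGTH_BITS) | instr.length  # type + length
    field2 = instr.src_addr & FIELD_MASK                      # source low part
    # Third field: source high part in the upper half, destination high part in the lower half.
    field3 = ((instr.src_addr >> LOW_BITS) << HIGH_BITS) | (instr.dst_addr >> LOW_BITS)
    field4 = instr.dst_addr & FIELD_MASK                      # destination low part
    return field1, field2, field3, field4


def decode(fields: Fields) -> DataMoveInstruction:
    """Inverse of encode()."""
    f1, f2, f3, f4 = fields
    src_high = f3 >> HIGH_BITS
    dst_high = f3 & ((1 << HIGH_BITS) - 1)
    return DataMoveInstruction(
        move_type=f1 >> LENGTH_BITS,
        length=f1 & ((1 << LENGTH_BITS) - 1),
        src_addr=(src_high << LOW_BITS) | f2,
        dst_addr=(dst_high << LOW_BITS) | f4,
    )
```

Two addresses fit in four fields because both high parts share one field. Under the same 32-bit assumption, this layout occupies 16 bytes. A layout with six separate 32-bit fields would occupy 24.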
- Data movement by the DMA controller between different storage spaces in the memory partition works in two steps:
  - the DMA controller reads data from a first storage space in the memory partition;
  - it then writes the read data into a second storage space in the memory partition.
- The memory system is a high bandwidth memory (HBM).
- The present application further proposes a data movement method applied to a chip. The chip includes at least one processing core and at least one memory partition, and each memory partition includes a cache system, a memory system, and a direct memory access (DMA) controller. For each memory partition, the method performs data movement between different storage spaces within the memory partition by means of the DMA controller.
- Performing this data movement by means of the DMA controller includes two steps:
  - a first processing core among the at least one processing core sends a data movement instruction to at least one first DMA controller, which is included in at least one first memory partition;
  - the at least one first DMA controller performs data movement between different storage spaces within the at least one first memory partition, based on the data movement instruction.
- The cache system includes a multi-level cache. Performing data movement between different storage spaces within the memory partition by means of the DMA controller includes the DMA controller moving data between the storage space of the last level cache and the storage space of the memory system.
- The last level cache supports three working modes:
  - In the first working mode, the entire storage space of the last level cache is configured as cache memory.
  - In the second working mode, the entire storage space of the last level cache is configured as SPM.
  - In the third working mode, part of the storage space of the last level cache is configured as cache memory and the remaining part is configured as SPM.
- The memory partition further includes a mode configurator, and the method further includes configuring the working mode of the last level cache through the mode configurator, based on user configuration information.
- The at least one processing core and the DMA controller access each other through the main on-chip network, and/or the DMA controller, the cache system, and the memory system access each other through the sub on-chip network.
- Data movement between different storage spaces within the memory partition includes at least one of the following:
  - data movement between different storage spaces in the cache system;
  - data movement between different storage spaces in the memory system;
  - data movement between the storage space of the cache system and the storage space of the memory system.
- All or some of the different storage spaces in the memory partitions use a unified memory architecture (UMA).
- Sending the data movement instruction to the at least one first DMA controller through the first processing core includes broadcasting the instruction, through the first processing core, to at least one second DMA controller. Each second DMA controller is included in a first memory partition whose different storage spaces all use UMA.
- The data movement instruction includes a data movement type, a data length, a source storage address, and a destination storage address.
- The data movement instruction includes four fields:
  - the first field indicates the data movement type and the data length;
  - the second field indicates the low part of the source storage address;
  - the third field indicates the high part of the source storage address and the high part of the destination storage address;
  - the fourth field indicates the low part of the destination storage address.
- Performing data movement between different storage spaces in the memory partition by means of the DMA controller includes two steps:
  - the DMA controller reads data from a first storage space in the memory partition;
  - it then writes the read data into a second storage space in the memory partition.
- The memory system is a high bandwidth memory (HBM).
- The present application further provides an electronic device that includes the chip described in any of the above embodiments.
- The DMA controller is connected to both the cache system and the memory system and performs data movement between different storage spaces within the memory partition. Data can therefore be moved inside the memory partition without consuming the chip's memory access bandwidth. The on-chip memory access bandwidth is released during data movement, which improves both data movement efficiency and chip performance.
- The processing core sends a data movement instruction to the DMA controller. In response, the DMA controller moves data between different storage spaces within the memory partition. Because the data stays inside the memory partition, on-chip memory access bandwidth is released, data movement efficiency improves, and chip performance improves.
- Using the chip can help improve the processing efficiency of computing tasks, and thereby the performance of the electronic device.
- FIG. 1 is an internal structure diagram of an AI chip;
- FIG. 2 is an internal structure diagram of a chip according to the present application;
- FIG. 3 is a structural diagram of a chip according to the present application;
- FIG. 4 is a structural diagram of a chip according to the present application;
- FIG. 5 is a schematic diagram of a data movement instruction according to the present application;
- FIG. 6 is a schematic diagram of a data movement instruction according to the present application;
- FIG. 7 is a flowchart of a data movement method according to the present application.
- FIG. 1 is an internal structure diagram of an AI chip.
- As shown in FIG. 1, the processing core of the AI chip is connected to a memory partition, which includes at least a memory system and a cache system.
- To move part of the data in the memory system to the cache system, the processing core first issues a read command to fetch that data from the memory system and hold it inside the core. It then issues a write command to write the data into the cache system.
- To address this, the present application proposes a chip.
- In each memory partition, the chip adds a direct memory access (DMA) controller connected to the cache system and the memory system. The DMA controller can then execute data movement instructions between different storage spaces inside the memory partition. This frees on-chip memory access bandwidth, improves data movement efficiency, and improves chip performance.
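The bandwidth argument can be illustrated with a deliberately simplified accounting model. It rests on these assumptions:

- on the conventional path, every moved byte crosses the main on-chip interconnect twice: once into the processing core and once back out;
- on the in-partition DMA path, only the data movement instruction crosses it;
- the 16-byte instruction size is hypothetical, as are the function names;
- command and response overheads are ignored.

```python
# Simplified model: count bytes crossing the main on-chip interconnect
# between the processing core and a memory partition for one data movement.

INSTRUCTION_BYTES = 16  # hypothetical size of one data movement instruction


def main_noc_bytes_via_core(size: int) -> int:
    """Conventional path: the core reads the data in, then writes it back out."""
    read_into_core = size
    write_back_out = size
    return read_into_core + write_back_out


def main_noc_bytes_via_partition_dma(size: int) -> int:
    """In-partition DMA path: only the instruction leaves the core.

    `size` is intentionally unused: in this model the payload stays inside
    the memory partition and never crosses the main interconnect.
    """
    return INSTRUCTION_BYTES


if __name__ == "__main__":
    one_mib = 1 << 20
    print(main_noc_bytes_via_core(one_mib))           # 2097152
    print(main_noc_bytes_via_partition_dma(one_mib))  # 16
```

Real traffic also includes request and response headers. The point is only the scaling: the conventional cost grows with the amount of data moved, while the DMA-path cost in this model does not.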
- FIG. 2 shows the internal structure of a chip according to the present application. As shown in FIG. 2, the chip includes:
- at least one processing core 21 and at least one memory partition 22.
- Each memory partition 22 includes a cache system 221, a memory system 222, and a DMA controller 223.
- The DMA controller 223 is connected to both the cache system 221 and the memory system 222 and performs data movement between different storage spaces within the memory partition 22.
- In some examples, the last level cache of the cache system 221 is connected to the DMA controller 223.
- In other examples, the DMA controller 223 may be connected to whichever cache levels are involved in the data movement. No particular limitation is imposed here.
- The DMA controller may read data from a first storage space in the memory partition and write the read data into a second storage space in the same memory partition.
- For example, the first storage space may be the memory system and the second storage space may be an L2 cache.
- In that case, the DMA controller moves data between the memory system and the L2 cache in response to a data movement instruction sent by the processing core.
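The read-then-write step described above can be sketched with byte arrays standing in for storage spaces. This is a minimal software model, not the hardware design. The space names `"memory"` and `"l2"`, their sizes, and the classes `MemoryPartition` and `DmaController` are all hypothetical.

```python
from typing import Dict


class MemoryPartition:
    """Toy memory partition whose storage spaces are plain byte arrays."""

    def __init__(self, sizes: Dict[str, int]) -> None:
        self.spaces = {name: bytearray(size) for name, size in sizes.items()}


class DmaController:
    """Hypothetical in-partition DMA controller: read from one space, write to another."""

    def __init__(self, partition: MemoryPartition) -> None:
        self.partition = partition

    def move(self, src: str, src_addr: int, dst: str, dst_addr: int, length: int) -> None:
        src_mem = self.partition.spaces[src]
        dst_mem = self.partition.spaces[dst]
        if length < 0 or src_addr < 0 or dst_addr < 0:
            raise ValueError("negative address or length")
        if src_addr + length > len(src_mem) or dst_addr + length > len(dst_mem):
            raise ValueError("move exceeds storage space bounds")
        # Read from the first storage space. Copying into an immutable bytes
        # object first also makes overlapping moves within one space safe.
        data = bytes(src_mem[src_addr:src_addr + length])
        # Write the read data into the second storage space.
        dst_mem[dst_addr:dst_addr + length] = data


partition = MemoryPartition({"memory": 4096, "l2": 1024})
partition.spaces["memory"][100:104] = b"\x01\x02\x03\x04"
DmaController(partition).move("memory", 100, "l2", 0, 4)
```

After the call, the first four bytes of `"l2"` hold the data that was at offset 100 in `"memory"`. The processing core only issued the call and never held the payload.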
- A memory partition may include one or more DMA controllers.
- For example, a memory partition may include a single DMA controller responsible for data movement between all storage spaces within the partition.
- Alternatively, a memory partition may include multiple DMA controllers, each responsible for data movement between one or more pairs of storage spaces within the partition.
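One way to express "each DMA controller is responsible for one or more pairs of storage spaces" is a routing table keyed by (source, destination) pairs. The sketch below is an illustrative assumption, not the application's mechanism. The space names `"llc"` and `"memory"` and the controller names `"dma0"` and `"dma1"` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]        # (source storage space, destination storage space)
SPACES = ("llc", "memory")    # hypothetical storage spaces of one memory partition


def build_routing(assignments: Dict[str, Iterable[Pair]]) -> Dict[Pair, str]:
    """Map each (src, dst) pair to the single DMA controller responsible for it."""
    routing: Dict[Pair, str] = {}
    for controller, pairs in assignments.items():
        for pair in pairs:
            if pair in routing:
                raise ValueError(f"{pair} is already assigned to {routing[pair]}")
            routing[pair] = controller
    return routing


def select_controller(routing: Dict[Pair, str], src: str, dst: str) -> str:
    """Return the controller that handles a move from src to dst."""
    try:
        return routing[(src, dst)]
    except KeyError:
        raise LookupError(f"no DMA controller handles {src} -> {dst}") from None


# One controller for every ordered pair of storage spaces in the partition.
single = build_routing({"dma0": product(SPACES, repeat=2)})

# Two controllers splitting the pairs between them.
split = build_routing({
    "dma0": [("memory", "llc"), ("llc", "memory")],
    "dma1": [("llc", "llc"), ("memory", "memory")],
})
```

Rejecting duplicate assignments keeps exactly one owner per pair, which matches the "responsible for" wording above.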
- The present application does not limit where these DMA controllers are located.
- For example, the DMA controllers may be distributed across the memory partitions or centralized in one of them.
- The chip may be any chip that requires high memory access bandwidth.
- For example, the chip may be equipped with multi-channel DRAM (dynamic random access memory) chips.
- The chip may be a CPU, a DSP, an MCU, or the like.
- The chip may execute artificial intelligence algorithms.
- For example, the chip may be an AI neural network chip (such as an FPGA or a TPU) or a GPU.
- The processing core, typically a computing core in the chip, executes program code and may include one or more processing units.
- The processing core typically performs data movement within the memory partition according to program code written by the developer.
- Data movement between storage spaces within a memory partition generally includes three kinds:
  - movement of data within the cache system;
  - movement of data within the memory system;
  - movement of data between the last level cache and the memory system.
- The memory partition is generally used to store data.
- Chips generally organize memory partitions as a storage hierarchy.
- The memory partition may include a memory system and a cache system with one or more cache levels.
- For example, the cache system 221 may include at least L1, L2, and L3 caches.
- When the processing core 21 needs data, it generally accesses the L1 cache first. If the L1 cache holds the data, the access completes there. Otherwise, the processing core 21 accesses the L2 cache, and so on down the hierarchy. If the last level cache (here the L3 cache) does not hold the data either, the processing core 21 fetches the data from the memory system 222.
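The lookup order described above can be sketched as a loop over cache levels with a fallback to the memory system. Dictionaries stand in for caches, and the function name `lookup` is hypothetical. Cache fills on a miss and replacement policies are deliberately not modeled.

```python
from typing import Any, Dict, List, Tuple


def lookup(address: int, caches: List[Dict[int, Any]], memory: Dict[int, Any]) -> Tuple[Any, str]:
    """Search L1, L2, ... in order, then fall back to the memory system.

    Returns the value together with the name of the level that served it.
    Cache fills on a miss are not modeled.
    """
    for level, cache in enumerate(caches, start=1):
        if address in cache:
            return cache[address], f"L{level}"
    return memory[address], "memory"


l1: Dict[int, Any] = {}
l2 = {0x10: "a"}
l3 = {0x20: "b"}  # last level cache in this three-level example
memory = {0x10: "a", 0x20: "b", 0x30: "c"}
```

With these contents:

- `lookup(0x10, [l1, l2, l3], memory)` is served by `L2`;
- `0x20` is served by `L3`;
- `0x30` misses every cache level and comes from `memory`.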
- In some examples, the last level cache serves as a large-capacity cache, and the DMA controller moves data between the storage space of the last level cache and the storage space of the memory system.
- When at least part of the storage space of the cache system is configured as scratchpad memory (SPM), the efficiency of moving data into and out of that part of the storage space is affected.
- In some examples, at least part of the storage space of the last level cache is configured as SPM.
- In that case, the DMA controller moves data between the SPM-configured storage space of the last level cache and the memory system. Because the DMA controller performs this movement, the data does not pass through the processing core. This frees bandwidth, shortens the data movement path, and improves data movement efficiency.
- The last level cache of the cache system supports three working modes:
  - In the first working mode, the entire storage space of the last level cache is configured as cache memory.
  - In the second working mode, the entire storage space is configured as SPM.
  - In the third working mode, part of the storage space is configured as cache memory and the remaining part as SPM.
- The memory partition may further include a mode configurator.
- The mode configurator configures the working mode of the last level cache in the cache system based on user configuration information.
- For example, the developer can set the working mode of the last level cache through the mode configurator by supplying user configuration information.
- For example, the entire storage space of the last level cache can be configured as SPM.
- Alternatively, the entire storage space of the last level cache can be configured as cache memory.
- Alternatively, part of the storage space of the last level cache can be configured as cache memory and part as SPM, for example to store AI operation parameters.
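The three working modes and the mode configurator can be sketched as a function that turns user configuration into a capacity split. The configuration keys `"mode"` and `"spm_bytes"`, the enum `LlcMode`, and the function `configure_llc` are hypothetical. The application does not specify how user configuration information is encoded.

```python
from enum import Enum
from typing import Any, Dict


class LlcMode(Enum):
    CACHE = 1   # first working mode: the whole last level cache is cache memory
    SPM = 2     # second working mode: the whole last level cache is SPM
    HYBRID = 3  # third working mode: part cache memory, part SPM


def configure_llc(user_config: Dict[str, Any], total_bytes: int) -> Dict[str, int]:
    """Hypothetical mode configurator: map user configuration to a capacity split."""
    mode = LlcMode[str(user_config["mode"]).upper()]  # unknown modes raise KeyError
    if mode is LlcMode.CACHE:
        return {"cache": total_bytes, "spm": 0}
    if mode is LlcMode.SPM:
        return {"cache": 0, "spm": total_bytes}
    spm_bytes = int(user_config["spm_bytes"])
    if not 0 < spm_bytes < total_bytes:
        raise ValueError("hybrid mode needs 0 < spm_bytes < total_bytes")
    return {"cache": total_bytes - spm_bytes, "spm": spm_bytes}
```

For example, `configure_llc({"mode": "hybrid", "spm_bytes": 2 << 20}, 8 << 20)` splits an 8 MiB last level cache into 6 MiB of cache memory and 2 MiB of SPM. The SPM part could then hold AI operation parameters.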
- The memory system may be a global memory system.
- For example, it may be DRAM (dynamic random access memory), SDRAM (synchronous dynamic random access memory), or the like.
- For example, the global memory system may be high bandwidth memory (HBM).
- FIG. 3 is a structural diagram of a chip according to the present application. As shown in FIG. 3, the DMA controller, the at least one processing core, and the at least one memory partition are connected by a bus.
- A processing core sends a data movement instruction to the DMA controller, which then completes the data movement.
- Because the DMA controller is built into the memory partition, it can move the data entirely inside the memory partition without consuming the chip's memory access bandwidth.
- The DMA controller is connected to both the cache system and the memory system and performs data movement between different storage spaces within the memory partition. The data can therefore be moved inside the memory partition without consuming the chip's memory access bandwidth. On-chip memory access bandwidth is thus released during data movement, data movement efficiency improves, and chip performance improves.
- A first processing core among the at least one processing core is connected to at least one first DMA controller. The at least one first DMA controller is included in at least one first memory partition, and the first memory partitions may be all or some of the memory partitions.
- The first processing core is configured to send a data movement instruction to the at least one first DMA controller.
- The at least one first DMA controller is configured to perform data movement between different storage spaces within the at least one first memory partition, based on the data movement instruction.
- The DMA controller is connected to the first processing core.
- The connection may be bus-based.
- For example, the DMA controller and the processing core may access each other through a main network-on-chip (NoC).
- The main on-chip network may be the chip's main network.
- When the chip includes multiple processing cores and multiple memory partitions, the processing cores and the DMA controllers in those memory partitions can access each other through the main on-chip network.
- The DMA controller is connected to both the cache system and the memory system.
- This connection may also be bus-based.
- For example, the DMA controller, the cache system, and the memory system may access each other through a sub network-on-chip (sub-NoC).
- The sub on-chip network may be a subnetwork within the memory partition.
- When the chip includes multiple memory partitions, each of them may use a sub on-chip network. The DMA controller, cache system, and memory system in each memory partition can then access each other through that sub on-chip network.
- A chip generally includes multiple memory partitions, which can be connected in parallel with the processing cores.
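The split between the main on-chip network and the per-partition sub on-chip networks can be summarized with a small classification function. This toy model covers only the links named above. Agents are `(kind, index)` tuples: `"core"` for processing cores, and `"dma"`, `"llc"`, `"mem"` for a partition's DMA controller, last level cache, and memory system. The naming scheme and the function `network_between` are hypothetical.

```python
from typing import Optional, Tuple

Agent = Tuple[str, int]  # (kind, index): kind is "core", "dma", "llc" or "mem"
PARTITION_KINDS = {"dma", "llc", "mem"}


def network_between(a: Agent, b: Agent) -> Optional[str]:
    """Which interconnect carries traffic between two agents in this toy model.

    - processing core <-> DMA controller: main NoC
    - DMA controller / last level cache / memory system in the same
      partition: that partition's sub-NoC
    - anything else: not modeled here (None)
    """
    if {a[0], b[0]} == {"core", "dma"}:
        return "main NoC"
    if a[0] in PARTITION_KINDS and b[0] in PARTITION_KINDS and a[1] == b[1] and a != b:
        return f"sub-NoC {a[1]}"
    return None
```

In this model a core reaches any partition's DMA controller over the main NoC. The bulk data path between the DMA controller, last level cache, and memory system stays on the sub-NoC of a single partition.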
- FIG. 4 is a structural diagram of a chip according to the present application.
- As shown in FIG. 4, the chip includes multiple processing cores and multiple memory partitions. FIG. 4 shows only the last level cache of each memory partition's cache system and omits the other cache levels.
- The processing cores and memory partitions of the chip can access each other through the main on-chip network.
- When the chip includes multiple memory partitions, they may all adopt a unified memory architecture (UMA) to make programming easier for developers.
- For example, the last level caches in the memory partitions may use UMA.
- The memory systems in the memory partitions may also use UMA.
- Under UMA, effective addresses are the same across the different last level caches and across the different memory systems. Writing data to every last level cache or every memory system therefore requires only one address, rather than a separate write to each one. This makes programming easier for developers and improves data storage efficiency.
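One reading of the paragraph above is that a single write call with a single effective address updates the same location in every UMA member. The sketch below models only that reading, not UMA in general. The class `UmaGroup`, the member count, and the sizes are hypothetical.

```python
from typing import List


class UmaGroup:
    """Toy model: several storage units (for example, the last level caches of
    several memory partitions) sharing one effective address, so that one write
    call with one address updates all of them."""

    def __init__(self, members: int, size: int) -> None:
        if members < 1:
            raise ValueError("need at least one member")
        self.members: List[bytearray] = [bytearray(size) for _ in range(members)]

    def write(self, addr: int, data: bytes) -> None:
        end = addr + len(data)
        if addr < 0 or end > len(self.members[0]):
            raise ValueError("write outside the shared address range")
        for unit in self.members:  # the same effective address for every member
            unit[addr:end] = data

    def read(self, member: int, addr: int, length: int) -> bytes:
        return bytes(self.members[member][addr:addr + length])


group = UmaGroup(members=4, size=64)
group.write(8, b"abc")  # one address, no per-member writes by the caller
```

The caller supplies the address once, and the loop over members is hidden inside the group. This is the programming convenience the paragraph attributes to UMA.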
- Each processing core may send data movement instructions individually to one or more DMA controllers.
- To reduce the overhead of invoking the DMA controllers, the processing core may instead broadcast a data movement instruction to the DMA controllers in at least one of the memory partitions.
- For example, the processing core can broadcast a data movement instruction to the DMA controllers in multiple memory partitions.
- For example, suppose a chip includes 8 memory partitions.
- In 4 of the 8 memory partitions, both the last level cache (assumed here to be an L2 cache) and the memory system use UMA.
- The processing core broadcasts a data movement instruction to the DMA controllers in the 4 memory partitions that use UMA. It sends data movement instructions individually to the DMA controllers in the 4 memory partitions that do not use UMA.
- After receiving the data movement instruction, each DMA controller can read 1 megabyte of data from the memory-system location indicated by the instruction. It then moves that 1 megabyte to the L2-cache location indicated by the instruction, completing the data movement.
- The processing core can complete the data movement within each UMA memory partition with a single broadcast instruction to their DMA controllers. This reduces the number of times the processing core calls the DMA controllers, and therefore reduces call overhead.
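The 8-partition example above can be checked by counting DMA calls. The sketch assumes that partitions 0 to 3 are the UMA ones and that one broadcast counts as one call. The dataclass `Partition` and the function `plan_dispatch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Call = Tuple[str, Tuple[int, ...]]  # ("broadcast" | "unicast", target partition indices)


@dataclass(frozen=True)
class Partition:
    index: int
    uma: bool  # whether this partition's last level cache and memory system use UMA


def plan_dispatch(partitions: List[Partition], broadcast: bool = True) -> List[Call]:
    """Return the DMA calls one processing core issues for one data movement.

    With broadcast enabled, all UMA partitions are reached with one broadcast
    call; every other partition gets its own unicast call.
    """
    uma = tuple(p.index for p in partitions if p.uma)
    plan: List[Call] = []
    if broadcast and uma:
        plan.append(("broadcast", uma))
        remaining = [p.index for p in partitions if not p.uma]
    else:
        remaining = [p.index for p in partitions]
    plan.extend(("unicast", (i,)) for i in remaining)
    return plan


# The example from the text: 8 partitions, 4 of which (assumed: 0-3) use UMA.
chip = [Partition(i, uma=i < 4) for i in range(8)]
```

With broadcasting, `plan_dispatch(chip)` issues 5 calls: one broadcast to partitions 0 to 3, plus four unicasts. Without broadcasting, it issues 8 calls.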
- Alternatively, the chip's DMA controllers may be centralized in one memory partition, each corresponding one-to-one to the memory system and cache system of a memory partition.
- Data movement instructions can then be broadcast to these DMA controllers, completing data movement between different storage spaces in each memory partition.
- The following describes the improvements this application makes to the data movement instruction.
- The application proposes a new format for the data movement instruction sent to the DMA controller.
- The new format reduces the number of fields and gives each field a well-defined meaning. This shortens the instruction and reduces the overhead of calling the DMA controller.
- For comparison, a data movement instruction to the DMA controller may include 6 fields: a data movement type field, a data length field, a last level cache low address field, a last level cache high address field, a memory system low address field, and a memory system high address field.
- the above-mentioned data moving instruction may at least include a data moving type, a data length, a source storage address, and a destination storage address.
- the above data transfer type specifically indicates the direction of data transfer.
- the above-mentioned data movement type may indicate the data flow direction in the memory partition.
- the above-mentioned data flow direction may include any of the following four types:
- the movement of the cache system internal data in the above-mentioned memory partition, the movement of the internal data of the memory system in the above-mentioned memory partition, the data migration from the last level cache to the memory system in the above-mentioned memory partition, and from the memory system in the above-mentioned memory partition Move data to the last level of cache.
- the above four data flow directions can be corresponded to the four types of identifiers, and when the DMA controller is actually called, the above four identifiers can be written into the above data transfer types, so that the DMA controller can identify the data this time. Moved data flow.
- the above data length specifically indicates the amount of data to be transmitted. It can be understood that the size of the data has a corresponding relationship with the storage space. Therefore, if the starting position of the data in the storage space is known, the ending position of the data in the storage space can be obtained according to the data length of the data.
- the above-mentioned source storage address specifically refers to the starting address of the current storage location of the data to be moved. For example, if the data is moved from the memory system to the last level of cache, the above-mentioned source storage address is the starting position of the data in the above-mentioned memory system.
- the above-mentioned destination storage address specifically refers to the starting address of the storage location where the data to be moved needs to be moved. For example, if data is moved from the memory system to the last level of cache, the destination storage address is the starting position where the data is moved to the last level of cache.
- the source storage space can be determined according to the source storage address field and the data length in the above-mentioned data moving instruction;
- the destination storage address field and data length determine the destination storage space;
- the data in the source storage space can be moved to the destination storage space according to the data move type in the above data move instruction.
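The following minimal Python sketch shows how a DMA controller could use the four pieces of information together. It makes several hypothetical assumptions: the cache system and memory system are modeled as byte arrays addressed from 0, the string labels stand in for the four data flow directions, and the length is measured in bytes. This is an illustration only, not the patent's hardware implementation.

```python
# Hypothetical labels for the four data flow directions, mapped to
# (source storage, destination storage).
ROUTES = {
    "cache_internal": ("cache", "cache"),
    "memory_internal": ("memory", "memory"),
    "cache_to_memory": ("cache", "memory"),
    "memory_to_cache": ("memory", "cache"),
}


def dma_execute(storage: dict[str, bytearray], move_type: str,
                length: int, src_addr: int, dst_addr: int) -> None:
    """Move `length` bytes according to `move_type`.

    Source space = [src_addr, src_addr + length), destination space =
    [dst_addr, dst_addr + length): each end follows from start + length."""
    if length < 0 or src_addr < 0 or dst_addr < 0:
        raise ValueError("addresses and length must be non-negative")
    src_name, dst_name = ROUTES[move_type]
    src, dst = storage[src_name], storage[dst_name]
    src_end, dst_end = src_addr + length, dst_addr + length
    if src_end > len(src) or dst_end > len(dst):
        raise ValueError("transfer exceeds storage bounds")
    # The right-hand slice is copied before assignment, so overlapping
    # ranges inside one buffer behave like memmove.
    dst[dst_addr:dst_end] = src[src_addr:src_end]
```

For example, with `storage = {"cache": bytearray(8), "memory": bytearray(b"abcdefgh")}`, the call `dma_execute(storage, "memory_to_cache", 3, 2, 4)` writes `b"cde"` into cache bytes 4 to 6.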
- FIG. 5 is a schematic diagram of a data moving instruction shown in the present application. As shown in FIG. 5, the data movement instruction includes a first field, a second field, a third field, and a fourth field;
- the first field indicates the data movement type and the data length;
- the second field indicates the low address of the source storage address;
- the third field indicates the high address of the source storage address and the high address of the destination storage address;
- the fourth field indicates the low address of the destination storage address.
- 0000 indicates that data is moved within the cache system;
- 0001 indicates that data is moved within the memory system;
- 0010 indicates that data is moved from the memory system to the last-level cache;
- 0011 indicates that data is moved from the last-level cache to the memory system.
- for example, when the processing core of the chip constructs the data transfer instruction for the DMA controller, it can write 0010 into the first 4 bits of the first field and write the binary value of 2 MB into the last 28 bits of the first field. The processing core then converts the memory system low address 0x3EAB_0000 into binary and writes it into the second field, and converts the memory system high address 0xAB_00 into binary and writes it into the last sixteen bits of the third field. Finally, the processing core writes the last-level cache high address 0xCD_00 into the first sixteen bits of the third field, and converts the last-level cache low address 0x3E5B_0000 into binary and writes it into the fourth field.
- the data moving instruction can then be broadcast to each DMA controller. In response, each DMA controller moves 2 MB of data starting at the memory system address with low address 0x3EAB_0000 and high address 0xAB_00 to the last-level cache location with low address 0x3E5B_0000 and high address 0xCD_00.
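The worked example above can be reproduced as bit packing. This is a minimal Python sketch under stated assumptions. Each field is taken to be 32 bits wide, and "first" bits are read as the most significant bits. The length is taken to be in bytes, with 2 MB = 2 × 2^20. The helper names `encode_fig5` and `decode_fig5` are hypothetical. Only the 4-bit type codes and the example addresses come from the text.

```python
# Assumed layout, inferred from the FIG. 5 example: four 32-bit fields,
# field 1 = 4-bit type (top bits) + 28-bit length, 16-bit high addresses,
# 32-bit low addresses.
CACHE_INTERNAL = 0b0000   # data moved within the cache system
MEMORY_INTERNAL = 0b0001  # data moved within the memory system
MEMORY_TO_LLC = 0b0010    # memory system -> last-level cache
LLC_TO_MEMORY = 0b0011    # last-level cache -> memory system


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value:#x} does not fit in {bits} bits")


def encode_fig5(move_type: int, length: int, src_high: int, src_low: int,
                dst_high: int, dst_low: int) -> tuple[int, int, int, int]:
    """Pack the instruction into four 32-bit words (field 1 .. field 4)."""
    _check(move_type, 4, "move_type")
    _check(length, 28, "length")
    _check(src_high, 16, "src_high")
    _check(dst_high, 16, "dst_high")
    _check(src_low, 32, "src_low")
    _check(dst_low, 32, "dst_low")
    field1 = (move_type << 28) | length   # type in the top 4 bits
    field2 = src_low                      # source low address
    field3 = (dst_high << 16) | src_high  # dst high on top, src high below
    field4 = dst_low                      # destination low address
    return field1, field2, field3, field4


def decode_fig5(fields: tuple[int, int, int, int]) -> dict[str, int]:
    """Inverse of encode_fig5."""
    f1, f2, f3, f4 = fields
    return {
        "move_type": f1 >> 28,
        "length": f1 & ((1 << 28) - 1),
        "src_low": f2,
        "src_high": f3 & 0xFFFF,
        "dst_high": f3 >> 16,
        "dst_low": f4,
    }


# The example from the text: 2 MB from memory (0xAB_00 / 0x3EAB_0000)
# to the last-level cache (0xCD_00 / 0x3E5B_0000).
words = encode_fig5(MEMORY_TO_LLC, 2 * 1024 * 1024,
                    src_high=0xAB00, src_low=0x3EAB_0000,
                    dst_high=0xCD00, dst_low=0x3E5B_0000)
print([f"{w:#010x}" for w in words])
# ['0x20200000', '0x3eab0000', '0xcd00ab00', '0x3e5b0000']
```

Under these assumptions, the whole instruction is 128 bits. Both the source and destination addresses can span 48 bits (16 high + 32 low).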
- the data moving instruction thus needs to include at least a data movement type and data length field, a source storage address field, and a destination storage address field, which shortens the instruction and reduces the call overhead.
- the six fields in the data moving instruction shown in the related art may be combined, thereby reducing the number of fields included in the data moving instruction.
- FIG. 6 is a schematic diagram of a data moving instruction shown in the present application.
- as shown in FIG. 6, the data movement instruction includes at least a first field, a second field, a third field, and a fourth field;
- the first field indicates the data movement type and the data length;
- the second field indicates the storage address in the last-level cache;
- the third field indicates the low address of the memory system;
- the fourth field indicates the high address of the memory system.
- the above-mentioned second field indicates the starting address of the storage space of the last level cache.
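- when data is moved from the last-level cache to the memory system, the storage address indicated by the second field is the starting position of the data's current storage location.
- when data is moved from the memory system to the last-level cache, the storage address indicated by the second field is the starting position of the storage location to which the data is moved.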
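The FIG. 6 format can be sketched the same way. Here the second field is always the last-level-cache address, and the move type decides whether that address is the source or the destination. The sketch below is written in Python and rests on several hypothetical assumptions: the field widths are not given, so each field is taken to be 32 bits; field 1 is packed like the FIG. 5 example; the memory address is split into low and high 32-bit words; and the function names `encode_fig6` and `resolve_fig6` are invented for illustration.

```python
# Assumptions: four 32-bit fields; field 1 = 4-bit type + 28-bit length
# (as in the FIG. 5 example); memory address split into low/high words.
MEMORY_TO_LLC = 0b0010
LLC_TO_MEMORY = 0b0011
MASK32 = 0xFFFF_FFFF


def encode_fig6(move_type: int, length: int, llc_addr: int,
                mem_addr: int) -> tuple[int, int, int, int]:
    if not (0 <= move_type < 16 and 0 <= length < (1 << 28)):
        raise ValueError("type or length out of range")
    if not (0 <= llc_addr <= MASK32 and 0 <= mem_addr < (1 << 64)):
        raise ValueError("address out of range")
    field1 = (move_type << 28) | length
    # field 2: LLC address; field 3: memory low; field 4: memory high
    return field1, llc_addr, mem_addr & MASK32, mem_addr >> 32


def resolve_fig6(fields: tuple[int, int, int, int]):
    """Return (length, (src_space, src_addr), (dst_space, dst_addr))."""
    field1, llc_addr, mem_low, mem_high = fields
    move_type, length = field1 >> 28, field1 & ((1 << 28) - 1)
    mem_addr = (mem_high << 32) | mem_low
    if move_type == MEMORY_TO_LLC:   # second field is the destination
        return length, ("memory", mem_addr), ("llc", llc_addr)
    if move_type == LLC_TO_MEMORY:   # second field is the source
        return length, ("llc", llc_addr), ("memory", mem_addr)
    raise ValueError("this sketch only models moves between LLC and memory")
```

One consequence of this design is that a separate source high/destination high field is no longer needed. The cache address and memory address each occupy a fixed slot, and the type code alone assigns the source and destination roles.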
- the present application also proposes a data transfer method, which is applied to a chip.
- the processing core issues a data moving instruction to the DMA controller built into the memory partition, and the DMA controller responds to the instruction so that the data movement is completed inside the memory partition. This releases memory access bandwidth inside the chip, improves data transfer efficiency, and improves chip performance.
- FIG. 7 is a flowchart of a data transfer method shown in the present application, applied to a chip. As shown in FIG. 7, the method may include:
- the processing core sends a data transfer instruction to the DMA controller.
- the DMA controller performs data transfer between different storage spaces within the memory partition based on the data transfer instruction.
- the chip may have the chip structure shown in any of the above embodiments. In one embodiment, the chip may adopt the chip structure shown in FIG. 2. As shown in FIG. 2, the chip includes at least one processing core and at least one memory partition. The memory partition includes a cache system, a memory system, and a DMA controller, and the DMA controller is connected to both the cache system and the memory system.
- the memory partition may include a cache system with one or more levels of cache, at least one memory system, and one or more DMA controllers; this is not particularly limited herein.
- the aforementioned chip may execute artificial intelligence algorithms.
- the above chip may be an AI neural network chip or a GPU graphics processing chip.
- the processing core is usually a computing core of the chip and is used to execute program code.
- the processing core can usually perform data movement within the memory partition according to program code written by the developer.
- data transfer between storage spaces inside the memory partition may generally include: movement of data within the cache system of the memory partition, movement of data within the memory system of the memory partition, and data movement between the last-level cache and the memory system of the memory partition.
- the above memory partitions are usually used to store data.
- the chip usually adopts memory partitions with a hierarchical storage structure.
- the above-mentioned memory partition may include a cache system with one or more levels of caches and a memory system.
- the cache system described above may include at least L1, L2, and L3 caches.
- when the processing core needs to fetch data, it usually accesses the L1 cache first. If the required data is stored in the L1 cache, the processing core completes the data acquisition. If not, the processing core continues to the L2 cache, and so on. If the last-level cache, that is, the L3 cache, does not hold the required data either, the processing core fetches the data from the memory system.
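The lookup order just described can be modeled in a few lines of Python. This is a simplified sketch under hypothetical assumptions: each cache level is a dictionary keyed by address, and the function name `fetch` is illustrative. Line fill and replacement on a miss are omitted because the description does not specify them.

```python
from typing import Any


def fetch(address: int, caches: list[tuple[str, dict[int, Any]]],
          memory: dict[int, Any]) -> tuple[str, Any, list[str]]:
    """Probe cache levels in order (L1 first, last-level cache last),
    then fall back to the memory system.

    Returns (where the data was found, the data, levels probed in order)."""
    probed: list[str] = []
    for name, lines in caches:
        probed.append(name)
        if address in lines:          # hit at this level
            return name, lines[address], probed
    probed.append("memory")           # missed in every cache level
    return "memory", memory[address], probed
```

Given `caches = [("L1", {1: "a"}), ("L2", {2: "b"}), ("L3", {3: "c"})]`, a request for address 4 probes L1, L2, and L3 before reaching the memory system.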
- the last level cache can be used as the large-capacity cache
- the DMA controller is used to move data between the storage space of the last-level cache and the storage space of the memory system.
- when at least a part of the storage space of the cache system is configured as SPM (scratchpad memory), the data transfer efficiency of that part of the storage space is affected.
- at least a part of the storage space of the last level cache is configured as SPM.
- when performing data transfer, the DMA controller moves data between the storage space configured as SPM in the last-level cache and the memory system. Because this transfer is performed by the DMA controller, the data to be moved does not pass through the processing core. This releases bandwidth, shortens the data transfer path, and improves data movement efficiency.
- the last-level cache of the cache system supports three working modes. In the first working mode, the entire storage space of the last-level cache is configured as cache memory. In the second working mode, the entire storage space is configured as SPM. In the third working mode, one part of the storage space is configured as cache memory and another part is configured as SPM.
- the above-mentioned memory partition may further include a mode configurator.
- the above-mentioned mode configurator is configured to configure the working mode of the last-level cache in the above-mentioned cache system based on the user configuration information.
- the developer can configure the working mode of the last-level cache through the mode configurator based on the user configuration information.
- the entire storage space of the above-mentioned last-level cache can be configured as SPM.
- the entire storage space of the last level cache can be configured as a cache memory.
- part of the storage space of the last-level cache can be configured as cache memory and another part as SPM, for example to store AI operation parameters.
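Here is a minimal Python sketch of a mode configurator. It shows how the three working modes could split the last-level cache capacity. The user configuration information is assumed, hypothetically, to consist of a mode plus an SPM size in bytes. The names `LLCMode`, `LLCLayout`, and `configure_llc` are illustrative and not from the source.

```python
from dataclasses import dataclass
from enum import Enum


class LLCMode(Enum):
    ALL_CACHE = 1  # first working mode: whole LLC is cache memory
    ALL_SPM = 2    # second working mode: whole LLC is SPM
    SPLIT = 3      # third working mode: part cache memory, part SPM


@dataclass(frozen=True)
class LLCLayout:
    cache_bytes: int
    spm_bytes: int


def configure_llc(total_bytes: int, mode: LLCMode,
                  spm_bytes: int = 0) -> LLCLayout:
    """Derive the cache/SPM split from (hypothetical) user configuration."""
    if mode is LLCMode.ALL_CACHE:
        return LLCLayout(total_bytes, 0)
    if mode is LLCMode.ALL_SPM:
        return LLCLayout(0, total_bytes)
    if not 0 < spm_bytes < total_bytes:
        raise ValueError("split mode needs 0 < spm_bytes < total_bytes")
    return LLCLayout(total_bytes - spm_bytes, spm_bytes)
```

For example, an 8 MiB last-level cache in split mode with 2 MiB of SPM would leave 6 MiB of cache memory.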
- the above-mentioned memory system may be a global memory system.
- for example, it can be DRAM, SDRAM, or the like.
- the above-mentioned global memory system may be HBM.
- the above-mentioned DMA controller is used to perform data transfer between different storage spaces in the above-mentioned memory partition.
- the DMA controller may read data from the first storage space in the memory partition, and write the read data into the second storage space in the memory partition.
- the above-mentioned first storage space is a memory system
- the above-mentioned second storage space is an L2 cache.
- the DMA controller may control data transfer between the memory system and the L2 cache in response to a data transfer instruction sent by the processing core.
- the above-mentioned data movement instruction is specifically used to trigger data movement between storage spaces within the above-mentioned memory partitions.
- the above-mentioned data transfer instruction can be constructed by the processing core of the chip and sent to the DMA controller, so that the DMA controller controls the completion of the data transfer.
- when data movement needs to be performed between storage spaces within the memory partition, the processing core sends a data movement instruction to the DMA controller.
- in response to the data moving instruction, the DMA controller can control the data movement between storage spaces within the memory partition.
- the above-mentioned processing core sends a data transfer instruction to the above-mentioned DMA controller
- the above-mentioned DMA controller can control the data transfer between different storage spaces in the above-mentioned memory partition in response to the above-mentioned data transfer instruction.
- the data movement is thus completed within the memory partition. This releases memory access bandwidth inside the chip, improves data transfer efficiency, and improves chip performance.
- the chip may include multiple memory partitions, and in order to complete data migration in each memory partition, the processing core may send data movement instructions to the DMA controllers in the multiple memory partitions, so that each DMA The controller can control data movement within the memory partition where it is located.
- for example, if the chip includes 4 memory partitions, the processing core can send data movement instructions to the DMA controllers in the 4 memory partitions respectively. After receiving the instruction, each of these DMA controllers controls the data movement inside its own memory partition.
- when the chip includes multiple memory partitions, the memory partitions may all use UMA to make it easier for developers to write programs.
- the last level cache in the above multiple memory partitions and the memory system in the above multiple memory partitions can all use UMA.
- UMA can be used for the last level cache in the above-mentioned multiple memory partitions.
- the memory system in the above-mentioned multiple memory partitions may also employ UMA.
- with UMA, effective addresses are the same across different last-level caches and across different memory systems. Therefore, when writing data to each last-level cache or each memory system, only one address needs to be specified, and there is no need to write data separately for each last-level cache or memory system. This makes programming more convenient for developers and improves data storage efficiency.
- the above-mentioned processing core is configured to broadcast a data moving instruction to at least one DMA controller in the above-mentioned at least one memory partition.
- the processing core can broadcast and send a data movement instruction to the DMA controllers in the above-mentioned multiple memory partitions.
- for example, the chip includes 4 memory partitions, and the last-level caches in the 4 memory partitions (assuming the last-level cache is the L2 cache) and the memory systems in these memory partitions all use UMA.
- the processing core may broadcast and send a data moving instruction to the DMA controllers in the above-mentioned multiple memory partitions.
- after the DMA controllers in the 4 memory partitions receive the data transfer instruction, each can fetch 2 megabytes of data from the storage location in the memory system indicated by the instruction and move the 2 megabytes of data to the storage location in the L2 cache indicated by the instruction, completing the data move.
- because the processing core can broadcast data movement instructions to the DMA controllers in the four memory partitions to complete data movement within each partition, the number of calls the processing core makes to the DMA controllers is reduced, which lowers the call overhead to the DMA controllers.
- the present application also provides an electronic device, including the chip shown in any of the foregoing embodiments.
- the electronic device may be a smart terminal such as a mobile phone, or other devices that have a camera and can perform image processing.
- after the electronic device acquires a captured image, it can process the image, and the chip of the embodiments of the present application can be used to perform the computing tasks in that processing.
- the use of the chip can assist in improving the processing efficiency of computing tasks, thereby improving the performance of electronic equipment.
- one or more embodiments of the present application may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.
- Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this application and their structural equivalents, or in a combination of one or more of them.
- Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
- the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
- the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
- the processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
- Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing system.
- a central processing system will receive instructions and data from read-only memory and/or random access memory.
- the basic components of a computer include a central processing system for implementing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical, or optical disks, to receive data from them, transfer data to them, or both.
- however, a computer need not have such devices.
- the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
- the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Bus Control (AREA)
Claims (20)
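- A chip, comprising: at least one processing core and at least one memory partition; wherein, for each memory partition: the memory partition comprises a cache system, a memory system, and a direct memory access (DMA) controller; and the DMA controller is connected to the cache system and to the memory system respectively, and is used for data movement between different storage spaces inside the memory partition.
- The chip according to claim 1, wherein the DMA controller being used for data movement between different storage spaces inside the memory partition comprises being used for at least one of the following: data movement between different storage spaces of the cache system; data movement between different storage spaces within the memory system; and data movement between a storage space of the cache system and a storage space within the memory system.
- The chip according to claim 2, wherein the cache system comprises multiple levels of cache; and the DMA controller being used for data movement between a storage space of the cache system and a storage space within the memory system comprises the DMA controller being used for data movement between a storage space of the last-level cache and a storage space within the memory system.
- The chip according to claim 3, wherein the last-level cache supports three working modes: in a first working mode, the entire storage space of the last-level cache is configured as cache memory; in a second working mode, the entire storage space of the last-level cache is configured as scratchpad memory (SPM); and in a third working mode, one part of the storage space of the last-level cache is configured as cache memory and another part is configured as SPM.
- The chip according to claim 4, wherein the memory partition further comprises a mode configurator, and the mode configurator is used to configure the working mode of the last-level cache based on user configuration information.
- The chip according to any one of claims 1 to 5, wherein the at least one processing core and the DMA controller access each other through a main network-on-chip; or the DMA controller, the cache system, and the memory system access each other through a sub-network-on-chip.
- The chip according to any one of claims 1 to 6, wherein all or some of the different storage spaces in the memory partition adopt a unified memory architecture (UMA).
- The chip according to any one of claims 1 to 7, wherein a first processing core of the at least one processing core is used to send a data movement instruction to at least one first DMA controller, the at least one first DMA controller being included in at least one first memory partition; and the at least one first DMA controller is used to perform, based on the data movement instruction, data movement between different storage spaces inside the at least one first memory partition.
- The chip according to claim 8, wherein the first processing core being used to send the data movement instruction to the at least one first DMA controller comprises: the first processing core being used to broadcast the data movement instruction to at least one second DMA controller, the second DMA controller being included in a first memory partition in which all of the different storage spaces adopt UMA.
- The chip according to claim 8 or 9, wherein the data movement instruction comprises: a data movement type, a data length, a source storage address, and a destination storage address.
- The chip according to claim 10, wherein the data movement instruction comprises a first field, a second field, a third field, and a fourth field; the first field indicates the data movement type and the data length; the second field indicates the low address of the source storage address; the third field indicates the high address of the source storage address and the high address of the destination storage address; and the fourth field indicates the low address of the destination storage address.
- The chip according to any one of claims 1 to 11, wherein the DMA controller being used for data movement between different storage spaces inside the memory partition comprises being used to: read data from a first storage space in the memory partition, and write the read data into a second storage space in the memory partition.
- The chip according to any one of claims 1 to 12, wherein the memory system is a high bandwidth memory (HBM).
- A data movement method, applied to a chip, wherein the chip comprises at least one processing core and at least one memory partition, each memory partition comprising a cache system, a memory system, and a direct memory access (DMA) controller; the method comprises: for each memory partition, performing, by the DMA controller, data movement between different storage spaces inside the memory partition.
- The method according to claim 14, wherein the cache system comprises multiple levels of cache; and performing, by the DMA controller, data movement between different storage spaces inside the memory partition comprises: performing, by the DMA controller, data movement between a storage space of the last-level cache and a storage space within the memory system.
- The method according to claim 15, further comprising: configuring the working mode of the last-level cache based on user configuration information.
- The method according to any one of claims 14 to 16, wherein performing, by the DMA controller, data movement between different storage spaces inside the memory partition comprises: sending, by a first processing core of the at least one processing core, a data movement instruction to at least one first DMA controller, the at least one first DMA controller being included in at least one first memory partition; and performing, by the at least one first DMA controller based on the data movement instruction, data movement between different storage spaces inside the at least one first memory partition.
- The method according to claim 17, wherein sending, by the first processing core, the data movement instruction to the at least one first DMA controller comprises: broadcasting, by the first processing core, the data movement instruction to at least one second DMA controller, the second DMA controller being included in a first memory partition in which all of the different storage spaces adopt a unified memory architecture (UMA).
- The method according to any one of claims 14 to 18, wherein performing, by the DMA controller, data movement between different storage spaces inside the memory partition comprises: reading, by the DMA controller, data from a first storage space in the memory partition, and writing the read data into a second storage space in the memory partition.
- An electronic device, comprising: the chip according to any one of claims 1 to 13.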
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022527673A JP2023509818A (ja) | 2020-12-10 | 2021-06-22 | Chip, data migration method and electronic device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011458676.7A CN112506437A (zh) | 2020-12-10 | 2020-12-10 | Chip, data movement method and electronic device |
CN202011458676.7 | 2020-12-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022121278A1 (zh) | 2022-06-16 |
Family
ID=74973679
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/101547 WO2022121278A1 (zh) | 2020-12-10 | 2021-06-22 | Chip, data transfer method and electronic device |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP2023509818A (zh) |
CN (1) | CN112506437A (zh) |
WO (1) | WO2022121278A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115034376A (zh) * | 2022-08-12 | 2022-09-09 | 上海燧原科技有限公司 | Neural network processor, batch normalization processing method and storage medium |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
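CN112506437A (zh) * | 2020-12-10 | 2021-03-16 | 上海阵量智能科技有限公司 | Chip, data transfer method and electronic device |
CN113220346A (zh) * | 2021-04-29 | 2021-08-06 | 上海阵量智能科技有限公司 | Hardware circuit, data transfer method, chip and electronic device |
CN117529704A (zh) * | 2022-05-18 | 2024-02-06 | 深圳市韶音科技有限公司 | Signal transmission control system |
CN116308999B (zh) * | 2023-05-18 | 2023-08-08 | 南京砺算科技有限公司 | Data processing method for a graphics processor, graphics processor, and storage medium |
CN116610630B (zh) * | 2023-07-14 | 2023-11-03 | 上海芯高峰微电子有限公司 | Network-on-chip-based multi-core system and data transmission method |
CN117667828B (zh) * | 2024-01-31 | 2024-05-03 | 摩尔线程智能科技(北京)有限责任公司 | Network-on-chip integration method, device and storage medium |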
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101930357A (zh) * | 2010-08-17 | 2010-12-29 | 中国科学院计算技术研究所 | System and method for implementing memory access operations using a configurable on-chip storage device |
CN102521201A (zh) * | 2011-11-16 | 2012-06-27 | 刘大可 | Multi-core digital signal processor system-on-chip and data transmission method |
US8677081B1 (en) * | 2006-09-29 | 2014-03-18 | Tilera Corporation | Transferring and storing data in multicore and multiprocessor architectures |
CN108153190A (zh) * | 2017-12-20 | 2018-06-12 | 福建新大陆电脑股份有限公司 | Artificial intelligence microprocessor |
CN109739785A (zh) * | 2018-09-20 | 2019-05-10 | 威盛电子股份有限公司 | Interconnect structure for a multi-core system |
CN111797034A (zh) * | 2020-06-24 | 2020-10-20 | 深圳云天励飞技术有限公司 | Data management method, neural network processor and terminal device |
CN112506437A (zh) * | 2020-12-10 | 2021-03-16 | 上海阵量智能科技有限公司 | Chip, data transfer method and electronic device |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000051004A1 (en) * | 1999-02-22 | 2000-08-31 | Infineon Technologies Ag | Methods and apparatus for facilitating direct memory access |
US6859862B1 (en) * | 2000-04-07 | 2005-02-22 | Nintendo Co., Ltd. | Method and apparatus for software management of on-chip cache |
JP4204759B2 (ja) * | 2001-03-09 | 2009-01-07 | インターナショナル・ビジネス・マシーンズ・コーポレーション | DMA transfer control method and control device |
DE602004026823D1 (de) * | 2004-02-12 | 2010-06-10 | Irdeto Access Bv | Method and system for external storage of data |
CN101645052B (zh) * | 2008-08-06 | 2011-10-26 | 中兴通讯股份有限公司 | Fast DMA ping-pong buffering method |
JP2011086131A (ja) * | 2009-10-16 | 2011-04-28 | Mitsubishi Electric Corp | Data processing system |
US10078593B2 (en) * | 2011-10-28 | 2018-09-18 | The Regents Of The University Of California | Multiple-core computer processor for reverse time migration |
JP5776821B2 (ja) * | 2013-08-26 | 2015-09-09 | 富士ゼロックス株式会社 | Information processing device, arithmetic processing device, and program |
CN104298645A (zh) * | 2014-10-09 | 2015-01-21 | 深圳市国微电子有限公司 | Flexibly configurable programmable system-on-chip and boot configuration method therefor |
US9959227B1 (en) * | 2015-12-16 | 2018-05-01 | Amazon Technologies, Inc. | Reducing input/output latency using a direct memory access (DMA) engine |
CN107562659A (zh) * | 2016-06-30 | 2018-01-09 | 中兴通讯股份有限公司 | Data transfer device and method |
CN109933553B (zh) * | 2019-02-28 | 2020-09-29 | 厦门码灵半导体技术有限公司 | Control system and design method therefor, set of control systems, and electronic device |
CN110059024B (zh) * | 2019-04-19 | 2021-09-21 | 中国科学院微电子研究所 | Method and device for caching data in memory space |
CN111782154B (zh) * | 2020-07-13 | 2023-07-04 | 芯象半导体科技(北京)有限公司 | Data transfer method, device and system |
CN111739577B (zh) * | 2020-07-20 | 2020-11-20 | 成都智明达电子股份有限公司 | Efficient DSP-based DDR test method |
- 2020-12-10: CN application CN202011458676.7A (published as CN112506437A), active, pending
- 2021-06-22: JP application JP2022527673A (published as JP2023509818A), active, pending
- 2021-06-22: PCT application PCT/CN2021/101547 (published as WO2022121278A1), active, application filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
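CN115034376A (zh) * | 2022-08-12 | 2022-09-09 | 上海燧原科技有限公司 | Neural network processor, batch normalization processing method and storage medium |
CN115034376B (zh) * | 2022-08-12 | 2022-11-18 | 上海燧原科技有限公司 | Batch normalization processing method for a neural network processor, and storage medium |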
Also Published As
Publication number | Publication date |
---|---|
CN112506437A (zh) | 2021-03-16 |
JP2023509818A (ja) | 2023-03-10 |
Similar Documents
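Publication | Title |
---|---|
WO2022121278A1 (zh) | Chip, data transfer method and electronic device |
JP6817273B2 (ja) | Apparatus and method for providing cache migration with a non-volatile mass memory system |
CN112035381B (zh) | Storage system and stored-data processing method |
US20180239722A1 (en) | Allocation of memory buffers in computing system with multiple memory channels |
US11687276B2 (en) | Data streaming for computational storage |
US9569381B2 (en) | Scheduler for memory |
KR20050051672A (ko) | Method and memory controller for scalable multi-channel memory access |
CN103210378A (zh) | Low-power audio decoding and playback using cached images |
CN106775477B (zh) | SSD controller data transmission management device and method |
CN114490433A (zh) | Storage space management method, data processing chip, device and storage medium |
CN113033785A (zh) | Chip, neural network training system, memory management method and apparatus, and device |
WO2022227563A1 (zh) | Hardware circuit, data transfer method, chip and electronic device |
TWI471731B (zh) | Memory access method, memory access control method, SPI flash memory device, and SPI controller |
CN116483553A (zh) | Computing device, data processing method, system and related device |
CN107025190B (zh) | System and operating method thereof |
JP2009037639A (ja) | DMAC issue mechanism using a streaming ID method |
WO2023142114A1 (zh) | Data processing method, apparatus and electronic device |
CN117312201B (zh) | Data transmission method and apparatus, accelerator device, host and storage medium |
KR20220077863A (ko) | System and method for exchanging data between a host and a controller using a local bus |
CN117666944A (zh) | Method and storage device for performing data processing functions |
JP2023527770A (ja) | Inference in memory |
CN113176911A (zh) | Configuration method, data processing method, chip and electronic device |
CN117033283A (zh) | Device and method for dynamic compression of AXI bus IDs |
CN116795742A (zh) | Storage device, and information storage method and system |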
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2022527673; Country of ref document: JP; Kind code of ref document: A |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21901990; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 21901990; Country of ref document: EP; Kind code of ref document: A1 |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Noting of loss of rights pursuant to Rule 112(1) EPC (EPO Form 1205A dated 24.11.2023) |