WO2021139733A1 - Memory allocation method and device, and computer readable storage medium

Info

Publication number
WO2021139733A1
WO2021139733A1 PCT/CN2021/070708 CN2021070708W WO2021139733A1 WO 2021139733 A1 WO2021139733 A1 WO 2021139733A1 CN 2021070708 W CN2021070708 W CN 2021070708W WO 2021139733 A1 WO2021139733 A1 WO 2021139733A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
memory
memory allocation
allocation
interleaving
Application number
PCT/CN2021/070708
Other languages
French (fr)
Chinese (zh)
Inventor
李健
张晓铮
Original Assignee
中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021139733A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646 Configuration or reconfiguration
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871 Allocation or management of cache space
    • G06F12/0877 Cache access modes
    • G06F12/0882 Page mode

Definitions

  • This disclosure generally relates to the field of computers. More specifically, the present disclosure relates to methods, devices, and computer-readable storage media for allocating memory.
  • Double data rate synchronous dynamic random access memory (DDR SDRAM) is increasingly widely used in today's computers. It is equipped with multi-channel memory control technology, which can effectively increase the total memory bandwidth to meet the data transmission and processing needs of high-speed processors.
  • Some multi-channel DDR technologies cannot implement interleaved memory allocation between channels; some cannot implement memory-block-level interleaving within a channel; and some can only connect two channels in parallel and therefore have limited access bandwidth. How to allocate memory efficiently is thus still a problem to be solved in the prior art.
  • The solution of the present disclosure provides a method, a device, and a computer-readable storage medium for allocating memory.
  • The present disclosure provides a method for allocating memory, including: receiving a memory allocation request for a multi-channel DDR; and, according to the memory allocation request, performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR, and performing intra-channel interleaved memory allocation on each channel where memory is allocated.
  • The present disclosure provides a device for performing data read and write operations, including: a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel where memory is allocated.
  • The present disclosure provides a computer-readable storage medium on which computer program code for allocating memory is stored.
  • When the computer program code is run by a processor, the aforementioned method can be executed.
  • Memory can be allocated in an interleaved manner across multiple channels and multiple memory blocks, which greatly improves access bandwidth.
  • FIG. 1 is a schematic diagram illustrating the multi-channel interleaving scheme of the present disclosure by taking two channels as an example.
  • FIG. 2 is a schematic diagram showing two adjacent memory blocks in the same channel according to an embodiment of the present disclosure.
  • FIG. 3 is a flowchart showing a method for allocating memory according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram showing multi-channel interleaving according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram showing interleaving of multiple memory blocks according to an embodiment of the present disclosure.
  • FIG. 6 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure.
  • FIG. 7 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram showing multi-channel interleaving between two adjacent channels according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram showing interleaving of multiple memory blocks according to another embodiment of the present disclosure.
  • FIG. 11 is an architecture diagram showing a device for allocating memory according to another embodiment of the present disclosure.
  • FIG. 12 is a structural diagram showing an integrated circuit device according to an embodiment of the present disclosure.
  • FIG. 13 is a structural diagram showing a board card according to an embodiment of the present disclosure.
  • Memory with multiple channels and multiple memory blocks can be allocated in an interleaved manner between channels and between memory blocks, thereby increasing access bandwidth and parallelism.
  • DDR can support multi-channel access.
  • Interleaving DDR across multiple channels distributes the same block of memory to different channels, which lets multiple DDR channels supply bandwidth in parallel and improves memory access performance. If the data is distributed over memory blocks on different channels, the memory controller can read it in parallel through those channels. For a four-channel DDR, the access speed is almost quadrupled.
  • Distributing the same block of memory to different channels, so-called interleaving, allows that block to be accessed in parallel, which improves the efficiency of the system.
  • FIG. 1 uses two channels as an example to illustrate the multi-channel interleaving scheme of the present disclosure.
  • This embodiment has a first channel 102 and a second channel 104, and each channel contains a considerable number of page sizes.
  • Only 32 page sizes 106 are shown for each channel, and each page size is 16KB. This size is sufficient as the granularity of inter-channel interleaving and does not affect the normal jump instructions of the upper layer.
  • Multi-memory-block interleaved access is performed within the same channel.
  • Each computing unit can access only the memory blocks in the same channel.
  • FIG. 2 shows two adjacent memory blocks in the same channel: the first memory block 202 and the second memory block 204.
  • Memory block interleaved access is performed across two adjacent memory blocks.
  • Each memory block shows 8 address spaces, with addresses addr0 to addr7.
  • The address space is smaller than the page size.
  • The address space can be 1KB; that is, when two memory blocks are accessed in an interleaved manner within the same channel, the unit of memory for each access is one address space. For example, 0x000 is allocated to address addr0 of the first memory block 202; 0x400 to address addr0 of the second memory block 204; 0x800 to address addr1 of the first memory block 202; and 0xc00 to address addr1 of the second memory block 204. Allocation continues in this way until all page sizes are allocated.
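  • The two-block example above can be sketched in a few lines of Python. This is only an illustration under the stated 1KB granularity; the function name `locate` and the zero-based block index are assumptions made for the sketch, not names from the disclosure.

```python
ADDRESS_SPACE = 0x400  # 1KB intra-channel granularity, as in the example above
NUM_BLOCKS = 2         # two adjacent memory blocks (202 and 204)


def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset within the channel to (block index, addr slot).

    Block index 0 stands for memory block 202 and 1 for memory block 204.
    """
    unit = offset // ADDRESS_SPACE  # index of the 1KB unit being accessed
    block = unit % NUM_BLOCKS       # alternate between the two blocks
    slot = unit // NUM_BLOCKS       # the N in addrN inside the chosen block
    return block, slot


if __name__ == "__main__":
    for offset in (0x000, 0x400, 0x800, 0xC00):
        block, slot = locate(offset)
        print(f"{offset:#05x} -> block index {block}, addr{slot}")
```

  • Consecutive 1KB units alternate between the two blocks, so a sequential access stream keeps both blocks busy.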
  • With multi-memory-block interleaved access, each computing unit can access only the memory blocks in its own channel, which reduces access to remote channels and also reduces access conflicts among computing units across different channels.
  • An embodiment of the present disclosure is a method for allocating memory, used in particular when the host side issues a command to the accelerator card device; the command may be a dynamic memory allocation (malloc) function.
  • The memory of this embodiment has 4 channels in total, and allocation is interleaved among these 4 channels.
  • The flow of the allocation method is shown in FIG. 3.
  • In step 302, a memory allocation request for a 4-channel DDR is received. In more detail, a command of the dynamic memory allocation function is received.
  • The command can dynamically allocate a part of the memory in the accelerator card device. The device side first aligns the request to the specified size and then applies for a piece of physical memory.
  • In step 304, inter-channel interleaved memory allocation is performed on the 4-channel DDR.
  • The device side evenly allocates memory space to the 4 channels based on the page size.
  • FIG. 4 is a schematic diagram showing this step.
  • The 4-channel DDR of this embodiment includes a first channel 402, a second channel 404, a third channel 406, and a fourth channel 408. Each channel includes multiple address spaces; for convenience of illustration, each channel in the figure shows only 32 address spaces, with addresses from addr00 to addr31 (not shown in the figure), and each address space is 8KB. If the page size is 16KB, every 2 address spaces form one page size.
  • When performing inter-channel interleaved memory allocation, the device cyclically allocates memory on each channel with a page-sized inter-channel granularity. For example, at 0x000, the first page size 410 of the first channel 402 is allocated, with addresses addr00 and addr01; at 0x400, the first page size 412 of the second channel 404, with addresses addr00 and addr01; at 0x800, the first page size 414 of the third channel 406, with addresses addr00 and addr01; at 0xc00, the first page size 416 of the fourth channel 408, with addresses addr00 and addr01; and at 0x1000, the second page size 418 of the first channel 402, with addresses addr02 and addr03. Allocation continues in this way.
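  • The round-robin placement of pages across the 4 channels can be sketched as follows. The sketch works on page indices (the n-th page allocated) rather than byte offsets, and the name `place_page` is a hypothetical helper introduced only for illustration.

```python
NUM_CHANNELS = 4     # first to fourth channel (402, 404, 406, 408)
SPACES_PER_PAGE = 2  # one 16KB page size = two 8KB address spaces


def place_page(page_index: int) -> tuple[int, list[str]]:
    """Return (channel number, address spaces) for the n-th allocated page."""
    channel = page_index % NUM_CHANNELS + 1  # cycle through channels 1..4
    slot = page_index // NUM_CHANNELS        # position of the page in the channel
    first = slot * SPACES_PER_PAGE
    spaces = [f"addr{first + i:02d}" for i in range(SPACES_PER_PAGE)]
    return channel, spaces


if __name__ == "__main__":
    for n in range(5):
        channel, spaces = place_page(n)
        print(f"page {n}: channel {channel}, {' and '.join(spaces)}")
```

  • The fifth page wraps around to the first channel and takes addr02 and addr03, matching the second page size 418 in the description.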
  • Because the page size is 16KB, space is still assigned at the granularity of a complete page size for alignment during allocation. If the required space is 20KB, space is assigned at a granularity of 2 page sizes.
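  • The alignment rule amounts to rounding the request up to a whole number of page sizes. A minimal sketch, with `pages_needed` and `aligned_size_kb` as hypothetical helper names:

```python
PAGE_SIZE_KB = 16  # inter-channel granularity stated in the embodiment


def pages_needed(request_kb: int) -> int:
    """Round a request up to a whole number of page sizes (ceiling division)."""
    if request_kb <= 0:
        raise ValueError("request size must be positive")
    return -(-request_kb // PAGE_SIZE_KB)


def aligned_size_kb(request_kb: int) -> int:
    """Size actually reserved after alignment to the page size."""
    return pages_needed(request_kb) * PAGE_SIZE_KB


if __name__ == "__main__":
    print(pages_needed(20), aligned_size_kb(20))
```

  • For the 20KB request in the text this gives 2 page sizes, that is, 32KB reserved.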
  • In step 306, intra-channel interleaved memory allocation is performed on each channel where memory is allocated.
  • Interleaved memory allocation can be performed cyclically on two adjacent memory blocks at an intra-channel granularity smaller than the page size; for example, interleaved allocation is performed in units of address space.
  • FIG. 5 takes the first channel 402 of FIG. 4 as an example. The channel can be divided into 4 memory blocks: the first memory block 502, the second memory block 504, the third memory block 506, and the fourth memory block 508. Assuming that the first 4 page sizes of the first memory block 502 have been allocated and occupied in step 304 (shown as the gray block), the memory block interleaved allocation of this step is executed in the first memory block 502 and the second memory block 504.
  • Allocation goes first to the address space 510 of the first memory block 502; at 0x400, to the address space 512 of the second memory block 504; at 0x800, to the address space 514 of the first memory block 502; and then to the address space 516 of the second memory block 504, in the order of the dotted arrows in the figure. Allocation repeats in this way until the requested memory is completely allocated.
  • The memory of another embodiment of the present disclosure also has 4 channels, and allocation is interleaved among the 4 channels. The flow of the allocation method is shown in FIG. 6.
  • In step 602, a memory allocation request for a 4-channel DDR is received.
  • In step 604, it is determined whether the number of computing units involved in this command of the dynamic memory allocation function is one.
  • If so, step 606 is executed to perform inter-channel interleaved memory allocation on the 4-channel DDR.
  • This operation is the same as that of step 304 in the previous embodiment and is not repeated here.
  • If it is determined in step 604 that more than one computing unit is involved in the command, it is more efficient to combine multi-channel and multi-memory-block interleaved allocation, so step 608 is executed to perform inter-channel and intra-channel interleaved memory allocation.
  • The operation is the same as in the embodiment of FIG. 3 and is not described again.
  • Step 608 can also perform only memory block interleaved access.
  • The present disclosure can assign each channel to one computing unit, so that each computing unit accesses only the memory blocks in its own channel, which reduces access to remote channels and also reduces access conflicts among computing units across different channels.
  • This embodiment can perform interleaved access between channels and between memory blocks.
  • When a single computing unit accesses memory, it can use the bandwidth of multi-channel interleaving.
  • When multiple computing units access memory, each computing unit can access the memory blocks of a specific channel and/or use multi-channel interleaving, which improves both access bandwidth and parallelism.
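  • The decision made in step 604 can be sketched as a small selection function. The enum and function names are assumptions for illustration; the sketch covers only the two branches named for steps 606 and 608.

```python
from enum import Enum


class Strategy(Enum):
    INTER_CHANNEL = "inter-channel interleaving (step 606)"
    INTER_AND_INTRA = "inter-channel and intra-channel interleaving (step 608)"


def choose_strategy(num_computing_units: int) -> Strategy:
    """Pick an interleaving strategy from the number of computing units involved."""
    if num_computing_units < 1:
        raise ValueError("at least one computing unit must be involved")
    if num_computing_units == 1:
        # A single unit benefits most from spreading pages over all channels.
        return Strategy.INTER_CHANNEL
    # Several units: combine channel-level and memory-block-level interleaving.
    return Strategy.INTER_AND_INTRA


if __name__ == "__main__":
    for units in (1, 4):
        print(units, "->", choose_strategy(units).value)
```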
  • A single image computing unit is used to send a cluster access to the DDR channels for testing.
  • The cluster has a total of 64 segments, each 16KB in size. Without any interleaving, the allocation took 55876 microseconds, at a bandwidth of 18.76GB/s; with interleaving only between channels, it took 71080 microseconds, at 14.756GB/s; with the multi-channel and multi-memory-block interleaving of the present disclosure, it took 44057 microseconds, at 23.8GB/s, which is about 27% more bandwidth than with no interleaving.
  • Another embodiment of the present disclosure is a method for allocating memory.
  • The memory in this embodiment has 4 channels in total, and interleaved allocation is performed between two channels.
  • The flow of the allocation method is shown in FIG. 7.
  • In step 702, a plurality of service instructions to be allocated to the 4-channel DDR are received.
  • The service instructions include service data that needs to be stored in the accelerator card device.
  • The service instructions include first service data, second service data, third service data, and fourth service data, each 30KB in size.
  • In step 704, a memory allocation request for the 4-channel DDR is received.
  • A command of the dynamic memory allocation function is received, which requires the foregoing 4 service data to be stored in the memory of the accelerator card device.
  • In step 706, the service instructions are allocated to the 4 channels one by one at the inter-channel granularity.
  • The inter-channel granularity (i.e., page size) of the DDR in this embodiment is 16KB, so each service data requires 2 page sizes of space. Each service data is therefore split into first sub-data (16KB) and second sub-data (14KB) and distributed to the 4 channels according to the following table.
  • In step 708, inter-channel interleaved memory allocation is performed on 2 channels of the 4-channel DDR.
  • Interleaving between channels is performed in units of page size, and allocation occurs only between two adjacent channels.
  • The DDR in this embodiment includes a first channel 802, a second channel 804, a third channel 806, and a fourth channel 808, and each channel includes multiple page sizes.
  • The first service data and the second service data are interleaved between the first channel 802 and the second channel 804, and the third service data and the fourth service data are interleaved between the third channel 806 and the fourth channel 808.
  • The first sub-data of the first service data is allocated to the first page size 810 of the first channel 802, and its second sub-data to the first page size 812 of the second channel 804. The first sub-data of the second service data is allocated to the second page size 814 of the first channel 802, and its second sub-data to the second page size 816 of the second channel 804. The first sub-data of the third service data is allocated to the first page size 818 of the third channel 806, and its second sub-data to the first page size 820 of the fourth channel 808. The first sub-data of the fourth service data is allocated to the second page size 822 of the third channel 806, and its second sub-data to the second page size 824 of the fourth channel 808.
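  • The splitting and pairwise placement described for steps 706 and 708 can be sketched as below. The helper names `split_service_data` and `place_sub_data` are hypothetical, and the sketch assumes, as in this example, that every service data splits into exactly two sub-data.

```python
PAGE_KB = 16  # inter-channel granularity (page size) of this embodiment


def split_service_data(size_kb: int) -> list[int]:
    """Split one service data into page-sized sub-data (the last may be smaller)."""
    parts = []
    while size_kb > 0:
        parts.append(min(PAGE_KB, size_kb))
        size_kb -= PAGE_KB
    return parts


def place_sub_data(service_index: int, sub_index: int) -> tuple[int, int]:
    """Return (channel number, page number in that channel), both 1-based.

    service_index is 0..3 for the first to fourth service data;
    sub_index is 0 or 1 for the first or second sub-data.
    """
    pair = service_index // 2           # 0 -> channels 1 and 2, 1 -> channels 3 and 4
    channel = pair * 2 + sub_index + 1  # interleave inside the channel pair
    page = service_index % 2 + 1        # first or second page size in the channel
    return channel, page


if __name__ == "__main__":
    print(split_service_data(30))
    for s in range(4):
        for d in range(2):
            print(f"service data {s + 1}, sub-data {d + 1} ->", place_sub_data(s, d))
```

  • Restricting each service data to one pair of adjacent channels keeps the interleaving local to two channels while still spreading the four service data over all four.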
  • In step 710, intra-channel interleaved memory allocation is performed on each channel where memory is allocated. For example, when there are 4 computing units, each channel can be assigned to one computing unit.
  • In step 712, within each channel, the allocated service instructions are allocated to the memory blocks one by one at the intra-channel granularity.
  • Memory block interleaved allocation in this step can be performed in the manner shown in FIG. 5 and is not repeated here.
  • Different interleaved allocation methods may be adopted for different numbers of computing units. For example, after step 704, a step may be added to determine whether the number of computing units involved in this command of the dynamic memory allocation function is one. If it is one, multi-channel interleaved allocation is more efficient, and interleaved memory allocation between two channels is performed on the 4-channel DDR. If it is more than one, combining multi-channel and multi-memory-block interleaved allocation is more efficient, so inter-channel and intra-channel interleaved memory allocation is performed.
  • The operation is similar to the embodiments of FIG. 3 and FIG. 6, and those skilled in the art can readily implement it based on the description of those embodiments, so the details are not repeated.
  • Another embodiment of the present disclosure is a method for allocating memory of 4 channels.
  • Allocation is performed among the 4 channels.
  • The flow of the allocation method is shown in FIG. 9.
  • In step 902, a memory allocation request for a 4-channel DDR is received.
  • In step 904, inter-channel memory allocation is performed on the 4 channels.
  • When performing inter-channel interleaved memory allocation, the device cyclically allocates memory on each channel at the inter-channel granularity of one page size.
  • FIG. 10 shows two adjacent memory blocks of a single channel 1002: the first memory block 1004 and the second memory block 1006.
  • After step 904, it is assumed that part of the space of the first memory block 1004 has been allocated, namely the occupied space 1008 shown in gray.
  • In step 906, intra-channel interleaved memory allocation is performed on each channel where memory is allocated.
  • The memory assigned at inter-channel granularity is allocated piece by piece based on the unallocated space on the channel, and unused memory blocks are preferred for interleaved allocation.
  • The second memory block 1006 is the main memory area. Since the first memory block 1004 already participated in the allocation in step 904, it is used only as a spare memory area.
  • When the spare memory area (the first memory block 1004) is additionally used for memory allocation, the allocation takes place in the unoccupied space of the first memory block 1004.
  • This embodiment can perform interleaved access between channels and between memory blocks; it not only benefits from the bandwidth of multi-channel interleaving but also makes good use of the space of each memory block.
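  • The preference for unused memory blocks, with a partly occupied block kept as a spare, can be sketched as follows. This is a simplified illustration: the unit counts, the dictionary layout, and the name `plan_allocation` are all assumptions, and the sketch shows only the ordering of blocks, not the interleaving itself.

```python
def plan_allocation(request_units: int, blocks: dict[str, dict[str, int]]) -> dict[str, int]:
    """Decide how many units to take from each memory block of one channel.

    `blocks` maps a block name to {"used": units already occupied,
    "free": units still available}. Blocks with nothing occupied form the
    main area and are tried first; partly occupied blocks serve as spare.
    """
    # sorted() is stable, so blocks keep their given order within each group.
    order = sorted(blocks, key=lambda name: blocks[name]["used"] > 0)
    plan: dict[str, int] = {}
    remaining = request_units
    for name in order:
        take = min(remaining, blocks[name]["free"])
        if take:
            plan[name] = take
            remaining -= take
    if remaining:
        raise MemoryError("not enough free space in this channel")
    return plan


if __name__ == "__main__":
    channel_1002 = {
        "block_1004": {"used": 3, "free": 5},  # partly occupied in step 904
        "block_1006": {"used": 0, "free": 8},  # untouched, main memory area
    }
    print(plan_allocation(6, channel_1002))
    print(plan_allocation(10, channel_1002))
```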
  • Another embodiment of the present disclosure is a computer-readable storage medium on which computer program code for allocating memory is stored.
  • When the computer program code is run by a processor, the method of the foregoing embodiments can be executed, for example, the technical solutions shown in FIG. 3, FIG. 6, FIG. 7, and FIG. 9.
  • FIG. 11 shows a system 1100 for allocating memory in another embodiment of the present disclosure.
  • The system 1100 includes a master device 1102 and a device 1104, and the master device 1102 may be a host.
  • The device 1104 can be an accelerator card, which includes multiple computing units 1106, a transceiver 1108, a processor 1110, a buffer 1112, and a multi-channel DDR 1114.
  • The four computing units 1106 in the figure are an example, and the number of computing units 1106 is not limited to four; similarly, the 4 DDR 1114 in the figure are an example, and the number of DDR 1114 is not limited to four.
  • The transceiver 1108 is configured to receive a memory allocation request from the master device 1102 that forms a master-slave relationship with the device 1104; the processor 1110 is configured to perform the following memory allocation operations on the multi-channel DDR 1114 according to the received memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR 1114, and performing intra-channel interleaved memory allocation on each channel where memory is allocated.
  • The processor 1110 contains a system memory management unit (SMMU); the buffer 1112 may be a last-level cache (LLC), configured to implement intra-channel interleaved memory allocation; and the multi-channel DDR 1114 is configured to store data.
  • The processor 1110 receives the command of the dynamic memory allocation function from the master device 1102; when memory space is applied for, the target is a memory address in the device 1104.
  • After the transceiver 1108 receives the memory request, the processor 1110 converts it into a physical address to obtain the real memory address on the DDR, and the memory is allocated in an interleaved manner on the multi-channel DDR 1114 according to that address.
  • This embodiment uses the processor 1110 and the buffer 1112 to implement the memory request.
  • The processor 1110 implements interleaved allocation among the channels of the multi-channel DDR 1114 through the driver, and implements interleaved allocation among multiple memory blocks within a channel through the buffer 1112.
  • Memory address allocation requires the participation of the memory management module in the driver and the system memory management unit.
  • The memory management module is responsible for aligning the requested size and, based on the channel information of the request (such as single-channel or multi-channel), allocating physical memory on the channels of the DDR 1114.
  • The system memory management unit is responsible for managing virtual addresses, and manages the mapping between the requested physical addresses and virtual addresses by setting up the page table.
  • Each multi-channel DDR 1114 includes a DDR controller (not shown) connected to the buffer 1112. After the processor 1110 sends the physical address to the buffer 1112, it applies for a corresponding virtual address according to its virtual address management, establishes the mapping between the physical address and the virtual address by setting up the page table, and then sends the virtual address related to this memory allocation request to the master device 1102 through the transceiver 1108.
  • After the master device 1102 receives the virtual address, it can access the memory.
  • The master device 1102 sends the data to the transceiver 1108 according to the returned virtual address.
  • The processor 1110 converts the virtual address into a physical address, and the data is routed through the buffer 1112 to the real memory address on the multi-channel DDR 1114, with interleaved allocation between multiple channels and/or between multiple memory blocks.
  • The processor 1110 is configured to handle memory requests from the master device 1102 through the transceiver 1108. After receiving a virtual address from the master device 1102, the processor 1110 converts it into a physical address, retrieves the data stored in the multi-channel DDR 1114, and sends it to the master device 1102 through the transceiver 1108.
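  • The role of the page table in this flow can be illustrated with a toy model: the master device sees one contiguous virtual range, while each virtual page is backed by a physical page on some channel. The class `PageTable`, its methods, and the zero-based channel numbers are hypothetical and greatly simplified; they are not the SMMU's actual interface.

```python
PAGE_SIZE = 16 * 1024  # 16KB page size, as in the embodiments above


class PageTable:
    """Toy mapping from virtual pages to (channel, physical page)."""

    def __init__(self) -> None:
        self._entries: dict[int, tuple[int, int]] = {}
        self._next_virtual_page = 0

    def map_page(self, channel: int, physical_page: int) -> int:
        """Record one mapping and return the virtual base address handed back."""
        vpn = self._next_virtual_page
        self._entries[vpn] = (channel, physical_page)
        self._next_virtual_page += 1
        return vpn * PAGE_SIZE

    def translate(self, virtual_address: int) -> tuple[int, int]:
        """Return (channel, physical byte offset within that channel)."""
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        channel, physical_page = self._entries[vpn]
        return channel, physical_page * PAGE_SIZE + offset


if __name__ == "__main__":
    table = PageTable()
    # Four consecutive virtual pages backed by page 0 of channels 0..3.
    bases = [table.map_page(channel, 0) for channel in range(4)]
    print([hex(b) for b in bases])
    print(table.translate(bases[1] + 100))
```

  • Because translation happens per page, consecutive virtual pages can land on different channels without the master device being aware of the interleaving.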
  • When the transceiver 1108 receives a memory allocation request for a 4-channel DDR, the processor 1110 decides based on the state of the computing units 1106. If multi-channel interleaved allocation is more efficient, for example when there is only one computing unit 1106, the processor 1110 chooses to perform inter-channel interleaved memory allocation on the 4-channel DDR, converts the virtual address into a physical address, and routes the data through the buffer 1112 to the real memory address on the multi-channel DDR 1114 for multi-channel interleaved allocation. The processor 1110 may then continue with multi-memory-block interleaved allocation, performing intra-channel interleaved memory allocation on each channel where memory is allocated.
  • The architecture of this embodiment can implement the technical solutions shown in FIG. 3, FIG. 6, FIG. 7, and FIG. 9. Those skilled in the art can readily understand the technical details without creative effort, so they are not repeated.
  • This embodiment can perform interleaved access between channels and between memory blocks.
  • When a single computing unit accesses memory, it can use the bandwidth of multi-channel interleaving.
  • When multiple computing units access memory simultaneously, each computing unit accesses the memory of a specific channel, which reduces both access to remote channels and access conflicts between computing units.
  • FIG. 12 is a structural diagram showing an integrated circuit device 1200 according to an embodiment of the present disclosure.
  • The integrated circuit device 1200 includes a master device 1202, and the master device 1202 may be the master device 1102 in FIG. 11.
  • The integrated circuit device 1200 further includes a universal interconnection interface 1204 and a device 1206, and the device 1206 may be the device 1104 in FIG. 11.
  • The master device 1202 may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor, or an artificial intelligence processor; their number is not limited and is determined according to actual needs.
  • The universal interconnection interface 1204 can be used to transmit data and control commands between the master device 1202 and the device 1206.
  • The master device 1202 may obtain the required input data from the device 1206 via the universal interconnection interface 1204 and write the input data to the on-chip storage device of the master device 1202.
  • The master device 1202 can obtain a control command from the device 1206 via the universal interconnection interface 1204 and write it into the on-chip control buffer of the master device 1202.
  • The universal interconnection interface 1204 may also read the data in the storage module of the master device 1202 and transmit it to the device 1206.
  • The integrated circuit device 1200 may further include a storage device 1208, which may be connected to the master device 1202 and the device 1206 respectively.
  • The storage device 1208 may be used to store data of the master device 1202 and the device 1206, and is particularly suitable for data required for computation that cannot be fully held in the internal storage of the master device 1202 or the device 1206.
  • The integrated circuit device 1200 of the present disclosure can be used as a system on chip (SoC) for equipment such as mobile phones, robots, drones, and video capture equipment, thereby effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption.
  • The universal interconnection interface 1204 of the integrated circuit device 1200 is connected to certain components of the equipment, for example a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
  • The present disclosure also discloses a chip or integrated circuit chip, which includes the integrated circuit device 1200. In other embodiments, the present disclosure also discloses a chip packaging structure, which includes the above-mentioned chip.
  • the present disclosure also discloses a board card, which includes the above-mentioned chip packaging structure.
  • FIG. 13 shows the aforementioned exemplary board card 1300.
  • the board card 1300 may also include other supporting components.
  • the supporting components may include, but are not limited to: a storage device 1304, an interface device 1306, and a control device 1308.
  • the storage device 1304 is connected to the chip 1302 in the chip packaging structure through a bus 1314 for storing data.
  • the storage device 1304 may include multiple sets of memories 1310. Each group of memories 1310 and chip 1302 are connected through a bus 1314.
  • Each group of memories 1310 may be DDR SDRAM ("Double Data Rate SDRAM", double-rate synchronous dynamic random access memory).
  • the storage device 1304 may include 4 sets of memories 1310.
  • Each group of memory 1310 may include multiple DDR4 particles (chips).
  • the chip 1302 may include four 72-bit DDR4 controllers inside. Among the 72-bit DDR4 controllers, 64 bits are used for data transmission and 8 bits are used for ECC verification.
  • the DDR4 controller may be the aforementioned processor 1110.
  • each group of memories 1310 may include a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip 1302 to control the data transmission and data storage of each memory 1310.
  • the chip 1302 and the memory 1310 can be interleaved and allocated in the manner as in the foregoing embodiment.
  • the interface device 1306 is electrically connected to the chip 1302 in the chip packaging structure.
  • the interface device 1306 is used to implement data transmission between the chip 1302 and an external device 1312 (for example, a server or a computer).
  • the interface device 1306 may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip 1302 through a standard PCIE interface to realize data transfer.
  • the interface device 1306 may also be other interfaces.
  • the present disclosure does not limit the specific manifestations of the other interfaces mentioned above, as long as it can realize the switching function.
  • the calculation result of the chip 1302 is still transmitted back to the external device 1312 by the interface device 1306.
  • the control device 1308 is electrically connected to the chip 1302 to monitor the state of the chip 1302. Specifically, the chip 1302 and the control device 1308 may be electrically connected through an SPI interface.
  • the control device 1308 may include a single-chip microcomputer ("MCU", Micro Controller Unit).
  • the chip 1302 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the chip 1302 can be in different working states such as heavy load and light load.
  • the control device 1308 can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip 1302.
  • the present disclosure also discloses electronic equipment or a device, which includes the board card 1300 described above.
  • the electronic equipment or device can include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, transportation means, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance, B-ultrasound and/or electrocardiograph.
  • Clause A1 A method for allocating memory, including: receiving a memory allocation request for a multi-channel DDR; and, according to the memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and, on each channel where memory is allocated, performing intra-channel interleaved memory allocation.
  • Clause A2 the method according to clause A1, wherein performing the memory allocation of inter-channel interleaving includes performing memory allocation of inter-channel interleaving on every two channels of the multi-channel DDR.
  • Clause A3 The method of clause A1, wherein performing the memory allocation for inter-channel interleaving includes performing memory allocation for inter-channel interleaving for the plurality of channels based on page size.
  • Clause A6 The method according to clause A3, wherein performing intra-channel interleaving memory allocation includes interleaving the memory allocated on each channel based on the page size to multiple memory blocks in the channel.
  • interleaved allocation to a plurality of memory blocks in a channel includes cyclically performing interleaved memory allocation on each memory block with an intra-channel granularity smaller than the page size.
  • cyclically performing interleaved memory allocation on each memory block comprises: sequentially allocating memory of the intra-channel granularity on the multiple memory blocks in the channel one by one; and repeatedly performing the foregoing sequential allocation until the memory obtained by the channel through the inter-channel interleaving has been allocated.
  • Clause A9 The method according to clause A6, wherein the channel further includes one or more memory blocks that have participated in memory allocation as a spare memory area, and the method further includes additionally using the spare memory area for memory allocation according to the memory allocation request.
  • Clause A10 The method according to any one of clauses A1-9, further comprising: receiving multiple service instructions to be allocated on multiple channels of the multi-channel DDR; allocating the multiple service instructions to the multiple channels one by one at the inter-channel granularity; and in each channel, allocating the allocated service instructions to multiple memory blocks one by one at the intra-channel granularity.
  • Clause A11 The method according to clause A10, wherein the service instruction includes service data, and the method further includes interleaving and allocating the service data in the service instruction to multiple memory blocks one by one at the intra-channel granularity.
  • Clause A12 A device for performing data read and write operations, including: a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and, on each channel where memory is allocated, performing intra-channel interleaved memory allocation.
  • Clause A13 The device according to clause A12, wherein in a memory allocation operation, the processor is further configured to use a driver to implement the inter-channel interleaved memory allocation, and is further configured to use a buffer to implement the intra-channel interleaved memory allocation.
  • Clause A14 The device according to any one of clauses A12-13, wherein after the memory allocation is completed, the processor is configured to send information related to this memory allocation request to the master device through the transceiver; the transceiver is configured to receive data sent by the master device to the device based on the virtual address; and the processor is configured to store the data on the multi-channel DDR on which the corresponding memory has been allocated.
  • Clause A15 The device according to clause A13, wherein in the process of writing data to the master device, the processor is configured to apply for memory from the master device through the transceiver and, after receiving the virtual address from the master device, to send the data stored in the DDR memory to the master device through the transceiver.
  • Clause A16 A computer-readable storage medium on which is stored computer program code for allocating memory. When the computer program code is run by a processor, the method of any one of clauses A1-11 is executed.

Abstract

A memory allocation method and device, and a computer readable storage medium. The device comprises a combined processing means, and the combined processing means comprises a universal interconnection interface (1204) and another processing means. A master device (1202) of the device interacts with another processing means to jointly complete a designated computing operation. The combined processing means also comprises a storage means (1208). The storage means (1208) is connected to the master device (1202) and another processing means, separately, and used for data storage of the master device (1202) and another processing means.

Description

Method, device and computer-readable storage medium for allocating memory
Cross-reference to related applications
This application claims priority to Chinese invention patent application No. 202010014955.8, filed with the China National Intellectual Property Administration on January 7, 2020 and entitled "Method, device and computer-readable storage medium for allocating memory", the entire content of which is incorporated herein by reference.
Technical field
The present disclosure generally relates to the field of computers. More specifically, the present disclosure relates to a method, a device and a computer-readable storage medium for allocating memory.
Background
Double data rate synchronous dynamic random access memory (DDR SDRAM) is increasingly widely used in today's computers. It is equipped with multi-channel memory control technology, which can effectively increase the total memory bandwidth and thus meet the data transmission and processing needs of high-speed processors. However, some multi-channel DDR technologies cannot interleave memory allocation across channels; some cannot interleave at the memory block level within a channel; and some can only connect two channels in parallel, which limits access bandwidth. Therefore, how to allocate memory efficiently remains a problem to be solved in the prior art.
Summary of the invention
In order to at least partially solve the technical problems mentioned in the background, the solution of the present disclosure provides a method, a device and a computer-readable storage medium for allocating memory.
In one aspect, the present disclosure provides a method for allocating memory, including: receiving a memory allocation request for a multi-channel DDR; and, according to the memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and, on each channel where memory is allocated, performing intra-channel interleaved memory allocation.
In another aspect, the present disclosure provides a device for performing data read and write operations, including: a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform, according to the received memory allocation request, the following memory allocation operations on the multi-channel DDR: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and, on each channel where memory is allocated, performing intra-channel interleaved memory allocation.
In another aspect, the present disclosure provides a computer-readable storage medium on which computer program code for allocating memory is stored; when the computer program code is run by a processor, the aforementioned method can be executed.
With the method, device and computer-readable storage medium of the present disclosure, memory can be allocated in an interleaved manner across multiple channels and multiple memory blocks, which greatly improves access bandwidth.
Description of the drawings
The above and other objects, features and advantages of the exemplary embodiments of the present disclosure will become easier to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, and the same or corresponding reference numerals indicate the same or corresponding parts, in which:
FIG. 1 is a schematic diagram illustrating the multi-channel interleaving scheme of the present disclosure, taking two channels as an example;
FIG. 2 is a schematic diagram showing two adjacent memory blocks in the same channel according to an embodiment of the present disclosure;
FIG. 3 is a flowchart showing a method for allocating memory according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram showing multi-channel interleaving according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram showing multi-memory-block interleaving according to an embodiment of the present disclosure;
FIG. 6 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure;
FIG. 7 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram showing multi-channel interleaving between two adjacent channels according to an embodiment of the present disclosure;
FIG. 9 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure;
FIG. 10 is a schematic diagram showing multi-memory-block interleaving according to another embodiment of the present disclosure;
FIG. 11 is an architecture diagram showing a device for allocating memory according to another embodiment of the present disclosure;
FIG. 12 is a structural diagram showing an integrated circuit device according to an embodiment of the present disclosure; and
FIG. 13 is a structural diagram showing a board card according to an embodiment of the present disclosure.
Detailed description
With the method, device and computer-readable storage medium of the present disclosure, a memory with multiple channels and multiple memory blocks can have its memory allocated in an interleaved manner both across channels and across memory blocks, thereby increasing access bandwidth and parallelism.
Memory is indispensable in today's computer equipment. With the development of technology, many memories can support multiple channels or multiple memory blocks (banks), for example DDR SDRAM, referred to in the industry as DDR.
DDR supports multi-channel access. DDR interleaves across channels by distributing the same block of memory over different channels, which increases the bandwidth of multiple DDRs connected in parallel and improves memory access performance. If data is distributed over memory blocks on different channels, the memory controller can read the data in parallel through those channels; for a four-channel DDR, the access speed is almost quadrupled.
When storing data, DDR can distribute the same block of memory over different channels, which is called interleaving, so that the same block of memory is accessed in parallel and system efficiency improves.
FIG. 1 takes two channels as an example to illustrate the multi-channel interleaving scheme of the present disclosure. As shown in the figure, this embodiment has a first channel 102 and a second channel 104. Each channel contains a large number of pages (page-size units); for ease of description, only 32 pages 106 are shown per channel in this embodiment, and each page is 16 KB. This size is sufficient as the granularity of inter-channel interleaving and does not affect the normal jump instructions of the upper layer. During multi-channel interleaved allocation, storage space is allocated in order: first to the first page 108 of the first channel 102, then to the first page 110 of the second channel 104, then to the second page 112 of the first channel 102, and so on.
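The allocation order described for FIG. 1 can be sketched as a simple round-robin over channels. The following is only an illustrative sketch, not the disclosed implementation; the function name `two_channel_order` and the 0-based indices are assumptions introduced here.

```python
PAGE_KB = 16  # page size stated in the text


def two_channel_order(num_pages, num_channels=2):
    """Return (channel, page_in_channel) for each successive page.

    Hypothetical helper: indices are 0-based, so (0, 0) corresponds to the
    first page 108 of the first channel 102 in FIG. 1.
    """
    return [(i % num_channels, i // num_channels) for i in range(num_pages)]


# The first three allocations: channel 0 page 0, channel 1 page 0, channel 0 page 1.
print(two_channel_order(3))
```

The first three entries correspond to pages 108, 110 and 112 in the order the text gives.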
When the DDR of the present disclosure implements the multi-memory-block interleaving scheme, interleaved access across memory blocks is performed within the same channel, so that when the upper-layer service contains instruction jumps, each computing unit accesses only memory blocks within the same channel. FIG. 2 shows two adjacent memory blocks in the same channel: a first memory block 202 and a second memory block 204. In the present disclosure, interleaved memory block access is performed between two adjacent memory blocks. Each memory block is shown with 8 address spaces, at addresses addr0 to addr7. In the embodiments of the present disclosure, the address space is smaller than the page size; for example, with a page size of 16 KB, the address space may be 1 KB, that is, when two memory blocks in the same channel are accessed in an interleaved manner, the memory unit of each access is one address space. For example, at 0x000, allocation goes to address addr0 of the first memory block 202; at 0x400, to address addr0 of the second memory block 204; at 0x800, to address addr1 of the first memory block 202; and at 0xc00, to address addr1 of the second memory block 204. Allocation continues in this manner until all pages have been allocated.
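The two-block example of FIG. 2 amounts to a mapping from a position to a memory block and an address index. The sketch below assumes that the values 0x000, 0x400, 0x800 and 0xc00 are byte offsets stepping by the 1 KB address space; the function name `locate` and the 0-based numbering are illustrative only.

```python
GRANULE = 0x400  # 1 KB address space, as in the example in the text
NUM_BANKS = 2    # two adjacent memory blocks (202 and 204)


def locate(offset):
    """Map a byte offset to (bank, addr_index), both 0-based.

    Assumption: offsets advance in 1 KB steps and alternate between the
    two memory blocks, as the FIG. 2 example suggests.
    """
    granule_index = offset // GRANULE
    return granule_index % NUM_BANKS, granule_index // NUM_BANKS


for off in (0x000, 0x400, 0x800, 0xC00):
    bank, addr = locate(off)
    print(f"0x{off:03x} -> memory block {bank + 1}, addr{addr}")
```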
When multiple computing units access memory and the upper-layer service contains instruction jumps, interleaved access across memory blocks lets each computing unit access only memory blocks within the same channel. This reduces access to remote channels and also reduces access conflicts among computing units across different channels.
One embodiment of the present disclosure is a method for allocating memory, used in particular when the host side issues a command to an accelerator card device; the command may be a dynamic memory allocation (malloc) function.
The memory of this embodiment has 4 channels in total, and allocation is interleaved among these 4 channels. The flow of the allocation method is shown in FIG. 3.
In step 302, a memory allocation request for the 4-channel DDR is received. In more detail, a dynamic memory allocation function command is received; the command may dynamically allocate a portion of memory in the accelerator card device. After aligning the request according to the specified size, the device side first requests a block of physical memory.
In step 304, inter-channel interleaved memory allocation is performed on the 4-channel DDR. In more detail, the device side distributes memory space evenly over the 4 channels based on the page size. FIG. 4 is a schematic diagram of this step. As shown in the figure, the 4-channel DDR of this embodiment includes a first channel 402, a second channel 404, a third channel 406 and a fourth channel 408. Each channel includes multiple address spaces; for ease of description, only 32 address spaces are shown per channel, at addresses addr00 to addr31 (not shown in the figure), and each address space is 8 KB. If the page size is 16 KB, every 2 address spaces form one page.
When performing inter-channel interleaved memory allocation, the device cyclically performs the allocation on each channel at an inter-channel granularity of one page. For example, at 0x000, allocation goes to the first page 410 of the first channel 402, at addresses addr00 and addr01; at 0x400, to the first page 412 of the second channel 404, at addresses addr00 and addr01; at 0x800, to the first page 414 of the third channel 406, at addresses addr00 and addr01; at 0xc00, to the first page 416 of the fourth channel 408, at addresses addr00 and addr01; and at 0x1000, to the second page 418 of the first channel 402, at addresses addr02 and addr03. Allocation continues in this manner.
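The four-channel case can be sketched by numbering the pages in the order they are handed out and deriving the channel and the pair of 8 KB address spaces each page occupies. This is a minimal sketch under the stated page and address-space sizes; `page_slot` and its 0-based indices are hypothetical names introduced for illustration.

```python
NUM_CHANNELS = 4
SLOTS_PER_PAGE = 2  # a 16 KB page spans two 8 KB address spaces


def page_slot(n):
    """For the n-th page handed out (0-based), return
    (channel, (first_addr, second_addr)) with 0-based channel numbers.
    """
    channel = n % NUM_CHANNELS
    page_in_channel = n // NUM_CHANNELS
    first = page_in_channel * SLOTS_PER_PAGE
    return channel, (first, first + 1)


for n in range(5):
    ch, (a, b) = page_slot(n)
    print(f"page {n}: channel {ch + 1}, addr{a:02d} and addr{b:02d}")
```

The fifth page (n = 4) wraps back to the first channel and lands on addr02 and addr03, matching page 418 in the text.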
Note that, with a page size of 16 KB, if only 14 KB is required, space is still assigned at the granularity of one complete page for alignment. If 20 KB is required, space is assigned at the granularity of 2 pages.
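The alignment rule is a round-up to whole pages. A minimal sketch, assuming sizes are given in KB; `pages_needed` is an illustrative name, not part of the disclosure.

```python
def pages_needed(size_kb, page_kb=16):
    """Number of whole pages required to hold size_kb (ceiling division)."""
    if size_kb <= 0:
        raise ValueError("size must be positive")
    return -(-size_kb // page_kb)


for size in (14, 20):
    n = pages_needed(size)
    print(f"{size} KB -> {n} page(s), {n * 16} KB reserved")
```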
In step 306, intra-channel interleaved memory allocation is performed on each channel where memory has been allocated. In this embodiment, interleaved memory allocation can be performed cyclically on two adjacent memory blocks at an intra-channel granularity smaller than the page size, for example in units of one address space. FIG. 5 takes the first channel 402 of FIG. 4 as an example. The channel can be divided into 4 memory blocks: a first memory block 502, a second memory block 504, a third memory block 506 and a fourth memory block 508. Assuming that the first 4 pages of the first memory block 502 have already been allocated in step 304 (shown as gray blocks), the memory block interleaved allocation of this step takes place in the first memory block 502 and the second memory block 504. At 0x000, allocation goes to address space 510 of the first memory block 502; at 0x400, to address space 512 of the second memory block 504; at 0x800, to address space 514 of the first memory block 502; and at 0xc00, to address space 516 of the second memory block 504, in the order of the dashed arrows in the figure. Allocation is repeated in this manner until the requested memory has been fully allocated.
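Steps 302 to 306 can be combined into one two-level sketch: align the request to pages, hand pages to channels in turn, then split each channel's share over two memory blocks. The 1 KB intra-channel granularity is an assumption taken from the 0x400 steps in the example, and `allocate` and its return format are hypothetical.

```python
PAGE = 16 * 1024   # inter-channel granularity (page size from the text)
GRANULE = 1024     # assumed intra-channel granularity (0x400 steps)


def allocate(size, num_channels=4, banks=(0, 1)):
    """Return {channel: [bank of each granule, in allocation order]}.

    Sketch only: pages go to channels round-robin, and inside a channel
    the granules alternate over the given memory blocks.
    """
    pages = -(-size // PAGE)  # round the request up to whole pages
    plan = {ch: [] for ch in range(num_channels)}
    for p in range(pages):
        ch = p % num_channels
        start = len(plan[ch])  # granules already placed on this channel
        for g in range(PAGE // GRANULE):
            plan[ch].append(banks[(start + g) % len(banks)])
    return plan


plan = allocate(20 * 1024)  # 20 KB rounds up to 2 pages
print({ch: len(granules) for ch, granules in plan.items()})
```

For a 20 KB request only the first two channels receive a page each, and each of those pages is spread evenly over the two memory blocks.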
The memory of another embodiment of the present disclosure also has 4 channels, and allocation is interleaved among these 4 channels. The flow of the allocation method is shown in FIG. 6.
In step 602, a memory allocation request for the 4-channel DDR is received.
In step 604, it is determined whether the number of computing units involved in this dynamic memory allocation command is one.
According to the memory allocation request, if the number of computing units involved in this dynamic memory allocation command is 1, multi-channel interleaved allocation is the more efficient choice. Step 606 is therefore executed, in which inter-channel interleaved memory allocation is performed on the 4-channel DDR. The operation is the same as step 304 of the previous embodiment and is not repeated here.
According to the memory allocation request, if it is determined in step 604 that more than one computing unit is involved in this dynamic memory allocation command, combining multi-channel and multi-memory-block interleaved allocation is the more efficient choice. Step 608 is therefore executed, in which both inter-channel and intra-channel interleaved memory allocation are performed. The operation is the same as in the embodiment of FIG. 3 and is not repeated here.
In another embodiment, step 608 may perform only interleaved memory block access. For example, with 4 computing units and 4 channels, the present disclosure can assign each channel to one computing unit so that each computing unit accesses only memory blocks within the same channel. This reduces access to remote channels and also reduces access conflicts among computing units across different channels.
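The branch at step 604 and the alternative for step 608 can be summarized as a small decision function. This is a sketch under assumptions: the strategy labels and the `pin_units` flag (which selects the one-channel-per-computing-unit variant) are names invented here for illustration.

```python
def choose_strategy(num_units, num_channels=4, pin_units=False):
    """Pick an interleaving strategy from the number of computing units.

    Hypothetical labels:
      "inter-channel"               -> step 606
      "inter-channel+intra-channel" -> step 608
      "intra-channel"               -> step 608 variant, one channel per unit
    """
    if num_units < 1:
        raise ValueError("at least one computing unit is required")
    if num_units == 1:
        return "inter-channel"
    if pin_units and num_units == num_channels:
        return "intra-channel"
    return "inter-channel+intra-channel"


print(choose_strategy(1))
print(choose_strategy(4))
print(choose_strategy(4, pin_units=True))
```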
The embodiments of the present disclosure can perform interleaved access across channels and across memory blocks. When only one computing unit accesses memory, it can use the bandwidth of multiple interleaved channels; when multiple computing units access memory, each computing unit can access the memory blocks of a specific channel and/or use multi-channel interleaving, which both raises access bandwidth and increases parallelism.
A test was run in which a single image computing unit (IPU) sent the accesses of one cluster to the DDR channels. The cluster has 64 character segments in total, each 16 KB in size. Without any interleaving, the allocation took 55876 microseconds, at a bandwidth of 18.76 GB/s; with inter-channel interleaving only, it took 71080 microseconds, at 14.756 GB/s; with the multi-channel, multi-memory-block interleaving of the present disclosure, it took 44057 microseconds, at 23.8 GB/s, roughly 1.27 times the non-interleaved bandwidth.
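The relative gains implied by the reported times can be checked with a few lines of arithmetic. The numbers are those given in the text; the dictionary keys and variable names are illustrative.

```python
# Completion times in microseconds, as reported in the text.
runs = {
    "no interleaving": 55876,
    "inter-channel only": 71080,
    "inter-channel + memory block": 44057,
}

baseline = runs["no interleaving"]
# Speedup relative to the non-interleaved run (values above 1 are faster).
speedup = {name: baseline / t for name, t in runs.items()}

for name, s in speedup.items():
    print(f"{name}: {s:.2f}x")
```

By this measure the combined scheme is about 1.27 times as fast as no interleaving, while inter-channel interleaving alone is slower than the baseline in this particular test.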
本披露的另一个实施例是一种用于对内存进行分配的方法,此实施例的内存共有4个通道,分配的方式是在两两通道间进行,其分配方法的流程如图7所示。Another embodiment of the present disclosure is a method for allocating memory. The memory in this embodiment has 4 channels in total, and the allocation method is between two channels. The flow of the allocation method is shown in Figure 7. .
在执行步骤702时,接收待分配于4通道DDR的多条业务指令,所述业务指令包括业务数据,是需要存入到加速卡设备中的。在此实施例中,多条业务指令包括第一业务数据、第二业务数据、第三业务数据及第四业务数据,每个业务数据的大小为30KB。When step 702 is performed, a plurality of service instructions to be allocated to the 4-channel DDR are received. The service instructions include service data and need to be stored in the accelerator card device. In this embodiment, the multiple service instructions include first service data, second service data, third service data, and fourth service data, and the size of each service data is 30 KB.
在执行步骤704时,接收针对于4通道DDR的内存分配申请。更详细来说,响应前述的业务指令,收到动态内存分配函数的命令,该命令要求将前述4个业务数据存储至加速卡设备的内存。When step 704 is executed, a memory allocation request for a 4-channel DDR is received. In more detail, in response to the foregoing service instruction, a command of the dynamic memory allocation function is received, which requires the foregoing 4 service data to be stored in the memory of the accelerator card device.
In step 706, the service instructions are allocated one by one to the 4 channels at the inter-channel granularity. The inter-channel granularity (that is, the page size) of the DDR in this embodiment is 16 KB, so each piece of service data requires 2 pages of space. Each piece of service data is therefore split into first sub-data (16 KB) and second sub-data (14 KB), which are allocated to the 4 channels as shown in the following table.
Figure PCTCN2021070708-appb-000001
In step 708, inter-channel interleaved memory allocation is performed on 2 channels of the 4-channel DDR. More specifically, interleaving in this embodiment is performed between channels in units of the page size, and allocation occurs only between two adjacent channels. As shown in FIG. 8, the DDR of this embodiment includes a first channel 802, a second channel 804, a third channel 806, and a fourth channel 808, each of which contains multiple page-sized regions.
When inter-channel interleaved memory allocation is performed, memory is requested cyclically on every two channels at the granularity of the page size supported by the current device. That is, the first channel 802 is interleaved with the second channel 804, and the third channel 806 is interleaved with the fourth channel 808.
As shown in the table above, the first service data and the second service data are interleaved between the first channel 802 and the second channel 804, and the third service data and the fourth service data are interleaved between the third channel 806 and the fourth channel 808. More specifically: the first sub-data of the first service data is allocated to the first page 810 of the first channel 802; the second sub-data of the first service data is allocated to the first page 812 of the second channel 804; the first sub-data of the second service data is allocated to the second page 814 of the first channel 802; the second sub-data of the second service data is allocated to the second page 816 of the second channel 804; the first sub-data of the third service data is allocated to the first page 818 of the third channel 806; the second sub-data of the third service data is allocated to the first page 820 of the fourth channel 808; the first sub-data of the fourth service data is allocated to the second page 822 of the third channel 806; and the second sub-data of the fourth service data is allocated to the second page 824 of the fourth channel 808.
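The splitting and pairwise placement described in steps 706 and 708 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names, the zero-based channel and page-slot indices, and the rule that payloads fill one channel pair before moving to the next are assumptions chosen so that the output matches the placements listed above.

```python
PAGE_KB = 16  # inter-channel granularity (page size) in this embodiment


def split_into_pages(size_kb, page_kb=PAGE_KB):
    """Split a payload into page-sized sub-data; the last piece may be smaller."""
    pieces = []
    remaining = size_kb
    while remaining > 0:
        piece = min(page_kb, remaining)
        pieces.append(piece)
        remaining -= piece
    return pieces


def allocate_pairwise(sizes_kb, num_channels=4, page_kb=PAGE_KB):
    """Place each payload across one pair of adjacent channels.

    Assumes an even number of channels (at least 2). Payloads fill pair
    (0, 1) first, then pair (2, 3), and so on (hypothetical fill order).
    Returns tuples of (payload, sub_data, channel, page_slot, size_kb).
    """
    pairs = num_channels // 2
    per_pair = -(-len(sizes_kb) // pairs)  # ceiling division
    next_slot = [0] * num_channels  # next free page slot on each channel
    placements = []
    for i, size in enumerate(sizes_kb):
        base = 2 * (i // per_pair)  # first channel of this payload's pair
        for j, piece in enumerate(split_into_pages(size, page_kb)):
            channel = base + (j % 2)  # alternate between the two channels
            placements.append((i, j, channel, next_slot[channel], piece))
            next_slot[channel] += 1
    return placements


placements = allocate_pairwise([30, 30, 30, 30])
for payload, sub, channel, slot, size in placements:
    print(f"service data {payload + 1}, sub-data {sub + 1} ({size} KB) "
          f"-> channel {channel + 1}, page {slot + 1}")
```

With four 30 KB payloads, channel 1 page 1 corresponds to reference numeral 810, channel 2 page 1 to 812, and so on through 824.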
In step 710, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. For example, when there are 4 computing units, each channel can be assigned to 1 computing unit.
In step 712, within each channel, the allocated service instructions are allocated one by one to the memory blocks at the intra-channel granularity. In this embodiment, if other service instructions remain to be allocated, such as fifth service data and sixth service data, they can be allocated in this step by memory-block interleaving in the manner shown in FIG. 5, which is not repeated here.
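Intra-channel interleaving of the kind performed in step 712 can be pictured as a round-robin over the memory blocks of one channel. The sketch below is illustrative only: the function name is hypothetical, and the 4 KB intra-channel granularity and the block counts in the usage lines are assumed values, since this passage does not state them.

```python
def interleave_within_channel(size_kb, num_blocks, granule_kb):
    """Round-robin one channel's allocation across its memory blocks.

    Returns a list of (block_index, size_kb) in allocation order.
    """
    if num_blocks <= 0 or granule_kb <= 0:
        raise ValueError("num_blocks and granule_kb must be positive")
    pieces = []
    block = 0
    remaining = size_kb
    while remaining > 0:
        piece = min(granule_kb, remaining)
        pieces.append((block, piece))
        block = (block + 1) % num_blocks  # move to the next memory block
        remaining -= piece
    return pieces


# A 16 KB page spread over 4 blocks, and a 14 KB remainder over 2 blocks,
# both at an assumed 4 KB intra-channel granularity.
full_page = interleave_within_channel(16, num_blocks=4, granule_kb=4)
partial_page = interleave_within_channel(14, num_blocks=2, granule_kb=4)
print(full_page)
print(partial_page)
```

The allocation wraps back to the first block once every block has received one piece, and the final piece is shortened when the size is not a multiple of the granularity.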
In other embodiments, different memory interleaving schemes may be used for different numbers of computing units. For example, after step 704, a step may be added to determine whether exactly one computing unit is involved in the command of the dynamic memory allocation function. If exactly one computing unit is involved, multi-channel interleaved allocation is more efficient, so pairwise inter-channel interleaved memory allocation is performed on the 4-channel DDR. If more than one computing unit is involved, combining multi-channel and multi-memory-block interleaving is more efficient, so both pairwise inter-channel and intra-channel interleaved memory allocation are performed. The operation is similar to that of the embodiments of FIG. 3 and FIG. 6, and those skilled in the art can readily carry it out based on the descriptions of those embodiments, so it is not repeated here.
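The decision added after step 704 amounts to a simple branch on the number of computing units. A minimal sketch, in which the `Strategy` enum and the `choose_strategy` function are hypothetical names introduced for illustration:

```python
from enum import Enum


class Strategy(Enum):
    INTER_CHANNEL = "inter-channel interleaving only"
    INTER_AND_INTRA = "inter-channel plus intra-channel interleaving"


def choose_strategy(num_computing_units):
    """Pick an interleaving scheme from the number of computing units involved."""
    if num_computing_units < 1:
        raise ValueError("at least one computing unit must be involved")
    if num_computing_units == 1:
        # A single unit benefits most from the combined channel bandwidth.
        return Strategy.INTER_CHANNEL
    # Several units: also interleave inside each channel.
    return Strategy.INTER_AND_INTRA


print(choose_strategy(1).value)
print(choose_strategy(4).value)
```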
Another embodiment of the present disclosure is a method for allocating memory having 4 channels, in which allocation is performed across all 4 channels. The flow of the allocation method is shown in FIG. 9.
In step 902, a memory allocation request for the 4-channel DDR is received.
In step 904, according to the memory allocation request, inter-channel interleaved memory allocation is performed on the 4 channels. As in the foregoing embodiment, the inter-channel interleaved allocation is performed cyclically on each channel at an inter-channel granularity equal to the page size of the current device. FIG. 10 shows two adjacent memory blocks of a single channel 1002: a first memory block 1004 and a second memory block 1006. In step 904, it is assumed that part of the first memory block 1004 has been allocated, namely the occupied space 1008 shown in gray.
In step 906, according to the memory allocation request, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. More specifically, memory of the inter-channel granularity is allocated piece by piece from the unallocated space on the channel, and unused memory blocks are interleaved first. In this embodiment, the second memory block 1006 is the primary memory area; because the first memory block 1004 already took part in the allocation in step 904, it serves only as a spare memory area.
If the primary memory area (that is, the second memory block 1006) does not have enough space, then in step 908 the spare memory area (the first memory block 1004) is additionally used for memory allocation according to the memory allocation request; that is, allocation is made within the unoccupied space of the first memory block 1004.
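The fallback from the primary memory area to the spare memory area in steps 906 and 908 can be sketched as below. This is a simplified illustration under stated assumptions: the function name and the sizes in the usage lines are hypothetical, and the sketch tracks only free capacity rather than actual addresses.

```python
def allocate_with_spare(request_kb, primary_free_kb, spare_free_kb):
    """Serve a request from the primary block first, then from the spare block.

    Returns (from_primary_kb, from_spare_kb). Raises MemoryError when the
    two areas together cannot hold the request.
    """
    from_primary = min(request_kb, primary_free_kb)
    from_spare = request_kb - from_primary
    if from_spare > spare_free_kb:
        raise MemoryError("request exceeds primary plus spare free space")
    return from_primary, from_spare


# Hypothetical sizes: the primary block has 32 KB free, and the partly
# occupied spare block has 64 KB free.
fits_in_primary = allocate_with_spare(16, primary_free_kb=32, spare_free_kb=64)
spills_to_spare = allocate_with_spare(48, primary_free_kb=32, spare_free_kb=64)
print(fits_in_primary, spills_to_spare)
```

The spare area is touched only when the primary area is exhausted, which mirrors the priority given to unused memory blocks in step 906.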
The foregoing embodiment supports interleaved access across both channels and memory blocks. It not only benefits from the bandwidth of multi-channel interleaving but also makes good use of the space in each memory block.
Another embodiment of the present disclosure is a computer-readable storage medium storing computer program code for allocating memory. When the computer program code is run by a processor, the methods of the foregoing embodiments can be executed, for example the technical solutions shown in FIG. 3, FIG. 6, FIG. 7, and FIG. 9.
FIG. 11 shows a system 1100 for allocating memory according to another embodiment of the present disclosure. The system 1100 includes a master device 1102 and a device 1104. The master device 1102 may be a host. The device 1104 may be an accelerator card that includes multiple computing units 1106, a transceiver 1108, a processor 1110, a buffer 1112, and a multi-channel DDR 1114. The 4 computing units 1106 in the figure are an example and do not limit the number of computing units 1106 to 4; likewise, the 4 DDRs 1114 in the figure are an example and do not limit the number of DDRs 1114 to 4. The transceiver 1108 is configured to receive a memory allocation request from the master device 1102, which forms a master-slave relationship with the device 1104. The processor 1110, which contains a system memory management unit (SMMU), is configured to perform the following memory allocation operations on the multi-channel DDR 1114 according to the received memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR 1114, and performing intra-channel interleaved memory allocation on each channel to which memory has been allocated. The buffer 1112 may be a last-level cache (LLC) and is configured to implement the intra-channel interleaved memory allocation. The multi-channel DDR 1114 is configured to store data.
The processor 1110 receives the command of the dynamic memory allocation function from the master device 1102. The target of a memory space request is a memory address in the device 1104. After the transceiver 1108 receives the memory request, the processor 1110 converts it into a physical address to obtain the real memory address on the DDR, and memory is interleaved across the multi-channel DDR 1114 according to that address.
More specifically, this embodiment implements the memory request with the processor 1110 and the buffer 1112. Through a driver, the processor 1110 interleaves the requested memory across the channels of the multi-channel DDR 1114 at page-size granularity, while the buffer 1112 interleaves it across multiple memory blocks within each channel. Furthermore, during a memory request, memory addresses are allocated jointly by the memory management module in the driver and the system memory management unit. The memory management module aligns the size of the request and, according to the channel information of the request (for example, single-channel or multi-channel), allocates physical memory on the channels of the DDR 1114. The system memory management unit manages virtual addresses and, by populating page tables, manages the mapping between the allocated physical addresses and virtual addresses.
Each multi-channel DDR 1114 includes a DDR controller (not shown) connected to the buffer 1112. After sending the physical address to the buffer 1112, the processor 1110 requests a corresponding range of virtual addresses under its virtual address management, maps the physical address to the virtual address by populating the page table, and then sends the virtual address associated with this memory allocation request to the master device 1102 through the transceiver 1108.
After receiving the virtual address, the master device 1102 can access the memory. The master device 1102 sends data to the transceiver 1108 according to the returned virtual address; the processor 1110 converts the virtual address into a physical address, and the data passes through the buffer 1112 to the real memory address on the multi-channel DDR 1114, so that it is interleaved across channels and/or memory blocks.
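The role of the page table in this flow, presenting a contiguous virtual range to the master device while the physical pages are scattered across channels, can be sketched as follows. This is a toy model, not the SMMU's actual data structure: the function names, the flat dictionary, the 16 KB page size carried over from the earlier embodiment, and all addresses are assumptions for illustration.

```python
PAGE_SIZE = 16 * 1024  # assumed page size, matching the 16 KB granularity above


def build_page_table(physical_pages, virt_base, page_size=PAGE_SIZE):
    """Map a contiguous, page-aligned virtual range onto scattered physical pages."""
    if virt_base % page_size != 0:
        raise ValueError("virt_base must be page-aligned")
    return {
        virt_base + index * page_size: phys_base
        for index, phys_base in enumerate(physical_pages)
    }


def translate(page_table, virt_addr, page_size=PAGE_SIZE):
    """Translate a virtual address into a physical address via the page table."""
    offset = virt_addr % page_size
    page_base = virt_addr - offset
    return page_table[page_base] + offset


# Hypothetical physical pages located on two different channels.
table = build_page_table([0x0010_0000, 0x0090_0000], virt_base=0x4000_0000)
print(hex(translate(table, 0x4000_0000 + 5)))
print(hex(translate(table, 0x4000_0000 + PAGE_SIZE + 7)))
```

Two consecutive virtual pages resolve to physical pages that need not be adjacent, which is what allows the interleaved layout to stay hidden from the master device.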
When data needs to be written from the device 1104 to the master device 1102, the processor 1110 is configured to request memory from the master device 1102 through the transceiver 1108. After receiving a virtual address from the master device 1102, the processor 1110 converts the virtual address into a physical address, retrieves the data stored in the multi-channel DDR 1114, and sends it to the master device 1102 through the transceiver 1108.
When the master device 1102 writes data to the device 1104, taking interleaved allocation across a 4-channel DDR 1114 as an example, the transceiver 1108 receives a memory allocation request for the 4-channel DDR. Based on the state of the computing units 1106, if multi-channel interleaved allocation is more efficient (for example, when only one computing unit 1106 is involved), the processor 1110 chooses to perform inter-channel interleaved memory allocation on the 4-channel DDR, converts the virtual address into a physical address, and passes the data through the buffer 1112 to the real memory address on the multi-channel DDR 1114 for multi-channel interleaved allocation. The processor 1110 may then continue with multi-memory-block interleaved allocation, performing intra-channel interleaved memory allocation on each channel to which memory has been allocated.
The architecture of this embodiment can carry out the technical solutions shown in FIG. 3, FIG. 6, FIG. 7, and FIG. 9. Those skilled in the art can readily understand the technical details without creative effort, so they are not repeated here.
This embodiment supports interleaved access across both channels and memory blocks. When a single computing unit accesses memory, it benefits from the bandwidth of multi-channel interleaving. When multiple computing units access memory, each computing unit accesses memory both across channels and within a specific channel, which reduces accesses to remote channels and also reduces access conflicts between computing units.
FIG. 12 is a structural diagram of an integrated circuit device 1200 according to an embodiment of the present disclosure. As shown, the integrated circuit device 1200 includes a master device 1202, which may be the master device 1102 of FIG. 11. The integrated circuit device 1200 further includes a universal interconnection interface 1204 and a device 1206, which may be the device 1104 of FIG. 11.
In this embodiment, the master device 1202 may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor, or an artificial intelligence processor. Their number is not limited and is determined by actual needs.
According to the technical solution of this embodiment, the universal interconnection interface 1204 can be used to transfer data and control instructions between the master device 1202 and the device 1206. For example, the master device 1202 may obtain required input data from the device 1206 via the universal interconnection interface 1204 and write it to an on-chip storage device of the master device 1202. Further, the master device 1202 may obtain control instructions from the device 1206 via the universal interconnection interface 1204 and write them to an on-chip control cache of the master device 1202. Alternatively or optionally, the universal interconnection interface 1204 may also read data in a storage module of the master device 1202 and transfer it to the device 1206.
Optionally, the integrated circuit device 1200 may further include a storage device 1208, which may be connected to the master device 1202 and the device 1206 respectively. In one or more embodiments, the storage device 1208 may be used to store data of the master device 1202 and the device 1206, and is particularly suitable when the data required for computation cannot be held entirely in the internal storage of the master device 1202 or the device 1206.
Depending on the application scenario, the integrated circuit device 1200 of the present disclosure can serve as a system on chip (SoC) for devices such as mobile phones, robots, drones, and video capture equipment, thereby effectively reducing the core area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the universal interconnection interface 1204 of the integrated circuit device 1200 is connected to certain components of the device, for example a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip, or integrated circuit chip, that includes the integrated circuit device 1200. In other embodiments, the present disclosure also discloses a chip package structure that includes the above chip.
In some embodiments, the present disclosure also discloses a board card that includes the above chip package structure. FIG. 13 shows an exemplary board card 1300. In addition to the above chip 1302, the board card 1300 may include other supporting components, including but not limited to a storage device 1304, an interface device 1306, and a control device 1308.
The storage device 1304 is connected through a bus 1314 to the chip 1302 in the chip package structure and is used to store data. The storage device 1304 may include multiple groups of memories 1310, each connected to the chip 1302 through the bus 1314. Each group of memories 1310 may be DDR SDRAM (double data rate synchronous dynamic random-access memory).
Unlike what is shown in FIG. 13, in one embodiment the storage device 1304 may include 4 groups of memories 1310. Each group of memories 1310 may include multiple DDR4 chips. In one embodiment, the chip 1302 may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transfer and 8 bits for ECC checking. The DDR4 controller may be the aforementioned processor 1110.
In one embodiment, each group of memories 1310 may include multiple double data rate synchronous dynamic random-access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for the DDR is provided in the chip 1302 to control data transfer to and data storage in each memory 1310. Interleaved allocation between the chip 1302 and the memories 1310 may be performed in the manner of the foregoing embodiments.
The interface device 1306 is electrically connected to the chip 1302 in the chip package structure. The interface device 1306 is used to transfer data between the chip 1302 and an external device 1312 (for example, a server or a computer). In one embodiment, the interface device 1306 may be a standard PCIe interface; for example, data to be processed is transferred from the server to the chip 1302 through the standard PCIe interface. In another embodiment, the interface device 1306 may be another interface; the present disclosure does not limit the specific form of such other interfaces, provided that they can perform the transfer function. In addition, the computation results of the chip 1302 are sent back to the external device 1312 by the interface device 1306.
The control device 1308 is electrically connected to the chip 1302 in order to monitor the state of the chip 1302. Specifically, the chip 1302 and the control device 1308 may be electrically connected through an SPI interface. The control device 1308 may include a microcontroller unit (MCU). The chip 1302 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. The chip 1302 can therefore be in different working states, such as multi-load and light-load. The control device 1308 can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip 1302.
In some embodiments, the present disclosure also discloses an electronic device or apparatus that includes the above board card 1300. Depending on the application scenario, the electronic device or apparatus may include a data processing apparatus, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device. The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound scanner, and/or electrocardiograph.
The foregoing may be better understood in light of the following clauses:
Clause A1. A method for allocating memory, comprising: receiving a memory allocation request for a multi-channel DDR; and, according to the memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
Clause A2. The method of clause A1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.
Clause A3. The method of clause A1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on the multiple channels based on a page size.
Clause A4. The method of clause A3, wherein performing inter-channel interleaved memory allocation on the multiple channels based on the page size comprises cyclically performing inter-channel interleaved memory allocation on each channel at an inter-channel granularity equal to the page size.
Clause A5. The method of clause A4, wherein cyclically performing inter-channel interleaved memory allocation on each channel at the inter-channel granularity comprises: allocating memory of the inter-channel granularity on the multiple channels one by one in order; and repeating the in-order allocation until the requested memory has been fully allocated.
Clause A6. The method of clause A3, wherein performing the intra-channel interleaved memory allocation comprises interleaving the memory allocated on each channel based on the page size across multiple memory blocks within the channel.
Clause A7. The method of clause A6, wherein interleaving across multiple memory blocks within the channel comprises cyclically performing interleaved memory allocation on each memory block at an intra-channel granularity smaller than the page size.
Clause A8. The method of clause A7, wherein cyclically performing interleaved memory allocation on each memory block comprises: allocating memory of the intra-channel granularity on the multiple memory blocks within the channel one by one in order; and repeating the in-order allocation until the memory obtained by the channel through the inter-channel interleaving has been fully allocated.
Clause A9. The method of clause A6, wherein the channel further includes one or more memory blocks that have already taken part in memory allocation and serve as a spare memory area, the method further comprising additionally using the spare memory area for memory allocation according to the memory allocation request.
Clause A10. The method of any one of clauses A1-9, further comprising: receiving multiple service instructions to be allocated on multiple channels of the multi-channel DDR; allocating the multiple service instructions one by one to the multiple channels at the inter-channel granularity; and, within each channel, allocating the allocated service instructions one by one to multiple memory blocks at the intra-channel granularity.
Clause A11. The method of clause A10, wherein the service instructions include service data, the method further comprising interleaving the service data in the service instructions one by one across multiple memory blocks at the intra-channel granularity.
Clause A12. A device for performing data read and write operations, comprising: a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform, according to the received memory allocation request, the following memory allocation operations on the multi-channel DDR: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
Clause A13. The device of clause A12, wherein, in the memory allocation operations, the processor is further configured to use a driver to implement the inter-channel interleaved memory allocation and to use a buffer to implement the intra-channel interleaved memory allocation.
Clause A14. The device of any one of clauses A12-13, wherein, after the memory allocation is completed, the processor is configured to send the virtual address associated with the memory allocation request to the master device through the transceiver, the transceiver is configured to receive data sent by the master device to the device based on the virtual address, and the processor is configured to store the data on the multi-channel DDR to which the corresponding memory has been allocated.
条款A15、根据条款A13所述的设备,其中在向主设备写入数据的过程中,所述处理器配置成通过所述收发器向所述主设备申请内存,并且在收到来自于所述主设备的虚拟地址后,将所述DDR存储器内存储的数据通过所述收发器发送到所述主设备。Clause A15. The device according to clause A13, wherein in the process of writing data to the main device, the processor is configured to apply for memory from the main device through the transceiver, and upon receiving the data from the main device After the virtual address of the master device, the data stored in the DDR memory is sent to the master device through the transceiver.
条款A16、一种计算机可读存储介质,其上存储有用于对内存进行分配的计算机程序代码,当所述计算机程序代码由处理器运行时,执行根据条款A1-11的任意一项所述的方法。Clause A16. A computer-readable storage medium on which is stored computer program code for allocating memory. When the computer program code is run by a processor, the computer program code executes any one of clauses A1-11. method.
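As an illustration of the intra-channel interleaving of service data described in clause A11, the following minimal sketch splits a payload into pieces of the intra-channel granularity and hands them to memory blocks in round-robin order. The function name `interleave_service_data` and its parameters `num_blocks` and `granularity` are hypothetical and do not come from the disclosure, which does not prescribe any particular implementation.

```python
from typing import Dict, List


def interleave_service_data(
    data: bytes, num_blocks: int, granularity: int
) -> Dict[int, List[bytes]]:
    """Distribute `data` over memory blocks, one granule at a time.

    Hypothetical helper: block indices, the granule size, and the
    returned mapping are illustrative assumptions only.
    """
    if num_blocks <= 0 or granularity <= 0:
        raise ValueError("num_blocks and granularity must be positive")

    # One list of granules per memory block inside the channel.
    placement: Dict[int, List[bytes]] = {b: [] for b in range(num_blocks)}

    # Walk the payload in steps of the intra-channel granularity and
    # assign consecutive granules to consecutive blocks, wrapping around.
    for index, offset in enumerate(range(0, len(data), granularity)):
        block = index % num_blocks
        placement[block].append(data[offset:offset + granularity])
    return placement
```

With three blocks and a granularity of two bytes, a ten-byte payload yields five granules; the fourth and fifth wrap around to the first and second blocks, so consecutive granules always land on different blocks.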
The embodiments of the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is intended only to aid understanding of the method of the present disclosure and its core idea. Meanwhile, any changes or variations that those skilled in the art make to the specific implementations and scope of application, based on the ideas of the present disclosure, fall within the scope of protection of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (16)

  1. A method for allocating memory, comprising:
    receiving a memory allocation request for a multi-channel DDR;
    according to the memory allocation request, performing the following:
    performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and
    on each channel to which memory is allocated, performing intra-channel interleaved memory allocation.
  2. The method according to claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.
  3. The method according to claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on the plurality of channels based on a page size.
  4. The method according to claim 3, wherein performing inter-channel interleaved memory allocation on the plurality of channels based on the page size comprises cyclically performing inter-channel interleaved memory allocation on each channel at an inter-channel granularity equal to the page size.
  5. The method according to claim 4, wherein cyclically performing inter-channel interleaved memory allocation on each channel at the inter-channel granularity comprises:
    allocating memory of the inter-channel granularity on the plurality of channels one by one, in order; and
    repeating the above in-order allocation until the requested memory has been fully allocated.
  6. The method according to claim 3, wherein performing the intra-channel interleaved memory allocation comprises interleaving the memory allocated to each channel based on the page size across a plurality of memory blocks within the channel.
  7. The method according to claim 6, wherein interleaving across the plurality of memory blocks within the channel comprises cyclically performing interleaved memory allocation on each memory block at an intra-channel granularity smaller than the page size.
  8. The method according to claim 7, wherein cyclically performing interleaved memory allocation on each memory block comprises:
    allocating memory of the intra-channel granularity on the plurality of memory blocks within the channel one by one, in order; and
    repeating the above in-order allocation until the memory that the channel obtained through the inter-channel interleaving has been fully allocated.
  9. The method according to claim 6, wherein the channel further includes one or more memory blocks that have participated in memory allocation and serve as a spare memory area, and the method further comprises additionally using the spare memory area for memory allocation according to the memory allocation request.
  10. The method according to any one of claims 1-9, further comprising:
    receiving a plurality of service instructions to be distributed over the plurality of channels of the multi-channel DDR;
    distributing the plurality of service instructions to the plurality of channels one by one at the inter-channel granularity; and
    within each channel, distributing the assigned service instructions to a plurality of memory blocks one by one at the intra-channel granularity.
  11. The method according to claim 10, wherein the service instruction includes service data, and the method further comprises interleaving the service data in the service instruction across a plurality of memory blocks, one piece at a time, at the intra-channel granularity.
  12. A device for performing data read and write operations, comprising:
    a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device;
    a multi-channel DDR configured to store data; and
    a processor configured to perform, according to the received memory allocation request, the following memory allocation operations on the multi-channel DDR:
    performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and
    on each channel to which memory is allocated, performing intra-channel interleaved memory allocation.
  13. The device according to claim 12, wherein in the memory allocation operation, the processor is further configured to use a driver to implement the inter-channel interleaved memory allocation, and to use a buffer to implement the intra-channel interleaved memory allocation.
  14. The device according to any one of claims 12-13, wherein after the memory allocation is completed, the processor is configured to send, through the transceiver, a virtual address related to this memory allocation request to the master device, the transceiver is configured to receive data sent by the master device to the device based on the virtual address, and the processor is configured to store the data on the multi-channel DDR to which the corresponding memory has been allocated.
  15. The device according to claim 13, wherein in the process of writing data to the master device, the processor is configured to request memory from the master device through the transceiver and, after receiving a virtual address from the master device, to send the data stored in the DDR memory to the master device through the transceiver.
  16. A computer-readable storage medium storing computer program code for allocating memory, wherein the computer program code, when run by a processor, performs the method according to any one of claims 1-11.
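The two-level allocation recited in claims 1 and 3-8 can be sketched as follows. This is only an illustrative model under stated assumptions, not the applicant's implementation: the function name `allocate_interleaved`, its parameters, and the decision to grant a final partial page or granule at its exact remaining size are all hypothetical. The sketch first hands out page-sized grants to the channels in round-robin order, then spreads each channel's share over its memory blocks at a smaller intra-channel granularity.

```python
from typing import Dict, Tuple


def allocate_interleaved(
    request_bytes: int,
    num_channels: int,
    blocks_per_channel: int,
    page_size: int,
    intra_granularity: int,
) -> Dict[Tuple[int, int], int]:
    """Return {(channel, block): bytes allocated} for one request.

    Hypothetical model: all names and the exact-size handling of the
    last partial grant are assumptions made for illustration.
    """
    if min(num_channels, blocks_per_channel, page_size, intra_granularity) <= 0:
        raise ValueError("all sizes and counts must be positive")
    if intra_granularity >= page_size:
        raise ValueError("intra-channel granularity must be smaller than the page size")

    # Level 1 (claims 4-5): visit the channels in order, granting one
    # page-sized piece per visit, and repeat until the request is covered.
    per_channel = [0] * num_channels
    remaining, channel = request_bytes, 0
    while remaining > 0:
        grant = min(page_size, remaining)
        per_channel[channel] += grant
        remaining -= grant
        channel = (channel + 1) % num_channels

    # Level 2 (claims 7-8): inside each channel, visit the memory blocks
    # in order at the intra-channel granularity until the channel's
    # share from level 1 is fully placed.
    layout: Dict[Tuple[int, int], int] = {}
    for ch, share in enumerate(per_channel):
        left, block = share, 0
        while left > 0:
            grant = min(intra_granularity, left)
            layout[(ch, block)] = layout.get((ch, block), 0) + grant
            left -= grant
            block = (block + 1) % blocks_per_channel
    return layout
```

For example, a request for three 4096-byte pages on two channels with two blocks each and a 1024-byte intra-channel granularity places two pages on channel 0 and one on channel 1, and each channel's share is split evenly between its two blocks.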
PCT/CN2021/070708 2020-01-07 2021-01-07 Memory allocation method and device, and computer readable storage medium WO2021139733A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010014955.8A CN113157602B (en) 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory
CN202010014955.8 2020-01-07

Publications (1)

Publication Number Publication Date
WO2021139733A1 (en) 2021-07-15

Family

ID=76787755

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070708 WO2021139733A1 (en) 2020-01-07 2021-01-07 Memory allocation method and device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN113157602B (en)
WO (1) WO2021139733A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434821B (en) * 2023-03-14 2024-01-16 深圳市晶存科技有限公司 System and method for testing LPDDR4 particles

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105452986A (en) * 2013-08-08 2016-03-30 高通股份有限公司 System and method for memory channel interleaving with selective power or performance optimization
US20170108911A1 (en) * 2015-10-16 2017-04-20 Qualcomm Incorporated System and method for page-by-page memory channel interleaving
CN108845958A (en) * 2018-06-19 2018-11-20 中国科学院软件研究所 A kind of mapping of interleaver and dynamic EMS memory management system and method
US20190251034A1 (en) * 2019-04-26 2019-08-15 Intel Corporation Architectural enhancements for computing systems having artificial intelligence logic disposed locally to memory

Also Published As

Publication number Publication date
CN113157602A (en) 2021-07-23
CN113157602B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
US11714763B2 (en) Configuration interface to offload capabilities to a network interface
US10296217B2 (en) Techniques to configure a solid state drive to operate in a storage mode or a memory mode
JP6431536B2 (en) Final level cache system and corresponding method
US8996781B2 (en) Integrated storage/processing devices, systems and methods for performing big data analytics
US9135190B1 (en) Multi-profile memory controller for computing devices
US20140068125A1 (en) Memory throughput improvement using address interleaving
KR102365312B1 (en) Storage controller, computational storage device, and operation method of computational storage device
CN111258935B (en) Data transmission device and method
US20100077193A1 (en) Method and apparatus for assigning a memory to multi-processing unit
US7836221B2 (en) Direct memory access system and method
CN108256643A (en) A kind of neural network computing device and method based on HMC
KR20230094964A (en) Interleaving of heterogeneous memory targets
US11029847B2 (en) Method and system for shared direct access storage
US11157191B2 (en) Intra-device notational data movement system
CN114296638B (en) Storage and calculation integrated solid state disk controller and related device and method
EP2548129A1 (en) Masked register write method and apparatus
WO2021139733A1 (en) Memory allocation method and device, and computer readable storage medium
CN110737618B (en) Method, device and storage medium for embedded processor to carry out rapid data communication
CN115883022B (en) DMA transmission control method, apparatus, electronic device and readable storage medium
JP7247405B2 (en) Storage controller, computational storage device and method of operation of computational storage device
CN106502923B (en) Storage accesses ranks two-stage switched circuit in cluster in array processor
CN111258769A (en) Data transmission device and method
US20230153157A1 (en) Inter-node communication method and device based on multiple processing nodes
US20120159024A1 (en) Semiconductor apparatus
CN114238156A (en) Processing system and method of operating a processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21738020

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21738020

Country of ref document: EP

Kind code of ref document: A1