WO2021139733A1

WO2021139733A1 - Memory allocation method and device, and computer readable storage medium

Info

Publication number: WO2021139733A1
Application number: PCT/CN2021/070708
Authority: WO
Inventors: 李健; 张晓铮
Original assignee: 中科寒武纪科技股份有限公司
Priority date: 2020-01-07
Filing date: 2021-01-07
Publication date: 2021-07-15
Also published as: CN113157602A; CN113157602B

Abstract

A memory allocation method and device, and a computer readable storage medium. The device comprises a combined processing means, and the combined processing means comprises a universal interconnection interface (1204) and another processing means. A master device (1202) of the device interacts with another processing means to jointly complete a designated computing operation. The combined processing means also comprises a storage means (1208). The storage means (1208) is connected to the master device (1202) and another processing means, separately, and used for data storage of the master device (1202) and another processing means.

Description

Method, equipment and computer readable storage medium for allocating memory

Cross-references to related applications

This application claims priority to Chinese invention patent application No. 202010014955.8, filed with the State Intellectual Property Office of China on January 7, 2020 and titled "a method, equipment and computer-readable storage medium for allocating memory", the entire content of which is incorporated in this application by reference.

Technical field

This disclosure generally relates to the field of computers. More specifically, the present disclosure relates to methods, devices, and computer-readable storage media for allocating memory.

Background

Double data rate synchronous dynamic random access memory (DDR SDRAM) is increasingly widely used in today's computers. It supports multi-channel memory control technology, which can effectively increase total memory bandwidth to meet the data transmission and processing needs of high-speed processors. However, some multi-channel DDR technologies cannot implement interleaved memory allocation between channels; some cannot implement memory-block-level interleaving within a channel; and some can only connect two channels in parallel and therefore have limited access bandwidth. How to allocate memory efficiently therefore remains a problem to be solved in the prior art.

Summary of the invention

In order to at least partially solve the technical problems mentioned in the background art, the solution of the present disclosure provides a method, device and computer-readable storage medium for allocating memory.

In one aspect, the present disclosure provides a method for allocating memory, including: receiving a memory allocation request for a multi-channel DDR; and, according to the memory allocation request, performing inter-channel interleaving memory allocation on multiple channels of the multi-channel DDR, and performing intra-channel interleaving memory allocation on each channel where memory is allocated.

In another aspect, the present disclosure provides a device for performing data read and write operations, including: a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation request: performing inter-channel interleaving memory allocation on multiple channels of the multi-channel DDR; and performing intra-channel interleaving memory allocation on each channel where memory is allocated.

In another aspect, the present disclosure provides a computer-readable storage medium on which computer program code for allocating memory is stored. When the computer program code is run by a processor, the aforementioned method can be executed.

Using the method, device and computer readable storage medium of the present disclosure, memory can be interleaved and allocated between multiple channels and multiple memory blocks, which greatly improves access bandwidth.

Description of the drawings

By reading the following detailed description with reference to the accompanying drawings, the above and other objects, features, and advantages of the exemplary embodiments of the present disclosure will become easier to understand. In the drawings, several embodiments of the present disclosure are shown in an exemplary but not restrictive manner, and the same or corresponding reference numerals indicate the same or corresponding parts, in which:

FIG. 1 is a schematic diagram illustrating the multi-channel interleaving scheme of the present disclosure by taking two channels as an example;

FIG. 2 is a schematic diagram showing two adjacent memory blocks in the same channel according to an embodiment of the present disclosure;

FIG. 3 is a flowchart showing a method for allocating memory according to an embodiment of the present disclosure;

Fig. 4 is a schematic diagram showing multi-channel interleaving according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram showing interleaving of multiple memory blocks according to an embodiment of the present disclosure;

FIG. 6 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure;

FIG. 7 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure;

FIG. 8 is a schematic diagram showing multi-channel interleaving between two adjacent channels according to an embodiment of the present disclosure;

FIG. 9 is a flowchart showing a method for allocating memory according to another embodiment of the present disclosure;

FIG. 10 is a schematic diagram showing interleaving of multiple memory blocks according to another embodiment of the present disclosure;

FIG. 11 is an architecture diagram showing a device for allocating memory according to another embodiment of the present disclosure;

FIG. 12 is a structural diagram showing an integrated circuit device according to an embodiment of the present disclosure; and

Fig. 13 is a structural diagram showing a board according to an embodiment of the present disclosure.

Detailed description

Using the method, device and computer-readable storage medium of the present disclosure, memory can be interleaved and allocated across multiple channels and multiple memory blocks, thereby increasing access bandwidth and parallelism.

Memory is indispensable in today's computer equipment. With the development of technology, many memories can support multiple channels or banks, such as DDR SDRAM, referred to as DDR in the industry.

DDR can support multi-channel access. The interleaving method of DDR between multiple channels is to distribute the same block of memory to different channels, which can increase the bandwidth of multiple DDRs in parallel and improve the performance of memory access. If the data is distributed on the memory blocks on different channels, the memory controller can read the data in parallel through these multi-channels. For a four-channel DDR, the access speed is almost quadrupled.

When DDR performs storage, the same block of memory can be distributed to different channels to perform so-called interleaving, so that the same block of memory can be accessed in parallel, which improves the efficiency of the system.

Figure 1 uses two channels as an example to illustrate the multi-channel interleaving scheme of the present disclosure. As shown in the figure, the embodiment has a first channel 102 and a second channel 104, and each channel contains a large number of pages. For convenience of description, only 32 pages 106 are shown for each channel in this embodiment, and each page is 16KB. This size is sufficient as the granularity of inter-channel interleaving and does not affect the normal jump instructions of the upper layer. When multi-channel interleaving allocation is performed, memory is first allocated to the first page 108 of the first channel 102, then to the first page 110 of the second channel 104, then to the second page 112 of the first channel 102, and so on, allocating storage space in order.
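The round-robin order described above can be illustrated with a minimal sketch. It assumes zero-based channel and page indices and a flat byte offset into the allocation; the function names `interleave_pages` and `locate` are hypothetical and are not taken from the disclosure.

```python
PAGE_SIZE = 16 * 1024  # 16KB page, the inter-channel granularity of this embodiment


def interleave_pages(num_pages, num_channels=2):
    """Return (channel, page_slot) for each consecutively allocated page.

    Indices are zero-based (an assumption of this sketch): channel 0 stands
    for the first channel 102 and channel 1 for the second channel 104.
    """
    return [(i % num_channels, i // num_channels) for i in range(num_pages)]


def locate(offset, num_channels=2, page_size=PAGE_SIZE):
    """Map a byte offset in the allocation to (channel, page_slot, offset_in_page)."""
    page = offset // page_size
    return page % num_channels, page // num_channels, offset % page_size


if __name__ == "__main__":
    # First page of channel 0, first page of channel 1, second page of channel 0.
    print(interleave_pages(3))
```

Under these assumptions, consecutive pages alternate between the two channels, so a large sequential access touches both channels in parallel.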

When the DDR of the present disclosure implements the multi-memory-block interleaving solution, interleaved access across multiple memory blocks is performed within the same channel. When there is an instruction jump in the upper-layer service, each computing unit can access only the memory blocks in the same channel. Fig. 2 shows two adjacent memory blocks in the same channel: the first memory block 202 and the second memory block 204. In the present disclosure, memory block interleaving access is performed across two adjacent memory blocks. Each memory block is shown with 8 address spaces, with addresses addr0 to addr7. In the embodiment of the present disclosure, the address space is smaller than the page size. For example, when the page size is 16KB, the address space can be 1KB; that is, when two memory blocks are accessed in an interleaved manner in the same channel, the memory unit of each access is one address space. For example, 0x000 is allocated to address addr0 of the first memory block 202; 0x400 is allocated to address addr0 of the second memory block 204; 0x800 is allocated to address addr1 of the first memory block 202; and 0xc00 is allocated to address addr1 of the second memory block 204. Allocation continues in this way until all pages are allocated.
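The address example above (0x000, 0x400, 0x800, 0xc00) can be reproduced with a short sketch. It assumes a 1KB address space, two memory blocks, and zero-based indices; `block_slot` is a hypothetical helper name used only for illustration.

```python
ADDRESS_SPACE = 1024  # 1KB (0x400), the intra-channel granularity in the example


def block_slot(offset, num_blocks=2, granule=ADDRESS_SPACE):
    """Map a byte offset within a channel to (memory_block, address_index).

    Block 0 stands for the first memory block 202 and block 1 for the second
    memory block 204; address_index n corresponds to addr<n> in Fig. 2.
    """
    unit = offset // granule
    return unit % num_blocks, unit // num_blocks


if __name__ == "__main__":
    for off in (0x000, 0x400, 0x800, 0xC00):
        print(hex(off), block_slot(off))
```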

When multiple computing units access memory and there are instruction jumps in the upper-layer service, multi-memory-block interleaving access allows each computing unit to access only the memory blocks in the same channel. This reduces access to remote channels and also reduces access conflicts between computing units across different channels.

An embodiment of the present disclosure is a method for allocating memory, applicable in particular when the host side issues a command to the accelerator card device; the command may be a dynamic memory allocation (malloc) function.

The memory of this embodiment has 4 channels in total, and the allocation method is interleaved among these 4 channels. The flow of the allocation method is shown in FIG. 3.

When step 302 is executed, a memory allocation request for a 4-channel DDR is received. In more detail, a command of the dynamic memory allocation function is received. The command dynamically allocates a part of the memory in the accelerator card device. The device side first aligns the request to the specified size and then requests a block of physical memory.

When step 304 is performed, inter-channel interleaving memory allocation is performed on the 4-channel DDR. In more detail, the device side evenly allocates memory space to the 4 channels based on the page size. Figure 4 is a schematic diagram of this step. As shown in the figure, the 4-channel DDR of this embodiment includes a first channel 402, a second channel 404, a third channel 406, and a fourth channel 408, and each channel includes multiple address spaces. For convenience of illustration, only 32 address spaces are shown for each channel in the figure, with addresses from addr00 to addr31 (not shown in the figure), and each address space is 8KB. If the page size is 16KB, every 2 address spaces form one page.

When inter-channel interleaving memory allocation is performed, the current device cyclically allocates memory on each channel at an inter-channel granularity of one page. For example, 0x000 is allocated to the first page 410 of the first channel 402, whose addresses are addr00 and addr01; 0x400 is allocated to the first page 412 of the second channel 404, whose addresses are addr00 and addr01; 0x800 is allocated to the first page 414 of the third channel 406, whose addresses are addr00 and addr01; 0xc00 is allocated to the first page 416 of the fourth channel 408, whose addresses are addr00 and addr01; and 0x1000 is allocated to the second page 418 of the first channel 402, whose addresses are addr02 and addr03. Allocation continues in this way.

It should be noted that, under the premise that the page size is 16KB, if the required space is only 14KB, the space will still be specified at the granularity of a complete page size for alignment during allocation. If the required space is 20KB, the space will be specified at a granularity of 2 page sizes.
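The alignment rule in the two examples above amounts to rounding the request up to a whole number of pages. A minimal sketch, assuming a 16KB page and using the hypothetical helper names `pages_needed` and `aligned_size`:

```python
PAGE_SIZE = 16 * 1024  # 16KB page size assumed in this embodiment


def pages_needed(size_bytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold size_bytes (ceiling division)."""
    return -(-size_bytes // page_size)


def aligned_size(size_bytes, page_size=PAGE_SIZE):
    """Size in bytes actually reserved after alignment to the page size."""
    return pages_needed(size_bytes, page_size) * page_size


if __name__ == "__main__":
    for kb in (14, 20):
        print(kb, "KB ->", pages_needed(kb * 1024), "page(s)")
```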

When step 306 is executed, intra-channel interleaving memory allocation is performed on each channel where memory is allocated. In this embodiment, interleaving memory allocation can be performed cyclically on two adjacent memory blocks at an intra-channel granularity smaller than the page size; for example, interleaving allocation is performed in units of address space. FIG. 5 takes the first channel 402 of FIG. 4 as an example. The channel can be divided into 4 memory blocks: the first memory block 502, the second memory block 504, the third memory block 506, and the fourth memory block 508. Assuming that the first 4 pages of the first memory block 502 have been allocated and occupied in step 304 (shown as the gray block), the memory block interleaving allocation in this step is performed in the first memory block 502 and the second memory block 504. 0x000 is allocated to the address space 510 of the first memory block 502; 0x400 is allocated to the address space 512 of the second memory block 504; 0x800 is allocated to the address space 514 of the first memory block 502; and 0xc00 is allocated to the address space 516 of the second memory block 504, following the order of the dotted arrows in the figure. Allocation is repeated in this way until the requested memory is completely allocated.

The memory of another embodiment of the present disclosure also has 4 channels, and the allocation method is interleaved among the 4 channels, and the flow of the allocation method is shown in FIG. 6.

When step 602 is executed, a memory allocation request for a 4-channel DDR is received.

When step 604 is executed, it is determined whether the number of calculation units related to the command of the dynamic memory allocation function this time is one.

According to the memory allocation request, if the number of calculation units involved in the command of the dynamic memory allocation function is 1, multi-channel interleaving allocation is more efficient. Therefore, step 606 is executed to perform inter-channel interleaving memory allocation on the 4-channel DDR. The operation is the same as that of step 304 in the previous embodiment and is not repeated here.

According to the memory allocation request, if it is determined in step 604 that the number of calculation units involved in the command of the dynamic memory allocation function is more than one, it is more efficient to combine multi-channel and multi-memory-block interleaving allocation. Therefore, step 608 is executed to perform inter-channel and intra-channel interleaving memory allocation. The operation is the same as that of the embodiment in FIG. 3 and is not described again.
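The decision made in steps 604 to 608 can be summarized in a small sketch. The function name `choose_strategy` and the string labels are hypothetical; only the branching on the number of calculation units comes from the description.

```python
def choose_strategy(num_calculation_units):
    """Pick the interleaving steps to run, following the flow of FIG. 6.

    One calculation unit: inter-channel interleaving only (step 606).
    More than one: inter-channel plus intra-channel interleaving (step 608).
    """
    if num_calculation_units < 1:
        raise ValueError("at least one calculation unit is required")
    if num_calculation_units == 1:
        return ("inter_channel",)
    return ("inter_channel", "intra_channel")


if __name__ == "__main__":
    print(choose_strategy(1))
    print(choose_strategy(4))
```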

In another embodiment, step 608 may perform only memory block interleaving access. For example, with 4 calculation units and 4 channels, the present disclosure can allocate each channel to one calculation unit, so that each calculation unit accesses only the memory blocks in the same channel. This reduces access to remote channels and also reduces access conflicts between calculation units across different channels.

The embodiments of this disclosure can perform interleaving access between channels and between memory blocks. When only one computing unit accesses memory, it can enjoy the bandwidth of multi-channel interleaving. When multiple computing units access memory, each computing unit can access the memory blocks of a specific channel and/or enjoy multi-channel interleaving, which improves both access bandwidth and parallelism.

A single image computing unit (IPU) is used to send a cluster access to the DDR channel for testing. The cluster has a total of 64 character segments, and the size of each character segment is 16KB. Without any interleaving, the allocation took 55876 microseconds and the bandwidth was 18.76GB/s; with only inter-channel interleaving, it took 71080 microseconds and the bandwidth was 14.756GB/s; with the multi-channel and multi-memory-block interleaving method of the present disclosure, it took 44057 microseconds and the bandwidth was 23.8GB/s, about 27% higher than without interleaving.
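The three measurements can be compared with a few lines of arithmetic. The sketch below only restates the figures reported above; the dictionary keys and function names are hypothetical labels, and the "implied volume" is simply elapsed time multiplied by reported bandwidth, not a value stated in the disclosure.

```python
# (elapsed microseconds, reported bandwidth in GB/s), copied from the test above
RESULTS = {
    "none": (55876, 18.76),
    "inter_channel_only": (71080, 14.756),
    "inter_channel_and_block": (44057, 23.8),
}


def relative_bandwidth(name, baseline="none"):
    """Bandwidth of one configuration divided by the baseline bandwidth."""
    return RESULTS[name][1] / RESULTS[baseline][1]


def implied_volume_gb(name):
    """Elapsed time (converted to seconds) times bandwidth, in GB."""
    elapsed_us, bandwidth = RESULTS[name]
    return elapsed_us * 1e-6 * bandwidth


if __name__ == "__main__":
    for name in RESULTS:
        print(name, round(relative_bandwidth(name), 2), round(implied_volume_gb(name), 3))
```

In this comparison the combined scheme reaches about 1.27 times the baseline bandwidth, while inter-channel interleaving alone reaches about 0.79 times the baseline. The implied volume is close to 1.05 GB in all three runs, which suggests the same workload was used for each.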

Another embodiment of the present disclosure is a method for allocating memory. The memory in this embodiment has 4 channels in total, and interleaving allocation is performed between every two channels. The flow of the allocation method is shown in Figure 7.

When step 702 is performed, a plurality of service instructions to be allocated to the 4-channel DDR are received. The service instructions include service data and need to be stored in the accelerator card device. In this embodiment, the multiple service instructions include first service data, second service data, third service data, and fourth service data, and the size of each service data is 30 KB.

When step 704 is executed, a memory allocation request for a 4-channel DDR is received. In more detail, in response to the foregoing service instruction, a command of the dynamic memory allocation function is received, which requires the foregoing 4 service data to be stored in the memory of the accelerator card device.

When step 706 is executed, the multiple service instructions are allocated to the 4 channels one by one at the inter-channel granularity. The inter-channel granularity (i.e., the page size) of the DDR in this embodiment is 16KB, which means that each service data requires 2 pages of space. Each service data is therefore split into first sub-data (16KB) and second sub-data (14KB), and distributed to the 4 channels as shown in the following table.

When step 708 is performed, inter-channel interleaving memory allocation is performed on 2 channels of the 4-channel DDR. In more detail, interleaving in this embodiment is performed between channels in units of the page size, and allocation occurs only between two adjacent channels. As shown in FIG. 8, the DDR in this embodiment includes a first channel 802, a second channel 804, a third channel 806, and a fourth channel 808, and each channel includes multiple pages.

When inter-channel interleaving memory allocation is performed, memory is requested on every two channels in turn at the page-size granularity that the current device can support; that is, the first channel 802 is interleaved with the second channel 804, and the third channel 806 is interleaved with the fourth channel 808.

As shown in the above table, the first service data and the second service data are interleaved between the first channel 802 and the second channel 804, and the third service data and the fourth service data are interleaved between the third channel 806 and the fourth channel 808. In more detail, the first sub-data of the first service data is allocated to the first page 810 of the first channel 802; the second sub-data of the first service data is allocated to the first page 812 of the second channel 804; the first sub-data of the second service data is allocated to the second page 814 of the first channel 802; the second sub-data of the second service data is allocated to the second page 816 of the second channel 804; the first sub-data of the third service data is allocated to the first page 818 of the third channel 806; the second sub-data of the third service data is allocated to the first page 820 of the fourth channel 808; the first sub-data of the fourth service data is allocated to the second page 822 of the third channel 806; and the second sub-data of the fourth service data is allocated to the second page 824 of the fourth channel 808.
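The distribution just described can be reproduced with a short sketch. It assumes zero-based indices, a 16KB page, and one particular grouping rule (consecutive service data are assigned to channel pairs in order, an equal share per pair); that grouping rule is an assumption chosen to match the example and is not stated in the disclosure. The names `split_into_pages` and `distribute` are hypothetical.

```python
PAGE_SIZE = 16 * 1024  # inter-channel granularity of this embodiment


def split_into_pages(size, page_size=PAGE_SIZE):
    """Split a size in bytes into page-sized pieces; the last piece may be shorter."""
    pieces = []
    while size > 0:
        piece = min(size, page_size)
        pieces.append(piece)
        size -= piece
    return pieces


def distribute(service_sizes, num_channels=4, pair=2):
    """Place sub-data on channels, interleaving only within adjacent channel pairs.

    Returns tuples (service_index, sub_data_index, channel, page_slot, bytes).
    Assumption of this sketch: services are shared evenly among the pairs,
    in order, so the first half goes to channels 0-1 and the rest to 2-3.
    """
    num_pairs = num_channels // pair
    per_pair = -(-len(service_sizes) // num_pairs)  # services handled by each pair
    next_slot = [0] * num_channels  # next free page slot on each channel
    placement = []
    for idx, size in enumerate(service_sizes):
        base = (idx // per_pair) * pair  # first channel of the pair for this service
        for j, piece in enumerate(split_into_pages(size)):
            channel = base + j % pair
            placement.append((idx, j, channel, next_slot[channel], piece))
            next_slot[channel] += 1
    return placement


if __name__ == "__main__":
    for row in distribute([30 * 1024] * 4):
        print(row)
```

With four 30KB service data, each one is split into a 16KB piece and a 14KB piece, and the printed placement follows the same order as the description: pages 810, 812, 814, 816 on the first pair of channels, then 818, 820, 822, 824 on the second pair.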

When step 710 is executed, based on each channel where the memory is allocated, interleaving memory allocation within the channel is executed. For example, when there are 4 computing units, each channel can be allocated to one computing unit.

When step 712 is executed, in each channel, the allocated service instructions are allocated to the memory block one by one at the granularity within the channel. In this embodiment, if there are other service instructions that have not yet been allocated, such as fifth service data, sixth service data, etc., memory block interleaving allocation can be performed in this step in the manner shown in FIG. 5, which will not be repeated.

In other embodiments, different memory interleaving allocation methods may be adopted for different numbers of computing units. For example, after step 704, a step of judging whether the number of calculation units involved in the command of the dynamic memory allocation function this time is one is added. If the number of calculation units involved in the command of this dynamic memory allocation function is 1, it is more efficient to use multi-channel interleaving allocation at this time, and the memory allocation between two channels is performed on the 4-channel DDR. If it is judged that the number of calculation units involved in the command of the dynamic memory allocation function is more than one, it is more efficient to combine multi-channel and multi-memory block interleaving allocation at this time, so the inter-channel and intra-channel interleaving memory allocation is performed. The operation mode is similar to the embodiment of FIG. 3 and FIG. 6, and those skilled in the art can easily perform operations based on the description of these embodiments, so the details will not be repeated.

Another embodiment of the present disclosure is a method for allocating memory of 4 channels. The allocation method is performed among the 4 channels. The flow of the allocation method is shown in FIG. 9.

When step 902 is executed, a memory allocation request for a 4-channel DDR is received.

According to the memory allocation request, when step 904 is executed, inter-channel interleaving memory allocation is performed on the 4 channels. As in the foregoing embodiment, the current device cyclically performs inter-channel interleaving memory allocation on each channel at an inter-channel granularity of one page. FIG. 10 shows, as an example, two adjacent memory blocks in a single channel 1002: the first memory block 1004 and the second memory block 1006. It is assumed that part of the space of the first memory block 1004 is allocated in step 904, namely the occupied space 1008 shown in gray.

According to the memory allocation request, when step 906 is executed, on each channel where the memory is allocated, interleaved memory allocation within the channel is executed. In more detail, the granular memory between channels is allocated one by one based on the unallocated space on the channel, and the unused memory blocks are preferentially allocated for interleaving. In this embodiment, the second memory block 1006 is the main memory area. Since the first memory block 1004 has participated in the allocation in step 904, it is only used as a spare memory area.

If the main memory area (that is, the second memory block 1006) has insufficient space, then when step 908 is executed, the spare memory area (the first memory block 1004) is additionally used for memory allocation according to the memory allocation request, and memory is allocated in the unoccupied space of the first memory block 1004.
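The main/spare fallback of steps 906 and 908 can be sketched as follows. The sketch counts free space in abstract allocation units and uses the hypothetical function name `allocate_units`; how the device actually tracks free space is not specified in the disclosure.

```python
def allocate_units(units_needed, main_free, spare_free):
    """Split a request between the main and spare memory areas.

    The main area (the unused block, e.g. the second memory block 1006) is
    used first; the spare area (the partly occupied first memory block 1004)
    only covers whatever does not fit. Returns (from_main, from_spare).
    """
    from_main = min(units_needed, main_free)
    from_spare = units_needed - from_main
    if from_spare > spare_free:
        raise MemoryError("not enough free space in this channel")
    return from_main, from_spare


if __name__ == "__main__":
    print(allocate_units(5, main_free=8, spare_free=3))   # fits in the main area
    print(allocate_units(10, main_free=8, spare_free=3))  # spills into the spare area
```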

The foregoing embodiment can perform interleaving access between channels and between memory blocks, not only can enjoy the bandwidth of multiple channel interleaving, but also make good use of the space of each memory block.

Another embodiment of the present disclosure is a computer-readable storage medium on which computer program code for allocating memory is stored. When the computer program code is run by a processor, the method of the foregoing embodiments can be executed, for example, the technical solutions shown in Figure 3, Figure 6, Figure 7, and Figure 9.

FIG. 11 shows a system 1100 for allocating memory in another embodiment of the present disclosure. The system 1100 includes a main device 1102 and a device 1104. The main device 1102 may be a host. The device 1104 may be an accelerator card, which includes multiple computing units 1106, a transceiver 1108, a processor 1110, a buffer 1112, and a multi-channel DDR 1114. The four computing units 1106 in the figure are an example, and the number of computing units 1106 is not limited to four; similarly, the 4 DDR 1114 in the figure are an example, and the number of DDR 1114 is not limited to 4. The transceiver 1108 is configured to receive a memory allocation request from the main device 1102, which forms a master-slave relationship with the device 1104. The processor 1110 is configured to perform the following memory allocation operations on the multi-channel DDR 1114 according to the received memory allocation request: performing inter-channel interleaving memory allocation on multiple channels of the multi-channel DDR 1114, and performing intra-channel interleaving memory allocation on each channel where memory is allocated. The processor 1110 contains a system memory management unit (SMMU). The buffer 1112 may be a last-level cache (LLC) configured to implement intra-channel interleaving memory allocation. The multi-channel DDR 1114 is configured to store data.

The processor 1110 receives the command of the dynamic memory allocation function from the main device 1102; when memory space is requested, the target is a memory address in the device 1104. After the transceiver 1108 receives the memory request, the processor 1110 converts it into a physical address to obtain the real memory address on the DDR, and memory is interleaved and allocated on the multi-channel DDR 1114 according to that address.

In more detail, this embodiment uses the processor 1110 and the buffer 1112 to implement the memory request. The processor 1110 implements interleaving allocation among the channels of the multi-channel DDR 1114 through the driver, and implements interleaving allocation across multiple memory blocks within a channel through the buffer 1112. Furthermore, during the memory request process, memory address allocation requires the participation of the memory management module in the driver and the system memory management unit. The memory management module is responsible for aligning the requested size and, based on the channel information of the request (such as single-channel or multi-channel), allocating physical memory on the channels of the DDR 1114. The system memory management unit is responsible for managing virtual addresses and, by setting up the page table, the mapping between the requested physical addresses and virtual addresses.

Each multi-channel DDR 1114 includes a DDR controller (not shown) connected to the buffer 1112. After the processor 1110 sends the physical address to the buffer 1112, it requests a corresponding virtual address according to its management of virtual addresses, implements the mapping from the physical address to the virtual address by setting up the page table, and then sends the virtual address related to this memory allocation request to the main device 1102 through the transceiver 1108.

After the master device 1102 receives the virtual address, it can access the memory. The master device 1102 sends the data to the transceiver 1108 according to the returned virtual address. The processor 1110 converts the virtual address into a physical address based on the virtual address, and the data is obtained through the buffer 1112. The real memory address on the multi-channel DDR 1114 is used for interleaving allocation between multiple channels and/or between multiple memory blocks.
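The virtual-to-physical translation mentioned above can be pictured with a deliberately simplified, single-level page table. This is a generic illustration under assumed parameters (16KB pages, a dictionary as the table); the class `PageTable` and its methods are hypothetical and do not describe the actual SMMU implementation.

```python
PAGE_SIZE = 16 * 1024  # page size assumed for this sketch


class PageTable:
    """Toy single-level page table: virtual page number -> physical page number."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.entries = {}

    def map_page(self, virt_page, phys_page):
        """Record that a virtual page is backed by a physical page."""
        self.entries[virt_page] = phys_page

    def translate(self, virt_addr):
        """Convert a virtual address to a physical address (KeyError if unmapped)."""
        virt_page, offset = divmod(virt_addr, self.page_size)
        return self.entries[virt_page] * self.page_size + offset


if __name__ == "__main__":
    table = PageTable()
    table.map_page(0, 7)  # virtual page 0 is backed by physical page 7
    print(hex(table.translate(0x10)))
```

Once the physical address is known, its channel and memory block follow from the interleaving granularities described in the earlier embodiments.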

If data needs to be written from the device 1104 to the main device 1102, the processor 1110 is configured to request memory from the main device 1102 through the transceiver 1108. After receiving the virtual address from the main device 1102, the processor 1110 converts it into a physical address, takes out the data stored in the multi-channel DDR 1114, and sends the data to the main device 1102 through the transceiver 1108.

In the case where the main device 1102 writes data to the device 1104, taking interleaving allocation among the 4 channels of the DDR 1114 as an example, the transceiver 1108 receives the memory allocation request for the 4-channel DDR. Based on the state of the computing units 1106, if multi-channel interleaving allocation is more efficient (for example, when there is only one computing unit 1106), the processor 1110 chooses to perform inter-channel interleaving memory allocation on the 4-channel DDR, converts the virtual address into a physical address, and routes the data through the buffer 1112 to the real memory address on the multi-channel DDR 1114 for multi-channel interleaving allocation. The processor 1110 may then continue with multi-memory-block interleaving allocation, performing intra-channel interleaving memory allocation on each channel where memory is allocated.

The architecture of this embodiment can implement the technical solutions shown in FIG. 3, FIG. 6, FIG. 7 and FIG. 9. Those skilled in the art can easily understand the technical details without creative input, so it will not be repeated.

This embodiment can perform interleaving access between channels and between memory blocks. When a single computing unit accesses memory, it can enjoy the bandwidth of multi-channel interleaving. When multiple computing units access memory at the same time, each computing unit accesses the memory of a specific channel, which not only reduces access to remote channels but also reduces access conflicts between computing units.

FIG. 12 is a structural diagram showing an integrated circuit device 1200 according to an embodiment of the present disclosure. As shown in the figure, the integrated circuit device 1200 includes a main device 1202, and the main device 1202 may be the main device 1102 in FIG. 11. In addition, the integrated circuit device 1200 further includes a universal interconnection interface 1204 and a device 1206, and the device 1206 may be the device 1104 in FIG. 11.

In this embodiment, the main device 1202 may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor, or an artificial intelligence processor; their number is not limited but is determined according to actual needs.

According to the technical solution of this embodiment, the universal interconnection interface 1204 can be used to transmit data and control commands between the master device 1202 and the device 1206. For example, the master device 1202 may obtain the required input data from the device 1206 via the universal interconnection interface 1204, and write the input data to the on-chip storage device of the master device 1202. Further, the master device 1202 can obtain a control command from the device 1206 via the universal interconnection interface 1204, and write it into the on-chip control buffer of the master device 1202. Alternatively or additionally, the universal interconnection interface 1204 may also read the data in the storage module of the master device 1202 and transmit it to the device 1206.

Optionally, the integrated circuit device 1200 may further include a storage device 1208, which may be connected to the master device 1202 and the device 1206 respectively. In one or more embodiments, the storage device 1208 may be used to store data of the master device 1202 and the device 1206, and is particularly suitable when the data required for calculation cannot be fully stored in the internal storage of the master device 1202 or the device 1206.

According to different application scenarios, the integrated circuit device 1200 of the present disclosure can be used as a system on chip (SoC) for mobile phones, robots, drones, video capture equipment and other equipment, thereby effectively reducing the core area of the control part, increasing the processing speed and reducing overall power consumption. In this case, the universal interconnection interface 1204 of the integrated circuit device 1200 is connected to certain components of the equipment. Such components can be, for example, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.

In some embodiments, the present disclosure also discloses a chip or integrated circuit chip, which includes the integrated circuit device 1200. In other embodiments, the present disclosure also discloses a chip packaging structure, which includes the above-mentioned chip.

In some embodiments, the present disclosure also discloses a board card, which includes the above-mentioned chip packaging structure. Referring to FIG. 13, an exemplary board card 1300 is provided. In addition to the chip 1302 described above, the board card 1300 may also include other supporting components. The supporting components may include, but are not limited to: a storage device 1304, an interface device 1306, and a control device 1308.

The storage device 1304 is connected to the chip 1302 in the chip packaging structure through a bus 1314 for storing data. The storage device 1304 may include multiple sets of memories 1310. Each group of memories 1310 and chip 1302 are connected through a bus 1314. Each group of memories 1310 may be DDR SDRAM ("Double Data Rate SDRAM", double-rate synchronous dynamic random access memory).

In contrast to what is shown in FIG. 13, in one embodiment the storage device 1304 may include 4 groups of memories 1310. Each group of memories 1310 may include multiple DDR4 chips. In an embodiment, the chip 1302 may include four 72-bit DDR4 controllers. In each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. The DDR4 controller may be the aforementioned processor 1110.

In one embodiment, each group of memories 1310 may include a plurality of double-rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for controlling the DDR is provided in the chip 1302 to control the data transmission and data storage of each memory 1310. The chip 1302 and the memory 1310 can be interleaved and allocated in the manner as in the foregoing embodiment.
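The figures above allow a rough estimate of peak data bandwidth. The following sketch is a back-of-the-envelope calculation, not a measured result: the 1.6 GHz clock used in the usage note is an assumed example value that does not appear in this disclosure, and the function name is hypothetical. It relies only on the stated facts that each of the four controllers has a 72-bit bus with 64 data bits and that DDR transfers data twice per clock cycle.

```python
def peak_bandwidth_bytes_per_s(clock_hz: float, controllers: int = 4,
                               bus_bits: int = 72, ecc_bits: int = 8) -> float:
    """Theoretical peak payload bandwidth across all DDR controllers."""
    data_bits = bus_bits - ecc_bits   # 64 of the 72 bits carry payload
    transfers_per_s = 2 * clock_hz    # DDR: two transfers per clock cycle
    return controllers * transfers_per_s * data_bits / 8
```

For an assumed clock of 1.6 GHz, `peak_bandwidth_bytes_per_s(1.6e9)` gives 102.4 GB/s in total, or 25.6 GB/s per controller. Real throughput would be lower because of refresh, bank conflicts and protocol overhead.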

The interface device 1306 is electrically connected to the chip 1302 in the chip packaging structure. The interface device 1306 is used to implement data transmission between the chip 1302 and an external device 1312 (for example, a server or a computer). In one embodiment, the interface device 1306 may be a standard PCIE interface. For example, the data to be processed is transferred from the server to the chip 1302 through the standard PCIE interface. In another embodiment, the interface device 1306 may also be another interface. The present disclosure does not limit the specific form of such other interfaces, as long as they can implement the transfer function. In addition, the calculation result of the chip 1302 is transmitted back to the external device 1312 by the interface device 1306.

The control device 1308 is electrically connected to the chip 1302 to monitor the state of the chip 1302. Specifically, the chip 1302 and the control device 1308 may be electrically connected through an SPI interface. The control device 1308 may include a microcontroller unit (MCU). The chip 1302 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the chip 1302 can be in different working states such as multi-load and light-load. The control device 1308 can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip 1302.

In some embodiments, the present disclosure also discloses an electronic device or apparatus, which includes the board card 1300 described above. According to different application scenarios, the electronic device or apparatus may include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment. The vehicles include airplanes, ships, and/or automobiles; the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners and/or electrocardiographs.

The foregoing can be better understood according to the following clauses:

Clause A1. A method for allocating memory, including: receiving a memory allocation request for a multi-channel DDR; and according to the memory allocation request, executing: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and on each channel where memory is allocated, performing intra-channel interleaved memory allocation.

Clause A2. The method according to clause A1, wherein performing the inter-channel interleaved memory allocation includes performing inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.

Clause A3. The method according to clause A1, wherein performing the inter-channel interleaved memory allocation includes performing inter-channel interleaved memory allocation on the multiple channels based on a page size.

Clause A4. The method according to clause A3, wherein performing inter-channel interleaving memory allocation for multiple channels based on page size includes cyclically performing inter-channel interleaving memory allocation on each channel at a page-size inter-channel granularity.

Clause A5. The method according to clause A4, wherein cyclically performing inter-channel interleaving memory allocation on each channel at the inter-channel granularity includes: sequentially allocating memory of the inter-channel granularity on the multiple channels one by one; and repeating the above sequential allocation until the memory to be allocated has been completely allocated.

Clause A6. The method according to clause A3, wherein performing intra-channel interleaving memory allocation includes interleaving the memory allocated on each channel based on the page size to multiple memory blocks in the channel.

Clause A7. The method according to clause A6, wherein interleaved allocation to a plurality of memory blocks in a channel includes cyclically performing interleaved memory allocation on each memory block with an intra-channel granularity smaller than the page size.

Clause A8. The method according to clause A7, wherein cyclically performing interleaved memory allocation on each memory block includes: sequentially allocating memory of the intra-channel granularity on the multiple memory blocks in the channel one by one; and repeating the foregoing sequential allocation until the memory obtained by the channel through the inter-channel interleaving has been allocated.

Clause A9. The method according to clause A6, wherein the channel further includes one or more memory blocks that have participated in memory allocation as a spare memory area, and the method further includes additionally using the spare memory area for memory allocation according to the memory allocation request.

Clause A10. The method according to any one of clauses A1-A9, further including: receiving multiple service instructions to be allocated on multiple channels of the multi-channel DDR; allocating the multiple service instructions to the multiple channels one by one at the inter-channel granularity; and in each channel, allocating the allocated service instructions to multiple memory blocks one by one at the intra-channel granularity.

Clause A11. The method according to clause A10, wherein the service instruction includes service data, and the method further includes interleaving and allocating the service data in the service instruction to multiple memory blocks one by one at the intra-channel granularity.

Clause A12. A device for performing data read and write operations, including: a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation request: performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and on each channel where memory is allocated, performing intra-channel interleaved memory allocation.

Clause A13. The device according to clause A12, wherein in the memory allocation operation, the processor is further configured to use a driver to implement the inter-channel interleaved memory allocation, and to use a buffer to implement the intra-channel interleaved memory allocation.

Clause A14. The device according to any one of clauses A12-A13, wherein after the memory allocation is completed, the processor is configured to send a virtual address related to this memory allocation request to the master device through the transceiver, the transceiver is configured to receive data sent by the master device to the device based on the virtual address, and the processor is configured to store the data on the multi-channel DDR on which the corresponding memory has been allocated.

Clause A15. The device according to clause A13, wherein in the process of writing data to the master device, the processor is configured to apply for memory from the master device through the transceiver and, upon receiving the virtual address from the master device, to send the data stored in the DDR memory to the master device through the transceiver.

Clause A16. A computer-readable storage medium on which computer program code for allocating memory is stored, wherein when the computer program code is run by a processor, the method according to any one of clauses A1-A11 is executed.
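The exchange between the master device and the device in clauses A12 to A15 can be pictured with a toy model. The Python sketch below is only an illustration under stated assumptions: the class and method names, the address constants, the 4 KiB page size, the 1 KiB intra-channel granularity and the 4 memory blocks per channel are all hypothetical, the transceiver is reduced to direct method calls, and the DDR is modeled as a dictionary rather than real memory.

```python
from typing import Dict, Tuple

VBASE = 0x1000_0000  # hypothetical start of the device's virtual address range


class Master:
    """Toy master device that hands out receive buffers by virtual address."""

    def __init__(self) -> None:
        self.buffers: Dict[int, bytearray] = {}
        self._next = 0x8000_0000  # hypothetical master-side virtual address

    def allocate(self, size: int) -> int:
        vaddr = self._next
        self.buffers[vaddr] = bytearray(size)
        self._next += size
        return vaddr

    def receive(self, vaddr: int, data: bytes) -> None:
        self.buffers[vaddr][:len(data)] = data


class Device:
    """Toy slave device: allocation table plus an interleaved multi-channel DDR."""

    def __init__(self, channels: int = 4, blocks: int = 4,
                 page: int = 4096, chunk: int = 1024) -> None:
        self.channels, self.blocks, self.page, self.chunk = channels, blocks, page, chunk
        self.table: Dict[int, int] = {}                 # virtual address -> size
        self.ddr: Dict[Tuple[int, int, int], int] = {}  # (channel, block, offset) -> byte
        self._next = 0                                  # next free linear offset

    def _place(self, linear: int) -> Tuple[int, int, int]:
        # Pages rotate across channels, then sub-page chunks across memory blocks.
        page_i, in_page = divmod(linear, self.page)
        ch_off = (page_i // self.channels) * self.page + in_page
        chunk_i, in_chunk = divmod(ch_off, self.chunk)
        return (page_i % self.channels, chunk_i % self.blocks,
                (chunk_i // self.blocks) * self.chunk + in_chunk)

    def allocate(self, size: int) -> int:
        """Handle a memory allocation request and return a virtual address."""
        if size <= 0:
            raise ValueError("size must be positive")
        vaddr = VBASE + self._next
        self.table[vaddr] = size
        self._next += -(-size // self.page) * self.page  # round up to whole pages
        return vaddr

    def write(self, vaddr: int, data: bytes) -> None:
        """Store data sent by the master at a previously returned address."""
        if len(data) > self.table[vaddr]:
            raise ValueError("data exceeds the allocated size")
        for i, byte in enumerate(data):
            self.ddr[self._place(vaddr - VBASE + i)] = byte

    def read(self, vaddr: int, length: int) -> bytes:
        return bytes(self.ddr.get(self._place(vaddr - VBASE + i), 0)
                     for i in range(length))

    def send_to_master(self, master: Master, vaddr: int, length: int) -> int:
        """Apply for memory on the master, then send the stored data to it."""
        host_vaddr = master.allocate(length)
        master.receive(host_vaddr, self.read(vaddr, length))
        return host_vaddr
```

In this model, a write consists of `allocate` followed by `write`, and a read back to the master consists of `send_to_master`, which first obtains a master-side virtual address and then transfers the data.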

The embodiments of the present disclosure are described in detail above, and specific examples are used herein to illustrate the principles and implementation manners of the present disclosure. The descriptions of the above embodiments are only intended to help understand the methods and core ideas of the present disclosure. At the same time, changes or modifications made by those skilled in the art to the specific implementation and application scope based on the ideas of this disclosure all fall within the protection scope of this disclosure. In summary, the content of this specification should not be construed as a limitation on this disclosure.

Claims

1. A method for allocating memory, including:

receiving a memory allocation request for a multi-channel DDR; and

according to the memory allocation request, executing:

performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and

on each channel where memory is allocated, performing intra-channel interleaved memory allocation.

2. The method according to claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.

3. The method according to claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on the multiple channels based on a page size.

4. The method according to claim 3, wherein performing inter-channel interleaved memory allocation on the multiple channels based on the page size comprises cyclically performing inter-channel interleaved memory allocation on each channel at an inter-channel granularity of the page size.

5. The method according to claim 4, wherein cyclically performing inter-channel interleaved memory allocation on each channel at the inter-channel granularity comprises:

sequentially allocating memory of the inter-channel granularity on the multiple channels one by one; and

repeating the above sequential allocation until the memory to be allocated has been completely allocated.

6. The method according to claim 3, wherein performing intra-channel interleaved memory allocation comprises interleaving the memory allocated on each channel based on the page size to a plurality of memory blocks in the channel.

7. The method according to claim 6, wherein the interleaved allocation to a plurality of memory blocks in a channel comprises cyclically performing interleaved memory allocation on each memory block at an intra-channel granularity smaller than the page size.

8. The method according to claim 7, wherein cyclically performing interleaved memory allocation on each memory block comprises:

sequentially allocating memory of the intra-channel granularity on the multiple memory blocks in the channel one by one; and

repeating the foregoing sequential allocation until the memory obtained by the channel through inter-channel interleaving has been allocated.

9. The method according to claim 6, wherein the channel further includes one or more memory blocks that have participated in memory allocation as a spare memory area, and the method further comprises additionally using the spare memory area for memory allocation according to the memory allocation request.

10. The method according to any one of claims 1-9, further comprising:

receiving multiple service instructions to be allocated on multiple channels of the multi-channel DDR;

allocating the multiple service instructions to the multiple channels one by one at the inter-channel granularity; and

in each channel, allocating the allocated service instructions to multiple memory blocks one by one at the intra-channel granularity.

11. The method according to claim 10, wherein the service instruction includes service data, and the method further comprises interleaving and allocating the service data in the service instruction to a plurality of memory blocks one by one at the intra-channel granularity.

12. A device for performing data read and write operations, including:

a transceiver configured to receive a memory allocation request from a master device that forms a master-slave relationship with the device;

a multi-channel DDR configured to store data; and

a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation request:

performing inter-channel interleaved memory allocation on multiple channels of the multi-channel DDR; and

on each channel where memory is allocated, performing intra-channel interleaved memory allocation.

13. The device according to claim 12, wherein in the memory allocation operation, the processor is further configured to use a driver to implement the inter-channel interleaved memory allocation, and to use a buffer to implement the intra-channel interleaved memory allocation.

14. The device according to any one of claims 12-13, wherein after the memory allocation is completed, the processor is configured to send a virtual address related to the memory allocation request to the master device through the transceiver, the transceiver is configured to receive data sent by the master device to the device based on the virtual address, and the processor is configured to store the data on the multi-channel DDR on which the corresponding memory has been allocated.

15. The device according to claim 13, wherein in the process of writing data to the master device, the processor is configured to apply for memory from the master device through the transceiver and, upon receiving the virtual address from the master device, to send the data stored in the DDR memory to the master device through the transceiver.

16. A computer-readable storage medium having computer program code for allocating memory stored thereon, wherein when the computer program code is run by a processor, the method according to any one of claims 1-11 is executed.