CN113157602B - Method, equipment and computer readable storage medium for distributing memory - Google Patents


Info

Publication number
CN113157602B
CN113157602B (application CN202010014955.8A)
Authority
CN
China
Prior art keywords
channel
memory
memory allocation
allocation
inter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010014955.8A
Other languages
Chinese (zh)
Other versions
CN113157602A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN202010014955.8A priority Critical patent/CN113157602B/en
Priority to PCT/CN2021/070708 priority patent/WO2021139733A1/en
Publication of CN113157602A publication Critical patent/CN113157602A/en
Application granted granted Critical
Publication of CN113157602B publication Critical patent/CN113157602B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646Configuration or reconfiguration
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System (AREA)

Abstract

The present invention relates to a method, device, and computer-readable storage medium for allocating memory. The device may comprise a combined processing apparatus, which may further comprise a universal interconnect interface and other processing devices. The device's main processing unit interacts with the other processing devices to jointly complete a specified computing operation. The combined processing apparatus may further comprise a storage device connected to the main device and the other processing devices, respectively, for storing their data.

Description

Method, equipment and computer readable storage medium for distributing memory
Technical Field
The present disclosure relates generally to the field of computers. More particularly, the present disclosure relates to methods, apparatus, and computer readable storage media for allocating memory.
Background
Double data rate synchronous dynamic random access memory (DDR SDRAM) is increasingly used in modern computers. Equipped with multi-channel memory control technology, it can effectively increase total memory bandwidth and thereby meet the data transmission and processing demands of high-speed processors. However, some multi-channel DDR techniques cannot interleave memory allocation across channels; some cannot interleave at the memory-block level within a channel; some can only connect two channels in parallel, limiting access bandwidth. How to allocate memory efficiently therefore remains an unsolved problem in the prior art.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, the solution of the present disclosure provides a method, an apparatus, and a computer-readable storage medium for allocating memory.
In one aspect, the present disclosure provides a method for allocating memory, comprising: receiving a memory allocation application for a multi-channel DDR; and, according to the memory allocation application: performing inter-channel interleaved memory allocation over the multiple channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
In another aspect, the present disclosure provides a device for performing data read and write operations, comprising: a transceiver configured to receive a memory allocation application from a master device in a master-slave relationship with the device; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation application: inter-channel interleaved memory allocation over the multiple channels of the multi-channel DDR, and intra-channel interleaved memory allocation on each channel to which memory is allocated.
In another aspect, the present disclosure provides a computer-readable storage medium having stored thereon computer program code for allocating memory which, when executed by a processor, performs the aforementioned method.
By using the method, the device, and the computer-readable storage medium, interleaved memory allocation can be realized both across the channels and across the memory blocks of a multi-channel memory, greatly improving access bandwidth.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:
FIG. 1 is a schematic diagram illustrating a multi-channel interleaving scheme of the present disclosure with two channels as an illustration;
FIG. 2 is a schematic diagram illustrating adjacent 2 memory blocks within the same channel according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a method of allocating memory according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating multi-channel interleaving according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating multi-memory block interleaving in accordance with an embodiment of the present disclosure;
FIG. 6 is a flow chart illustrating a method of allocating memory according to another embodiment of the present disclosure;
FIG. 7 is a flow chart illustrating a method of allocating memory according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating multi-channel interleaving between adjacent channels according to an embodiment of the present disclosure;
FIG. 9 is a flow chart illustrating a method of allocating memory according to another embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating multi-memory block interleaving according to another embodiment of the present disclosure;
FIG. 11 is a block diagram illustrating an apparatus for allocating memory according to another embodiment of the present disclosure;
FIG. 12 is a block diagram illustrating an integrated circuit device according to an embodiment of the present disclosure; and
FIG. 13 is a block diagram illustrating a board card according to an embodiment of the present disclosure.
Detailed Description
By using the method, the device, and the computer-readable storage medium, a multi-channel, multi-memory-block memory can realize interleaved allocation both across channels and across memory blocks, improving access bandwidth and increasing parallelism.
Memories are indispensable in today's computer devices, and with the development of technology, many memories can support multiple channels or multiple memory blocks (banks), such as DDR SDRAM, abbreviated as DDR in the industry.
DDR can support multi-channel access. DDR's inter-channel interleaving mode distributes one memory region across different channels, so that several DDR channels connected in parallel add their bandwidth and memory-access performance improves. If data is distributed over memory blocks on different channels, the memory controller can read it in parallel through the multiple channels; with four-channel DDR, the access speed can approach four times that of a single channel.
When DDR stores data, the same memory region can be distributed across different channels to perform so-called interleaving, so that the region is accessed in parallel and system efficiency improves.
Fig. 1 illustrates a multi-channel interleaving scheme of the present disclosure, taking two channels as an example. As shown, the disclosed embodiment has a first channel 102 and a second channel 104, both of which use the same page size. For convenience of illustration, each channel is shown with only 32 pages 106; each page is 16KB, which is a sufficient inter-channel interleaving granularity that does not disturb the normal jump instructions of the upper layer. Under multi-channel interleaved allocation, memory space is allocated sequentially: first the first page 108 of the first channel 102, then the first page 110 of the second channel 104, then the second page 112 of the first channel 102, and so on.
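The round-robin page mapping just described can be sketched as a small address-mapping function. The following Python model is illustrative only, not part of the patent; it assumes a 16KB page and a flat allocation offset counted from zero:

```python
PAGE_SIZE = 16 * 1024  # 16KB inter-channel interleaving granularity
NUM_CHANNELS = 2

def locate(offset):
    """Map a flat allocation offset to (channel, page index within channel)."""
    page = offset // PAGE_SIZE         # global page number
    channel = page % NUM_CHANNELS      # round-robin across the two channels
    page_in_channel = page // NUM_CHANNELS
    return channel, page_in_channel

# First page -> channel 0, second -> channel 1, third wraps back to channel 0.
assert locate(0) == (0, 0)
assert locate(PAGE_SIZE) == (1, 0)
assert locate(2 * PAGE_SIZE) == (0, 1)
```

The same function generalizes to more channels by raising `NUM_CHANNELS`, as later embodiments do with four channels.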
When the DDR of the present disclosure implements the multi-memory-block interleaving scheme, interleaved access across memory blocks takes place within the same channel, so that when the upper-layer service performs an instruction jump, each computing unit accesses only memory blocks in its own channel. Fig. 2 shows 2 adjacent memory blocks within the same channel: a first memory block 202 and a second memory block 204; in this disclosure, memory-block interleaved access is performed over two adjacent memory blocks. Each memory block is shown with 8 address spaces, addresses addr0 through addr7. In embodiments of the present disclosure the address space is smaller than the page size; for example, with a 16KB page, the address space may be 1KB, i.e., when interleaved access over two memory blocks is performed within one channel, each access unit is one address space. For example, at 0x000, address addr0 of the first memory block 202 is allocated; at 0x400, address addr0 of the second memory block 204; at 0x800, address addr1 of the first memory block 202; at 0xc00, address addr1 of the second memory block 204. Allocation proceeds in this way until all pages are allocated.
When multiple computing units access memory and the upper-layer service performs instruction jumps, multi-memory-block interleaved access lets each computing unit access only the memory blocks within its own channel, which reduces accesses to remote channels and reduces access conflicts between computing units across channels.
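The 1KB-unit alternation between the two adjacent memory blocks can likewise be modeled. This is a hypothetical sketch, not the patent's implementation; the constant mirrors the example offsets (0x000, 0x400, 0x800, 0xc00):

```python
UNIT = 0x400  # 1KB address-space unit, smaller than the 16KB page

def bank_and_slot(offset):
    """Alternate 1KB units between two adjacent memory blocks (banks),
    returning (bank, address index within that bank)."""
    unit = offset // UNIT
    return unit % 2, unit // 2

assert bank_and_slot(0x000) == (0, 0)  # addr0 of the first memory block
assert bank_and_slot(0x400) == (1, 0)  # addr0 of the second memory block
assert bank_and_slot(0x800) == (0, 1)  # addr1 of the first memory block
```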
One embodiment of the present disclosure is a method for allocating memory, in particular when the host side issues a command, which may be a dynamic memory allocation (malloc) function, to the accelerator-card device.
The memory of this embodiment has 4 channels in total, and allocation is interleaved among the 4 channels; the flow of the allocation method is shown in fig. 3.
In executing step 302, a memory allocation application for the 4-channel DDR is received. More specifically, a command of the dynamic memory allocation function is received; the command may request dynamic allocation of a portion of the memory in the accelerator-card device, and after aligning the request to a specified size, the device side applies for a block of physical memory.
In performing step 304, inter-channel interleaved memory allocation is performed on the 4-channel DDR. In more detail, the device side allocates memory space equally across the 4 channels in units of the page size. Fig. 4 is a schematic diagram of this step. As shown, the 4-channel DDR of this embodiment comprises a first channel 402, a second channel 404, a third channel 406, and a fourth channel 408, each including a plurality of address spaces. For convenience of illustration, each channel is shown with only 32 address spaces, addresses addr00 to addr31 (not shown in the figure), and each address space is 8KB. With a page size of 16KB, every 2 address spaces form one page.
In performing inter-channel interleaved memory allocation, the current device allocates to each channel in a cyclic manner with an inter-channel granularity of one page size. For example, at 0x000, the first page 410 of the first channel 402 is allocated, spanning addresses addr00 and addr01; at 0x400, the first page 412 of the second channel 404, spanning addr00 and addr01; at 0x800, the first page 414 of the third channel 406, spanning addr00 and addr01; at 0xc00, the first page 416 of the fourth channel 408, spanning addr00 and addr01; at 0x1000, the second page 418 of the first channel 402, spanning addr02 and addr03. Allocation proceeds in this way.
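The four-channel round-robin with two 8KB address spaces per 16KB page can be sketched as follows. This is an illustrative Python model matching the figure's numbering, not the patent's implementation:

```python
ADDR_SPACE = 8 * 1024
PAGE_SIZE = 16 * 1024
CHANNELS = 4
SPACES_PER_PAGE = PAGE_SIZE // ADDR_SPACE  # 2 address spaces per 16KB page

def allocation_order(num_pages):
    """Yield (channel, address-space indices) for each page in round-robin order."""
    for page in range(num_pages):
        channel = page % CHANNELS                      # cycle channels page by page
        first = (page // CHANNELS) * SPACES_PER_PAGE   # addrNN index within the channel
        yield channel, [first, first + 1]

order = list(allocation_order(5))
assert order[0] == (0, [0, 1])  # channel 0 gets addr00/addr01 first
assert order[3] == (3, [0, 1])  # then channel 3 gets its addr00/addr01
assert order[4] == (0, [2, 3])  # the fifth page wraps back to channel 0, addr02/addr03
```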
It should be noted that, with a 16KB page size, a request of only 14KB is still aligned and allocated at the granularity of one complete page. A request of 20KB is allocated at a granularity of 2 pages.
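The page alignment in the note above is a ceiling division. A minimal sketch (illustrative, not from the patent):

```python
PAGE_SIZE = 16 * 1024

def pages_needed(size):
    """Round a request up to whole pages, as the allocator aligns requests."""
    return -(-size // PAGE_SIZE)  # ceiling division without floats

assert pages_needed(14 * 1024) == 1  # a 14KB request still occupies one full page
assert pages_needed(20 * 1024) == 2  # a 20KB request is allocated 2 pages
```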
In performing step 306, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. In this embodiment, interleaving may be performed cyclically over two adjacent memory blocks with an intra-channel granularity smaller than the page size, for example in units of one address space. Fig. 5 illustrates the first channel 402 of fig. 4, which may be divided into 4 memory blocks: a first memory block 502, a second memory block 504, a third memory block 506, and a fourth memory block 508. Assuming that the first 4 pages of the first memory block 502 were allocated and occupied in step 304 (shown as gray blocks), the memory-block interleaved allocation of this step is performed over the first memory block 502 and the second memory block 504. At 0x000, an address space 510 of the first memory block 502 is allocated; at 0x400, an address space 512 of the second memory block 504; at 0x800, an address space 514 of the first memory block 502; at 0xc00, an address space 516 of the second memory block 504, following the order of the dashed arrows. Allocation repeats in this way until the requested memory has been allocated.
The memory of another embodiment of the present disclosure likewise has 4 channels, and allocation is interleaved among the 4 channels; the flow of the allocation method is shown in fig. 6.
In executing step 602, a memory allocation request for a 4-channel DDR is received.
In executing step 604, it is determined whether the number of computing units involved in this dynamic-memory-allocation command is 1.
According to the memory allocation application, if the number of computing units involved in the command is 1, multi-channel interleaved allocation is more efficient, so step 606 is performed: inter-channel interleaved memory allocation on the 4-channel DDR, in the same manner as step 304 of the previous embodiment, which is not repeated here.
According to the memory allocation application, if step 604 determines that the number of computing units involved in the command exceeds 1, interleaved allocation over both multiple channels and multiple memory blocks is more efficient, so step 608 performs both inter-channel and intra-channel interleaved memory allocation, in the same manner as the embodiment of fig. 3, which is not repeated here.
In another embodiment, step 608 may perform only memory-block interleaved access. For example, with 4 computing units and 4 channels, the disclosure may assign each channel to one computing unit, so that each computing unit accesses only the memory blocks in its own channel; accesses to remote channels are then reduced, as are access conflicts between computing units across channels.
Embodiments of the present disclosure may provide inter-channel and inter-memory-block interleaving: when only one computing unit is accessing, it enjoys the bandwidth of multi-channel interleaving; when multiple computing units are accessing, each unit can access the memory blocks of a particular channel and/or still enjoy multi-channel interleaving. This not only increases access bandwidth but also increases parallelism.
The DDR channels were tested by having a single image computation unit (IPU) send accesses to a cluster, 64 data segments in total, each 16KB. Without any interleaving, the allocation takes 55876 microseconds and the bandwidth is 18.76GB/s; with inter-channel interleaving only, the allocation takes 71080 microseconds and the bandwidth is 14.756GB/s; with the multi-channel multi-memory-block interleaving mode, the allocation takes 44057 microseconds and the bandwidth is 23.8GB/s. The speed improvement is evident.
Another embodiment of the present disclosure is a method for allocating memory. The memory of this embodiment has 4 channels in total, and allocation is interleaved between pairs of channels; the flow of the allocation method is shown in fig. 7.
In executing step 702, a plurality of service instructions to be allocated to the 4-channel DDR are received, where the service instructions include service data that needs to be stored in the accelerator card device. In this embodiment, the plurality of service instructions includes first service data, second service data, third service data, and fourth service data, and each service data has a size of 30KB.
In executing step 704, a memory allocation request for a 4-channel DDR is received. In more detail, in response to the service instruction, a command of the dynamic memory allocation function is received, which requests that the 4 service data be stored in the memory of the accelerator card device.
In performing step 706, the plurality of service instructions are distributed one by one to the 4 channels at the inter-channel granularity. The inter-channel granularity (i.e., the page size) of this embodiment's DDR is 16KB, which means each service data item requires 2 pages of space; each item is therefore split into first sub-data (16KB) and second sub-data (14KB) and allocated to the 4 channels as shown in the following table.
In performing step 708, inter-channel interleaved memory allocation is performed over pairs of channels of the 4-channel DDR. In more detail, this embodiment interleaves between channels in units of the page size, and allocation occurs only between adjacent channels. As shown in fig. 8, the DDR of this embodiment comprises a first channel 802, a second channel 804, a third channel 806, and a fourth channel 808, each including a plurality of pages.
When performing inter-channel interleaved memory allocation, memory is applied cyclically to each pair of channels at the page-size granularity supported by the current device: the first channel 802 interleaves with the second channel 804, and the third channel 806 interleaves with the fourth channel 808.
As shown in the above table, the first service data and the second service data are allocated by interleaving between the first channel 802 and the second channel 804, and the third service data and the fourth service data by interleaving between the third channel 806 and the fourth channel 808. In more detail: the first sub-data of the first service data is allocated to the first page 810 of the first channel 802; the second sub-data of the first service data to the first page 812 of the second channel 804; the first sub-data of the second service data to the second page 814 of the first channel 802; the second sub-data of the second service data to the second page 816 of the second channel 804; the first sub-data of the third service data to the first page 818 of the third channel 806; the second sub-data of the third service data to the first page 820 of the fourth channel 808; the first sub-data of the fourth service data to the second page 822 of the third channel 806; and the second sub-data of the fourth service data to the second page 824 of the fourth channel 808.
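The split of each 30KB payload into a 16KB and a 14KB sub-item, placed over one adjacent channel pair, can be sketched as follows. This is an illustrative Python model, not the patent's implementation; placement is reported as (channel, page, bytes) tuples with zero-based indices:

```python
PAGE_SIZE = 16 * 1024

def split_and_place(data_sizes):
    """Split each service payload into page-sized sub-data and interleave
    each pair of payloads over one adjacent channel pair."""
    placement = []
    for i, size in enumerate(data_sizes):
        first = min(size, PAGE_SIZE)      # first sub-data fills one page
        second = size - first             # remainder becomes the second sub-data
        pair_base = (i // 2) * 2          # channel pair (0,1) or (2,3)
        page = i % 2                      # first or second page of that pair
        placement.append(((pair_base, page, first), (pair_base + 1, page, second)))
    return placement

p = split_and_place([30 * 1024] * 4)
# First service data: 16KB on channel 0, 14KB on channel 1 (page 0 of each).
assert p[0] == ((0, 0, 16 * 1024), (1, 0, 14 * 1024))
# Third service data lands on the channel-2/channel-3 pair.
assert p[2] == ((2, 0, 16 * 1024), (3, 0, 14 * 1024))
```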
In step 710, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. For example, with 4 computing units, each channel may be assigned to 1 computing unit.
In executing step 712, the allocated service instructions are distributed one by one to the memory blocks within each channel at the intra-channel granularity. In this embodiment, if further service instructions, for example fifth service data, sixth service data, and so on, have not yet been allocated, the memory-block interleaved allocation may be performed in this step in the manner shown in fig. 5, which is not repeated here.
In other embodiments, different memory interleaving schemes may be employed for different numbers of computing units. For example, after step 704, a step may be added that determines whether the number of computing units involved in this dynamic-memory-allocation command is 1. If the number of computing units involved is 1, multi-channel interleaved allocation is more efficient, and pairwise inter-channel interleaved memory allocation is performed on the 4-channel DDR. If the number exceeds 1, interleaved allocation over both multiple channels and multiple memory blocks is more efficient, so pairwise inter-channel and intra-channel interleaved memory allocation is performed. The operation is similar to the embodiments of figs. 3 and 6, and those skilled in the art can readily carry it out based on those descriptions, so it is not repeated here.
Another embodiment of the present disclosure is a method for allocating memory over 4 channels, with allocation interleaved among the 4 channels; the flow is shown in fig. 9.
In executing step 902, a memory allocation request for a 4-channel DDR is received.
According to the memory allocation application, in step 904, inter-channel interleaved memory allocation is performed on the 4 channels. As in the previous embodiments, this allocation is performed cyclically on each channel at the page-size inter-channel granularity of the current device. Taking fig. 10 as an example, two adjacent memory blocks of a single channel 1002 are shown: a first memory block 1004 and a second memory block 1006. In step 904, it is assumed that part of the first memory block 1004 has already been allocated, namely the gray space 1008.
According to the memory allocation application, in step 906, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. In more detail, memory at the intra-channel granularity is allocated channel by channel from the unallocated space, preferentially interleaving with unused memory blocks. In this embodiment the second memory block 1006 serves as the primary memory area; since the first memory block 1004 was already partly allocated in step 904, it serves only as a spare memory area.
If the primary memory area (the second memory block 1006) has insufficient space for storage, then, in step 908, the spare memory area (the first memory block 1004) is additionally used for memory allocation according to the memory allocation application, i.e., allocation proceeds within the unoccupied space of the first memory block 1004.
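The primary/spare fallback in steps 906–908 amounts to a two-tier free-space check. A minimal sketch, assuming the channel tracks only the free bytes of each block (illustrative, not the patent's implementation):

```python
class Channel:
    """Prefer the unused (primary) block; fall back to the free space left
    in the partially occupied (spare) block when the primary runs out."""
    def __init__(self, primary_free, spare_free):
        self.primary_free = primary_free
        self.spare_free = spare_free

    def alloc(self, size):
        taken_primary = min(size, self.primary_free)
        self.primary_free -= taken_primary
        remainder = size - taken_primary
        if remainder > self.spare_free:
            raise MemoryError("channel exhausted")
        self.spare_free -= remainder
        return taken_primary, remainder  # bytes taken from (primary, spare)

ch = Channel(primary_free=8, spare_free=4)
assert ch.alloc(6) == (6, 0)  # fits entirely in the primary block
assert ch.alloc(4) == (2, 2)  # overflow spills into the spare block
```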
The above embodiment can interleave accesses both between channels and between memory blocks, so it not only enjoys the bandwidth of multi-channel interleaving but also makes better use of the space of each memory block.
Another embodiment of the disclosure is a computer-readable storage medium on which computer program code for allocating memory is stored; when executed by a processor, the code can perform the methods of the previous embodiments, such as the solutions shown in figs. 3, 6, 7, and 9.
Fig. 11 illustrates a system 1100 for allocating memory according to another embodiment of the present disclosure. The system 1100 includes a master device 1102 and a device 1104; the master device 1102 may be a host. The device 1104 may be an accelerator card comprising a plurality of computing units 1106, a transceiver 1108, a processor 1110, a buffer 1112, and a multi-channel DDR 1114. The 4 computing units 1106 in the figure are an example, not a limitation to exactly 4; likewise, the 4 DDR channels 1114 are an example, not a limitation to exactly 4. The transceiver 1108 is configured to receive a memory allocation application from the master device 1102, which is in a master-slave relationship with the device 1104. The processor 1110, which includes a system memory management unit (SMMU), is configured to perform the following memory allocation operations on the multi-channel DDR 1114 according to the received application: inter-channel interleaved memory allocation over the multiple channels of the multi-channel DDR 1114, and intra-channel interleaved memory allocation on each channel to which memory is allocated. The buffer 1112 may be a last level cache (LLC) configured to implement the intra-channel interleaved memory allocation. The multi-channel DDR 1114 is configured to store data.
The processor 1110 receives from the master device 1102 a command for the dynamic memory allocation function, which, when applying for memory space, targets a block of memory addresses in the device 1104. After the transceiver 1108 receives the memory application, the processor 1110 converts it into a physical address, obtaining the real memory address on the DDR, and performs interleaving according to that address to allocate across the multi-channel DDR 1114.
In more detail, this embodiment implements the memory application with the processor 1110 and the buffer 1112. Through the driver, the processor 1110 interleaves the applied memory among the multiple channels of the DDR 1114 at the page-size granularity, and implements interleaved allocation over the multiple memory blocks within a channel through the buffer 1112. During the memory application, the memory management module in the driver and the system memory management unit cooperate: the memory management module aligns the application size and allocates physical memory to the channels of the DDR 1114 according to the channel information of the application, such as single channel or multiple channels, while the system memory management unit manages virtual addresses and, by laying out page tables, manages the mapping between the applied physical addresses and virtual addresses.
Each channel of the multi-channel DDR 1114 includes a DDR controller (not shown) coupled to the buffer 1112. After sending the physical address to the buffer 1112, the processor 1110 applies for a corresponding segment of virtual addresses under its virtual-address management, maps the physical address to the virtual address by laying out a page table, and then sends the virtual address associated with the memory allocation application to the master device 1102 through the transceiver 1108.
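The page-table mapping managed by the SMMU can be pictured with a toy model. The following Python sketch is illustrative only; page numbers are hypothetical and the page size matches the 16KB inter-channel granularity used elsewhere:

```python
PAGE_SIZE = 16 * 1024

class PageTable:
    """Toy model of the SMMU mapping: virtual page numbers to physical page numbers."""
    def __init__(self):
        self.entries = {}

    def map(self, vpn, ppn):
        # One page-table entry: virtual page vpn -> physical page ppn.
        self.entries[vpn] = ppn

    def translate(self, vaddr):
        # Split the virtual address into page number and in-page offset,
        # then rebuild the physical address from the mapped page.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.entries[vpn] * PAGE_SIZE + offset

pt = PageTable()
pt.map(0, 7)  # virtual page 0 -> physical page 7 (hypothetical numbers)
assert pt.translate(0x100) == 7 * PAGE_SIZE + 0x100
```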
After receiving the virtual address, the master device 1102 can perform memory accesses: it sends data to the transceiver 1108 using the returned virtual address, the processor 1110 converts the virtual address into a physical address, and through the buffer 1112 the data reaches the real memory address on the multi-channel DDR 1114, where multi-channel and/or multi-memory-block interleaved allocation is performed.
If data is to be written from the device 1104 to the master device 1102, the processor 1110 applies for memory from the master device 1102 via the transceiver 1108; after receiving a virtual address from the master device 1102, it converts that virtual address into a physical address, fetches the data stored in the multi-channel DDR 1114, and transmits the fetched data to the master device 1102 via the transceiver 1108.
Take 4-channel DDR interleaved allocation as an example of the host device 1102 writing data to the device 1104. The transceiver 1108 receives a memory allocation request for the 4-channel DDR 1114. Based on the state of the computing units 1106, the processor 1110 selects the allocation mode: if multi-channel interleaved allocation is more efficient, for example when only one computing unit 1106 is in use, it performs interleaved allocation across the four DDR channels and translates the virtual address into a physical address, and the data then obtains its real memory address on the multi-channel DDR 1114 through the buffer 1112. The processor 1110 may further apply multi-memory-block interleaved allocation, performing intra-channel interleaved memory allocation on each channel to which memory has been allocated.
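The mode selection just described can be sketched as a simple policy function. The threshold and the mode names here are assumptions for illustration; the embodiment only states that the choice depends on the state of the computing units 1106:

```python
def choose_allocation_mode(active_compute_units):
    """Heuristic sketch of the selection described above: with a single
    active computing unit, interleaving across all channels maximizes the
    bandwidth that unit sees; with several active units, binding each unit
    to its own channel (or channel group) reduces access conflicts."""
    if active_compute_units <= 1:
        return "multi-channel-interleave"
    return "per-unit-channel"
```

A scheduler would consult this policy once per allocation request, before the driver lays out page tables.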
The architecture of this embodiment can implement the technical solutions shown in figs. 3, 6, 7 and 9. Those skilled in the art can readily understand the technical details from the foregoing description, so a detailed description is omitted here.
This embodiment supports interleaved access at both the channel and memory-block levels: when a single computing unit performs accesses, it enjoys the combined bandwidth of multiple interleaved channels; when multiple computing units perform accesses, inter-channel access and access to the memory of a specific channel can proceed simultaneously, which reduces both accesses to remote channels and access conflicts among the computing units.
Fig. 12 is a block diagram illustrating an integrated circuit device 1200 according to an embodiment of the disclosure. As shown, the integrated circuit device 1200 includes a host device 1202, which may be the host device 1102 of fig. 11. In addition, the integrated circuit device 1200 includes a universal interconnect interface 1204 and a device 1206, which may be the device 1104 of fig. 11.
In this embodiment, the host device 1202 may be one or more types of processors, such as a general-purpose and/or special-purpose processor, a central processing unit, a graphics processor, or an artificial intelligence processor; their number is not limited and is determined as needed.
According to aspects of this embodiment, the universal interconnect interface 1204 may be used to transfer data and control instructions between the host device 1202 and the device 1206. For example, the host device 1202 may obtain the required input data from the device 1206 via the universal interconnect interface 1204 and write it to a storage unit on the host device 1202 chip. Further, the host device 1202 may obtain control instructions from the device 1206 via the universal interconnect interface 1204 and write them to a control cache on the host device 1202 chip. Alternatively or in addition, the universal interconnect interface 1204 may also read data from a storage module of the host device 1202 and transmit it to the device 1206.
Optionally, the integrated circuit device 1200 may also include a storage device 1208, which may be coupled to the host device 1202 and the device 1206, respectively. In one or more embodiments, the storage device 1208 may be used to store data of the host device 1202 and the device 1206, particularly data to be operated on that cannot be fully held in the internal storage of the host device 1202 or the device 1206.
According to different application scenarios, the integrated circuit device 1200 of the present disclosure may serve as a system-on-chip (SoC) for devices such as mobile phones, robots, unmanned aerial vehicles, and video capture equipment, thereby effectively reducing the die area of the control portion, increasing processing speed, and lowering overall power consumption. In this case, the universal interconnect interface 1204 of the integrated circuit device 1200 connects to certain components of the apparatus, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip or integrated circuit chip that includes the integrated circuit device 1200. In other embodiments, the disclosure also discloses a chip package structure, which includes the chip.
In some embodiments, the disclosure further discloses a board card that includes the chip package structure. Referring to fig. 13, which illustrates the aforementioned exemplary board 1300: in addition to the chip 1302 described above, the board 1300 may include other supporting components, including but not limited to a memory device 1304, an interface device 1306, and a control device 1308.
The memory device 1304 is connected to the chip 1302 in the chip package structure via a bus 1314 and is used for storing data. The memory device 1304 may include multiple sets of memory 1310, each set coupled to the chip 1302 by the bus 1314. Each set of memory 1310 may be DDR SDRAM ("Double Data Rate SDRAM", double data rate synchronous dynamic random access memory).
Unlike the arrangement shown in fig. 13, in one embodiment the memory device 1304 may include four sets of memory 1310, and each set of memory 1310 may include multiple DDR4 granules (chips). In one embodiment, the chip 1302 may include four 72-bit DDR4 controllers, of which 64 bits are used for data transfer and 8 bits for ECC checking; the DDR4 controller may be the aforementioned processor 1110.
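The 64 + 8 split above matches the common ECC arrangement for a 64-bit DDR bus: a Hamming code needs r check bits with 2^r ≥ data_bits + r + 1, plus one extra parity bit for double-error detection (SECDED). A quick arithmetic check, offered as background rather than as part of the embodiment:

```python
def secded_check_bits(data_bits):
    """Smallest Hamming check-bit count r with 2**r >= data_bits + r + 1,
    plus one overall parity bit for double-error detection (SECDED)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1
```

For 64 data bits this yields exactly the 8 check bits of a 72-bit controller; for a 32-bit bus it would yield 7.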
In one embodiment, each set of memory 1310 may include multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for the DDR is provided in the chip 1302 to control the data transfer to, and data storage of, each set of memory 1310. The chip 1302 and the memory 1310 may employ the interleaving of the foregoing embodiments.
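Because DDR transfers on both clock edges, its peak transfer rate doubles the clock rate times the bus width. A small illustrative calculation (the clock frequency used in the example is an assumption, not a value from this embodiment):

```python
def ddr_peak_bandwidth_bytes(clock_hz, bus_bits):
    """Peak DDR bandwidth in bytes/s: two transfers per clock cycle
    times the bus width in bytes."""
    return clock_hz * 2 * (bus_bits // 8)
```

For instance, a 64-bit bus clocked at 1.6 GHz peaks at 25.6 GB/s, and interleaving over four such channels would quadruple the aggregate figure.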
The interface device 1306 is electrically connected to the chip 1302 within the chip package structure. The interface device 1306 is used to enable data transfer between the chip 1302 and an external device 1312 (e.g., a server or computer). In one embodiment, the interface device 1306 may be a standard PCIE interface; for example, the data to be processed is transferred from the server to the chip 1302 through the standard PCIE interface. In another embodiment, the interface device 1306 may be another interface; the present disclosure does not limit its specific form, as long as it can implement the transfer function. In addition, the computation result of the chip 1302 is transmitted back to the external device 1312 by the interface device 1306.
The control device 1308 is electrically connected to the chip 1302 to monitor the status of the chip 1302. Specifically, the chip 1302 and the control device 1308 may be electrically connected through an SPI interface. The control device 1308 may include a single-chip microcomputer ("MCU", Micro Controller Unit). The chip 1302 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads; thus, the chip 1302 may be in different operating states such as heavy load and light load. The control device 1308 may be configured to regulate the operating states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip 1302.
In some embodiments, the present disclosure also discloses an electronic device or apparatus that includes the board card 1300 described above. Depending on the application scenario, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an aircraft, a ship and/or a vehicle; the household appliances comprise televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
The foregoing may be better understood in light of the following clauses:
Clause A1, a method for allocating memory, comprising: receiving a memory allocation request for a multi-channel DDR; and, according to the memory allocation request, performing: inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and intra-channel interleaved memory allocation on each channel to which memory is allocated.
Clause A2, the method of clause A1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.
Clause A3, the method of clause A1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation for the plurality of channels based on page size.
Clause A4, the method of clause A3, wherein performing the inter-channel interleaved memory allocation for the plurality of channels based on the page size comprises performing the inter-channel interleaved memory allocation on each channel cyclically with inter-channel granularity of the page size.
Clause A5, the method of clause A4, wherein performing inter-channel interleaved memory allocation on each channel cyclically at the inter-channel granularity comprises: allocating memory at the inter-channel granularity to the plurality of channels sequentially, one channel at a time; and repeating the sequential allocation until the requested memory is fully allocated.
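The cyclic sequential allocation of clause A5 can be sketched as a round-robin assignment of page-granularity units to channels; the loop repeats until the requested memory is exhausted:

```python
def interleave_across_channels(num_pages, num_channels):
    """Clause A5 as a sketch: hand out inter-channel-granularity (page)
    units to the channels one by one, in order, cycling back to the first
    channel until all requested pages are placed. Returns the channel
    index chosen for each page."""
    return [page % num_channels for page in range(num_pages)]
```

Five pages over two channels therefore land as 0, 1, 0, 1, 0 — the first channel absorbing the odd remainder.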
Clause A6, the method of clause A3, wherein performing the intra-channel interleaved memory allocation comprises interleaving memory allocated on each channel based on the page size over a plurality of memory blocks within the channel.
Clause A7, the method of clause A6, wherein interleaving allocation onto a plurality of memory blocks within a channel comprises performing interleaving memory allocation on each memory block cyclically with an intra-channel granularity smaller than the page size.
Clause A8, the method of clause A7, wherein cyclically performing interleaved memory allocation on each memory block comprises: allocating memory at the intra-channel granularity to the memory blocks within the channel sequentially, one block at a time; and repeating the sequential allocation until the memory the channel obtained through inter-channel interleaving is fully allocated.
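Clause A8's inner loop mirrors the inter-channel loop of clause A5, but deals the memory a channel received out to its memory blocks at a finer, intra-channel granularity. A sketch (the granularity value is an assumption; clause A7 only requires it to be smaller than the page size):

```python
def interleave_within_channel(channel_bytes, block_count, intra_granularity):
    """Clause A8 as a sketch: split the bytes a channel obtained from
    inter-channel interleaving into intra-channel-granularity units and
    deal them to the channel's memory blocks one by one, cycling until
    the channel's share is exhausted. Returns the block index per unit."""
    units = -(-channel_bytes // intra_granularity)  # ceiling division
    return [unit % block_count for unit in range(units)]
```

For example, a 4 KiB page dealt in 1 KiB units over four blocks touches each block exactly once.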
Clause A9, the method of clause A6, wherein the channel further comprises one or more memory blocks already participating in memory allocation as a spare memory area, the method further comprising additionally using the spare memory area for memory allocation according to the memory allocation application.
Clause A10, the method of any of clauses A1-9, further comprising: receiving a plurality of service instructions to be distributed over the plurality of channels of the multi-channel DDR; distributing the plurality of service instructions to the plurality of channels one by one at the inter-channel granularity; and, within each channel, distributing the distributed service instructions to the plurality of memory blocks one by one at the intra-channel granularity.
Clause a11, the method of clause a10, wherein the service instruction comprises service data, the method further comprising interleaving the service data in the service instruction onto a plurality of memory blocks one by one at an intra-channel granularity.
Clause A12, an apparatus for performing data read and write operations, comprising: a transceiver configured to receive a memory allocation request from a master device in a master-slave relationship with the apparatus; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation request: performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
Clause A13, the apparatus of clause A12, wherein in the memory allocation operation the processor is further configured to implement the inter-channel interleaved memory allocation using a driver, and to implement the intra-channel interleaved memory allocation using a buffer.
Clause A14, the device of any of clauses A12-13, wherein upon completion of the memory allocation, the processor is configured to send a virtual address associated with the current memory allocation request to the host device via the transceiver, the transceiver is configured to receive data sent by the host device to the device based on the virtual address, and the processor is configured to store the data on the multi-channel DDR to which the corresponding memory has been allocated.
Clause A15, the device of clause A13, wherein in writing data to a host device, the processor is configured to request memory from the host device through the transceiver and, upon receipt of a virtual address from the host device, to send the data stored in the DDR memory to the host device through the transceiver.
Clause a16, a computer readable storage medium having stored thereon computer program code for allocating memory, which, when executed by a processor, performs the method according to any of clauses A1-11.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure; the above description of the embodiments is intended only to aid understanding of the method of the present disclosure and its core ideas. Meanwhile, those skilled in the art may, based on the ideas of the present disclosure, make changes to the specific implementations and application scope, all of which fall within the protection scope of the present disclosure. In view of the foregoing, the contents of this specification should not be construed as limiting the present disclosure.

Claims (14)

1. A method for allocating memory, comprising:
receiving a memory allocation request for the multi-channel DDR;
according to the memory allocation request, performing:
performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on the plurality of channels based on page size; and
on each channel of the allocated memory, performing intra-channel interleaved memory allocation, wherein performing intra-channel interleaved memory allocation includes interleaving memory allocated on each channel based on the page size over a plurality of memory blocks within the channel.
2. The method of claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.
3. The method of claim 1, wherein performing inter-channel interleaved memory allocation for a plurality of channels based on page size comprises performing inter-channel interleaved memory allocation on each channel cyclically with inter-channel granularity of page size.
4. The method of claim 3, wherein performing inter-channel interleaved memory allocation on each channel cyclically at the inter-channel granularity comprises:
allocating memory at the inter-channel granularity to the plurality of channels sequentially, one channel at a time; and
repeating the sequential allocation until the requested memory is fully allocated.
5. The method of claim 1, wherein interleaving allocation over a plurality of memory blocks within a channel comprises performing interleaved memory allocation over individual memory blocks cyclically with an intra-channel granularity smaller than the page size.
6. The method of claim 5, wherein cyclically performing interleaved memory allocation on individual memory blocks comprises:
allocating memory at the intra-channel granularity to the memory blocks within the channel sequentially, one block at a time; and
repeating the sequential allocation until the memory the channel obtained through inter-channel interleaving is fully allocated.
7. The method of claim 1, wherein the channel further includes one or more memory blocks already involved in memory allocation as a spare memory area, the method further comprising additionally using the spare memory area for memory allocation according to the memory allocation request.
8. The method of any of claims 1-7, further comprising:
receiving a plurality of service instructions to be distributed on a plurality of channels of the multi-channel DDR;
distributing the plurality of service instructions to the plurality of channels one by one at the inter-channel granularity; and
within each channel, distributing the distributed service instructions to the plurality of memory blocks one by one at the intra-channel granularity.
9. The method of claim 8, wherein the service instructions comprise service data, the method further comprising interleaving the service data in the service instructions onto the plurality of memory blocks one by one at the intra-channel granularity.
10. An apparatus for performing data read and write operations, comprising:
a transceiver configured to receive a memory allocation request from a master device in master-slave relationship with the device;
a multi-channel DDR configured to store data;
the processor is configured to perform the following memory allocation operation on the multi-channel DDR according to the received memory allocation application:
performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR, wherein performing the inter-channel interleaved memory allocation comprises performing inter-channel interleaved memory allocation on the plurality of channels based on page size; and
on each channel of the allocated memory, performing intra-channel interleaved memory allocation, wherein performing intra-channel interleaved memory allocation includes interleaving memory allocated on each channel based on the page size over a plurality of memory blocks within the channel.
11. The apparatus of claim 10, wherein in the memory allocation operation the processor is further configured to implement the inter-channel interleaved memory allocation using a driver, and to implement the intra-channel interleaved memory allocation using a buffer.
12. The device of any of claims 10-11, wherein upon completion of the memory allocation, the processor is configured to send a virtual address associated with the current memory allocation request to the host device via the transceiver, the transceiver is configured to receive data sent by the host device to the device based on the virtual address, and the processor is configured to store the data on the multi-channel DDR to which the corresponding memory has been allocated.
13. The device of claim 11, wherein in writing data to a host device, the processor is configured to request memory from the host device through the transceiver and, upon receipt of a virtual address from the host device, to send the data stored in the DDR memory to the host device through the transceiver.
14. A computer readable storage medium having stored thereon computer program code for allocating memory, which when executed by a processor, performs the method according to any of claims 1-9.
CN202010014955.8A 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory Active CN113157602B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010014955.8A CN113157602B (en) 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory
PCT/CN2021/070708 WO2021139733A1 (en) 2020-01-07 2021-01-07 Memory allocation method and device, and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113157602A CN113157602A (en) 2021-07-23
CN113157602B true CN113157602B (en) 2024-01-26

Family

ID=76787755

Country Status (2)

Country Link
CN (1) CN113157602B (en)
WO (1) WO2021139733A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434821B (en) * 2023-03-14 2024-01-16 深圳市晶存科技有限公司 System and method for testing LPDDR4 particles

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105452986A (en) * 2013-08-08 2016-03-30 高通股份有限公司 System and method for memory channel interleaving with selective power or performance optimization
CN108845958A (en) * 2018-06-19 2018-11-20 中国科学院软件研究所 A kind of mapping of interleaver and dynamic EMS memory management system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170108911A1 (en) * 2015-10-16 2017-04-20 Qualcomm Incorporated System and method for page-by-page memory channel interleaving
US11036642B2 (en) * 2019-04-26 2021-06-15 Intel Corporation Architectural enhancements for computing systems having artificial intelligence logic disposed locally to memory

Also Published As

Publication number Publication date
WO2021139733A1 (en) 2021-07-15
CN113157602A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant