WO2023030195A1 - Cache management method and apparatus, control program and controller - Google Patents

Cache management method and apparatus, control program and controller

Info

Publication number
WO2023030195A1
WO2023030195A1 · PCT/CN2022/115201
Authority
WO
WIPO (PCT)
Prior art keywords
data
address
cache
chip
controller
Prior art date
Application number
PCT/CN2022/115201
Other languages
English (en)
French (fr)
Inventor
李亚文
刘衡祁
徐金林
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2023030195A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14: Handling requests for interconnection or transfer
    • G06F 13/20: Handling requests for interconnection or transfer for access to input/output bus
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to the field of communications technology, and in particular to a cache management method and apparatus, a control program, and a controller.
  • In Ethernet switch chip applications, different models of off-chip cache unit, i.e., cache controller, often have to be selected according to application scenario and cost. The chip's memory management unit (MMU) must therefore not only meet basic functional requirements, but also offer good compatibility and portability, so that in different application scenarios different cache controllers can be attached according to factors such as storage size, speed, power consumption, and cost, without repeated development, thereby saving labor and cost.
  • As processor and memory-controller frequencies and bandwidths keep rising, their individual performance keeps improving, yet cache access efficiency often becomes the bottleneck of system performance, and that efficiency depends on how the MMU is implemented.
  • In an Ethernet switch chip, the MMU is mainly responsible for distributing write requests and write data for packet (PK) data and packet descriptors (PD), releasing them in write-request order, and then reading data back in order according to read requests.
  • In this process, techniques such as off-chip address management, physical address mapping, and packet splicing are used to utilize the cache bandwidth as fully as possible and improve memory-controller efficiency.
  • Current mainstream cache controllers include DDR3/DDR4/DDR5 (Double Data Rate) and HBM (High Bandwidth Memory). How to support different controllers under one framework while guaranteeing their storage efficiency is the main problem to be solved.
  • Embodiments of the present disclosure provide a cache management method and apparatus, a control program, and a controller, so as to at least solve the related-art problem of how to support different controllers under one framework.
  • According to one embodiment, a cache management method is provided, including: the memory management unit (MMU) identifies the type of an external cache controller based on CPU configuration information; an offset address is confirmed by table lookup based on an address management submodule and the address region corresponding to the cache controller type; and the logical address of the cache controller type is calculated from the offset address, where different cache controller types gate different numbers of external connection channels.
  • According to another embodiment, a cache management apparatus is provided, including: an identification unit configured to cause the MMU to identify the type of an external cache controller based on CPU configuration information; a confirmation unit configured to confirm an offset address by table lookup based on an address management submodule and the address region corresponding to the cache controller type; and a calculation unit configured to calculate the logical address of the cache controller type from the offset address, where different cache controller types gate different numbers of external connection channels.
  • According to yet another embodiment, a computer-readable storage control program is provided, in which a computer program is stored, where the computer program is configured to perform the steps of any of the above method embodiments when run.
  • According to yet another embodiment, a controller is provided, including a buffer and a processor; a computer program is stored in the controller, and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
  • Because the MMU identifies the external cache controller type from CPU configuration information, confirms an offset address by table lookup from the address region corresponding to that type, and calculates that type's logical address from the offset address, with different controller types gating different numbers of external connection channels, switching between different controllers under one framework is achieved. This solves the compatibility problem, allows multiple controllers to be supported under the same framework, and improves storage efficiency.
  • FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a cache management method according to an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of a cache management method according to an embodiment of the present disclosure;
  • FIG. 3 is a schematic architecture diagram of a cache management system according to an embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of the Option E mode of HBM in a cache management method according to an embodiment of the present disclosure;
  • FIG. 5 is a first schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
  • FIG. 6 is a second schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
  • FIG. 7 is a third schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
  • FIG. 8 is a fourth schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
  • FIG. 9 is a first schematic diagram of address mapping in a cache management method according to an embodiment of the present disclosure;
  • FIG. 10 is a second schematic diagram of address mapping in a cache management method according to an embodiment of the present disclosure;
  • FIG. 11 is a third schematic diagram of address mapping in a cache management method according to an embodiment of the present disclosure;
  • FIG. 12 is a schematic diagram of off-chip address management in a cache management method according to an embodiment of the present disclosure;
  • FIG. 13 is a schematic structural diagram of a cache management apparatus according to an embodiment of the present disclosure.
  • FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a cache management method according to an embodiment of the present disclosure.
  • The mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and the mobile terminal may further include a transmission device 106 and an input/output device 108 for communication functions.
  • The structure shown in FIG. 1 is only illustrative and does not limit the structure of the mobile terminal.
  • For example, the mobile terminal may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
  • The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the cache management method in the embodiments of the present disclosure; by running the computer programs stored in the memory 104, the processor 102 executes various functional applications and data processing, i.e., implements the above method.
  • The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • In some examples, the memory 104 may further include memory located remotely relative to the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The transmission device 106 is used to receive or send data via a network.
  • Specific examples of the network may include a wireless network provided by the communication provider of the mobile terminal.
  • In one example, the transmission device 106 includes a network interface controller (NIC), which can connect to other network devices through a base station so as to communicate with the Internet.
  • In one example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
  • FIG. 2 is a flowchart of a cache management method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
  • Step S202: the memory management unit (MMU) identifies the type of the external cache controller based on central processing unit (CPU) configuration information;
  • Step S204: an offset address is confirmed by table lookup based on the address management submodule and the address region corresponding to the cache controller type;
  • Step S206: the logical address of the cache controller type is calculated from the offset address, where different cache controller types gate different numbers of external connection channels.
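The S202-S206 flow is essentially a table-driven address computation. The C sketch below is a minimal illustration; the controller types, region offsets, channel counts, and block parameters are invented placeholders, not values from the source.

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { CTRL_HBM, CTRL_DDR5, CTRL_DDR4 } ctrl_type_t;

typedef struct {
    ctrl_type_t type;
    uint64_t    base_offset; /* start of the address region for this type */
    unsigned    channels;    /* external connection channels gated        */
} addr_region_t;

/* Preset table consulted by the address management submodule (S204). */
static const addr_region_t region_tbl[] = {
    { CTRL_HBM,  0x0000000000ull, 16 },
    { CTRL_DDR5, 0x1000000000ull,  3 },
    { CTRL_DDR4, 0x2000000000ull,  3 },
};

/* S202: cfg_type comes from CPU configuration registers.
 * S204/S206: look up the region's offset and derive the logical address. */
uint64_t mmu_logical_addr(ctrl_type_t cfg_type,
                          uint64_t block_index, uint64_t block_size)
{
    for (size_t i = 0; i < sizeof region_tbl / sizeof region_tbl[0]; i++)
        if (region_tbl[i].type == cfg_type)
            return region_tbl[i].base_offset + block_index * block_size;
    return UINT64_MAX; /* unknown controller type */
}
```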
  • In this embodiment, because the MMU identifies the external cache controller type based on CPU configuration information, confirms the offset address by table lookup based on the address management submodule and the address region corresponding to that controller type, and calculates that controller type's logical address from the offset address, with different controller types gating different numbers of external connection channels, switching between different controllers under one framework is achieved; this solves the problem of supporting different controllers under one framework, allows multiple controllers to be supported under the same framework, and improves storage efficiency.
  • In one or more embodiments, the cache management method further includes: the MMU's address mapping submodule reads a preset address mapping relationship, which converts on-chip logical addresses into physical addresses acceptable to the cache chip.
  • In one or more embodiments, the method further includes: the MMU's address mapping submodule reads a preset address mapping relationship, which converts on-chip logical addresses into physical addresses acceptable to the cache chip; according to different application scenarios, reconfiguration is performed through the CPU to obtain the reconfigured address mapping relationship corresponding to the application scenario.
  • In one or more embodiments, the cache management method further includes: according to off-chip cache requirements and the structural attributes of the cache controller, the data packet sent by the CPU to the MMU is segmented by block unit, where the segmented data packet corresponds to block addresses and one data packet corresponds to multiple block addresses; a sketch of this segmentation follows below.
  • The address management range differs when the cache controller types differ.
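As a rough illustration of the block segmentation above, the following C sketch splits one packet into block-sized units and requests one block address per unit; alloc_block() is a hypothetical stand-in for the address-management allocator described later.

```c
#include <stdint.h>

extern uint64_t alloc_block(void); /* hypothetical block-address allocator */

/* Split a packet of pkt_len bytes into blk_size units and record one block
 * address per unit; returns the number of block addresses generated.
 * blk_addrs must have room for ceil(pkt_len / blk_size) entries. */
unsigned segment_packet(uint32_t pkt_len, uint32_t blk_size,
                        uint64_t *blk_addrs)
{
    unsigned n = (pkt_len + blk_size - 1) / blk_size; /* ceiling division */
    for (unsigned i = 0; i < n; i++)
        blk_addrs[i] = alloc_block();
    return n;
}
```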
  • In one or more embodiments, the cache management method further includes: the MMU receives off-chip data sent by the packet buffer management unit (PMU), where the off-chip data includes a packet descriptor (PD) and a packet (PK); the PD's data information is extracted; the data information is packed to obtain packed data, and first data is deleted from the PK; here, the first data may include the invalid data in the packed data.
  • The packed data is shift-spliced along its valid data bits to extract second data; the second data may include the valid data in the packed data.
  • The PD's data information in the packed data and the extracted second data are packed again to obtain target packed data.
  • The target packed data is sent to the off-chip cache for storage.
  • In one or more embodiments, the cache management method further includes: when the PD's data bit length plus the extracted second data's bit length is less than or equal to the bus width, the PD and the second data are output simultaneously; when the sum exceeds the bus width, the PD and the second data are split into first split data and second split data, the PD and the first split data are output, and the second split data is zero-padded to the bus width and then output.
  • In one or more embodiments, the cache management method further includes: the packet buffer management unit (PMU) issues a write packet;
  • the PMU stores the write packet and sends a write-release command to the traffic buffer management unit (TMMU); the TMMU issues a packet descriptor to the queue management unit (QMU);
  • the QMU issues a write command through the TMMU, which is transparently passed to the MMU, and after the QMU finishes storing the write command it sends a write-release signal to the TMMU;
  • the command queue issues a read-packet command, and the read-packet data is read and returned to the PMU;
  • the TMMU issues a read command and reads the packet descriptor data.
  • FIG. 3 is a schematic architecture diagram of the cache management system according to an embodiment of the present disclosure.
  • As shown in FIG. 3, the MMU sits between the PMU (Packet Memory Unit), the TMMU (TM Memory Management Unit), the CMD_FIFO (command first-in-first-out queue), and the HBM (High Bandwidth Memory)/DDR. The cache management method of the embodiments of the present disclosure mainly implements the following functions:
  • 1) distributing multi-queue packet (PK) and packet descriptor (PD) write requests and write data, and preserving write releases; 2) distributing multi-queue PK/PD read requests and preserving read data; 3) supporting mode switching among multiple types of cache controllers; 4) reconfigurable mapping from logical addresses to off-chip physical addresses.
  • The PMU stores the write packet and sends a write-release command to the traffic buffer management unit (TMMU); the TMMU issues the packet descriptor to the queue management unit (QMU); the QMU issues the write command through the TMMU, which is transparently passed to the MMU, and after the QMU finishes storing the write command it sends a write-release signal to the TMMU;
  • the command queue issues the read-packet command, reads the read-packet data and returns it to the PMU; the TMMU issues the read command and reads the packet descriptor data.
  • Across the whole framework, to achieve compatibility and improve off-chip access bandwidth and efficiency, the main techniques used are address management, PC balancing, off-chip address mapping, and packet splicing. Compatibility runs through the entire cache management process, so the functions and performance of all supported controllers are taken into account.
  • In one application embodiment, the cache management method includes:
  • The first step is multi-controller switching.
  • Through CPU configuration, the cache management module identifies the external cache controller type; the address management submodule confirms the offset address by table lookup according to the address region corresponding to that controller type, and from it calculates the logical address of that controller type. Different configured cache controller types gate different numbers of external connection channels.
  • According to off-chip cache requirements, the present disclosure designs an off-chip cache interface with 16 channels, each fully supporting the five AXI4 bus channels: write address, write data, write response, read address, and read data.
  • In the MMU there are HBM and DDR modes with configurable mode switching; the default is 16-channel HBM. Through CPU configuration it can be switched to DDR mode, and users may connect DDR4, DDR5, or other types as needed; as long as the connected capacity is no smaller than the data capacity corresponding to the maximum number of address management nodes, the attached HBM/DDR space can all be used effectively.
  • Because different types of cache controller are supported and each controller's transfer rate is not necessarily synchronous with the MMU's system clock, part of the on-chip cache area is set aside to adapt to the different controllers and guarantee line-rate transfer.
  • Data is first buffered in asynchronous FIFOs (first-in-first-out queues); at the same time, a pre-read function in the custom logic reads data and commands out in advance into a Ready waiting state.
  • When the cache controller becomes available, a handshake mechanism sends the data out in the same cycle, which maximizes utilization of the off-chip controller's bandwidth and reliably carries the data stream across clock domains.
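The following is a behavioral C model, not the actual RTL, of the pre-read asynchronous FIFO just described: a word is staged into an output register ahead of time, so that when the controller signals ready the word leaves in the same cycle. Clock-domain-crossing details (gray-coded pointers, synchronizers) are deliberately omitted.

```c
#include <stdint.h>
#include <stdbool.h>

#define FIFO_DEPTH 16u

typedef struct {
    uint64_t mem[FIFO_DEPTH];
    unsigned rd, wr;       /* free-running read/write counters        */
    bool     staged_valid; /* a pre-read word is waiting in `staged`  */
    uint64_t staged;
} preread_fifo_t;

static bool fifo_empty(const preread_fifo_t *f) { return f->rd == f->wr; }

/* Called every cycle: keep the output register filled whenever possible,
 * i.e., enter the Ready waiting state described in the text. */
void fifo_preread(preread_fifo_t *f)
{
    if (!f->staged_valid && !fifo_empty(f)) {
        f->staged = f->mem[f->rd % FIFO_DEPTH];
        f->rd++;
        f->staged_valid = true;
    }
}

/* Handshake: when the controller is ready, the staged word goes out in the
 * same cycle, keeping the off-chip bus utilized. Returns true on transfer. */
bool fifo_pop(preread_fifo_t *f, bool ctrl_ready, uint64_t *out)
{
    if (ctrl_ready && f->staged_valid) {
        *out = f->staged;
        f->staged_valid = false;
        return true;
    }
    return false;
}
```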
  • The second step is to configure address mapping.
  • The address mapping submodule reads the system-preset address mapping relationship according to the configured type; this mapping can be reconfigured through the CPU according to the application scenario to tune the best mapping scheme.
  • Since logical addresses cannot directly index the address pins of the HBM/DDR, the MMU converts each logical address into an address the cache chip can accept, called the physical address. Because of the multi-level structure of HBM/DDR, the address mapping scheme has a great deal to do with off-chip read/write bandwidth, storage rate, and efficiency.
  • In the address mapping of this disclosure, each physical channel (128-bit data bus) is divided into two pseudo channels (Pseudo Channel, PC, 64-bit data bus each), and the two pseudo channels share one set of address and control buses.
  • FIG. 4 is a schematic diagram of the Option E mode of HBM in the cache management method according to an embodiment of the present disclosure.
  • In Option E mode, each pseudo channel corresponds to one controller, and the controller runs at half the cache frequency.
  • The pseudo channel (PS) represented by Psgnt is a controller-internal arbitration signal; the controller determines the value of Psgnt according to the physical interface of the PS. When processing, the logic simply treats the 8 controllers of one HBM stack as 16 controllers, one controller per physical PS.
  • SID is an address specific to 8-Hi stacks and can be treated as a bank address.
  • An 8-Hi stack has twice as many banks as a 4-Hi stack: 32 banks versus 16. Four consecutively numbered banks belong to one bank group (bank_group).
  • 8-Hi and 4-Hi have 8 and 4 bank groups, respectively.
  • laddr[N:0] is an 8-byte address (one control bus is 128 bits wide and is divided into two PSs, each 64 bits wide). Since the HBM controller's prefetch factor is 4, 256 bits are stored per access, occupying 4 addresses; therefore the logic never actually assigns laddr[1:0], which defaults to 0 and is not used by the controller.
  • The address issued by the MMU is a 384 B address (the actual address is issued on integer-power-of-two boundaries), the AXI address unit is 1 byte, and each address of the memory die behind a PS channel stores 128 bits of data, so the four address spaces correspond as follows:
  • FIG. 5 is a first schematic diagram of address correspondence in the cache management method according to an embodiment of the present disclosure: with Samsung HBM2 4Hi4G, each PS has 4 GB/16 = 2 Gb of storage space, and {A[31:28], A[4:0]} are filled with 0.
  • FIG. 6 is a second schematic diagram of address correspondence: with Samsung HBM2 8Hi8G, each PS has 8 GB/16 = 4 Gb of storage space, and {A[31:29], A[4:0]} are filled with 0.
  • FIG. 7 is a third schematic diagram of address correspondence: with Samsung HBM2E 4Hi8G, each PS has 8 GB/16 = 4 Gb of storage space, and {A[31:29], A[4:0]} are filled with 0.
  • FIG. 8 is a fourth schematic diagram of address correspondence: with Samsung HBM2E 8Hi16G, each PS has 8 GB/16 = 4 Gb of storage space, and {A[31:30], A[4:0]} are filled with 0.
  • FIG. 9 is a first schematic diagram of address mapping in the cache management method according to an embodiment of the present disclosure.
  • To fully exploit HBM bandwidth, the 16 PSs must be used in a balanced way.
  • In addition, bank switching is added within each channel.
  • With the 4Hi4G die currently used, the mapping relationship between logical and physical addresses is shown in FIG. 9.
  • FIG. 10 is a second schematic diagram of address mapping in the cache management method according to an embodiment of the present disclosure.
  • When DDR5 is attached externally, three channels of DDR5 can be configured, and the mapping relationship between logical and physical addresses is shown in FIG. 10.
  • FIG. 11 is a third schematic diagram of address mapping in the cache management method according to an embodiment of the present disclosure.
  • When DDR4 is attached externally, three channels of DDR4 can be configured, and the mapping relationship between logical and physical addresses is shown in FIG. 11.
  • The third step is off-chip address management.
  • Off-chip cache addresses are managed in units of blocks (configurable fixed-size data units), which are called virtual addresses (also logical addresses).
  • A data packet sent by the processor to the MMU must be segmented by block so as to correspond to block addresses; after this processing, one data packet may correspond to multiple blocks of data, i.e., multiple block addresses must be generated.
  • The address management range differs according to the type of off-chip cache controller.
  • FIG. 12 is a schematic diagram of off-chip address management in the cache management method according to an embodiment of the disclosure.
  • As shown in FIG. 12, the chunk address space is 128K;
  • T[16:13] selects one of 16 large linked-list IDs;
  • T[12:10] is the sub-list ID under each large linked list;
  • T[9:0] is the list number within each sub-list;
  • B[3:0] is the number of blks under each chunk;
  • C[2:0] is the number of slices under each blk (the value + 1 gives the slice count).
  • Linked-list IDs are requested using round-robin (RR) scheduling.
  • When a flow arrives, RR first selects a large linked list, RR then selects a sub-list within that large list, and a list is then requested from the sub-list. The same flow first exhausts the blks in one chunk; different flows request new chunks.
  • The concatenation {T[9:0], B[3:0], T[12:10], T[16:13]} used by address management can serve as a counter, and this address changes continuously.
  • To guarantee overall performance and bandwidth, the address management module must keep accesses balanced across the pseudo channels (PCs), avoiding situations where individual PCs are accessed frequently within a short period while the remaining PCs sit idle. If a PC returns slowly and feeds a busy state back to the MMU, one scheduling turn can be skipped based on real-time command statistics and that PC's historical command statistics; analyzed over the entire data flow, access remains balanced across PCs.
  • The fourth step is small-packet splicing. Storing on-chip PDs off-chip would, in theory, reduce off-chip storage bandwidth and efficiency because of the packet length, so packing is used to store them. The purpose of packing is to squeeze out, by bubble squeezing, the invalid bytes in the PK and PD that are to be packed off-chip, and then splice the valid bytes together, thereby improving off-chip cache utilization.
  • The packing steps are: the MMU receives the off-chip PD and PK sent by the PMU, first extracts and packs the PD information while squeezing the invalid bytes out of the PK, shift-splices the small packets to extract the valid data, then packs the packed PD information together with the PK from which the valid data was extracted, and sends the result off-chip for storage.
  • Since the off-chip bus width is 384 B, apart from single packets (which need no packing), different packing cases produce different output per beat.
  • There are two cases of small-packet splicing: small packet with small packet where PD_len + PK_len ≤ 384 B, and small packet with small packet where PD_len + PK_len > 384 B.
  • In the first case, the PD length plus the extracted data length is less than the bus width, and output takes one beat.
  • In the second case, because the length exceeds the bus width, output must be split into two beats: the first beat outputs the high 384 B, and the remaining part is output in the second beat with zeros padded at the tail, which affects line rate and should be avoided as far as possible.
  • The solution of the present disclosure is not only compatible with multiple types of cache controller, but can also improve off-chip read/write bandwidth and access efficiency.
  • The following uses measured data to illustrate, for the three external controller types HBM/DDR5/DDR4, the measurements before and after the bandwidth and efficiency improvements.
  • The results of testing external HBM with standard address mapping are shown in Table 1.
  • The test results of the efficiency-improvement method of the present disclosure are shown in Table 2; Table 3 gives the test data for external DDR5 and Table 4 for external DDR4. Comparative analysis of the measured results shows that total off-chip bandwidth and storage efficiency improve in every mode.
  • The methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present disclosure, in essence or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and contains several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods of the various embodiments of the present disclosure.
  • As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
  • FIG. 13 is a structural block diagram of a cache management apparatus according to an embodiment of the present disclosure. As shown in FIG. 13, the apparatus includes:
  • an identification unit 1302, configured to cause the memory management unit (MMU) to identify the type of the external cache controller based on CPU configuration information;
  • a confirmation unit 1304, configured to confirm the offset address by table lookup based on the address management submodule and the address region corresponding to the cache controller type;
  • a calculation unit 1306, configured to calculate the logical address of the cache controller type from the offset address, where different cache controller types gate different numbers of external connection channels.
  • Because the MMU identifies the external cache controller type from CPU configuration information, confirms the offset address by table lookup from the address region corresponding to that type, and calculates that type's logical address from the offset address, with different controller types gating different numbers of external connection channels, switching between different controllers under one framework is achieved; this solves the compatibility problem, allows multiple controllers to be supported under the same framework, and improves storage efficiency.
  • The above modules can be implemented by software or hardware; for the latter, this can be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor, or the above modules are located in different processors in any combination.
  • Embodiments of the present disclosure also provide a computer-readable storage control program, in which a computer program is stored, where the computer program is configured to perform the steps of any of the above method embodiments when run.
  • The computer-readable storage control program may include, but is not limited to, a driver for a CPU and a memory controller, a control program for connecting an FPGA to HBM/DDR3/DDR4/DDR5, and the like.
  • Embodiments of the present disclosure also provide a controller, including a buffer (which caches part of the data) and a processor; a computer program is stored in the controller, and the controller is configured to run the computer program to perform the steps of any of the above method embodiments.
  • The controller may further include a transmission device for protocol conversion, where the transmission device is connected to the controller to implement the connection with the cache controller.
  • Obviously, the modules or steps of the present disclosure can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices; they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, and in some cases the steps shown or described can be executed in a different order, or the modules or steps can be made into individual integrated circuit modules, or multiple of them can be made into a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.

Abstract

Embodiments of the present disclosure provide a cache management method and apparatus, a control program, and a controller. The method includes: a cache management unit (MMU) identifies the type of an external cache controller based on central processing unit (CPU) configuration information; an offset address is confirmed by table lookup based on an address management submodule and the address region corresponding to the cache controller type; and the logical address of the cache controller type is calculated from the offset address, where different cache controller types gate different numbers of external connection channels. The present disclosure solves the problem of supporting different controllers under one framework, so that multiple controllers can be supported under the same framework and storage efficiency is improved.

Description

Cache management method and apparatus, control program and controller
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure is based on Chinese patent application CN202111028812.3, filed on September 2, 2021 and entitled "Cache management method and apparatus, control program and controller", and claims priority to that application, the entire disclosure of which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of communications technology, and in particular to a cache management method and apparatus, a control program, and a controller.
BACKGROUND
In Ethernet switch chip applications, different models of off-chip cache unit, i.e., cache controller, often have to be selected according to application scenario, cost, and so on. The chip's cache management unit (Memory Management Unit, MMU) must therefore not only meet basic functional requirements, but also offer good compatibility and portability, so that in different application scenarios different cache controllers can be attached according to factors such as storage size, rate, power consumption, and cost, without repeated development, thereby saving labor and cost.
As the frequencies and bandwidths of processors and memory controllers keep rising, their individual performance keeps improving, yet cache access efficiency often becomes the bottleneck of system performance, and cache access efficiency is tied to how the MMU is implemented. In an Ethernet switch chip, the MMU's main function is to handle packet (PK) data and packet descriptor (PD) write requests, distribute write data, and release in write-request order, and then read data back in order according to read requests. In this process, techniques such as off-chip address management, physical address mapping, and packet splicing are used to utilize the cache bandwidth as fully as possible and improve memory-controller efficiency. Current mainstream cache controllers include DDR3/DDR4/DDR5 (Double Data Rate) and HBM (High Bandwidth Memory); how to support different controllers under one framework while guaranteeing their storage efficiency is the main problem to be solved.
No effective solution has yet been proposed for the above problem of how to support different controllers under one framework.
SUMMARY
Embodiments of the present disclosure provide a cache management method and apparatus, a control program, and a controller, to at least solve the related-art problem of how to support different controllers under one framework.
According to one embodiment of the present disclosure, a cache management method is provided, including: a cache management unit (MMU) identifies the type of an external cache controller based on CPU configuration information; an offset address is confirmed by table lookup based on an address management submodule and the address region corresponding to the cache controller type; and the logical address of the cache controller type is calculated from the offset address, where different cache controller types gate different numbers of external connection channels.
According to another embodiment of the present disclosure, a cache management apparatus is provided, including: an identification unit configured to cause the cache management unit (MMU) to identify the type of an external cache controller based on CPU configuration information; a confirmation unit configured to confirm an offset address by table lookup based on an address management submodule and the address region corresponding to the cache controller type; and a calculation unit configured to calculate the logical address of the cache controller type from the offset address, where different cache controller types gate different numbers of external connection channels.
According to yet another embodiment of the present disclosure, a computer-readable storage control program is provided, in which a computer program is stored, where the computer program is configured to perform the steps of any of the above method embodiments when run.
According to yet another embodiment of the present disclosure, a controller is provided, including a buffer and a processor; a computer program is stored in the controller, and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
Through the present disclosure, because the cache management unit (MMU) identifies the external cache controller type based on CPU configuration information, confirms the offset address by table lookup based on the address management submodule and the address region corresponding to that controller type, and calculates the logical address of that controller type from the offset address, where different cache controller types gate different numbers of external connection channels, switching between different controllers under one framework is achieved. This solves the problem of supporting different controllers under one framework, so that multiple controllers can be supported under the same framework and storage efficiency is improved.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a cache management method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a cache management method according to an embodiment of the present disclosure;
FIG. 3 is a schematic architecture diagram of a cache management system according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the Option E mode of HBM in a cache management method according to an embodiment of the present disclosure;
FIG. 5 is a first schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
FIG. 6 is a second schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
FIG. 7 is a third schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
FIG. 8 is a fourth schematic diagram of address correspondence in a cache management method according to an embodiment of the present disclosure;
FIG. 9 is a first schematic diagram of address mapping in a cache management method according to an embodiment of the present disclosure;
FIG. 10 is a second schematic diagram of address mapping in a cache management method according to an embodiment of the present disclosure;
FIG. 11 is a third schematic diagram of address mapping in a cache management method according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of off-chip address management in a cache management method according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a cache management apparatus according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings and in combination with embodiments.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
The method embodiments provided in the embodiments of the present disclosure may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for the cache management method of an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the mobile terminal; for example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the cache management method in the embodiments of the present disclosure; by running the computer programs stored in the memory 104, the processor 102 executes various functional applications and data processing, i.e., implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely relative to the processor 102, and such remote memory may be connected to the mobile terminal through a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can connect to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
FIG. 2 is a flowchart of a cache management method according to an embodiment of the present disclosure. As shown in FIG. 2, the flow includes the following steps:
Step S202: the cache management unit (MMU) identifies the type of the external cache controller based on central processing unit (CPU) configuration information;
Step S204: an offset address is confirmed by table lookup based on the address management submodule and the address region corresponding to the cache controller type;
Step S206: the logical address of the cache controller type is calculated from the offset address, where different cache controller types gate different numbers of external connection channels.
Through this embodiment of the present disclosure, because the cache management unit (MMU) identifies the external cache controller type based on CPU configuration information, confirms the offset address by table lookup based on the address management submodule and the address region corresponding to that controller type, and calculates the logical address of that controller type from the offset address, where different cache controller types gate different numbers of external connection channels, switching between different controllers under one framework is achieved; this solves the problem of supporting different controllers under one framework, so that multiple controllers can be supported under the same framework and storage efficiency is improved.
In one or more embodiments, the cache management method further includes: the address mapping submodule of the MMU reads a preset address mapping relationship; the address mapping relationship converts on-chip logical addresses into physical addresses acceptable to the cache chip.
In one or more embodiments, the cache management method further includes: the address mapping submodule of the MMU reads a preset address mapping relationship; the address mapping relationship converts on-chip logical addresses into physical addresses acceptable to the cache chip; according to different application scenarios, reconfiguration is performed through the CPU to obtain the reconfigured address mapping relationship corresponding to the application scenario.
In one or more embodiments, the cache management method further includes: according to off-chip cache requirements and the structural attributes of the cache controller, the data packet sent by the CPU to the MMU is segmented by block unit, where the segmented data packet corresponds to block addresses, one data packet corresponds to multiple block addresses, and the address management range differs for different cache controller types.
In one or more embodiments, the cache management method further includes: the MMU receives off-chip data sent by the packet buffer management unit (PMU), where the off-chip data includes a packet descriptor (PD) and a packet (PK); the data information of the PD is extracted; the data information is packed to obtain packed data, and first data is deleted from the PK; here, the first data may include the invalid data in the packed data.
The packed data is shift-spliced according to the valid data bits to extract second data; here, the second data may include the valid data in the packed data. The data information of the PD in the packed data and the extracted second data are packed again to obtain target packed data, and the target packed data is sent to the off-chip cache for storage.
In one or more embodiments, the cache management method further includes: when the data bit length of the PD plus the data bit length of the extracted second data is less than or equal to the bus width, the PD and the second data are output simultaneously;
when the data bit length of the PD plus the data bit length of the extracted second data is greater than the bus width, the PD and the second data are split into first split data and second split data;
the PD and the first split data are output;
the second split data is zero-padded to obtain zero-padded data whose data bit length equals the bus width, and the zero-padded data is output.
In one or more embodiments, the cache management method further includes: the packet buffer management unit (PMU) issues a write packet;
the PMU stores the write packet and sends a write-release command to the traffic buffer management unit (TMMU); the TMMU issues a packet descriptor to the queue management unit (QMU);
the QMU issues a write command through the TMMU, which is transparently passed to the MMU, and after the QMU finishes storing the write command it sends a write-release signal to the TMMU;
the command queue issues a read-packet command, reads the read-packet data and returns the read-packet data to the PMU;
the TMMU issues a read command and reads the packet descriptor data.
Based on the above embodiments, in one application embodiment of the cache management apparatus provided by the present disclosure, the MMU sits between the off-chip memory controller and the on-chip control modules. FIG. 3 is a schematic architecture diagram of the cache management system according to an embodiment of the present disclosure. As shown in FIG. 3, the MMU sits between the PMU (Packet Memory Unit), the TMMU (TM Memory Management Unit), the CMD_FIFO (command first-in-first-out queue), and the HBM (High Bandwidth Memory)/DDR. The cache management method of the embodiments of the present disclosure mainly implements the following functions:
1) distributing write requests and write data for multi-queue packets (PK) and packet descriptors (PD), and preserving write releases;
2) distributing multi-queue PK/PD read requests and preserving read data;
3) supporting mode switching among multiple types of cache controllers;
4) reconfigurable mapping from logical addresses to off-chip physical addresses.
The PMU stores the write packet and sends a write-release command to the traffic buffer management unit (TMMU); the TMMU issues the packet descriptor to the queue management unit (QMU); the QMU issues the write command through the TMMU, which is transparently passed to the MMU, and after the QMU finishes storing the write command it sends a write-release signal to the TMMU;
the command queue issues the read-packet command, reads the read-packet data and returns it to the PMU; the TMMU issues the read command and reads the packet descriptor data. Across the whole framework, to achieve compatibility and improve off-chip access bandwidth and efficiency, the main techniques used are address management, PC balancing, off-chip address mapping, and packet splicing; compatibility runs through the entire cache management process, taking into account the functions and performance of all supported controllers.
In one application embodiment, the cache management method includes:
Step 1: multi-controller switching. Through CPU configuration, the cache management module identifies the external cache controller type; the address management submodule confirms the offset address by table lookup according to the address region corresponding to that controller type, and from it calculates the logical address of that controller type; different configured cache controller types gate different numbers of external connection channels. According to off-chip cache requirements, the present disclosure designs an off-chip cache interface with 16 channels, each fully supporting the five AXI4 bus channels: write address, write data, write response, read address, and read data. In the MMU there are HBM and DDR modes with configurable mode switching; the default is 16-channel HBM. Through CPU configuration it can be switched to DDR mode, and users may choose to connect different types such as DDR4 or DDR5 according to their needs; as long as the connected capacity is no smaller than the data capacity corresponding to the maximum number of address management nodes, the attached HBM/DDR space can all be used effectively.
Because different types of cache controller are supported and each controller's transfer rate is not necessarily synchronous with the MMU's system clock, in order to adapt to the different controllers and guarantee line-rate data transfer, part of the on-chip cache area is set aside: data is first buffered in asynchronous FIFOs (first-in-first-out queues), while a pre-read function in the custom logic reads data and commands out in advance into a Ready waiting state; when the cache controller becomes available, a handshake mechanism sends the data out in the same cycle, which maximizes utilization of the off-chip controller's bandwidth and reliably carries the data stream across clock domains.
Step 2: configure the address mapping. The address mapping submodule reads the system-preset address mapping relationship according to the configured type; this mapping relationship can be reconfigured through the CPU according to the application scenario to tune the best mapping scheme, as sketched below.
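A minimal sketch of what "reconfigurable through the CPU" can mean in software terms: the MMU holds several preset mapping profiles and a CPU-written register selects the active one. The field names and widths below are placeholders, not the actual register layout of the device.

```c
#include <stdint.h>

typedef struct {
    unsigned ps_shift,   ps_bits;   /* pseudo-channel select bits */
    unsigned bank_shift, bank_bits; /* bank / bank-group select   */
    unsigned row_shift,  row_bits;  /* row select                 */
} map_profile_t;

static map_profile_t profiles[4];    /* presets loaded at reset     */
static unsigned      active_profile; /* register written by the CPU */

void cpu_select_profile(unsigned idx) { active_profile = idx & 3u; }

static uint32_t field(uint64_t a, unsigned shift, unsigned bits)
{
    return (uint32_t)((a >> shift) & ((1u << bits) - 1u));
}

/* Translate one on-chip logical address using the active profile. */
void map_logical(uint64_t laddr, uint32_t *ps, uint32_t *bank, uint32_t *row)
{
    const map_profile_t *p = &profiles[active_profile];
    *ps   = field(laddr, p->ps_shift,   p->ps_bits);
    *bank = field(laddr, p->bank_shift, p->bank_bits);
    *row  = field(laddr, p->row_shift,  p->row_bits);
}
```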
Since logical addresses cannot directly index the address pins of the HBM/DDR, the MMU converts each logical address into an address the cache chip can accept, called the physical address. Because of the multi-level structure of HBM/DDR, the address mapping scheme has a great deal to do with off-chip read/write bandwidth, storage rate, and efficiency.
In the address mapping of the present disclosure, each physical channel (128-bit data bus) is divided into two pseudo channels (Pseudo Channel, PC, 64-bit data bus each), and the two pseudo channels share one set of address and control buses.
FIG. 4 is a schematic diagram of the Option E mode of HBM in the cache management method according to an embodiment of the present disclosure. In HBM Option E mode, as shown in FIG. 4, each pseudo channel corresponds to one controller, and the controller runs at half the cache frequency. The pseudo channel (PS) represented by Psgnt is a controller-internal arbitration signal; the controller determines the value of Psgnt according to the physical interface of the PS. When processing, the logic simply treats the 8 controllers of one HBM stack as 16 controllers, one controller per physical PS.
SID is an address specific to 8-Hi stacks and can be treated as a bank address; an 8-Hi stack has twice as many banks as a 4-Hi stack, 32 banks versus 16. Four consecutively numbered banks belong to one bank group (bank_group); 8-Hi and 4-Hi have 8 and 4 bank groups, respectively.
laddr[N:0] is an 8-byte address (one control bus is 128 bits wide and is divided into two PSs, each 64 bits wide). Since the prefetch factor of the HBM controller is 4, 256 bits are stored per access, occupying 4 addresses; therefore the logic never actually assigns laddr[1:0], which defaults to 0 and is not used by the controller.
The address issued by the MMU is a 384 B address (the actual address is issued on integer-power-of-two boundaries), the AXI address unit is 1 byte, and each address of the memory die behind a PS channel stores 128 bits of data, so the four address spaces correspond as follows:
FIG. 5 is a first schematic diagram of address correspondence in the cache management method according to an embodiment of the present disclosure: with Samsung HBM2 4Hi4G, each PS has 4 GB/16 = 2 Gb of storage space; the correspondence is shown in FIG. 5, and {A[31:28], A[4:0]} are filled with 0.
FIG. 6 is a second schematic diagram of address correspondence: with Samsung HBM2 8Hi8G, each PS has 8 GB/16 = 4 Gb of storage space; the correspondence is shown in FIG. 6, and {A[31:29], A[4:0]} are filled with 0.
FIG. 7 is a third schematic diagram of address correspondence: with Samsung HBM2E 4Hi8G, each PS has 8 GB/16 = 4 Gb of storage space; the correspondence is shown in FIG. 7, and {A[31:29], A[4:0]} are filled with 0.
FIG. 8 is a fourth schematic diagram of address correspondence: with Samsung HBM2E 8Hi16G, each PS has 8 GB/16 = 4 Gb of storage space; the correspondence is shown in FIG. 8, and {A[31:30], A[4:0]} are filled with 0.
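The unit conversions behind FIGS. 5-8 can be sketched as follows, assuming the 384 B block is padded up to 512 B (the text only says addresses are issued on integer-power-of-two boundaries); the prefetch-of-4 rule is what keeps laddr[1:0] at zero.

```c
#include <stdint.h>

#define MMU_BLOCK_BYTES 512u /* 384 B payload padded to a power of two (assumed) */
#define GRANULE_BYTES    16u /* each PS-granule address stores 128 bits          */

/* MMU block address -> AXI byte address. */
uint64_t axi_byte_addr(uint64_t mmu_block_addr)
{
    return mmu_block_addr * MMU_BLOCK_BYTES;
}

/* AXI byte address -> PS-granule address; with a prefetch factor of 4, one
 * 256-bit access spans 4 granule addresses, so the low two bits stay 0. */
uint64_t granule_addr(uint64_t axi_addr)
{
    return (axi_addr / GRANULE_BYTES) & ~0x3ull;
}
```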
FIG. 9 is a first schematic diagram of address mapping in the cache management method according to an embodiment of the present disclosure. To fully exploit HBM bandwidth, the 16 PSs must be used in a balanced way, and bank switching is added within each channel; with the 4Hi4G die currently used, the mapping relationship between logical and physical addresses is shown in FIG. 9.
FIG. 10 is a second schematic diagram of address mapping in the cache management method according to an embodiment of the present disclosure. When DDR5 is attached externally, three channels of DDR5 can be configured, and the mapping relationship between logical and physical addresses is shown in FIG. 10.
FIG. 11 is a third schematic diagram of address mapping in the cache management method according to an embodiment of the present disclosure. When DDR4 is attached externally, three channels of DDR4 can be configured, and the mapping relationship between logical and physical addresses is shown in FIG. 11.
Step 3: off-chip address management. According to off-chip cache requirements and the structural characteristics of the cache controller, off-chip cache addresses are managed in units of blocks (configurable fixed-size data units), called virtual addresses (also logical addresses); a data packet sent by the processor to the MMU must be segmented by block so as to correspond to block addresses. After this processing, one data packet may correspond to multiple blocks of data, i.e., multiple block addresses must be generated. The address management range differs according to the type of off-chip cache controller.
FIG. 12 is a schematic diagram of off-chip address management in the cache management method according to an embodiment of the present disclosure. As shown in FIG. 12, the chunk address space is 128K; T[16:13] selects one of 16 large linked-list IDs; T[12:10] is the sub-list ID under each large linked list; T[9:0] is the list number within each sub-list; B[3:0] is the number of blks under each chunk; and C[2:0] is the number of slices under each blk (the value + 1 gives the slice count). Linked-list IDs are requested using round-robin (RR) scheduling: when a flow arrives, RR first selects a large linked list, RR then selects a sub-list within that large list, and a list is then requested from the sub-list; the same flow first exhausts the blks in one chunk, while different flows request new chunks. The concatenation {T[9:0], B[3:0], T[12:10], T[16:13]} used by address management can serve as a counter, and this address changes continuously. When HBM is attached externally, T[16:13] corresponds one-to-one with the 16 channels; when DDR is attached, 3 channels are gated, using T[14:13] as the channel_ID, i.e., channels 0 (4, 8, 12), 1 (5, 9, 13), and 2 (6, 10, 14) are gated, while channels 3, 7, 11, and 15 can be configured as unused. Because the low two bits of T[16:13] have finer granularity, they change faster, and cache access efficiency after address mapping is better. With the configured bit widths, the MMU can manage 2^(4+3+10+4) = 2M nodes, for a total of 2M x 3k (blk size configurable, 3k as an example) = 48G of managed data.
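A C sketch of the counter decoding and channel gating just described; the exact bit packing of the counter is an assumption, but the field widths (4+3+10+4 bits, 2M nodes) and the fast-changing T[16:13] follow the text.

```c
#include <stdint.h>

typedef struct {
    uint32_t t_hi;  /* T[16:13]: 16 large linked-list IDs / HBM channel */
    uint32_t t_mid; /* T[12:10]: sub-list ID within a large list        */
    uint32_t t_lo;  /* T[9:0]:  list number within a sub-list           */
    uint32_t b;     /* B[3:0]:  blk index within a chunk                */
} blk_node_t;

/* Decode {T[9:0], B[3:0], T[12:10], T[16:13]} from a running counter;
 * T[16:13] occupies the lowest bits, so it changes fastest and accesses
 * spread evenly across channels. */
blk_node_t decode_counter(uint32_t ctr)
{
    blk_node_t n;
    n.t_hi  =  ctr        & 0xFu;
    n.t_mid = (ctr >> 4)  & 0x7u;
    n.b     = (ctr >> 7)  & 0xFu;
    n.t_lo  = (ctr >> 11) & 0x3FFu;
    return n;
}

/* Channel gating: HBM maps T[16:13] one-to-one onto 16 channels; DDR uses
 * T[14:13] as channel_ID for 3 gated channels (value 3 selects the channel
 * group 3/7/11/15, which can be configured as unused). */
unsigned channel_id(const blk_node_t *n, int is_hbm)
{
    return is_hbm ? n->t_hi : (n->t_hi & 0x3u);
}
```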
To guarantee overall performance and bandwidth, the address management module must ensure balanced access across the pseudo channels (PCs), avoiding situations where individual PCs are accessed frequently within a short period while the remaining PCs are idle. If a PC returns slowly and feeds a busy state back to the MMU, one scheduling turn can be skipped based on real-time command statistics and that PC's historical command statistics; analyzed over the entire data flow, access remains balanced across PCs.
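A sketch of the PC-balancing rule in C; the busy thresholds and statistics below are invented for illustration, since the text only states that a busy PC loses one scheduling turn based on real-time and historical command counts.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_PC 16u

typedef struct {
    uint32_t outstanding[NUM_PC]; /* commands currently in flight per PC */
    uint32_t history[NUM_PC];     /* recent per-PC command statistics    */
    unsigned rr;                  /* round-robin pointer                 */
} pc_sched_t;

static bool pc_busy(const pc_sched_t *s, unsigned pc)
{
    return s->outstanding[pc] > 8u || s->history[pc] > 64u; /* assumed */
}

/* Pick the next PC: a slow or busy PC is skipped for at most one turn, so
 * traffic stays balanced over the whole data-flow window. */
unsigned pc_pick(pc_sched_t *s)
{
    for (unsigned i = 0; i < NUM_PC; i++) {
        unsigned pc = (s->rr + i) % NUM_PC;
        if (!pc_busy(s, pc)) {
            s->rr = (pc + 1u) % NUM_PC;
            s->outstanding[pc]++;
            return pc;
        }
    }
    unsigned pc = s->rr;           /* all busy: plain round-robin */
    s->rr = (s->rr + 1u) % NUM_PC;
    s->outstanding[pc]++;
    return pc;
}
```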
Step 4: small-packet splicing. Moving on-chip PDs to off-chip storage would, in theory, reduce off-chip storage bandwidth and efficiency because of the packet length; but since PDs are stored in large numbers, they are moved off-chip to save on-chip resources, and packing is used to store them. The purpose of packing is to squeeze out, by bubble squeezing, the invalid bytes in the PK and PD to be packed off-chip, and then splice the valid bytes together, thereby improving off-chip cache utilization.
The packing steps are: the MMU receives the off-chip PD and PK sent by the PMU, first extracts and packs the PD information while squeezing the invalid bytes out of the PK, shift-splices the small packets to extract the valid data, then packs the packed PD information together with the PK from which the valid data was extracted, and sends the result off-chip for storage.
Since the off-chip bus width is 384 B, apart from single packets (which need no packing), different packing cases produce different per-beat output. Small-packet splicing has two cases: small packet with small packet (PD_len + PK_len ≤ 384 B) and small packet with small packet (PD_len + PK_len > 384 B). In the first case, the PD length plus the extracted data length is less than the bus width and can be output in one beat; in the second case, because the length exceeds the bus width, output must take two beats: the first beat outputs the high 384 B, and the remaining part is output in the second beat with zeros padded at the tail, which affects line rate and should be avoided as far as possible.
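The two splicing cases map onto a simple per-beat packing routine. This C sketch assumes the PD fits within one 384 B beat and that PD_len + PK_len ≤ 768 B (the small-packet cases in the text); it illustrates the output rule, not the actual hardware datapath.

```c
#include <stdint.h>
#include <string.h>

#define BUS_BYTES 384u

/* Pack a PD and the extracted valid PK bytes into 384 B bus beats.
 * Returns the number of beats written into out[0] / out[1]. */
unsigned splice(const uint8_t *pd, unsigned pd_len,
                const uint8_t *pk_valid, unsigned pk_len,
                uint8_t out[2][BUS_BYTES])
{
    unsigned total = pd_len + pk_len;
    memcpy(out[0], pd, pd_len);
    if (total <= BUS_BYTES) {                /* case 1: one beat */
        memcpy(out[0] + pd_len, pk_valid, pk_len);
        memset(out[0] + total, 0, BUS_BYTES - total);
        return 1;
    }
    /* case 2: fill the first beat completely, zero-pad the second beat's
     * tail; this is the situation the text says should be avoided.      */
    unsigned first = BUS_BYTES - pd_len;
    memcpy(out[0] + pd_len, pk_valid, first);
    memcpy(out[1], pk_valid + first, pk_len - first);
    memset(out[1] + (pk_len - first), 0, BUS_BYTES - (pk_len - first));
    return 2;
}
```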
The solution of the present disclosure is not only compatible with multiple types of cache controller, but can also improve off-chip read/write bandwidth and access efficiency. The following uses measured data to illustrate, for the three external controller types HBM/DDR5/DDR4, the measurements before and after the bandwidth and efficiency improvements.
The results of testing external HBM with standard address mapping are shown in Table 1. The test results of the efficiency-improvement method of the present disclosure are shown in Table 2; Table 3 gives the test data for external DDR5 and Table 4 for external DDR4. Comparative analysis of the measured results shows that the total off-chip bandwidth and storage efficiency improve in every mode.
Table 1
Case No.  Off-chip packet length (Byte)  Total bandwidth (Gbps)  Efficiency
1 128 862 0.35
2 192 1352 0.55
3 256 1643 0.67
4 288 1607 0.65
5 352 1739 0.71
6 384 1769 0.72
7 768 1648 0.67
8 1152 1634 0.67
9 1536 1628 0.66
10 1568 1601 0.65
Table 2
Case No.  Off-chip packet length (Byte)  Total bandwidth (Gbps)  Efficiency
1 128 1602 0.65
2 192 1899 0.77
3 256 2139 0.87
4 288 2063 0.84
5 352 2186 0.89
6 384 2217 0.90
7 768 2169 0.88
8 1152 2166 0.88
9 1536 2133 0.86
10 1568 2028 0.83
Table 3
Case No.  Off-chip packet length (Byte)  Total bandwidth (Gbps)  Efficiency
1 128 136.89 0.59
2 192 154.22 0.67
3 256 163.54 0.71
4 384 168.49 0.73
5 768 170.63 0.74
6 1152 172.45 0.75
7 1184 169.61 0.74
8 1536 173.55 0.75
9 1568 169.95 0.74
10 12288 166.54 0.72
Table 4
Case No.  Off-chip packet length (Byte)  Total bandwidth (Gbps)  Efficiency
1 128 200.64 65.35
2 192 247.85 80.73
3 256 267.03 86.98
4 384 272.29 88.69
5 768 270.61 88.15
6 1152 243.78 79.41
7 1536 252.68 82.31
8 1568 232.25 75.65
9 6144 233.99 76.22
10 12288 220.05 71.68
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and includes several instructions to cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods of the various embodiments of the present disclosure.
This embodiment also provides a cache management apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiment is preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
FIG. 13 is a structural block diagram of the cache management apparatus according to an embodiment of the present disclosure. As shown in FIG. 13, the apparatus includes:
an identification unit 1302, configured to cause the cache management unit (MMU) to identify the type of the external cache controller based on CPU configuration information;
a confirmation unit 1304, configured to confirm the offset address by table lookup based on the address management submodule and the address region corresponding to the cache controller type;
a calculation unit 1306, configured to calculate the logical address of the cache controller type from the offset address, where different cache controller types gate different numbers of external connection channels.
Through the present disclosure, because the cache management unit (MMU) identifies the external cache controller type based on CPU configuration information, confirms the offset address by table lookup based on the address management submodule and the address region corresponding to that controller type, and calculates the logical address of that controller type from the offset address, where different cache controller types gate different numbers of external connection channels, switching between different controllers under one framework is achieved; this solves the problem of supporting different controllers under one framework, so that multiple controllers can be supported under the same framework and storage efficiency is improved.
It should be noted that the above modules can be implemented by software or hardware; for the latter, this can be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor, or the above modules are located in different processors in any combination.
Embodiments of the present disclosure also provide a computer-readable storage control program, in which a computer program is stored, where the computer program is configured to perform the steps of any of the above method embodiments when run.
In one exemplary embodiment, the computer-readable storage control program may include, but is not limited to, a driver for a CPU and a memory controller, a control program for connecting an FPGA to HBM/DDR3/DDR4/DDR5, and the like.
Embodiments of the present disclosure also provide a controller, including a buffer (which caches part of the data) and a processor; a computer program is stored in the controller, and the controller is configured to run the computer program to perform the steps of any of the above method embodiments.
In one exemplary embodiment, the controller may further include a transmission device for protocol conversion, where the transmission device is connected to the controller to implement the connection with the cache controller.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present disclosure can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices; they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, and in some cases the steps shown or described can be executed in a different order, or the modules or steps can be made into individual integrated circuit modules, or multiple of them can be made into a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
The above is only the preferred embodiments of the present disclosure and is not intended to limit the present disclosure; for those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (10)

  1. A cache management method, comprising:
    identifying, by a cache management unit (MMU), the type of an external cache controller based on central processing unit (CPU) configuration information;
    confirming an offset address by table lookup based on an address management submodule and an address region corresponding to the cache controller type; and
    calculating a logical address of the cache controller type based on the offset address; wherein different cache controller types gate different numbers of external connection channels.
  2. The method according to claim 1, wherein the method further comprises:
    reading, by an address mapping submodule of the MMU, a preset address mapping relationship, the address mapping relationship converting on-chip logical addresses into physical addresses acceptable to the cache chip.
  3. The method according to claim 1, wherein the method further comprises:
    reading, by an address mapping submodule of the MMU, a preset address mapping relationship, the address mapping relationship converting on-chip logical addresses into physical addresses acceptable to the cache chip; and
    performing reconfiguration through the CPU according to different application scenarios, to obtain the reconfigured address mapping relationship corresponding to the application scenario.
  4. The method according to claim 1, wherein the method further comprises:
    segmenting, according to off-chip cache requirements and structural attributes of the cache controller, a data packet sent by the CPU to the MMU by block unit, wherein the segmented data packet corresponds to block addresses, one data packet corresponds to multiple block addresses, and the address management range differs for different cache controller types.
  5. The method according to claim 1, wherein the method further comprises:
    receiving, by the MMU, off-chip data sent by a packet buffer management unit (PMU), wherein the off-chip data comprises a packet descriptor (PD) and a packet (PK);
    extracting data information of the PD;
    packing the data information to obtain packed data, and deleting first data from the PK;
    shift-splicing the packed data according to valid data bits to extract second data;
    packing the data information of the PD in the packed data and the extracted second data again to obtain target packed data; and
    sending the target packed data to an off-chip cache for storage.
  6. The method according to claim 5, wherein the method further comprises:
    when the data bit length of the PD plus the data bit length of the extracted second data is less than or equal to the bus width, outputting the PD and the second data simultaneously;
    when the data bit length of the PD plus the data bit length of the extracted second data is greater than the bus width, splitting the PD and the second data into first split data and second split data;
    outputting the PD and the first split data; and
    zero-padding the second split data to obtain zero-padded data whose data bit length equals the bus width, and outputting the zero-padded data.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises:
    issuing, by a packet buffer management unit (PMU), a write packet;
    storing, by the PMU, the write packet, and sending a write-release command to a traffic buffer management unit (TMMU); issuing, by the TMMU, a packet descriptor to a queue management unit (QMU);
    issuing, by the QMU, a write command through the TMMU, the write command being transparently passed to the MMU, and sending, by the QMU, a write-release signal to the TMMU after the write command is stored;
    issuing, by a command queue, a read-packet command, reading the read-packet data and returning the read-packet data to the PMU; and
    issuing, by the TMMU, a read command, and reading the packet descriptor data.
  8. A cache management apparatus, comprising:
    an identification unit, configured to cause a cache management unit (MMU) to identify the type of an external cache controller based on central processing unit (CPU) configuration information;
    a confirmation unit, configured to confirm an offset address by table lookup based on an address management submodule and an address region corresponding to the cache controller type; and
    a calculation unit, configured to calculate a logical address of the cache controller type based on the offset address; wherein different cache controller types gate different numbers of external connection channels.
  9. A computer-readable storage control program, in which a computer program is stored, wherein the computer program is configured to perform, when run, the method of any one of claims 1 to 7.
  10. A controller, comprising a buffer and a processor, wherein a computer program is stored in the controller, and the processor is configured to run the computer program to perform the method of any one of claims 1 to 7.
PCT/CN2022/115201 2021-09-02 2022-08-26 Cache management method and apparatus, control program and controller WO2023030195A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111028812.3 2021-09-02
CN202111028812.3A CN115756296A (zh) 2021-09-02 Cache management method and apparatus, control program and controller

Publications (1)

Publication Number Publication Date
WO2023030195A1 (zh)

Family

ID=85332392

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115201 WO2023030195A1 (zh) 2021-09-02 2022-08-26 Cache management method and apparatus, control program and controller

Country Status (2)

Country Link
CN (1) CN115756296A (zh)
WO (1) WO2023030195A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010042163A1 (en) * 1999-02-26 2001-11-15 Kevin J. Ryan Ram controller interface device for ram compatibility
CN1504900A (zh) * 2002-04-02 2004-06-16 英属盖曼群岛商旭上绘图股份有限公司 自内存读取数据的控制电路及其方法
CN103164368A (zh) * 2013-03-29 2013-06-19 惠州Tcl移动通信有限公司 一种嵌入式设备兼容不同地址映射内存芯片的方法及系统
CN106330741A (zh) * 2015-06-15 2017-01-11 深圳市中兴微电子技术有限公司 一种报文传输方法和装置


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117440273A (zh) * 2023-12-18 2024-01-23 厦门鹏芯半导体有限公司 System and method for packing XGSPON OLT upstream data
CN117440273B (zh) * 2023-12-18 2024-03-22 厦门鹏芯半导体有限公司 System and method for packing XGSPON OLT upstream data

Also Published As

Publication number Publication date
CN115756296A (zh) 2023-03-07

Similar Documents

Publication Publication Date Title
EP3694165B1 (en) Managing congestion in a network
EP3706394A1 (en) Writes to multiple memory destinations
JP2021190123A (ja) System and method using a cache coherent interconnect
WO2016187813A1 (zh) Data transmission method and device for a hybrid optical-electrical network
WO2023155526A1 (zh) Data stream processing method, storage control node, and non-volatile readable storage medium
US20200403919A1 (en) Offload of acknowledgements to a network device
US9774651B2 (en) Method and apparatus for rapid data distribution
US11700209B2 (en) Multi-path packet descriptor delivery scheme
CN116018790A (zh) Receiver-based precision congestion control
US20230080588A1 (en) Mqtt protocol simulation method and simulation device
WO2021073546A1 (zh) Data access method and apparatus, and first computing device
TWI257790B (en) System for protocol processing engine
US20220078119A1 (en) Network interface device with flow control capability
US20160004445A1 (en) Devices and methods for interconnecting server nodes
CN104378161A (zh) FCoE protocol acceleration engine IP core based on the AXI4 bus architecture
WO2023030195A1 (zh) Cache management method and apparatus, control program and controller
US9594702B2 (en) Multi-processor with efficient search key processing
Kissel et al. Evaluating high performance data transfer with rdma-based protocols in wide-area networks
CN107832149B (zh) Receive-side Scaling circuit for dynamic packet-group management on multi-core processors
US20220210084A1 (en) Timestamp synchronization between host and network interface device
Qiu et al. Full-kv: Flexible and ultra-low-latency in-memory key-value store system design on cpu-fpga
US9594706B2 (en) Island-based network flow processor with efficient search key processing
US9137167B2 (en) Host ethernet adapter frame forwarding
CN114385534A (zh) Data processing method and apparatus
US20230106771A1 (en) Data Processing Method for Network Adapter and Network Adapter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22863332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE