WO2023130316A1 - Cache dynamic partitioning method and system considering both quality of service and utilization - Google Patents

Cache dynamic partitioning method and system considering both quality of service and utilization

Info

Publication number
WO2023130316A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
service
way
quality
priority
Prior art date
Application number
PCT/CN2022/070522
Other languages
English (en)
French (fr)
Inventor
王诲喆
黄博文
张传奇
王卅
唐丹
包云岗
Original Assignee
Institute of Computing Technology, Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology, Chinese Academy of Sciences
Priority to PCT/CN2022/070522
Publication of WO2023130316A1 publication Critical patent/WO2023130316A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0817Cache consistency protocols using directory methods

Definitions

  • the invention belongs to the technical field of cloud computing quality-of-service assurance.
  • the invention relates to a method and system for dynamically partitioning a cache that takes both quality of service and utilization into account.
  • Cloud computing deploys different applications on shared hardware resources. Running different applications together improves hardware utilization, but the applications also compete for the shared resources, affecting each other's running time or response speed and ultimately degrading the quality of service delivered to end users.
  • the large-capacity last-level cache is a typical shared resource in cloud computing scenarios.
  • a cache is generally implemented as a multi-way set-associative structure: the whole memory space is divided into sets according to address bits, a set is selected by indexing with the access address, and each set holds multiple ways, each way recording the metadata and the data block of one cache line.
  • cache coherence is required in modern multi-core processors and SoCs, that is, a modification made to a cache by one processor core must eventually become visible to the other caches.
  • CMT: Cache Monitoring Technology
  • CAT: Cache Allocation Technology
  • CMT first checks system capabilities and configuration through an instruction called CPUID, then maps the desired threads, operating-system instances, and virtual machines to specific Resource Monitoring IDs (RMIDs) and binds them to the event codes to be monitored (such as cache occupancy); resource monitoring can then be enabled in real time.
  • for product-implementation reasons, the hardware resources devoted to monitoring are economical, and the returned data is scaled up by a fixed factor.
  • the balance point officially recommended by Intel for the software polling frequency of the monitoring data is 1 Hz; polling faster yields more faithful information about software behavior, while polling slower costs less to read.
  • CAT is a product technology independent of CMT. It likewise first checks system capabilities and configuration through the CPUID instruction, then maps the desired threads, operating-system instances, and virtual machines to another identifier, the Class of Service (CLOS). Both RMID and CLOS are stored in architectural registers, so they can coexist and be recognized in the system at the same time. Software then writes the bitmask associated with each CLOS into a last-level cache that supports CAT.
  • the bitmask is generally used to indicate the fraction of the cache that may be occupied, and overlapping bitmasks indicate portions that may be shared. How the bitmask is interpreted is left to the cache implementation; typically it is mapped directly or proportionally onto a way mask that decides which ways each CLOS may occupy.
  • the software platform monitors the real-time cache behavior of different applications through CMT, then based on this data uses CAT to instruct the last-level cache how to allocate cache capacity to each application, preventing the cache lines of high-priority applications from being excessively evicted and guaranteeing the latency and quality of service of high-priority applications.
  • the purpose of the present invention is to solve the problem that, owing to the above cache-resource contention, quality-of-service guarantees and high cache utilization are difficult to achieve simultaneously, and to propose an easy-to-use and practical cache partitioning mechanism in which hardware counts dead blocks online to guide cache partitioning in real time.
  • the present invention proposes a dynamic cache partitioning method that takes both quality of service and utilization into account, including:
  • Step 1: bind the high-priority process ID (PID) to the process-level tag IDp, set the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenate IDp and IDc to obtain the high-priority application tag ID0;
  • Step 2: the high-priority process executes on the corresponding core, and the memory-access requests it issues reach the target cache.
  • the target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and proceeds to step 4, otherwise it proceeds to step 3;
  • Step 3: the target cache uses tag ID0 as an address into the control plane and obtains the way mask corresponding to ID0. If the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
  • Step 4: when the target cache learns the number of the hit way and the accessed set is a sampling set, it adds 1 to the way-hit counter corresponding to that number; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
  • the process of generating the way mask in step 4 includes: sorting the way-hit counter values in descending order to obtain an ordered sequence; accumulating, in descending order, the sum of the top 1 counter value, the top 2 counter values, the top 3 counter values, and so on up to the sum over all ways of the target cache, to obtain a statistics sequence; and setting to 1 the entries of the statistics sequence that are less than the quality-of-service requirement and to 0 the entries that are greater than or equal to it, thereby generating the way mask.
  • step 1 includes: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
  • the quality of service requirement is the number of hits within an update period.
  • in the above dynamic cache partitioning method considering both quality of service and utilization, the update period is reached when the system timer completes a time interval or the memory-access requests of high-priority applications reach a preset count.
  • the present invention also proposes a dynamic cache partitioning system that takes both quality of service and utilization into account, including:
  • Module 1, for binding the high-priority process ID to the process-level tag IDp, setting the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenating IDp and IDc to obtain the high-priority application tag ID0;
  • Module 2, for making the high-priority process execute on the corresponding core, with its memory-access requests reaching the target cache; the target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and invokes module 4, otherwise it invokes module 3;
  • Module 3, for making the target cache access the control plane with tag ID0 as the address and obtain the way mask corresponding to ID0; if the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
  • Module 4, for adding 1 to the way-hit counter corresponding to the hit way's number when the target cache learns that number and the accessed set is a sampling set; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
  • the process of generating the way mask in module 4 is the same as in step 4 above: the way-hit counter values are sorted in descending order, prefix sums over the top 1, 2, 3, and so on ways are accumulated into a statistics sequence, and entries below the quality-of-service requirement are set to 1 while entries greater than or equal to it are set to 0, generating the way mask.
  • in the dynamic cache partitioning system considering both quality of service and utilization, module 1 includes: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
  • the quality of service requirement is the number of hits within an update period.
  • the update period is reached when the system timer completes a time interval or the memory-access requests of high-priority applications reach a preset count.
  • compared with the prior art, the present invention has the following advantages:
  • the invention improves the timeliness and ease of use of cache partitioning by collecting cache-utilization statistics in hardware in real time and guiding the partitioning according to specified quality-of-service parameters.
  • compared with static partitioning, the partitioning method based on dead-block statistics obtains the cache capacity required by high-priority applications more accurately, reclaiming as much shared cache space as possible while guaranteeing quality of service, and improving cache utilization.
  • Fig. 1 is a schematic diagram of how tags (also referred to as IDs) are implemented, assembled, and propagated in hardware;
  • Fig. 2 is a diagram of the process by which the way mask implements partitioning;
  • Fig. 3 is a schematic diagram of set sampling;
  • Fig. 4 illustrates the effect of sorting the hit counts;
  • Fig. 5 is the flow chart of sorting and way-mask generation in the present invention;
  • Fig. 6 is the overall flow chart of the present invention.
  • a cache block is said to be "dead" between its last access and its eviction. Identifying dead blocks as early as possible can effectively improve cache utilization and the performance of the replacement algorithm.
  • in a shared scenario, identifying the dead-block capacity of an application can be used to guide cache partitioning in real time.
  • one way to identify dead blocks is to count the accesses to a block; if the access count is too low, the block is considered dead.
  • this application includes the following key technical points:
  • Key point 1: set sampling and a hardware sorting network are used to gather dead-block statistics in real time; technical effect: set sampling makes the hardware implementation feasible, and sorting the statistics through the hardware sorting network uncovers the largest possible number of dead blocks;
  • Key point 2: way masks are generated directly from quality-of-service parameters; technical effect: system users only need to set quality-of-service target parameters to direct the invention to automatically count dead blocks, generate the corresponding way masks, and partition the cache, improving cache utilization while guaranteeing quality of service.
  • the present invention aims to use the cache partitioning mechanism to guarantee the quality of service of high-priority applications at the cache level, while using an online statistics mechanism to dynamically determine the partitioning ratio that satisfies the quality of service, thereby improving the overall utilization of the cache.
  • the overall process is shown in Figure 6, including:
  • Step 1: the management software or the system administrator writes, through the system interface, the expected quality-of-service value for the high-priority application tag ID0 (here, the expected cache hit rate of the high-priority application, together with the number of memory-access requests to be counted per statistics period as the statistical granularity);
  • Step 2: the management software or system administrator selects high-priority processes or programs, binds their process IDs (PIDs) to the process-level tag IDp through the system interface, and binds the processor cores assigned to these processes to the core tag IDc; the concatenation of IDp and IDc takes the value of the above high-priority application tag ID0;
  • Step 3: the process executes on the corresponding core, its memory-access requests reach the target cache, and the target cache recognizes from the tag value ID0 carried by the bus signals that the current request was issued by a high-priority application;
  • Step 4: when the memory-access request misses in the target cache, the target cache uses ID0 as an address into the control plane to obtain the way mask corresponding to ID0. If the accessed set is not a sampling set, the way mask restricts the replacement candidates and the replacement algorithm yields the victim way; if the accessed set is a sampling set, all ways in the set are replacement candidates for the replacement algorithm. Fetch, replacement, and data response then follow to complete the cache service flow;
  • Step 5: when the memory-access request hits in the target cache, the target cache learns the number of the hit way and, if the accessed set is a sampling set, increments the way-hit counter with the corresponding number in the automatic partitioning module that performs the way-mask updates;
  • Step 6: when the memory-access requests of high-priority applications reach the preset count, the automatic partitioning module sorts the way-hit counters as shown in Fig. 4 and, according to the quality-of-service requirement, i.e., the hit rate (because the number of requests counted per update period is fixed, the hit rate can be represented by an integer hit count), finds the minimum number of ways that satisfies the required hit count; a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications.
  • a tag with a specific value is treated as high priority, and such a tag value is assigned to a specific process or thread, making it a high-priority application.
  • VM refers to a virtual machine
  • the core tag design is motivated by the application scenarios of virtualization and resource isolation.
  • the core tag is called VM ID in the diagram.
  • "old" in the figure indicates that the core tag is "already implemented and existing", to emphasize the introduction and design of process-level tags.
  • concatenating the process-level tag with the core tag forms the new official tag, corresponding to the New ID.
  • the core tag is the identifier of a physical core; it is stored in the centralized management module of the System on Chip (SoC), wired into the physical core, and rewritable by software; the process-level tag is configured through the cgroups mechanism of the Linux system.
  • the identifier of the process is recorded in the Process Control Block (PCB) and written into a dedicated register in the core on each context switch.
  • the concatenation of the process-level tag and the core tag is the actual tag in the hardware system; in the current implementation the core tag occupies the low bits and the process tag occupies the high bits.
  • a request is issued by an upper-level cache or a master device such as a processor core; the target cache receives the request and, using its tag field as an index, reads the way mask corresponding to that tag from the centralized management module.
  • the purpose of the tag is to let devices as deep as the last-level cache know which processor core and process a request originally came from.
  • the content of a request is generally to fetch the data block for an address. If the request misses, the bit string of candidate replacement ways is bitwise ANDed with the way mask to obtain the bit string of replaceable ways, which is fed into the cache's own replacement algorithm to decide the way to be replaced.
  • each bit of the final replacement string produced by the replacement algorithm logically corresponds to the way at that bit position.
  • this string is a one-hot code; it can be used as the mask when writing metadata, thereby performing the replacement of the victim way.
  • the cache also converts the one-hot bit string into the corresponding index value (for example, 010 is converted to 2) for use by other circuit modules that are indexed by way number.
  • automatic update of the cache partition: the system user sets two parameters in the automatic partitioning module through software: the quality-of-service ratio to be guaranteed and the statistics update period.
  • the quality-of-service ratio is expressed in the cache as a ratio of hit counts.
  • the present invention filters out the ways that contribute fewer hits than this ratio and revokes their exclusivity, releasing a certain amount of cache capacity while guaranteeing quality of service.
  • the filtered-out ways lie in the non-sampling region and were originally allocated exclusively to high-priority applications; when the statistics gathered by the present invention from the sampling region indicate that allocating that many ways contributes little to high-priority quality of service, the excess ways in the non-sampling region lose their exclusivity, so that the replacement algorithm can select them as victims to be occupied by low-priority applications.
  • automatic update comprises two steps: set-sampling statistics, and sorting to generate way masks:
  • set-sampling statistics: because of hardware implementation limits, real-time statistics must use set sampling. As shown in Fig. 3, the present invention selects one set out of every 64 cache sets as a statistical unit and accumulates the hit counts of each way across these sets. For example, hits on way 0 in sampled sets 0 and 1 are both accumulated into the statistics counter corresponding to way 0.
  • sorting to generate way masks:
  • the automatic partitioning module sorts the recorded per-way hit counts. As shown in Fig. 4, the left side is an example without sorting and the right side an example with sorting; the sorted example groups the ways with fewer hits together, so more ways can be released under the same hit-count guarantee ratio and cache utilization is higher.
  • after sorting, the truly dead blocks fall outside the interval covered by the quality-of-service guarantee ratio, so more cache space can be released.
  • the automatic partitioning module finds the number of ways that satisfies the quality-of-service guarantee ratio and generates a new way mask accordingly, realizing automatic update of the cache partition.
  • the present invention proposes a dynamic cache partitioning method and system that takes both quality of service and utilization into account; it uses set sampling and a hardware sorting network to gather dead-block statistics in real time, where set sampling makes the hardware implementation feasible and sorting the statistics through the hardware sorting network uncovers the largest possible number of dead blocks; the present invention also generates way masks directly from quality-of-service parameters, so that system users only need to set quality-of-service target parameters to direct the invention to automatically count dead blocks, generate the corresponding way masks, and partition the cache, improving cache utilization while guaranteeing quality of service.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A dynamic cache partitioning method and system that takes both quality of service and utilization into account. Set sampling and a hardware sorting network are used to gather dead-block statistics in real time; set sampling makes the hardware implementation feasible, and sorting the statistics through the hardware sorting network uncovers the largest possible number of dead blocks. The method further generates way masks directly from quality-of-service parameters, so that system users only need to set quality-of-service target parameters to direct automatic dead-block counting and generation of the corresponding way masks, partitioning the cache and improving cache utilization while guaranteeing quality of service.

Description

Cache dynamic partitioning method and system considering both quality of service and utilization
Technical Field
The present invention belongs to the technical field of cloud computing quality-of-service assurance, and in particular relates to a dynamic cache partitioning method and system that takes both quality of service and utilization into account.
Background Art
Cloud computing deploys different applications on shared hardware resources. Running different applications together improves hardware utilization, but the applications also compete for the shared resources, affecting each other's running time or response speed and ultimately degrading the quality of service delivered to end users. The large-capacity last-level cache is a typical shared resource in cloud computing scenarios. A cache is generally implemented as a multi-way set-associative structure: the whole memory space is divided into sets according to address bits, a set is selected by indexing with the access address, and each set holds multiple ways, each way recording the metadata and the data block of one cache line. After the corresponding set is accessed, the address information in the metadata of each way in the set is compared to determine whether the data block for the accessed address is present in the cache; on a miss, one cache block in the set must be selected, written back to the next storage level, and the required data block fetched from the next level. For applications sharing the cache, one application may evict a cache block that another application still needs, causing the latter an unnecessary miss in the future. Because the waiting time on a miss is an order of magnitude longer than on a hit, this severely disturbs the execution time and response speed of the affected application. Moreover, modern multi-core processors and SoCs require cache coherence, that is, a modification made to a cache by one processor core must eventually become visible to the other caches. On a replacement, handling the coherence of the evicted block may require visiting all processor cores and their caches, adding further delay cycles, which amplifies the quality-of-service loss caused by cache interference among co-located applications.
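To make the set-associative organization described above concrete, here is a minimal C sketch; the geometry (1024 sets, 16 ways, 64-byte lines) and all identifiers are illustrative assumptions of this sketch, not values fixed by the patent:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS   1024         /* illustrative geometry, not from the patent */
    #define NUM_WAYS   16
    #define LINE_BYTES 64

    typedef struct {
        bool     valid;
        uint64_t tag;               /* address tag kept in the way's metadata */
        uint8_t  data[LINE_BYTES];  /* the cached data block                  */
    } way_t;

    static way_t cache[NUM_SETS][NUM_WAYS];

    /* Index bits of the address select the set; the tags of all ways in that
     * set are then compared to decide hit or miss. */
    static bool cache_lookup(uint64_t addr, uint32_t *set_out, int *hit_way)
    {
        uint64_t block = addr / LINE_BYTES;
        uint32_t set   = (uint32_t)(block % NUM_SETS);
        uint64_t tag   = block / NUM_SETS;

        *set_out = set;
        for (int w = 0; w < NUM_WAYS; w++) {
            if (cache[set][w].valid && cache[set][w].tag == tag) {
                *hit_way = w;       /* hit: the way number becomes known */
                return true;
            }
        }
        return false;               /* miss: a victim must be evicted and refilled */
    }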
To address this problem, Intel implemented Cache Monitoring Technology (CMT) and Cache Allocation Technology (CAT) in its server-oriented Xeon processors. CMT first checks system capabilities and configuration through an instruction called CPUID, then maps the desired threads, operating-system instances, and virtual machines to specific Resource Monitoring IDs (RMIDs) and binds the event codes to be monitored (such as cache occupancy); resource monitoring can then be enabled in real time. For product-implementation reasons the hardware resources devoted to monitoring are economical, and the returned data is scaled up by a fixed factor. The balance point officially recommended by Intel for the software polling frequency of the monitoring data is 1 Hz; polling faster yields more faithful information about software behavior, while polling slower costs less to read. CAT is a product technology independent of CMT. It likewise first checks system capabilities and configuration through the CPUID instruction, then maps the desired threads, operating-system instances, and virtual machines to another identifier, the Class of Service (CLOS). Both RMID and CLOS are stored in architectural registers, so they can coexist and be recognized in the system at the same time. Software then writes the bitmask associated with each CLOS into a last-level cache that supports CAT. The bitmask is generally used to indicate the fraction of the cache that may be occupied, and overlapping bitmasks indicate portions that may be shared; how the bitmask is interpreted is left to the cache implementation, which typically maps it directly or proportionally onto a way mask that decides which ways each CLOS may occupy.
In Intel's scheme, the software platform monitors the real-time cache behavior of different applications through CMT and, based on this data, uses CAT to instruct the last-level cache how to allocate cache capacity to each application, preventing the cache lines of high-priority applications from being excessively evicted and guaranteeing the latency and quality of service of high-priority applications.
In Intel's scheme, both CAT and CMT must be driven by software: CAT allocates cache-capacity ratios statically, and software must read the statistics provided by CMT in real time to update them. Guaranteeing quality of service therefore also depends on concrete software tuning and adaptation, which raises a barrier to use and deployment, and the timeliness of the regulation is not sufficiently guaranteed.
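As background not found in the patent itself, Linux exposes this CAT flow through the resctrl filesystem; the C sketch below, with an assumed group name, PID, and mask value, illustrates the static, software-updated partition that the invention replaces with automatic hardware updates:

    #include <stdio.h>
    #include <sys/stat.h>

    /* Write one line into a resctrl control file; returns 0 on success. */
    static int write_line(const char *path, const char *text)
    {
        FILE *f = fopen(path, "w");
        if (!f) return -1;
        int rc = (fputs(text, f) >= 0) ? 0 : -1;
        fclose(f);
        return rc;
    }

    int main(void)
    {
        /* Assumes resctrl is already mounted at /sys/fs/resctrl. */
        mkdir("/sys/fs/resctrl/hiprio", 0755);               /* one group = one CLOS */
        write_line("/sys/fs/resctrl/hiprio/tasks", "1234\n"); /* bind an example PID */
        /* Static capacity bitmask for L3 cache id 0: ways 0-7. Adapting it
         * requires software to poll monitoring data and rewrite this file. */
        write_line("/sys/fs/resctrl/hiprio/schemata", "L3:0=ff\n");
        return 0;
    }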
Disclosure of the Invention
The purpose of the present invention is to solve the problem that the above cache-resource contention makes quality-of-service guarantees and high cache utilization difficult to achieve simultaneously, and to propose an easy-to-use and practical cache partitioning mechanism in which hardware counts dead blocks online to guide cache partitioning in real time.
In view of the deficiencies of the prior art, the present invention proposes a dynamic cache partitioning method that takes both quality of service and utilization into account, comprising:
Step 1: bind the high-priority process ID (PID) to the process-level tag IDp, set the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenate IDp and IDc to obtain the high-priority application tag ID0;
Step 2: the high-priority process executes on the corresponding core, and the memory-access requests it issues reach the target cache. The target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and proceeds to step 4, otherwise it proceeds to step 3;
Step 3: the target cache uses tag ID0 as an address into the control plane and obtains the way mask corresponding to ID0. If the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
Step 4: when the target cache learns the number of the hit way and the accessed set is a sampling set, it adds 1 to the way-hit counter corresponding to that number; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
In the above dynamic cache partitioning method considering both quality of service and utilization, the process of generating the way mask in step 4 includes:
sorting the values of the way-hit counters in descending order to obtain an ordered sequence; accumulating, in descending order, the sum of the top 1 counter value, the top 2 counter values, the top 3 counter values, and so on, up to the sum over all ways of the target cache, to obtain a statistics sequence; and setting to 1 the entries of the statistics sequence that are less than the quality-of-service requirement and to 0 the entries that are greater than or equal to it, thereby generating the way mask.
In the method, step 1 includes: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
In the method, the quality-of-service requirement is a number of hits within an update period.
In the method, the update period includes: the system timer reaching a time interval, or the memory-access requests of high-priority applications reaching a preset count.
The present invention also proposes a dynamic cache partitioning system that takes both quality of service and utilization into account, comprising:
Module 1, for binding the high-priority process ID to the process-level tag IDp, setting the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenating IDp and IDc to obtain the high-priority application tag ID0;
Module 2, for making the high-priority process execute on the corresponding core, with the memory-access requests it issues reaching the target cache; the target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and invokes module 4, otherwise it invokes module 3;
Module 3, for making the target cache access the control plane with tag ID0 as the address and obtain the way mask corresponding to ID0; if the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
Module 4, for adding 1 to the way-hit counter corresponding to the hit way's number when the target cache learns that number and the accessed set is a sampling set; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
In the above dynamic cache partitioning system considering both quality of service and utilization, the process of generating the way mask in module 4 includes:
sorting the values of the way-hit counters in descending order to obtain an ordered sequence; accumulating, in descending order, the sum of the top 1 counter value, the top 2 counter values, the top 3 counter values, and so on, up to the sum over all ways of the target cache, to obtain a statistics sequence; and setting to 1 the entries of the statistics sequence that are less than the quality-of-service requirement and to 0 the entries that are greater than or equal to it, thereby generating the way mask.
In the system, module 1 includes: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
In the system, the quality-of-service requirement is a number of hits within an update period.
In the system, the update period includes: the system timer reaching a time interval, or the memory-access requests of high-priority applications reaching a preset count.
From the above scheme, the advantages of the present invention are:
Compared with the prior art, this invention collects cache-utilization statistics in hardware in real time and guides cache partitioning according to specified quality-of-service parameters, improving the timeliness and ease of use of cache partitioning. Compared with static partitioning, the partitioning method based on dead-block statistics obtains the cache capacity required by high-priority applications more accurately, reclaiming as much shared cache space as possible while guaranteeing quality of service and improving cache utilization.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of how tags (also referred to as IDs) are implemented, assembled, and propagated in hardware;
Fig. 2 is a diagram of the process by which the way mask implements partitioning;
Fig. 3 is a schematic diagram of set sampling;
Fig. 4 illustrates the effect of sorting the hit counts;
Fig. 5 is the flow chart of sorting and way-mask generation in the present invention;
Fig. 6 is the overall flow chart of the present invention.
Best Mode for Carrying Out the Invention
While studying practical technical solutions for guaranteeing shared-cache quality of service, the inventors found that the drawback of the existing CAT lies in its purely software-driven configuration: it lacks an automatic update function, its timeliness is poor, and the scheme itself does not directly support setting a target quality of service, requiring third-party software for adaptation and fitting. To achieve a better and more direct cache quality-of-service guarantee, the inventors decided to improve the timeliness, accuracy, and convenience of cache partitioning through automatic hardware statistics and real-time way-mask updates. However, hardware resource regulation generally faces the dilemma of resource isolation versus utilization, and hardware can hardly implement complex algorithms that process statistics to form allocation decisions.
By surveying prior work on cache replacement algorithms, the inventors found optimization headroom in the dead blocks present in a cache. A cache block is called "dead" between its last access and its eviction. Identifying dead blocks as early as possible can effectively improve cache utilization and the performance of the replacement algorithm. In a shared scenario, identifying the dead-block capacity of an application can be used to guide cache partitioning in real time. One way to identify dead blocks is to count the accesses to a block; if the access count is too low, the block is considered dead.
Specifically, the present application includes the following key technical points:
Key point 1: set sampling and a hardware sorting network are used to gather dead-block statistics in real time. Technical effect: set sampling makes the hardware implementation feasible, and sorting the statistics through the hardware sorting network uncovers the largest possible number of dead blocks;
Key point 2: way masks are generated directly from quality-of-service parameters. Technical effect: system users only need to set quality-of-service target parameters to direct the invention to automatically count dead blocks, generate the corresponding way masks, and partition the cache, improving cache utilization while guaranteeing quality of service.
To make the above features and effects of the present invention clearer and easier to understand, embodiments are given below and described in detail with reference to the accompanying drawings.
The present invention aims to use a cache partitioning mechanism to guarantee the quality of service of high-priority applications at the cache level, while using an online statistics mechanism to dynamically determine the partitioning ratio that satisfies the quality of service, thereby improving the overall utilization of the cache. The overall flow, shown in Fig. 6, includes:
Step 1: the management software or the system administrator writes, through the system interface, the expected quality-of-service value for the high-priority application tag ID0 (here, the expected cache hit rate of the high-priority application, together with the number of memory-access requests to be counted per statistics period as the statistical granularity);
Step 2: the management software or system administrator selects high-priority processes or programs, binds their process IDs (PIDs) to the process-level tag IDp through the system interface, and binds the processor cores assigned to these processes to the core tag IDc; the concatenation of IDp and IDc takes the value of the above high-priority application tag ID0;
Step 3: the process executes on the corresponding core, its memory-access requests reach the target cache, and the target cache recognizes from the tag value ID0 carried by the bus signals that the current request was issued by a high-priority application;
Step 4: when the memory-access request misses in the target cache, the target cache uses ID0 as an address into the control plane to obtain the way mask corresponding to ID0. If the accessed set is not a sampling set, the way mask restricts the replacement candidates and the replacement algorithm yields the victim way; if the accessed set is a sampling set, all ways in the set are replacement candidates for the replacement algorithm. Fetch, replacement, and data response then follow to complete the cache service flow;
Step 5: when the memory-access request hits in the target cache, the target cache learns the number of the hit way and, if the accessed set is a sampling set, increments the corresponding way-hit counter in the automatic partitioning module that performs the way-mask updates;
Step 6: when the memory-access requests of high-priority applications reach the preset count, the automatic partitioning module sorts the way-hit counters as shown in Fig. 4 and, according to the quality-of-service requirement, i.e., the hit rate (because the number of requests counted per update period is fixed, the hit rate can be represented by an integer hit count), finds the minimum number of ways that satisfies the required hit count; a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications.
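Tying steps 3 to 6 together, the per-request handling can be modeled by the following C sketch. All identifiers are assumptions of this sketch; the control plane, the bus, and the cache's inherent replacement policy are reduced to stubs, and the sampling stride of 64 matches the set-sampling scheme described later. The end-of-period mask update itself is sketched separately further below.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_WAYS        16
    #define SAMPLING_STRIDE 64      /* one sampling set out of every 64 sets */

    extern uint16_t ctrl_plane_way_mask(uint32_t id0);       /* control-plane lookup stub */
    extern int      replacement_select(uint16_t candidates); /* cache's own policy stub   */
    extern bool     lookup(uint64_t addr, uint32_t set, int *hit_way);
    extern void     refill_and_respond(uint64_t addr, uint32_t set, int victim);
    extern void     respond_hit(uint32_t set, int way);

    /* Per-way hit counters, consumed by the automatic partitioning module
     * at each update period (step 6). */
    static uint64_t way_hits[NUM_WAYS];

    static bool is_sampling_set(uint32_t set) { return set % SAMPLING_STRIDE == 0; }

    /* Serve one request that carries the application tag id0 (steps 3-5). */
    void serve_request(uint64_t addr, uint32_t set, uint32_t id0)
    {
        int hit_way;
        if (lookup(addr, set, &hit_way)) {
            if (is_sampling_set(set))
                way_hits[hit_way]++;             /* step 5: count hits per way */
            respond_hit(set, hit_way);
        } else {
            uint16_t all = (1u << NUM_WAYS) - 1;
            /* step 4: sampling sets ignore the mask so the statistics stay unbiased */
            uint16_t candidates = is_sampling_set(set)
                                ? all
                                : (uint16_t)(all & ctrl_plane_way_mask(id0));
            int victim = replacement_select(candidates);
            refill_and_respond(addr, set, victim);
        }
    }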
As shown in Fig. 1, the hardware system treats tags with a specific value as high priority, and such a tag value is assigned to a specific process or thread, making it a high-priority application. Setting and propagating the identifier (tag) that identifies high-priority applications: tags are divided into core tags and process-level tags. Providing and distinguishing core tags and process-level tags refines the granularity of control. For example, in a virtualization scenario where a user instance is bound to specific physical cores, supporting core tags distinguished from process-level tags avoids hardware confusion between identical process-level tags of different users.
As shown in Fig. 1, VM refers to a virtual machine; the core tag design is motivated by the application scenarios of virtualization and resource isolation, so the core tag is called VM ID in the figure. In addition, "old" in the figure indicates that the core tag is "already implemented and existing", to emphasize the introduction and design of the process-level tag; the concatenation of the process-level tag and the core tag forms the new official tag, corresponding to the New ID.
The core tag is the identifier of a physical core; it is stored in the centralized management module of the System on Chip (SoC), wired into the physical core, and rewritable by software. The process-level tag is configured through the cgroups mechanism of the Linux system: the identifier of a process is recorded in its Process Control Block (PCB) and written into a dedicated register in the core on each context switch. The concatenation of the process-level tag and the core tag is the actual tag in the hardware system; in the current implementation the core tag occupies the low bits and the process tag occupies the high bits. By extending the on-chip bus protocol, the tag is attached to the request channel and propagated through the system so that the target cache can recognize it.
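The bit layout below is a minimal C sketch of this concatenation; the field widths are assumptions for illustration, since the patent does not fix them:

    #include <stdint.h>

    #define CORE_TAG_BITS 4   /* assumed width of the core tag (VM ID)   */
    #define PROC_TAG_BITS 8   /* assumed width of the process-level tag  */

    /* Concatenate the tags as described: core tag in the low bits,
     * process-level tag in the high bits. */
    static inline uint32_t make_app_tag(uint32_t id_p, uint32_t id_c)
    {
        return (id_p << CORE_TAG_BITS) | (id_c & ((1u << CORE_TAG_BITS) - 1u));
    }

    /* Example: process tag 0x2A on core tag 0x3 yields ID0 = 0x2A3. */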
As shown in Fig. 2, way partitioning is carried out as follows: a request is issued by an upper-level cache or a master device such as a processor core; the target cache receives the request and, using its tag field as an index, reads the way mask corresponding to that tag from the centralized management module. The purpose of the tag is to let devices as deep as the last-level cache know which processor core and process a request originally came from. The content of a request is generally to fetch the data block for an address. If the request misses, the bit string of candidate replacement ways is bitwise ANDed with the way mask to obtain the bit string of replaceable ways after partitioning, which is fed into the cache's own replacement algorithm to decide the way to be replaced.
Each bit of the final replacement string produced by the replacement algorithm logically corresponds to the way at that bit position. This string is a one-hot code; it can be used as the mask when writing metadata, thereby performing the replacement of the victim way. The cache also converts the one-hot bit string into the corresponding index value (for example, 010 is converted to 2) for use by other circuit modules that are indexed by way number.
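The two operations just described, masking the candidates and decoding the one-hot victim, can be sketched in C as follows; the replacement policy is left abstract, and numbering ways from the least-significant bit is an assumed convention of this sketch:

    #include <stdint.h>

    /* The cache's inherent replacement algorithm: takes a bit string of
     * replaceable ways, returns a one-hot bit string naming the victim. */
    extern uint16_t replacement_policy(uint16_t replaceable);

    /* Bitwise-AND the candidate ways with the tag's way mask (Fig. 2), then
     * let the replacement algorithm pick the victim. */
    static uint16_t choose_victim_onehot(uint16_t candidates, uint16_t way_mask)
    {
        return replacement_policy(candidates & way_mask);
    }

    /* Decode the one-hot string into a way index for modules indexed by way
     * number, e.g. 0b0100 -> 2. Assumes a non-zero one-hot input. */
    static int onehot_to_index(uint16_t onehot)
    {
        return __builtin_ctz(onehot);   /* GCC/Clang builtin: count trailing zeros */
    }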
Automatic update of the cache partition: the system user sets two parameters in the automatic partitioning module through software: the quality-of-service ratio to be guaranteed and the statistics update period. The quality-of-service ratio is expressed in the cache as a ratio of hit counts; the present invention filters out the ways that contribute fewer hits than this ratio and revokes their exclusivity, releasing a certain amount of cache capacity while guaranteeing quality of service. The filtered-out ways lie in the non-sampling region and were originally allocated exclusively to the high-priority application; when the statistics gathered by the present invention from the sampling region indicate that allocating that many ways contributes little to high-priority quality of service, the excess ways in the non-sampling region lose their exclusivity, so that the replacement algorithm can select them as victims to be occupied by low-priority applications.
Automatic update consists of two steps: set-sampling statistics, and sorting to generate way masks.
Set-sampling statistics. Because of hardware implementation limits, real-time statistics must use set sampling. As shown in Fig. 3, the present invention selects one set out of every 64 cache sets as a statistical unit and accumulates, across these sets, the hit count of each way. For example, hits on way 0 in sampled sets 0 and 1 are both accumulated into the statistics counter corresponding to way 0.
Sorting to generate way masks. When the set update period arrives, the automatic partitioning module sorts the recorded per-way hit counts. As shown in Fig. 4, the left side is an example without sorting and the right side an example with sorting; the sorted example groups the ways with fewer hits together, so more ways can be released under the same hit-count guarantee ratio and cache utilization is higher. After sorting, the truly dead blocks fall outside the interval covered by the quality-of-service guarantee ratio, so more cache space can be released. The automatic partitioning module finds the number of ways that satisfies the quality-of-service guarantee ratio and generates a new way mask accordingly, realizing automatic update of the cache partition.
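A software model of this update step, under the "minimum number of ways whose cumulative hits meet the target" reading given earlier, might look as follows. In the patent the sort is performed by a hardware sorting network, for which qsort merely stands in, and the mapping of the resulting mask bits onto physical ways is an assumption of this sketch:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define NUM_WAYS 16

    static int desc(const void *a, const void *b)
    {
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
        return (x < y) - (x > y);              /* descending order */
    }

    /* End-of-period update: sort per-way hit counts, find the fewest ways
     * whose cumulative hits reach the target, clear the counters, and emit
     * the high-priority way mask. */
    static uint16_t update_way_mask(uint64_t way_hits[NUM_WAYS], uint64_t hit_target)
    {
        uint64_t sorted[NUM_WAYS];
        memcpy(sorted, way_hits, sizeof sorted);
        qsort(sorted, NUM_WAYS, sizeof sorted[0], desc);

        uint64_t cumulative = 0;
        int needed = NUM_WAYS;                 /* minimum number of ways to reserve */
        for (int k = 0; k < NUM_WAYS; k++) {
            cumulative += sorted[k];
            if (cumulative >= hit_target) { needed = k + 1; break; }
        }

        memset(way_hits, 0, NUM_WAYS * sizeof way_hits[0]);  /* new statistics period */

        uint16_t hi_mask = (uint16_t)((1u << needed) - 1u);  /* high-priority partition */
        /* The low-priority partition is the bitwise inverse, ~hi_mask truncated
         * to NUM_WAYS bits, stored in the other control-plane register. */
        return hi_mask;
    }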
The following are system embodiments corresponding to the method embodiments above; this embodiment can be implemented in cooperation with the embodiments above. The relevant technical details mentioned in the above embodiments remain valid in this embodiment and, to reduce duplication, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the embodiments above.
The present invention also proposes a dynamic cache partitioning system that takes both quality of service and utilization into account, comprising:
Module 1, for binding the high-priority process ID to the process-level tag IDp, setting the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenating IDp and IDc to obtain the high-priority application tag ID0;
Module 2, for making the high-priority process execute on the corresponding core, with the memory-access requests it issues reaching the target cache; the target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and invokes module 4, otherwise it invokes module 3;
Module 3, for making the target cache access the control plane with tag ID0 as the address and obtain the way mask corresponding to ID0; if the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
Module 4, for adding 1 to the way-hit counter corresponding to the hit way's number when the target cache learns that number and the accessed set is a sampling set; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
In the above dynamic cache partitioning system considering both quality of service and utilization, the process of generating the way mask in module 4 includes:
sorting the values of the way-hit counters in descending order to obtain an ordered sequence; accumulating, in descending order, the sum of the top 1 counter value, the top 2 counter values, the top 3 counter values, and so on, up to the sum over all ways of the target cache, to obtain a statistics sequence; and setting to 1 the entries of the statistics sequence that are less than the quality-of-service requirement and to 0 the entries that are greater than or equal to it, thereby generating the way mask.
In the system, module 1 includes: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
In the system, the quality-of-service requirement is a number of hits within an update period.
In the system, the update period includes: the system timer reaching a time interval, or the memory-access requests of high-priority applications reaching a preset count.
Industrial Applicability
The present invention proposes a dynamic cache partitioning method and system that takes both quality of service and utilization into account. Set sampling and a hardware sorting network are used to gather dead-block statistics in real time; set sampling makes the hardware implementation feasible, and sorting the statistics through the hardware sorting network uncovers the largest possible number of dead blocks. The present invention also generates way masks directly from quality-of-service parameters, so that system users only need to set quality-of-service target parameters to direct the invention to automatically count dead blocks, generate the corresponding way masks, and partition the cache, improving cache utilization while guaranteeing quality of service.

Claims (10)

  1. A dynamic cache partitioning method that takes both quality of service and utilization into account, characterized in that it comprises:
    Step 1: binding the high-priority process ID to the process-level tag IDp, setting the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenating IDp and IDc to obtain the high-priority application tag ID0;
    Step 2: executing the high-priority process on the corresponding core, with the memory-access requests it issues reaching the target cache; the target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and executes step 4, otherwise it executes step 3;
    Step 3: the target cache accesses the control plane with tag ID0 as the address and obtains the way mask corresponding to ID0; if the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
    Step 4: when the target cache learns the number of the hit way and the accessed set is a sampling set, the way-hit counter corresponding to that number is incremented by 1; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
  2. The dynamic cache partitioning method considering both quality of service and utilization of claim 1, characterized in that the process of generating the way mask in step 4 comprises:
    sorting the values of the way-hit counters in descending order to obtain an ordered sequence; accumulating, in descending order, the sum of the top 1 counter value, the top 2 counter values, the top 3 counter values, and so on, up to the sum over all ways of the target cache, to obtain a statistics sequence; and setting to 1 the entries of the statistics sequence that are less than the quality-of-service requirement and to 0 the entries that are greater than or equal to it, thereby generating the way mask.
  3. The dynamic cache partitioning method considering both quality of service and utilization of claim 1, characterized in that step 1 comprises: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
  4. The dynamic cache partitioning method considering both quality of service and utilization of claim 1, characterized in that the quality-of-service requirement is a number of hits within an update period.
  5. The dynamic cache partitioning method considering both quality of service and utilization of claim 1, characterized in that the update period comprises: the system timer reaching a time interval, or the memory-access requests of high-priority applications reaching a preset count.
  6. A dynamic cache partitioning system that takes both quality of service and utilization into account, characterized in that it comprises:
    Module 1, for binding the high-priority process ID to the process-level tag IDp, setting the core tag value of the processor core assigned to the high-priority process to the core tag IDc, and concatenating IDp and IDc to obtain the high-priority application tag ID0;
    Module 2, for executing the high-priority process on the corresponding core, with the memory-access requests it issues reaching the target cache; the target cache recognizes from the tag value ID0 carried by the request that the current request was issued by the high-priority application, and checks whether the request hits in the target cache; on a hit it responds with the hit data and invokes module 4, otherwise it invokes module 3;
    Module 3, for making the target cache access the control plane with tag ID0 as the address and obtain the way mask corresponding to ID0; if the set accessed by the request is not a sampling set, the way mask restricts the replacement candidates and the victim way is obtained through the target cache's replacement algorithm; if the accessed set is a sampling set, all ways in the accessed sampling set are replacement candidates and the victim way is obtained through the target cache's replacement algorithm; subsequent fetch, replacement, and data response proceed according to the victim way to complete the cache service flow;
    Module 4, for incrementing by 1 the way-hit counter corresponding to the hit way's number when the target cache learns that number and the accessed set is a sampling set; when the preset update period is reached, the values of all way-hit counters are sorted, the minimum number of ways that satisfies the preset quality-of-service requirement is determined, a way mask generated from that number of ways is stored in the corresponding control-plane register as the partition for high-priority applications, and the bitwise inverse of the generated way mask is stored in the corresponding control-plane register as the partition for low-priority applications; all way counters are cleared, and a new statistics period begins.
  7. The dynamic cache partitioning system considering both quality of service and utilization of claim 6, characterized in that the process of generating the way mask in module 4 comprises:
    sorting the values of the way-hit counters in descending order to obtain an ordered sequence; accumulating, in descending order, the sum of the top 1 counter value, the top 2 counter values, the top 3 counter values, and so on, up to the sum over all ways of the target cache, to obtain a statistics sequence; and setting to 1 the entries of the statistics sequence that are less than the quality-of-service requirement and to 0 the entries that are greater than or equal to it, thereby generating the way mask.
  8. The dynamic cache partitioning system considering both quality of service and utilization of claim 6, characterized in that module 1 comprises: extending the on-chip bus protocol to attach the priority application tag ID0 to the request channel, and propagating ID0 through the system so that the target cache can recognize it.
  9. The dynamic cache partitioning system considering both quality of service and utilization of claim 6, characterized in that the quality-of-service requirement is a number of hits within an update period.
  10. The dynamic cache partitioning system considering both quality of service and utilization of claim 6, characterized in that the update period comprises: the system timer reaching a time interval, or the memory-access requests of high-priority applications reaching a preset count.
PCT/CN2022/070522 2022-01-06 2022-01-06 Cache dynamic partitioning method and system considering both quality of service and utilization WO2023130316A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/070522 WO2023130316A1 (zh) Cache dynamic partitioning method and system considering both quality of service and utilization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/070522 WO2023130316A1 (zh) Cache dynamic partitioning method and system considering both quality of service and utilization

Publications (1)

Publication Number Publication Date
WO2023130316A1 (zh)

Family

ID=87072924

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070522 Cache dynamic partitioning method and system considering both quality of service and utilization

Country Status (1)

Country Link
WO (1) WO2023130316A1 (zh)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006082554A2 (en) * 2005-02-02 2006-08-10 Koninklijke Philips Electronics N.V. Data processing system comprising a cache unit
CN102609362A (zh) * 2012-01-30 2012-07-25 Fudan University Shared cache dynamic partitioning method and circuit
CN106126434A (zh) * 2016-06-22 2016-11-16 Institute of Computing Technology, Chinese Academy of Sciences Cache line replacement method and device for the cache of a central processing unit
CN106358215A (zh) * 2016-08-31 2017-01-25 Shanghai Jiao Tong University Cooperation method in relay networks based on data caching
CN113505087A (zh) * 2021-06-29 2021-10-15 Institute of Computing Technology, Chinese Academy of Sciences Cache dynamic partitioning method and system considering both quality of service and utilization

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117909258A (zh) * 2024-03-18 2024-04-19 Beijing Institute of Open Source Chip Processor cache optimization method and apparatus, electronic device, and storage medium
CN117909258B (zh) * 2024-03-18 2024-05-14 Beijing Institute of Open Source Chip Processor cache optimization method and apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
US6381676B2 (en) Cache management for a multi-threaded processor
US7921276B2 (en) Applying quality of service (QoS) to a translation lookaside buffer (TLB)
KR101136610B1 Sequencer address management
US10838864B2 (en) Prioritizing local and remote memory access in a non-uniform memory access architecture
JP5413001B2 Cache memory
US8495318B2 (en) Memory page management in a tiered memory system
CN103038755B Method, device and system for data caching in a multi-node system
KR101850318B1 Apparatus and method for managing virtual memory
JP2004030574A Processor integrated circuit that dynamically allocates cache memory
US7913123B2 (en) Concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers
US20080071939A1 (en) System and method for performance monitoring and reconfiguring computer system with hardware monitor
TW201633142A Cache memory allocated by chunks based on memory access type
JP6166616B2 Information processing method, information processing apparatus, and program
WO2023130316A1 Cache dynamic partitioning method and system considering both quality of service and utilization
US8296552B2 (en) Dynamically migrating channels
Li et al. Elastic-cache: GPU cache architecture for efficient fine-and coarse-grained cache-line management
CN113505087B Cache dynamic partitioning method and system considering both quality of service and utilization
CN102662891B Affinity-aware DMA buffer management method and device
CN112540934B Method and system for guaranteeing quality of service when multiple latency-critical programs run concurrently
EP4298525A1 (en) Processor support for using cache way-locking to simultaneously record plural execution contexts into independent execution traces
US7536674B2 (en) Method and system for configuring network processing software to exploit packet flow data locality
US10990543B1 (en) Apparatus and method for arbitrating access to a set of resources
Scolari et al. A survey on recent hardware and software-level cache management techniques
WO2023241655A1 Data processing method and apparatus, electronic device, and computer-readable storage medium
CN104932990A Method and device for replacing data blocks in a cache memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22917791

Country of ref document: EP

Kind code of ref document: A1