WO2023103296A1 - Method, system, device, and storage medium for write-data caching - Google Patents

Method, system, device, and storage medium for write-data caching

Info

Publication number
WO2023103296A1
Authority
WO
WIPO (PCT)
Prior art keywords
page table
control page
control
host
response
Prior art date
Application number
PCT/CN2022/095380
Other languages
English (en)
French (fr)
Inventor
王江
李树青
孙华锦
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 (Suzhou Inspur Intelligent Technology Co., Ltd.)
Publication of WO2023103296A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements

Definitions

  • the present application relates to a method, system, device and storage medium for writing data cache.
  • The computational-storage architecture offloads data computation from the host CPU (Central Processing Unit) to data-processing acceleration units close to the storage units, reducing the corresponding data-movement operations and thereby freeing up system performance as much as possible.
  • Computational Storage introduces three product forms: the Computational Storage Processor (CSP), the Computational Storage Drive (CSD), and the Computational Storage Array (CSA). Through this architectural redefinition it aims to reduce CPU utilization, cut the consumption of network and DDR (Double Data Rate SDRAM) bandwidth, lower system power consumption, and support potentially large-scale parallel processing.
  • A typical application uses the built-in DDR of the computational-storage system as a data cache for the data of host read/write operations, with the aim of reducing system IO (input/output) latency.
  • With the Write Cache feature enabled, the computational-storage system sends a response to the host as soon as it has moved the pending data from host DDR into a power-loss-protected region of local DDR, so that the host considers the data to have completed its "persist" operation. In reality, the system's processing of that data block still proceeds step by step internally; only after processing completes and further conditions are met is the final write to storage performed.
  • The inventors recognized that implementing a write-IO data-cache function in a general-purpose compute-acceleration architecture for computational storage is a problem that needs to be solved.
  • One aspect of the embodiments of the present application provides a method for write-data caching, comprising the following steps: in response to receiving a write-data operation command from the host, creating a control page table and filling a plurality of control blocks into the control page table in sequence; submitting the entry pointer of the first control block to a work-queue scheduling engine so that the tasks corresponding to the plurality of control blocks are executed, interleaved, by that engine; sending a completion response to the host in advance and notifying the firmware to perform subsequent processing and persistence of the data; and, in response to completion of the task corresponding to the last control block, releasing the control-page-table resources used.
  • The last control block refers to the control block for the steps required by subsequent operations after the firmware takes over.
  • Creating the control page table and filling the plurality of control blocks into it in sequence comprises: filling a first control block into the control page table to reserve a memory area in local double-data-rate SDRAM (DDR); filling a second control block into the control page table to migrate the data to be written from host DDR into that memory area; and filling a third control block into the control page table to respond to the host and end the current IO.
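The three-step fill described above can be sketched in C as follows. This is a minimal illustration only: the patent does not define a binary layout, so the struct shape, field names, opcodes, and sizes here are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for the three hardware engines named in the text. */
enum cb_opcode { CB_HMALLOC, CB_HDMA, CB_AEM_RESPONSE };

struct control_block {
    enum cb_opcode op;  /* which engine executes this step        */
    uint64_t host_addr; /* source in host DDR (HDMA step only)    */
    uint32_t len;       /* bytes to allocate / move               */
};

#define CP_MAX_CBS 8    /* slots left in a CP after its header    */

struct control_page {
    uint32_t n_cbs;                       /* control blocks filled so far */
    struct control_block cbs[CP_MAX_CBS];
};

/* Fill a write-IO control page with its three fixed steps, in order:
 * Hmalloc (reserve local DDR), HDMA (move the host data into it), and
 * AEM-Response (answer the host so the current IO can end). */
static int cp_fill_write_io(struct control_page *cp,
                            uint64_t host_addr, uint32_t len)
{
    if (cp == NULL)
        return -1;
    cp->n_cbs = 0;
    cp->cbs[cp->n_cbs++] = (struct control_block){ CB_HMALLOC, 0, len };
    cp->cbs[cp->n_cbs++] = (struct control_block){ CB_HDMA, host_addr, len };
    cp->cbs[cp->n_cbs++] = (struct control_block){ CB_AEM_RESPONSE, 0, 0 };
    return 0;
}
```

In the real architecture these control blocks would be consumed by the WQS hardware scheduler rather than by software, but the fixed ordering of the three steps is the point being illustrated.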
  • In one or more embodiments, the method further includes: applying power-loss protection to the data written into the memory area of local double-data-rate SDRAM.
  • In one or more embodiments, the method further includes: in response to the control-page-table space being insufficient, requesting new control-page-table space to create a new control page table, and initializing the new control page table together with the original one as a chained control page table.
  • In one or more embodiments, the method further includes: in response to receiving a read-data operation command from the host, creating an empty control page table and backing the read-data operation command up into it.
  • In one or more embodiments, the method further includes: reserving a contiguous region of on-chip memory as a resource pool for control page tables, and initializing the address pointer of each control page table into a free-entry list.
  • In one or more embodiments, the method further includes: determining whether the firmware's count of claimed control page tables is below a threshold; and, in response to that count being below the threshold, incrementing the firmware's claimed count by one and decrementing the host-interface management engine's claimed count by one.
  • A creation module is configured to, in response to receiving a write-data operation command from the host, create a control page table and fill a plurality of control blocks into it in sequence.
  • A submission module is configured to submit the entry pointer of the first control block to the work-queue scheduling engine so that the tasks corresponding to the control blocks are executed, interleaved, by that engine.
  • A notification module is configured to send a completion response to the host in advance and notify the firmware to perform subsequent processing and persistence of the data.
  • A release module is configured to release the used control-page-table resources in response to completion of the task corresponding to the last control block.
  • A computer device comprises one or more processors and a memory, the memory storing computer-readable instructions executable on the processors; when executed by the one or more processors, the instructions cause them to perform the steps of any of the above methods for write-data caching.
  • One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the processors to perform the steps of any of the above methods for write-data caching.
  • FIG. 1 is a schematic diagram of an embodiment of a method for writing a data cache provided by the present application.
  • FIG. 2 is a schematic diagram of a general computing acceleration architecture provided by the present application.
  • FIG. 3 is a schematic diagram of the processing flow of the general computing acceleration framework provided by the present application.
  • FIG. 4 is a schematic diagram of the management of the control page table resource pool provided by the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a system for writing a data cache provided by the present application.
  • FIG. 6 is a schematic diagram of a hardware structure of an embodiment of a computer device for writing a data cache provided by the present application.
  • FIG. 7 is a schematic diagram of an embodiment of a computer storage medium for writing data cache provided by the present application.
  • FIG. 1 is a schematic diagram of an embodiment of a method for writing a data cache provided by the present application. As shown in Figure 1, the embodiment of the present application includes the following steps:
  • FIG. 2 is a schematic diagram of the general computing acceleration architecture provided by the present application.
  • The embodiments introduce two new engines in the control plane: the HDMA (Host Direct Memory Access) engine, responsible for moving data between host DDR and local DDR; and the Hmalloc (HW Malloc) dynamic memory-management engine, a hardware module responsible for managing the allocation and release of local DDR.
  • The embodiments of the present application also extend the AEM (Acceleration Engine Manager, the host-interface management engine) so that it too can create CPs (Control Pages, i.e. control page tables) and compose CBs (Control Blocks), allowing a series of hardened, fixed operations to be completed by hardware modules alone before any firmware involvement.
  • FIG. 3 is a schematic diagram of the processing flow of the general computing acceleration framework provided by the present application, and the embodiment of the present application will be described with reference to FIG. 3 .
  • In response to receiving a write-data operation command from the host, the host-interface management engine creates a control page table and fills multiple control blocks into it in sequence. The engine receives and parses host commands; if a command is a write-IO operation, it creates an ordinary control page table and fills it with multiple control blocks accordingly.
  • The embodiments provide the following beneficial technical effects: the cooperative scheduling flow between the host-interface management engine and the firmware suits write-cache-enabled scenarios, and transferring the control page table from the engine to the firmware in a "zero-copy" manner significantly reduces the latency with which the computational-storage system responds to the host.
  • The control blocks are filled in sequence: an "Hmalloc" CB, used to reserve a region of local DDR via the hardware engine; an "HDMA" CB, used to move the data to be written from host DDR into the reserved local-DDR space; and an "AEM-Response" CB, used to respond to the host via the hardware engine, indicating that the data to be written has been handled and the current IO can end.
  • The entry pointer of the first control block is then submitted to the WQS (Work Queue Scheduler, the work-queue scheduling engine) for automatic hardware scheduling; the WQS schedules the three tasks in order, interleaving their execution across the engines.
  • The method further includes applying power-loss protection to the data written into the local-DDR memory area. Once the write-cache feature is enabled, the host has already been answered in advance, so the pending write data in local DDR must be protected against power loss; otherwise a fatal error would result.
  • The method further includes: in response to insufficient control-page-table space, requesting new space to create a new control page table and initializing the new and original tables as a chained control page table. After the firmware takes over, control blocks for the remaining steps can continue to be filled into the first ordinary control page table; when its space runs out, new control-page-table space can be requested and initialized into the chain, and the remaining steps filled into the corresponding positions of the chained table.
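The chained-control-page-table behavior can be sketched as a simple linked list: when the current page has no free control-block slots, a new page is allocated and linked, and filling continues there. The slot count, field names, and the use of `calloc` are illustrative assumptions, not details from the patent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CB_SLOTS_PER_CP 4            /* assumed slots per control page */

struct chained_cp {
    int n_cbs;                       /* control blocks filled in this page */
    int cbs[CB_SLOTS_PER_CP];        /* stand-in for real control blocks   */
    struct chained_cp *next;         /* link to the next page in the chain */
};

/* Append one control block, growing the chain when the tail page is full.
 * Returns the (possibly new) tail page, or NULL on allocation failure. */
static struct chained_cp *cp_chain_append(struct chained_cp *tail, int cb)
{
    if (tail->n_cbs == CB_SLOTS_PER_CP) {   /* page full: extend the chain */
        struct chained_cp *n = calloc(1, sizeof *n);
        if (n == NULL)
            return NULL;
        tail->next = n;
        tail = n;
    }
    tail->cbs[tail->n_cbs++] = cb;
    return tail;
}
```

The same append call works whether or not a new page was needed, which mirrors how the firmware can keep filling steps without caring where the page boundary falls.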
  • The firmware submits the entry pointer of its first subsequent control block to the WQS for automatic hardware scheduling. When the last control block finishes executing, the firmware is notified and releases the control-page-table resources used by that IO.
  • The method further includes: in response to receiving a read-data operation command from the host, creating an empty control page table and backing the command up into it. For other IO commands, such as read commands and management commands, the host-interface management engine only needs to create an "empty" control page table and back the original command up into it; the firmware can then be notified to take over via the hardware event queue managed by the WQS.
  • In the architecture definition, a control page table may be 512, 1024, or 2048 bytes. Apart from the control-page-table header, the data cache, and the backup of the original host IO command, all remaining space stores control blocks. All control page tables are placed in one contiguous memory region, forming a control-page-table resource pool. To simplify pool management, the granularity within a single pool must be uniform, i.e. only one control-page-table size may be chosen. Achieving optimal write-data-cache performance requires solving two problems: (1) allocation and release of CP resources; and (2) during interaction between the AEM and the firmware, how to transfer CPs claimed by the AEM to the firmware quickly.
  • The method further includes: reserving a contiguous region of on-chip memory as the control-page-table resource pool, and initializing the address pointer of each control page table into a free-entry list.
  • FIG. 4 is a schematic diagram of the management of the control page table resource pool provided by the present application.
  • A contiguous region of on-chip memory is reserved as the CP resource pool, and the address pointer of each CP is initialized into a "free CP entry list" whose head and tail pointers are both managed by hardware. When a CP is claimed, its entry address is read from the head pointer and the claimed count of the AEM or firmware is updated; when a CP is released, the entry address is stored at the tail pointer and the count updated accordingly.
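The free CP entry list reads like a FIFO ring: claims pop entry addresses from the head pointer, releases push them at the tail. A small sketch of that behavior, with pool size and types chosen for illustration (in the patent the pointers are managed by hardware, not software):

```c
#include <assert.h>
#include <stdint.h>

#define CP_POOL_SIZE 8

struct cp_free_list {
    uint32_t entry[CP_POOL_SIZE]; /* CP entry addresses (indices here)  */
    unsigned head, tail;          /* hardware-managed in the real design */
    unsigned count;               /* free entries currently in the list  */
};

static void fl_init(struct cp_free_list *fl)
{
    for (unsigned i = 0; i < CP_POOL_SIZE; i++)
        fl->entry[i] = i;         /* every CP starts out free */
    fl->head = fl->tail = 0;
    fl->count = CP_POOL_SIZE;
}

/* Claim: read the entry address at the head pointer.  -1 if exhausted. */
static int fl_claim(struct cp_free_list *fl)
{
    if (fl->count == 0)
        return -1;
    int e = (int)fl->entry[fl->head];
    fl->head = (fl->head + 1) % CP_POOL_SIZE;
    fl->count--;
    return e;
}

/* Release: store the entry address to be freed at the tail pointer. */
static void fl_release(struct cp_free_list *fl, uint32_t e)
{
    fl->entry[fl->tail] = e;
    fl->tail = (fl->tail + 1) % CP_POOL_SIZE;
    fl->count++;
}
```

Separating the head (claim side) from the tail (release side) is what lets the two operations proceed without contending on the same pointer.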
  • The method further includes: determining whether the firmware's claimed-CP count is below its threshold; and, if so, incrementing that count by one and decrementing the AEM's claimed-CP count by one.
  • If, when the AEM passes an initially created CP to the firmware for subsequent processing, the CP contents were copied into a CP newly claimed by the firmware and the AEM's CP then released, bus bandwidth consumption and extra latency would result. Instead, with the unified pool management described above, the hand-off is achieved by transferring ownership of the CP: after notifying the firmware through the hardware event queue, the AEM's claimed-CP count is decremented by one while the firmware's claimed-CP count is incremented by one. With this "zero-copy" approach, the AEM can use the freed CP quota to accept the next host management or IO request, while the firmware remains responsible for subsequent processing and releases the CP resources when finished.
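The "zero-copy" hand-off amounts to bookkeeping on two counters: ownership moves from the AEM to the firmware with no copy of the CP contents. A sketch under assumed quota numbers (the split of the total K into AEM limit m and firmware limit n follows the text; the concrete values are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

enum { CP_TOTAL = 16, AEM_QUOTA = 10, FW_QUOTA = 6 }; /* K = m + n */

struct cp_accounting {
    int aem_claimed; /* CPs currently owned by the AEM      */
    int fw_claimed;  /* CPs currently owned by the firmware */
};

/* Transfer one CP from AEM to firmware: firmware count +1, AEM count -1.
 * Fails if the AEM owns nothing or the firmware is already at quota n,
 * matching the threshold check described in the text. */
static bool cp_handoff(struct cp_accounting *acc)
{
    if (acc->aem_claimed <= 0 || acc->fw_claimed >= FW_QUOTA)
        return false;
    acc->fw_claimed++;
    acc->aem_claimed--;
    return true;
}
```

Because the total claimed count is unchanged by a hand-off, the pool invariant K = m + n is preserved without touching the free list at all.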
  • By hardening a number of fixed steps, the general compute-acceleration architecture of the embodiments offloads the firmware and, in write-IO-to-cache scenarios, significantly reduces the latency with which the computational-storage system responds to the host.
  • The steps in the various embodiments of the write-data-caching method may be interleaved, replaced, added, or deleted; such reasonable permutations and combinations also fall within the protection scope of the present application, which should not be limited to the embodiments.
  • Although the steps in the flowchart of FIG. 1 are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIG. 1 may comprise multiple sub-steps or stages, which need not complete at the same moment and may execute at different times; their order need not be sequential, and they may execute in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • The system 200 includes the following modules: a creation module configured to create a control page table and fill multiple control blocks into it in sequence in response to receiving a write-data operation command from the host; a submission module configured to submit the entry pointer of the first control block to the work-queue scheduling engine so that the tasks corresponding to the control blocks are executed, interleaved, by that engine; a notification module configured to send a completion response to the host in advance and notify the firmware to perform subsequent processing and persistence of the data; and a release module configured to release the used control-page-table resources in response to completion of the task corresponding to the last control block.
  • The last control block is configured as the control block for the steps required by subsequent operations after the firmware takes over.
  • The creation module is configured to: fill a first control block into the control page table to reserve a memory area in local double-data-rate SDRAM; fill in a second control block to migrate the data to be written from host DDR into that memory area; and fill in a third control block to respond to the host and end the current IO.
  • In one or more embodiments, the system further includes a protection module configured to apply power-loss protection to the data written into the local-DDR memory area.
  • In one or more embodiments, the system further includes an application module configured to: in response to insufficient control-page-table space, request new space to create a new control page table and initialize it together with the original as a chained control page table.
  • In one or more embodiments, the system further includes a second creation module configured to: in response to receiving a read-data operation command from the host, create an empty control page table and back the command up into it.
  • In one or more embodiments, the system further includes a resource module configured to: reserve a contiguous region of on-chip memory as the control-page-table resource pool and initialize the address pointer of each control page table into a free-entry list.
  • In one or more embodiments, the system further includes a judging module configured to: determine whether the firmware's claimed-CP count is below the threshold; and, if so, increment that count by one while decrementing the AEM's claimed-CP count by one.
  • A third aspect of the embodiments proposes a computer device comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause them to perform the write-data-caching method of any of the foregoing embodiments.
  • FIG. 6 it is a schematic diagram of the hardware structure of one or more embodiments of the computer device for writing data cache provided in this application.
  • the device includes a processor 301 and a memory 302 .
  • the processor 301 and the memory 302 may be connected through a bus or in other ways, and connection through a bus is taken as an example in FIG. 6 .
  • The memory 302, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the write-data-caching method of the embodiments of the present application.
  • The processor 301 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 302, thereby implementing the write-data-caching method.
  • The memory 302 may include a program storage area and a data storage area: the former may store the operating system and the applications required by at least one function; the latter may store data created by use of the write-data-caching method, and the like.
  • the memory 302 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices.
  • the memory 302 may optionally include memory that is remotely located relative to the processor 301, and these remote memories may be connected to the local module through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • One or more computer-readable instructions 303 corresponding to the method of writing data cache are stored in memory 302 , and when executed by processor 301 , execute the method of writing data cache in any of the above method embodiments.
  • Any one or more embodiments of a computer device that executes the above-mentioned method for writing a data cache can achieve the same or similar effect as any of the foregoing method embodiments corresponding thereto.
  • The present application also provides one or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by a processor, perform the write-data-caching method.
  • FIG. 7 it is a schematic diagram of one or more embodiments of the computer storage medium for writing data cache provided in this application.
  • the computer-readable storage medium 401 stores computer-readable instructions 402 for executing the above method when executed by a processor.
  • the storage medium may be a read-only memory, a magnetic disk or an optical disk, and the like.

Abstract

A method, system, device, and storage medium for write-data caching. The method includes: in response to receiving a write-data operation command from a host, creating a control page table and filling a plurality of control blocks into it in sequence; submitting the entry pointer of the first control block to a work-queue scheduling engine so that the tasks corresponding to the control blocks are executed, interleaved, by that engine; sending a completion response to the host in advance and notifying the firmware to perform subsequent processing and persistence of the data; and, in response to completion of the task corresponding to the last control block, releasing the control-page-table resources used.

Description

Method, System, Device, and Storage Medium for Write-Data Caching
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202111497794.3, entitled "一种写数据高速缓存的方法、系统、设备和存储介质" (Method, System, Device, and Storage Medium for Write-Data Caching), filed with the Chinese Patent Office on December 9, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to a method, system, device, and storage medium for write-data caching.
BACKGROUND
With the rise of Computational Storage technology in recent years, the computational-storage architecture offloads data computation from the host CPU (Central Processing Unit) to data-processing acceleration units close to the storage units, reducing the corresponding data-movement operations and thereby freeing up system performance as much as possible.
Computational Storage introduces three product forms: the Computational Storage Processor (CSP), the Computational Storage Drive (CSD), and the Computational Storage Array (CSA). Through this architectural redefinition it aims to reduce CPU utilization, cut the consumption of network and DDR (Double Data Rate SDRAM) bandwidth, lower system power consumption, and support potentially large-scale parallel processing.
In computational-storage systems, one typical application uses the system's built-in DDR as a data cache for the data of host read/write operations, with the aim of reducing system IO (input/output) latency. Taking host write IO as an example: normally, the host may be answered only after the data has reliably reached storage. With the Write Cache feature enabled, however, the computational-storage system sends a response to the host as soon as the pending data has been moved from host DDR into a power-loss-protected region of local DDR, so that the host considers the data to have completed its "persist" operation. In reality, the system's computation on that data block still proceeds step by step internally; only after processing completes and further conditions are met is the final write to storage performed.
The inventors recognized that implementing a write-IO data-cache function in a general-purpose compute-acceleration architecture for computational storage is a problem that needs to be solved.
SUMMARY
Based on the various embodiments disclosed herein, one aspect of the embodiments of the present application provides a method for write-data caching, comprising the following steps: in response to receiving a write-data operation command from a host, creating a control page table and filling a plurality of control blocks into the control page table in sequence; submitting the entry pointer of the first control block to a work-queue scheduling engine so that the tasks corresponding to the plurality of control blocks are executed, interleaved, by that engine; sending a completion response to the host in advance and notifying the firmware to perform subsequent processing and persistence of the data; and, in response to completion of the task corresponding to the last control block, releasing the control-page-table resources used.
In one or more embodiments, the last control block refers to the control block for the steps required by subsequent operations after the firmware takes over.
In one or more embodiments, creating the control page table and filling the plurality of control blocks into it in sequence comprises: filling a first control block into the control page table to reserve a memory area in local double-data-rate SDRAM; filling a second control block into the control page table to migrate the data to be written from the host's DDR into that memory area; and filling a third control block into the control page table to respond to the host and end the current IO.
In one or more embodiments, the method further comprises: applying power-loss protection to the data written into said memory area of local DDR.
In one or more embodiments, the method further comprises: in response to the control-page-table space being insufficient, requesting new control-page-table space to create a new control page table, and initializing the new control page table together with the original one as a chained control page table.
In one or more embodiments, the method further comprises: in response to receiving a read-data operation command from the host, creating an empty control page table and backing the read-data operation command up into it.
In one or more embodiments, the method further comprises: reserving a contiguous region of on-chip memory as a resource pool for control page tables, and initializing the address pointer of each control page table into a free-entry list.
In one or more embodiments, the method further comprises: determining whether the firmware's count of claimed control page tables is below a threshold; and, in response to that count being below the threshold, incrementing the firmware's claimed count by one and decrementing the host-interface management engine's claimed count by one.
Another aspect of the embodiments of the present application provides a system for write-data caching, comprising: a creation module configured to, in response to receiving a write-data operation command from a host, create a control page table and fill a plurality of control blocks into it in sequence; a submission module configured to submit the entry pointer of the first control block to a work-queue scheduling engine so that the tasks corresponding to the control blocks are executed, interleaved, by that engine; a notification module configured to send a completion response to the host in advance and notify the firmware to perform subsequent processing and persistence of the data; and a release module configured to release the used control-page-table resources in response to completion of the task corresponding to the last control block.
A further aspect of the embodiments provides a computer device comprising one or more processors and a memory, the memory storing computer-readable instructions executable on the processors; when executed by the one or more processors, the instructions cause them to perform the steps of the write-data-caching method of any of the above items.
Yet another aspect provides one or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the processors to perform the steps of the write-data-caching method of any of the above items.
The details of one or more embodiments of the present application are set forth in the drawings and description below; other features and advantages of the application will become apparent from the specification, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings required for the description are briefly introduced below. The drawings described below are merely some embodiments of the present application; those of ordinary skill in the art may derive other embodiments from them without creative effort.
FIG. 1 is a schematic diagram of an embodiment of the write-data-caching method provided by the present application.
FIG. 2 is a schematic diagram of the general compute-acceleration architecture provided by the present application.
FIG. 3 is a schematic diagram of the processing flow of the general compute-acceleration architecture provided by the present application.
FIG. 4 is a schematic diagram of the management of the control-page-table resource pool provided by the present application.
FIG. 5 is a schematic diagram of an embodiment of the write-data-caching system provided by the present application.
FIG. 6 is a schematic diagram of the hardware structure of an embodiment of the write-data-caching computer device provided by the present application.
FIG. 7 is a schematic diagram of an embodiment of the write-data-caching computer storage medium provided by the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments are described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments are intended to distinguish two non-identical entities or parameters that share the same name; "first" and "second" are used merely for convenience of expression and should not be construed as limiting the embodiments, which will not be restated in subsequent embodiments.
A first aspect of the embodiments of the present application proposes an embodiment of a method for write-data caching. FIG. 1 is a schematic diagram of an embodiment of the method provided by the present application. As shown in FIG. 1, the embodiment includes the following steps:
S1: in response to receiving a write-data operation command from the host, create a control page table and fill a plurality of control blocks into it in sequence;
S2: submit the entry pointer of the first control block to the work-queue scheduling engine so that the tasks corresponding to the control blocks are executed, interleaved, by that engine;
S3: send a completion response to the host in advance and notify the firmware to perform subsequent processing and persistence of the data; and
S4: in response to completion of the task corresponding to the last control block, release the control-page-table resources used.
The core idea of the present application is to use dedicated hardware to offload part of the firmware's duties and to integrate the two organically. FIG. 2 is a schematic diagram of the general compute-acceleration architecture provided by the present application. As shown in FIG. 2, the embodiments introduce two new engines in the control plane: the HDMA (Host Direct Memory Access) engine, responsible for moving data between host DDR and local DDR; and the dynamic memory-management engine (Hmalloc: HW Malloc), a hardware module responsible for managing the allocation and release of local DDR. The embodiments also extend the AEM (Acceleration Engine Manager, the host-interface management engine) so that it too can create CPs (Control Pages, i.e. control page tables) and compose CBs (Control Blocks), allowing a series of hardened operations to be completed by hardware modules alone before any firmware involvement.
图3示出的是本申请提供的通用计算加速架构的处理流程示意图,结合图3对本申请实施例进行说明。
响应于接收到主机发出的写数据操作指令,创建控制页表并向所述控制页表中依次填入多个控制块。主机接口管理引擎负责接口主机指令,并针对主机指令做出解析,如果是写IO操作指令,主机接口管理引擎会创建一个普通控制页表,并向其中依此填入多个控制块。本申请实施例,具有以下有益技术效果:通过主机接口管理引擎和固件系统的协同调度流程以适用写缓存使能的场景,通过“零拷贝”的方式实现控制页表从主机接口管理引擎向固件的转移,可以显著降低计算存储系统对主机响应的时延性能。
在一个或多个实施例中,所述创建控制页表并向所述控制页表中依次填入多个控制块包括:向所述控制页表中填入第一控制块以在本地双倍速率同步动态随机存储器中设置内存区域;向所述控制页表中填入第二控制块以将主机双倍速率同步动态随机存储器中待写入数据迁移到所述内存区域中;以及向所述控制页表中填入第三控制块以应答主机并结束当前输入输出。例如:“Hmalloc”CB:用来通过硬件引擎在本地DDR中开辟一 块内存;“HDMA”CB:用来通过硬件引擎将主机DDR中待写入的数据搬移到本地DDR中开辟好的存储空间中;“AEM-Response”CB:用来通过硬件引擎应答主机,表明所要写入的数据已经处理,可以结束当前IO。
将首个控制块的入口指针提交到工作队列调度引擎以将所述多个控制块对应的任务在所述工作队列调度引擎中穿插执行。主机接口管理引擎在完成上述操作后会将首个控制块的入口指针提交到WQS(Work Queue Scheduler,工作队列调度引擎)进行硬件的自动调度,WQS负责依次调度这三个任务在引擎之间穿插执行。
提前向主机发送完成应答并通知固件进行数据的后续处理和落盘。在提前给了主机一个假的“完成”应答同时,通知固件进行数据的后续处理和落盘,如此构成写Cache的快速应答路径,极大地缩短了响应主机的时延特性。
在一个或多个实施例中,还包括:对本地双倍速率同步动态随机存储器中所述内存区域中的写入数据进行掉电保护。写缓存功能打开后,因为已经提前应答过主机了,所以需要对本地DDR待处理的写入数据进行掉电保护,否则会造成致命错误。
在一个或多个实施例中,还包括:响应于所述控制页表空间不足,申请新的控制页表空间以创建新的控制页表,并将新的控制页表和原来的控制页表初始化为链式控制页表。在固件接手后,后续的操作所需步骤的控制块可以继续填充在第一个普通控制页表之中,当控制页表空间不够时,可以继续申请新的控制页表空间并初始化成链式控制页表,并将剩余步骤依此填充至链式控制页表的对应位置。
响应于最后一个控制块对应的任务执行结束,对所使用的控制页表资源进行释放。固件将后续第一个控制块的入口指针提交到WQS进行硬件的自动调度。在最后一个控制块执行结束后,固件会得到通知,并对该IO所用到的控制页表资源进行释放。
在一个或多个实施例中,还包括:响应于接收到主机发出的读数据操作指令,创建一个空控制页表并将所述读数据操作指令备份进所述空控制页表。当接收到其他IO指令,比如读指令和管理命令指令,主机接口管理引擎只需要创建一个“空”的控制页表,将原始的IO指令备份进该控制页表后,就可以通过WQS所负责的硬件事件队列通知固件来接手处理了。
通用计算加速架构的架构定义中,控制页表的大小可以是512字节,1024字节或者是2048字节,除了控制页表头,数据缓存和原始主机IO指令备份外,剩下的空间都用来存放控制块。所有的控制页表置于一个连续的内存空间,从而形成一个控制页表的资源池。此外,为了方便控制页表资源池的管理,单一资源池中的控制页表颗粒度需要保持一致,即只能选定一种控制页表的大小。为了实现最优的写数据高速缓存性能,有两个 问题需要解决:1、CP资源的分配与释放;2、在AEM和固件交互的过程中,如何做好AEM所申领CP向固件的快速转移。
针对上述第一个问题,通过统一的硬件逻辑对CP资源池进行管理。假设CP资源池中CP资源的总数是K个,因为AEM和固件都会申请领用CP,假设AEM和固件CP页领用的数目上限分别是m个和n个,那么需要保证K=m+n。
在一个或多个实施例中,还包括:在片上存储开辟一块连续的存储空间用作控制页表的资源池,并将每个控制页表的地址指针初始化成空闲控制页表入口列表。图4示出的是本申请提供的控制页表资源池的管理示意图。在片上存储开辟一块连续的存储空间用作CP的资源池,并将每个CP的地址指针初始化成“空闲CP入口列表”,该列表的“头”、“尾”指针均有硬件管理,当申领CP资源时,从头指针读取CP入口地址并更新AEM/固件的已申请计数值,当释放CP资源时,从尾指针存入要释放的CP入口地址并更新AEM/固件的已申请计数值。AEM要申领或释放CP时,直接通过硬件逻辑操作;固件则是通过“申领CP”和“释放CP”寄存器的访问来完成,申领时,直接尝试读取“申领CP”寄存器,并从读取的内容的某一个标志位判断是否是一个有效的CP入口地址,释放时,直接将待释放CP的入口地址写入“释放CP”寄存器,有硬件逻辑将其写入“空闲CP入口列表”中。AEM和固件每次申领CP资源时,都会讲其已申领的计数器和对应的额度上限进行比较,如果已经达到了额度上限,则当次申领失败,可以等待一段时间后重试。
In one or more embodiments, the method further includes: judging whether the count of control page tables claimed by the firmware is less than a threshold; and in response to the count of control page tables claimed by the firmware being less than the threshold, incrementing the count of control page tables claimed by the firmware by one and decrementing the count of control page tables claimed by the host interface management engine by one. For the second problem above, when the AEM hands the initially created CP over to the firmware for subsequent processing, copying the CP contents into a CP newly claimed by the firmware and then releasing the CP claimed by the AEM would increase bus bandwidth consumption and introduce additional latency. In the embodiments of the present application, through the unified management of the CP resource pool described above, the handover of a CP from the AEM to the firmware is achieved by transferring ownership of the CP. After the AEM notifies the firmware through the hardware event queue, it completes the transfer of CP ownership by decrementing the "AEM claimed CP count" by 1 while incrementing the "firmware claimed CP count" by 1. Through this "zero-copy" approach, the AEM can use the freed-up CP quota to take on the next host management or IO request, while the firmware continues to be responsible for the subsequent processing and releases the CP resource once it is finished.
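The zero-copy ownership transfer thus amounts to adjusting the two claimed counters without touching the CP contents, which can be sketched as (the function name `hand_over_cp` and the counter dictionary are hypothetical):

```python
def hand_over_cp(counters, fw_quota):
    """Zero-copy CP handover: transfer ownership from the AEM to the
    firmware by adjusting the two claimed counters; the CP itself is
    neither copied nor moved."""
    if counters["fw"] >= fw_quota:
        return False           # firmware already at its quota: must wait
    counters["fw"] += 1        # firmware claimed CP count +1
    counters["aem"] -= 1       # AEM claimed CP count -1
    return True

counters = {"aem": 3, "fw": 0}
ok = hand_over_cp(counters, fw_quota=4)
print(ok, counters)  # True {'aem': 2, 'fw': 1}
```

The quota check mirrors the threshold judgment in the embodiment above: only when the firmware's claimed count is below its limit does ownership move.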
In the embodiments of the present application, the general-purpose computing acceleration architecture offloads the firmware by hardening certain fixed steps; in the application scenario of IO write caching, this can significantly reduce the latency of the computational storage system in responding to the host.
It should be particularly pointed out that the steps in the various embodiments of the above method for write data caching may be interleaved, replaced, added, or deleted with respect to one another; therefore, such reasonable permutations and combinations applied to the method for write data caching shall also fall within the scope of protection of the present application, and the scope of protection of the present application shall not be limited to the embodiments.
It should be understood that although the steps in the flowchart of FIG. 1 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; the execution order of these sub-steps or stages is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the above objective, a second aspect of the embodiments of the present application provides a system for write data caching. As shown in FIG. 5, the system 200 includes the following modules: a creation module configured to, in response to receiving a write data operation instruction issued by the host, create a control page table and fill a plurality of control blocks into the control page table in sequence; a submission module configured to submit the entry pointer of the first control block to the work queue scheduling engine so that the tasks corresponding to the plurality of control blocks are executed in an interleaved manner in the work queue scheduling engine; a notification module configured to send a completion response to the host in advance and notify the firmware to perform subsequent processing and disk flushing of the data; and a release module configured to, in response to the task corresponding to the last control block finishing execution, release the control page table resources that were used.
In one or more embodiments, the last control block is configured as the control block for the steps required by subsequent operations after the firmware takes over.
In one or more embodiments, the creation module is configured to: fill a first control block into the control page table to set up a memory region in the local double data rate synchronous dynamic random access memory; fill a second control block into the control page table to move the data to be written from the host double data rate synchronous dynamic random access memory into the memory region; and fill a third control block into the control page table to respond to the host and end the current input/output.
In one or more embodiments, the system further includes a protection module configured to: perform power-loss protection on the written data in the memory region of the local double data rate synchronous dynamic random access memory.
In one or more embodiments, the system further includes a request module configured to: in response to the control page table running out of space, request new control page table space to create a new control page table, and initialize the new control page table and the original control page table into a chained control page table.
In one or more embodiments, the system further includes a second creation module configured to: in response to receiving a read data operation instruction issued by the host, create an empty control page table and back up the read data operation instruction into the empty control page table.
In one or more embodiments, the system further includes a resource module configured to: set aside a contiguous storage space in on-chip storage as a resource pool for control page tables, and initialize the address pointer of each control page table into a free control page table entry list.
In one or more embodiments, the system further includes a judgment module configured to: judge whether the count of control page tables claimed by the firmware is less than a threshold; and in response to the count of control page tables claimed by the firmware being less than the threshold, increment the count of control page tables claimed by the firmware by one and decrement the count of control page tables claimed by the host interface management engine by one.
Based on the above objective, a third aspect of the embodiments of the present application provides a computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the method for write data caching of any of the above embodiments.
FIG. 6 is a schematic diagram of the hardware structure of one or more embodiments of the above computer device for write data caching provided by the present application.
Taking the apparatus shown in FIG. 6 as an example, the apparatus includes a processor 301 and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
As a non-volatile computer-readable storage medium, the memory 302 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for write data caching in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 302, the processor 301 executes the various functional applications and data processing of the server, i.e., implements the method for write data caching.
The memory 302 may include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the method for write data caching, and so on. In addition, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 may optionally include memory set remotely with respect to the processor 301, and such remote memory may be connected to the local module through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more computer-readable instructions 303 corresponding to the method for write data caching are stored in the memory 302 and, when executed by the processor 301, perform the method for write data caching in any of the above method embodiments.
Any one or more embodiments of the computer device that performs the above method for write data caching can achieve effects the same as or similar to those of any of the corresponding preceding method embodiments.
The present application further provides one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by a processor, perform the method for write data caching.
FIG. 7 is a schematic diagram of one or more embodiments of the above computer storage medium for write data caching provided by the present application. Taking the non-volatile computer-readable storage medium shown in FIG. 7 as an example, the computer-readable storage medium 401 stores computer-readable instructions 402 which, when executed by a processor, perform the method described above.
Finally, it should be noted that those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing relevant hardware; the program of the method for write data caching can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The storage medium of the program may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the above computer-readable instructions can achieve effects the same as or similar to those of any of the corresponding preceding method embodiments.
The above are exemplary embodiments disclosed by the present application, but it should be noted that various changes and modifications can be made without departing from the scope of the embodiments disclosed in the present application as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although the elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
It should be understood that, as used herein, the singular form "a"/"an" is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
The serial numbers of the above disclosed embodiments of the present application are for description only and do not represent the merits of the embodiments.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments can be completed by hardware, or by a program instructing relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the present application (including the claims) is limited to these examples; under the idea of the embodiments of the present application, the technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments of the present application as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present application shall be included within the scope of protection of the embodiments of the present application.

Claims (11)

  1. A method for write data caching, characterized by comprising:
    in response to receiving a write data operation instruction issued by a host, creating a control page table and filling a plurality of control blocks into the control page table in sequence;
    submitting an entry pointer of a first control block to a work queue scheduling engine so that tasks corresponding to the plurality of control blocks are executed in an interleaved manner in the work queue scheduling engine;
    sending a completion response to the host in advance and notifying firmware to perform subsequent processing and disk flushing of the data; and
    in response to a task corresponding to a last control block finishing execution, releasing the control page table resources that were used.
  2. The method according to claim 1, wherein creating the control page table and filling a plurality of control blocks into the control page table in sequence comprises:
    filling a first control block into the control page table to set up a memory region in a local double data rate synchronous dynamic random access memory;
    filling a second control block into the control page table to move data to be written from a host double data rate synchronous dynamic random access memory into the memory region; and
    filling a third control block into the control page table to respond to the host and end a current input/output.
  3. The method according to claim 2, further comprising:
    performing power-loss protection on the written data in the memory region of the local double data rate synchronous dynamic random access memory.
  4. The method according to claim 1, further comprising:
    in response to the control page table running out of space, requesting new control page table space to create a new control page table, and initializing the new control page table and the original control page table into a chained control page table.
  5. The method according to claim 1, further comprising:
    in response to receiving a read data operation instruction issued by the host, creating an empty control page table and backing up the read data operation instruction into the empty control page table.
  6. The method according to claim 1, further comprising:
    setting aside a contiguous storage space in on-chip storage as a resource pool for control page tables, and initializing an address pointer of each control page table into a free control page table entry list.
  7. The method according to claim 6, further comprising:
    judging whether a count of control page tables claimed by the firmware is less than a threshold; and
    in response to the count of control page tables claimed by the firmware being less than the threshold, incrementing the count of control page tables claimed by the firmware by one, and decrementing a count of control page tables claimed by a host interface management engine by one.
  8. The method according to claim 1, wherein the last control block refers to the control block for the steps required by subsequent operations after the firmware takes over.
  9. A system for write data caching, comprising:
    a creation module configured to, in response to receiving a write data operation instruction issued by a host, create a control page table and fill a plurality of control blocks into the control page table in sequence;
    a submission module configured to submit an entry pointer of a first control block to a work queue scheduling engine so that tasks corresponding to the plurality of control blocks are executed in an interleaved manner in the work queue scheduling engine;
    a notification module configured to send a completion response to the host in advance and notify firmware to perform subsequent processing and disk flushing of the data; and
    a release module configured to, in response to a task corresponding to a last control block finishing execution, release the control page table resources that were used.
  10. A computer device, characterized by comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1-8.
  11. One or more non-volatile computer-readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1-8.
PCT/CN2022/095380 2021-12-09 2022-05-26 一种写数据高速缓存的方法、系统、设备和存储介质 WO2023103296A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111497794.3A CN113918101B (zh) 2021-12-09 2021-12-09 一种写数据高速缓存的方法、系统、设备和存储介质
CN202111497794.3 2021-12-09

Publications (1)

Publication Number Publication Date
WO2023103296A1 true WO2023103296A1 (zh) 2023-06-15

Family

ID=79248793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095380 WO2023103296A1 (zh) 2021-12-09 2022-05-26 一种写数据高速缓存的方法、系统、设备和存储介质

Country Status (2)

Country Link
CN (1) CN113918101B (zh)
WO (1) WO2023103296A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117008843A (zh) * 2023-09-26 2023-11-07 苏州元脑智能科技有限公司 控制页链表构建装置和电子设备

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN113918101B (zh) * 2021-12-09 2022-03-15 苏州浪潮智能科技有限公司 一种写数据高速缓存的方法、系统、设备和存储介质
CN115357540B (zh) * 2022-08-17 2023-04-14 北京超弦存储器研究院 存储系统及其计算存储处理器、固体硬盘和数据读写方法
US11928345B1 (en) 2022-08-17 2024-03-12 Beijing Superstring Academy Of Memory Technology Method for efficiently processing instructions in a computational storage device
CN116756059B (zh) * 2023-08-15 2023-11-10 苏州浪潮智能科技有限公司 查询数据输出方法、加速器件、系统、存储介质及设备
CN117806709A (zh) * 2024-02-29 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 系统级芯片的性能优化方法、装置、设备和存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
US20120303855A1 (en) * 2011-05-24 2012-11-29 International Business Machines Corporation Implementing storage adapter performance optimization with hardware accelerators offloading firmware for buffer allocation and automatically dma
CN108959117A (zh) * 2018-06-22 2018-12-07 深圳忆联信息系统有限公司 H2d写操作加速方法、装置、计算机设备及存储介质
CN109976953A (zh) * 2019-04-10 2019-07-05 苏州浪潮智能科技有限公司 一种数据备份方法
CN111642137A (zh) * 2018-12-14 2020-09-08 华为技术有限公司 快速发送写数据准备完成消息的方法、设备和系统
CN113918101A (zh) * 2021-12-09 2022-01-11 苏州浪潮智能科技有限公司 一种写数据高速缓存的方法、系统、设备和存储介质

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9122413B2 (en) * 2013-08-15 2015-09-01 International Business Machines Corporation Implementing hardware auto device operations initiator
CN113076281B (zh) * 2021-03-30 2022-11-04 山东英信计算机技术有限公司 一种Ceph内核客户端进行通信的方法、系统、设备及介质
CN113885945B (zh) * 2021-08-30 2023-05-16 山东云海国创云计算装备产业创新中心有限公司 一种计算加速方法、设备以及介质

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US20120303855A1 (en) * 2011-05-24 2012-11-29 International Business Machines Corporation Implementing storage adapter performance optimization with hardware accelerators offloading firmware for buffer allocation and automatically dma
CN108959117A (zh) * 2018-06-22 2018-12-07 深圳忆联信息系统有限公司 H2d写操作加速方法、装置、计算机设备及存储介质
CN111642137A (zh) * 2018-12-14 2020-09-08 华为技术有限公司 快速发送写数据准备完成消息的方法、设备和系统
CN109976953A (zh) * 2019-04-10 2019-07-05 苏州浪潮智能科技有限公司 一种数据备份方法
CN113918101A (zh) * 2021-12-09 2022-01-11 苏州浪潮智能科技有限公司 一种写数据高速缓存的方法、系统、设备和存储介质

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117008843A (zh) * 2023-09-26 2023-11-07 苏州元脑智能科技有限公司 控制页链表构建装置和电子设备
CN117008843B (zh) * 2023-09-26 2024-01-19 苏州元脑智能科技有限公司 控制页链表构建装置和电子设备

Also Published As

Publication number Publication date
CN113918101A (zh) 2022-01-11
CN113918101B (zh) 2022-03-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22902734

Country of ref document: EP

Kind code of ref document: A1