CN116860664A - Accelerator for processing read command - Google Patents

Accelerator for processing read command

Info

Publication number
CN116860664A
Authority
CN
China
Prior art keywords
read command
data
entry
cache
read
Prior art date
Legal status
Pending
Application number
CN202210316721.8A
Other languages
Chinese (zh)
Inventor
王玉巧
王祎磊
谷兴杰
Current Assignee
Chengdu Starblaze Technology Co ltd
Original Assignee
Chengdu Starblaze Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Starblaze Technology Co ltd filed Critical Chengdu Starblaze Technology Co ltd
Priority to CN202210316721.8A priority Critical patent/CN116860664A/en
Publication of CN116860664A publication Critical patent/CN116860664A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10: Address translation
    • G06F 12/1009: Address translation using page tables, e.g. page table structures
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F 12/0871: Allocation or management of cache space
    • G06F 12/0877: Cache access modes
    • G06F 12/0882: Page mode
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14: Handling requests for interconnection or transfer
    • G06F 13/16: Handling requests for interconnection or transfer for access to memory bus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application relates to an accelerator for processing read commands. The accelerator comprises logic circuitry and a plurality of caches. In response to a plurality of first read commands sent by a master device, the logic circuitry generates one or more second read commands according to each first read command and stores, in a cache, the relationship between identification information identifying the first read command and identification information of the corresponding one or more second read commands. In response to receiving the first data fed back for each second read command, it processes the first data to obtain second data and first protocol information; it then processes the one or more second data corresponding to each first read command to obtain the entry of the L2P table indicated by that first read command, generates second protocol information, and sends the second protocol information together with the indicated entry of the L2P table to the master device as the response to each first read command.

Description

Accelerator for processing read command
Technical Field
The present application relates generally to the field of memory. More particularly, the present application relates to an accelerator that processes read commands.
Background
FIG. 1 illustrates a block diagram of a solid state storage device. The solid state storage device 102 is coupled to a host to provide storage capability for the host. The host and the solid state storage device 102 may be coupled in a variety of ways, including, but not limited to, SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIe (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, a wireless communication network, and the like. The host may be an information processing device capable of communicating with the storage device in the manners described above, such as a personal computer, tablet, server, portable computer, network switch, router, cellular telephone, or personal digital assistant. The storage device 102 (hereinafter, the solid state storage device is simply referred to as the storage device) includes an interface 103, a control component 104, one or more NVM chips 105, and a DRAM (Dynamic Random Access Memory) 110.
The NVM chips 105 described above include common storage media such as NAND flash memory, phase change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetoresistive Random Access Memory), and RRAM (Resistive Random Access Memory).
The interface 103 may be adapted to exchange data with the host by way of, for example, SATA, IDE, USB, PCIe, NVMe, SAS, Ethernet, or Fibre Channel.
The control component 104 is used to control data transfer among the interface 103, the NVM chips 105, and the DRAM 110, and also performs storage management, mapping from host logical addresses to flash physical addresses, wear leveling, bad block management, and the like. The control component 104 can be implemented in a variety of ways, such as software, hardware, firmware, or a combination thereof; for example, it can take the form of an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a combination thereof. The control component 104 may also include a processor or controller that executes software to manipulate the hardware of the control component 104 and process IO (Input/Output) commands. The control component 104 may also include a memory controller for coupling to the DRAM 110 and accessing the data of the DRAM 110.
The control component 104 includes a flash interface controller (also referred to as a media interface, media interface controller, or flash channel controller) that is coupled to the NVM chips 105, issues commands to the NVM chips 105 in a manner conforming to their interface protocol to operate them, and receives the command execution results output by the NVM chips 105. Known NVM chip interface protocols include "Toggle", "ONFI", and the like.
Data is typically stored and read page by page on NVM storage media, while data is erased in blocks. A block (also referred to as a physical block) on an NVM storage medium includes a plurality of pages. A page on the storage medium (referred to as a physical page) has a fixed size, e.g., 17664 bytes, although physical pages may also have other sizes.
In a storage device, an FTL (Flash Translation Layer) is used to maintain mapping information from logical addresses to physical addresses. The logical addresses constitute the storage space of the storage device as perceived by upper-layer software such as the operating system. A physical address is an address used to access a physical storage unit of the solid state storage device. Address mapping can also be implemented using an intermediate address form: logical addresses are mapped to intermediate addresses, which in turn are further mapped to physical addresses. Optionally, a host accessing the storage device provides the FTL.
The table structure that stores the mapping information from logical addresses to physical addresses is called the FTL table (also called the L2P table). Typically, the entries of the FTL table record address mapping relationships in units of storage units of a specified size (e.g., 512 bytes, 2KB, or 4KB) in the storage device.
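As a minimal illustration (a sketch, not part of the patent; the 4KB unit size and function name are assumptions), an uncompressed L2P table maps a logical address to a physical address by indexing the table with the mapping-unit number:

```python
# Hypothetical sketch: an uncompressed L2P (logical-to-physical) lookup.
# Assumes 4KB mapping units; names and sizes are illustrative only.
UNIT_SIZE = 4096  # bytes per mapping unit

def l2p_lookup(l2p_table, logical_byte_addr):
    """Return the physical byte address for a logical byte address."""
    unit_index = logical_byte_addr // UNIT_SIZE   # entry index in the L2P table
    offset = logical_byte_addr % UNIT_SIZE        # offset within the mapping unit
    physical_unit = l2p_table[unit_index]         # one table entry per unit
    return physical_unit * UNIT_SIZE + offset

table = [7, 3, 9]   # logical units 0..2 mapped to physical units 7, 3, 9
assert l2p_lookup(table, 0) == 7 * 4096
assert l2p_lookup(table, 4096 + 16) == 3 * 4096 + 16
```

One entry per fixed-size unit is what makes the table size scale linearly with device capacity, which motivates the sizing discussion below.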
As the capacity of storage devices increases, the size of the L2P table increases in order to record more storage units, thereby requiring more memory to hold the L2P table. To address the additional storage units, the size of each entry of the L2P table also needs to be increased. For example, a 32-bit L2P table entry can address 2^32 (4G) data units. If each data unit is 4KB in size, 2^32 data units correspond to a storage capacity of 16TB, and accordingly the L2P table itself is 16GB in size (4B per entry × 4G entries = 16GB), so at least 16GB of memory space is required. Storage devices come in a variety of capacities; for a device that provides, for example, 4TB to the user, the L2P table would then be 4GB in size. However, to provide 4TB of storage space with 4KB data units, only 2^30 (1G) data units need to be managed, so each entry of the L2P table only needs to be 30 bits in size, and the L2P table then occupies 30 × 2^30 bits (3.75GB, less than 4GB). However, CPU addressing is constrained by the memory chip and the CPU addressing scheme: the data width of the CPU addressing channel is an integer multiple of 32 bits or of bytes, and memory chips are also typically organized in integer multiples of bytes. Thus, if the L2P table entry is, for example, 30 bits in size, although the overall L2P table size is reduced, entries that cross byte boundaries require, for example, 2 or more bus accesses or memory accesses to be loaded into the CPU, thereby significantly increasing the time to load an L2P table entry and limiting the performance of the storage device.
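The sizing arithmetic above can be checked directly (a sketch reproducing the figures in the text, not taken from the patent's claims):

```python
# Verify the L2P table sizing example from the text.
KB, GB, TB = 2**10, 2**30, 2**40
UNIT = 4 * KB                       # 4KB data units

# 32-bit entries: 2^32 units of 4KB each -> 16TB capacity, 16GB table.
units_32 = 2**32
assert units_32 * UNIT == 16 * TB   # addressable capacity
assert units_32 * 4 == 16 * GB      # table size at 4 bytes per entry

# 4TB device: only 2^30 units, so 30-bit entries suffice.
units_4tb = (4 * TB) // UNIT
assert units_4tb == 2**30
table_bits = 30 * units_4tb         # 30 bits per entry
assert table_bits / 8 == 3.75 * GB  # 3.75GB, less than the naive 4GB
```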
In order to reduce the memory space occupied by the L2P table while providing storage devices of multiple capacities, and to reduce or eliminate the impact of non-byte-aligned L2P table entries on the CPU or other on-chip devices that access them, a compressed L2P table is typically provided. The entry size of the compressed L2P table need not be an integer multiple of bytes, and the compressed entries are packed contiguously in memory without leaving unused space between them for byte alignment. But to eliminate the impact of using a compressed L2P table on the CPU or other devices, the CPU or other devices typically still access the L2P table in their existing manner, i.e., byte-aligned or aligned to integer multiples of bytes.
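To illustrate why a packed entry may need more than one aligned access (an illustrative sketch, not the patent's implementation; the 30-bit/32-bit widths follow the example above), consider 30-bit entries packed back to back and read through 32-bit aligned words:

```python
# Hypothetical sketch: count the aligned 32-bit reads needed to load one
# 30-bit entry from a tightly packed table. Widths are illustrative.
ENTRY_BITS, WORD_BITS = 30, 32

def words_needed(index):
    """Number of aligned 32-bit reads required to load entry `index`."""
    first_bit = index * ENTRY_BITS
    last_bit = first_bit + ENTRY_BITS - 1
    return last_bit // WORD_BITS - first_bit // WORD_BITS + 1

assert words_needed(0) == 1   # bits 0..29 fit entirely in word 0
assert words_needed(1) == 2   # bits 30..59 span words 0 and 1
```

Most packed 30-bit entries straddle a 32-bit boundary (the pattern repeats every 16 entries, since 16 × 30 = 15 × 32), which is exactly the extra-access cost the accelerator described below is meant to hide from the master device.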
Disclosure of Invention
When a hardware accelerator is used to accelerate the access of a master device to the L2P table in memory, in order to relieve the CPU of the burden of executing software to access the L2P table and to improve L2P table access performance, it is desirable for the hardware accelerator to process access commands in parallel, so that it can handle multiple access commands issued by the master device concurrently and thereby access the L2P table in parallel.
According to a first aspect of the present application, there is provided an accelerator for processing read commands according to the first aspect of the present application, for coupling a master device with a memory and accelerating access by the master device to the L2P table in the memory, the accelerator comprising: logic circuitry and a plurality of caches;
the logic circuitry, in response to receiving multiple first read commands sent by the master device, generates one or more second read commands according to each first read command, and stores in a cache the relationship between first identification information identifying the first read command and second identification information identifying the corresponding one or more second read commands; in response to receiving the first data fed back from the memory for each second read command, processes the first data to obtain second data and first protocol information, determines the one or more second data corresponding to each first read command according to the first protocol information and the stored relationship, generates second protocol information, and processes the one or more second data corresponding to each first read command to obtain the entry of the L2P table indicated by that first read command; and sends the second protocol information and the indicated entry of the L2P table to the master device as the response to each first read command;
wherein the memory comprises a plurality of aligned storage units, each storage unit storing second data comprising partial data of one or more entries of the L2P table; the partial data of the one or more entries of the L2P table need not be stored in the memory aligned to byte boundaries; the first protocol information includes the second identification information, and the second protocol information includes the first identification information.
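A minimal model of the identification-information bookkeeping described in the first aspect (all names are assumptions for illustration, not from the patent): a cache maps each first read command's identification to the set of second-command identifications still outstanding, so the completion of a first read command can be detected even when responses arrive in any order:

```python
# Hypothetical sketch: track which second read commands belong to each
# first read command, and detect when all responses have arrived.
class IdTracker:
    def __init__(self):
        self.pending = {}  # first-command ID -> set of outstanding second IDs

    def issue(self, first_id, second_ids):
        """Record the relationship when second read commands are generated."""
        self.pending[first_id] = set(second_ids)

    def on_response(self, first_id, second_id):
        """Return True when the last second read command has responded."""
        self.pending[first_id].discard(second_id)
        if not self.pending[first_id]:
            del self.pending[first_id]
            return True
        return False

t = IdTracker()
t.issue("A0", ["B0", "B1"])
assert t.on_response("A0", "B1") is False  # one response still outstanding
assert t.on_response("A0", "B0") is True   # all responses received
```

Keeping this relation per first read command is what allows multiple first read commands to be in flight at once, as the parallel-processing embodiments below describe.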
According to the first accelerator of the first aspect of the present application, there is provided a second accelerator according to the first aspect of the present application, the logic circuitry comprising: a parsing module, a calculation module, and a command generation module; wherein,
the parsing module is configured, in response to receiving the multiple first read commands, to parse each first read command to obtain the corresponding first identification information and address index, and to cache the address index in a first cache;
the calculation module is coupled to the first cache and is configured to calculate, according to the address index, the memory addresses to be accessed by the one or more second read commands corresponding to each first read command; to set the second identification information corresponding to each second read command; and to store the relationship between the first identification information and its corresponding one or more pieces of second identification information in a second cache;
the command generation module is coupled to the calculation module, generates the one or more second read commands corresponding to each first read command according to the addresses and the second identification information, and sends the one or more second read commands to the memory.
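As an illustrative sketch of what such a calculation step might compute (the bit widths, base address, and names are assumptions, not taken from the patent), the address index of a first read command can be turned into the addresses of the aligned storage units that the second read commands must fetch:

```python
# Hypothetical sketch: derive the aligned memory reads ("second read
# commands") needed to fetch one packed L2P entry ("first read command").
ENTRY_BITS = 30   # packed entry size (illustrative)
UNIT_BITS = 32    # size of one aligned storage unit (illustrative)

def split_read(entry_index, base_addr=0):
    """Return the storage-unit byte addresses the second reads must access."""
    first_bit = entry_index * ENTRY_BITS
    last_bit = first_bit + ENTRY_BITS - 1
    first_unit = first_bit // UNIT_BITS
    last_unit = last_bit // UNIT_BITS
    # One second read command per storage unit touched by the entry.
    return [base_addr + u * (UNIT_BITS // 8)
            for u in range(first_unit, last_unit + 1)]

assert split_read(0) == [0]      # entry 0 lies entirely in unit 0
assert split_read(1) == [0, 4]   # entry 1 straddles units 0 and 1
```

Whether one or two second read commands are generated thus depends only on whether the entry's bit range crosses a storage-unit boundary.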
According to the second accelerator of the first aspect of the present application, there is provided a third accelerator of the first aspect of the present application, wherein the parsing module, the calculation module, and the command generation module process multiple first read commands in parallel.
According to the third accelerator of the first aspect of the present application, there is provided a fourth accelerator of the first aspect of the present application, wherein the partial data is the valid data of an entry of the L2P table; only the valid data of each entry of the L2P table is stored in the memory; each storage unit of the memory stores the valid data of one entry, the valid data of a plurality of entries, or a portion of the valid data of an entry; the valid data length of an entry of the L2P table is smaller than the data length corresponding to the entry and smaller than the size of each storage unit.
According to the fourth accelerator of the first aspect of the present application, there is provided a fifth accelerator of the first aspect of the present application, wherein the valid data of the entries of the L2P table is connected sequentially end to end and stored in the storage units of the memory according to the size and address of each storage unit; the valid data of some of the entries in the memory is not aligned to storage-unit and/or byte boundaries.
According to the fifth accelerator of the first aspect of the present application, there is provided a sixth accelerator of the first aspect of the present application, wherein the logic circuitry is configured, in response to receiving the first data fed back for a second read command, to parse the first data to obtain the first protocol information and the second data corresponding to the first data, or to obtain the first protocol information, the second data, and a marker; wherein the marker identifies the position, within the corresponding second data, of the last bit of the valid data of the entry of the L2P table indicated by each first read command;
the logic circuitry also, in response to determining from the first protocol information that the second data of all second read commands corresponding to any first read command has been received, parses the valid data of the entry corresponding to that first read command out of the second data of all those second read commands, and generates response data as the response to the first read command according to the first identification information of the first read command and the valid data of the entry.
According to the sixth accelerator of the first aspect of the present application, there is provided a seventh accelerator of the first aspect of the present application, the logic circuitry further comprising a merging unit; the merging unit merges the valid data of the entry corresponding to each first read command with null bit data according to the entry length of the L2P table entry indicated by that first read command, to obtain the indicated entry of the L2P table, wherein the valid data occupies the first N consecutive bits of the entry, N being the length of the valid data;
the merging unit generates second protocol information according to the first protocol information of the one or more second read commands corresponding to each first read command, and combines the entry and the second protocol information to obtain the data serving as the response to the first read command.
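The merge-with-null-bits step can be sketched as follows (illustrative assumptions, not the patent's implementation: N = 30 valid bits padded to a 32-bit entry, with the valid data in the first, low-order bit positions):

```python
# Hypothetical sketch of the merging unit: the N valid bits of an entry
# are placed in the first N bits and merged with null (zero) bits up to
# the full entry length. N and the entry length are illustrative.
N, ENTRY_BITS = 30, 32

def merge_with_null_bits(valid_data):
    """Return a full-length entry: valid data in the first N bits, nulls above."""
    assert 0 <= valid_data < (1 << N)
    null_bits = 0                    # bits N..ENTRY_BITS-1 are null (zero)
    return (null_bits << N) | valid_data

entry = merge_with_null_bits(0x3FFFFFFF)  # all 30 valid bits set
assert entry == 0x3FFFFFFF
assert entry >> N == 0                    # the two null bits are zero
```

The padded entry is then byte-aligned, so the master device can consume it with its ordinary aligned accesses.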
According to the seventh accelerator of the first aspect of the present application, there is provided an eighth accelerator of the first aspect of the present application, wherein the merging unit updates the marker in response to obtaining the entry of the L2P table indicated by each first read command, such that the updated marker indicates the position of the last bit of that entry or of the valid data of that entry.
According to any one of the first to eighth accelerators of the first aspect of the present application, there is provided a ninth accelerator according to the first aspect of the present application, wherein the plurality of caches includes a third cache, a fourth cache, a fifth cache, a sixth cache, and a seventh cache; the third cache is coupled to the fourth cache and the fifth cache and caches one or more first data; the fourth cache is configured to cache the first protocol information of one or more second read commands; the fifth cache is used to cache one or more second data and the marker; the sixth cache is coupled to the fifth cache and is used to cache the entry corresponding to each first read command and the updated marker; the seventh cache is coupled to the fourth cache and the sixth cache and is used to cache the data responsive to each first read command.
According to the ninth accelerator of the first aspect of the present application, there is provided a tenth accelerator of the first aspect of the present application, wherein the logic circuitry, in response to receipt of a response to a second read command, stores the response to the second read command in the third cache;
in response to a response to a second read command being stored in the third cache, the first protocol information obtained from that response is stored in the fourth cache;
also in response to a response to a second read command being stored in the third cache, the second data and the marker are acquired from that response and stored in the fifth cache;
wherein, according to the relationship between the first identification information and the second identification information stored in the second cache, in response to receiving the responses to all second read commands generated from any first read command, the merging unit obtains the valid data of the entry of the L2P table indicated by that first read command from the one or more second data in the fifth cache, merges the valid data of the entry with null bit data according to the entry length to obtain the entry, and stores the entry in the sixth cache; the marker is updated to indicate the location in the sixth cache of the last bit of the obtained entry or of its valid data, and the updated marker is also stored in the sixth cache;
the entry is then obtained from the sixth cache, the second protocol information corresponding to the first protocol information is obtained from the second cache, a response to the first read command is generated according to the entry and the second protocol information, and the response is stored in the seventh cache.
According to the tenth accelerator of the first aspect of the present application, there is provided an eleventh accelerator of the first aspect of the present application, wherein, in response to the second identification information being acquired from a response to a second read command in the third cache and stored in the fourth cache, and the second data and the marker being acquired from that response and stored in the fifth cache, the response to the second read command is deleted from the third cache.
According to the eleventh accelerator of the first aspect of the present application, there is provided a twelfth accelerator of the first aspect of the present application, wherein, in response to the first protocol information of a second read command being stored in the fourth cache, the second cache is also accessed according to the first protocol information to determine whether all second read commands corresponding to the first read command that generated the second read command have been received.
According to the twelfth accelerator of the first aspect of the present application, there is provided a thirteenth accelerator of the first aspect of the present application, wherein, if not all second read commands corresponding to that first read command have been received, the number of received or outstanding second read commands corresponding to the first read command is recorded.
According to the thirteenth accelerator of the first aspect of the present application, there is provided a fourteenth accelerator of the first aspect of the present application, wherein, if all second read commands corresponding to the first read command that generated the second read command have been received, the valid data of the entry of the L2P table indicated by the first read command is obtained from all the second data of those second read commands in the fifth cache, the valid data of the entry is merged with null bit data to obtain the entry, and the entry is stored in the sixth cache; and all the second data and markers of those second read commands are deleted from the fifth cache.
According to the fourteenth accelerator of the first aspect of the present application, there is provided a fifteenth accelerator of the first aspect of the present application, wherein, if all second read commands corresponding to the first read command that generated the second read command have been received, all the first protocol information of those second read commands is deleted from the fourth cache.
According to the fifteenth accelerator of the first aspect of the present application, there is provided a sixteenth accelerator of the first aspect of the present application, wherein, in response to the entry of the L2P table indicated by the first read command that generated the second read commands being stored in the sixth cache, a response to the first read command is generated from the second protocol information of the first read command and the entry acquired from the sixth cache, and is stored in the seventh cache; the second protocol information of the first read command and the first protocol information corresponding to it are deleted from the second cache, and the entry is deleted from the sixth cache.
According to the sixteenth accelerator of the first aspect of the present application, there is provided a seventeenth accelerator of the first aspect of the present application, wherein, in response to the response to the first read command being stored in the seventh cache, the response is retrieved from the seventh cache, sent to the master device, and deleted from the seventh cache.
According to the seventeenth accelerator of the first aspect of the present application, there is provided an eighteenth accelerator of the first aspect of the present application, wherein the address index of the first read command is deleted from the first cache in response to the one or more second read commands being generated from the first read command.
According to the accelerator of any one of the ninth to eighteenth accelerators of the first aspect of the present application, there is provided a nineteenth accelerator of the first aspect of the present application, wherein the second cache, the fourth cache, the fifth cache, or the sixth cache is a cache array comprising a plurality of cache units, each cache unit being used to store, respectively, the relationship between the first identification information of a first read command and its corresponding one or more pieces of second identification information, the first protocol information of the one or more second read commands corresponding to a first read command, the response data corresponding to a second read command, or the entry corresponding to a first read command together with the updated marker.
According to the nineteenth accelerator of the first aspect of the present application, there is provided a twentieth accelerator of the first aspect of the present application, wherein the merging unit, after determining from the second identification information of each second read command that the responses to all second read commands have been received, obtains the valid data of the entry of the L2P table indicated by the first read command from the one or more second data in the fifth cache, merges the valid data of the entry with null bit data according to the entry length to obtain the entry, and stores the entry in the sixth cache.
According to the accelerator of any one of the first to twentieth accelerators of the first aspect of the present application, there is provided a twenty-first accelerator according to the first aspect of the present application, wherein the first protocol information and the second protocol information each contain AXI protocol information.
According to a second aspect of the present application, there is provided a first control component according to the second aspect of the present application, comprising the accelerator of any one of the first to twenty-first accelerators of the first aspect of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments described in the present application; other drawings may be derived from these drawings by a person of ordinary skill in the art.
FIG. 1 is a block diagram of a prior art solid state storage device;
FIG. 2A is a schematic diagram of a control unit according to an embodiment of the present application;
FIG. 2B illustrates a relationship between identification information of each read command A and identification information of one or more read commands B corresponding to the read command A provided by an embodiment of the present application;
FIG. 2C is a schematic diagram of an L2P table in memory according to an embodiment of the present application;
FIG. 2D is a diagram illustrating the conversion between L2P table entries perceived by the host and L2P table entries stored in the memory according to the present application;
FIG. 3 illustrates a block diagram of an accelerator processing read commands in accordance with an embodiment of the application;
FIG. 4A is a schematic diagram showing the L2P accelerator processing the first data corresponding to each read command B to obtain second data and protocol information;
FIG. 4B is a schematic diagram showing the response of the logic circuit to each read command A;
FIG. 5 illustrates a processing mechanism by which an accelerator processes multiple read commands A in parallel;
FIG. 6A shows a schematic diagram of another accelerator configuration;
FIG. 6B is a schematic diagram illustrating a process of storing data in each cache in the logic circuit according to an embodiment of the present application;
FIG. 6C is a schematic diagram illustrating a plurality of caches in a logic circuit according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Fig. 2A shows a schematic structural diagram of a control component according to an embodiment of the present application.
In fig. 2A, the control component includes a master device, an accelerator, and a slave device. As an example, the master device is a CPU, a media interface controller, or a processing core, and the slave device is a memory controller. The master device and the accelerator, and/or the accelerator and the slave device, are coupled, for example, by a bus. Also as an example, the accelerator in the embodiment of the present application may be an L2P accelerator, which is used to accelerate access by the master device to the L2P table in the memory.
The control component is also coupled to an external memory (the DRAM of fig. 2A), which is accessed through the memory controller. By way of example, the accelerator includes a slave device interface and a master device interface, through which the accelerator is coupled to the bus. Thereby, one or more master devices of the control component (e.g., the CPU or the media interface controller) access the accelerator via its slave device interface, with the accelerator acting as a bus slave device; and the accelerator, acting as a bus master device via its master device interface, accesses one or more slave devices of the control component (e.g., the memory controller).
For ease of description hereinafter, a read command sent by the master device to the accelerator is identified as read command A, and a read command sent by the accelerator to the slave device is identified as read command B.
By way of example, a memory external to the control component is used to store the L2P table. The master device accesses L2P table entries in a designated memory space of the external memory by issuing a read command A to the bus. The bus delivers the read command A to the L2P accelerator coupled to it. The L2P accelerator determines, according to the address indicated in the received read command A, the storage location of the corresponding entry of the L2P table stored in the memory, and issues one or more read commands B for accessing the memory to the slave device (such as the memory controller) through the bus, to acquire the corresponding L2P table entry data from the DRAM. The L2P accelerator processes the L2P table entry data provided by the memory controller to obtain a response to the read command A (the data of the L2P table entry to be accessed by the read command A), and then sends that data to the master device through its slave device interface.
Also for example, the master device may send one read command A to the bus to access one entry of the L2P table, or may send multiple read commands A to the bus to access multiple entries of the L2P table simultaneously. For a single read command A, the L2P accelerator may process it using the procedure described above. When multiple read commands A are issued to the bus to access multiple entries of the L2P table simultaneously, since each read command A may correspond to multiple read commands B, the L2P accelerator considers that all the data to be accessed by a read command A has been received only when it has received the feedback data of all the read commands B corresponding to that read command A from the memory controller. Therefore, when the L2P accelerator processes multiple read commands A in parallel, it needs to determine which read command A each piece of data fed back by the memory controller corresponds to, so as to determine whether all the data to be accessed by a read command A has been received. To facilitate this identification, when the L2P accelerator issues one or more read commands B to the slave device (such as the memory controller) through the bus, it also sets identification information (e.g., an ID) for each read command B, constructs a relationship between the identification information of each read command A and the identification information of the one or more read commands B corresponding to it, and stores the relationship. The L2P accelerator can then identify whether all the data to be accessed by a read command A has been received, according to this relationship and the identification information of the read command B carried in each piece of data fed back by the memory controller.
FIG. 2B illustrates a relationship between the identification information of each read command A and the identification information of one or more read commands B corresponding thereto according to an embodiment of the present application.
By way of example, the master device sends two read commands to the bus, read command A1 and read command A2, where read command A1 corresponds to generating read command B11 and read command B12, and read command A2 corresponds to generating read command B21 and read command B22. The identification information is represented by an ID: the identification information of read command A1 is denoted ID1, that of read command A2 is denoted ID2, that of read command B11 is denoted ID11, that of read command B12 is denoted ID12, that of read command B21 is denoted ID21, and that of read command B22 is denoted ID22. In fig. 2B, the L2P accelerator records the relationship between identification information ID1 and identification information ID11 and ID12, and records the relationship between identification information ID2 and identification information ID21 and ID22.
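The bookkeeping described above can be sketched as follows. This is a minimal illustration in Python, not the hardware implementation; the dictionary layout and the function name are hypothetical, and only the ID relationship itself mirrors the fig. 2B example.

```python
# Map each read command A's ID to the set of its outstanding read command B IDs.
pending = {
    "ID1": {"ID11", "ID12"},
    "ID2": {"ID21", "ID22"},
}

def on_feedback(b_id):
    """Record feedback for one read command B; return the read command A's ID
    once all of its read commands B have been answered, else None."""
    for a_id, waiting in pending.items():
        if b_id in waiting:
            waiting.remove(b_id)
            return a_id if not waiting else None
    return None
```

For instance, `on_feedback("ID11")` returns `None` (read command B12 is still outstanding), and the subsequent `on_feedback("ID12")` returns `"ID1"`, signaling that all data for read command A1 has arrived.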
Fig. 2C is a schematic diagram of an L2P table structure in a memory according to an embodiment of the present application.
In the present application, an L2P table entry accessed by another bus device such as the CPU or the control unit (shown in fig. 1) contains not only valid data but also null bit data; the null bit data are the bits in the L2P table entry other than the valid data, and, as mentioned above, the number of valid data bits is determined according to the number of data units provided by the addressed NVM chip. For example, if the length of an L2P table entry accessed by another bus device such as the CPU or the control unit is 64 bits and the valid data is 30 bits, then the null bit data is 34 bits. Hereinafter, the portion of an entry corresponding to the valid data bits is referred to simply as valid data.
Referring to fig. 2C, as an example, in order to reduce the waste of memory storage resources, only the valid data of each entry of the L2P table is stored in the storage units of the memory, and the valid data of the entries is stored end to end in the storage space provided by the memory. The size of a storage unit of the memory does not change with the size of the stored L2P table entries; for example, if the size of a storage unit is 64 bits, it is 64 bits regardless of whether the full data of an L2P table entry (valid data plus null bit data) or only valid data is stored in it. Thus, when only the valid data of each entry of the L2P table is stored in the memory, each storage unit may store the valid data of one or more L2P table entries, or part of the valid data of one L2P table entry. For example, if the valid data of an L2P table entry is 30 bits, the first storage unit in the memory stores the valid data of the entries corresponding to logical addresses LBA=0 and LBA=1 together with the first 4 bits of the valid data of the entry corresponding to LBA=2, and the remaining 26 bits of the valid data of the entry corresponding to LBA=2 are stored in the next storage unit.
In fig. 2C, a tag such as PBA(i) represents one entry of the L2P table stored in the memory (its value is the i-th physical address (PBA) of the L2P table, that is, the physical address (PBA) corresponding to logical address LBA=i, i being, for example, an integer). Referring to fig. 2C, since only the valid data of each entry of the L2P table is stored in the memory, if the valid data of each entry is 30 bits, then the valid data of L2P table entry PBA(0) occupies bits 1 to 30 from the 0-byte address of the memory, the valid data of PBA(1) occupies bits 31 to 60 from the 0-byte address, the first 4 bits of the valid data of PBA(2) occupy bits 61 to 64 from the 0-byte address, and the remaining 26 bits of the valid data of PBA(2) are stored starting from the 8-byte address of the memory. Next, the valid data of PBA(3) occupies bits 27 to 56 from the 8-byte address, the first 8 bits of the valid data of PBA(4) occupy bits 57 to 64 from the 8-byte address, the remaining 22 bits of the valid data of PBA(4) are stored starting from the 16-byte address, the valid data of PBA(5) occupies bits 23 to 52 from the 16-byte address, the first 12 bits of the valid data of PBA(6) occupy bits 53 to 64 from the 16-byte address, the remaining 18 bits of the valid data of PBA(6) occupy bits 1 to 18 from the 24-byte address, the valid data of PBA(7) occupies bits 19 to 48 from the 24-byte address, and bits 49 to 64 from the 24-byte address are null bits.
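The packed layout above follows directly from treating the memory as a continuous bit stream. The following Python sketch (an illustration, not part of the described hardware) computes which storage units and bit ranges hold a given entry, using the 30-bit entry and 64-bit storage unit sizes from the example; bit positions are 0-indexed here, whereas the text counts from 1.

```python
ENTRY_BITS = 30  # valid bits per L2P table entry (example value from the text)
UNIT_BITS = 64   # width of one memory storage unit

def entry_location(lba):
    """Return [(unit_index, first_bit, last_bit)] for entry `lba`,
    with bit positions 0-indexed within each 64-bit storage unit."""
    start = lba * ENTRY_BITS      # global bit offset of the entry's first bit
    end = start + ENTRY_BITS - 1  # global bit offset of the entry's last bit
    spans = []
    for unit in range(start // UNIT_BITS, end // UNIT_BITS + 1):
        lo = max(start - unit * UNIT_BITS, 0)
        hi = min(end - unit * UNIT_BITS, UNIT_BITS - 1)
        spans.append((unit, lo, hi))
    return spans
```

For example, `entry_location(2)` yields `[(0, 60, 63), (1, 0, 25)]`: the first 4 bits of PBA(2) sit at the end of the first storage unit and the remaining 26 bits at the start of the second, matching fig. 2C.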
Thus, entries of the L2P table are not aligned in the memory by byte boundaries or by the read data bit width of the memory. Consequently, the L2P table occupies less storage space in the memory than the size of the L2P table as perceived by the CPU.
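To illustrate the saving with the example figures (a sketch using assumed table dimensions, not figures stated for a particular product): for a table of 2^30 entries, 64-bit perceived entries versus 30-bit packed entries give:

```python
entries = 2 ** 30                 # example table size: one entry per logical address
perceived_bytes = entries * 8     # 64-bit (8-byte) entries as the CPU perceives them
packed_bytes = entries * 30 // 8  # 30 valid bits per entry, packed end to end

saving = 1 - packed_bytes / perceived_bytes  # fraction of memory saved
```

With these numbers the packed layout uses 3.75 GiB instead of 8 GiB, a saving of 53.125%.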
According to the embodiment of the application, the CPU accesses the L2P table by taking the logical address LBA as an index, and the L2P accelerator calculates the address of the corresponding effective data of the entry in the L2P table in the memory according to the logical address LBA, and acquires the entry of the L2P table to be accessed through one or more memory accesses to respond to the access of the CPU to the L2P table. Since the size of the L2P table entry accessed by the CPU is different from the size of the L2P table entry stored in the memory in the present application, the L2P accelerator needs to convert the L2P table entry stored in the memory into the L2P table entry accessible by the CPU or convert the L2P table entry accessible by the CPU into the L2P table entry stored in the memory for storage during the process of accessing the L2P table.
FIG. 2D is a diagram illustrating the conversion between L2P table entries and L2P table entries stored in memory as perceived by a host device according to the present application.
By way of example, an L2P table stored in memory (SRAM or DRAM) includes a plurality of entries, addressed by a logical address (denoted LBA). In fig. 2D, the L2P table entries perceived by the master device correspond one-to-one to the L2P table entries stored in the memory, so the two have the same number of entries; for example, the L2P table includes 8 entries, namely entry 0, entry 1, entry 2, entry 3, entry 4, entry 5, entry 6, and entry 7. An L2P table entry perceived by the master device is M bits in size, and an L2P table entry stored in the memory is N bits in size, where M and N are both positive integers.
To facilitate, for example, CPU access to the logical L2P table, the size M is, for example, an integer number of bytes (e.g., 8 bytes), so that entries of the logical L2P table are aligned by bytes or by 8 bytes. In fig. 2D, from the perspective of the CPU accessing the logical L2P table, each entry in the L2P table perceived by the master device is M bits (M=64 in the example of fig. 2D), the entries of the perceived L2P table are arranged end to end in the storage space, and the storage space of the perceived L2P table is indexed by the logical address (LBA) to obtain the corresponding L2P table entry, for example, L2P table entry address = base address + LBA × size(L2P entry), where size(L2P entry) represents the storage space occupied by each entry. Recorded in an entry of the L2P table is an address for the NVM chip (referred to as a physical address, denoted PBA). Since the L2P table as perceived by the master device is aligned by bytes or 8 bytes, the start address of each entry in the storage space is at a byte boundary or an integer multiple of 8 bytes, and the end of an entry is likewise at a byte boundary or an integer multiple of 8 bytes. In the example of fig. 2D, when the CPU accesses a corresponding entry of the L2P table, the address of that entry (8 bytes, i.e., 64 bits) is obtained, for example, as LBA × 8, using the logical address (LBA) as an index.
Some or all of each entry in the L2P table as perceived by the master device is valid data. When the entire entry perceived by the master device is valid data, N is equal to M; when each perceived entry is partly valid data and partly null bit data, the entry size N stored in the memory equals the number of valid data bits in the perceived entry, and the number of valid data bits is determined according to the number of data units (e.g., pages) provided by the addressed NVM chip. For example, to address 2^30 data units, N is 30. Generally, if an entry of the L2P table stored in the memory can address one of 2^n data units, then N = n. As an example, in fig. 2D, N = 30. The L2P table stored in the memory stores only the valid data of each entry; the valid data of the entries is stored end to end in the storage space provided by the memory, with no unused storage space reserved between adjacent entries, so that the start and/or end positions of some entries in the memory are not located at byte boundaries.
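The relation between the number of addressable data units and the entry width N can be sketched as follows (an illustrative helper written for this document, not part of the application):

```python
import math

def valid_bits(num_units):
    """Minimum number of PBA bits needed to address `num_units` data units."""
    return max(1, math.ceil(math.log2(num_units)))
```

For example, `valid_bits(2**30)` returns 30, matching N = 30 in the fig. 2D example.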
FIG. 3 illustrates a block diagram of an accelerator processing read commands according to an embodiment of the application.
In FIG. 3, the accelerator couples the master device with the memory and accelerates access by the master device to the L2P table in the memory. The accelerator includes logic circuitry and a plurality of caches. The logic circuit, in response to receiving a plurality of read commands A sent by the master device, generates one or more read commands B according to each read command A and stores, in a cache, the relationship between the identification information identifying the read command A and the identification information identifying the corresponding one or more read commands B. In response to receiving the first data fed back from the memory for each read command B, it processes the first data to obtain second data and first protocol information, determines the one or more pieces of second data corresponding to each read command A according to the first protocol information and the relationship, generates second protocol information, and processes the one or more pieces of second data corresponding to each read command A to obtain the entry of the L2P table accessed by that read command A; it then sends the second protocol information and the entry of the L2P table indicated by it to the master device as a response to each read command A.
By way of example, in fig. 3 the logic circuit includes a parsing module, a calculation module, and a command generation module. The parsing module, in response to receiving a plurality of read commands A, parses each read command A to obtain its first identification information and address index, and caches the address index in a first cache of the plurality of caches. The calculation module is coupled to the first cache and calculates, according to the address index, the memory address accessed by the one or more read commands B corresponding to each read command A; it also sets corresponding second identification information for each read command B and stores the relationship between the first identification information and the corresponding one or more pieces of second identification information in a second cache. The command generation module is coupled to the calculation module, generates the one or more read commands B corresponding to each read command A according to the address and the second identification information, and sends the one or more read commands B to the memory through the memory controller.
The master device sends a number of read commands A to the accelerator, denoted as process (4.1); in this process, data interaction between the master device and the accelerator can take place via a bus, for example an AXI bus. The logic circuit in the accelerator receives the read commands A from the bus. Each read command A indicates an address index of an L2P table entry perceived by the master device (e.g., the logical address LBA) and also indicates identification information, such as an ID, identifying the read command A itself. After receiving each read command A, the logic circuit parses it to obtain the address index, the identification information, and so on. Having parsed the address index indicated by each read command A, the logic circuit calculates the address at which the L2P table entry perceived by the master device is stored, and then stores the L2P table entry address and the identification information in the cache, denoted as process (4.2); for example, the perceived L2P table entry address = base address + LBA × size(L2P entry), where size(L2P entry) represents the size of each L2P table entry perceived by the master device, e.g., 64 bits.
When each entry in the L2P table perceived by the master device is partly valid data and partly null bit data, only the valid data of the entries is stored in the memory, so that each storage unit in the memory stores the valid data of one or more perceived L2P table entries, or part of the valid data of one perceived entry, and all entries of the L2P table are stored in the memory sequentially, end to end. Thus, the valid data of the L2P table entry to be accessed by each read command A sent by the master device may be stored in one storage unit or across several storage units; that is, different L2P table entries to be accessed occupy different numbers of storage units in the memory. When the entry to be accessed occupies a plurality of storage units in the memory (the master device's access touches a plurality of storage units), the logic circuit generates a plurality of read commands B for that read command A, where each read command B reads the data of one storage unit, denoted as process (4.3). For example, if the L2P table entry to be accessed by a read command A occupies two storage units in the memory, the logic circuit generates two read commands B; if it occupies one storage unit, the logic circuit generates one read command B.
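Under the packed layout described earlier (30-bit entries in 64-bit, 8-byte-aligned storage units), the byte addresses that the read commands B must target can be derived as in this hypothetical sketch; the function name and the `base` parameter are illustrative:

```python
ENTRY_BITS = 30  # valid bits per entry (example value)
UNIT_BITS = 64   # one storage unit = 8 bytes

def read_command_b_addrs(lba, base=0):
    """Byte address of each storage unit holding entry `lba`;
    one read command B is issued per address returned."""
    first_bit = lba * ENTRY_BITS
    last_bit = first_bit + ENTRY_BITS - 1
    return [base + 8 * u
            for u in range(first_bit // UNIT_BITS, last_bit // UNIT_BITS + 1)]
```

For example, `read_command_b_addrs(2)` returns `[0, 8]` (two read commands B, as in the fig. 4A example), while `read_command_b_addrs(3)` returns `[8]` (a single read command B).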
In addition, since the logic circuit needs to process the plurality of read commands a in parallel, each read command a generates one or more read commands B again, after generating one or more read commands B for each read command a, identification information for identifying itself is set for each read command B, and a relationship between the identification information of each read command a and the identification information of one or more read commands B corresponding thereto is constructed and stored in the cache, and the process is represented as a process (4.4). By way of example, the relationship between the L2P table entry address and the identification information in process (4.2) and the identification information of each read command A and its corresponding one or more read commands B in process (4.4) may be stored in the same cache or in different caches. That is, in the embodiment of the present application, the "first buffer" and the "second buffer" may be the same buffer, or may be different buffers. 
The command generation module sends the one or more read commands B corresponding to each read command A to the memory controller, denoted as process (4.5). The memory controller then reads first data from the memory according to each read command B, denoted as processes (4.6) and (4.7); for example, if the storage units in the memory are aligned by 8 bytes with byte addresses 0, 8, 16, 24, ..., each read command B reads 8 bytes of first data starting from one of those byte addresses. The memory controller then sends the first data read according to each read command B, together with its corresponding first protocol information, to the accelerator (also called the L2P accelerator) as a response to that read command B, denoted as process (4.8); here the first data is the data of one storage unit read according to each read command B, and the first protocol information contains the identification information of that read command B. Next, the logic circuit processes the first data corresponding to each read command B to obtain second data and first protocol information, determines the one or more pieces of second data corresponding to each read command A according to the first protocol information and the stored relationship, generates second protocol information, and processes the one or more pieces of second data corresponding to each read command A to obtain the entry of the L2P table indicated by that read command A; it then sends the second protocol information and the indicated entry of the L2P table to the master device as a response to each read command A, denoted as process (4.9). Here the second data is the part of the first data corresponding to each read command B that belongs to the L2P table entry to be accessed by the read command A, and the second protocol information contains the identification information of the read command A.
As another example, the logic circuit, in response to receiving the first data fed back for a read command B, parses the first data to obtain the corresponding first protocol information and second data, or the first protocol information, the second data, and a marker, where the marker identifies the position, within the corresponding second data, of the last bit of the valid data of the L2P table entry indicated by each read command A. The logic circuit also, in response to determining according to the first protocol information that the second data of all read commands B corresponding to any read command A has been received, parses the valid data of the entry corresponding to that read command A from all of its corresponding second data, and generates data serving as the response to that read command A according to the identification information of the read command A and the valid data of its corresponding entry.
Also by way of example, the logic circuit further includes a merging unit. The merging unit merges the valid data of the entry corresponding to each read command A with null bit data, according to the length of the L2P table entry indicated by that read command A, to obtain the indicated entry of the L2P table, where the valid data occupies the first N consecutive bits of the entry and N is the length of the valid data. It generates second protocol information according to the first protocol information of the one or more read commands B corresponding to each read command A, and combines the entry with the second protocol information to obtain the data serving as the response to the read command A. For example, if the length of the L2P table entry read by a read command A is 64 bits and the valid data length is 30 bits, the merging unit merges the 30 bits of valid data with 34 bits of null data to obtain the 64-bit L2P table entry to be read by the read command A.
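A minimal sketch of the merging step, treating the valid data as the high-order (first) bits of the entry; the bit order and the integer representation are assumptions made for illustration:

```python
M_BITS = 64  # entry size as perceived by the master device
N_BITS = 30  # valid data length

def merge_entry(valid):
    """Place the N-bit valid data in the first (most significant) N bits
    of an M-bit entry and zero-fill the remaining null bits."""
    assert 0 <= valid < (1 << N_BITS)
    return valid << (M_BITS - N_BITS)
```

For example, `merge_entry(0x3FFFFFFF)` sets the top 30 bits of the 64-bit result and leaves the low 34 bits as null data.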
Fig. 4A is a schematic diagram showing that the L2P accelerator processes the first data corresponding to each read command B to obtain the second data and the first protocol information.
By way of example, in fig. 4A, the L2P table stored in the memory includes entries 120, 121, 122, 123, 124, ..., where the valid data of each entry is 30 bits and the size of each storage unit in the memory is 64 bits. Suppose the master device sends a read command A1 to the logic circuit to read entry 122 of the L2P table: the first 4 bits of entry 122 are located in one storage unit in the memory, and the last 26 bits are located in a different storage unit, i.e., entry 122 spans two storage units in the memory. After receiving read command A1, the L2P accelerator generates read commands B11 and B12 for accessing the memory according to read command A1, where read command B11 reads the data of the storage unit holding the first 4 bits of entry 122, and read command B12 reads the data of the storage unit holding the last 26 bits of entry 122.
The L2P accelerator sends the read command B11 and the read command B12 to the memory controller, the memory controller reads the data of the first 4 bits of the memory cells stored in the entry 122 from the memory according to the read command B11, and adds protocol information 11 (such as AXI protocol information) to the data to generate data M11, wherein the protocol information 11 includes identification information ID11 of the read command B11, and the memory controller sends the data M11 to the L2P accelerator as a response to the read command B11; the memory controller reads the data of the memory location storing the last 26 bits in the entry 122 from the memory according to the read command B12, and adds protocol information 12 (such as AXI protocol information) to the data to generate data M12, wherein the protocol information 12 contains identification information ID12 of the read command B12, and the memory controller sends the data M12 to the L2P accelerator as a response to the read command B12. Namely, the data M11 includes the data of the first 4 bits of the memory location in the entry 122 and the protocol information 11, and the data M12 includes the data of the last 26 bits of the memory location in the entry 122 and the protocol information 12.
Further, after receiving the response data M11 of read command B11 and the response data M12 of read command B12, the L2P accelerator extracts the valid data of entry 122 from data M11 and data M12 according to the number of valid data bits per entry of the L2P table. That is, processing data M11 extracts the first 4 bits of entry 122 together with protocol information 11, and these 4 bits are identified as data P11; similarly, processing data M12 extracts the last 26 bits of entry 122 together with protocol information 12, and these 26 bits are identified as data P12.
FIG. 4B illustrates a schematic diagram of the L2P accelerator response to each read command A.
As another example, after the L2P accelerator generates read commands B11 and B12 for accessing the memory according to read command A1, it also records the correspondence between the identification information of read command A1 and that of read commands B11 and B12; for example, the identification information of read command A1 is ID1, that of read command B11 is ID11, and that of read command B12 is ID12. After receiving the response data M11 of read command B11 and the response data M12 of read command B12, the L2P accelerator processes data M11 to obtain data P11 and protocol information 11, where protocol information 11 contains identification information ID11 and data P11 represents the first 4 bits of entry 122 within data M11. It processes data M12 to obtain data P12, protocol information 12, and a tag Q, where protocol information 12 contains identification information ID12, the tag Q identifies the position, within the corresponding data, of the last bit of the valid data of the L2P table entry indicated by each read command A, and data P12 represents the last 26 bits of entry 122 within data M12.
According to protocol information 11, protocol information 12, and the recorded relationship between the identification information of read command A1 and that of read commands B11 and B12, the L2P accelerator knows that, once data M11 and data M12 have been received, all the data to be read by read command A1 has been received, and that the valid data of the L2P table entry to be accessed by read command A1 is data P11 and data P12. The L2P accelerator splices data P11 and data P12 to obtain the valid data (30 bits) of the L2P table entry to be accessed by read command A1; since the entry size of the L2P table perceived by the master device is 64 bits (in other words, the data size to be read by read command A1 is 64 bits), after splicing data P11 and data P12, 34 bits of null bit data are appended at the end of the spliced data to obtain the L2P table entry to be accessed by read command A1. In addition, the L2P accelerator generates protocol information 2 for read command A1, which contains the identification information ID1 of read command A1. After concatenating the valid data of the L2P table entry with the 34 bits of null data, the L2P accelerator also updates the position of the marker Q so that it indicates the position of the last bit of the L2P table entry to be accessed within the concatenated data.
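The splicing of the fig. 4B example can be sketched as follows; treating the first 4 bits as the most significant bits of the valid data, and the null bits as trailing zeros, are assumptions made for illustration:

```python
def splice_entry(p11, p12):
    """Reassemble entry 122: 4 bits from data P11 and 26 bits from data P12,
    then append 34 null bits to form the 64-bit entry returned to the master."""
    valid = (p11 << 26) | p12  # 30-bit valid data of the entry
    entry = valid << 34        # 34 trailing null bits
    return entry
```

For example, `splice_entry(0b1010, 0)` places the 4 bits of data P11 at the top of the 64-bit result.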
As another example, after receiving a plurality of read commands A, the accelerator may process them in parallel across the parsing module, the calculation module, and the command generation module in the logic circuit.
Fig. 5 illustrates a processing mechanism by which the accelerator processes multiple read commands a in parallel.
For example, in fig. 5, the accelerator receives two read commands sent by the master device, read command A1 and read command A2, where the accelerator generates two commands for accessing the memory according to read command A1, namely read command B11 and read command B12, and two commands according to read command A2, namely read command B21 and read command B22. The parallel processing mechanism of the accelerator is described below, taking the processing of read command A1 and read command A2 as an example.
In fig. 5, T0 to T4 represent a plurality of time periods that are continuous in time, and the contents below each time period represent operations performed by the respective modules of the accelerator in that time period.
In the period of T0, the analysis module receives the read command A1, and analyzes the read command A1 to obtain an address index and identification information. After the address index and the identification information of the read command A1 are obtained, the calculation module calculates a memory address according to the address index and the identification information of the read command A1; then, after the memory address is calculated, a read command B11 and a read command B12 corresponding to the read command A1 are generated from the memory address. Next, after the read command B11 and the read command B12 corresponding to the read command A1 are generated, the relationship between the identification information of the read command A1 and the identification information of the read command B11 and the read command B12 is stored in a second cache among the plurality of caches.
In the period T1 (the period T1 is a period after the period T0), the accelerator receives the data corresponding to the read command B11 and stores it in the third cache.
In the period T2 (the period T2 is a period after the period T1), the parsing module receives the read command A2 and parses the read command A2 to obtain its address index and identification information. After the address index and the identification information of the read command A2 are obtained, the calculation module calculates a memory address according to them; then, after the memory address is calculated, a read command B21 and a read command B22 corresponding to the read command A2 are generated from the memory address. Next, the relationship between the identification information of the read command A2 and the identification information of the read command B21 and the read command B22 is stored in the second cache among the plurality of caches. At this time, since the read command A1 has not yet been fully processed, the second cache stores, in addition to the relationship between the identification information of the read command A2 and the identification information of the read command B21 and the read command B22, the relationship between the identification information of the read command A1 and the identification information of the read command B11 and the read command B12. According to an embodiment of the present application, during the period T2 the L2P accelerator can still process the newly received read command A2 even though the read command A1 has not yet been fully processed. The L2P accelerator thus has the ability to process multiple read commands issued by the host device in parallel. Although Fig. 5 illustrates only the two read commands A1 and A2 issued by the host, it is understood that the L2P accelerator may process a greater number of read commands from the host in parallel.
In the period T3 (the period T3 is a period after the period T2), the accelerator receives the data corresponding to the read command B12 and stores it in the third cache. At this time, the third cache stores the data corresponding to both the read command B11 and the read command B12. Further, once the accelerator has received the data corresponding to the read command B11 and the read command B12, all the data corresponding to the read command A1 has been received; the accelerator then processes the received data corresponding to the read command B11 and the read command B12 and splices the data to obtain the entry of the L2P table to be accessed by the read command A1, generates the corresponding protocol information according to the identification information of the read command A1, and stores the protocol information and the entry of the L2P table to be accessed by the read command A1 in the seventh cache as a response to the read command A1. At this time, since the response to the read command A1 has been obtained, the relationship between the identification information of the read command A1 and the identification information of the read command B11 and the read command B12 can be deleted from the second cache, while the relationship between the identification information of the unprocessed read command A2 and the identification information of the read command B21 and the read command B22 remains.
In the period T4 (the period T4 is a period after the period T3), the accelerator receives the data corresponding to the read command B21 and the read command B22 and stores it in the third cache. At this time, the third cache stores the data corresponding to the read command B21 and the read command B22, that is, all the data corresponding to the read command A2 has been received; the accelerator processes the received data corresponding to the read command B21 and the read command B22 and splices the data to obtain the entry of the L2P table to be accessed by the read command A2, generates the corresponding protocol information according to the identification information of the read command A2, and stores the protocol information and the entry of the L2P table to be accessed by the read command A2 in the seventh cache as a response to the read command A2. At this time, since the response to the read command A2 has been obtained, the relationship between the identification information of the read command A2 and the identification information of the read command B21 and the read command B22 can be deleted from the second cache. Since both the read command A1 and the read command A2 have now been processed, no identification information of read commands to be processed remains in the second cache.
As is clear from the above, the data of the read command B11 and the data of the read command B12 may be received discontinuously in time; that is, the accelerator may interleave the processing of another read command (the read command A2) between receiving the data of the read command B11 and receiving the data of the read command B12. In this way, the accelerator processes the read command A1 and the read command A2 in parallel.
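The interleaved timeline above can be sketched as a minimal software model of the second cache, which tracks, for each read command A, the identification information of its outstanding read commands B and reports when all of them have been answered. All class and method names here are invented for illustration; the patent does not prescribe any particular implementation:

```python
# Hypothetical model (names invented) of the second cache's bookkeeping:
# per host read command A, the set of memory read commands B still outstanding.
class SecondCache:
    def __init__(self):
        self.pending = {}  # A-id -> set of outstanding B-ids

    def register(self, a_id, b_ids):
        # Process analogous to storing <A-id, B-ids> relations in the second cache.
        self.pending[a_id] = set(b_ids)

    def complete(self, b_id):
        # Called when the response to one read command B arrives.
        for a_id, b_set in self.pending.items():
            if b_id in b_set:
                b_set.discard(b_id)
                if not b_set:            # all B-responses received
                    del self.pending[a_id]
                    return a_id          # read command A is fully served
                return None
        return None

cache = SecondCache()
cache.register("A1", ["B11", "B12"])     # period T0
assert cache.complete("B11") is None     # period T1: A1 still pending
cache.register("A2", ["B21", "B22"])     # period T2: A2 accepted while A1 unfinished
assert cache.complete("B12") == "A1"     # period T3: A1 done, relation deleted
assert cache.complete("B21") is None
assert cache.complete("B22") == "A2"     # period T4: A2 done
assert not cache.pending                 # no pending read commands remain
```

The model makes the parallelism concrete: registering A2 while A1 is still pending mirrors the accelerator accepting a new host read command during the period T2.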
Fig. 6A shows a schematic structural diagram of another accelerator.
In fig. 6A, the plurality of caches in the accelerator includes: a first cache, a second cache, a third cache, a fourth cache, a fifth cache, a sixth cache, and a seventh cache. The third cache is coupled to the fourth cache and the fifth cache and caches the responses of one or more read commands B; the fourth cache is used for caching the protocol information of one or more read commands B; the fifth cache is used for caching, from the responses of the one or more read commands B, the partial data of the L2P table entry to be accessed by the read command A together with the marker Q; the sixth cache is coupled to the fifth cache and is used for caching the entry corresponding to each read command A and the updated marker Q; the seventh cache is coupled to the fourth cache and the sixth cache and is used for caching the response data of each read command A.
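The roles of the seven caches can be summarized in a small structural sketch. The field names and container choices below are illustrative assumptions only, not part of the described hardware:

```python
# Hypothetical sketch of the roles of the seven caches in Fig. 6A.
from dataclasses import dataclass, field

@dataclass
class AcceleratorCaches:
    first: dict = field(default_factory=dict)    # A-id -> address index
    second: dict = field(default_factory=dict)   # A-id -> ids of its read commands B
    third: list = field(default_factory=list)    # raw responses to read commands B
    fourth: dict = field(default_factory=dict)   # B-id -> protocol information
    fifth: dict = field(default_factory=dict)    # B-id -> partial entry data and marker Q
    sixth: dict = field(default_factory=dict)    # A-id -> (spliced entry, updated marker Q)
    seventh: list = field(default_factory=list)  # responses to read commands A

caches = AcceleratorCaches()
assert caches.second == {}    # nothing pending yet
```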
The following describes how the logic circuit buffers the data corresponding to each read command B.
By way of example, in response to receiving a response to the read command B provided by the memory controller, the logic circuit stores the entire response to the read command B in the third cache. In response to storing the response to the read command B in the third cache, first protocol information acquired from that response is stored in the fourth cache, and second data and an identifier acquired from that response are stored in the fifth cache. Then, according to the relationship between the first identification information and the second identification information stored in the second cache, in response to receiving responses to all second read commands generated according to any first read command, the merging unit obtains the valid data of the entry of the L2P table indicated by the first read command from one or more second data in the fifth cache, merges the valid data of the entry with null bit data according to the length of the entry to obtain the entry, and stores the entry in the sixth cache; it also updates the marker so that the updated marker indicates the location in the sixth cache of the last bit of the obtained entry or of the valid data of the entry, and stores the updated marker in the sixth cache. Next, the entry and the updated marker are obtained from the sixth cache, second protocol information corresponding to the first protocol information is obtained from the second cache, a response to the first read command is generated according to the entry and the second protocol information and stored in the seventh cache, and the response to the first read command is obtained from the seventh cache and provided to the master device through the bus.
FIG. 6B is a schematic diagram illustrating a process of storing data in each cache in the logic circuit according to an embodiment of the application.
The logic circuit receives a read command A1 from the master device, where the read command A1 is to access the data corresponding to an entry 122 in the L2P table. The length of the entry 122 is 64 bits, of which 30 bits are valid data, and the valid data of the entry 122 is stored in two consecutive storage units in the memory: the first 4 bits of the valid data of the entry 122 are stored in the former storage unit, and the remaining 26 bits are stored in the latter storage unit. Since the valid data of the entry 122 is located in two consecutive storage units, after receiving the read command A1 the logic circuit generates two read commands according to the read command A1, namely a read command B11 and a read command B12, where the read command B11 is used to read the 64 bits (8 bytes) of data in the storage unit containing the first 4 bits of the valid data of the entry 122, and the read command B12 is used to read the 64 bits (8 bytes) of data in the storage unit containing the remaining 26 bits of the valid data of the entry 122. In fig. 6B, the logic circuit stores the identification information of the read command A1 in the second cache in association with the identification information of the read command B11 and the read command B12, for example in the form of <identification information of the read command A1, identification information of the read command B11, identification information of the read command B12>, denoted as process (6.1).
The memory controller transmits the data M11 read according to the read command B11 to the L2P accelerator, and the L2P accelerator stores the data M11 in the third cache, denoted as process (6.2); the L2P accelerator then parses the data M11 to obtain the protocol information 11 and the data Q11, stores the protocol information 11 in the fourth cache, denoted as process (6.3), and stores the data Q11 in the fifth cache, denoted as process (6.4), where the data Q11 represents the 64 bits of data read from a storage unit of the memory according to the read command B11. In response to the memory controller sending the data M12 read according to the read command B12 to the L2P accelerator, the L2P accelerator stores the data M12 in the third cache, denoted as process (6.5); the L2P accelerator then parses the data M12 to obtain the protocol information 12 and the data Q12, stores the protocol information 12 in the fourth cache, denoted as process (6.6), and stores the data Q12 in the fifth cache, denoted as process (6.7), where the data Q12 represents the 64 bits of data read from a storage unit of the memory according to the read command B12. Although shown as process (6.2) and process (6.5), respectively, the order in which the memory controller provides the data M11 and M12 to the L2P accelerator is not limited according to embodiments of the application. Moreover, between receiving the data M11 and M12, the L2P accelerator may also receive response data to other read commands provided by the memory controller.
According to the protocol information 11 and the protocol information 12 stored in the fourth cache and the relationship between the identification information of the read command A1 and the identification information of the read command B11 and the read command B12 stored in the second cache, it is determined whether the data corresponding to both the read command B11 and the read command B12 has been stored in the fifth cache, denoted as process (6.8). After the data corresponding to the read command B11 and the read command B12 has been stored in the fifth cache, the L2P accelerator obtains the valid data of the entry 122 from the fifth cache, merges the valid data of the entry 122 with null bit data according to the length of the entry 122 to obtain the entry 122, and stores the entry 122 in the sixth cache; it also generates a new marker Q to indicate the location in the sixth cache of the last bit of the acquired entry 122 or of the valid data of the entry 122, denoted as process (6.9). Then, the L2P accelerator acquires the identification information of the read command A1 from the second cache according to the correspondence between the identification information of the read command A1 and the identification information of the read command B11 and the read command B12 recorded in the second cache, and generates the corresponding protocol information 2 (comprising the identification information of the read command A1) from it, denoted as process (6.10); a response to the read command A1 is generated from the generated protocol information 2 and the entry 122 retrieved from the sixth cache, and stored in the seventh cache, denoted as process (6.11).
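The splicing in process (6.9) can be illustrated with a small numeric sketch. The in-unit bit offsets of the valid data are not fixed by the description above, so the placement assumed below (the 4 leading valid bits in the low end of Q11, the remaining 26 valid bits in the high end of Q12) is purely an assumption for illustration:

```python
# Hypothetical sketch of process (6.9): entry 122 is 64 bits long but has only
# 30 valid bits, split across two 64-bit storage units (4 bits + 26 bits).
ENTRY_LEN = 64   # length of entry 122 in bits
VALID_LEN = 30   # length of its valid data in bits

def merge_entry(q11: int, q12: int, hi_bits: int = 4, lo_bits: int = 26) -> int:
    # Assumed placement: the first 4 valid bits occupy the low end of Q11,
    # the remaining 26 valid bits occupy the high end of Q12.
    hi = q11 & ((1 << hi_bits) - 1)     # first 4 bits of the valid data
    lo = q12 >> (64 - lo_bits)          # remaining 26 bits of the valid data
    valid = (hi << lo_bits) | lo        # 30-bit valid data of entry 122
    # Merging with null (zero) bits up to ENTRY_LEN is implicit for a Python int.
    return valid

entry = merge_entry(q11=0b1011, q12=0x3FFFFFF << 38)
assert entry == (0b1011 << 26) | 0x3FFFFFF   # 30 valid bits, zero-padded to 64
assert entry.bit_length() <= ENTRY_LEN
```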
According to embodiments of the present application, providing multiple caches to record the responses to read commands from the memory controller enables the L2P accelerator to process responses to multiple read commands from the memory controller simultaneously, and these responses need not correspond to the same read command from the host device but may correspond to multiple read commands from the host device. For example, each response received from the memory controller is recorded in the third cache, so that even if a response does not yet provide the complete entry required by a read command from the master device, it can still be cached without affecting the receipt of other responses. After the response data in the third cache has been moved to the fourth cache and the fifth cache respectively, that data can be deleted from the third cache to reduce its occupation. For another example, the valid data of the entries to be accessed by the host device is recorded in the fifth cache: when not all of the valid data of an entry to be acquired by the host device has been received, the partial valid data received by the L2P accelerator is recorded in the fifth cache, and even if the host device issues multiple read commands at the same time, the partial data received for each read command is recorded in the fifth cache, thereby supporting parallel processing of the multiple read commands issued by the host device. When the L2P accelerator has received all the valid data of an entry to be acquired by the host device, the valid data is promptly moved from the fifth cache to the sixth cache to construct the entry to be accessed by the host device, and the space occupied by that valid data in the fifth cache is freed.
Thus, the fifth cache serves to cache the responses to the multiple read commands generated from the read commands issued by the host device, while the sixth cache serves to splice together the complete entries to be accessed by the master device.
The following describes other operations on each cache, taking as an example a logic circuit that receives a read command A1 from the master device and generates a read command B11 and a read command B12 according to the read command A1.
For example, in response to acquiring the corresponding protocol information 11 from the response to the read command B11 in the third cache and storing it in the fourth cache, and acquiring the corresponding data Q11 and the marker Q from the response to the read command B11 in the third cache and storing them in the fifth cache, the response to the read command B11 is deleted from the third cache.
As another example, in response to storing the protocol information 11 corresponding to the read command B11 in the fourth cache, the second cache is also accessed according to the protocol information 11 to determine whether all of the read command B11 and the read command B12 corresponding to the read command A1 have been received.
As another example, if the read command B12 corresponding to the read command A1 has not yet been received, the number of read commands received (or not yet received) is recorded.
As another example, if both the read command B11 and the read command B12 have been received, obtaining valid data of an entry accessing the L2P table indicated by the read command A1 from the data Q11 corresponding to the read command B11 and the data Q12 corresponding to the read command B12 in the fifth cache, merging the valid data of the entry with the null bit data to obtain the entry, and storing the entry in the sixth cache; and deleting the data corresponding to the read command B11 and the read command B12 and the tag Q from the fifth cache.
As another example, if both the read command B11 and the read command B12 have been received, the protocol information 11 corresponding to the read command B11 and the protocol information 12 corresponding to the read command B12 are deleted from the fourth cache.
As another example, in response to storing the entry of the access L2P table indicated by the read command A1 in the sixth cache, a response to the read command A1 is generated according to the protocol information 2 corresponding to the read command A1 and the entry obtained from the sixth cache, and stored in the seventh cache; and deleting the protocol information 2 corresponding to the read command A1 and the protocol information 11 corresponding to the read command B11 and the protocol information 12 corresponding to the read command B12 from the second cache, and deleting the entry from the sixth cache.
By way of further example, in response to storing the response to the read command A1 in the seventh cache, the response to the read command A1 is retrieved from the seventh cache and sent to the master, and the response to the read command A1 is deleted from the seventh cache.
By way of further example, in response to generating the read command B11 and the read command B12 from the read command A1, the address index of the read command A1 is deleted from the first cache.
By deleting the data in the caches in a timely manner, the utilization of the caches is improved, so that large cache capacities are not required and the processing of multiple concurrent read commands can still be supported. For example, if the L2P accelerator supports simultaneous processing of at most N read commands from the host, the capacity of the sixth cache needs to hold at most N L2P table entries and their markers Q, the capacity of the third cache needs to hold at most 2N responses from the memory controller, and the capacities of the fourth cache and the fifth cache need to hold at most 2N pieces of data each, while the seventh cache only needs to accommodate the response to one master read command, and the capacities of the first cache and the second cache need to hold at most N pieces of data each. Optionally, the capacity of each cache is made smaller than the above values to reduce cost without significantly reducing the L2P accelerator's capacity for concurrently processing master read commands.
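The sizing rule above can be written out as a small helper. This is a sketch only; the factor of 2 reflects the running example in which each read command A spawns at most two read commands B, and the cache labels are invented for readability:

```python
# Hypothetical sizing sketch for an L2P accelerator that processes at most
# n host read commands concurrently, each spawning at most max_b_per_a
# memory read commands (2 in the running A1 -> B11/B12 example).
def cache_capacities(n: int, max_b_per_a: int = 2) -> dict:
    return {
        "first (address indexes)": n,
        "second (A/B id relations)": n,
        "third (raw memory responses)": max_b_per_a * n,
        "fourth (protocol information)": max_b_per_a * n,
        "fifth (partial entry data)": max_b_per_a * n,
        "sixth (entries + marker Q)": n,
        "seventh (host responses)": 1,
    }

caps = cache_capacities(4)
assert caps["third (raw memory responses)"] == 8
assert caps["seventh (host responses)"] == 1
```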
As an example, the second cache, the fourth cache, the fifth cache, and the sixth cache are each a cache array, where the cache array includes a plurality of cache units, and each cache unit is configured to store, respectively, the relationship between the identification information of one read command A and the identification information of its one or more corresponding read commands B, the protocol information of the one or more read commands B corresponding to each read command A, the response data corresponding to each read command B, or the entry and the updated marker corresponding to each read command A.
FIG. 6C is a schematic diagram illustrating a plurality of caches in a logic circuit according to an embodiment of the application.
Taking as an example a logic circuit that receives two read commands A1 and A2 from the master device, generates a read command B11 and a read command B12 according to the read command A1, and generates a read command B21 and a read command B22 according to the read command A2: in fig. 6C, the cache array corresponding to the second cache includes two cache units, which cache the relationship between the identification information of the read command A1 and the identifications of the read command B11 and the read command B12, and the relationship between the identification information of the read command A2 and the identifications of the read command B21 and the read command B22, for example <identifications of A1, B11, B12> and <identifications of A2, B21, B22>. The cache array corresponding to the fourth cache includes four cache units, which respectively cache the protocol information of the read command B11, the read command B12, the read command B21, and the read command B22, where the protocol information of the read command B11 includes a B11 identifier, the protocol information of the read command B12 includes a B12 identifier, the protocol information of the read command B21 includes a B21 identifier, and the protocol information of the read command B22 includes a B22 identifier. The cache array corresponding to the fifth cache includes four cache units, which respectively cache the response data of the read command B11, the read command B12, the read command B21, and the read command B22.
The cache array corresponding to the sixth cache includes two cache units, which respectively cache the L2P table entry to be accessed by the read command A1 together with its updated marker, and the L2P table entry to be accessed by the read command A2 together with its updated marker.
In addition, it should be understood that in fig. 6C, after the L2P table entry to be accessed by the read command A1 is cached in the sixth cache, the B11 identifier and the B12 identifier cached in the fourth cache, the relationship between the identifier information of the read command A1 and the identifiers of the read command B11 and the read command B12 in the second cache, and the B11 data and the B12 data in the fifth cache may be deleted. Similarly, after the L2P table entry to be accessed by the read command A2 is cached in the sixth cache, the B21 identifier and the B22 identifier cached in the fourth cache, the relationship between the identifier information of the read command A2 and the identifiers of the read command B21 and the read command B22 in the second cache, and the B21 data and the B22 data in the fifth cache may be deleted.
It should be noted that, for the sake of simplicity, the present application represents some methods and embodiments thereof as a series of acts and combinations thereof, but it will be understood by those skilled in the art that the aspects of the present application are not limited by the order of acts described. Thus, those skilled in the art will appreciate, in light of the present disclosure or teachings, that certain steps thereof may be performed in other sequences or concurrently. Further, those skilled in the art will appreciate that the embodiments described herein may be considered as alternative embodiments, i.e., wherein the acts or modules involved are not necessarily required for the implementation of some or all aspects of the present application. In addition, the description of some embodiments of the present application is also focused on according to the different schemes. In view of this, those skilled in the art will appreciate that portions of one embodiment of the application that are not described in detail may be referred to in connection with other embodiments.
In particular implementations, based on the disclosure and teachings of the present application, those skilled in the art will appreciate that several embodiments of the present disclosure may be implemented in other ways not disclosed herein. For example, in terms of the foregoing embodiments of the electronic device or apparatus, the units are split in consideration of the logic function, and there may be another splitting manner when actually implemented. For another example, multiple units or components may be combined or integrated into another system, or some features or functions in the units or components may be selectively disabled. In terms of the connection relationship between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components. In some scenarios, the foregoing direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustical, magnetic, or other forms of signal transmission.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An accelerator for processing read commands for coupling a host device to a memory and accelerating access by the host device to an L2P table in the memory, comprising: logic circuitry and a plurality of caches;
the logic circuit responds to the received multiple first read commands sent by the main equipment, generates one or more second read commands according to each first read command, and stores the relation between first identification information for identifying the first read command and second identification information for identifying the corresponding one or more second read commands in a cache; and in response to receiving the first data fed back from the memory based on each second read command, processing the first data to obtain second data and first protocol information, determining one or more second data corresponding to each first read command and generating second protocol information according to the first protocol information and the relation, and processing the one or more second data corresponding to each first read command to obtain an entry of the L2P table indicated by each first read command; and sending the second protocol information and its indicated entry accessing the L2P table to the master as a response to each first read command;
Wherein the memory comprises a plurality of aligned storage units, each storage unit for storing second data comprising partial data of one or more entries of an L2P table; partial data of one or more entries of the L2P table is not required to be stored in the memory in a byte boundary alignment manner; the first protocol information includes second identification information, and the second protocol information includes first identification information.
2. The accelerator of claim 1, wherein the logic circuit comprises: an analyzing module, a computing module, and a command generating module; wherein,
the analyzing module is used for responding to the received first reading commands, analyzing each first reading command to obtain first identification information and address indexes corresponding to the first reading command, and caching the address indexes into first caches of the caches;
the computing module is coupled with the first cache and is used for computing and obtaining addresses of memories accessed by one or more second read commands corresponding to each first read command according to the address index; setting second identification information corresponding to each second read command, and storing the relation between the first identification information and one or more pieces of second identification information corresponding to the first identification information into a second cache;
The command generating module is coupled with the calculating module, generates one or more second read commands corresponding to each first read command according to the address and the second identification information, and sends the one or more second read commands to the memory.
3. The accelerator of claim 2, wherein the parsing module, the computing module, and the command generation module process a plurality of first read commands in parallel.
4. The accelerator according to any one of claims 1 to 3, wherein
the logic circuit, in response to the first data fed back for each second read command, parses the first data to obtain the first protocol information and the second data corresponding to the first data, or the first protocol information, the second data, and a marker; wherein the marker is used to identify the position, in the corresponding second data, of the last bit of the valid data of the entry of the L2P table indicated by each first read command;
the logic circuit, further in response to determining according to the first protocol information that the second data of all second read commands corresponding to any first read command has been received, parses the valid data of the entry corresponding to the first read command from the second data corresponding to all the second read commands corresponding to the first read command, and generates, according to the first identification information of the first read command and the valid data of the entry corresponding to the first read command, response data serving as a response to the first read command.
5. The accelerator of claim 4, wherein the logic circuit further comprises a merging unit; the merging unit merges the valid data of the entry corresponding to each first read command with null bit data according to the length of the entry of the L2P table indicated by each first read command, to obtain the entry of the L2P table indicated thereby, wherein the valid data is located in the first N consecutive bits of the entry, and N is the length of the valid data;
generating second protocol information according to the first protocol information of the one or more second read commands corresponding to each first read command, and combining the entry and the second protocol information to obtain data serving as a response to the first read command.
6. The accelerator of claim 5, wherein the merge unit is to update the tag in response to obtaining the entry of the access L2P table indicated by each first read command such that the updated tag indicates the entry of the access L2P table indicated by each first read command or the location of the last bit of valid data of the entry of the access L2P table indicated by each first read command.
7. The accelerator of claim 6, wherein the logic circuitry is to store a response to a second read command in the third cache in response to receiving the response to the second read command;
in response to storing a response to a second read command in the third cache, the first protocol information obtained from the response to the second read command in the third cache is stored in the fourth cache;
in response to storing the response to the second read command in the third cache, acquiring second identification information from the response to the second read command in the third cache and storing the second identification information in the fourth cache;
wherein, according to the relation between the first identification information and the second identification information stored in the second cache, in response to receiving the responses to all the second read commands generated according to any first read command, the merging unit obtains valid data of an entry accessing the L2P table indicated by the first read command from one or more second data in the fifth cache, merges the valid data of the entry with the null bit data according to the length of the entry to obtain the entry, and stores the entry in the sixth cache; and an update marker indicating a location in the sixth cache of the last bit of the entry or valid data of the entry obtained; updated tags are also stored in the sixth cache;
And obtaining an entry from a sixth cache, obtaining second protocol information corresponding to the first protocol information from a second cache, generating a response to the first read command according to the entry and the second protocol information, and storing the response in the seventh cache.
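The data path of claim 7 can be sketched in Python as follows; the class and cache names, the dict-based cache model, and the ordering of second read commands by ascending identifier are all assumptions made for illustration, not the claimed hardware:

```python
from collections import defaultdict

class ReadResponseMerger:
    """Collects responses to second read commands (modeling the fifth
    cache) and, once all responses for a first read command have
    arrived, merges their payloads into one zero-padded L2P table
    entry with its tag (modeling the sixth cache)."""

    def __init__(self, entry_len: int):
        self.entry_len = entry_len
        self.expected = {}                    # first_id -> set of second ids
        self.fifth_cache = defaultdict(dict)  # first_id -> {second_id: data}
        self.sixth_cache = {}                 # first_id -> (entry, tag)

    def register(self, first_id, second_ids):
        """Record which second read commands a first read command was
        split into (the relation kept in the second cache)."""
        self.expected[first_id] = set(second_ids)

    def on_response(self, first_id, second_id, data: bytes):
        """Store one response; merge once the set is complete."""
        self.fifth_cache[first_id][second_id] = data
        if set(self.fifth_cache[first_id]) == self.expected[first_id]:
            valid = b"".join(self.fifth_cache[first_id][s]
                             for s in sorted(self.expected[first_id]))
            entry = valid + b"\x00" * (self.entry_len - len(valid))
            tag = len(valid) - 1              # last position of valid data
            self.sixth_cache[first_id] = (entry, tag)
            del self.fifth_cache[first_id]    # claim 9: drop merged second data
            return entry
        return None
```

A first read command split into two second read commands produces no entry on the first response and the merged, zero-padded entry on the second, after which the second data is deleted from the fifth cache as claim 9 requires.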
8. The accelerator of claim 7, wherein,
in response to obtaining the second identification information from the response to the second read command in the third cache and storing it in the fourth cache, the second data and the tag are obtained from that response in the third cache and stored in the fifth cache; and the response to the second read command is deleted from the third cache.
9. The accelerator of claim 8, wherein,
if responses to all second read commands corresponding to the first read command from which they were generated have been received, the valid data of the L2P table entry indicated by that first read command is obtained from all the second data of those second read commands in the fifth cache, the valid data of the entry is merged with null-bit data to obtain the entry, and the entry is stored in the sixth cache; and all second data and tags of those second read commands are deleted from the fifth cache.
10. The accelerator of any of claims 7-9, wherein the second cache, the fourth cache, the fifth cache, or the sixth cache is a cache array comprising a plurality of cache units, each cache unit to store, respectively, the relation between the first identification information of each first read command and its corresponding one or more items of second identification information, the first protocol information of the one or more second read commands corresponding to each first read command, the response data corresponding to each second read command, or the entry and updated tag corresponding to each first read command.
CN202210316721.8A 2022-03-28 2022-03-28 Accelerator for processing read command Pending CN116860664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210316721.8A CN116860664A (en) 2022-03-28 2022-03-28 Accelerator for processing read command


Publications (1)

Publication Number Publication Date
CN116860664A true CN116860664A (en) 2023-10-10

Family

ID=88225541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210316721.8A Pending CN116860664A (en) 2022-03-28 2022-03-28 Accelerator for processing read command

Country Status (1)

Country Link
CN (1) CN116860664A (en)

Similar Documents

Publication Publication Date Title
JP4908017B2 (en) DMA data transfer apparatus and DMA data transfer method
US20150142996A1 (en) Dma transmission method and system thereof
US20150143031A1 (en) Method for writing data into storage device and storage device
CN109164976B (en) Optimizing storage device performance using write caching
CN111061655B (en) Address translation method and device for storage device
CN113032293A (en) Cache manager and control component
CN110275757A (en) Multi-protocol storage is provided using system abstraction layer
CN113468083B (en) Dual-port NVMe controller and control method
CN115048034A (en) Storage space mapping method and device for SGL (serving gateway L)
CN110554833A (en) Parallel processing of IO commands in a storage device
CN114253461A (en) Mixed channel memory device
CN113485643B (en) Method for data access and controller for data writing
CN112148626A (en) Storage method and storage device for compressed data
CN116860664A (en) Accelerator for processing read command
CN111290975A (en) Method for processing read command and pre-read command by using unified cache and storage device thereof
CN111290974A (en) Cache elimination method for storage device and storage device
CN114840447B (en) Accelerator
CN110096452A (en) Non-volatile random access memory and its providing method
CN213338708U (en) Control unit and storage device
CN112988623B (en) Method and storage device for accelerating SGL (secure gateway) processing
CN117009259A (en) L2P accelerator
CN116643999A (en) L2P accelerator
CN116955228A (en) Accelerator for processing write command
CN113031849A (en) Direct memory access unit and control unit
CN113515234B (en) Method for controlling data read-out to host and controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination