CN116955228A - Accelerator for processing write command - Google Patents

Accelerator for processing write command

Info

Publication number
CN116955228A
Authority
CN
China
Prior art keywords
table entry
memory
data
cache
valid data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210412614.5A
Other languages
Chinese (zh)
Inventor
王玉巧
王祎磊
谷兴杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Starblaze Technology Co ltd
Original Assignee
Chengdu Starblaze Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Starblaze Technology Co ltd filed Critical Chengdu Starblaze Technology Co ltd
Priority to CN202210412614.5A priority Critical patent/CN116955228A/en
Publication of CN116955228A publication Critical patent/CN116955228A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus

Abstract

The application relates to an accelerator for processing write commands. The accelerator comprises a write channel, and the write channel comprises a logic circuit and a plurality of caches. The logic circuit acquires a first address index and a first L2P table entry from a first write command, and determines, according to the first address index and the number of valid data bits, one or more first memory addresses and a first position in the memory of the first bit of the valid data of the first L2P table entry. Regardless of whether the writing of the valid data of the first L2P table entry into the memory has completed, the logic circuit acquires a second address index and a second L2P table entry from a second write command, and determines, according to the second address index and the number of valid data bits of the second L2P table entry, one or more second memory addresses and a second position in the memory of the first bit of the valid data of the second L2P table entry.

Description

Accelerator for processing write command
Technical Field
The present application relates generally to the field of memory. More particularly, the present application relates to an accelerator that processes write commands.
Background
FIG. 1 illustrates a block diagram of a solid state storage device. The solid state storage device 102 is coupled to a host to provide storage capability for the host. The host and the solid state storage device 102 may be coupled in a variety of ways, including but not limited to SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIe (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, a wireless communication network, and the like. The host may be an information processing device capable of communicating with the storage device in the manners described above, such as a personal computer, tablet, server, portable computer, network switch, router, cellular telephone, or personal digital assistant. The storage device 102 (hereinafter, the solid state storage device is simply referred to as the storage device) includes an interface 103, a control component 104, one or more NVM chips 105, and a DRAM (Dynamic Random Access Memory) 110.
The NVM chip 105 includes common storage media such as NAND flash memory, phase change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetoresistive Random Access Memory), and RRAM (Resistive Random Access Memory).
The interface 103 may be adapted to exchange data with the host by way of, for example, SATA, IDE, USB, PCIe, NVMe, SAS, Ethernet, or Fibre Channel.
The control component 104 is used for controlling data transfer among the interface 103, the NVM chips 105, and the DRAM 110, as well as for memory management, mapping of host logical addresses to flash physical addresses, wear leveling, bad block management, etc. The control component 104 can be implemented in a variety of ways, such as software, hardware, firmware, or a combination thereof; for example, the control component 104 can take the form of an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a combination thereof. The control component 104 may also include a processor or controller in which software is executed to manipulate the hardware of the control component 104 to process IO (Input/Output) commands. The control component 104 may also include a memory controller for coupling to the DRAM 110 and accessing the data of the DRAM 110.
The control component 104 includes a flash interface controller (also referred to as a media interface, media interface controller, or flash channel controller). The flash interface controller is coupled to the NVM chip 105 and issues commands to the NVM chip 105 in a manner conforming to the interface protocol of the NVM chip 105, so as to operate the NVM chip 105 and receive the command execution results output from the NVM chip 105. Known NVM chip interface protocols include "Toggle", "ONFI", and the like.
On NVM storage media, data is typically stored and read on a page-by-page basis, while data is erased in blocks. A block (also referred to as a physical block) on an NVM storage medium includes a plurality of pages. Pages on the storage medium (referred to as physical pages) have a fixed size, e.g., 17664 bytes; physical pages may also have other sizes.
In a storage device, an FTL (Flash Translation Layer) is utilized to maintain mapping information from logical addresses to physical addresses. The logical addresses constitute the storage space of the storage device as perceived by upper-layer software such as the operating system. The physical address is an address of a physical storage unit used to access the solid state storage device. In the prior art, address mapping can also be implemented using an intermediate address form: for example, logical addresses are mapped to intermediate addresses, which in turn are further mapped to physical addresses. Optionally, a host accessing the storage device provides the FTL.
The table structure storing the mapping information from logical addresses to physical addresses is called the FTL table (also called the L2P table). Typically, the entries of the FTL table record address mappings at the granularity of data units of a specified size (e.g., 512 bytes, 2KB, or 4KB) in the storage device.
As the capacity of storage devices increases, the size of the L2P table increases in order to record more storage units, thereby requiring more memory to accommodate the L2P table. In order to address the additional storage units, the size of each entry of the L2P table also needs to increase. For example, a 32-bit L2P table entry can address 2^32 (4G) data units. If each data unit is 4KB in size, 2^32 data units correspond to a storage capacity of 16TB, and the L2P table itself is accordingly 4B × 4G = 16GB in size (each entry is 4 bytes and there are 4G entries), requiring at least 16GB of memory space. Storage devices, however, come in a variety of capacities. If, for example, the capacity provided to the user is 4TB, the L2P table would be 4GB in size. Yet to provide 4TB of storage space with 4KB data units, 1G (= 2^30) data units suffice, so the number of data units that the L2P table needs to manage is 2^30; each entry of the corresponding L2P table then only needs to be 30 bits in size, and the L2P table occupies 30 × 2^30 bits (i.e., 3.75GB, less than 4GB). However, constrained by the memory chip and the CPU addressing scheme, the CPU accesses data at widths that are integer multiples of 32 bits or of bytes, and memory chips are likewise typically addressed at data widths that are integer multiples of a byte. Thus, if L2P table entries are, for example, 30 bits in size, then although the overall L2P table size is reduced, the entries that cross byte boundaries require, for example, 2 or more bus or memory accesses to be loaded into the CPU, thereby significantly increasing the time to load an L2P table entry and limiting the performance of the storage device.
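To make the alignment problem concrete, the following minimal C sketch (the 30-bit entry width and 32-bit bus width are the example values from this passage, not fixed by the application) computes how many 32-bit bus accesses are needed to load each entry of a tightly packed table:

```c
#include <stdint.h>
#include <stdio.h>

#define ENTRY_BITS 30u   /* compressed L2P entry width (example value) */

int main(void) {
    /* For the first few LBAs, locate the entry in the packed table and
     * count the 32-bit bus words its bits fall into. */
    for (uint64_t lba = 0; lba < 4; lba++) {
        uint64_t first_bit  = lba * ENTRY_BITS;       /* absolute bit offset */
        uint64_t last_bit   = first_bit + ENTRY_BITS - 1;
        uint64_t first_word = first_bit / 32;
        uint64_t last_word  = last_bit / 32;
        printf("LBA %llu: bits %llu..%llu, bus accesses: %llu\n",
               (unsigned long long)lba, (unsigned long long)first_bit,
               (unsigned long long)last_bit,
               (unsigned long long)(last_word - first_word + 1));
    }
    return 0;
}
```

Already at LBA 1 the entry occupies bits 30..59 and straddles two bus words, so loading it costs two accesses instead of one.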
In order to reduce the memory space occupied by the L2P table, and to reduce or eliminate the impact of non-byte-aligned L2P table entries on the CPU or other on-chip devices accessing those entries, while still supporting storage devices of multiple capacities, a compressed L2P table is typically provided. The entry size of the compressed L2P table may not be an integer multiple of a byte, and the compressed L2P table entries are packed tightly in memory, with no unused memory space left between entries for byte alignment. To shield the CPU or other devices from the effects of using the compressed L2P table, however, the CPU or other devices still access the L2P table in their existing manner, aligned to bytes or to integer multiples of bytes.
Disclosure of Invention
As discussed in the Background, growth in storage device capacity enlarges the L2P table, and a compressed L2P table whose entries are packed without byte alignment reduces the memory the table occupies; but entries that cross byte boundaries are expensive for the CPU to load, and the compression must remain transparent to the CPU and other devices that access the L2P table in a byte-aligned manner. The embodiments of the present application therefore seek to accelerate the writing of L2P table entry data into the memory with a hardware accelerator, so as to offload the CPU and improve the performance of the storage device.
According to a first aspect of the present application, there is provided a first accelerator for processing write commands, for coupling a master device with a memory and accelerating the storage of valid data of L2P table entries indicated by write commands sent by the master device into the L2P table of the memory, the accelerator comprising: a write channel, wherein the write channel comprises a logic circuit and a plurality of caches;

in response to a first write command sent by the master device, the logic circuit acquires a first address index and a first L2P table entry from the first write command, and stores the first address index and the data of the first L2P table entry in a cache; determines, according to the first address index and the number of valid data bits of the first L2P table entry, one or more first memory addresses and a first position in the memory of the first bit of the valid data of the first L2P table entry, and stores the mapping relationship between the identification information of the first write command and the first memory address and the first position in a cache; and stores the valid data of the first L2P table entry in a cache;

in response to receiving a second write command, and regardless of whether the operation of writing the valid data of the first L2P table entry into the memory has completed, the logic circuit acquires a second address index and a second L2P table entry from the second write command, and determines, according to the second address index and the number of valid data bits of the second L2P table entry, one or more second memory addresses and a second position in the memory of the first bit of the valid data of the second L2P table entry; stores the mapping relationship between the identification information of the second write command and the second memory address and the second position in a cache; and stores the valid data of the second L2P table entry in a cache;

the logic circuit writes the valid data of the first L2P table entry and/or the valid data of the second L2P table entry from a cache into the memory, wherein the address of the valid data of the first L2P table entry in the memory corresponds to the first memory address and the first position, and the address of the valid data of the second L2P table entry in the memory corresponds to the second memory address and the second position.
According to the first accelerator of the first aspect of the present application, there is provided a second accelerator according to the present application, further comprising: a read channel; in response to the valid data of the first L2P table entry not being byte-aligned, or being byte-aligned but with the first position not located at the start position of its corresponding storage unit in the memory, the read channel generates one or more first read commands according to the first memory address, stores a first mapping relationship between the identification information of the first write command and the identification information of its corresponding one or more first read commands in a cache, and sends the one or more first read commands to the memory; and/or, in response to the valid data of the second L2P table entry not being byte-aligned, or being byte-aligned but with the second position not located at the start position of its corresponding storage unit in the memory, generates one or more second read commands according to the second memory address, stores a second mapping relationship between the identification information of the second write command and the identification information of its corresponding one or more second read commands in a cache, and sends the one or more second read commands to the memory;
in response to receiving the first response data of all the first read commands fed back from the memory, the logic circuit combines the valid data of the first L2P table entry with part of the data in the first response data according to the first position to obtain first data; or, in response to receiving the second response data of all the second read commands fed back from the memory, combines the valid data of the second L2P table entry with part of the data in the second response data according to the second position to obtain second data; and generates third data according to the protocol information stored in the cache and the first data or the second data, and sends the third data to the memory;
the memory comprises a plurality of aligned storage units, wherein each storage unit is used for storing valid data of a plurality of entries of the L2P table; valid data of the plurality of entries of the L2P table need not be stored in the memory in byte boundary alignment.
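The read-modify-write merge described for the second accelerator can be sketched as a bit-level insertion: the valid data is placed into the response data at the recorded first-bit position, leaving the neighboring entries' bits untouched. The 64-bit window width, LSB-first bit numbering, and the constraint pos + nbits <= 64 are assumptions of this sketch, not requirements stated by the application.

```c
#include <stdint.h>

/* Insert `nbits` bits of an entry's valid data into a 64-bit window of
 * read-response data, starting at bit position `pos`.
 * Assumes pos + nbits <= 64; a cross-unit entry would need a second
 * merge into the next storage unit's response data. */
static uint64_t merge_entry(uint64_t response, uint64_t valid_data,
                            unsigned pos, unsigned nbits)
{
    uint64_t field = (nbits < 64) ? ((1ULL << nbits) - 1) : ~0ULL;
    uint64_t mask  = field << pos;   /* bits owned by this entry */
    return (response & ~mask) | ((valid_data << pos) & mask);
}
```

For example, merge_entry(resp, entry, 4, 30) overwrites bits 4..33 of the response data with the entry's 30 valid bits and preserves everything else; the result is then packaged with protocol information and written back.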
According to the second accelerator of the first aspect of the present application, there is provided a third accelerator of the present application, wherein, in response to the first write command not having corresponding one or more first read commands, the first mapping relationship stored in the cache is the identification information of the first write command and information indicating that no first read commands were generated; and/or, in response to the second write command not having corresponding one or more second read commands, the second mapping relationship stored in the cache is the identification information of the second write command and information indicating that no second read commands were generated.
According to the first to third accelerators of the first aspect of the present application, there is provided a fourth accelerator of the present application, wherein, in response to the valid data of the first L2P table entry being byte-aligned and the first bit of the valid data of the first L2P table entry being located at the start position of its corresponding storage unit, the logic circuit generates fourth data from the valid data of the first L2P table entry and the protocol information, and sends the fourth data to the memory; and/or, in response to the valid data of the second L2P table entry being byte-aligned and the first bit of the valid data of the second L2P table entry being located at the start position of its corresponding storage unit, generates fifth data from the valid data of the second L2P table entry and the protocol information, and sends the fifth data to the memory.
According to the second to fourth accelerators of the first aspect of the present application, there is provided a fifth accelerator of the present application, wherein, in response to the first L2P table entry and the second L2P table entry being adjacent in the L2P table of the memory, the logic circuit, before the one or more first read commands are issued, splices the valid data of the first L2P table entry with the valid data of the second L2P table entry to obtain one or more pieces of spliced data, and updates the first mapping relationship and the second mapping relationship;

in response to obtaining the one or more pieces of spliced data, the logic circuit causes the read channel to generate, according to the memory address of the spliced data, one or more third read commands to replace the one or more first read commands; the combination of the updated first mapping relationship and second mapping relationship includes the identifiers of all the third read commands, and the identifiers used for a third read command in the first mapping relationship and in the second mapping relationship may be the same or different.
According to the second to fifth accelerators of the first aspect of the present application, there is provided a sixth accelerator of the present application, wherein, after the one or more first read commands are issued, in response to identifying, before generating one or more second read commands from the second memory address, that the one or more second read commands conflict with a memory address indicated by the one or more first read commands, the logic circuit suspends processing of the second write command.
According to the sixth accelerator of the first aspect of the present application, there is provided a seventh accelerator according to the present application, wherein, in response to suspending processing of the second write command, processing of subsequent other write commands is also suspended.
According to a seventh accelerator of the first aspect of the present application, there is provided an eighth accelerator according to the present application, the logic circuit resumes processing of the suspended second write command in response to receiving information that the writing of valid data of the first L2P table entry into the memory is complete.
According to the first to eighth accelerators of the first aspect of the present application, there is provided a ninth accelerator according to the present application, wherein the logic circuit comprises: a parsing module, a computing module, and a packing module; wherein,
the parsing module is used for responding to the received first write command, parsing the first write command to obtain a first address index and a first L2P table entry, caching the first address index into a first cache of the plurality of caches and caching the first L2P table entry into a second cache of the plurality of caches;
the computing module is coupled with the first cache and is used for computing the one or more first memory addresses and the first position according to the first address index and the number of valid data bits; storing the first memory address and the first location in a third cache of the plurality of caches and storing a mapping relationship between identification information of the first write command and the first memory address and the first location in a fourth cache; and caching valid data of the first L2P table entry into a sixth cache;
regardless of whether the writing of the valid data of the first L2P table entry into the memory has completed, the parsing module then receives a second write command, obtains a second address index and a second L2P table entry from the second write command, stores the second address index in the first cache, and stores the second L2P table entry in the second cache;
The computing module is further coupled with the fourth cache, and determines, according to the second address index and the number of valid data bits of the second L2P table entry, one or more second memory addresses and a second position in the memory of the first bit of the valid data of the second L2P table entry; stores the second memory address and the second position in the third cache of the plurality of caches, and stores the mapping relationship between the identification information of the second write command and the second memory address and the second position in the fourth cache; and caches the valid data of the second L2P table entry into the sixth cache;
the packing module is coupled with the second cache, the third cache, the sixth cache, and a fifth cache for caching valid-data byte-alignment information, and writes the valid data of the first L2P table entry and/or the valid data of the second L2P table entry from the sixth cache into the memory, wherein the address of the valid data of the first L2P table entry in the memory corresponds to the first memory address and the first position, and the address of the valid data of the second L2P table entry in the memory corresponds to the second memory address and the second position.
According to a ninth accelerator of the first aspect of the present application, there is provided a tenth accelerator according to the present application, the logic circuit further comprising a merging unit; the merging unit extracts the valid data of the first L2P table entry from the first L2P table entry, stores the valid data of the first L2P table entry in the sixth cache, extracts the valid data of the second L2P table entry from the second L2P table entry, and stores the valid data of the second L2P table entry in the sixth cache.
According to a tenth accelerator of the first aspect of the present application, there is provided an eleventh accelerator of the present application, wherein the merging unit is configured to, in response to a first L2P table entry being adjacent to the second L2P table entry in the L2P table of the memory, splice the valid data of the first L2P table entry with the valid data of the second L2P table entry according to the first location and the second location to obtain one or more spliced data.
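As an illustration of the splicing performed by the merging unit, the sketch below concatenates the valid data of two entries that are adjacent in the L2P table; the 30-bit entry width and LSB-first packing are assumptions carried over from the earlier example, not values fixed by the application.

```c
#include <stdint.h>

#define N_BITS 30u   /* assumed compressed-entry width */

/* Splice the valid data of two adjacent L2P entries into one piece of
 * spliced data; the second entry follows the first with no gap. */
static uint64_t splice_adjacent(uint32_t first_valid, uint32_t second_valid)
{
    uint64_t lo = (uint64_t)first_valid  & ((1ULL << N_BITS) - 1);
    uint64_t hi = (uint64_t)second_valid & ((1ULL << N_BITS) - 1);
    return lo | (hi << N_BITS);   /* 60 contiguous valid bits */
}
```

Writing the 60 spliced bits as a whole is what allows a single, smaller set of read commands to cover both entries, as the subsequent accelerators describe.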
According to an eleventh accelerator of the first aspect of the present application, there is provided a twelfth accelerator of the present application, in response to obtaining one or more pieces of spliced data, and the first write command has corresponding one or more first read commands and/or the second write command has corresponding one or more second read commands, the merging unit further stores identification information of the first write command and identification information of the one or more first read commands in a seventh buffer; and/or storing the identification information of the second write command and the identification information of one or more second read commands in a seventh cache.
According to the eleventh or twelfth accelerator of the first aspect of the present application, there is provided a thirteenth accelerator of the present application, wherein, in response to obtaining one or more pieces of spliced data, and neither the first write command nor the second write command having corresponding one or more read commands, the packing module generates sixth data according to the spliced data and the protocol information, and sends the sixth data to the memory.
According to the eleventh to thirteenth accelerators of the first aspect of the present application, there is provided a fourteenth accelerator of the present application, wherein, in response to obtaining one or more pieces of spliced data, the read channel generates one or more first read commands from the first memory address and/or one or more second read commands from the second memory address, wherein the one or more first read commands do not access the same memory address as the one or more second read commands; in response to receiving the first response data of all the first read commands and/or the second response data of all the second read commands fed back from the memory, the packing module combines the spliced data with part of the data of the first response data and/or the second response data to obtain seventh data; and generates eighth data according to the protocol information stored in the cache and the seventh data, and sends the eighth data to the memory.
According to a fourteenth accelerator of the first aspect of the present application there is provided a fifteenth accelerator according to the present application, in response to obtaining one or more pieces of spliced data, causing the read channel to generate one or more third read commands in dependence on the memory address of the spliced data,
the merging unit updates the first mapping relationship and/or the second mapping relationship, wherein the combination of the updated first mapping relationship and second mapping relationship includes the identifiers of all the third read commands, and the identifiers used for a third read command in the first mapping relationship and in the second mapping relationship may be the same or different.
According to the first to fifteenth accelerators of the first aspect of the present application, there is provided a sixteenth accelerator according to the present application, wherein the plurality of caches include: a first cache, a second cache, a third cache, a fourth cache, a fifth cache, a sixth cache, a seventh cache, an eighth cache, a ninth cache, and a tenth cache. The first cache is used for caching address indexes; the second cache is used for caching L2P table entries indicated by write commands; the third cache is used for caching the memory address corresponding to an L2P table entry and the position in the memory of the first bit of the valid data of the L2P table entry; the fourth cache is used for caching the mapping relationship between the identification information of a write command and the memory address and the position in the memory of the first bit of the valid data of the L2P table entry; the fifth cache is used for caching valid-data byte-alignment information; the sixth cache is coupled with the second cache and caches the valid data of L2P table entries indicated by write commands; the seventh cache is used for caching the identification information of a write command and the identification information of its corresponding one or more read commands; the eighth cache is used for caching protocol information; the ninth cache is coupled with the sixth cache and is used for caching the valid data of L2P table entries; the tenth cache is coupled to the logic circuit and is used for caching the response data of the read commands sent by the read channel.
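For orientation, the roles of the ten caches just enumerated can be summarized as one data structure. The sketch below is illustrative only: all field widths and depths are assumptions, not taken from the application.

```c
#include <stdint.h>

/* Illustrative grouping of the ten caches of the sixteenth accelerator.
 * All widths and depths (16) are assumptions for the sketch. */
typedef struct {
    uint64_t addr_index[16];                       /* 1st: address indexes (LBAs)          */
    uint64_t entry[16];                            /* 2nd: L2P entries from write commands */
    struct { uint64_t addr; uint8_t first_bit; }
             addr_pos[16];                         /* 3rd: memory address + first-bit pos  */
    struct { uint16_t wr_id; uint64_t addr; uint8_t first_bit; }
             wr_map[16];                           /* 4th: write-id -> address/position    */
    uint8_t  align_info[16];                       /* 5th: valid-data byte-alignment info  */
    uint64_t valid_data[16];                       /* 6th: extracted valid data            */
    struct { uint16_t wr_id; uint16_t rd_ids[4]; }
             wr_rd_map[16];                        /* 7th: write-id -> read-command ids    */
    uint32_t protocol_info[16];                    /* 8th: bus (e.g., AXI) protocol info   */
    uint64_t staging[16];                          /* 9th: valid data staged for memory    */
    uint64_t rd_response[16];                      /* 10th: read-command response data     */
} l2p_write_channel_caches;
```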
According to a sixteenth accelerator of the first aspect of the present application, there is provided a seventeenth accelerator of the present application, wherein the logic circuit is responsive to obtaining a first address index and a first L2P table entry from a first write command, storing the first address index in the first cache and buffering the first L2P table entry in the second cache; and determining one or more first memory addresses and the first location according to the first address index and the valid data bit number, and storing the one or more first memory addresses and the first location in the third cache; storing the mapping relation between the identification information of the first write command, the first memory address and the first position in a fourth cache;
in response to caching a first L2P table entry in the second cache, obtaining valid data of the first L2P table entry from the second cache and storing the valid data in a sixth cache;
in response to a second L2P table entry in a second cache being unable to be spliced with a first L2P table entry stored in the sixth cache, moving valid data of the first L2P table entry to the ninth cache; in response to the second L2P table entry in the second cache being able to splice with the first L2P table entry stored in the sixth cache, writing valid data of the second L2P table entry into the sixth cache, and then moving the spliced valid data of the first L2P table entry and the valid data of the second L2P table entry from the sixth cache to the ninth cache;
And writing the valid data of the first L2P table entry and the valid data of the second L2P table entry from a ninth cache into a memory in response to moving the valid data of the first L2P table entry and the valid data of the second L2P table entry to the ninth cache and in response to the first write command and the second write command not having corresponding read commands.
According to the seventeenth accelerator of the first aspect of the present application, there is provided an eighteenth accelerator of the present application, wherein, in response to the first write command having corresponding one or more first read commands and/or the second write command having corresponding one or more second read commands, and in response to the first response data corresponding to all the first read commands and/or the second response data corresponding to all the second read commands being stored in the tenth cache, the valid data of the first L2P table entry is combined with the first response data to obtain first data, and/or the valid data of the second L2P table entry is combined with the second response data to obtain second data; and third data is generated according to the protocol information stored in the cache and the first data or the second data, and the third data is sent to the memory.
According to the sixteenth or seventeenth accelerator of the first aspect of the present application, there is provided a nineteenth accelerator of the present application, which, in response to the first write command having corresponding one or more first read commands, stores a first mapping relationship between the identification information of the first write command and the identification information of the one or more first read commands in the seventh cache; and/or, in response to the second write command having corresponding one or more second read commands, stores a mapping relationship between the identification information of the second write command and the identification information of the one or more second read commands in the seventh cache.
According to a seventeenth to nineteenth accelerator of the first aspect of the present application, there is provided a twentieth accelerator according to the present application, the first mapping relationship stored in the seventh buffer is identification information of a first write command and information that does not generate one or more first read commands in response to the first write command not having the corresponding one or more first read commands; and/or in response to the second write command not having the corresponding one or more second read commands, the second mapping relationship stored in the seventh buffer is the identification information of the second write command and the information of the one or more second read commands not generated.
According to first to twentieth accelerators of the first aspect of the present application, there is provided a twenty-first accelerator according to the present application, the plurality of caches including: the first buffer, the second buffer, the fourth buffer, the sixth buffer and the seventh buffer; the first cache is used for caching index addresses; the second cache is used for caching L2P table entries indicated by the write command; the fourth cache is used for caching the mapping relation between the identification information of the write command and the memory address and the position of the effective data first bit of the L2P table entry in the memory; the sixth cache is coupled with the second cache, and caches valid data of the L2P table entry indicated by the write command; the seventh buffer is used for buffering the identification information of the write command and the identification information of one or more read commands corresponding to the identification information.
According to a sixteenth to twenty-first accelerator of the first aspect of the present application, there is provided a twenty-second accelerator of the present application, wherein the sixth cache includes a first storage unit and a second storage unit, and the first storage unit and the second storage unit are the same size as the storage units in the memory;
in response to the second L2P table entry in the second cache being able to be spliced with the first L2P table entry stored in the sixth cache, the first L2P table entry being adjacent to and preceding the second L2P table entry in the L2P table, and the memory addresses corresponding to the first write command and the second write command being identical, the merging unit stores the valid data of the first L2P table entry into the first storage unit in the sixth cache according to the position of the first bit of the valid data of the first L2P table entry in its corresponding memory storage unit, wherein the position of the first bit of the valid data of the first L2P table entry in the first storage unit is identical to its position in the corresponding memory storage unit;

the merging unit then connects the valid data of the second L2P table entry end to end, in sequence, with the valid data of the first L2P table entry in the first storage unit to obtain one piece of spliced data, wherein the position of the first bit of the valid data of the second L2P table entry in the first storage unit is the same as its position in the corresponding memory storage unit.
According to the twenty-second accelerator of the first aspect of the present application, there is provided a twenty-third accelerator of the present application, wherein, in response to the memory addresses corresponding to the first write command and the second write command being different, the merging unit stores the valid data of the first L2P table entry in the first storage unit according to the position of the first bit of the valid data of the first L2P table entry in its corresponding memory storage unit, so as to obtain first spliced data, and stores the valid data of the second L2P table entry in the second storage unit according to the position of the first bit of the valid data of the second L2P table entry in its corresponding memory storage unit, so as to obtain second spliced data.
According to the twenty-third accelerator of the first aspect of the present application, there is provided a twenty-fourth accelerator of the present application, wherein, in response to the memory addresses corresponding to the first write command and the second write command being partially identical, a part of the valid data of the second L2P table entry is stored in the first storage unit, connected end to end in sequence with the valid data of the first L2P table entry to obtain first spliced data, and the other part of the valid data of the second L2P table entry is stored in the second storage unit to obtain second spliced data.
According to the sixteenth to twenty-fourth accelerators of the first aspect of the present application, there is provided a twenty-fifth accelerator of the present application, wherein, in response to the number of bits between the start position of the first storage unit and the first bit of the valid data of the first L2P table entry being smaller than the number of bits of the valid data of the second L2P table entry, the merging unit stores a part of the valid data of the second L2P table entry in the first storage unit, in the space between the start position and the first bit of the valid data of the first L2P table entry, to obtain a piece of spliced data, moves the obtained spliced data to the ninth cache, and moves the remaining part of the valid data of the second L2P table entry into the sixth cache.
According to a sixteenth to twenty-fifth accelerator of the first aspect of the present application, there is provided a twenty-sixth accelerator according to the present application, in response to caching data of a second L2P table entry in the second cache and the second L2P table entry being unable to be spliced with the first L2P table entry, moving valid data of the first L2P table entry into the ninth cache, and moving valid data of the second L2P table entry into the sixth cache.
According to a sixteenth to twenty-sixth accelerator of the first aspect of the present application, there is provided a twenty-seventh accelerator of the present application, in response to the valid data of the first L2P table entry being identical to a memory address corresponding to the valid data of the second L2P table entry and a position of a first bit in a memory, the merging unit overwrites the valid data of the first L2P table entry in the first storage unit with the valid data of the second L2P table entry to obtain a spliced data.
According to a twenty-seventh accelerator of the first aspect of the present application, there is provided a twenty-eighth accelerator of the present application, wherein the merging unit further moves the data in the first storage unit and/or the second storage unit as a whole into the ninth cache in response to obtaining one or two pieces of spliced data.
According to a twenty-eighth accelerator of the first aspect of the present application, there is provided a twenty-ninth accelerator according to the present application, wherein the sixth cache includes a first storage unit and a second storage unit, and the sizes of the first storage unit and the second storage unit are the same as the sizes of the storage units in the memory;
The merging unit stores the valid data of the first L2P table entry into a first storage unit in the sixth cache according to the position of a first bit of the valid data of the first L2P table entry in a corresponding memory storage unit;
in response to the data of the first L2P table entry being cached in the second cache, the merging unit obtains the valid data of the first L2P table entry from the second cache and stores it into the first storage unit in the sixth cache according to the position of the first bit of the valid data of the first L2P table entry in its corresponding memory storage unit, wherein the position of the first bit of the valid data of the first L2P table entry in the first storage unit is the same as its position in the corresponding memory storage unit; in response to the data of the second L2P table entry being cached in the second cache, and the first L2P table entry being adjacent to the second L2P table entry in the L2P table, the merging unit stores the valid data of the second L2P table entry into the first storage unit and/or the second storage unit in the sixth cache according to the position of the first bit of the valid data of the second L2P table entry in its corresponding memory storage unit, wherein the position of the first bit of the valid data of the second L2P table entry in the first storage unit and/or the second storage unit is the same as its position in the corresponding memory storage unit, and the valid data of the first L2P table entry and the valid data of the second L2P table entry are adjacent and non-overlapping in the sixth cache.
According to a sixteenth to twenty-ninth accelerator of the first aspect of the present application, there is provided the thirty-first accelerator of the present application, after the read channel issues the one or more first read commands, the logic circuit is responsive to the one or more second read commands colliding with memory addresses indicated by the one or more first read commands, to not store identification information of the second write command and identification information of its corresponding one or more second read commands in the seventh cache and to not store valid data of the second L2P table entry in the sixth cache to suspend processing of the second write command, and to also suspend processing of subsequent write commands.
According to a thirty-first accelerator of the first aspect of the present application, there is provided the thirty-first accelerator of the present application, the logic circuit, in response to receiving information that the writing of valid data of the first L2P table entry into the memory is completed, resumes processing of the second write command, stores identification information of the second write command and identification information of one or more second read commands corresponding thereto in a seventh cache, and stores valid data of a second L2P table entry in a sixth cache; and moving the valid data of the second L2P table entry to a ninth cache to write the valid data of the second L2P table entry into the memory.
According to a thirty-first accelerator of the first aspect of the present application, there is provided a thirty-second accelerator according to the present application, wherein the fourth buffer stores a mapping relationship between identification information of one or more write commands and a storage address and a location in the memory of a first bit of valid data of an L2P entry indicated by the storage address in the form of a lookup table.
According to the sixteenth to thirty-second accelerators of the first aspect of the present application, there is provided a thirty-third accelerator according to the present application, wherein the mapping relationship between the identification information of the first write command and the first memory address and the first position is deleted in response to information indicating that the writing of the valid data of the first L2P table entry into the memory is completed.
According to the second to thirty-third accelerators of the first aspect of the present application, there is provided a thirty-fourth accelerator according to the present application, wherein the valid data of the entries of the L2P table are connected end to end in sequence and stored in the storage units in the memory according to the size and address of each storage unit; the valid data of some of the entries are not aligned to storage-unit and/or byte boundaries in the memory.
According to the first to thirty-fourth accelerators of the first aspect of the present application, there is provided a thirty-fifth accelerator according to the present application, the protocol information is AXI protocol information.
According to a second aspect of the present application, there is provided a control component according to the second aspect of the present application, comprising the accelerator according to any one of the first to thirty-fifth accelerators of the first aspect of the present application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application, and that a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a block diagram of a prior art solid state storage device;
FIG. 2A is a schematic diagram of a control unit according to an embodiment of the present application;
FIG. 2B is a schematic diagram of an accelerator according to an embodiment of the present application;
FIG. 2C is a schematic diagram illustrating the conversion between L2P table entries perceived by the host device and L2P table entries stored in the memory provided by the present application;
FIG. 2D illustrates a schematic diagram of an accelerator processing multiple write commands in parallel, provided by an embodiment of the present application;
FIG. 2E illustrates another embodiment of the present application for an accelerator to process multiple write commands in parallel;
FIG. 2F is a diagram showing the combination of valid data of an L2P table entry indicated by a write command and response data fed back by a memory according to an embodiment of the present application;
FIG. 3 is a schematic diagram showing a mapping relationship between updated write command identification information and read command identification information in a process of splicing L2P table entries indicated by a plurality of write commands according to an embodiment of the present application;
FIG. 4A is a schematic diagram of another accelerator according to an embodiment of the present application;
FIG. 4B is a schematic diagram of another accelerator according to an embodiment of the present application;
FIG. 5A is a schematic diagram showing the concatenation of valid data of a plurality of L2P table entries according to an embodiment of the present application;
FIG. 5B is a schematic diagram showing another embodiment of the present application for concatenating valid data from multiple L2P table entries;
FIG. 5C is a schematic diagram showing another embodiment of the present application for concatenating valid data from multiple L2P table entries;
FIG. 5D is a schematic diagram showing another embodiment of the present application for concatenating valid data from multiple L2P table entries;
FIG. 5E is a schematic diagram showing another embodiment of the present application for concatenating valid data from multiple L2P table entries;
FIG. 5F is a schematic diagram showing another embodiment of the present application for concatenating valid data from multiple L2P table entries;
FIG. 6A is a schematic diagram showing a combination of spliced data and response data to obtain combined data according to an embodiment of the present application;
FIG. 6B is a schematic diagram showing another embodiment of the present application in which spliced data and response data are combined to obtain combined data;
FIG. 7 is a schematic diagram of an accelerator for processing multiple write commands by controlling multiple caches according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings, in which it is evident that the described embodiments are some, but not all, embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort are intended to be within the scope of the application.
Fig. 2A shows a schematic structural diagram of a control unit according to an embodiment of the present application.
In fig. 2A, the control unit includes a master device, an accelerator, and a slave device. As an example, the master device is a CPU, a media interface controller, or a processing core; the slave device is a memory controller. The master device and the accelerator, and/or the accelerator and the slave device, are coupled, for example, by a bus. As another example, the accelerator in the embodiment of the present application may be an L2P accelerator, which is configured to accelerate storing the valid data of L2P table entries indicated by write commands sent by the master device into the L2P table of the memory.
The control unit is also coupled to an external memory (the DRAM of fig. 2A), which the memory controller is used to access. By way of example, the accelerator includes a slave device interface and a master device interface, and is coupled to the bus through each of them. Thereby, one or more master devices of the control unit (e.g., the CPU or the media interface controller) access the accelerator via the slave device interface, with the accelerator being accessed as a bus slave device; and via the master device interface, the accelerator accesses, as a bus master device, one or more slave devices of the control unit (e.g., the memory controller).
By way of example, a memory external to the control unit is used to store the L2P table, and the master device may write entry data of the L2P table to the L2P table of the memory. The master device issues a write command indicating entry data of the L2P table to the bus. The bus sends the write command to an accelerator coupled to the bus. The accelerator determines the storage position of the entry data of the L2P table according to the address index indicated in the received write command, and sends the entry data of the L2P table indicated by the write command to the slave device (such as a memory controller) through the bus, the slave device sends the received entry data of the L2P table to the memory, and the memory stores the entry data of the L2P table in the L2P table according to the storage position of the entry data of the L2P table.
Fig. 2B shows a schematic structural diagram of an accelerator according to an embodiment of the application.
In fig. 2B, the accelerator includes a write channel, which refers to the circuitry forming the data path for writing data into the memory. In the solution provided by the embodiments of the present application, in order to reduce the memory space occupied by the L2P table and to reduce or eliminate the effect of non-byte-aligned L2P table entries on the CPU or other on-chip devices accessing those entries, the L2P table stored in the memory is a compressed L2P table, where the size of the entries of the compressed L2P table may not be an integer multiple of a byte, and the compressed L2P table entries are packed tightly in memory, with no unused memory space left between entries for byte alignment.
Further, to enable the memory to store the compressed L2P table, the accelerator provided by the embodiments of the present application processes the data of the L2P table entry indicated by a write command, extracts only the valid data of the L2P table entry, and sends that valid data to the memory for storage; the invalid data of the L2P table entry is not provided to the memory. In addition, the memory includes a plurality of aligned storage units, each storage unit being used for storing the valid data of a plurality of entries of the L2P table, and the valid data of the entries of the L2P table need not be stored in the memory aligned to byte boundaries.
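A minimal sketch of this valid-data extraction, using the example widths from fig. 2C below (M = 64 bits perceived, N = 30 valid bits) and assuming the valid bits occupy the low end of the entry:

```c
#include <stdint.h>

#define N_VALID 30u   /* example valid-data width; not fixed by the application */

/* Strip the padding from an M=64-bit entry as written by the master
 * device, keeping only the N valid (physical address) bits. */
static uint32_t extract_valid_data(uint64_t perceived_entry)
{
    return (uint32_t)(perceived_entry & ((1ULL << N_VALID) - 1));
}
```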
FIG. 2C is a schematic diagram illustrating the conversion between L2P table entries and L2P table entries stored in memory as perceived by a master device in accordance with the present application.
The L2P table stored in the memory (SRAM or DRAM) includes a plurality of entries, the entries of the L2P table being addressed by a logical address (denoted LBA). In fig. 2C, the L2P table entries perceived by the master device correspond one-to-one to the L2P table entries stored in the memory, so the two tables have the same number of entries; e.g., the L2P table includes 8 entries, namely entry 0, entry 1, entry 2, entry 3, entry 4, entry 5, entry 6, and entry 7. An L2P table entry perceived by the master device is M bits in size, and an L2P table entry stored in the memory is N bits in size, where M and N are both positive integers. It should be appreciated that an L2P table entry perceived by the master device may be either an L2P table entry to be written by the master device via a write command or an L2P table entry read by the master device via a read command. The L2P table perceived by the master device is also referred to as the logical L2P table.
To facilitate, for example, a CPU accessing the logical L2P table, M is an integer multiple of bytes (e.g., 8 bytes), so that entries of the logical L2P table are aligned on byte or 8-byte boundaries. In fig. 2C, each entry in the L2P table perceived by the host is M bits in size (M=64 in the example of fig. 2A), and the entries of the L2P table perceived by the host are arranged end to end in the storage space. Indexing the storage space of the L2P table perceived by the host with the logical address (LBA) yields the corresponding L2P table entry; for example, the byte address of the entry is ⌊LBA × size(L2P entry) / 8⌋, where size(L2P entry) denotes the size of an L2P table entry in bits, e.g., 64 bits, and ⌊·⌋ denotes rounding down. Recorded in an entry of the L2P table is an address of the NVM chip (referred to as a physical address, denoted PBA). Since the L2P table as perceived by the master is byte-aligned, the start address of each entry in the storage space falls at the start of a byte (or of an 8-byte group), and the end of an entry falls at the end of a byte (or of an 8-byte group). In the example of fig. 2C, when the CPU accesses the corresponding entry of the L2P table, the address of the entry (64 bits, i.e., 8 bytes) is obtained, for example, as LBA × 8, using the logical address (LBA) as an index.
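For illustration only, the indexing arithmetic described above can be sketched in C. This is a minimal sketch under the stated assumptions (64-bit logical entries), not part of the claimed circuit, and the names are invented:

    #include <stdint.h>

    /* Logical L2P table: entries are M = 64 bits and byte-aligned end to end. */
    #define LOGICAL_ENTRY_BITS 64u

    /* Byte address of entry `lba` = floor(lba * entry_bits / 8); for 64-bit
     * entries this reduces to lba * 8, so a CPU can index the table directly. */
    static inline uint64_t logical_entry_byte_addr(uint64_t lba)
    {
        return (lba * LOGICAL_ENTRY_BITS) / 8u;
    }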
Some or all of each entry in the L2P table perceived by the master device is valid data. When an entry in the L2P table perceived by the master device is entirely valid data, N equals M; when each entry perceived by the host is partly valid data and partly padding, the size N of the L2P table entry stored by the memory equals the size of the valid data in the entry perceived by the host, and the valid data in the entry perceived by the host is determined by the number of data units (e.g., pages) provided by the addressed NVM chip. For example, to address 2^30 data units, N is 30. Generally, if an entry of the L2P table stored by the memory can address one of 2^n data units, then N = n. As an example, in fig. 2C, N = 30. The L2P table stored in the memory holds only the valid data of each entry; the valid data of the entries is stored end to end in the storage space provided by the memory, with no unused storage space between adjacent entries. As a result, the start and/or end positions of some entries in memory do not fall on byte boundaries.
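The bit arithmetic for locating a packed entry can be sketched as follows; this is a hypothetical illustration assuming N = 30 valid bits and 64-bit aligned storage units (the names and the unit size are assumptions, not definitions from the patent):

    #include <stdint.h>

    #define VALID_BITS 30u   /* N: e.g. one of 2^30 addressable data units */
    #define UNIT_BITS  64u   /* size of one aligned storage unit           */

    typedef struct {
        uint64_t first_unit; /* storage unit holding the entry's first bit */
        uint32_t first_bit;  /* offset of that first bit within the unit   */
        uint64_t last_unit;  /* unit holding the last bit; differs from
                                first_unit when the packed entry straddles
                                a storage-unit boundary                    */
    } entry_location;

    static entry_location locate_compressed_entry(uint64_t lba)
    {
        uint64_t start_bit = lba * VALID_BITS;          /* absolute bit address */
        uint64_t end_bit   = start_bit + VALID_BITS - 1u;
        entry_location loc = {
            .first_unit = start_bit / UNIT_BITS,
            .first_bit  = (uint32_t)(start_bit % UNIT_BITS),
            .last_unit  = end_bit / UNIT_BITS,
        };
        return loc;
    }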
Returning to FIG. 2B, by way of example, to send the valid data of an L2P table entry to the memory, the write channel includes a logic circuit and multiple caches. The master device sends a write command to the write channel, denoted as process (2.1); data interaction between the master device and the accelerator can take place via a bus, e.g., an AXI bus. The write command indicates the data of an L2P table entry and an address index (e.g., a logical address LBA). In response to receiving the write command sent by the master device, the logic circuit in the write channel obtains from the write command the address index and the data of the L2P table entry indicated by it, and stores the address index and the data of the L2P table entry in a cache; it then calculates from the address index one or more memory addresses (e.g., one or more storage-unit addresses) at which the valid data of the L2P table entry is to be stored in the memory. After calculating the one or more memory addresses, the logic circuit further determines, from the number of valid data bits of the L2P table entry, the position of the first bit of the entry's valid data within its corresponding storage unit, and stores the one or more memory addresses and the position of the first bit of the valid data in the memory into the cache, denoted as process (2.2).
Further, since the accelerator communicates with both the master device and the slave device over the bus, the data transferred between the accelerator and the master device, and between the accelerator and the slave device, must satisfy the transfer modes defined by the bus protocol (e.g., the AXI protocol). For example, the bus protocol may require that data transferred over the bus be byte-aligned, e.g., that the bit width of the transferred data be an integer multiple of a byte, or an integer multiple of 8 bytes, and so on. Consequently, when the accelerator provided by the application provides the valid data of an L2P table entry to the slave device, if the valid data of the L2P table entry is not byte-aligned, or its first bit is not located at the start of the corresponding storage unit in the memory, the accelerator needs to perform a read-write operation: the accelerator reads data from the memory at the memory address corresponding to the L2P table entry, combines the read data with the valid data of the L2P table entry to obtain combined data (which is byte-aligned), and then rewrites the combined data into the memory. Since the read-write operation involves reading data from the memory, the accelerator in FIG. 2B also includes a read channel, where the read channel refers to the circuit that forms the data path for reading data from the memory. Referring to fig. 2B, the read-write operation comprises processes (2.3) through (2.10). As an example, in response to the valid data being non-byte-aligned, or its first bit not being located at the start of its corresponding storage unit in the memory, the read channel generates one or more read commands from the one or more memory addresses and sends them to the memory. In response to receiving the response data fed back by the memory for each read command, the logic circuit combines the valid data with part of the response data, according to the position of the first bit in the memory, to obtain first data (byte-aligned data); second data is then generated from the first data and the protocol information stored in the cache, and the second data is sent to the memory, denoted as processes (2.9) and (2.10). In addition, after the valid data of the L2P table entry indicated by the write command has been written into the memory, the write channel returns feedback information to the host device indicating that the processing of the write command is complete, denoted as process (2.11).
As another example, if the valid data of the L2P table entry indicated by the write command is byte-aligned and its first bit is located at the start of the corresponding storage unit, the logic circuit directly generates third data from the valid data of the L2P table entry and the protocol information, and sends the third data to the memory according to the memory address, denoted as processes (2.9) and (2.10). That is, in the scheme provided by the embodiment of the application, the read-write operation (processes (2.3) through (2.10)) is performed only when the valid data of the L2P table entry is non-byte-aligned or its first bit is not located at the start of the corresponding storage unit in the memory.
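The combining step of the read-write operation amounts to a masked bit-merge. The following is a minimal C sketch of one storage unit's worth of that merge, under the assumption of 64-bit storage units; an entry straddling a unit boundary would apply it once per unit. The function name and signature are invented for illustration:

    #include <stdint.h>

    /* Merge `nbits` of right-aligned valid data into the 64-bit unit read
     * back from memory, starting at `first_bit`; all surrounding bits keep
     * the values read from memory, so the rewritten unit is byte-aligned. */
    static uint64_t merge_valid_bits(uint64_t unit_from_memory,
                                     uint64_t valid_data,
                                     uint32_t first_bit,
                                     uint32_t nbits)
    {
        uint64_t mask = (nbits >= 64u) ? ~0ull
                                       : (((1ull << nbits) - 1u) << first_bit);
        return (unit_from_memory & ~mask) | ((valid_data << first_bit) & mask);
    }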
Further, the accelerator may need a plurality of cycles from receiving a write command to writing the L2P table entry indicated by that command into the memory, and before one write command has been fully processed, the accelerator may receive one or more further write commands.
FIG. 2D illustrates a schematic diagram of an accelerator processing multiple write commands in parallel, according to an embodiment of the present application.
By way of example, in fig. 2D, the accelerator receives two write commands sent by the host device, write command A1 and write command A2, respectively. The parallel processing mechanism of the accelerator will be described below taking the processing procedure of the write command A1 and the write command A2 by the accelerator as an example.
In fig. 2D, T0 through T2 represent consecutive time periods, where a larger number following the label T indicates a later time; the content below each period represents the operations performed by the write channel in the accelerator during that period.
In the T0 period, the logic circuit receives the write command A1 and parses it to obtain the address index 1 and the L2P table entry 120, then stores the address index 1 and the L2P table entry 120 in the caches of the write channel. It should be understood that the address index 1 and the L2P table entry 120 may be stored in the same cache among the caches of the write channel, or in different caches; this is not limited here. Next, the logic circuit determines one or more memory addresses 1, and the position 1 in the memory of the first bit of the valid data of the L2P table entry 120, according to the address index 1 and the number of valid data bits of the L2P table entries transmitted between the host device and the accelerator, and stores the mapping relationship between the identification information of the write command A1 and the memory addresses 1 and the position 1 in the cache; the valid data of the L2P table entry 120 is also stored in the cache. As an example, the number of valid data bits of the L2P table entries transmitted between the host and the accelerator may be configured in advance.
During the T1 period (a period after the T0 period), the logic circuit also receives the write command A2. In the interval between the T0 period and the T1 period, in response to the valid data of the L2P table entry 120 being byte-aligned with its first bit located at the start of the corresponding storage unit in the memory, the logic circuit may generate data from the valid data of the L2P table entry 120 and the protocol information, send the data to the memory, and receive within this interval the information fed back by the memory that the processing of the write command A1 is complete; in this case the logic circuit finishes processing the write command A1 before receiving the write command A2. Alternatively, the same steps may take place in the interval between the T1 period and the T2 period (a period after the T1 period); in this case the logic circuit has not finished processing the write command A1 before receiving the write command A2. In the scheme provided by the embodiment of the application, when the logic circuit receives the write command A2, it can process the write command A2 regardless of whether the write command A1 has been fully processed. That is, during the T1 period, whether or not the operation of writing the valid data of the L2P table entry 120 into the memory has completed, the logic circuit parses the write command A2 to obtain the address index 2 and the L2P table entry 121, and stores them in the caches of the write channel; as before, the address index 2 and the L2P table entry 121 may be stored in the same cache among the caches of the write channel or in different caches, which is not limited here. Next, the logic circuit determines one or more memory addresses 2, and the position 2 of the first bit of the valid data of the L2P table entry 121, according to the address index 2 and the number of valid data bits of the L2P table entries transmitted between the host and the accelerator, stores the mapping relationship between the identification information of the write command A2 and the memory addresses 2 and the position 2 in the cache, and stores the valid data of the L2P table entry 121 in the cache.
According to an embodiment of the present application, during the T1 period the accelerator can still process the newly received write command A2 regardless of whether the writing of the valid data of the L2P table entry 120 to the memory has completed. The accelerator thus has the ability to process multiple write commands issued by the host device in parallel. Although fig. 2D illustrates only the two write commands A1 and A2 issued by the host, it should be understood that the accelerator may process a greater number of write commands from the host in parallel.
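How such parallelism might be bookkept can be illustrated with a per-command context pool. This is a hypothetical sketch only; the structure, field names, and slot count are assumptions, not the patent's cache organization:

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_INFLIGHT 8   /* assumed depth of concurrently processed writes */

    typedef struct {
        bool     busy;
        uint32_t cmd_id;      /* identification information of the write command */
        uint64_t addr_index;  /* address index (LBA) carried by the command      */
        uint64_t valid_data;  /* valid bits of the indicated L2P table entry     */
        uint64_t mem_addr[2]; /* one or more storage-unit addresses              */
        uint32_t first_bit;   /* position of the valid data's first bit          */
    } write_ctx;

    static write_ctx ctx_pool[MAX_INFLIGHT];

    /* A1 and A2 occupy different slots, so parsing and address calculation
     * for A2 can proceed while A1's memory write is still outstanding. */
    static write_ctx *alloc_ctx(uint32_t cmd_id)
    {
        for (int i = 0; i < MAX_INFLIGHT; i++) {
            if (!ctx_pool[i].busy) {
                ctx_pool[i].busy   = true;
                ctx_pool[i].cmd_id = cmd_id;
                return &ctx_pool[i];
            }
        }
        return 0; /* all slots in use: back-pressure the master */
    }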
Further, as taught above, the accelerator needs to perform a read-write operation when writing the valid data of an L2P table entry into the memory, either because the valid data of the L2P table entry is not byte-aligned, or because it is byte-aligned but its first bit is not located at the start of its corresponding storage unit in the memory. When the accelerator processes multiple write commands in parallel, read-write operations for multiple write commands may be involved. For ease of understanding, the read-write operation performed while the accelerator processes multiple write commands in parallel is briefly described below, again taking the processing of the write commands A1 and A2 by the accelerator as an example.
FIG. 2E illustrates another schematic diagram of an accelerator processing multiple write commands in parallel, according to an embodiment of the present application.
As an example, in fig. 2E, the accelerator receives two write commands sent by the host device, namely, a write command A1 and a write command A2, respectively, where the accelerator generates two commands for accessing the memory, namely, a read command B11 and a read command B12, according to the write command A1, and the accelerator generates two commands for accessing the memory, namely, a read command B21 and a read command B22, according to the write command A2.
In fig. 2E, T0 to T4 represent a plurality of time periods that are continuous in time, and the contents below each time period represent operations performed by the components in the accelerator during that time period.
In the T0 period, the logic circuit receives the write command A1 and parses it to obtain the address index 1 and the L2P table entry 120. The logic circuit then determines, from the address index 1 and the number of valid data bits of the L2P table entries transmitted between the host device and the accelerator, one or more memory addresses 1 and the position 1 in the memory of the first bit of the valid data of the L2P table entry 120. The read channel then generates the read command B11 and the read command B12 from the one or more memory addresses 1 corresponding to the write command A1, stores the mapping relationship between the identification information of the write command A1 and the identification information of the read commands B11 and B12 in a cache, e.g., in the form < A1 identification, B11 identification, B12 identification >, and sends the read command B11 and the read command B12 to the memory.
In the T1 period (a period after the T0 period), the logic circuit also receives the write command A2 and parses it to obtain the address index 2 and the L2P table entry 121. The logic circuit then determines, from the address index 2 and the number of valid data bits of the L2P table entries transmitted between the host and the accelerator, one or more memory addresses 2 and the position 2 in the memory of the first bit of the valid data of the L2P table entry 121. The read channel then generates the read command B21 and the read command B22 from the one or more memory addresses 2, stores the mapping relationship between the identification information of the write command A2 and the identification information of the read commands B21 and B22 in the cache, and sends the read command B21 and the read command B22 to the memory. At this time, since the write command A1 has not yet been fully processed, the cache stores the mapping between the identification information of the write command A1 and the identification information of the read commands B11 and B12 in addition to the mapping between the identification information of the write command A2 and the identification information of the read commands B21 and B22.
In the period T2 (the period T2 is a period after the period T1), the accelerator receives the response data corresponding to the read command B11, and stores the response data corresponding to the read command B11 in the cache.
In the T3 period (a period after the T2 period), the accelerator receives the response data corresponding to the read command B12 and stores it in the cache. At this time, the response data corresponding to both the read command B11 and the read command B12 is held in the cache. Since the accelerator has now received all the data to be read from the memory for the write command A1, it combines the valid data of the L2P table entry 120 with part of the response data corresponding to the read commands B11 and B12, generates new data from the combined data and the protocol information, and sends the new data to the memory. Because all the data to be read for the write command A1 has been obtained, the mapping between the identification information of the write command A1 and the identification information of the read commands B11 and B12 can be deleted from the cache, while the mapping between the identification information of the still-unprocessed write command A2 and the identification information of the read commands B21 and B22 remains.
In the T4 period (a period after the T3 period), the accelerator receives the response data corresponding to the read commands B21 and B22 and stores it in the cache. At this time, the cache holds the response data corresponding to the read commands B21 and B22, i.e., all the data to be read for the write command A2 has been received; the accelerator combines the valid data of the L2P table entry 121 with part of the response data corresponding to the read commands B21 and B22, generates new data from the combined data and the protocol information, and sends the new data to the memory. Because all the data to be read for the write command A2 has been obtained, the mapping between the identification information of the write command A2 and the identification information of the read commands B21 and B22 can be deleted from the cache. Since both the write command A1 and the write command A2 have now been processed, no identification information of pending write commands remains in the cache.
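The bookkeeping of the < write command identification → read command identifications > mappings across the periods T2 through T4 can be sketched as a pending-response counter. This is a hypothetical illustration only; two read commands per write command are assumed, as in the example:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool     valid;      /* mapping currently present in the cache */
        uint32_t write_id;   /* e.g. A1 or A2                          */
        uint32_t read_id[2]; /* e.g. B11/B12 or B21/B22                */
        uint32_t pending;    /* read responses still outstanding       */
    } rw_mapping;

    /* Called once per response; when the last response for a write command
     * arrives, its mapping is deleted from the cache, as in T3 and T4. */
    static void on_read_response(rw_mapping *m, uint32_t read_id)
    {
        if (!m->valid)
            return;
        for (int i = 0; i < 2; i++) {
            if (m->read_id[i] == read_id && m->pending > 0) {
                if (--m->pending == 0)
                    m->valid = false;
                return;
            }
        }
    }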
As another example, if the valid data of the L2P table entry 120 indicated by the write command A1 is byte-aligned and its first bit is located at the start of the corresponding storage unit in the memory, there is no need to generate one or more read commands for the write command A1; the identification information of the write command A1, together with the information that no read command was generated, e.g., < identification of write command A1, no read command generated >, still needs to be stored in the cache. Similarly, if the valid data of the L2P table entry 121 indicated by the write command A2 is byte-aligned and its first bit is located at the start of the corresponding storage unit in the memory, it is not necessary to generate one or more read commands for the write command A2, and the identification information of the write command A2, together with the information that no read command was generated, e.g., < identification of write command A2, no read command generated >, still needs to be stored in the cache.
FIG. 2F is a diagram showing the combination of the valid data of the L2P table entry indicated by the write command and the response data fed back by the memory according to the embodiment of the present application.
By way of example, in FIG. 2F, the write command A1 received by the accelerator indicates an L2P table entry 120, where the L2P table entry 120 is 64 bits in size, its valid data is 30 bits in size, and the storage unit in which the valid data of the L2P table entry is stored is also 64 bits (bits 0-63). The valid data of the L2P table entry 120 is to be stored at bits 31 through 60 of the first storage unit in the memory; that is, the valid data of the L2P table entry 120 is non-byte-aligned, and its first bit is located at bit 31 of the first storage unit, not at the start of its corresponding storage unit. The accelerator is therefore required to perform a read-write operation when writing the valid data of the L2P table entry 120 into the memory. The read channel generates a read command from the address of the first storage unit and sends it to the memory controller; the memory controller reads the 64-bit data of the first storage unit from the memory and sends it to the accelerator; and the accelerator combines the valid data of the L2P table entry 120 with bits 0 through 30 and bits 61 through 63 of the first storage unit to obtain data A, in which the valid data of the L2P table entry 120 occupies bits 31 through 60.
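The figure's numbers make the mask concrete. Below is a small, self-contained C illustration of combining the 30 valid bits into bits 31 through 60 of the 64-bit unit; the sample values are invented, and only the bit positions follow the example:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t from_memory = 0x0123456789ABCDEFull;   /* response data (made up) */
        uint64_t valid_30    = 0x2AAAAAAAull;           /* 30-bit value (made up)  */

        uint64_t mask   = ((1ull << 30) - 1u) << 31;    /* bits 31..60             */
        uint64_t data_a = (from_memory & ~mask)         /* keep bits 0..30, 61..63 */
                        | ((valid_30 << 31) & mask);    /* insert the valid data   */

        printf("combined data A = 0x%016llx\n", (unsigned long long)data_a);
        return 0;
    }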
Further, in the scheme provided by the embodiment of the application, for the L2P table entries indicated by write commands, the accelerator may perform one write operation per received write command, writing the valid data of the L2P table entry indicated by that command into the memory. Alternatively, to save bus bandwidth resources or reduce the volume of memory accesses, the accelerator may wait until it has received the L2P table entries indicated by a plurality of write commands, and then perform one write operation that writes the valid data of the plurality of L2P table entries into the memory. As another example, after receiving the L2P table entries indicated by the plurality of write commands, the accelerator may splice the valid data of the plurality of L2P table entries to obtain one or more pieces of spliced data and write them into the memory with one write operation; or it may write the valid data of the plurality of L2P table entries into the memory directly, without splicing, according to the memory address corresponding to each L2P table entry and the position of the first bit of its valid data in the memory. The splicing process is described below, taking the processing of the write command A1 and the write command A2 by the accelerator as an example.
Fig. 3 is a schematic diagram showing the updated mapping relationships between write command identification information and read command identification information when the L2P table entries indicated by a plurality of write commands are spliced, in the scheme provided by the embodiment of the present application.
For example, the L2P table entry 120 corresponding to the write command A1 and the L2P table entry 121 corresponding to the write command A2 are adjacent L2P table entries in the L2P table, so the accelerator may splice the L2P table entry 120 and the L2P table entry 121. In addition, the read channel generates a read command B11 and a read command B12 according to the memory address 1 corresponding to the write command A1 in the process of processing the write command A1, and stores the mapping relationship between the identification information of the write command A1 and the identification information of the read command B11 and the read command B12 in the cache; in the process of processing the write command A2, the read channel generates a read command B21 and a read command B22 according to the memory address 2 corresponding to the write command A2, and stores the mapping relationship between the identification information of the write command A2 and the identification information of the read command B21 and the read command B22 in the cache.
Before the accelerator issues the read command B11 and the read command B12 to the memory, the logic circuit splices the valid data of the L2P table entry 120 and the valid data of the L2P table entry 121 to obtain one or more pieces of spliced data (the specific splicing process and results are described below), and updates the mapping relationship between the identification information of the write command A1 and the identification information of the read commands B11 and B12, and the mapping relationship between the identification information of the write command A2 and the identification information of the read commands B21 and B22. For example, in fig. 3, the logic circuit may cause the read channel to generate the read command C1 and the read command C2 from the memory addresses corresponding to the spliced data, replacing the read commands B11, B12, B21 and B22, so that the updated mapping relationships contain the identification information of the read commands C1 and C2, namely: identification information of write command A1 → identification information of read commands C1 and C2; identification information of write command A2 → identification information of read commands C1 and C2. It should be understood that the identification information of the read commands corresponding to the write command A1 in the updated mapping relationship may be the same as or different from the identification information of the read commands corresponding to the write command A2.
To replace the read commands B11, B12, B21 and B22 with the read commands C1 and C2, it is necessary to wait a period of time (denoted as threshold Tw) after generating the read commands B11 and B12, to identify whether a further write command A2 that can be merged with the write command A1 is received. Alternatively, instead of replacing the read commands B11, B12, B21 and B22 with the read commands C1 and C2, it is determined, after the read commands B21 and B22 are generated, whether the data to be read by the read commands B21 and B22 can already be obtained by the read commands B11 and/or B12. If the data read by B11/B12 already contains the data to be read by B21/B22, the read commands B21/B22 need not be issued to the memory; only the read commands B11/B12 are issued, thereby reducing the occupation of bus bandwidth and the access load on the memory. It will be appreciated that if the data to be read by only one of the read commands B21/B22 is contained in the data read by B11/B12, then only the read command whose read data is not contained by the preceding read commands (B11/B12) is issued to the memory. Accordingly, the mapping relationships recorded in the cache need not be updated, i.e., < identification information of write command A1, identification information of read commands B11 and B12 > and < identification information of write command A2, identification information of read commands B21 and B22 > remain.
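The coverage test behind this alternative reduces to comparing storage-unit addresses. A hypothetical C sketch follows, assuming each read command is identified by the storage-unit address it fetches (names invented):

    #include <stdbool.h>
    #include <stdint.h>

    /* Returns true when a later read command (e.g. B21) targets a storage
     * unit already fetched by earlier reads (e.g. B11/B12); such a read
     * need not be issued to the memory again. */
    static bool read_is_covered(const uint64_t *issued_units, int n_issued,
                                uint64_t wanted_unit)
    {
        for (int i = 0; i < n_issued; i++)
            if (issued_units[i] == wanted_unit)
                return true;
        return false;
    }

On this sketch, B21 would be suppressed when its storage unit equals that of B11 or B12, while B22 would still be issued if its unit is not covered.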
Also for example, in the parallel processing of the write command A1 and the write command A2, after the read commands B11 and B12 have been issued to the memory, and when the latency for merging write commands (threshold Tw) has been exceeded, or the write command corresponding to the read-write operation of the write command A1 has already been issued to the memory, then, in response to generating the read commands B21 and B22 from the memory address 2, the accelerator recognizes a conflict between the memory addresses corresponding to the read commands B11 and B12 and the memory addresses indicated by the read commands B21 and B22, that is, a conflict between the memory addresses corresponding to the read commands B11 and B12 and the one or more memory addresses 2 corresponding to the write command A2; for example, the memory address indicated by the read command B11 is the same as one memory address 2. The accelerator then suspends the processing of the write command A2, to avoid interference between the write commands A1 and A2 that would result in data errors. In addition, in response to suspending the processing of the write command A2, the accelerator also suspends the processing of other write commands received after the write command A2. By way of further example, the accelerator resumes the processing of the suspended write command A2 in response to receiving information that the processing of the write command A1 is complete, e.g., information that the writing of the valid data of the L2P table entry 120 to the memory is complete.
Fig. 4A shows a schematic structural diagram of another accelerator according to an embodiment of the present application.
By way of example, in FIG. 4A, the logic circuitry in the write channel includes a parsing module, a computing module, and a packing module. The parsing module, in response to receiving the write command A1, parses the write command to obtain the indicated address index 1 and the data of the L2P table entry 120, and stores the address index 1 and the data of the L2P table entry 120 in a cache. The computing module is coupled with the cache and determines, from the address index 1 and the number of valid data bits, the memory address 1 at which the valid data of the L2P table entry 120 is to be stored in the memory and the position 1 of the first bit of the valid data in the memory; it stores the memory address 1 and the position 1 in the cache, and stores the mapping relationship between the identification information of the write command A1 and the memory address 1 and the position 1 in the cache.
The parsing module, upon receiving the write command A2, obtains the address index 2 and the L2P table entry 121 from the write command A2 and stores them in the cache, regardless of whether the writing of the valid data of the L2P table entry 120 into the memory has completed. The computing module determines the one or more memory addresses 2 and the position 2 in the memory of the first bit of the valid data of the L2P table entry 121, according to the address index 2 and the number of valid data bits of the L2P table entry 121; it stores the memory addresses 2 and the position 2 in the cache, stores the mapping relationship between the identification information of the write command A2 and the memory addresses 2 and the position 2 in the cache, and caches the valid data of the L2P table entry 121.
The packing module is coupled with the cache and writes the valid data of the L2P table entry 120 and/or the valid data of the L2P table entry 121 from the cache into the memory, where the location of the valid data of the L2P table entry 120 in the memory corresponds to the memory address 1 and the position 1, and the location of the valid data of the L2P table entry 121 corresponds to the memory address 2 and the position 2. For example, if the memory address 1 is 0 and the position 1 is bit 10, the memory stores the valid data of the L2P table entry 120 in the storage unit occupying bytes 0 through 7 (bits 0 through 63), with the first bit of the valid data located at bit 10 of that storage unit.
Also for example, in fig. 4A, the logic circuit in the write channel includes a merging unit in addition to the parsing module, the computing module, and the packing module. For example, after a write command is received and parsed to obtain an L2P table entry, the merging unit extracts the valid data from the L2P table entry and stores it in the cache. For another example, when the accelerator performs one write operation that writes the valid data of a plurality of L2P table entries into the memory, the merging unit may extract the valid data from each L2P table entry and splice the valid data of the plurality of entries, according to the position in the memory of the first bit of each entry's valid data, to obtain one or more pieces of spliced data, which it stores in the cache so that the accelerator can write them into the memory.
As another example, after waiting to receive the L2P table entries indicated by a plurality of write commands, the accelerator splices the valid data of the plurality of L2P table entries to obtain one or more pieces of spliced data and writes them into the memory by performing one write operation. Writing the one or more pieces of spliced data into the memory may itself require a read-write operation; in that case the read channel generates one or more new read commands from the memory addresses of the spliced data, and the merging unit updates the mapping relationship between the identification of each write command and its corresponding read command identifications, so that the updated mapping relationships contain the identifications of all of the new read commands.
Fig. 4B shows a schematic structural diagram of another accelerator according to an embodiment of the application.
As an example, in fig. 4B, the plurality of caches includes a first cache, a second cache, a third cache, a fourth cache, a fifth cache, a sixth cache, a seventh cache, an eighth cache, a ninth cache, and a tenth cache. The first cache is used to cache index addresses; the second cache is used to cache L2P table entries indicated by write commands; the third cache is used to cache the memory address corresponding to an L2P table entry and the position in the memory of the first bit of the entry's valid data; the fourth cache is used to cache the mapping relationship between the identification information of a write command and that memory address and position; the fifth cache is used to cache valid data byte-alignment information; the sixth cache is coupled with the second cache and caches the valid data of the L2P table entries indicated by write commands; the seventh cache is used to cache the identification information of write commands and the identification information of the one or more read commands corresponding to each; the eighth cache is used to cache protocol information; the ninth cache is coupled with the sixth cache and is used to cache the valid data of L2P table entries; and the tenth cache is coupled with the logic circuit and is used to cache the response data of read commands delivered by the read channel.
By way of further example, when the accelerator performs one write operation per received write command, the sixth cache may store the valid data of individual L2P table entries separately. When the accelerator performs one write operation for a plurality of received write commands, the sixth cache may store the valid data of the plurality of L2P table entries separately, or may store one or more pieces of spliced data obtained by splicing the valid data of the plurality of L2P table entries; here "separately" means, in contrast to the spliced data, that the valid data of the plurality of L2P table entries is not spliced. Also by way of example, to store one or more pieces of spliced data, the sixth cache includes one or more storage units.
The size of a storage unit in the sixth cache is the same as the size of a storage unit in the memory. Moreover, the position of spliced data within a storage unit of the sixth cache is the same as the position the valid data of each L2P table entry will occupy within the corresponding storage unit in the memory. For example, if the valid data of the L2P table entry 120 is to be stored at bits 0 through 29 of a certain storage unit in the memory and the valid data of the L2P table entry 121 at bits 30 through 59 of that storage unit, then in the sixth cache the valid data of the L2P table entry 120 is placed at bits 0 through 29 of the first storage unit and the valid data of the L2P table entry 121 at bits 30 through 59 of the first storage unit.
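This mirrored layout means the staging step is the same masked insert that is later performed toward memory. A minimal C sketch under the same assumptions as the earlier sketches (64-bit units; names invented):

    #include <stdint.h>

    #define UNIT_BITS 64u

    /* Place right-aligned valid data into a sixth-cache staging unit at the
     * same bit position it will occupy in the corresponding memory unit. */
    static void stage_valid_bits(uint64_t *staging_unit, uint64_t valid_data,
                                 uint32_t first_bit, uint32_t nbits)
    {
        uint64_t mask = (nbits >= UNIT_BITS)
                            ? ~0ull
                            : (((1ull << nbits) - 1u) << first_bit);
        *staging_unit = (*staging_unit & ~mask)
                      | ((valid_data << first_bit) & mask);
    }

    /* Example from the text: entry 120 at bits 0..29, entry 121 at 30..59:
     *   stage_valid_bits(&unit1, valid_120, 0u, 30u);
     *   stage_valid_bits(&unit1, valid_121, 30u, 30u);                     */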
Also by way of example, in fig. 4B, the plurality of caches may instead include only a first cache, a second cache, a fourth cache, a sixth cache, and a seventh cache: the first cache is used to cache index addresses; the second cache is used to cache L2P table entries indicated by write commands; the fourth cache is used to cache the mapping relationship between the identification information of a write command and the memory address and the position in the memory of the first bit of the entry's valid data; the sixth cache is coupled with the second cache and caches the valid data of the L2P table entries indicated by write commands; and the seventh cache is used to cache the identification information of write commands and the identification information of the one or more read commands corresponding to each. In this variant, the valid data byte-alignment information, the protocol information, and the like are known to the accelerator in advance.
Still by way of example, in FIG. 4B, the ninth cache includes a plurality of storage units, each, for example, the same size as a storage unit of the sixth cache. Valid data of an L2P table entry that requires a read-write operation is placed in the ninth cache to await the data read back from the memory by that operation; valid data of an L2P table entry that requires no read-write operation can be written onward into the memory after entering the ninth cache.
For ease of understanding, the operation of the accelerator of fig. 4B is briefly described below.
In FIG. 4B, the master device sends a write command A1 to the accelerator. The accelerator receives the write command A1; the parsing module in the logic circuit of the write channel parses the write command A1 to obtain the address index 1 and the data of the L2P table entry 120, stores the address index 1 in the first cache, and stores the data of the L2P table entry 120 in the second cache. The computing module in the logic circuit, which is coupled with the first cache, then obtains the address index 1 from the first cache, calculates from the address index 1 and the number of valid data bits the one or more memory addresses 1 at which the valid data of the L2P table entry 120 is to be stored in the memory and the position 1 of the first bit of the valid data in the memory, stores the one or more memory addresses 1 and the position 1 in the third cache, and stores the mapping relationship between the identification information of the write command A1 and the memory addresses 1 and the position 1 in the fourth cache. The merging unit obtains the valid data of the L2P table entry 120 from the second cache and stores it in the sixth cache.
The master device sends a write command A2 to the accelerator. The accelerator receives the write command A2; the parsing module in the logic circuit of the write channel parses the write command A2 to obtain the address index 2 and the data of the L2P table entry 121, stores the address index 2 in the first cache, and stores the data of the L2P table entry 121 in the second cache. The computing module in the logic circuit, coupled with the first cache, then obtains the address index 2 from the first cache, calculates from the address index 2 and the number of valid data bits the one or more memory addresses 2 at which the valid data of the L2P table entry 121 is to be stored in the memory and the position 2 of the first bit of the valid data in the memory, stores the one or more memory addresses 2 and the position 2 in the third cache, and stores the mapping relationship between the identification information of the write command A2 and the memory addresses 2 and the position 2 in the fourth cache.
Further, in response to the L2P table entry 121 in the second cache not being spliceable with the L2P table entry 120 stored in the sixth cache, the merging unit moves the valid data of the L2P table entry 120 to the ninth cache. In response to the L2P table entry 121 in the second cache being spliceable with the L2P table entry 120 stored in the sixth cache, the merging unit writes the valid data of the L2P table entry 121 into the sixth cache and moves the spliced valid data of the L2P table entry 120 and the L2P table entry 121 from the sixth cache to the ninth cache. Then, in response to the valid data of the L2P table entry 120 and the L2P table entry 121 having been moved to the ninth cache, and in response to the write command A1 and the write command A2 having no corresponding read commands, the valid data of the L2P table entry 120 and the L2P table entry 121 is written from the ninth cache into the memory.
When no read-write operation is required in the processing of the write command A1 and/or the write command A2, the packing module obtains the valid data or the spliced data from the ninth cache and the protocol information from the eighth cache, generates data from the valid data or spliced data and the protocol information, and sends the data to the memory controller, which then sends the data to the memory. In addition, after the accelerator sends the data to the memory controller, the accelerator sends a feedback message to the host device indicating that the write command processing is complete. For example, after generating the data, the packing module may further add a tag to it to obtain processed data and send the processed data to the memory, the tag identifying the position of the last valid bit within the processed data. In addition, the computing module stores in the seventh cache the identification information of the write command A1 together with the information that no read command was generated; and/or the identification information of the write command A2 together with the information that no read command was generated.
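The tag mentioned above can be pictured as a small sideband field accompanying the generated data. The following struct is purely a hypothetical rendering; its layout and names are not specified by the text:

    #include <stdint.h>

    typedef struct {
        uint64_t data;     /* generated (byte-aligned) data word             */
        uint8_t  last_bit; /* position of the last meaningful bit, 0..63, so
                              a consumer can ignore trailing pad bits        */
    } tagged_word;

    static tagged_word pack_with_tag(uint64_t data, uint8_t last_bit)
    {
        tagged_word w = { data, last_bit };
        return w;
    }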
Further, when a read-write operation is required in the processing of the write command A1 and/or the write command A2, the command generating module in the read channel is triggered to generate one or more read commands according to the valid data byte-alignment information stored in the fifth cache; the identification of the write command A1 and the identifications of its corresponding one or more read commands, and/or the identification of the write command A2 and the identifications of its corresponding one or more read commands, are recorded in the seventh cache. The read channel then sends the one or more read commands to the memory controller, which sends them to the memory; the memory feeds back response data for each read command, the memory controller sends the response data to the accelerator, and the write channel stores the response data in the tenth cache. When packing the data, the packing module obtains not only the valid data or spliced data from the ninth cache and the protocol information from the eighth cache, but also the response data from the tenth cache; it combines the valid data or spliced data with the response data to obtain combined data, generates data from the combined data and the protocol information, and sends the data to the memory controller, which sends it to the memory.
Fig. 5A illustrates a schematic diagram of splicing valid data of a plurality of L2P table entries according to an embodiment of the present application.
As an example, in fig. 5A, the size of the sixth cache is 128 bits, comprising two storage units, namely storage unit 1 and storage unit 2, each 64 bits in size, where storage unit 1 spans bits 0 to 63 and storage unit 2 spans bits 64 to 127. After waiting to receive the data of the L2P table entry 120 and the L2P table entry 121, the accelerator performs one write operation to write the valid data of the L2P table entry 120 and the L2P table entry 121 into the memory, where the L2P table entry 120 is adjacent to and precedes the L2P table entry 121 in the L2P table, each entry is 64 bits in size, and the valid data of each is M bits in size, with 1 ≤ M ≤ 64. If the storage units corresponding to the L2P table entry 120 and the L2P table entry 121 are the same storage unit in the memory, the first bit of the valid data of the L2P table entry 120 is at bit 0 of the storage unit and the first bit of the valid data of the L2P table entry 121 is at bit M of the storage unit. If the accelerator receives the L2P table entry 120 before the L2P table entry 121, the merging unit extracts the valid data of the L2P table entry 120 from the second cache and stores it in storage unit 1 from bit 0 to bit M-1; after extracting the valid data of the L2P table entry 121, it stores that data in storage unit 1 from bit M to bit 2M-1, obtaining one piece of spliced data whose size is 2M bits and whose position in storage unit 1 is from bit 0 to bit 2M-1. It should be understood that, when the storage units corresponding to the L2P table entry 120 and the L2P table entry 121 are the same storage unit in the memory, the first bit of the valid data of the L2P table entry 120 need not be located at bit 0 of the storage unit, but may be at any bit L of the storage unit, where 1 ≤ L ≤ 63-2M; this is not described further here.
Fig. 5B is a schematic diagram illustrating another embodiment of splicing valid data of multiple L2P table entries.
For example, in FIG. 5B, the L2P table entry 120 is adjacent to and precedes the L2P table entry 121 in the L2P table; each entry is 64 bits in size and its valid data is M bits in size, where 1 ≤ M ≤ 64. If the storage units corresponding to the L2P table entry 120 and the L2P table entry 121 are different storage units in the memory, the first bit of the valid data of the L2P table entry 120 is located at bit 64-M of one storage unit, and the first bit of the valid data of the L2P table entry 121 is located at bit 0 of the other storage unit, the storage unit storing the valid data of the L2P table entry 120 being adjacent in the memory to the storage unit storing the valid data of the L2P table entry 121. If the accelerator receives the L2P table entry 120 before the L2P table entry 121, the merging unit extracts the valid data of the L2P table entry 120 from the second cache and stores it in storage unit 1 from bit 64-M to bit 63; after extracting the valid data of the L2P table entry 121, it stores that data in storage unit 2 from bit 0 to bit M-1, obtaining two pieces of spliced data: one, of size M, stored in storage unit 1 from bit 64-M to bit 63; and one, of size M, stored in storage unit 2 from bit 0 to bit M-1.
Fig. 5C is a schematic diagram illustrating another embodiment of the present application for splicing valid data of multiple L2P table entries.
By way of example, in FIG. 5C, the storage units in the memory corresponding to the L2P table entry 120 and the L2P table entry 121 are partially the same: one part of the valid data of the L2P table entry 120 is stored in one storage unit of the memory, and the other part is stored, together with the valid data of the L2P table entry 121, in another storage unit. For example, the first Q bits of the valid data of the L2P table entry 120 are stored in the preceding storage unit, and the remaining M-Q bits are stored in the same storage unit of the memory as the valid data of the L2P table entry 121, where 1 ≤ Q ≤ M. In this case, when splicing the valid data of the L2P table entry 120 and the valid data of the L2P table entry 121, the merging unit obtains two pieces of spliced data: one, stored in storage unit 1 of the sixth cache, is the first Q bits of the valid data of the L2P table entry 120; the other, stored in storage unit 2 of the sixth cache, is the last M-Q bits of the L2P table entry 120 spliced with the valid data of the L2P table entry 121.
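The straddling placement of entry 120 (Q bits in one unit, M-Q bits in the next) can be sketched as a split masked insert. This is a hypothetical C illustration assuming 64-bit units and an entry that genuinely straddles the boundary; the names are invented:

    #include <assert.h>
    #include <stdint.h>

    #define UNIT_BITS 64u

    /* Split an M-bit value across two adjacent 64-bit units: entry bits
     * 0..lo-1 land in unit1 at first_bit..63 (the Q bits of the example),
     * and entry bits lo..nbits-1 land in unit2 at 0..hi-1 (the M-Q bits).
     * Requires 0 < first_bit < 64 and first_bit + nbits > 64. */
    static void splice_straddling(uint64_t *unit1, uint64_t *unit2,
                                  uint64_t valid_data, uint32_t first_bit,
                                  uint32_t nbits)
    {
        assert(first_bit > 0u && first_bit < UNIT_BITS);
        assert(nbits <= UNIT_BITS && first_bit + nbits > UNIT_BITS);

        uint32_t lo = UNIT_BITS - first_bit;  /* bits landing in unit1 */
        uint32_t hi = nbits - lo;             /* bits landing in unit2 */
        uint64_t lo_mask = ((1ull << lo) - 1u) << first_bit;
        uint64_t hi_mask = (1ull << hi) - 1u;

        *unit1 = (*unit1 & ~lo_mask) | ((valid_data << first_bit) & lo_mask);
        *unit2 = (*unit2 & ~hi_mask) | ((valid_data >> lo) & hi_mask);
    }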
Fig. 5D is a schematic diagram illustrating another embodiment of splicing valid data of multiple L2P table entries.
For example, in FIG. 5D, the L2P table entry 120 is adjacent to and precedes the L2P table entry 121 in the L2P table; each entry is 64 bits in size and its valid data is M bits in size, where 1 ≤ M ≤ 64. Suppose the storage units in the memory corresponding to the L2P table entry 120 and the L2P table entry 121 are partially the same: one part of the valid data of the L2P table entry 120 is stored in the same storage unit as the valid data of the L2P table entry 121, and the other part is stored in the preceding storage unit (see fig. 5C); for example, the first Q bits of the valid data of the L2P table entry 120 are stored in the preceding storage unit, and the remaining M-Q bits are stored in the same storage unit as the valid data of the L2P table entry 121, where 1 ≤ Q ≤ M. If the accelerator receives the L2P table entry 121 before the L2P table entry 120, the merging unit extracts the valid data of the L2P table entry 121 from the second cache and stores it in storage unit 1 from bit M-Q to bit 2M-Q-1. Since the L2P table entry 120 precedes the L2P table entry 121 in the L2P table, the merging unit must place the valid data of the L2P table entry 120 ahead of the valid data of the L2P table entry 121 in the sixth cache; however, the idle/invalid bits ahead of the valid data of the L2P table entry 121 in storage unit 1 amount to only M-Q bits, while the valid data of the L2P table entry 120 is M bits. The merging unit therefore splices only the last M-Q bits of the valid data of the L2P table entry 120 ahead of the valid data of the L2P table entry 121, storing them in storage unit 1 from bit 0 to bit M-Q-1, so that the spliced data occupies storage unit 1 from bit 0 to bit 2M-Q-1.
Optionally, for the first Q bits of the valid data of the L2P table entry 120: after the spliced data in storage unit 1 has been moved to the ninth cache, storage unit 1 is emptied, and the cleared storage unit 1 is used to hold the first Q bits of the valid data of the L2P table entry 120.
Fig. 5E is a schematic diagram illustrating another embodiment of the present application for splicing valid data of multiple L2P table entries.
For example, in FIG. 5E, the L2P table entry 120 and the L2P table entry 121 are each 64 bits in size, and their valid data is M bits in size, where 1 ≤ M ≤ 64. The storage unit in the memory corresponding to the L2P table entry 120 is the same as that corresponding to the L2P table entry 121, and the first bit of the valid data of the L2P table entry 120 and the first bit of the valid data of the L2P table entry 121 are located at the same position in the storage unit, for example at bit L, where 0 ≤ L ≤ 63-M; that is, the L2P table entry 120 and the L2P table entry 121 are the same L2P table entry in the L2P table. If the accelerator receives the L2P table entry 121 before the L2P table entry 120, the merging unit extracts the valid data of the L2P table entry 121 from the second cache and stores it in storage unit 1; after extracting the valid data of the L2P table entry 120, the merging unit overwrites the valid data of the L2P table entry 121 at its location in storage unit 1 with the valid data of the L2P table entry 120, obtaining one piece of spliced data located in storage unit 1 and the same size as the valid data of either entry. Similarly, if the accelerator receives the L2P table entry 120 before the L2P table entry 121, the valid data of the L2P table entry 120 in storage unit 1 is overwritten with the valid data of the L2P table entry 121 to obtain the spliced data. In other words, the spliced data is the valid data of the L2P table entry 120 or the valid data of the L2P table entry 121, whichever arrived later.
Fig. 5F is a schematic diagram illustrating another embodiment of splicing valid data of multiple L2P table entries.
For example, in FIG. 5F, the L2P table entry 120 and the L2P table entry 121 are each 64 bits in size, and their valid data is M bits in size, where 1 ≤ M ≤ 64. The storage units in the memory corresponding to the L2P table entry 120 and the L2P table entry 121 are different, and the two entries are not adjacent in the L2P table. If the accelerator receives the L2P table entry 120 before the L2P table entry 121, the merging unit extracts the valid data of the L2P table entry 120 from the second cache and stores it in storage unit 1, for example starting from bit L of the storage unit, where 0 ≤ L ≤ 63-M. Since the L2P table entry 121 is not adjacent to the L2P table entry 120 in the L2P table, the merging unit cannot, after extracting the valid data of the L2P table entry 121, place it next to the valid data of the L2P table entry 120 in the sixth cache; it therefore does not splice the valid data of the L2P table entry 120 with the valid data of the L2P table entry 121. Instead, it stores the valid data of the L2P table entry 120 in the sixth cache and transfers it to the ninth cache, and then stores the valid data of the L2P table entry 121 in the sixth cache and transfers it to the ninth cache.
Further, after the logic circuit of the write channel in the accelerator has stored the valid data of an L2P table entry, or one or more pieces of spliced data, in the sixth cache, that data must still be stored into the memory. As mentioned above, since data is transferred between the accelerator and the slave device (e.g., the memory controller) via the bus, the transferred data must satisfy the bus protocol, e.g., the valid data of the transferred L2P table entry must be byte-aligned or 8-byte-aligned. When the valid data of the transferred L2P table entry is not byte-aligned, or the first bit of the valid data is not located at the start of the corresponding storage unit, a read-write operation must be performed. For ease of understanding, the read-write operation is briefly described below in connection with the scenarios of fig. 5A to 5B above.
For the scenario shown in FIG. 5A, as an example, if the valid data size M of the L2P table entry 120 and the L2P table entry 121 is not an integer multiple of a byte, e.g., M = 30, the valid data of the L2P table entry 120 and the L2P table entry 121 is not byte-aligned, and a read-write operation needs to be performed. In this case, the command generating module in the write channel generates a read command according to the addresses of the storage units in the memory corresponding to the L2P table entry 120 and the L2P table entry 121 and sends the read command to the memory; the memory returns the data of those storage units (the response data) to the logic circuit according to the read command; and the logic circuit combines the spliced data in the storage unit 1 of the sixth cache with the response data to obtain combined data (see FIG. 6A) and sends the combined data to the slave device (e.g., the memory controller). As another example, if the valid data size M of the L2P table entry 120 and the L2P table entry 121 is an integer multiple of a byte, for example M = 24, but the first bit of the valid data of the L2P table entry 120 is not located at the 0th bit of the storage unit, for example it is located at the 2nd bit, then the first bit of the valid data is not at the start position of the corresponding storage unit, and a read-write operation is likewise needed; the procedure is similar to the above and is not repeated here. By contrast, if the valid data size M of the L2P table entry 120 and the L2P table entry 121 is an integer multiple of a byte and the first bit of the valid data of the L2P table entry 120 is located at the 0th bit of the storage unit, the data corresponding to the storage unit 1 in the ninth cache is sent directly to the slave device without a read-write operation, so that the data in the storage unit 1 is stored in the memory. For the scenario shown in FIG. 5E, a piece of spliced data is likewise cached in the sixth cache, and the specific process is similar to that of FIG. 5A, which is not repeated here.
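The alignment check and the read-modify-write combination described above can be sketched as follows. The single-word memory model and the helper names issue_read, issue_write and store_valid_data are assumptions made for illustration; they are not interfaces defined by this application:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the bus transactions to the slave device (memory
 * controller); a single 64-bit word models one storage unit in memory. */
static uint64_t fake_memory = 0xFFFFFFFFFFFFFFFFULL;
static uint64_t issue_read(uint64_t addr)          { (void)addr; return fake_memory; }
static void issue_write(uint64_t addr, uint64_t v) { (void)addr; fake_memory = v; }

/* Decide whether storing "valid" (width bits, starting at first_bit of its
 * storage unit) needs a read-write operation, and perform the combine. */
static void store_valid_data(uint64_t mem_addr, uint64_t valid,
                             int first_bit, int width)
{
    bool byte_aligned   = (width % 8) == 0;
    bool starts_at_zero = (first_bit == 0);

    if (byte_aligned && starts_at_zero) {
        /* No read-write operation: the data goes to the slave directly
         * (a real implementation would strobe only the valid bytes). */
        issue_write(mem_addr, valid);
    } else {
        /* Read back the storage unit, merge the valid data into the
         * response data at first_bit, then write the combined data. */
        uint64_t response = issue_read(mem_addr);
        uint64_t mask = ((width == 64) ? ~0ULL : ((1ULL << width) - 1))
                        << first_bit;
        uint64_t combined = (response & ~mask) | ((valid << first_bit) & mask);
        issue_write(mem_addr, combined);
    }
}

int main(void)
{
    store_valid_data(0x1000, 0x2AAAAAAAULL, 2, 30); /* M = 30: triggers RMW */
    printf("storage unit now 0x%016llx\n", (unsigned long long)fake_memory);
    return 0;
}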
For the scenario shown in FIG. 5B, as an example, two pieces of spliced data are stored in the sixth buffer. If at least one of the spliced data in the storage unit 1 and the spliced data in the storage unit 2 is not byte-aligned, or the first bit of its valid data is not located at the start position of the corresponding storage unit, a read-write operation is performed on that data, similarly to the description above. If the spliced data in the storage unit 1 and in the storage unit 2 are both byte-aligned and the first bits of the valid data are located at the start positions of the corresponding storage units, no read-write operation is performed: the storage unit 1 and the storage unit 2 are moved as a whole to the ninth buffer, and the logic circuit then transmits the whole of the data corresponding to the storage unit 1 and the storage unit 2 to the slave device (memory controller), see FIG. 6B. For the scenario shown in FIG. 5C, two pieces of spliced data are likewise cached in the sixth cache, and the specific process is similar to that of FIG. 5B, which is not repeated here.
For example, for the scenario shown in FIG. 5D, when the valid data of the L2P table entry 120 and the L2P table entry 121 is stored in the sixth buffer, the first Q bits of the valid data of the L2P table entry 120 cannot be stored in the storage unit 1 of the sixth buffer together with the remaining valid data. The merging unit therefore splices the last M-Q bits of the valid data of the L2P table entry 120 with the valid data of the L2P table entry 121 to obtain one piece of spliced data in the sixth buffer, where the spliced data has a size of 2M-Q bits and occupies the 0th bit to the (2M-Q-1)-th bit of the storage unit 1. The merging unit further sends this spliced data from the sixth buffer to the ninth buffer, and the ninth buffer sends it to the logic circuit, so that the logic circuit sends it to the slave device.
The merging unit also stores the first Q bits of the valid data of the L2P table entry 120 into the sixth buffer. At this time, the merging unit may check whether there is a new L2P table entry in the second buffer and whether that new entry can be spliced with the first Q bits of the valid data of the L2P table entry 120 in the sixth buffer. The manner of splicing is similar to the cases already described above and is not repeated here. If no splicing is possible, the first Q bits of the valid data of the L2P table entry 120 in the sixth buffer are sent to the ninth buffer.
In addition, whether a read-write operation needs to be performed when writing the spliced data, or the first Q bits of the valid data of the L2P table entry 120, into the memory is determined in the same way as in the scenario shown in FIG. 5A, and is not repeated here.
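For the FIG. 5D case, the way an entry's M-bit valid data is divided around a storage-unit boundary can be modeled as below; the struct layout and the example values of M and Q are illustrative assumptions, not the application's actual data path:

#include <stdint.h>
#include <stdio.h>

/* Sketch of the FIG. 5D split: the first Q bits of the valid data belong to
 * one storage unit and the remaining M-Q bits spill into the next, so the
 * two parts are handled separately (the Q-bit head is held back in the
 * sixth buffer until it can be spliced or flushed). */
struct split {
    uint64_t head; int head_bits;   /* first Q bits of the valid data  */
    uint64_t tail; int tail_bits;   /* last M-Q bits of the valid data */
};

static struct split split_at_boundary(uint64_t valid, int M, int Q)
{
    struct split s;
    s.head_bits = Q;
    s.head = valid & ((1ULL << Q) - 1);   /* assumes 1 <= Q < M <= 64 */
    s.tail_bits = M - Q;
    s.tail = valid >> Q;
    return s;
}

int main(void)
{
    struct split s = split_at_boundary(0x3FFFFFFFULL, 30, 6);
    printf("head: %d bits = 0x%llx, tail: %d bits = 0x%llx\n",
           s.head_bits, (unsigned long long)s.head,
           s.tail_bits, (unsigned long long)s.tail);
    return 0;
}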
For example, for the scenario shown in FIG. 5F, since the merging unit does not splice the valid data of the L2P table entry 120 with the valid data of the L2P table entry 121, it stores the valid data of the L2P table entry 120 in the sixth cache and transfers it from the sixth cache to the ninth cache, and then stores the valid data of the L2P table entry 121 in the sixth cache and transfers it from the sixth cache to the ninth cache.
In addition, whether a read-write operation needs to be performed when writing the valid data of the L2P table entry 120 or of the L2P table entry 121 into the memory is determined in the same way as in the scenario shown in FIG. 5A, and is not repeated here.
Further, by way of example, after the valid data of an L2P table entry is obtained from the second cache and stored in the sixth cache, the data of that L2P table entry is deleted from the second cache.
Also by way of example, in response to combining the valid data of the L2P table entry, or one or more pieces of spliced data, in the ninth cache with part of the response data in the tenth cache to obtain first data, generating second data according to the protocol information in the eighth cache and the first data, and sending the second data to the memory, the one or more memory addresses and the position of the first bit of the valid data in the memory are deleted from the third cache, and/or the valid data of the L2P table entry is deleted from the sixth cache.
Also by way of example, in response to determining the one or more memory addresses in the memory that store the valid data of the L2P table entry and the position in the memory of the first bit of the valid data, the address index of the write command is deleted from the first cache.
As another example, in response to the read channel generating one or more read commands according to the one or more memory addresses, the identification information of the read commands is cached in the third cache, the fourth cache or the fifth cache, so that when the logic circuit receives the response data fed back for each read command, it can determine, according to the identification information, the valid data of the L2P table entry to be combined with that response data.
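Purely as an illustration of this identification-information bookkeeping, and anticipating the identification values used in the FIG. 7 example described below, the mapping could be modeled as a small table. The structure, the table size, and the rendering of id1_1 as the integer 11 (and so on) are assumptions, not the application's actual encoding:

#include <stdio.h>

/* Hypothetical model of the mapping between write-command identification
 * information and the read commands whose response data it waits for
 * (e.g. <id1 -> id1_1, id1_2>, <id2 -> id1_1, id1_2>, <id4 -> none>). */
#define MAX_READS 2

struct id_map_entry {
    int write_id;
    int read_ids[MAX_READS];
    int nreads;              /* 0: no read-write operation was needed */
};

static struct id_map_entry id_map[] = {
    { 1, { 11, 12 }, 2 },    /* write A1 waits on reads id1_1, id1_2  */
    { 2, { 11, 12 }, 2 },    /* write A2 remapped onto A1's reads     */
    { 4, {  0,  0 }, 0 },    /* write A3 is aligned: no read command  */
};

/* When a read response arrives, find every write command it applies to. */
static void on_read_response(int read_id)
{
    for (unsigned i = 0; i < sizeof id_map / sizeof *id_map; i++)
        for (int j = 0; j < id_map[i].nreads; j++)
            if (id_map[i].read_ids[j] == read_id)
                printf("response %d applies to write id%d\n",
                       read_id, id_map[i].write_id);
}

int main(void)
{
    on_read_response(11);    /* the response for id1_1 arrives */
    return 0;
}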
For example, the accelerator receives 4 write commands sent by the host, write command A1, write command A2, write command A3, and write command A4, respectively. The parallel processing mechanism of the accelerator processing the write commands A1, A2, A3, and A4 under the multiple caches will be described below with reference to fig. 7.
FIG. 7 is a schematic diagram of processing multiple write commands under multiple caches according to an embodiment of the present application.
In fig. 7, T0 to T10 represent a plurality of time periods that are continuous in time, and the contents below each time period represent operations performed by the write channel in the accelerator during that time period. The second buffer, the third buffer, the sixth buffer, the seventh buffer, the ninth buffer and the tenth buffer are shown in fig. 7; the second buffer is coupled to the sixth buffer, and is used for buffering valid data indicated by a write command. As an example, the second buffer includes a single storage unit for buffering data indicated by one write command, and the third buffer, the sixth buffer, the seventh buffer, the ninth buffer, and the tenth buffer may include a plurality of storage units, where different storage units are used for storing data corresponding to different write commands.
In the T0 period, the logic circuit receives a write command A1 (LBA1, data1, id1), where the write command A1 indicates the address index LBA1, the L2P table entry data1 and the identification information id1; the logic circuit parses the write command A1 to obtain LBA1, data1 and id1, and stores the data1 of the received write command A1 in the second buffer.
In the T1 period (a period after the T0 period), the logic circuit extracts the valid data of data1 from the second cache, stores it in the sixth cache, determines one or more memory addresses 1 and the position of the first bit of the valid data of data1 according to the address index LBA1 indicated by the write command A1 and the number of bits of the valid data, and stores the mapping relationship between the address index LBA1 and its corresponding memory address 1 in the third cache, for example <LBA1, memory address 1>. In addition, after determining the memory address corresponding to the write command A1 and the position at which the valid data of data1 is to be stored in the memory, and upon recognizing that the valid data of data1 is not byte-aligned, or is byte-aligned but its first bit is not located at the start position of the storage unit in the memory, the read channel generates a read command B11 and a read command B12 according to the memory address 1 corresponding to the write command A1, where the identification information of the read command B11 is id1_1 and the identification information of the read command B12 is id1_2; the logic circuit stores the mapping relationship between the identification information of the write command A1 and the identification information of its corresponding read commands B11 and B12 in the seventh buffer, for example <id1→id1_1, id1_2>. Also during the T1 period, the logic circuit receives a write command A2 (LBA2, data2, id2), where the write command A2 indicates the address index LBA2, the L2P table entry data2 and the identification information id2; the logic circuit parses the write command A2 to obtain LBA2, data2 and id2, and stores data2 in the second cache.
It will be appreciated that FIG. 7 is illustrative: moving the valid data of data1 from the second cache to the sixth cache may occur immediately after data1 is added to the second cache; it need not wait for the write command A2 to be received, nor need it occur concurrently with the receipt of the write command A2. The valid data of data1 is moved from the second cache to the sixth cache as early as possible so that the second cache becomes free to receive the write command A2. The sixth buffer is also used for merging the valid data of multiple write commands: when there is no valid data to be merged, the data in the sixth buffer is moved to the ninth buffer as early as possible; when there is valid data to be merged, the data is merged in the sixth buffer and then moved to the ninth cache.
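The address calculation performed in the T1 period can be sketched under the assumption that the L2P table packs M-bit valid data back to back across 64-bit storage units; the application does not spell out this layout, so the formula below is an assumption for illustration:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical calculation of the memory address and first-bit position of
 * an entry's valid data from its address index (LBA), assuming an L2P table
 * of M-bit entries packed with no byte-boundary padding. */
struct l2p_location {
    uint64_t mem_addr;   /* byte address of the 64-bit storage unit */
    int      first_bit;  /* position of the valid data's first bit  */
};

static struct l2p_location locate_entry(uint64_t table_base,
                                        uint64_t lba, int M)
{
    uint64_t bit_off = lba * (uint64_t)M;             /* absolute bit offset */
    struct l2p_location loc;
    loc.mem_addr  = table_base + (bit_off / 64) * 8;  /* 8-byte units        */
    loc.first_bit = (int)(bit_off % 64);
    return loc;
}

int main(void)
{
    struct l2p_location loc = locate_entry(0, 2, 30);  /* LBA 2, M = 30 */
    /* first_bit + M > 64 here, so the valid data spans two storage units;
     * this is why the text speaks of "one or more" memory addresses. */
    printf("addr = 0x%llx, first bit = %d\n",
           (unsigned long long)loc.mem_addr, loc.first_bit);
    return 0;
}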
With continued reference to FIG. 7, in the T2 period (a period after the T1 period), since the address indexes indicated by the write command A1 and the write command A2 are adjacent, the data indicated by the two write commands are adjacent in the L2P table and may be spliced/merged. After the logic circuit obtains data2, the valid data of data2 and the valid data of data1 in the sixth buffer are spliced to obtain the spliced valid data (the merging is completed in the sixth buffer), and the mapping relationships in the seventh cache are updated, for example to <id1→id1_1, id1_2> and <id2→id2_1, id2_2>.
The logic circuit also determines one or more memory addresses 2 and the position of the first bit of the valid data of data2 according to the address index LBA2 indicated by the write command A2 and the number of bits of the valid data, and stores the mapping relationship between the address index LBA2 and its corresponding memory address 2 in the third cache. At this point, "<LBA1, memory address 1> and <LBA2, memory address 2>" for the two write commands A1 and A2 are recorded in the third buffer.
After the merging of the valid data of data1 and the valid data of data2 is completed in the sixth cache, the valid data in the sixth cache is moved to the ninth cache. The ninth cache includes a plurality of storage units and can hold multiple pieces of data from the sixth cache. When a read-write operation is required, the valid data waits in the ninth buffer for the data read back from the memory.
Although FIG. 7 shows the merged valid data in the sixth buffer in the T2 period and the write command A3 being received in the T2 period, the merging of valid data is independent of the reception of the write command A3 and has no timing dependency on it. Likewise, the transfer of the unmerged or merged valid data to the ninth buffer is independent of the receipt of the write command A3.
In addition, after determining the memory address corresponding to the write command A2 and the position at which the valid data of data2 is to be stored in the memory, and upon recognizing that the valid data of data2 is not byte-aligned, or is byte-aligned but its first bit is not located at the start position of the storage unit in the memory, the read channel generates a read command B21 and a read command B22 according to the memory address 2 corresponding to the write command A2, where the identification information of the read command B21 is id2_1 and the identification information of the read command B22 is id2_2; the logic circuit stores the mapping relationship between the identification information of the write command A2 and the identification information of its corresponding read commands B21 and B22 in the seventh buffer. At this point, the seventh buffer records the mapping relationships between the identification information of each of the two write commands A1 and A2 and the identification information of their corresponding read commands. Also during the T2 period, the logic circuit receives a write command A3 (LBA4, data4, id4), where the write command A3 indicates the address index LBA4, the L2P table entry data4 and the identification information id4; the logic circuit parses the write command A3 to obtain LBA4, data4 and id4, and stores data4 in the second cache.
With continued reference to FIG. 7, in the T3 period (a period after the T2 period), the processing of the read commands B11 and B12 has not yet completed, and a conflict is detected between the read commands B21 and B22 and the read commands B11 and B12: the read commands B21 and B22 access the same addresses as the read commands B11 and B12, respectively. Since the data read back by the read commands B11 and B12 will include the data required by the read commands B21 and B22, the processing of the read commands B21 and B22 is terminated, and the mapping relationship of id2 recorded in the seventh buffer is modified to <id2→id1_1, id1_2>.
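The conflict check performed in the T3 period might be modeled as an address lookup against the read commands still in flight; the table size, addresses and integer ids below are illustrative assumptions:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the T3-period conflict check: if a newly generated
 * read command targets the same storage-unit address as a read still in
 * flight, the new read is dropped and the new write command is remapped to
 * wait on the in-flight read's response instead. */
#define MAX_INFLIGHT 8

struct inflight { uint64_t addr; int read_id; };
static struct inflight inflight_reads[MAX_INFLIGHT];
static int n_inflight;

/* Returns the id of an in-flight read to the same address, or -1. */
static int find_conflict(uint64_t addr)
{
    for (int i = 0; i < n_inflight; i++)
        if (inflight_reads[i].addr == addr)
            return inflight_reads[i].read_id;
    return -1;
}

int main(void)
{
    /* B11/B12 (ids 11, 12) are in flight for write command A1. */
    inflight_reads[n_inflight++] = (struct inflight){ 0x1000, 11 };
    inflight_reads[n_inflight++] = (struct inflight){ 0x1008, 12 };

    /* B21 would target 0x1000: conflict, so A2 waits on read id 11. */
    int hit = find_conflict(0x1000);
    if (hit >= 0)
        printf("drop new read, wait on in-flight read id %d\n", hit);
    return 0;
}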
In addition, the combined valid data of data1 and data2 in the sixth cache is moved to the ninth cache (this operation may be completed before the T3 period), after which the sixth cache becomes available, and the logic circuit reads the valid data of data4 from the second cache and moves it to the sixth cache. The logic circuit further determines one or more memory addresses 3 and the position of the first bit of the valid data of data4 in the memory according to the address index LBA4 indicated by the write command A3 and the number of bits of the valid data, and stores the mapping relationship between the address index LBA4 and its corresponding memory address 3 in the third cache. At this point, the third cache records "<LBA1, memory address 1>, <LBA2, memory address 2> and <LBA4, memory address 3>" for the three write commands A1, A2 and A3. In addition, after determining the memory address corresponding to the write command A3 and the position at which the valid data of data4 is to be stored in the memory, it is recognized that the valid data of data4 is byte-aligned and its first bit is located at the start position of the storage unit in the memory; the processing of the write command A3 by the accelerator therefore does not trigger a read-write operation, i.e., it does not trigger the read channel to generate one or more read commands according to the memory address 3 corresponding to the write command A3. In this case, the seventh buffer records the mapping relationships between the identification information of each of the two write commands A1 and A2 and the identification information of their corresponding read commands, as well as the relationship between the identification information of the write command A3 and the absence of any generated read command: <id1→id1_1, id1_2>, <id2→id1_1, id1_2> and <id4→none>.
In the T4 period (a period after the T3 period), no new write command is received, so the accelerator may continue to wait for the next command; if the data indicated by the next command can be spliced with the valid data of data4 in the sixth buffer, the data is spliced in the sixth buffer and the spliced data is moved to the ninth buffer. Alternatively, the logic circuit may move the valid data of data4 in the sixth buffer to the ninth buffer without waiting for the next command, in which case the combined valid data of data1 and data2 and the valid data of data4 are stored in different storage units of the ninth buffer.
In the T5 period (a period after the T4 period), since the processing of the write command A3 does not trigger a read-write operation, after the valid data of data4 is moved to the ninth buffer, the logic circuit can directly generate a piece of data from the valid data of data4 in the ninth buffer and the protocol information (for example, AXI protocol information) and send it to the memory, so that the valid data of data4 is stored in the memory.
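The packing step in the T5 period, which wraps the valid data with protocol information before sending it to the memory, might look like the following sketch. The packet fields are modeled loosely on AXI write-channel signals; the exact layout, the field names and the helper pack are assumptions for illustration:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative packet combining protocol information with valid data.
 * Assumes the valid data is byte-aligned and fits in one 8-byte beat
 * (first_byte + nbytes <= 8), i.e. the case that needs no read-write
 * operation, as described above. */
struct write_packet {
    uint64_t awaddr;   /* target memory address              */
    uint8_t  awsize;   /* beat size: 3 means 2^3 = 8 bytes   */
    uint8_t  wstrb;    /* byte strobes for a partial write   */
    uint64_t wdata;    /* the valid data, shifted into place */
};

static struct write_packet pack(uint64_t addr, uint64_t valid,
                                int first_byte, int nbytes)
{
    struct write_packet p;
    memset(&p, 0, sizeof p);
    p.awaddr = addr;
    p.awsize = 3;
    p.wstrb  = (uint8_t)(((nbytes == 8) ? 0xFFu : (1u << nbytes) - 1)
                         << first_byte);
    p.wdata  = valid << (8 * first_byte);
    return p;
}

int main(void)
{
    /* data4: 3 bytes of valid data starting at byte 0 of its unit. */
    struct write_packet p = pack(0x2000, 0xABCDEFULL, 0, 3);
    printf("awaddr=0x%llx wstrb=0x%02x wdata=0x%016llx\n",
           (unsigned long long)p.awaddr, p.wstrb,
           (unsigned long long)p.wdata);
    return 0;
}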
In addition, in response to sending the valid data of data4 to the memory, the logic circuit may delete the mapping relationship between the address index indicated by the write command A3 and the memory address 3 from the third buffer, after which the third buffer records only "<LBA1, memory address 1> and <LBA2, memory address 2>" for the two write commands A1 and A2; delete the relationship between the identification information of the write command A3 and the absence of a generated read command from the seventh cache, after which the seventh cache records the mapping relationships <id1→id1_1, id1_2> and <id2→id1_1, id1_2> between the identification information of each of the two write commands A1 and A2 and the identification information of their corresponding read commands; and delete the valid data of data4 stored in the ninth cache.
In addition, the logic circuit also receives a write command A4 (LBA10, data10, id10) within the T5 period, where the write command A4 indicates the address index LBA10, the L2P table entry data10 and the identification information id10. The logic circuit parses the write command A4 to obtain LBA10, data10 and id10, stores data10 in the second cache, and determines one or more memory addresses 4 and the position of the first bit of the valid data of data10 in the memory according to the address index LBA10 indicated by the write command A4 and the number of bits of the valid data. After determining the memory address 4 corresponding to the write command A4 and the position at which the valid data of data10 is to be stored in the memory, it is recognized that the valid data of data10 is not byte-aligned, or is byte-aligned but its first bit is not located at the start position of the storage unit in the memory, so a read-write operation must be executed in the course of processing the write command A4. At this time, the processing of the read commands B11 and B12 has not yet completed, and it is detected that the write command A4 conflicts with the read commands B11 and/or B12, i.e., one or more read commands corresponding to the read-write operation to be performed for the write command A4 would target the same addresses as the read commands B11 and/or B12. The processing of the write command A4 is therefore suspended: on the one hand, the read channel is not triggered to generate one or more read commands according to the memory address 4, or such read commands, once generated, are not sent to the memory; on the other hand, the mapping relationship between the address index LBA10 and its corresponding memory address 4 is not stored in the third cache, and/or the mapping relationship between the identification information of the write command A4 and the identification information of its corresponding read commands is not stored in the seventh buffer. At this time, only "<LBA1, memory address 1> and <LBA2, memory address 2>" for the two write commands A1 and A2 are recorded in the third buffer, and the mapping relationships <id1→id1_1, id1_2> and <id2→id1_1, id1_2> between the identification information of each of the two write commands A1 and A2 and the identification information of their corresponding read commands are recorded in the seventh buffer.
With continued reference to FIG. 7, in the T6 period (a period after the T5 period), the logic circuit neither receives the data read back by the read commands B11 and B12 nor receives a new write command, so it waits for the data read back by the read commands B11 and B12. It should be appreciated that, during the T6 period, there is no temporal or logical relationship between waiting for the data read back by the read commands B11 and B12 and receiving a new write command; the two are independent of each other.
Further, in the T7 period (a period after the T6 period), the logic circuit receives the data read back by the read commands B11 and B12 and stores it in the tenth buffer. In response to storing the response data of the read commands B11 (id1_1) and B12 (id1_2) in the tenth cache, the logic circuit identifies, from the mapping relationships <id1→id1_1, id1_2> and <id2→id1_1, id1_2> recorded in the seventh cache, that this response data applies to the valid data of both write commands A1 and A2. It therefore acquires all the response data from the tenth cache and the combined valid data of data1 and data2 from the ninth cache, combines the combined valid data of data1 and data2 with part of the response data, generates from the combined result and the protocol information a piece of data containing the protocol information and the combined valid data of data1 and data2, and sends it to the memory, so that the combined valid data of data1 and data2 is stored in the memory: the valid data of data1 is stored at its corresponding memory address 1, and the valid data of data2 is stored at its corresponding memory address 2.
With continued reference to FIG. 7, in the T8 period (a period after the T7 period), in response to the combined valid data of data1 and data2 being stored in the memory, the logic circuit may delete the mapping relationships between the address indexes indicated by the write commands A1 and A2 and their memory addresses from the third cache; delete the relationships between the identification information of each of the write commands A1 and A2 and the identification information of their corresponding read commands from the seventh cache; and delete the combined valid data of data1 and data2 from the ninth cache.
At this time, the accelerator has completed the processing of the write commands A1 and A2 and resumes the processing of the write command A4: it acquires the valid data of data10 from the second cache and stores it in the sixth cache; the read channel generates a read command B41 according to the memory address 4 corresponding to the write command A4 and sends the read command B41 to the memory, where the identification information of the read command B41 is id10_1; and the logic circuit stores the mapping relationship between the identification information of the write command A4 and the identification information of its corresponding read command B41 in the seventh cache. At this time, the mapping relationship <id10→id10_1> between the identification information of the write command A4 and the identification information of its corresponding read command is recorded in the seventh buffer, and the mapping relationship <LBA10→memory address 4> between the address index LBA10 and its corresponding memory address 4 is stored in the third buffer.
As shown for the T8 period, the processing of the write commands A1 and A2 is completed and the processing of the write command A4 is resumed; there is a temporal relationship between the two: the processing of the write command A4 is resumed only after the processing of the write commands A1 and A2 has completed.
In the T9 period (a period after the T8 period), the logic circuit moves the valid data of data10 in the sixth buffer to the ninth buffer, receives the read command response identified as id10_1 fed back by the memory, and stores it in the tenth buffer. At this point the logic circuit has received all the response data corresponding to the write command A4; it combines the valid data of data10 with part of the response data to obtain combined data, and sends the combined data to the memory for storage, thereby completing the processing of the write command A4.
It should be noted that, for the sake of simplicity, the present application describes some methods and embodiments thereof as a series of acts and combinations thereof, but those skilled in the art will understand that the aspects of the present application are not limited by the order of the acts described. Accordingly, based on the present disclosure or teaching, those skilled in the art will appreciate that certain steps may be performed in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described herein may be regarded as alternative embodiments, i.e., the acts or modules involved are not necessarily required for the implementation of some or all aspects of the present application. In addition, the descriptions of different embodiments each have their own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In particular implementations, based on the disclosure and teachings of the present application, those skilled in the art will appreciate that the several disclosed embodiments may also be implemented in other ways not described herein. For example, in the foregoing embodiments of the electronic device or apparatus, the units are divided according to logical functions, and other division manners are possible in practice. As another example, multiple units or components may be combined or integrated into another system, or some features or functions of a unit or component may be selectively disabled. As regards the connection relationships between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components. In some scenarios, the foregoing direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustical, magnetic, or other forms of signal transmission.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An accelerator for processing write commands for coupling a host device to a memory and accelerating the storage of valid data of L2P table entries indicated by write commands sent by the host device into an L2P table of the memory, comprising: a write channel, wherein the write channel comprises a logic circuit and a plurality of caches;
the logic circuit, in response to a first write command sent by the host device, acquires a first address index and a first L2P table entry from the first write command, and stores the data of the first address index and the first L2P table entry into a cache; determines, according to the first address index and the number of valid data bits of the first L2P table entry, one or more first memory addresses and a first position in the memory of the first bit of the valid data of the first L2P table entry, and stores the mapping relationship between the identification information of the first write command and the first memory addresses and the first position in a cache; and stores the valid data of the first L2P table entry into a cache;
the logic circuit, in response to receiving a second write command, and regardless of whether the operation of writing the valid data of the first L2P table entry into the memory has completed, acquires a second address index and a second L2P table entry from the second write command, and determines, according to the second address index and the number of valid data bits of the second L2P table entry, one or more second memory addresses and a second position in the memory of the first bit of the valid data of the second L2P table entry; stores the mapping relationship between the identification information of the second write command and the second memory address and the second position into a cache; and stores the valid data of the second L2P table entry into a cache;
and writes the valid data of the first L2P table entry and/or the valid data of the second L2P table entry from a cache into the memory, wherein the address of the valid data of the first L2P table entry in the memory corresponds to the first memory address and the first position, and the address of the valid data of the second L2P table entry in the memory corresponds to the second memory address and the second position.
2. The accelerator according to claim 1, further comprising: a read channel; wherein the read channel, in response to the valid data of the first L2P table entry not being byte-aligned, or being byte-aligned but the first position not being located at the start position of its corresponding storage unit in the memory, generates one or more first read commands according to the first memory address, stores a first mapping relationship between the identification information of the first write command and the identification information of its corresponding one or more first read commands in a cache, and sends the one or more first read commands to the memory; and/or, in response to the valid data of the second L2P table entry not being byte-aligned, or being byte-aligned but the second position not being located at the start position of its corresponding storage unit in the memory, generates one or more second read commands according to the second memory address, stores a second mapping relationship between the identification information of the second write command and the identification information of its corresponding one or more second read commands in a cache, and sends the one or more second read commands to the memory;
the logic circuit, in response to receiving first response data of all the first read commands fed back from the memory, combines the valid data of the first L2P table entry with part of the first response data according to the first position to obtain first data; or, in response to receiving second response data of all the second read commands fed back from the memory, combines the valid data of the second L2P table entry with part of the second response data according to the second position to obtain second data; and generates third data according to the protocol information stored in the cache and the first data or the second data, and sends the third data to the memory;
the memory comprises a plurality of aligned storage units, wherein each storage unit is used for storing valid data of a plurality of entries of the L2P table; valid data of the plurality of entries of the L2P table need not be stored in the memory in byte boundary alignment.
3. The accelerator of claim 2, wherein the logic circuitry, in response to the first L2P table entry and the second L2P table entry being adjacent in the L2P table of the memory, splices the valid data of the first L2P table entry with the valid data of the second L2P table entry to obtain one or more pieces of spliced data before the one or more first read commands are issued, and updates the first mapping relationship and the second mapping relationship, wherein,
in response to obtaining the one or more pieces of spliced data, the read channel is caused to generate, according to the memory address of the spliced data, one or more third read commands to replace the one or more first read commands; the combination of the updated first mapping relationship and second mapping relationship includes the identifications of all the third read commands, and the identifications of the third read commands in the first mapping relationship and in the second mapping relationship may be the same or different.
4. The accelerator of claim 2 or 3, wherein, after the one or more first read commands are issued and before the one or more second read commands are generated according to the second memory address, the logic circuitry suspends the processing of the second write command in response to recognizing that the one or more second read commands would collide with a memory address indicated by the one or more first read commands.
5. The accelerator of any of claims 1-4, wherein the logic circuit comprises: a parsing module, a computing module and a packing module; wherein,
the parsing module is used for responding to the received first write command, parsing the first write command to obtain a first address index and a first L2P table entry, caching the first address index into a first cache of the plurality of caches and caching the first L2P table entry into a second cache of the plurality of caches;
The computing module is coupled with the first cache and is used for computing the one or more first memory addresses and the first position according to the first address index and the number of valid data bits; storing the first memory address and the first location in a third cache of the plurality of caches and storing a mapping relationship between identification information of the first write command and the first memory address and the first location in a fourth cache; and caching valid data of the first L2P table entry into a sixth cache;
the parsing module then receives a second write command, acquires a second address index and a second L2P table entry from the second write command, and, regardless of whether the writing of the valid data of the first L2P table entry into the memory has completed, stores the second address index in the first cache and the second L2P table entry in the second cache;
the computing module is further coupled with the fourth cache, and determines, according to the second address index and the number of valid data bits of the second L2P table entry, one or more second memory addresses and a second position in the memory of the first bit of the valid data of the second L2P table entry; stores the second memory address and the second position in the third cache of the plurality of caches, and stores the mapping relationship between the identification information of the second write command and the second memory address and the second position in the fourth cache; and caches the valid data of the second L2P table entry into the sixth cache;
The packing module is coupled with the second buffer, the third buffer, the sixth buffer and a fifth buffer for buffering valid data byte alignment information, and writes valid data of the first L2P table entry and/or valid data of the second L2P table entry into a memory from the sixth buffer, wherein an address of the valid data of the first L2P table entry in the memory corresponds to the first memory address and the first location, and an address of the valid data of the second L2P table entry in the memory corresponds to the second memory address and the second location.
6. The accelerator of claim 5, wherein the logic circuit further comprises a merging unit; in response to obtaining one or more pieces of spliced data, where the first write command has one or more corresponding first read commands and/or the second write command has one or more corresponding second read commands, the merging unit further stores the identification information of the first write command and the identification information of the one or more first read commands in a seventh cache; and/or stores the identification information of the second write command and the identification information of the one or more second read commands in the seventh cache.
7. The accelerator of any of claims 1-6, wherein the plurality of caches comprises: a first cache, a second cache, a third cache, a fourth cache, a fifth cache, a sixth cache, a seventh cache, an eighth cache, a ninth cache, and a tenth cache; the first cache is used for caching address indexes; the second cache is used for caching the L2P table entry indicated by a write command; the third cache is used for caching the memory address corresponding to the L2P table entry and the position in the memory of the first bit of the valid data of the L2P table entry; the fourth cache is used for caching the mapping relationship between the identification information of the write command and the memory address and the position in the memory of the first bit of the valid data of the L2P table entry; the fifth cache is used for caching valid data byte alignment information; the sixth cache is coupled with the second cache and caches the valid data of the L2P table entry indicated by the write command; the seventh cache is used for caching the identification information of the write command and the identification information of its corresponding one or more read commands; the eighth cache is used for caching protocol information; the ninth cache is coupled with the sixth cache and is used for caching the valid data of the L2P table entry; and the tenth cache is coupled with the logic circuit and is used for caching the response data of the read commands sent by the read channel.
8. The accelerator of claim 7, wherein a first mapping relationship between identification information of a first write command and identification information of one or more first read commands is stored in the seventh cache in response to the first write command having the corresponding one or more first read commands; and/or in response to the second write command having the corresponding one or more second read commands, storing a mapping relationship between the identification information of the second write command and the identification information of the one or more second read commands in the seventh cache.
9. The accelerator of claim 8, wherein, after the read channel issues the one or more first read commands, in response to the one or more second read commands colliding with memory addresses indicated by the one or more first read commands, the logic circuitry suspends the processing of the second write command, and also of subsequent write commands, by not storing the identification information of the second write command and the identification information of its corresponding one or more second read commands in the seventh cache and not storing the valid data of the second L2P table entry in the sixth cache.
10. The accelerator of claim 9, wherein the logic circuitry, in response to receiving information that the writing of valid data of the first L2P table entry to memory is complete, resumes processing of the second write command, stores identification information of the second write command and identification information of one or more second read commands corresponding thereto in a seventh cache, and stores valid data of a second L2P table entry in a sixth cache; and moving the valid data of the second L2P table entry to a ninth cache to write the valid data of the second L2P table entry into the memory.
CN202210412614.5A 2022-04-19 2022-04-19 Accelerator for processing write command Pending CN116955228A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210412614.5A CN116955228A (en) 2022-04-19 2022-04-19 Accelerator for processing write command

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210412614.5A CN116955228A (en) 2022-04-19 2022-04-19 Accelerator for processing write command

Publications (1)

Publication Number Publication Date
CN116955228A true CN116955228A (en) 2023-10-27

Family

ID=88441459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210412614.5A Pending CN116955228A (en) 2022-04-19 2022-04-19 Accelerator for processing write command

Country Status (1)

Country Link
CN (1) CN116955228A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination