CN116594805A - System, method and apparatus for copy destination atomicity in a device

Info

Publication number: CN116594805A
Application number: CN202310147534.6A
Authority: CN
Inventors: D·L·赫尔米克; R·库尔; R·莫斯; S·詹尼亚武拉文卡塔; Y·D·金
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2022-02-11
Filing date: 2023-02-09
Publication date: 2023-08-15

Abstract

A method may include: receiving, at the device, a copy command, wherein the copy command includes a first indication of a first amount of source data and a second indication of a second amount of source data; determining an amount of destination space based at least in part on the first indication; and blocking at least a portion of the amount of destination space. The method may further include reading the first indication and reading the second indication, wherein the amount of destination space may include a first number of at least a first portion and a second number of at least a second portion. Blocking may include blocking at least a first portion of the first number and at least a second portion of the second number. The method may further include storing the first indication to generate a stored first indication.

Description

System, method and apparatus for copy destination atomicity in a device

Citation of related application

The present application claims priority to, and the benefit of, U.S. Provisional Patent Application No. 63/309,508, filed on February 11, 2022, which is incorporated by reference.

Technical Field

The present disclosure relates generally to storage devices, and more particularly, to systems, methods, and apparatus for copy destination atomicity in a device.

Background

The storage device may perform a copy operation in which data from one or more source address ranges may be copied to a destination address range. The copy operation may be used, for example, for host-directed garbage collection operations, where data from multiple source address ranges may be moved to a destination address range.

The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention, and therefore it may contain information that does not constitute prior art.

Disclosure of Invention

A method may include: receiving, at the device, a copy command, wherein the copy command includes a first indication of a first amount of source data and a second indication of a second amount of source data; determining an amount of destination space based at least in part on the first indication; and blocking at least a portion of the amount of destination space. The method may further include reading the first indication and reading the second indication, wherein the amount of destination space may include a first number of at least a first portion and a second number of at least a second portion. Blocking may include blocking at least a first portion of the first number and at least a second portion of the second number. The method may further include storing the first indication to generate a stored first indication. The method may further comprise: reading the stored first indication; and based on reading the stored first indication, writing at least a portion of the first amount of source data to the at least a portion of the amount of destination space. The stored first indication may be stored in a first memory location, and the method may further include modifying the first memory location based on writing the at least a portion of the first amount of data. Reading the first indication may include performing a first read operation of the first indication, and the method may further include: performing a second read operation of the first indication; and writing at least a portion of the first amount of source data to the at least a portion of the amount of destination space based on the second read operation. The method may further comprise: storing the second indication to generate a stored second indication; reading the stored second indication; and based on reading the stored second indication, writing at least a portion of the second amount of source data to the at least a portion of the amount of destination space. 
The amount of destination space may be a first amount of destination space, and the method may further include blocking a second amount of destination space based on receiving the copy command. The second amount of destination space may be based on a copy length of the copy command. Determining the first amount of destination space may include modifying the second amount of destination space based at least in part on the first indication. The copy command may include an indicated count, and determining the first amount of destination space may be based on the indicated count. The amount of destination space may be a first amount of destination space, and the method may further include: determining a second amount of destination space based at least in part on the second indication; and blocking at least a portion of the second amount of destination space.

A system may include: a host configured to send a copy command, wherein the copy command may include a first indication of a first amount of source data and a second indication of a second amount of source data; and a device configured to receive a copy command, determine an amount of destination space based at least in part on the first indication, and block at least a portion of the amount of destination space. The device may be further configured to read the first indication from the host and the second indication from the host, wherein the amount of destination space may include a first number of at least a first portion and a second number of at least a second portion. The device may be further configured to store the first indication at the device to generate a stored first indication, perform a read operation of the stored first indication, and write at least a portion of the first amount of source data to the at least a portion of the amount of destination space based on the read operation.

A device may include a communication interface, a storage medium, and a device controller configured to: receive a copy command using the communication interface, wherein the copy command may include a first indication of a first amount of source data in the storage medium and a second indication of a second amount of source data in the storage medium; determine an amount of destination space in the storage medium based at least in part on the first indication; and block at least a portion of the amount of destination space. The device controller may be configured to perform a first read operation of the first indication using the communication interface and a second read operation of the second indication using the communication interface, wherein the amount of destination space may include a first number of at least a first portion and a second number of at least a second portion. The device controller may be configured to store the first indication to generate a stored first indication, perform a read operation of the stored first indication, and write at least a portion of the first amount of source data to the at least a portion of the amount of destination space based on the read operation. The device controller may be configured to perform a third read operation of the first indication using the communication interface and write at least a portion of the first amount of source data to the at least a portion of the amount of destination space based on the third read operation.

Drawings

The figures are not necessarily drawn to scale, and elements of similar structure or function may generally be represented by like reference numerals or portions thereof throughout the figures for illustrative purposes. The figures are only intended to facilitate the description of the various embodiments described herein. The figures do not describe every aspect of the teachings disclosed herein and do not limit the scope of the claims. To prevent the drawings from becoming obscured, not all components, connections, etc. may be shown, and not all components may have reference numerals. However, patterns of component configurations may be readily apparent from the drawings. The accompanying drawings illustrate example embodiments of the present disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1 illustrates a first embodiment of a copy operation according to an example embodiment of the present disclosure.

FIG. 2 illustrates a second embodiment of a copy operation according to an example embodiment of the present disclosure.

FIG. 3 illustrates a table of an example embodiment of copy command results according to an example embodiment of the present disclosure.

FIG. 4 illustrates an example embodiment of a command format of a copy command according to an example embodiment of the present disclosure.

FIG. 5 illustrates an example embodiment of a data format of data that may be used for a copy command according to an example embodiment of the present disclosure.

FIG. 6 illustrates an embodiment of a method for a copy operation that stores source range entries according to an example embodiment of the present disclosure.

FIG. 7 illustrates another embodiment of a method for a copy operation that stores source range entries according to an example embodiment of the present disclosure.

FIG. 8 illustrates an embodiment of a method for a copy operation that performs a first read and a second read of a source range entry according to an example embodiment of the present disclosure.

FIG. 9 illustrates another embodiment of a method for a copy operation that performs a first read and a second read of a source range entry according to an example embodiment of the present disclosure.

FIG. 10 illustrates an embodiment of a method for a copy operation that blocks a predetermined amount of destination space according to an example embodiment of the present disclosure.

FIG. 11 illustrates another embodiment of a method for a copy operation that blocks a predetermined amount of destination space according to an example embodiment of the present disclosure.

FIG. 12 illustrates an example embodiment of a system for implementing an atomically protected copy command in accordance with example embodiments of the present disclosure.

FIG. 13 illustrates an example embodiment of an apparatus according to an example embodiment of the present disclosure.

FIG. 14 illustrates an embodiment of a copy command according to an example embodiment of the present disclosure.

Detailed Description

The device may implement a copy command, which may involve performing one or more read operations to read data from one or more source address ranges and one or more write operations to write some or all of the data to one or more destination address ranges. For example, a copy command may enable a user to indicate multiple source address ranges from which data may be read (e.g., in a storage medium in a storage device) and a continuous destination address range to which some or all of the data may be written.

Some copy commands may cause some or all of one or more destination address ranges to be blocked, e.g., to enable the copy command to perform one or more atomic write operations on the one or more destination address ranges. However, in some cases (e.g., if the copy command indicates more than one source address range), the device may not know how much of the one or more destination address ranges to block.

Some inventive principles of this disclosure relate to various techniques for determining one or more amounts of destination space to block for write operations of a copy command. For example, in some embodiments, the copy command may include a first indication of a first amount of source data and a second indication of a second amount of source data. The device may combine the first amount of source data and the second amount of source data to determine a total amount of destination space to be blocked for the copy command. As another example, the device may at least initially block a certain amount of destination space based on the maximum copy length of the copy command.

Some additional inventive principles of this disclosure relate to various techniques for executing copy commands that may involve blocking at least portions of one or more destination address ranges. For example, in some embodiments, a device may read one or more indications of one or more amounts of source data to determine a total amount of destination space to block for a copy command. The device may store at least one indication at the device, e.g., for performing one or more read operations and/or write operations as part of the copy command execution.

As another example, in some embodiments, the device may perform one or more first read operations of one or more indications of one or more amounts of source data to determine a total amount of destination space to block for the copy command. The device may perform one or more second read operations of the one or more indications of one or more amounts of source data for performing one or more read operations and/or write operations as part of the copy command execution.

As another example, a device may initially block a certain amount of destination space based on the maximum copy length of a copy command. Based on performing one or more read and/or write operations for the copy command, the device may modify the amount of blocked destination space, e.g., based on the amount of remaining read and/or write operations for the copy command.
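The following is a minimal Python sketch of this last technique, not a definitive implementation. The function names, the maximum copy length of 256 logical blocks, and the starting destination address are hypothetical values chosen only for illustration; the disclosure does not prescribe how the blocked span is represented.

```python
# Hypothetical model: a blocked span is an inclusive (start, end) address pair.

def initial_block(start_dest, max_copy_length):
    """Block the largest span the copy command could possibly write."""
    return (start_dest, start_dest + max_copy_length - 1)

def trim_to_total(block, total_blocks):
    """Shrink the span once the actual total amount of source data is known."""
    start, end = block
    return (start, min(end, start + total_blocks - 1))

def release_written(block, blocks_written):
    """Release the front of the span after that many blocks have been written."""
    start, end = block
    new_start = start + blocks_written
    return (new_start, end) if new_start <= end else None

block = initial_block(10001, 256)   # (10001, 10256): worst-case span
block = trim_to_total(block, 170)   # (10001, 10170): actual total is smaller
block = release_written(block, 50)  # (10051, 10170): first 50 blocks written
print(block)
```

In this sketch the span only ever shrinks, so no destination space is newly blocked after the command has started.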

The present disclosure encompasses numerous inventive principles related to determining and/or blocking destination space for copy commands. The principles disclosed herein may have independent utility and may be embodied separately, and not every embodiment may utilize every principle. Furthermore, the principles may be embodied in various combinations, some of which may amplify some of the benefits of the various principles in a synergistic manner. For example, some embodiments may store one or more indications of one or more amounts of source data for a first number of copy commands and may perform first and second read operations of the one or more indications of one or more amounts of source data for a subsequent copy command.

For purposes of illustration, some example embodiments may be described in the context of some example implementation details, such as a storage device that may implement a copy command (e.g., a simple copy command) that is received using a storage protocol such as Nonvolatile Memory Express (NVMe) and that operates on a storage medium accessible through Logical Block Addresses (LBAs). However, the inventive principles are not limited to these example implementation details, and may be applied to any type of device (e.g., an accelerator, a Graphics Processing Unit (GPU), a Neural Processing Unit (NPU), etc.) and any type of addressable data (e.g., registers, memory, storage media, etc.), using any type of protocol and/or interface (e.g., Serial ATA (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), etc.) to implement any type of copy command. In some embodiments, the inventive principles may be implemented using a storage protocol, of which NVMe is an example, that may enable overlapping command execution and thus parallelization (e.g., maximum parallelization) of overlapping commands. In some embodiments, the use of one or more atomic operations may enable the results of overlapping commands to be ordered and/or deterministic in some cases.

FIG. 1 illustrates a first embodiment of a copy operation according to an example embodiment of the present disclosure. In the embodiment shown in FIG. 1, the copy command may provide four source indications 102a, ..., 102d of four amounts of source data, but any number of indications and/or any amount of source data may be used. In this example, the source indications 102a, ..., 102d may be provided in the form of source ranges of LBAs in a storage device (e.g., a Solid State Drive (SSD)), which may indicate the amount of source data based on the LBA size (e.g., a system may use 512-byte LBAs or LBAs of any other size). Alternatively or additionally, the indication of the amount of source data may be implemented with any one or more indicators, such as a Number of Logical Blocks (NLB), a number and/or size of buffers, a number of bytes, a number of pages, etc.

The first indication 102a may specify a first source LBA range (Range 0), which in this example may be LBA range 101-150 (e.g., 50 LBAs). The source LBA range may be specified in any suitable manner, for example, by specifying a Starting Source LBA (SSLBA) and an Ending Source LBA (ESLBA) as shown in FIG. 1, by specifying a starting source LBA and a number of LBAs (e.g., an NLB), and/or in any other manner.

The second indication 102b may specify a second source LBA range (Range 1), which in this example may be LBA range 2300-2399 (e.g., 100 LBAs). The third indication 102c may specify a third source LBA range (Range 2), which in this example may be LBA range 331-340 (e.g., 10 LBAs). The fourth indication 102d may specify a fourth source LBA range (Range 3), which in this example may be LBA range 215-224 (e.g., 10 LBAs).

The copy command may also include an indication 104, which may specify a contiguous destination LBA range, which in this example may be LBA range 10001-10170 (e.g., 170 LBAs). The destination LBA range may be specified in any suitable manner, for example, by specifying a Starting Destination LBA (SDLBA) and an Ending Destination LBA (EDLBA) as shown in FIG. 1, by specifying a starting destination LBA and a total number of destination LBAs, and/or in any other manner. In some embodiments, the total number of destination LBAs may be determined, for example, by summing the Number of LBAs (NLB) of some or all (e.g., each) of the four source LBA ranges (e.g., Range 0 through Range 3).
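As a worked check of the numbers in this example, the following short Python sketch sums the NLB of each source range and derives the ending destination LBA from the starting destination LBA. The variable and function names are illustrative assumptions, not part of the disclosure.

```python
# Source ranges from the FIG. 1 example as inclusive (start LBA, end LBA) pairs.
source_ranges = [(101, 150), (2300, 2399), (331, 340), (215, 224)]

def nlb(start_lba, end_lba):
    """Number of logical blocks in an inclusive LBA range."""
    return end_lba - start_lba + 1

SDLBA = 10001  # starting destination LBA from the example

# Total destination LBAs is the sum of the NLBs of the source ranges.
total_nlb = sum(nlb(start, end) for start, end in source_ranges)

# Inclusive ending destination LBA of the contiguous destination range.
edlba = SDLBA + total_nlb - 1

print(total_nlb, edlba)  # prints "170 10170"
```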

In the example embodiment shown in FIG. 1, some or all of the destination LBA range 10001-10170 may be blocked based on a first aspect of the copy command (e.g., a feature, parameter, event, action, condition, indication, determination, etc.), and some or all of the blocking may be released based on a second aspect of the copy command. For example, in some embodiments, the entire destination LBA range 10001-10170 may be blocked during some or all of the execution of the copy command, e.g., to maintain atomicity across the entire destination LBA range. In some embodiments, blocking may refer to enforcing an ordering of commands (e.g., other reads, other writes, other copy commands, etc.), subcommands (e.g., one or more portions of copy commands and/or write operations, such as managing data buffered in a storage device controller), source range operations, destination range operations, etc. during some or all of a copy operation (e.g., execution of a copy command).

Thus, for example, blocking of the destination LBA range 10001-10170 and/or the source LBA ranges (e.g., Range 0 through Range 3) may be enforced during a first read operation to read first source data from Range 0, a first write operation to write the first source data to the destination LBA range 10001-10050, a second read operation to read second source data from Range 1, a second write operation to write the second source data to the destination LBA range 10051-10150, a third read operation to read third source data from Range 2, a third write operation to write the third source data to the destination LBA range 10151-10160, a fourth read operation to read fourth source data from Range 3, and a fourth write operation to write the fourth source data to the destination LBA range 10161-10170. Additionally or alternatively, one or more Source Range Entries (SREs) as described below may be partially read. For example, in the event of a power failure, some embodiments may attempt to perform one or more copy commands and/or write operations on data in units of multiples of an atomic write power failure parameter (e.g., an Atomic Write Unit Power Fail (AWUPF) parameter). In such embodiments, the copy command and/or write operation may stop at a boundary of the atomic write power failure parameter. Thus, in such embodiments, an SRE may be performed in one or more portions. Additionally or alternatively, in some embodiments, the ending segment of a first SRE may be performed simultaneously, concurrently, overlapping, etc. with the starting segment of a second SRE. In some embodiments and depending on implementation details, the first SRE and the second SRE may be contiguous and/or noncontiguous SREs.
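The destination sub-ranges listed above follow from packing the source ranges back to back, starting at the starting destination LBA. The Python sketch below reproduces that layout; the function name `destination_layout` is a hypothetical helper introduced only for illustration.

```python
def destination_layout(sdlba, nlbs):
    """Return one inclusive (start, end) destination LBA pair per source range.

    Source ranges are packed contiguously, in order, starting at sdlba.
    """
    layout = []
    next_lba = sdlba
    for n in nlbs:
        layout.append((next_lba, next_lba + n - 1))
        next_lba += n
    return layout

# NLBs of Range 0 through Range 3 in the FIG. 1 example.
layout = destination_layout(10001, [50, 100, 10, 10])
for index, (start, end) in enumerate(layout):
    print(f"Range {index} -> destination LBAs {start}-{end}")
```

With whole-range blocking, the single blocked span runs from the first start address to the last end address in this layout.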

FIG. 2 illustrates a second embodiment of a copy operation according to an example embodiment of the present disclosure. In some aspects, the embodiment shown in FIG. 2 may be similar to the embodiment shown in FIG. 1, and elements in FIG. 2 similar to elements in FIG. 1 may be indicated by reference numerals ending with the same digits. However, in the embodiment shown in FIG. 2, blocking may be enforced based on individual read and/or write operations to individual source and/or destination LBA ranges.

Thus, for example, blocking of the first destination LBA range 10001-10050 (Blocking 0) and/or the first source LBA range (which may be referred to as Range 0) may be enforced during a first read operation that reads first source data from Range 0 and/or a first write operation that writes the first source data to the destination LBA range 10001-10050. Blocking of the second destination LBA range 10051-10150 (Blocking 1) and/or the second source LBA range (which may be referred to as Range 1) may be enforced during a second read operation that reads second source data from Range 1 and/or a second write operation that writes the second source data to the destination LBA range 10051-10150. Blocking of the third destination LBA range 10151-10160 (Blocking 2) and/or the third source LBA range (which may be referred to as Range 2) may be enforced during a third read operation that reads third source data from Range 2 and/or a third write operation that writes the third source data to the destination LBA range 10151-10160. Blocking of the fourth destination LBA range 10161-10170 (Blocking 3) and/or the fourth source LBA range (which may be referred to as Range 3) may be enforced during a fourth read operation that reads fourth source data from Range 3 and/or a fourth write operation that writes the fourth source data to the destination LBA range 10161-10170. Alternatively or additionally, in some embodiments, multiple destination LBA ranges (e.g., two or more of Range 0, Range 1, Range 2, and/or Range 3) may be initially blocked, and blocking of one or more of the destination LBA ranges may be released, e.g., one at a time as the write operation for the corresponding LBA range completes.
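The last alternative, in which several destination ranges are blocked up front and released one at a time, can be sketched in Python as follows. The `DestinationBlocks` class is a hypothetical bookkeeping structure assumed for illustration; an actual controller may track blocking in an entirely different way.

```python
class DestinationBlocks:
    """Tracks which inclusive destination LBA ranges are currently blocked."""

    def __init__(self):
        self._ranges = []

    def block(self, start, end):
        self._ranges.append((start, end))

    def release(self, start, end):
        self._ranges.remove((start, end))

    def is_blocked(self, lba):
        return any(start <= lba <= end for start, end in self._ranges)

# Destination ranges for Blocking 0 through Blocking 3 in the FIG. 2 example.
dest_ranges = [(10001, 10050), (10051, 10150), (10151, 10160), (10161, 10170)]

blocks = DestinationBlocks()
for start, end in dest_ranges:
    blocks.block(start, end)        # block every range up front

for start, end in dest_ranges[:2]:
    blocks.release(start, end)      # first two writes completed

print(blocks.is_blocked(10010))     # False: Blocking 0 was released
print(blocks.is_blocked(10155))     # True: Blocking 2 is still held
```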

FIG. 3 illustrates a table of an example embodiment of copy command results according to an example embodiment of the present disclosure. The embodiment shown in FIG. 3 may demonstrate the results that may be obtained using, for example, the different blocking techniques shown in FIG. 1 and/or FIG. 2. In the embodiment shown in FIG. 3, a first copy command (which may be referred to as Copy Command A) and a second copy command (which may be referred to as Copy Command B) may be present in the storage device (e.g., in a controller in the storage device). In some embodiments, Copy Command A and Copy Command B may have been read from a submission queue (SQ) as described below, for example, with reference to FIG. 12. In such an embodiment, for the example shown in FIG. 3, Copy Command A and Copy Command B may not yet have returned corresponding Completion Queue (CQ) entries as described below, for example, with reference to FIGS. 6-11.

In some embodiments, the blocking associated with atomicity may affect one or more other input and/or output operations and/or commands (I/O operations, I/O commands, or I/O), such as write and/or read commands. For example, in some embodiments and depending on implementation details, for one or more overlapping LBA ranges, a blockage associated with one or more copy commands, write commands, read commands, write operations, read operations, etc., may result in a delay of one or more portions of another copy command, write command, read command, write operation, read operation, etc.

In table 306 shown in FIG. 3, a first copy command (Command A) may write data from one or more source LBA ranges to four consecutive destination LBAs (LBA 0 to LBA 3), and a second copy command (Command B) may write data from one or more source LBA ranges to four consecutive destination LBAs (LBA 1 to LBA 4). However, depending on the order of execution of Command A and Command B and/or the type of blocking implemented, different results may be obtained.

Referring to FIG. 3, rows 308 and 310, which show example 1 and example 2, respectively, illustrate results that may be obtained when both Command A and Command B implement blocking within an entire combined destination LBA range (e.g., as implemented in the embodiment shown in FIG. 1) or a single destination LBA range (e.g., as implemented in the embodiment shown in FIG. 2).

In example 1, shown in row 308, Command B may be executed first, followed by Command A. Because blocking may be enforced across the entire destination LBA range of Command B until Command B completes, Command A may not proceed to overwrite the results of Command B until Command B completes. Thus, a valid result for Command A (indicated by data "A" shown in LBA 0 through LBA 3) may be obtained.

In example 2, shown in row 310, Command A may be executed first, followed by Command B. Because blocking may be enforced across the entire destination LBA range of Command A until Command A completes, Command B may not proceed to overwrite the results of Command A until Command A completes. Thus, a valid result for Command B (indicated by data "B" shown in LBA 1 through LBA 4) may be obtained.

In contrast, rows 312 and 314, which show example 3 and example 4, respectively, show results that may be obtained when Command A and Command B do not implement blocking in either the combined or separate destination LBA ranges. Regardless of the order of execution of Command A and Command B, because blocking may be enforced only within a separate destination LBA range (e.g., associated with a separate source LBA range), invalid results may be obtained (e.g., as indicated by the instances of data "B" from Command B at LBA 2 and LBA 3 of example 3 in row 312 and the instances of data "A" from Command A at LBA 2 and LBA 3 of example 4 in row 314).
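The difference between the valid and invalid outcomes can be illustrated with a small Python simulation. This is a sketch under stated assumptions: each command is assumed to be split into two sub-writes (as if it had two source ranges), and the interleaving shown is one arbitrary schedule chosen for illustration rather than a reproduction of the rows of FIG. 3.

```python
def run(schedule):
    """Apply (owner, LBAs) sub-writes in order; return the final LBA 0-4 contents."""
    media = ["-"] * 5
    for owner, lbas in schedule:
        for lba in lbas:
            media[lba] = owner
    return "".join(media)

# Assumed split: Command A covers LBA 0-3, Command B covers LBA 1-4.
A = [("A", [0, 1]), ("A", [2, 3])]
B = [("B", [1, 2]), ("B", [3, 4])]

# Blocking across the whole destination range serializes the two commands.
serialized_b_then_a = run(B + A)   # "AAAAB": all of A's range holds A
serialized_a_then_b = run(A + B)   # "ABBBB": all of B's range holds B

# Without whole-range blocking, sub-writes of A and B may interleave.
interleaved = run([B[0], A[0], A[1], B[1]])  # "AAABB"

valid = {serialized_b_then_a, serialized_a_then_b}
print(serialized_b_then_a, serialized_a_then_b, interleaved, interleaved in valid)
```

The interleaved result matches neither serialized outcome: LBA 3 holds data from Command B although Command A wrote LBA 0 through LBA 2 last, which is the kind of mixed result that blocking across the entire destination range is intended to prevent.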

To illustrate the inventive principles, the embodiments shown in FIGS. 1, 2, and 3 are described in the context of a system that may implement Destination Range (DR) atomicity. However, these principles may be applied to systems that may implement any other type of atomicity. For example, some embodiments may implement Source Range Entry (SRE) atomicity or a combination of DR atomicity and SRE atomicity.

In some embodiments, writing data to one or more destination LBAs may refer to programming data into a storage medium, such as not-AND (NAND) flash memory. Alternatively or additionally, writing data to one or more destination LBAs may refer to updating a write buffer with the write data, which may also involve updating some or all of the tracking information associated with the write data (e.g., in a controller of the storage device and/or in a storage medium in the storage device). In some embodiments, writing data may refer to a write operation in which the data in the buffer may not be fully programmed into the storage medium, but one or more read commands that access one or more LBAs associated with the write operation may return data as shown in the table of FIG. 3, for example, by returning a portion of the data from the write buffer and/or a portion of the data from the programmed storage medium. Depending on implementation details, this may enable a blocking or ordering constraint on one or more LBAs associated with the data in the write buffer to be released before the data in the write buffer is programmed into the storage medium.

FIG. 4 illustrates an example embodiment of a command format of a copy command according to an example embodiment of the present disclosure. The copy command 416 shown in FIG. 4 may include a first Word (Word 0) with an opcode 418 to indicate that the command is a copy command, and a second Word (Word 1) with a data pointer 420, which may point to a data (e.g., memory) location that may contain data for use with the copy command 416 (e.g., data such as the Source Range Entries (SREs) shown in FIG. 5). The copy command 416 may also include a third Word (Word 2) with an SDLBA 422 that may indicate the first LBA to be used as a destination for one or more write operations using data read from one or more source LBAs. The copy command 416 may also include a fourth Word (Word 3) having a Number of Ranges (NR) that may indicate a number of SREs that may provide information regarding the location, amount, etc. of data for one or more read operations of the copy command.

In some embodiments, the copy command 416 shown in FIG. 4 may be referred to as a submission queue entry (SQE) and may include one or more additional words, fields, etc. related to characteristics such as retry, protection information, directive type, reference tag, one or more data formats for the copy command 416 (e.g., one or more formats for SREs, such as the format shown in FIG. 5), etc.

FIG. 5 illustrates an example embodiment of a data format of data that may be used for a copy command according to an example embodiment of the present disclosure. The data structure shown in FIG. 5 may be used in embodiments such as the copy command shown in FIG. 4.

Referring to FIG. 5, the data structure may include one or more SREs 526-0, 526-1, ..., 526-N. One or more of the SREs 526-0, 526-1, ..., 526-N may include a corresponding Starting Source LBA (SSLBA) 528-0, 528-1, ..., 528-N and/or a corresponding Number of Logical Blocks (NLB) 530-0, 530-1, ..., 530-N. Thus, in the example embodiment shown in FIG. 5, an indication of the amount of source data may be provided in the form of an NLB.
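For illustration only, the Python sketch below models an SRE as a pair of SSLBA and NLB fields and shows how a list of entries might be serialized into, and parsed back out of, a data area. The byte layout (an 8-byte SSLBA followed by a 4-byte NLB, little-endian) is an assumption made for this sketch; it is not the layout of FIG. 5 or of any particular protocol specification.

```python
import struct
from dataclasses import dataclass

# Assumed layout for this sketch only: 8-byte SSLBA, 4-byte NLB, little-endian.
SRE_FORMAT = "<QI"
SRE_SIZE = struct.calcsize(SRE_FORMAT)  # 12 bytes per entry

@dataclass(frozen=True)
class SourceRangeEntry:
    sslba: int  # starting source LBA
    nlb: int    # number of logical blocks

def pack_sres(entries):
    """Serialize entries back to back, as they might sit in a data area."""
    return b"".join(struct.pack(SRE_FORMAT, e.sslba, e.nlb) for e in entries)

def unpack_sres(buffer, number_of_ranges):
    """Parse number_of_ranges entries from the start of buffer."""
    return [
        SourceRangeEntry(*struct.unpack_from(SRE_FORMAT, buffer, i * SRE_SIZE))
        for i in range(number_of_ranges)
    ]

entries = [SourceRangeEntry(101, 50), SourceRangeEntry(2300, 100)]
data_area = pack_sres(entries)
parsed = unpack_sres(data_area, len(entries))
print(parsed == entries, sum(e.nlb for e in parsed))
```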

In some embodiments, the information provided in the copy command 416 shown in FIG. 4 and/or one or more SREs 526-0, 526-1, … …, 526-N shown in FIG. 5 may provide sufficient information for a device receiving the copy command 416 to implement the copy command 416.

However, in embodiments where blocking of one or more destination spaces (e.g., blocking of one or more portions of one or more destination LBAs to implement atomic operations) may be implemented, it may be difficult to determine the one or more destination spaces on which to implement the blocking. For example, blocking the entire destination LBA range (e.g., as implemented in the embodiment shown in FIG. 1) may involve reading multiple ones (e.g., all) of the SREs 526-0, 526-1, ..., 526-N shown in FIG. 5, and adding some (e.g., all) of the NLBs 530-0, 530-1, ..., 530-N to determine the total NLB of the copy command. In some embodiments, the blocking operation may use the SDLBA shown in FIG. 4, and/or it may use the total NLB of the copy command calculated from the individual SREs.

FIG. 6 illustrates an embodiment of a method for a copy operation that stores source range entries according to an example embodiment of the present disclosure. For purposes of illustration, the embodiment shown in FIG. 6 may be described in the context of a device such as a storage device that may receive copy commands using a storage protocol such as NVMe, but the inventive principles are not limited to these example implementation details. For example, some embodiments may use Serial ATA (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), and/or any other type of interface, protocol, etc. for any type of device.

Referring to FIG. 6, the method may begin with operation 632-1, where a device may fetch a submission queue (SQ) entry containing a copy command. In some embodiments, the submission queue may reside at the host, at the device, or at any other location. At operation 632-2, the method may parse the copy command (e.g., by checking the Number of Ranges (NR) in Word 3 as shown in FIG. 4) to locate one or more Source Range Entries (SREs), such as those shown in FIG. 5, to locate the SDLBA field, etc. Also at operation 632-2, the method may initialize the total number of logical blocks (total NLB) to zero. In some embodiments, the SREs may be located, for example, in a memory at the host, at the device, or at any other location.

At operation 632-3, the method may read the SRE from a data area (e.g., a data area pointed to by a Word 01r data pointer such as that shown in FIG. 4), which may be located at the host in this example. At operation 632-4, the method may add the NLB in the SRE to the total NLB. At operation 632-5, the method may store the SRE in, for example, a local memory of the device. At operation 632-6, the method may determine whether there are any additional SREs to read (e.g., based on NR). If there are additional SREs to read, the method may continue to loop through operations 632-3, 632-4, and/or 632-5 until all SREs (e.g., indicated by the number of ranges in Word 3) have been read and the corresponding NLB has been added to the total NLB. Thus, in some embodiments and depending on implementation details, the total NLB may indicate the total number of LBAs indicated by the sum of the individual SREs indicated by the copy command.

At operation 632-7, the method may block at least a portion (e.g., all) of the destination LBAs (e.g., starting from SDLBA 422 and continuing for the number of destination LBAs determined by the total NLB), e.g., to enforce command and/or SRE ordering for one or more atomicity requirements of the copy command.

Once some or all of the destination LBAs are blocked, the method may begin performing one or more read operations and/or one or more write operations to copy data indicated by the one or more SREs to the one or more destination LBAs. For example, at operation 632-8, the method may retrieve the SRE (which may have been stored in local memory, e.g., at operation 632-5). At operation 632-9, the method may perform a read operation on the source data indicated by the corresponding starting source LBA and NLB of the retrieved SRE. At operation 632-10, the method may perform a write operation by writing the data read at operation 632-9 to one or more corresponding destination LBAs.

At operation 632-11, the method may determine whether the total amount of source data indicated by the total NLB has been read from the corresponding source LBAs and written to the destination LBAs indicated by the SDLBA and the total NLB. If source data associated with one or more SREs is still to be read and/or written, the method may loop through operations 632-8, 632-9, 632-10, and/or 632-11 until the source data associated with the SREs has been read and written to the corresponding destination LBAs. At operation 632-12, the method may release the blocking from some or all of the destination LBAs that may have been blocked at operation 632-7. In some embodiments, the release of the blocking may be complete or partial. For example, the blocking may be released for a portion of the copy operation for which some of the reading or writing has completed. At operation 632-13, the method may complete the copy command, for example, by sending a Completion Queue (CQ) entry to the host to indicate that the copy command has completed.
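The overall flow of FIG. 6 may be sketched as follows. This is a simplified model under stated assumptions, not the disclosed implementation: the storage medium is modeled as a dictionary from LBA to data, the blocked ranges as a plain list, and `read_sre_from_host` is a hypothetical callback standing in for a transfer from host memory. All names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

SRE = Tuple[int, int]  # (starting source LBA, number of logical blocks)


def copy_with_stored_sres(
    read_sre_from_host: Callable[[int], SRE],  # hypothetical host fetch
    nr: int,                                   # number of ranges
    sdlba: int,                                # starting destination LBA
    media: Dict[int, bytes],                   # toy storage: LBA -> data
    blocked: List[Tuple[int, int]],            # toy list of blocked ranges
) -> int:
    total_nlb = 0
    local_sres: List[SRE] = []
    # Read each SRE once, accumulate the total NLB, and keep a local copy.
    for i in range(nr):
        sre = read_sre_from_host(i)
        total_nlb += sre[1]
        local_sres.append(sre)
    # Block the destination range before any data is moved.
    dest_range = (sdlba, sdlba + total_nlb)
    blocked.append(dest_range)
    # Copy using the locally stored SREs; the host is not read again.
    dest = sdlba
    for sslba, nlb in local_sres:
        for offset in range(nlb):
            media[dest + offset] = media[sslba + offset]
        dest += nlb
    # Release the blocking; a completion could be posted after this point.
    blocked.remove(dest_range)
    return total_nlb


host_sres = [(0, 2), (10, 1)]
media = {0: b"a", 1: b"b", 10: b"c"}
blocked: List[Tuple[int, int]] = []
copied = copy_with_stored_sres(lambda i: host_sres[i], 2, 100, media, blocked)
print(copied, media[100], media[101], media[102])  # 3 b'a' b'b' b'c'
```

The sketch copies sequentially for clarity; a device could issue the reads and writes in parallel while the destination range remains blocked.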

Although operations 632-8, 632-9, and/or 632-10 may show the copy command proceeding sequentially through the SREs, in some embodiments, one or more read and/or write operations for implementing the copy command may be performed in parallel, e.g., while still maintaining atomicity of the entire copy command.

In some embodiments and depending on implementation details, the embodiment of the copy operation shown in fig. 6 may access SREs (e.g., in local memory of a storage device) for any purpose and/or in any order, with little or no penalty in terms of access time, power consumption, latency, etc., for example.

FIG. 7 illustrates another embodiment of a method for a copy operation that stores source range entries according to an example embodiment of the present disclosure. In some aspects, the embodiment shown in FIG. 7 may be similar to the embodiment shown in FIG. 6, and elements in FIG. 7 similar to elements in FIG. 6 may be indicated by reference numerals ending with the same numerals. However, the embodiment shown in FIG. 7 may include operation 732-14, where the method may discard one or more SREs after the source data indicated by the SRE has been read from one or more corresponding source LBAs and/or written to one or more corresponding destination LBAs. For example, in some embodiments, an SRE may be discarded by freeing the space in local memory that the SRE occupies so that the space may be used by one or more other processes. Depending on the implementation details, this may improve and/or optimize the local memory usage of the method shown in FIG. 7.

FIG. 8 illustrates an embodiment of a method for a copy operation that performs a first read and a second read of source range entries according to an example embodiment of the present disclosure. For purposes of illustration, the embodiment shown in FIG. 8 may be described in the context of a device such as a storage device that may receive copy commands using a storage protocol such as NVMe, but the inventive principles are not limited to these example implementation details. For example, some embodiments may use SATA, SAS, and/or any other type of interface, protocol, etc. for any type of device.

Referring to FIG. 8, the method may begin at operation 834-1, where a device may fetch a submission queue entry containing a copy command. In some embodiments, the submission queue may reside at the host, at the device, or at any other location. At operation 834-2, the method may parse the copy command (e.g., by checking the number of ranges in Word 3 as shown in FIG. 4) to locate one or more SREs, e.g., as shown in FIG. 5, to locate the SDLBA field, etc. Also at operation 834-2, the method may initialize the total NLB of the copy command to zero. In some embodiments, the SREs may be located, for example, in a memory at the host, at the device, or at any other location.

At operation 834-3, the method may read an SRE from a data area (e.g., the data area pointed to by a data pointer such as Word1 shown in FIG. 4), which in this example may be located at the host. At operation 834-4, the method may add the NLB in the SRE to the total NLB.

At operation 834-5, the method may determine whether there are any additional SREs to read. If there are additional SREs to read, the method may continue to loop through operations 834-3, 834-4, and/or 834-5 until all SREs (e.g., as indicated by the number of ranges in Word 3) have been read and the corresponding NLBs have been added to the total NLB. Thus, in some embodiments and depending on implementation details, the total NLB may indicate the total number of LBAs indicated by the SREs of the copy command.

In the embodiment shown in FIG. 8, the method may not store one or more SREs locally at the device. Thus, depending on implementation details, the embodiment shown in FIG. 8 may reduce the amount of local memory used by the device. For example, in some embodiments, an SRE may use 32 bytes of memory, and a copy command may include up to 256 SREs. Thus, depending on implementation details, the embodiment shown in FIG. 8 may reduce the local memory usage per copy command by about 8 KiB (32 bytes × 256 SREs).
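The memory figure above follows directly from the two example values given in the text, as the short calculation below shows. The constant names are illustrative.

```python
SRE_SIZE_BYTES = 32          # example per-SRE size given in the text
MAX_SRES_PER_COMMAND = 256   # example maximum SRE count given in the text

saved_bytes = SRE_SIZE_BYTES * MAX_SRES_PER_COMMAND
saved_kib = saved_bytes / 1024
print(saved_bytes, saved_kib)  # 8192 8.0
```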

At operation 834-6, the method may block at least a portion (e.g., all) of the destination LBAs (e.g., starting from the SDLBA and continuing for the number of destination LBAs determined by the total NLB), for example, to enforce command and/or SRE ordering for one or more atomicity requirements of the copy command.

Once some or all of the destination LBAs are blocked, the method may begin reading one or more SREs a second time to perform one or more read operations and/or one or more write operations to implement the copy command. For example, at operation 834-7, the method may read the SRE a second time (which may be located at the host in this example). At operation 834-8, the method may perform a read operation on source data indicated by the corresponding starting source LBAs and NLBs of the SRE that have been read from the host a second time. At operation 834-9, the method may perform a write operation by writing the data read at operation 834-8 to one or more corresponding destination LBAs.

At operation 834-10, the method may determine whether the total amount of source data indicated by the total NLB has been read from the corresponding source LBAs and written to the destination LBAs indicated by the SDLBA and the total NLB. If source data associated with one or more SREs is still to be read and/or written, the method may loop through operations 834-7, 834-8, 834-9, and/or 834-10 until the source data associated with the SREs has been read and written to the corresponding destination LBAs. At operation 834-11, the method may release the blocking from some or all of the destination LBAs that may have been blocked at operation 834-6. In some embodiments, the release of the blocking may be complete or partial. For example, the blocking may be released for a portion of the copy operation for which some of the reading or writing has completed. At operation 834-12, the method may complete the copy command, for example, by sending a completion queue entry to the host to indicate that the copy command has completed.
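The two-read flow of FIG. 8 may be sketched as follows. This is a simplified model under stated assumptions: the storage medium is a dictionary from LBA to data, the blocked ranges are a plain list, and `read_sre_from_host` is a hypothetical callback standing in for a transfer from host memory. The `fetch_log` list is added only to make the number of host reads visible.

```python
from typing import Callable, Dict, List, Tuple

SRE = Tuple[int, int]  # (starting source LBA, number of logical blocks)


def copy_with_two_reads(
    read_sre_from_host: Callable[[int], SRE],  # hypothetical host fetch
    nr: int,
    sdlba: int,
    media: Dict[int, bytes],
    blocked: List[Tuple[int, int]],
) -> int:
    # First read: accumulate the total NLB only; no SRE is kept locally.
    total_nlb = 0
    for i in range(nr):
        total_nlb += read_sre_from_host(i)[1]
    dest_range = (sdlba, sdlba + total_nlb)
    blocked.append(dest_range)
    # Second read: fetch each SRE again and copy its data.
    dest = sdlba
    for i in range(nr):
        sslba, nlb = read_sre_from_host(i)
        for offset in range(nlb):
            media[dest + offset] = media[sslba + offset]
        dest += nlb
    blocked.remove(dest_range)
    return total_nlb


host_sres = [(0, 2), (10, 1)]
fetch_log: List[int] = []


def fetch(i: int) -> SRE:
    fetch_log.append(i)  # record every host read
    return host_sres[i]


media = {0: b"a", 1: b"b", 10: b"c"}
blocked: List[Tuple[int, int]] = []
copied = copy_with_two_reads(fetch, 2, 100, media, blocked)
print(copied, fetch_log)  # 3 [0, 1, 0, 1]
```

In this sketch each SRE is fetched from the host twice, which trades additional host reads for not holding any SRE in device memory.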

Although operations 834-7, 834-8, and/or 834-9 may show the copy command proceeding sequentially through the SREs, in some embodiments, one or more read and/or write operations for implementing the copy command may be performed in parallel, e.g., while maintaining atomicity of the entire copy command or, depending on implementation details, atomicity of each SRE.

FIG. 9 illustrates another embodiment of a method for a copy operation that performs a first read and a second read of source range entries in accordance with an example embodiment of the present disclosure. In some aspects, the embodiment shown in FIG. 9 may be similar to the embodiment shown in FIG. 8, and elements in FIG. 9 similar to elements in FIG. 8 may be indicated by reference numerals ending with the same numerals. However, the embodiment shown in FIG. 9 may include operation 934-13, where the method may store one or more SREs (e.g., the first few SREs read from the host) in the local memory of the device. The embodiment shown in FIG. 9 may also include operation 934-14, where the method may retrieve an SRE (which may have been stored in local memory, e.g., at operation 934-13). At operation 934-15, the method may perform a read operation on source data indicated by the corresponding starting source LBA and NLB of an SRE that has been stored locally at the device at operation 934-13. At operation 934-16, the method may perform a write operation into one or more corresponding destination LBAs using the data read at operation 934-15. Thus, the embodiment shown in FIG. 9 may begin processing the locally stored SREs (e.g., without waiting for the SREs to be read from the host a second time). In some embodiments, the atomic copy command may protect (e.g., implement blocking on) one or more LBAs associated with the data in the write buffer, such that the blocking may be released before the data in the write buffer is transferred to (e.g., programmed into) the storage medium. Thus, in some embodiments, the storage device controller may be in the process of programming write data from the buffer into the storage medium (e.g., after the storage device controller has sent a completion to the completion queue to indicate to the host that the copy command has completed), but later references to that data may be resolved to the buffer and/or the storage medium to preserve atomicity.

At operation 934-17, the method may determine whether any more SREs are stored locally at the device. If one or more locally stored SREs are still available, the method may loop through operations 934-14, 934-15, 934-16, and/or 934-17 until no more locally stored SREs are available. The method may then proceed to operation 934-7, where it may begin reading the remaining SREs from the host a second time.
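The FIG. 9 variant, in which only the first few SREs are kept locally, may be sketched as follows. This is a simplified model under stated assumptions: `cache_limit` is a hypothetical parameter for the number of SREs the device is willing to store, `read_sre_from_host` is a hypothetical host fetch, and the storage medium and blocked ranges are toy structures.

```python
from typing import Callable, Dict, List, Tuple

SRE = Tuple[int, int]  # (starting source LBA, number of logical blocks)


def copy_with_partial_cache(
    read_sre_from_host: Callable[[int], SRE],  # hypothetical host fetch
    nr: int,
    sdlba: int,
    media: Dict[int, bytes],
    blocked: List[Tuple[int, int]],
    cache_limit: int,                          # hypothetical local SRE budget
) -> int:
    total_nlb = 0
    cached: List[SRE] = []
    # First read: sum every NLB, but keep only the first few SREs locally.
    for i in range(nr):
        sre = read_sre_from_host(i)
        total_nlb += sre[1]
        if len(cached) < cache_limit:
            cached.append(sre)
    dest_range = (sdlba, sdlba + total_nlb)
    blocked.append(dest_range)

    dest = sdlba

    def copy_one(sre: SRE, dest: int) -> int:
        sslba, nlb = sre
        for offset in range(nlb):
            media[dest + offset] = media[sslba + offset]
        return dest + nlb

    # Process the locally stored SREs first, without touching the host.
    for sre in cached:
        dest = copy_one(sre, dest)
    # Then read the remaining SREs from the host a second time.
    for i in range(len(cached), nr):
        dest = copy_one(read_sre_from_host(i), dest)
    blocked.remove(dest_range)
    return total_nlb


host_sres = [(0, 1), (10, 1), (20, 1)]
fetch_log: List[int] = []


def fetch(i: int) -> SRE:
    fetch_log.append(i)  # record every host read
    return host_sres[i]


media = {0: b"a", 10: b"b", 20: b"c"}
blocked: List[Tuple[int, int]] = []
copied = copy_with_partial_cache(fetch, 3, 100, media, blocked, cache_limit=1)
print(copied, fetch_log)  # 3 [0, 1, 2, 1, 2]
```

With one SRE cached, only the second and third SREs are fetched again, so the copy of the first range could start immediately after the blocking is in place.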

FIG. 10 illustrates an embodiment of a method for a copy operation that blocks a predetermined amount of destination space according to an example embodiment of the present disclosure. For purposes of illustration, the embodiment shown in FIG. 10 may be described in the context of a device such as a storage device that may receive copy commands using a storage protocol such as NVMe, but the inventive principles are not limited to these example implementation details. For example, some embodiments may use SATA, SAS, and/or any other type of interface, protocol, etc. for any type of device.

Referring to FIG. 10, the method may begin with operation 1036-1, where a device may fetch a submission queue entry containing a copy command. In some embodiments, the submission queue may reside at the host, at the device, or at any other location. At operation 1036-2, the method may parse the copy command (e.g., by checking the Number of Ranges (NR) in Word 3 as shown in FIG. 4) to locate one or more SREs, such as those shown in FIG. 5, to locate the SDLBA field, etc.

At operation 1036-3, the method may block at least a portion (e.g., all) of the destination LBAs, e.g., starting from the SDLBA and continuing for an amount of space that may be sufficient to cover the expected total size of the copy command. For example, in some embodiments, the method may block a conservative number of LBAs determined by the Maximum Copy Length (MCL) of a copy command supported by the device, or by the product (maximum NLB per SRE × maximum number of SREs). In some embodiments and depending on implementation details, these two values may not be equal, even though they may be logically similar. In some embodiments, the amount of destination space to block may be determined by the expression Min{MCL, MSRC × MSSRL}, where MSRC may indicate the maximum SRE count and MSSRL may indicate the maximum single source range length supported by the device.
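The conservative blocking length may be sketched as follows. This is an illustration that assumes the expression above is read as the smaller of MCL and the product of the maximum SRE count and the maximum single source range length; the function name and the example limits are hypothetical and are not values from the disclosure.

```python
def conservative_block_length(mcl: int, msrc: int, mssrl: int) -> int:
    """Upper bound on destination LBAs to block, assuming Min{MCL, MSRC * MSSRL}.

    mcl:   maximum copy length, in logical blocks
    msrc:  maximum number of SREs per copy command
    mssrl: maximum number of logical blocks in a single SRE
    """
    return min(mcl, msrc * mssrl)


# Hypothetical device limits, for illustration only.
print(conservative_block_length(mcl=4096, msrc=256, mssrl=64))    # 4096
print(conservative_block_length(mcl=65536, msrc=128, mssrl=256))  # 32768
```

Because this bound depends only on device limits, it may be computed without reading any SRE.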

Once some or all of the destination LBAs have been blocked, the method may begin reading one or more SREs to perform one or more read operations and/or one or more write operations to implement the copy command. For example, at operation 1036-4, the method may read the SRE (which may be located at the host in this example). At operation 1036-5, the method may perform a read operation on source data indicated by the corresponding starting source LBAs and NLBs of the SREs that have been read from the host. At operation 1036-6, the method may perform a write operation into one or more corresponding destination LBAs using the data read at operation 1036-5.

At operation 1036-7, the method may determine whether the total number of SREs indicated by the Number of Ranges (NR) has been read and whether their corresponding NLBs have been copied. If source data associated with one or more SREs is still to be read and/or written, the method may loop through operations 1036-4, 1036-5, 1036-6, and/or 1036-7 until the source data associated with the SREs has been read and written to the corresponding destination LBAs. In some embodiments, by blocking an amount of destination space based on MCL or (maximum NLB per SRE × maximum number of SREs), the method may be able to copy the entire data size of the copy command without performing any checks on the amount of data to be copied and/or the amount of data remaining to be copied. In some embodiments, this may implement blocking over a relatively large LBA range, and while it may provide correct results, it may delay one or more reads of some or all of the protected LBA range. Thus, one or more reads may be slowed even though some of them may target LBAs with no copy activity.
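The single-pass flow of FIG. 10 may be sketched as follows. This is a simplified model under stated assumptions: `block_length` is a precomputed conservative amount (for example, derived from device limits), `read_sre_from_host` is a hypothetical host fetch, and the storage medium and blocked ranges are toy structures.

```python
from typing import Callable, Dict, List, Tuple

SRE = Tuple[int, int]  # (starting source LBA, number of logical blocks)


def copy_with_predetermined_block(
    read_sre_from_host: Callable[[int], SRE],  # hypothetical host fetch
    nr: int,
    sdlba: int,
    block_length: int,                         # conservative amount to block
    media: Dict[int, bytes],
    blocked: List[Tuple[int, int]],
) -> int:
    # Block a predetermined range up front; no SRE has been read yet.
    dest_range = (sdlba, sdlba + block_length)
    blocked.append(dest_range)
    # Read each SRE exactly once and copy its data as it arrives.
    dest = sdlba
    for i in range(nr):
        sslba, nlb = read_sre_from_host(i)
        for offset in range(nlb):
            media[dest + offset] = media[sslba + offset]
        dest += nlb
    blocked.remove(dest_range)
    return dest - sdlba  # number of logical blocks actually copied


host_sres = [(0, 2), (10, 1)]
fetch_log: List[int] = []


def fetch(i: int) -> SRE:
    fetch_log.append(i)  # record every host read
    return host_sres[i]


media = {0: b"a", 1: b"b", 10: b"c"}
blocked: List[Tuple[int, int]] = []
copied = copy_with_predetermined_block(fetch, 2, 100, 4096, media, blocked)
print(copied, fetch_log)  # 3 [0, 1]
```

Here 4096 LBAs are blocked although only 3 are written, which illustrates the trade-off described above: fewer reads and no summation, at the cost of a larger protected range.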

At operation 1036-8, the method may unblock some or all of the destination LBAs that may have been blocked at operation 1036-3. At operation 1036-9, the method may complete the copy command, for example, by sending a completion queue entry to the host to indicate that the copy command has completed. In some embodiments, releasing some or all of the blocked address range may begin from the end of the range as more addresses are read from the SREs, and/or releasing the blocked LBA range may begin from the beginning or middle of the range as portions of the copy operation proceed.

Although operations 1036-4, 1036-5, 1036-6, and/or 1036-7 may show the copy command proceeding sequentially through the SREs, in some embodiments, one or more read and/or write operations for implementing the copy command may be performed in parallel, e.g., while still maintaining atomicity of the entire copy command.

FIG. 11 illustrates another embodiment of a method for a copy operation that blocks a predetermined amount of destination space according to an example embodiment of the present disclosure. In some aspects, the embodiment shown in FIG. 11 may be similar to the embodiment shown in FIG. 10, and elements in FIG. 11 similar to elements in FIG. 10 may be indicated by reference numerals ending with the same numerals. However, the embodiment shown in FIG. 11 may determine a different amount of destination space to block based on the amount of destination space that has been written. The embodiment shown in FIG. 11 may include operation 1136-10, where the method may set a counter (e.g., remaining_sre_count) to NR. On each pass through the loop, the method may decrement the counter at operation 1136-11, and at operation 1136-12 the method may determine a different amount of destination space to block based on the amount of destination space that has been written. For example, in some embodiments, at operation 1136-12, the method may determine the amount of destination space to block as MCL - MSSRL × remaining_sre_count, where MSSRL may indicate the maximum single source range length supported by the device. In some embodiments, the amount of destination space to block may be reduced, for example, because not every SRE may use a full MSSRL. In some embodiments, operations 1136-11 and/or 1136-12 may be performed before and/or in parallel with operations 1136-4 through 1136-6, or at any other time or in any other order.

As another example, in some embodiments where the NR may be unknown, the method may determine the amount of destination space to block as Max{MCL - (MSSRL × [MSRC - remaining_sre_count]), remaining_sre_count × MSSRL}, where MSRC may indicate the maximum SRE count.

Some embodiments may combine one or more of the techniques described herein to implement one or more hybrid techniques. In some embodiments, one or more parameters of a hybrid technique may vary based on device behavior, memory limitations, current device activity, and the like.

For example, in some embodiments, a number of copy commands (e.g., the first 10 copy commands submitted to the SQ) may use the method shown in FIG. 6, where some or all SREs may be stored (e.g., in device memory), while one or more additional copy commands may use the method shown in FIG. 8, where no or fewer SREs may be stored (e.g., in device memory). In some embodiments and depending on implementation details, such a hybrid technique may enable a device to handle a relatively large number of overlapping copy commands while limiting the amount of local memory used to store SREs.

As another example, in some embodiments, a system, device, etc. may generate an activity indicator (e.g., the number of overlapping copy commands, for example, in one or more SQs) to estimate an activity level of a host, device, etc. In some embodiments, the method shown in FIG. 8 may be used to process a copy command if the activity indicator is below a threshold (e.g., relatively idle). However, if the activity indicator meets or exceeds the threshold (e.g., relatively busy), the method shown in FIG. 10 may be used to process the copy command, for example, because it may involve fewer reads and/or computations.
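The two hybrid policies described above may be sketched as simple selection functions. This is a minimal illustration: the function names, the returned labels, and the parameters `max_stored` and `busy_threshold` are hypothetical; the value 10 mirrors the example count given in the text.

```python
def method_for_new_command(stored_in_flight: int, max_stored: int = 10) -> str:
    """Count-based policy: store SREs only for the first few copy commands."""
    if stored_in_flight < max_stored:
        return "store_sres"        # FIG. 6 style: SREs kept in device memory
    return "read_sres_twice"       # FIG. 8 style: SREs re-read from the host


def method_for_activity(overlapping_commands: int, busy_threshold: int) -> str:
    """Activity-based policy: prefer fewer reads when the device is busy."""
    if overlapping_commands < busy_threshold:
        return "read_sres_twice"             # relatively idle (FIG. 8 style)
    return "block_predetermined_amount"      # relatively busy (FIG. 10 style)


print(method_for_new_command(3))    # store_sres
print(method_for_new_command(10))   # read_sres_twice
print(method_for_activity(2, 8))    # read_sres_twice
print(method_for_activity(8, 8))    # block_predetermined_amount
```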

FIG. 12 illustrates an example embodiment of a system for implementing an atomically protected copy command in accordance with example embodiments of the present disclosure. The system shown in fig. 12 may be used to implement and/or may be implemented with any embodiment of the systems, devices, methods, techniques, etc. disclosed herein. The system shown in fig. 12 may include a host 1240 and a device 1242, which may communicate via a communication connection 1244.

In some embodiments, host 1240 may include a command submission queue (SQ) 1246, a completion queue (CQ) 1248, host memory 1250, and/or a host communication interface 1252. The SQ 1246 may be implemented, for example, as an NVMe submission queue, which may hold commands (e.g., copy commands) to be consumed by the device 1242. The completion queue 1248 may be implemented, for example, as an NVMe completion queue, which may hold, for example, completions that may be sent by the device 1242 to indicate that commands (e.g., copy commands) have been completed by the device 1242. Host memory 1250 may be used, for example, to store command data, such as the SREs of copy commands submitted by host 1240 using the submission queue 1246.

In some embodiments, the device 1242 may include a device communication interface 1254, a device controller 1256, and/or a storage medium 1258. In some embodiments, the device controller 1256 may include a command processor 1260, which may fetch and/or parse commands (e.g., copy commands) from the submission queue 1246. In some embodiments, the device controller 1256 may include replication logic 1261, which may implement one or more copy commands, including one or more read and/or write operations associated therewith as described herein.

In some embodiments, the device controller 1256 may include a hazard manager 1262, the hazard manager 1262 may be used to, for example, enforce the atomicity of one or more copy commands. For example, in some embodiments, one or more (e.g., all) destination spaces (e.g., destination LBA ranges) of one or more (e.g., all) write operations may be submitted by replication logic 1261 to hazard manager 1262 to ensure that any atomicity of the destination spaces is enforced. In some embodiments, after one or more (e.g., all) atomically protected write operations are completed, the associated destination space (e.g., destination LBA range) may be purged from the hazard manager to free up space for use by other processes.
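The role of the hazard manager may be sketched as a small range tracker. This is a toy model only: the class name, its methods, and the use of a plain list of half-open LBA ranges are assumptions made for illustration and are not details from the disclosure.

```python
from typing import List, Tuple


class HazardManager:
    """Toy tracker of blocked destination LBA ranges (illustrative only)."""

    def __init__(self) -> None:
        self._ranges: List[Tuple[int, int]] = []  # half-open [start, end)

    def block(self, start: int, length: int) -> Tuple[int, int]:
        """Register a destination range as blocked and return its handle."""
        rng = (start, start + length)
        self._ranges.append(rng)
        return rng

    def is_blocked(self, start: int, length: int) -> bool:
        """Return True if [start, start + length) overlaps any blocked range."""
        end = start + length
        return any(start < b_end and b_start < end for b_start, b_end in self._ranges)

    def release(self, rng: Tuple[int, int]) -> None:
        """Clear a previously blocked range once its writes have completed."""
        self._ranges.remove(rng)


hm = HazardManager()
handle = hm.block(100, 10)       # blocks LBAs 100..109
print(hm.is_blocked(105, 1))     # True
print(hm.is_blocked(110, 5))     # False
hm.release(handle)
print(hm.is_blocked(105, 1))     # False
```

In such a model, a command that targets an overlapping range could be held until `is_blocked` returns False, which is one way the ordering described above might be enforced.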

In some embodiments, the device controller 1256 may include a local memory 1264, which may be used to store, for example, one or more SREs of one or more copy commands that may be parsed by the command processor 1260 and executed by the replication logic 1261 and/or the hazard manager 1262. In some embodiments, the device controller 1256 may include a write buffer 1266, which may be used, for example, to store source data read from one or more source LBAs in the storage medium 1258 to be written to one or more destination LBAs in the storage medium 1258. After completing a copy command, the command processor 1260 may send a completion to the completion queue 1248.

In some embodiments, host 1240 may be implemented with one or more of any type of apparatus, such as a server (e.g., a compute server, a storage server, a web server, a cloud server, etc.), a computer (e.g., a workstation, a personal computer, a tablet, a smartphone, etc.), or multiples and/or combinations thereof. In some embodiments, the device 1242 may be implemented with one or more of any type of device, such as an accelerator device, a storage device, a network device (e.g., a Network Interface Card (NIC)), a memory expansion and/or buffering device, a Graphics Processing Unit (GPU), a Neural Processing Unit (NPU), a Tensor Processing Unit (TPU), etc., or multiples and/or combinations thereof. In some embodiments, host 1240 may also be implemented as a device. In some embodiments, either or both of host 1240 and/or device 1242 may be configured as a host, a client, etc., or any combination thereof.

In some embodiments, the communication connection 1244 may be implemented with any type of wired and/or wireless communication medium, interface, protocol, etc., including Peripheral Component Interconnect Express (PCIe), Nonvolatile Memory Express (NVMe), NVMe-over-fabric (NVMe-oF), Compute Express Link (CXL) and/or a coherent protocol such as CXL.mem, CXL.cache, CXL.IO, etc., Gen-Z, Open Coherent Accelerator Processor Interface (OpenCAPI), Cache Coherent Interconnect for Accelerators (CCIX), etc., Advanced eXtensible Interface (AXI), Direct Memory Access (DMA), Remote DMA (RDMA), RDMA over Converged Ethernet (RoCE), Advanced Message Queuing Protocol (AMQP), Ethernet, Transmission Control Protocol/Internet Protocol (TCP/IP), Fibre Channel, InfiniBand, Serial ATA (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), iWARP, any generation of wireless network including 2G, 3G, 4G, 5G, etc., any generation of Wi-Fi, Near Field Communication (NFC), etc., or any combination thereof. In some embodiments, communication connection 1244 may include a communication fabric including one or more links, buses, switches, hubs, nodes, routers, repeaters, and the like.

In embodiments where the device 1242 may be implemented at least partially as a storage device, the storage device may include any type of nonvolatile storage medium or any combination thereof, for example, based on solid state media (e.g., a Solid State Drive (SSD)), magnetic media (e.g., a Hard Disk Drive (HDD)), optical media, and the like. For example, in some embodiments, the storage device may be implemented as an SSD based on not-AND (NAND) flash memory, persistent memory such as cross-gridded nonvolatile memory, memory with bulk resistance change, Phase Change Memory (PCM), or the like, or any combination thereof. Any such storage device may be implemented in any form factor (such as 3.5 inch, 2.5 inch, 1.8 inch, M.2, Enterprise and Data Center SSD Form Factor (EDSFF), NF1, etc.) using any connector configuration (such as SATA, SCSI, SAS, U.2, M.2, etc.). Any such storage device may be implemented entirely or partially with, and/or used in connection with, a server chassis, a server rack, a dataroom, a data center, an edge data center, a mobile edge data center, and/or any combination thereof.

Any of the functions described herein, including any host functions, device functions, etc. (e.g., the command processor 1260, the replication logic 1261, the hazard manager 1262, and any of the functions described with respect to the embodiments shown in FIGS. 1-13), may be implemented with hardware, software, firmware, or any combination thereof, including, for example, hardware and/or software combinational logic, sequential logic, timers, counters, registers, state machines, volatile memory such as Dynamic Random Access Memory (DRAM) and/or Static Random Access Memory (SRAM), nonvolatile memory including flash memory, persistent memory such as cross-gridded nonvolatile memory, memory with bulk resistance change, Phase Change Memory (PCM), etc., and/or any combination thereof, Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Central Processing Units (CPUs) including Complex Instruction Set Computer (CISC) processors such as x86 processors and/or Reduced Instruction Set Computer (RISC) processors such as ARM processors, Graphics Processing Units (GPUs), Neural Processing Units (NPUs), etc. In some embodiments, one or more components may be implemented as a system-on-chip (SoC).

Fig. 13 illustrates an example embodiment of an apparatus according to an example embodiment of the present disclosure. For example, the embodiment 1300 shown in fig. 13 may be used to implement any of the devices disclosed herein. The device 1300 may include a device controller 1302, replication logic 1308, device functional circuitry 1306, and/or a communication interface 1310. The components shown in fig. 13 may communicate via one or more device buses 1312. The replication logic 1308 may be used, for example, to implement any of the replication command functions disclosed herein.

The device function circuitry 1306 may include any hardware that implements the primary functions of the device 1300. For example, if device 1300 is implemented as a storage device, device functional circuitry 1306 may include a storage medium, such as one or more flash memory devices, FTLs, etc. As another example, if device 1300 is implemented as a Network Interface Card (NIC), device functional circuitry 1306 may include one or more modems, network interfaces, physical layers (PHYs), medium access control layers (MACs), and the like. As another example, if the device 1300 is implemented as an accelerator, the device function circuitry 1306 can include one or more accelerator circuits, memory circuits, and the like.

Fig. 14 illustrates an embodiment of a method of communication according to an example embodiment of the present disclosure. The method may begin at operation 1402. At operation 1404, the method may receive a copy command at the device, wherein the copy command may include a first indication of a first amount of source data and a second indication of a second amount of source data. At operation 1406, the method may determine an amount of destination space for the write operation based at least in part on the first indication. At operation 1408, the method may block at least a portion of the amount of destination space for at least a portion of the write operation. The method may end at operation 1410.

The embodiment shown in fig. 14, as well as all other embodiments described herein, are example operations and/or components. In some embodiments, some operations and/or components may be omitted, and/or other operations and/or components may be included. Furthermore, in some embodiments, the temporal and/or spatial order of operations and/or components may vary. Although some components and/or operations may be shown as separate components, in some embodiments, some components and/or operations shown separately may be integrated into a single component and/or operation and/or some components and/or operations shown as a single component and/or operation may be implemented with multiple components and/or operations.

Some embodiments disclosed above have been described in the context of various implementation details, but the principles of the disclosure are not limited to these or any other specific details. For example, some functions have been described as being implemented by certain components, but in other embodiments, the functions may be distributed among different systems and components in different locations and with various user interfaces. Certain embodiments have been described as having particular processes, operations, etc., but these terms also encompass embodiments in which a particular process, operation, etc. may be implemented with multiple processes, operations, etc., or in which multiple processes, operations, etc. may be integrated into a single process, operation, etc. A reference to a component or element may refer to only a portion of the component or element. For example, a reference to a block may refer to an entire block or to one or more sub-blocks. The use of terms such as "first" and "second" in the present disclosure and the claims may be solely for the purpose of distinguishing the elements they modify and may not indicate any spatial or temporal order unless apparent otherwise from the context. In some embodiments, a reference to an element may refer to at least a portion of the element; for example, "based on" may refer to "based at least in part on," and the like. A reference to a first element does not imply the existence of a second element. The principles disclosed herein have independent utility and may be embodied individually, and not every embodiment may utilize every principle. However, the principles may also be embodied in various combinations, some of which may amplify the benefits of the individual principles in a synergistic manner.

The various details and embodiments described above may be combined according to the inventive principles of this patent disclosure to produce additional embodiments. Since the inventive principles of this patent disclosure may be modified in arrangement and detail without departing from the inventive concept, such changes and modifications are considered to be within the scope of the appended claims.
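One possible combination, sketched here under stated assumptions, stores the indications at the device, blocks the destination span, and then reads each stored indication to write the corresponding source data to the destination, modifying the stored location after each write. The function name `copy_with_stored_indications`, the dictionary model of the storage medium, and the `(start_lba, num_blocks)` form of an indication are hypothetical and serve only to make the sequence concrete.

```python
from collections import deque


def copy_with_stored_indications(media: dict, dest_lba: int, indications: list) -> int:
    """Copy source ranges to a contiguous destination; return blocks written.

    `media` maps an LBA to its contents; each indication is a hypothetical
    (start_lba, num_blocks) pair.
    """
    # Store the indications at the device so they need not be fetched again.
    stored = deque(indications)
    # Block the whole destination span before any data is written.
    total = sum(num_blocks for _, num_blocks in stored)
    blocked = set(range(dest_lba, dest_lba + total))

    cursor = dest_lba
    while stored:
        start_lba, num_blocks = stored[0]  # read the stored indication
        for offset in range(num_blocks):
            media[cursor + offset] = media[start_lba + offset]
        cursor += num_blocks
        stored.popleft()  # modify the stored location after the write

    blocked.clear()  # release the destination once the copy completes
    return cursor - dest_lba


media = {0: "a", 1: "b", 10: "c"}
written = copy_with_stored_indications(media, dest_lba=100, indications=[(0, 2), (10, 1)])
print(written, [media[lba] for lba in (100, 101, 102)])  # 3 ['a', 'b', 'c']
```

Removing an entry from the queue is only one way to modify the stored location; marking the entry as completed would serve the same purpose.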

Claims

1. A method for copy-destination atomicity in a device, comprising:

receiving, at the device, a copy command, wherein the copy command includes a first indication of a first amount of source data and a second indication of a second amount of source data;

determining an amount of destination space based at least in part on the first indication; and

blocking at least a portion of the amount of destination space.

2. The method of claim 1, further comprising:

reading the first indication; and

reading the second indication;

wherein the amount of destination space comprises at least a first portion of the first amount and at least a second portion of the second amount.

3. The method of claim 2, wherein the blocking comprises blocking the at least a first portion of the first amount and/or the at least a second portion of the second amount.

4. The method of claim 1, further comprising storing the first indication to generate a stored first indication.

5. The method of claim 4, further comprising:

reading the stored first indication; and

writing, based on reading the stored first indication, at least a portion of the first amount of source data to the at least a portion of the amount of destination space.

6. The method of claim 5, wherein the stored first indication is stored in a first memory location, the method further comprising modifying the first memory location based on writing the at least a portion of the first amount of source data.

7. The method of claim 2, wherein reading the first indication comprises performing a first read operation of the first indication, the method further comprising:

performing a second read operation of the first indication; and

writing, based on the second read operation, at least a portion of the first amount of source data to the at least a portion of the amount of destination space.

8. The method of claim 7, further comprising:

storing the second indication to generate a stored second indication;

reading the stored second indication; and

writing, based on reading the stored second indication, at least a portion of the second amount of source data to the at least a portion of the amount of destination space.

9. The method of claim 1, wherein the amount of destination space is a first amount of destination space, the method further comprising blocking a second amount of destination space based on receiving the copy command.

10. The method of claim 9, wherein the second amount of destination space is based on a copy length of the copy command.

11. The method of claim 9, wherein determining the first amount of destination space comprises modifying the second amount of destination space based at least in part on the first indication.

12. The method according to claim 9, wherein:

the copy command includes a count of indications; and

determining the first amount of destination space is based on the count of indications.

13. The method of claim 1, wherein the amount of destination space is a first amount of destination space, the method further comprising:

determining a second amount of destination space based at least in part on the second indication; and

blocking at least a portion of the second amount of destination space.

14. A system for copy-destination atomicity in a device, comprising:

a host configured to send a copy command, wherein the copy command includes a first indication of a first amount of source data and a second indication of a second amount of source data; and

a device configured to:

receive the copy command;

determine an amount of destination space based at least in part on the first indication; and

block at least a portion of the amount of destination space.

15. The system of claim 14, wherein the device is further configured to:

read the first indication from the host; and

read the second indication from the host.

16. The system of claim 15, wherein the device is further configured to:

store, at the device, the first indication to generate a stored first indication;

perform a read operation of the stored first indication; and

write, based on the read operation, at least a portion of the first amount of source data to the at least a portion of the amount of destination space.

17. An apparatus, comprising:

a communication interface;

a storage medium; and

a device controller configured to:

receive a copy command using the communication interface, wherein the copy command includes a first indication of a first amount of source data in the storage medium and a second indication of a second amount of source data in the storage medium;

determine an amount of destination space in the storage medium based at least in part on the first indication; and

block at least a portion of the amount of destination space.

18. The device of claim 17, wherein the device controller is configured to:

perform a first read operation of the first indication using the communication interface; and

perform a second read operation of the second indication using the communication interface.

19. The device of claim 18, wherein the device controller is configured to:

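store the first indication to generate a stored first indication;

perform a read operation of the stored first indication; and

write, based on the read operation, at least a portion of the first amount of source data to the at least a portion of the amount of destination space.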

20. The device of claim 18, wherein the device controller is configured to:

perform a third read operation of the first indication using the communication interface; and

write, based on the third read operation, at least a portion of the first amount of source data to the at least a portion of the amount of destination space.