CN116700603A - Memory device and method for parallel processing

Memory device and method for parallel processing

Info

Publication number
CN116700603A
Authority
CN
China
Prior art keywords
buffer
data
processor
storage
separator
Prior art date
Legal status
Pending
Application number
CN202211708820.7A
Other languages
Chinese (zh)
Inventor
奇亮奭 (Yang-Seok Ki)
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority claimed from US 17/856,823 (published as US 2023/0280936 A1)
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN116700603A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656: Data buffering arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3867: Concurrent instruction execution, e.g. pipeline or look ahead, using instruction pipelines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A storage device and a method for parallel processing are provided. The storage device includes a first storage source that stores first data. The storage device also includes a second storage source that stores second data. The second data includes a first portion and a second portion separated by a separator. The storage device also includes a first buffer configured to receive the first data. The storage device also includes a second buffer configured to receive the second data. The storage device also includes a first processor associated with the first buffer. The storage device also includes a second processor associated with the second buffer. The second processor is configured to perform a first operation on the second portion of the second data, and the first processor is configured to perform a second operation on the first portion of the second data and the first data based on the separator.

Description

Memory device and method for parallel processing
The present application claims priority to and the benefit of U.S. Provisional Application No. 63/316,307, entitled "Parallel Processing of Stream Data in a Computing Storage Device," filed on March 3, 2022, and U.S. Patent Application No. 17/856,823, filed on July 1, 2022, the entire contents of both of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to systems and methods for parallel processing in computing storage.
Background
A Computing Storage Device (CSD) provides computing functions as well as data storage. Thus, a host may store data in the computing storage device. Host data units (e.g., files) may have a different size than CSD data units (e.g., blocks). Thus, a host data unit may be split across multiple components of the CSD.
Computations offloaded to a CSD may target whole host data units. While performing offloaded computations in parallel may reduce the overall time spent on computation, a CSD may have difficulty pipelining parallel execution because the alignment of host data units is not known in advance.
The above information disclosed in this Background section is provided only to enhance understanding of the background of the disclosure, and therefore it may contain information that does not form the prior art.
Disclosure of Invention
In various embodiments, the systems, methods, and apparatus described herein include systems, methods, and devices related to parallel processing in a computing storage device.
A storage device includes a first storage source that stores first data. The storage device also includes a second storage source that stores second data. The second data includes a first portion and a second portion separated by a separator. The storage device also includes a first buffer configured to receive the first data. The storage device also includes a second buffer configured to receive the second data. The storage device also includes a first processor associated with the first buffer. The storage device also includes a second processor associated with the second buffer. The second processor is configured to perform a first operation on the second portion of the second data, and the first processor is configured to perform a second operation on the first portion of the second data and the first data based on the separator.
A method includes: receiving, at a first buffer, first data from a first storage source. The method further includes: receiving, at a second buffer, second data from a second storage source, the second data including a first portion and a second portion separated by a separator. The method further includes: performing, at a second processor associated with the second buffer, a first operation on the second portion of the second data. The method further includes: performing, at a first processor associated with the first buffer, a second operation on the first portion of the second data and the first data.
A storage device includes a first storage channel including a first media device that stores first data. The storage device further includes a second storage channel including a second media device that stores second data. The second data includes a first portion and a second portion separated by a separator. The storage device also includes a first computing module associated with the first storage channel and including a first processor and a first input buffer. The storage device also includes a second computing module associated with the second storage channel and including a second processor and a second input buffer. The second processor is configured to perform a first operation on the second portion of the second data. The first processor is configured to perform a second operation on the first portion of the second data and the first data based on the separator.
Drawings
The above-mentioned and other aspects of the present technology will be better understood when the present application is read in view of the following drawings in which like reference numerals refer to similar or identical elements:
FIG. 1 is a block diagram of a system for parallel processing in a computing storage device.
Fig. 2 is a block diagram illustrating an example of a system in which a processor is configured to transfer data to another processor for processing.
FIG. 3 is a block diagram illustrating an example of a system in which a processor associated with a storage source sends data from the storage source to a processor associated with a group of storage sources for processing.
Fig. 4 is a block diagram illustrating an example of a system in which a processor reads across the boundary of an associated input buffer in a contiguous memory space based on the location of a separator (delimiter).
FIG. 5 is a block diagram of a storage device supporting delimiter-based parallel processing of data.
Fig. 6 is a diagram of a computing module.
FIG. 7 is a diagram of an apparatus that performs delimiter-based parallel computation in which computing modules operating in parallel pass data between each other.
Fig. 8 shows an example of an input buffer structure.
Fig. 9 is a diagram illustrating a process 900 for performing delimiter-based parallel processing.
FIG. 10 is a diagram illustrating an example of an apparatus that performs delimiter-based parallel computation, in which computing modules operating in parallel pass data to one or more downstream processors based on delimiters.
Fig. 11 is a diagram illustrating an example of an apparatus that performs delimiter-based parallel computation, in which computing modules operating in parallel directly access the input buffers of other computing modules based on the positions of delimiters.
Fig. 12 is a flow chart of a method for separator-based parallel processing.
While the technology is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described. The drawings may not be to scale. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the technology to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present technology as defined by the appended claims.
Detailed Description
The details of one or more embodiments of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Various embodiments of the present disclosure now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments are shown. Indeed, the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. The term "or" is used herein in both the alternative and conjunctive sense, unless indicated otherwise. The terms "illustrative" and "example" are used for illustration and do not indicate a quality level. Like numbers refer to like elements throughout. The arrows in each figure depict bi-directional data flow and/or bi-directional data transfer capabilities. The terms "path," "pathway," and "route" are used interchangeably herein.
Embodiments of the present disclosure may be implemented in various ways, including as a computer program product comprising an article of manufacture. The computer program product may include a non-transitory computer-readable storage medium storing applications, program modules, scripts, source code, program code, object code, byte code, compiled code, interpreted code, machine code, executable instructions, and the like (also referred to herein as executable instructions, instructions for execution, computer program products, program code, and/or similar terms used interchangeably herein). Such non-transitory computer-readable storage media include all computer-readable media (including volatile and non-volatile media).
In one embodiment, the non-volatile computer-readable storage medium may include a floppy disk, a flexible disk, a hard disk, a solid state storage device (SSS) (e.g., a solid state drive (SSD), a solid state card (SSC), a solid state module (SSM)), an enterprise flash drive, magnetic tape, any other non-transitory magnetic medium, and the like. The non-volatile computer-readable storage medium may also include punch cards, paper tape, optical mark sheets (or any other physical medium having a pattern of holes or other optically recognizable marks), compact disc read-only memory (CD-ROM), compact disc rewritable (CD-RW), digital versatile disc (DVD), Blu-ray disc (BD), any other non-transitory optical medium, and the like. Such non-volatile computer-readable storage media may also include read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory (e.g., serial, NAND, NOR, and the like), multimedia memory cards (MMC), Secure Digital (SD) memory cards, SmartMedia cards, CompactFlash (CF) cards, Memory Sticks, and the like. In addition, the non-volatile computer-readable storage medium may also include conductive-bridging random access memory (CBRAM), phase-change random access memory (PRAM), ferroelectric random access memory (FeRAM), non-volatile random access memory (NVRAM), magnetoresistive random access memory (MRAM), resistive random access memory (RRAM), silicon-oxide-nitride-oxide-silicon memory (SONOS), floating junction gate random access memory (FJG RAM), Millipede memory, racetrack memory, and the like.
In one embodiment, the volatile computer-readable storage medium may include random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), fast page mode dynamic random access memory (FPM DRAM), extended data-out dynamic random access memory (EDO DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), second-generation double data rate synchronous dynamic random access memory (DDR2 SDRAM), third-generation double data rate synchronous dynamic random access memory (DDR3 SDRAM), Rambus dynamic random access memory (RDRAM), twin transistor RAM (TTRAM), thyristor RAM (T-RAM), zero-capacitor RAM (Z-RAM), Rambus in-line memory module (RIMM), dual in-line memory module (DIMM), single in-line memory module (SIMM), video random access memory (VRAM), cache memory (including various levels), flash memory, register memory, and the like. It will be appreciated that where embodiments are described as using a computer-readable storage medium, other types of computer-readable storage media may be substituted for, or used in addition to, the computer-readable storage media described above.
It should be appreciated that the various embodiments of the present disclosure may also be implemented as a method, apparatus, system, computing device, computing entity, or the like. As such, embodiments of the present disclosure may take the form of an apparatus, system, computing device, computing entity, or the like that executes instructions stored on a computer-readable storage medium to perform particular steps or operations. Accordingly, embodiments of the present disclosure may also take the form of an entirely hardware embodiment, an entirely computer program product embodiment and/or an embodiment containing a combination of computer program products and hardware performing certain steps or operations.
Embodiments of the present disclosure are described below with reference to block diagrams and flowchart illustrations. Accordingly, it should be understood that each block of the block diagrams and flowchart illustrations may be implemented in the form of a computer program product, an entirely hardware embodiment, a combination of hardware and computer program products, and/or devices, systems, computing devices, computing entities, and the like that execute instructions, operations, steps, and similar words (e.g., executable instructions, instructions for execution, program code, and the like) stored on a computer-readable storage medium. For example, the fetching, loading, and executing of code may be performed sequentially, such that one instruction is fetched, loaded, and executed at a time. In some example embodiments, fetching, loading, and/or executing may be performed in parallel, such that multiple instructions are fetched, loaded, and/or executed together. Thus, such embodiments may produce a specially configured machine that performs the steps or operations specified in the block diagrams and flowchart illustrations. Accordingly, the block diagrams and flowchart illustrations support various combinations of embodiments for performing the specified instructions, operations, or steps.
As used herein, a Computing Storage Device (CSD) refers to a storage device that supports computing tasks. For example, a CSD may include a storage element (e.g., non-volatile memory, such as flash memory, a hard drive, etc.) and a computing element (e.g., a Central Processing Unit (CPU), a Graphics Processor (GPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC) such as a tensor processing unit, a processor core, etc.), and may be configured to support storage of data at the storage element and execution of computing tasks at the computing element. Thus, the CSD may provide storage capabilities to a host device (e.g., a computing device) and may support offloading of computing tasks from the host device to the CSD.
In some examples according to the disclosure, a Computing Storage Device (CSD) includes more than one compute engine and more than one storage source. Examples of storage sources include storage media (e.g., flash memory chips (such as NAND flash memory chips), flash memory media packages, resistive random access memory devices, hard disk devices, etc.), storage channels (e.g., NAND flash memory channels, etc.), other groupings of storage media, and the like. The compute engines receive data from the storage sources and perform computations on the data. Because a host data unit (e.g., a file) operated on by a computation may be split across more than one storage source, the compute engines operate on the data based on the locations of separators (delimiters) that indicate boundaries between host data units in the data. In particular, a compute engine may begin performing computations on data following the first instance of the delimiter in the buffer of the compute engine. The data preceding the first delimiter may be combined with data from the input buffer of a previous compute engine and processed elsewhere. Similarly, the compute engine may detect the final instance of the delimiter in an input buffer associated with the compute engine and stop the computation at the final instance until additional data is available. In some examples, data following the final instance of the delimiter may be forwarded to a further input buffer for processing by a further compute engine.
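For illustration only, the following Python sketch (not part of the claimed embodiments; the newline delimiter and all names are assumptions) shows how one input buffer can be divided into the three regions just described:

```python
def split_on_delimiters(buf: bytes, delim: bytes = b"\n"):
    """Return (head, body, tail) for one compute-engine input buffer.

    head: data before the first delimiter (the remainder of a unit that
          began in a previous buffer; handled elsewhere).
    body: complete host data units between the first and last delimiter
          (safe to process locally).
    tail: data after the last delimiter (an incomplete unit; computation
          is deferred or the data is forwarded to a further buffer).
    """
    first = buf.find(delim)
    if first == -1:
        return buf, b"", b""  # no delimiter: the whole buffer is a fragment
    last = buf.rfind(delim)
    return buf[:first], buf[first + 1:last + 1], buf[last + 1:]


head, body, tail = split_on_delimiters(b"end-of-A\nrecord-B\nrecord-C\nstart-of-D")
assert head == b"end-of-A"              # combined with the previous buffer
assert body == b"record-B\nrecord-C\n"  # processed by this compute engine
assert tail == b"start-of-D"            # deferred until more data arrives
```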
The disclosed separator-aware systems and methods may provide parallel computation in a CSD despite misalignments between host data units and CSD data units. These systems and methods may be particularly useful in RAID configurations in which data is striped across multiple storage sources. Furthermore, the disclosure may be extended to systems in which host data is stored across more than one CSD.
Referring to FIG. 1, a block diagram of a system 100 for parallel processing in a computing storage device is shown. The system 100 includes a first processor 102, a second processor 104, a first buffer 106, a second buffer 108, a first storage source 110, and a second storage source 112. The first processor 102 may include a central processor, a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Graphics Processor (GPU), another type of processor, or any combination thereof. The second processor 104 may similarly include any type of processor described with respect to the first processor 102. In some implementations, the first processor 102, the second processor 104, or both may comprise a portion of a processor device. For example, the first processor 102 and the second processor 104 may each correspond to one or more processor cores.
The first buffer 106 may include a memory device, such as a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), another type of memory, or a combination thereof. The second buffer 108 may similarly include any type of memory device described with respect to the first buffer 106. In some implementations, the first buffer 106 and the second buffer 108 correspond to different regions of the same memory device (or virtual memory device).
The first storage source 110 includes one or more storage devices, such as a flash chip (e.g., NAND flash), a flash package, a flash channel (e.g., NAND flash channel), a Hard Disk Device (HDD), a Resistive Random Access Memory (RRAM) device, etc. The second storage source 112 may similarly comprise any type of storage device described with respect to the first storage source 110.
The first processor 102 is associated with a first buffer 106. In some implementations, the first buffer 106 is included within the first processor 102, or the first processor 102 and the first buffer 106 are included in a common computing module. In other implementations, the first processor 102 and the first buffer 106 are different components of the system 100, and the first processor 102 is configured to utilize the first buffer 106 as an input buffer.
The second processor 104 is associated with a second buffer 108. In some implementations, the second buffer 108 is included within the second processor 104, or the second processor 104 and the second buffer 108 are included in a common computing module. In other implementations, the second processor 104 and the second buffer 108 are different components of the system 100, and the second processor 104 is configured to utilize the second buffer 108 as an input buffer.
The first storage source 110 and the second storage source 112 may store host data. Because host data units (e.g., files) may not be aligned with storage source data units (e.g., blocks or chunks), host data units may be split across the first storage source 110 and the second storage source 112. Individual host data units may be separated by a separator. Thus, data that occurs between two consecutive separators may correspond to a single host data unit. The computations to be performed by the first processor 102 and the second processor 104 may target entire host data units.
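As a minimal illustration of why such splits occur (the sizes, contents, and round-robin layout below are invented, not taken from the disclosure), striping a delimiter-separated host stream into fixed-size device data units cuts host data units wherever the device-unit boundary happens to fall:

```python
# Hypothetical striping of a delimiter-separated host stream across two
# storage sources; sizes and contents are invented for illustration.
DEVICE_UNIT = 16  # assumed device data-unit size, in bytes

host_stream = b"file-one\nfile-two\nfile-three\n"

# Cut the stream into device units, ignoring delimiter positions, and
# stripe them round-robin across two sources (as in a RAID layout).
units = [host_stream[i:i + DEVICE_UNIT]
         for i in range(0, len(host_stream), DEVICE_UNIT)]
source_1, source_2 = units[0::2], units[1::2]

# The host data unit "file-two" is now split across both sources.
assert source_1[0].endswith(b"file-tw")
assert source_2[0].startswith(b"o\n")
```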
In the illustrated example, the first storage source 110 stores first data (e.g., first host data) including a first separator 113 and a first portion 114. The second storage source 112 stores second data (e.g., second host data) including a second portion 116, a second separator 118, a third portion 120, and a third separator 121. The first portion 114 and the second portion 116 may comprise a single host data unit. Thus, successful computation on this host data unit may depend on a processor having access to both the first portion 114 and the second portion 116.
In operation, the first separator 113 and the first portion 114 are transferred (e.g., copied) from the first storage source 110 into the first buffer 106, and the second portion 116, the second separator 118, the third portion 120, and the third separator 121 are transferred (e.g., copied) from the second storage source 112 to the second buffer 108. For example, the transfer may correspond to a Direct Memory Access (DMA) transfer or some other type of transfer. The transmission of the first delimiter 113 and the first portion 114 may be initiated by the first processor 102. The transmission of the second portion 116, the second separator 118, the third portion 120, and the third separator 121 of the second data may be initiated by the second processor 104. The first delimiter 113, the first portion 114, the second portion 116, the second delimiter 118, the third portion 120, and the third delimiter 121 may be transmitted in response to a request (e.g., from a host device) to perform one or more computations based on host data stored in the first storage source 110 and the second storage source 112. As described above, the computation may target the entire host data unit.
The first processor 102 is configured to identify the location of the separator in the data transferred into the first buffer 106. These separators may separate the host data units from each other. Similarly, the second processor 104 is configured to identify the location of the separator in the data transferred into the second buffer 108. The first processor 102 determines which data in the first buffer 106 is to be processed based on the location of the delimiter. Similarly, the second processor 104 determines which data in the second buffer 108 is to be processed based on the location of the delimiter in the second buffer 108. Further, the first processor 102 and/or the second processor 104 may perform computations on data from different input buffers (e.g., buffers associated with different processors) based on the locations of separators in the data of the different input buffers.
For example, the second processor 104 (or the first processor 102) may be configured to ignore data that occurs before the first delimiter in the second buffer 108 (or the first buffer 106). Such data may be processed by different processors. To illustrate, data that occurs before the first delimiter may correspond to an incomplete host data unit. Thus, a processor (e.g., first processor 102) having the remainder of the host data unit may process this data. The first processor 102 may similarly process data that occurs before the first delimiter in the first buffer.
Further, the second processor 104 may ignore data that occurs after the last delimiter in the second buffer 108 (or the first buffer 106). This data after the last separator may be processed by the second processor 104 in combination with additional data from further input buffers or may be processed by a different processor in combination with additional data. For example, the second processor 104 may wait until additional data is transferred into the second buffer 108 to complete the host data unit after the last delimiter. The first processor 102 may process the data after the last separator in the first buffer 106.
The data that appears between the first separator and the last separator in the second buffer 108 may be processed by the second processor 104. Similarly, data occurring between the first delimiter and the last delimiter in the first buffer 106 may be processed by the first processor 102. By selectively performing computations based on the data in the second buffer 108 according to the locations of the first and last separators, the second processor 104 may perform computations on the entire host data unit.
In the example shown, the second processor 104 identifies the location of the second separator 118 and the location of the third separator 121 in the second buffer 108. Based on the location of the second separator 118 (e.g., the first separator in the second buffer 108), the second processor 104 refrains from processing the second portion 116. "Refraining from" processing the second portion 116 may include beginning to perform calculations on data stored in the buffer at a location after the second separator 118, transferring (e.g., copying) the second portion 116 to an input buffer of another processor, or a combination thereof.
As indicated by arrow 126, the second processor 104 performs a calculation on the third portion 120 based on the location of the second separator 118 (e.g., the first separator in the second buffer 108) and the location of the third separator 121 (e.g., the last separator in the second buffer 108). For example, because the third portion 120 falls between the first separator and the last separator in the second buffer 108, the second processor 104 may perform a calculation on the third portion 120. On the other hand, as indicated by arrows 122 and 124, the first processor 102 performs a calculation on the combination of the first portion 114 and the second portion 116 based on the locations of the first separator 113 and the second separator 118. In some implementations, the second processor 104 transfers the second portion 116 into the first buffer 106. In other implementations, the first buffer 106 and the second buffer 108 are arranged in a contiguous memory space, and the first processor 102 continues to perform computations past the boundary of the first buffer 106 until the second separator 118 is reached.
Thus, in the illustrated example, although the first portion 114 and the second portion 116 originate from separate storage sources and are initially copied into the input buffers of different processors, the host data unit comprising the first portion 114 and the second portion 116 may be processed by a single processor (e.g., the first processor 102). Thus, the system 100 may provide parallel processing of host data even though host data units are misaligned with the data units of the system 100 and are split across several storage sources of the system 100. Accordingly, the system 100 may be suitable for various embodiments in which host data is split across several storage sources (such as a RAID system).
The system 100 may have configurations other than the one shown in FIG. 1. For example, the system 100 may include a different number of processors (and corresponding buffers). Furthermore, while the first buffer 106 and the second buffer 108 are shown as distinct components, they may be part of a contiguous memory space. Further, the system 100 may include a different number of storage sources. Further, components other than those shown may be included in the system 100. It should also be noted that the components of the system 100 may be virtual components provided by the system 100 (e.g., by a processor executing a hypervisor or other emulation software). Further, while FIG. 1 illustrates a one-to-one correspondence between processors and storage sources, it should be noted that in some embodiments a processor may be configured to receive and process data from more than one storage source. In some of these examples, a processor may be associated with each storage source (e.g., to process data from each storage source), and additional processors may be associated with groups of storage sources. For example, each flash memory chip may have an associated processor, and each flash channel (including several flash memory chips) may have an associated processor.
Fig. 2 is a block diagram illustrating an example of the system 100 in which a processor is configured to transfer data to another processor for processing. In the illustrated example, the second processor 104 transfers (e.g., copies) the second portion 116 to the first buffer 106 in response to the second portion 116 being located before the first separator (e.g., the second separator 118) in the second buffer 108, as indicated by arrow 222. The second processor 104 may utilize a DMA operation to transfer the second portion 116. In some implementations, the second processor 104 also transfers the second separator 118 to the first buffer 106. Transferring the second portion 116 to the first buffer 106 reunites the host data unit comprising the first portion 114 and the second portion 116. Thus, the first processor 102 may perform computations on the host data unit that includes the first portion 114 and the second portion 116.
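A minimal sketch of this hand-off follows; the buffer contents and the newline delimiter are illustrative, and the in-place byte operations stand in for the DMA transfer described above:

```python
# Sketch of the FIG. 2 hand-off; contents are invented for illustration.
DELIM = b"\n"

buffer_1 = bytearray(b"\nfirst-por")        # ends mid-unit (first portion 114)
buffer_2 = bytearray(b"tion\nunit-two\n")   # begins with the remainder

# Second processor: everything before (and including) its first delimiter
# is forwarded to the previous buffer.
first = buffer_2.find(DELIM)
buffer_1 += buffer_2[:first + 1]            # stands in for the DMA transfer
del buffer_2[:first + 1]

# First processor now sees the complete unit between two delimiters.
assert bytes(buffer_1).split(DELIM)[1:-1] == [b"first-portion"]
# Second processor processes its own remaining complete units.
assert bytes(buffer_2).split(DELIM)[:-1] == [b"unit-two"]
```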
Fig. 3 is a block diagram illustrating an example of a system 100 in which a processor associated with a storage source sends data from the storage source to a processor associated with a group of storage sources for processing. In the example shown, the third processor 302 is directly associated with the first storage source 110 and the first processor 102 is associated with a group comprising the first storage source 110 and the second storage source 112. For example, the third processor 302 may be configured to process data from a first flash chip corresponding to the first storage source 110, the second processor 104 may be configured to process data from a second flash chip corresponding to the second storage source 112, and the first processor 102 may be configured to process data from a flash channel including the first storage source 110 and the second storage source 112.
In the illustrated example, the fourth separator 308, the fourth portion 306 of data, the first separator 113, and the first portion 114 are transferred from the first storage source 110 to the third buffer 304 of the third processor 302 (e.g., via DMA initiated by the third processor 302). The third processor 302 processes the fourth portion 306 based on the fourth portion 306 falling between the first separator in the third buffer 304 (e.g., the fourth separator 308) and the last separator in the third buffer 304 (e.g., the first separator 113). As indicated by arrow 322, the third processor 302 initiates the transfer of the first portion 114 to the first buffer 106 based on the first portion 114 being located after the last separator in the third buffer 304. In some examples, the third processor 302 transfers the first separator 113 in addition to the first portion 114. Further, as indicated by arrow 324, the second processor 104 initiates the transfer of the second portion 116 to the first buffer 106 based on the second portion 116 being located before the first separator (e.g., the second separator 118) in the second buffer 108. In some examples, the second processor 104 transfers the second separator 118 in addition to the second portion 116. The first processor 102 may then perform calculations on the entire host data unit including the first portion 114 and the second portion 116.
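The following sketch mirrors this hierarchical arrangement; the names and contents are hypothetical. Leaf-level processors keep the units between their first and last separators and forward the boundary fragments, which the group-level buffer reassembles into a whole host data unit:

```python
# Hypothetical three-level reassembly: chip-level processors keep interior
# units and forward boundary fragments to a channel-level buffer.
DELIM = b"\n"

def split_edges(buf: bytes):
    """Return (leading fragment incl. delimiter, interior units, trailing fragment)."""
    first, last = buf.find(DELIM), buf.rfind(DELIM)
    return buf[:first + 1], buf[first + 1:last + 1], buf[last + 1:]

chip_1 = b"\nfourth-portion\nfirst-por"   # cf. third buffer 304 in FIG. 3
chip_2 = b"tion\nunit-b\n"                # cf. second buffer 108 in FIG. 3

lead_1, interior_1, trail_1 = split_edges(chip_1)
lead_2, interior_2, trail_2 = split_edges(chip_2)

# Chip-level processors compute on their interior units only.
assert interior_1 == b"fourth-portion\n"
assert interior_2 == b"unit-b\n"

# The channel-level buffer receives the trailing fragment of chip 1 and the
# leading fragment (with its delimiter) of chip 2, reuniting the split unit.
channel_buffer = trail_1 + lead_2
assert channel_buffer == b"first-portion\n"
```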
Fig. 4 is a block diagram illustrating an example of the system 100 in which a processor reads across the boundary of an associated input buffer in a contiguous memory space based on the location of a separator. In the example of FIG. 4, the first buffer 106 and the second buffer 108 are part of a common buffer space 402. The buffer space 402 may correspond to a single physical memory device or to a virtual memory space supported by several memory devices. The first buffer 106 corresponds to an input buffer for the first processor 102, and the second buffer 108 corresponds to an input buffer for the second processor 104. However, the first processor 102 (and the second processor 104) may ignore data in the associated input buffer and/or process data in the input buffer of the other processor based on the locations of separators. Ignoring the data may include starting processing at a location after that data.
In the illustrated example, the first processor 102 begins performing computations on the data in the first buffer 106 after the first separator (e.g., the first separator 113) in the first buffer 106, as indicated by arrow 422. In response to detecting data following the last separator (e.g., the first separator 113) in the first buffer 106, the first processor 102 also continues processing until the first separator (e.g., the second separator 118) in the next input buffer (e.g., the second buffer 108) is reached, as indicated by arrow 423. Thus, although the first portion 114 and the second portion 116 originate from different storage sources and are input to different input buffers associated with different processors, the first processor 102 may perform one or more computations based on the entire host data unit including the first portion 114 and the second portion 116. Further, as indicated by arrow 426, the second processor 104 may begin performing computations on data in the second buffer 108 that occurs after the first separator (e.g., the second separator 118) in the second buffer 108. As indicated by arrow 427, the second processor 104 continues until the final separator (e.g., the third separator 121) in the second buffer 108 is reached. In examples where additional data follows the final separator in the second buffer 108, the second processor 104 may continue performing calculations on the data in the next buffer (not shown).
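A sketch of this read-across behavior follows, assuming (for illustration only) that both input buffers are slices of one contiguous byte space, that the boundary position and contents are as invented below, and that each buffer contains at least one delimiter:

```python
# Illustrative read-across in a contiguous buffer space.
DELIM = b"\n"

def processing_range(space: bytes, start: int, end: int):
    """Return the [begin, stop) range processed by the owner of space[start:end]."""
    first = space.find(DELIM, start, end)
    begin = first + 1                # start after this buffer's first delimiter
    last = space.rfind(DELIM, start, end)
    if last + 1 < end:
        # Data follows the last delimiter: keep reading into the next buffer,
        # up to and including that buffer's first delimiter.
        nxt = space.find(DELIM, end)
        stop = nxt + 1 if nxt != -1 else len(space)
    else:
        stop = last + 1
    return begin, stop

space = b"\nunit-one\nsplit-part-two\nunit-three\n"
BOUNDARY = 19   # first buffer is space[:19], second buffer is space[19:]

b1, e1 = processing_range(space, 0, BOUNDARY)
b2, e2 = processing_range(space, BOUNDARY, len(space))
assert space[b1:e1] == b"unit-one\nsplit-part-two\n"  # processor 1 reads across
assert space[b2:e2] == b"unit-three\n"                # processor 2
```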
As shown, the data between the host data unit delimiters is processed by a single processor. Thus, FIG. 4 shows another example where the system may perform parallel processing of data despite misalignment between host data units and storage system data units.
Referring to FIG. 5, an example of a storage device 500 supporting parallel processing of data is shown. The storage device 500 is a computing storage device and may include a solid state drive (SSD), a hard disk device, another type of storage device, or a combination thereof. In the illustrated example, the storage device 500 includes a storage device controller 504 and a storage medium. The storage medium includes a first channel 534 (e.g., a flash memory channel or other type of storage channel), the first channel 534 including a media device (e.g., a storage media device) 534a, a media device 534b, a media device 534c, and a media device 534d. The storage medium further includes a second channel 536 (e.g., a flash memory channel or other type of storage channel), the second channel 536 including a media device 536a, a media device 536b, a media device 536c, and a media device 536d. The storage medium also includes a third channel 538 (e.g., a flash memory channel or other type of storage channel), the third channel 538 including media devices 538a, 538b, 538c, and 538d. The storage medium further includes a fourth channel 540 (e.g., a flash memory channel or other type of storage channel), the fourth channel 540 including a media device 540a, a media device 540b, a media device 540c, and a media device 540d. The media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d may include NAND flash memory chips, hard disk media, other types of storage media, or combinations thereof.
The first storage source 110 and the second storage source 112 of fig. 1-4 may correspond to different elements of a group including media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d, one of the channels 534, 536, 538, 540, the group of channels 534, 536, 538, 540, or some other aspect of the storage medium of the storage device 500. In some examples, the storage medium has a different configuration than that shown. For example, the storage device 500 may include more or fewer channels, more or fewer media devices per channel, more or fewer channels per channel group, or a combination thereof. Further, it should be noted that in some embodiments, the elements of storage device 500 are virtual.
The storage device 500 also includes storage device memory 512. The storage device memory 512 may include a memory device (such as a DRAM device, an SRAM device, another type of memory device, or a combination thereof). In some examples, the storage device memory 512 includes more than one device.
The storage device controller 504 includes a host interface controller 506, a controller memory 505, a storage device computing module 508, a memory controller 510, a first storage medium controller 514, and a second storage medium controller 522. The host interface controller 506 may include hardware components, firmware, software, or a combination thereof configured to provide an interface to a host device. In some implementations, the interface includes a non-volatile memory express (NVMe) interface, a Compute Express Link (CXL) interface, or another type of storage interface.
The controller memory 505 may include a memory device (such as a DRAM device, an SRAM device, another type of memory device, or a combination thereof). In some examples, the controller memory 505 includes more than one device. In some examples, the controller memory 505 and the storage device memory 512 are different types of memory having different characteristics (e.g., latency, capacity, etc.). To illustrate, the controller memory 505 may comprise SRAM while the storage device memory 512 comprises DRAM.
The storage device controller 504 also includes the storage device computing module 508. The storage device computing module 508 is configured to perform calculations based on data stored in the storage medium of the storage device 500. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. As further described herein, the storage device computing module 508 may include an input buffer, an output buffer, and a processor. The input buffer and the output buffer may be separate components included in the storage device computing module 508 or may include ranges of memory included in the controller memory 505 and/or the storage device memory 512.
The first storage medium controller 514 includes a flash memory controller or other type of storage controller. In various examples, the first storage media controller 514 includes hardware, firmware, software, or a combination thereof configured to control access to the first channel 534 (and associated media devices 534a-534 d) and the second channel 536 (and associated media devices 536a-536 d). Controlling access may include performing translation of memory addresses to and from address spaces used by media devices 534a-534d, 536a-536d, as well as initiating data transfers to and from media devices 534a-534d, 536a-536 d.
Similarly, the second storage medium controller 522 includes a flash memory controller or other type of storage controller. In various examples, second storage medium controller 522 includes hardware, firmware, software, or a combination thereof configured to control access to third channel 538 (and associated media devices 538a-538 d) and fourth channel 540 (and associated media devices 540a-540 d). Controlling access may include performing a translation of memory addresses into and from address spaces used by media devices 538a-538d, 540a-540d, as well as initiating a transfer of data to and from media devices 538a-538d, 540a-540 d.
The first storage medium controller 514 also includes a first medium core calculation module 516. The first media core calculation module 516 is configured to perform calculations based on data stored in the media devices 534a-534d, 536a-536d connected to the first storage media controller 514. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. As further described herein, the first media core calculation module 516 may include an input buffer, an output buffer, and a processor. The input buffer and the output buffer may be different components included in the first media core computing module 516 or may include a range of memory included in the controller memory 505 and/or the storage device memory 512.
The first storage medium controller 514 also includes a first channel calculation module 518. The first channel calculation module 518 is configured to perform calculations based on data stored in the media devices 534a-534d of the first channel 534. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. As further described herein, the first channel computation module 518 may include an input buffer, an output buffer, and a processor. The input buffer and the output buffer may be different components included in the first channel calculation module 518 or may include a range of memory included in the controller memory 505 and/or the storage device memory 512.
The first storage media controller 514 also includes a second channel calculation module 520, the second channel calculation module 520 being configured to perform calculations based on data stored in the media devices 536a-536d of the second channel 536. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. As further described herein, the second channel computation module 520 may include an input buffer, an output buffer, and a processor. The input buffer and the output buffer may be different components included in the second channel calculation module 520 or may include a range of memory included in the controller memory 505 and/or the storage device memory 512.
The second storage medium controller 522 also includes a second medium core computing module 524. The second media core computing module 524 is configured to perform computations based on data stored in the media devices 538a-538d, 540a-540d connected to the second storage media controller 522. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. The second media core computing module 524 may include an input buffer, an output buffer, and a processor, as further described herein. The input buffer and the output buffer may be different components included in the second media core computing module 524 or may include a range of memory included in the controller memory 505 and/or the storage device memory 512.
The second storage medium controller 522 also includes a third channel computation module 526. The third channel calculation module 526 is configured to perform calculations based on data stored in the media devices 538a-538d of the third channel 538. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. As further described herein, the third channel computation module 526 may include an input buffer, an output buffer, and a processor. The input buffer and the output buffer may be different components included in the third channel computation module 526, or may include a range of memory included in the controller memory 505 and/or the storage device memory 512.
The second storage media controller 522 also includes a fourth channel calculation module 528, the fourth channel calculation module 528 being configured to perform calculations based on data stored in the media devices 540a-540d of the fourth channel 540. Such calculations may include filtering operations, mathematical operations, searching operations, and the like. As further described herein, the fourth channel calculation module 528 may include an input buffer, an output buffer, and a processor. The input buffer and the output buffer may be different components included in the fourth channel computing module 528 or may include a range of memory included in the controller memory 505 and/or the storage device memory 512.
It should be noted that in some implementations, the computing modules 516, 518, 520, 524, 526, 528 may be external to the storage media controllers 514, 522. Further, the storage device 500 may include a different number of channel calculation modules and/or media core calculation modules than shown. Although not shown, one or more of the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d may have associated computing modules. These computing modules may have a structure similar to that of the channel computing modules 518, 520, 526, 528 and the media core computing modules 516, 524, and each may be configured to perform computations on data from a respective media device. In some implementations, the first processor 102 and the first buffer 106 correspond to one of the computing modules of the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d, the channel computing modules 518, 520, 526, 528, and the media core computing modules 516, 524. The second processor 104 and the second buffer 108 may correspond to a different one of the computing modules of the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d, the channel computing modules 518, 520, 526, 528, and the media core computing modules 516, 524. The third processor 302 and the third buffer 304 may correspond to one of the channel computing modules 518, 520, 526, 528, the media core computing modules 516, 524, and the storage device computing module 508.
Each of the first buffer 106, the second buffer 108, and the third buffer 304 may be implemented in the controller memory 505, in the storage device memory 512, in a memory of a computing module corresponding to a media device, in a memory of the first channel computing module 518, in a memory of the second channel computing module 520, in a memory of the third channel computing module 526, in a memory of the fourth channel computing module 528, in a memory of the first media core computing module 516, in a memory of the second media core computing module 524, in a memory of the storage device computing module 508, in a different component of the storage device 500, or a combination thereof.
In response to a host command (or other trigger), the storage device controller 504 may initiate the transfer of data from one or more of the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d to one or more input buffers of the computing modules. The computing modules may include computing modules associated with the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d, the channel computing modules 518, 520, 526, 528, the media core computing modules 516, 524, the storage device computing module 508, or a combination thereof. The computing module that receives the data performs one or more computations on the data to generate a result. However, because individual host data units may be split across the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d, individual host data units may be split across the input buffers. To process entire host data units, the computing modules perform the computations based on the locations of delimiters in the input buffers, as described herein. For example, based on the locations of delimiters in the data, a computing module may transfer data to other input buffers and/or may access input buffers associated with other computing modules. Further, in some implementations, the computing modules of the storage device 500 are configured to operate on data as it flows (e.g., is read) into the corresponding input buffers. Based on the locations of delimiters in its input buffer, a computing module may delay performing a computation until the entire host data unit is available to the computing module. Thus, a single host data unit may be processed by a single processor.
Referring to fig. 6, an example of a computing module 600 is shown. The computing modules of the media devices 534a-534d, 536a-536d, 538a-538d, 540a-540d, the first media core computing module 516, the first channel computing module 518, the second channel computing module 520, the second media core computing module 524, the third channel computing module 526, the fourth channel computing module 528, and the storage device computing module 508 may be similar to the computing module 600. The computing module 600 includes an output buffer 604, a processor 606, and an input buffer 608. The output buffer 604 may correspond to a memory device, such as a DRAM device or an SRAM device. Similarly, the input buffer 608 may correspond to a memory device (such as a DRAM or SRAM device). In some implementations, the output buffer 604 and the input buffer 608 correspond to regions of memory (e.g., physical or virtual memory space). Although shown as components of computing module 600, it should be noted that output buffer 604 and input buffer 608 may be implemented as areas of memory external to computing module 600 (such as storage device memory 512, controller memory 505, or a combination thereof). The input buffer 608 may correspond to the first buffer 106, the second buffer 108, or the third buffer 304.
The processor 606 may correspond to the first processor 102, the second processor 104, or the third processor 302. The processor 606 may include an FPGA, a CPU, a GPU, an ASIC, another type of processor, or a combination thereof. The processor 606 is configured to perform calculations based on data in the input buffer 608 and to output the results of the calculations to the output buffer 604. As described above and below, the processor 606 may perform calculations based on the locations of the delimiters in the input buffer 608. Based on these delimiters, the processor 606 may transfer (e.g., copy) data to another input buffer for processing by another processor, process data outside the input buffer 608, or a combination thereof, in order to keep host data units together. Further, the processor 606 may delay performing computations based on the delimiters in the input buffer 608.
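A schematic stand-in for such a computing module follows; the class shape, the compute callback, and the toy filter operation are assumptions for illustration, not the disclosed design:

```python
# Schematic stand-in for computing module 600: an input buffer, a processor
# callback, and an output buffer. Names and interface are hypothetical.
from typing import Callable

class ComputeModule:
    def __init__(self, compute: Callable[[bytes], bytes], delim: bytes = b"\n"):
        self.input_buffer = bytearray()
        self.output_buffer = bytearray()
        self.compute = compute           # e.g., a filter, search, or math kernel
        self.delim = delim

    def feed(self, data: bytes) -> None:
        """Stand-in for a DMA transfer into the input buffer."""
        self.input_buffer += data

    def run(self) -> None:
        """Compute over complete units between the first and last delimiter."""
        buf = bytes(self.input_buffer)
        first, last = buf.find(self.delim), buf.rfind(self.delim)
        if first == -1 or first == last:
            return                       # no complete host data unit yet
        for unit in buf[first + 1:last].split(self.delim):
            self.output_buffer += self.compute(unit)

# Usage: a toy filter that keeps only units containing b"keep".
mod = ComputeModule(lambda u: u + b"\n" if b"keep" in u else b"")
mod.feed(b"frag\nkeep-me\ndrop-me\nkeep-too\ntail")
mod.run()
assert bytes(mod.output_buffer) == b"keep-me\nkeep-too\n"
```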
The storage devices (e.g., storage device 500) and storage systems (e.g., system 100) disclosed herein perform delimiter-based processing on data stored in a storage medium. In some implementations, computing modules operating in parallel pass data between each other based on separators. In some implementations, the computation module passes the data to additional computation modules downstream in the parallel computation hierarchy based on the delimiter. In some implementations, the computing module reads across the boundary of the associated input buffer into the input buffer of another computing module based on the separator. Aspects of these embodiments may be combined.
Fig. 7 is a diagram illustrating an example of an apparatus 700 that performs delimiter-based parallel computation, in which computing modules operating in parallel pass data between each other. The apparatus 700 may correspond to the storage device 500 of FIG. 5. FIG. 7 depicts a first processor 718, a second processor 720, a third processor 722, and a fourth processor 724. The first processor 718 has a corresponding first input buffer 710, the second processor 720 has a corresponding second input buffer 712, the third processor 722 has a corresponding third input buffer 714, and the fourth processor 724 has a corresponding fourth input buffer 716. Further, the first processor 718 has a corresponding first output buffer 758, the second processor 720 has a corresponding second output buffer 760, the third processor 722 has a corresponding third output buffer 762, and the fourth processor 724 has a corresponding fourth output buffer 764. Each processor and its corresponding input and output buffers may form a computing module similar to the computing module 600.
The first input buffer 710 is configured to receive data from the first storage source 702, the second input buffer 712 is configured to receive data from the second storage source 704, the third input buffer 714 is configured to receive data from the third storage source 706, and the fourth input buffer 716 is configured to receive data from the fourth storage source 708. In the example shown, the storage sources correspond to storage channels. Thus, the first storage source 702 may correspond to the first channel 534, the second storage source 704 may correspond to the second channel 536, the third storage source 706 may correspond to the third channel 538, and the fourth storage source 708 may correspond to the fourth channel 540. Further, the first input buffer 710, the first processor 718, and the first output buffer 758 may correspond to the first channel calculation module 518; the second input buffer 712, the second processor 720, and the second output buffer 760 may correspond to the second channel calculation module 520; the third input buffer 714, the third processor 722, and the third output buffer 762 may correspond to the third channel computing module 526; and the fourth input buffer 716, the fourth processor 724, and the fourth output buffer 764 may correspond to the fourth channel computing module 528.
The apparatus 700 also includes an aggregate output buffer 770. The aggregate output buffer 770 may correspond to an output buffer of a computing module downstream of the processors 718, 720, 722, 724. As used herein, "downstream" indicates a direction toward the output of the processing pipeline. For example, the aggregate output buffer 770 may correspond to an output buffer of the storage device computing module 508, which is downstream of the channel computing modules 518, 520, 526, 528.
In operation, data is copied from the first storage source 702 to the first input buffer 710, data is copied from the second storage source 704 to the second input buffer 712, data is copied from the third storage source 706 to the third input buffer 714, and data is copied from the fourth storage source 708 to the fourth input buffer 716. The data may be copied into the input buffers 710, 712, 714, 716 using a DMA transfer or other type of memory transfer. In some implementations, the data copy operation is initiated in response to a command to perform an operation on the data (e.g., a search operation, a filter operation, a mathematical operation, etc.). Such commands may be received from a host device (e.g., through host interface controller 506). The data copy operation may be initiated by the processor 718, 720, 722, 724, by an upstream processor, or by another component of the storage controller (e.g., the storage controller 504).
The data is copied into the input buffers 710, 712, 714, 716 in sequence such that a complete host data unit is bounded by two adjacent delimiters. Furthermore, the data copied into the last position of one input buffer is followed in sequence by the data copied into the first position of the next input buffer. Thus, data received by the first input buffer 710 may be followed in sequence by data received by the second input buffer 712. Similarly, data received by the second input buffer 712 may be followed in sequence by data received by the third input buffer 714, and data received by the third input buffer 714 may be followed in sequence by data received by the fourth input buffer 716. Data may continue to flow into the input buffers 710, 712, 714, 716 as computations are performed by the processors 718, 720, 722, 724.
The first input buffer 710 (e.g., a first input buffer in a parallel processing pipeline stage) may include additional space at the beginning and end of the input buffer. The data copied from the first storage source 702 may be placed between the additional space at the beginning and the additional space at the end. The additional space may correspond to a host data unit size (e.g., file size), to a device data unit size (e.g., block), to another unit, or to a multiple of any of these data unit sizes.
The fourth input buffer 716 (e.g., the last input buffer in a parallel processing pipeline stage) may not include additional buffer space. The second input buffer 712 and the third input buffer 714 (e.g., the input buffer between the first input buffer and the last input buffer in the parallel processing stage) may each include additional space at the end of the input buffers 712, 714.
As data is streamed into the first input buffer 710, the first processor 718 is configured to track the location of the first separator in the first input buffer 710 and the location of the last separator in the first input buffer 710. Similarly, the second processor 720 tracks the locations of the first and last separators in the second input buffer 712, the third processor 722 tracks the locations of the first and last separators in the third input buffer 714, and the fourth processor 724 tracks the locations of the first and last separators in the fourth input buffer 716. The processors 718, 720, 722, 724 may perform computations on data falling between the first and last separators in the respective input buffers.
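For illustration only, and not as part of the disclosed apparatus, the following minimal Python sketch shows one way such first/last delimiter tracking could work for a single input buffer. A one-byte newline delimiter is assumed, and the names (DelimiterTracker, feed, complete_region) are hypothetical; an actual implementation would run in channel processor firmware rather than Python.

    DELIM = b"\n"  # assumed one-byte record delimiter

    class DelimiterTracker:
        """Hypothetical sketch: track first/last delimiter offsets as data streams in."""
        def __init__(self):
            self.first = None          # offset of the first delimiter seen, or None
            self.last = None           # offset of the last delimiter seen, or None
            self.buffer = bytearray()

        def feed(self, chunk: bytes) -> None:
            # Append a streamed chunk and update the tracked delimiter positions.
            base = len(self.buffer)
            self.buffer += chunk
            idx = chunk.find(DELIM)
            while idx != -1:
                if self.first is None:
                    self.first = base + idx
                self.last = base + idx
                idx = chunk.find(DELIM, idx + 1)

        def complete_region(self) -> bytes:
            # Data between the first and last delimiters: the whole host data
            # units that the local processor can safely compute on.
            if self.first is None or self.first == self.last:
                return b""
            return bytes(self.buffer[self.first + 1 : self.last + 1])

    t = DelimiterTracker()
    t.feed(b"end-of-rec0\nrec1\nrec2\npartial-re")
    print(t.first, t.last)       # 11 21
    print(t.complete_region())   # b'rec1\nrec2\n'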
As part of the forwarding process, the second processor 720, the third processor 722, and the fourth processor 724 (e.g., the processors after the first processor 718 in the parallel processing stage) are configured to transfer the data in their respective input buffers that occurs before the first delimiter to the previous input buffer. In Fig. 7, the second processor 720 initiates the transfer of the data that occurs before the first delimiter in the second input buffer 712 into the additional space at the end of the first input buffer 710. Similarly, the third processor 722 initiates the transfer of the data that occurs before the first delimiter in the third input buffer 714 into the additional space at the end of the second input buffer 712, and the fourth processor 724 initiates the transfer of the data that occurs before the first delimiter in the fourth input buffer 716 into the additional space at the end of the third input buffer 714. The first processor 718 may perform computations on the data transferred into the first input buffer 710 along with the data following the last separator in the first input buffer 710. Further, the second processor 720 may perform computations on the data transferred into the second input buffer 712 by the third processor 722 along with the data after the last separator in the second input buffer 712. Further, the third processor 722 may perform computations on the data transferred into the third input buffer 714 by the fourth processor 724 along with the data after the last separator in the third input buffer 714. Thus, prior to processing, a portion of a host data unit that falls before the first delimiter in an input buffer may be recombined with the remainder of that host data unit in the previous input buffer. In some implementations, the processors 720, 722, 724, in addition to transmitting the data that occurs before the first delimiter, also transmit the first delimiter itself to the previous input buffer.
As part of the carryover process, the fourth processor 724 (e.g., the final processor in the parallel processing stage) also transfers the data that occurs after the final delimiter in the fourth input buffer 716 to the additional space at the beginning of the first input buffer 710 (e.g., the first input buffer in the parallel processing stage). The first processor 718 may perform computations on the data transferred into the first input buffer 710 by the fourth processor 724 along with the data that occurs before the first delimiter of the first input buffer 710. Thus, a host data unit split between the end of the fourth input buffer 716 and the beginning of the first input buffer 710 may be recombined prior to processing. The final processor may also transmit the final delimiter with the data.
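A compact, purely illustrative Python sketch of the forwarding and carryover steps follows (hypothetical names; a newline delimiter and at least one delimiter per buffer are assumed). Each buffer's head, up to and including its first delimiter, is forwarded to the previous buffer, and the last buffer's tail is carried over to the first.

    DELIM = b"\n"

    def reassemble(buffers):
        """Return per-buffer data after forwarding and carryover (hypothetical sketch)."""
        n = len(buffers)
        heads = [b[: b.find(DELIM) + 1] for b in buffers]   # up to/including first delimiter
        tails = [b[b.rfind(DELIM) + 1 :] for b in buffers]  # after the last delimiter
        out = []
        for i, buf in enumerate(buffers):
            first, last = buf.find(DELIM), buf.rfind(DELIM)
            body = buf[first + 1 : last + 1]                # records wholly in this buffer
            if i < n - 1:
                body += tails[i] + heads[i + 1]             # forwarding from buffer i+1
            out.append(body)
        out[0] = tails[-1] + heads[0] + out[0]              # carryover into the first buffer
        return out

    bufs = [b"z\naa\nbb", b"cc\ndd\ne", b"e\nff\ngg"]
    print(reassemble(bufs))
    # [b'ggz\naa\nbbcc\n', b'dd\nee\n', b'ff\n']; every record lands in exactly one buffer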
Each of the processors 718, 720, 722, 724 in the apparatus 700 includes four cores. A different number of cores than shown may be included. Each core may include hardware and/or executable software configured to perform computations (e.g., search functions, filtering functions, mathematical operations, etc.) on data to generate output. The first processor 718 includes a first core 726, a second core 728, a third core 730, and a fourth core 732. The second processor 720 includes a fifth core 734, a sixth core 736, a seventh core 738, and an eighth core 740. The third processor 722 includes a ninth core 742, a tenth core 744, an eleventh core 746, and a twelfth core 748. The fourth processor 724 includes a thirteenth core 750, a fourteenth core 752, a fifteenth core 754, and a sixteenth core 756. Each of the cores 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756 may perform computations in parallel to generate an output. The cores 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756 may be configured to perform computations based on the locations of the separators. For example, the cores 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756 may begin computation at the data immediately after a delimiter. Fewer than all of the cores 726, 728, 730, 732, 734, 736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756 may be used in a particular execution cycle.
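The following hypothetical Python sketch illustrates how one processor's valid region might be divided among its cores so that each core begins at the data immediately after a delimiter. The function name and the four-way split are illustrative assumptions only, not the claimed mechanism.

    DELIM = b"\n"

    def partition_for_cores(data: bytes, n_cores: int = 4):
        """Split data into at most n_cores spans, each ending on a delimiter."""
        spans, start = [], 0
        target = max(1, len(data) // n_cores)          # aim for roughly equal spans
        while start < len(data) and len(spans) < n_cores - 1:
            cut = data.find(DELIM, start + target - 1)  # extend to a delimiter boundary
            if cut == -1:
                break
            spans.append(data[start : cut + 1])
            start = cut + 1
        spans.append(data[start:])                      # remainder goes to the last span
        return spans

    records = b"alpha\nbravo\ncharlie\ndelta\necho\nfoxtrot\n"
    for core_id, span in enumerate(partition_for_cores(records)):
        print(core_id, span)   # an empty span means that core idles this cycle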
The output of the computations performed by the first processor 718 is placed into a first output buffer 758. The output of the computations performed by the second processor 720 are placed into a second output buffer 760. The output of the computations performed by the third processor 722 is placed into a third output buffer 762. The output of the computation performed by the fourth processor 724 is placed into a fourth output buffer 764.
The outputs in the output buffers 758, 760, 762, 764 are aggregated into the aggregate output buffer 770 (e.g., by a downstream processor, such as a processor in the storage device computing module 508) for output and/or additional processing.
Thus, Fig. 7 illustrates an example in which the system performs forwarding and carryover operations based on the locations of the first and last separators in order to hold host data units together for computation. Accordingly, parallel processing may be performed even though a host data unit may be split across multiple storage sources and processor input buffers.
Fig. 8 is a diagram 800 illustrating an example of an input buffer structure that may be used by the apparatus 700 depicted in Fig. 7. Diagram 800 depicts a first buffer 802 after data is copied from a storage source. The first buffer 802 corresponds to an input buffer of a first processing device in the parallel processing stage. The first buffer 802 may correspond to the first input buffer 710. As shown, the first buffer 802 includes a first additional buffer space 804 and a second additional buffer space 806. The first additional buffer space 804 and the second additional buffer space 806 may have a size based on storage medium data units, based on host data units, or based on some other data unit. In some implementations, the first additional buffer space 804 corresponds to one third of the size of the first buffer 802, and the second additional buffer space 806 corresponds to one third of the size of the first buffer 802. Data from the storage source is copied between the additional buffer spaces 804, 806. The data transferred into the first buffer 802 may be in storage medium data units. An example storage medium data unit 810 is shown. The size of the storage medium data unit 810 may be different from the size of the host data unit utilized by a host application (e.g., the application data unit). An example complete application data unit 812 (e.g., host data unit) is shown between the first application data separator 808 and the second application data separator 809. Because storage medium data units and host data units may not be aligned, host data units may be split across storage sources. In the example shown, an incomplete host data unit is located before the first application data separator 808 and an incomplete host data unit is located after the second application data separator 809. The incomplete host data unit that occurs before the first application data separator 808 may be completed by a carryover operation into the first additional buffer space 804 (e.g., as shown and described with reference to Fig. 7). Similarly, the host data unit that occurs after the second application data separator 809 may be completed by a forwarding operation into the second additional buffer space 806 (e.g., as shown and described with reference to Fig. 7).
Diagram 800 also depicts a second buffer 822 after data is copied from a storage source. The second buffer 822 corresponds to an input buffer of a processing device between the first processing device and the last processing device in the parallel processing stage. The second buffer 822 may correspond to the second input buffer 712 or the third input buffer 714. As shown, the second buffer 822 includes additional buffer space 826. The additional buffer space 826 may have a size based on storage medium data units, based on host data units, or based on some other data unit. In some implementations, the additional buffer space 826 is half of the total space of the second buffer 822. In some implementations, the total space of the second buffer 822 is two thirds the size of the total space of the first buffer 802. Data from the storage source is copied into the second buffer 822 before the additional buffer space 826. The data transferred into the second buffer 822 may be in storage medium data units. An example storage medium data unit 830 is shown. The size of the storage medium data unit 830 may be different from the size of the host data unit utilized by the host application (e.g., the application data unit). An example complete application data unit 832 (e.g., host data unit) is shown between the first application data separator 828 and the second application data separator 829. Because storage medium data units and host data units may not be aligned, host data units may be split across storage sources. In the example shown, an incomplete host data unit is located before the first application data separator 828 and an incomplete host data unit is located after the second application data separator 829. The incomplete host data unit that occurs before the first application data separator 828 may be transferred into the additional buffer space of the previous input buffer by a forwarding operation (e.g., as shown and described with reference to Fig. 7). The data unit that occurs after the second application data separator 829 may be completed by a forwarding operation from the next input buffer into the additional buffer space 826 (e.g., as shown and described with reference to Fig. 7).
Diagram 800 also depicts a third buffer 842 after data is copied from a storage source. The third buffer 842 corresponds to an input buffer of the last processing device in the parallel processing stage. The third buffer 842 may correspond to the fourth input buffer 716. The size of the third buffer 842 may be one third of the total size of the first buffer 802 (e.g., because the third buffer lacks additional buffer space). Data from the storage source is copied into the third buffer 842. The data transferred into the third buffer 842 may be in storage medium data units. An example storage medium data unit 850 is shown. The size of the storage medium data unit 850 may be different from the size of the host data unit utilized by the host application (e.g., the application data unit). An example complete application data unit 852 (e.g., host data unit) is shown between the first application data separator 848 and the second application data separator 849. Because storage medium data units and host data units may not be aligned, host data units may be split across storage sources. In the example shown, an incomplete host data unit is located before the first application data separator 848 and an incomplete host data unit is located after the second application data separator 849. The incomplete host data unit that occurs before the first application data separator 848 may be transferred into the additional buffer space of the previous input buffer by a forwarding operation (e.g., as shown and described with reference to Fig. 7). The data unit that occurs after the second application data separator 849 is transferred into the additional buffer space of the first input buffer in the parallel processing stage (e.g., into the first additional buffer space 804) by a carryover operation.
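To summarize the three layouts, the sketch below (illustrative Python only; the one-unit proportions follow the example sizes given above and are not the only possibility) computes the space reserved before and after the valid data for the first, middle, and last input buffers.

    def buffer_layout(position: str, unit: int):
        """Return (space_before, data_size, space_after) for one input buffer."""
        if position == "first":
            return (unit, unit, unit)   # carryover space, valid data, forwarding space
        if position == "middle":
            return (0, unit, unit)      # valid data, forwarding space
        if position == "last":
            return (0, unit, 0)         # valid data only
        raise ValueError(position)

    UNIT = 4096  # assumed size of one storage-medium data unit
    for pos in ("first", "middle", "middle", "last"):
        before, data, after = buffer_layout(pos, UNIT)
        print(f"{pos:>6}: total={before + data + after:6d} "
              f"(before={before}, data={data}, after={after})")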
As shown, the first input buffer may maintain additional buffer space at the front end for a carryover operation from the last input buffer. Furthermore, each input buffer before the last input buffer may maintain additional buffer space at the end for forwarding operations. Thus, as described with reference to Fig. 7, processors performing delimiter-based parallel processing may use the forwarding and carryover operations to reassemble complete host data units.
The apparatus 700 may have alternative configurations. For example, each of the input buffers 710, 712, 714, 716 may receive data from a different storage source (e.g., a single media device, a set of storage channels, etc.) than that shown. Further, the system may have more or fewer components than shown (e.g., storage sources, computing modules, etc.). Further, each processor may include a different number of cores than shown.
Referring to fig. 9, a diagram illustrating a process 900 for performing delimiter-based parallel processing is shown. Process 900 may be performed by a storage device or system including system 100 or storage device 500. The diagram depicts a first input buffer (also referred to as a first buffer) 902, a second input buffer (also referred to as a second buffer) 904, a third input buffer (also referred to as a third buffer) 906, and a fourth input buffer (also referred to as a fourth buffer) 908. The first input buffer 902 may correspond to the first input buffer 710, the second input buffer 904 may correspond to the second input buffer 712, the third input buffer 906 may correspond to the third input buffer 714, and the fourth input buffer 908 may correspond to the fourth input buffer 716.
Process 900 includes a first DMA operation 920, a first forwarding operation 922, a processing operation 924, a carryover operation 926, a second DMA operation 928, and a second forwarding operation 930.
In the first DMA operation 920, first valid data 932 is copied (e.g., by the first processor 718) from a storage source (e.g., from the first storage source 702) into the first buffer 902. In addition, second valid data 934 is copied (e.g., by the second processor 720) from a storage source (e.g., the second storage source 704) into the second buffer 904. In addition, third valid data 936 is copied (e.g., by the third processor 722) from a storage source (e.g., the third storage source 706) into the third buffer 906. Further, fourth valid data 938 is copied (e.g., by the fourth processor 724) from a storage source (e.g., the fourth storage source 708) into the fourth buffer 908. As shown, the first buffer 902 has additional buffer space (e.g., the first additional buffer space 804) before the first valid data 932 and additional buffer space (e.g., the second additional buffer space 806) after the first valid data 932. In addition, the second buffer 904 and the third buffer 906 have additional buffer space after the second valid data 934 and the third valid data 936, respectively. During the first DMA operation 920, the processors associated with the buffers 902, 904, 906, 908 identify the locations of the delimiters within the valid data 932, 934, 936, 938 (e.g., in the buffers 902, 904, 906, 908). In particular, the processors may identify the locations of the first and last separators within each of the buffers 902, 904, 906, 908.
During the first forwarding operation 922, a processor associated with the second buffer 904 identifies first data 942 preceding the first separator in the second valid data 934 and forwards the first data 942 to the first buffer 902 to form, when combined with the first valid data 932, first modified valid data 940.
In addition, a processor associated with the third buffer 906 identifies second data 946 preceding the first separator in the third valid data 936 and forwards the second data 946 to the second buffer 904. The second data 946 is added and the first data 942 is removed from the second valid data 934, forming second modified valid data 944.
Further, a processor associated with the fourth buffer 908 identifies third data 950 preceding the first separator in the fourth valid data 938 and forwards the third data 950 to the third buffer 906. The third data 950 is added and the second data 946 is removed from the third valid data 936, forming third modified valid data 948. The third data 950 is removed from the fourth valid data 938, forming fourth modified valid data 952.
In the processing operation 924, the processor associated with the first buffer 902 performs computations based on the first modified valid data 940. The processor associated with the second buffer 904 performs computations based on the second modified valid data 944. The processor associated with the third buffer 906 performs computations based on the third modified valid data 948. The processor associated with the fourth buffer 908 performs computations based on the fourth modified valid data 952, up to the final separator in the fourth modified valid data 952.
In the carryover operation 926, the processor associated with the fourth buffer 908 identifies fourth data 954 following the last separator in the fourth modified valid data 952 and transfers (e.g., carries over) the fourth data 954 to the first buffer 902 (e.g., into the additional space at the front of the first buffer 902).
In the second DMA operation 928, the processor associated with the first buffer 902 copies data from a storage source into the first buffer, which, combined with the fourth data 954, forms fifth valid data 956. The processor associated with the second buffer 904 copies data from a storage source into the second buffer 904 to form sixth valid data 958. The processor associated with the third buffer 906 copies data from a storage source into the third buffer 906 to form seventh valid data 960. The processor associated with the fourth buffer 908 copies data from a storage source into the fourth buffer to form eighth valid data 962. During the second DMA operation 928, the processors identify the locations of the first and last separators in the buffers 902, 904, 906, 908.
In the second forwarding operation 930, the processor associated with the second buffer 904 identifies fifth data 966 that occurs before the first delimiter in the sixth valid data 958 and forwards the fifth data 966 to the first buffer 902 to form fifth modified valid data 964. The processor associated with the third buffer 906 identifies sixth data 970 that occurs before the first delimiter in the seventh valid data 960 and forwards the sixth data 970 to the second buffer 904. The sixth data 970 is added to the sixth valid data 958 and the fifth data 966 is subtracted, forming sixth modified valid data 968. The processor associated with the fourth buffer 908 may identify seventh data 974 preceding the first separator in the eighth valid data 962 and forward the seventh data 974 to the third buffer 906. Adding the seventh data 974 to the seventh valid data 960 and subtracting the sixth data 970 may form seventh modified valid data 972. Subtracting the seventh data 974 from the eighth valid data 962 may form eighth modified valid data 976 in the fourth buffer 908.
The process 900 may continue with additional processing, carryover, DMA, and forwarding operations, etc., until a target amount of data has been processed. Performing the delimiter-based operations together with the forwarding and carryover operations may keep each host data unit together in one buffer, regardless of how the host data units are split across the storage sources. Thus, process 900 may provide a technique for parallel processing in a system in which host data units are not aligned with the parallel processing pipelines.
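The rounds of process 900 can be condensed into a short, purely illustrative Python loop. The sketch below runs the DMA, forwarding, processing, and carryover steps against an in-memory stand-in for the storage sources; a newline delimiter, equal-size DMA chunks, and at least one delimiter per buffer are assumed, and all names are hypothetical.

    DELIM = b"\n"

    def run_pipeline(stream: bytes, n_buffers: int = 4, chunk: int = 16):
        results = []
        carry = b""     # carryover destined for the front of the first buffer
        offset = 0
        while offset < len(stream):
            # DMA operation: fill each buffer with the next chunk of the stream.
            bufs = [stream[offset + i * chunk : offset + (i + 1) * chunk]
                    for i in range(n_buffers)]
            offset += n_buffers * chunk
            # Forwarding operation: buffer i+1's head completes buffer i's tail.
            regions = []
            for i, buf in enumerate(bufs):
                first, last = buf.find(DELIM), buf.rfind(DELIM)
                body = buf[first + 1 : last + 1]
                if i + 1 < len(bufs):
                    nxt = bufs[i + 1]
                    body += buf[last + 1 :] + nxt[: nxt.find(DELIM) + 1]
                regions.append(body)
            # Carryover from the previous round completes the first buffer's head.
            regions[0] = carry + bufs[0][: bufs[0].find(DELIM) + 1] + regions[0]
            carry = bufs[-1][bufs[-1].rfind(DELIM) + 1 :]
            # Processing operation: each "processor" computes on whole records only.
            for body in regions:
                results.extend(r for r in body.split(DELIM) if r)
        return results + ([carry] if carry else [])

    data = b"".join(b"record-%02d\n" % i for i in range(12))
    print(run_pipeline(data))   # every record emerges exactly once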
Fig. 10 is a diagram illustrating an example of an apparatus 1000 that performs delimiter-based parallel computation, in which computing modules operating in parallel pass data to one or more downstream processors based on delimiters. The apparatus 1000 may correspond to the storage device 500 or the system 100.
The apparatus 1000 includes a first input buffer 1062, a second input buffer 1064, a third input buffer 1066, and a fourth input buffer 1068. The first processor 1002 is associated with a first input buffer 1062, the second processor 1004 is associated with a second input buffer 1064, the third processor 1006 is associated with a third input buffer 1066, and the fourth processor 1008 is associated with a fourth input buffer 1068. The first processor 1002 is associated with a first output buffer 1070, the second processor 1004 is associated with a second output buffer 1073, the third processor 1006 is associated with a third output buffer 1075, and the fourth processor 1008 is associated with a fourth output buffer 1077.
The apparatus 1000 also includes a first downstream input buffer 1011 associated with the first downstream processor 1010. The first downstream processor 1010 is also associated with a first downstream output buffer 1072. The apparatus 1000 also includes a second downstream input buffer 1013 associated with a second downstream processor 1012. The second downstream processor 1012 is also associated with a second downstream output buffer 1074. The apparatus 1000 also includes a third downstream input buffer 1015 associated with the third downstream processor 1014. The third downstream processor 1014 is also associated with a third downstream output buffer 1076. The apparatus 1000 also includes a fourth downstream input buffer 1017 associated with the fourth downstream processor 1016. The fourth downstream processor 1016 is also associated with a fourth downstream output buffer 1078.
The first processor 1002, the first input buffer 1062, and the first output buffer 1070 may correspond to a computing module, such as the computing module 600. Similarly, other combinations of associated processors, input buffers, and output buffers included in apparatus 1000 may correspond to a computing module (such as computing module 600).
Downstream processors 1010, 1012, 1014, 1016 are arranged downstream of the processors 1002, 1004, 1006, 1008 (e.g., closer to the output stages of the processing pipeline). In some examples, the first input buffer 1062, the first processor 1002, and the first output buffer 1070 correspond to the first channel computation module 518; the second input buffer 1064, the second processor 1004, and the second output buffer 1073 correspond to the second channel calculation module 520; the third input buffer 1066, the third processor 1006, and the third output buffer 1075 correspond to the third channel calculation module 526; and the fourth input buffer 1068, the fourth processor 1008, and the fourth output buffer 1077 correspond to the fourth computing module 528. The first downstream input buffer 1011, the first downstream processor 1010, and the first downstream output buffer 1072 may correspond to downstream computing modules (such as the storage device computing module 508). Similarly, the second downstream input buffer 1013, the second downstream processor 1012, and the second downstream output buffer 1074 may correspond to the storage device computing module 508 (e.g., a device may have more than one storage device computing module). Further, the third downstream input buffer 1015, the third downstream processor 1014, and the third downstream output buffer 1076 may correspond to the storage device calculation module 508. Further, the fourth downstream input buffer 1017, the fourth downstream processor 1016, and the fourth downstream output buffer 1078 may correspond to the storage device calculation module 508.
The apparatus 1000 also includes a first storage source 1020, a second storage source 1022, a third storage source 1024, and a fourth storage source 1026. Storage sources 1020, 1022, 1024, 1026 may include storage media, storage media channels, groups of storage media channels, and the like. In some implementations, the first storage source 1020 corresponds to the first channel 534, the second storage source 1022 corresponds to the second channel 536, the third storage source 1024 corresponds to the third channel 538, and the fourth storage source 1026 corresponds to the fourth channel 540.
The apparatus 1000 also includes an aggregate output buffer 1080, which aggregate output buffer 1080 may correspond to the output buffer of the computing module downstream of the illustrated processor.
In operation, first input buffer 1062 receives data from first storage source 1020. Data may be placed in the first input buffer 1062 by DMA operations performed by the first processor 1002. The first processor 1002 may identify a first delimiter and a last delimiter within the data during the DMA operation. Similarly, the second input buffer 1064 receives data from the second storage source 1022. The data may be placed in the second input buffer 1064 by DMA operations performed by the second processor 1004. The second processor 1004 may identify a first delimiter and a last delimiter within the data during the DMA operation. In addition, third input buffer 1066 receives data from third storage source 1024. The data may be placed in the third input buffer 1066 by a DMA operation performed by the third processor 1006. The third processor 1006 may identify a first delimiter and a last delimiter within the data during the DMA operation. In addition, fourth input buffer 1068 receives data from fourth storage source 1026. The data may be placed in the fourth input buffer 1068 by DMA operations performed by the fourth processor 1008. The fourth processor 1008 may identify a first delimiter and a last delimiter within the data during the DMA operation.
The first processor 1002 sends the data before the first delimiter in the first input buffer 1062 to the fourth downstream input buffer 1017. In addition, the processor 1002 sends data following the last separator in the first input buffer 1062 to the first downstream input buffer 1011. The second processor 1004 sends the data preceding the first delimiter in the second input buffer 1064 to the first downstream input buffer 1011. Thus, host data units split across the first input buffer 1062 and the second input buffer 1064 are put back together in the first downstream input buffer 1011. The second processor 1004 also sends data that occurs after the last separator in the second input buffer 1064 to the second downstream input buffer 1013. The third processor 1006 sends data that occurs before the first delimiter in the third input buffer 1066 to the second downstream input buffer 1013. Thus, host data units split across the second input buffer 1064 and the third input buffer 1066 are put back together in the second downstream input buffer 1013. The third processor 1006 sends data that occurs after the last separator in the third input buffer 1066 to the third downstream input buffer 1015. The fourth processor 1008 transmits data that occurs before the first delimiter in the fourth input buffer 1068 to the third downstream input buffer 1015. Thus, host data units split across the third input buffer 1066 and the fourth input buffer 1068 are placed back together in the third downstream input buffer 1015. The fourth processor 1008 sends data that occurs after the last separator in the fourth input buffer 1068 to the fourth downstream input buffer 1017. Thus, host data units split across the fourth input buffer 1068 and the first input buffer 1062 are put back together in the fourth downstream input buffer 1017.
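An illustrative Python sketch of this downstream hand-off follows (hypothetical names; a newline delimiter and at least one delimiter per buffer are assumed): downstream buffer i receives the tail of input buffer i followed by the head of input buffer i+1, wrapping from the last buffer back to the first.

    DELIM = b"\n"

    def downstream_handoff(bufs):
        """Reassemble split records in downstream buffers (hypothetical sketch)."""
        n = len(bufs)
        heads = [b[: b.find(DELIM) + 1] for b in bufs]    # before/including first delimiter
        tails = [b[b.rfind(DELIM) + 1 :] for b in bufs]   # after the last delimiter
        # Downstream buffer i = buffer i's tail + buffer (i+1)'s head, with wraparound.
        return [tails[i] + heads[(i + 1) % n] for i in range(n)]

    bufs = [b"z\naa\nbb", b"cc\ndd\ne", b"e\nff\ngg"]
    print(downstream_handoff(bufs))
    # [b'bbcc\n', b'ee\n', b'ggz\n']; each split record is whole again downstream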
Each of the processors 1002, 1004, 1006, 1008 in the apparatus 1000 includes four cores. In addition, each of the downstream processors 1010, 1012, 1014, 1016 includes a core. A different number of cores than shown may be included. Each core may include hardware and/or executable software configured to perform computations (e.g., search functions, filtering functions, mathematical operations, etc.) on data to generate output. The first processor 1002 includes a first core 1030, a second core 1032, a third core 1034, and a fourth core 1036. The second processor 1004 includes a fifth core 1038, a sixth core 1040, a seventh core 1042, and an eighth core 1044. The third processor 1006 includes a ninth core 1046, a tenth core 1048, an eleventh core 1050, and a twelfth core 1052. The fourth processor 1008 includes a thirteenth core 1054, a fourteenth core 1056, a fifteenth core 1058, and a sixteenth core 1060. Each of the cores 1030, 1032, 1034, 1036, 1038, 1040, 1042, 1044, 1046, 1048, 1050, 1052, 1054, 1056, 1058, 1060 may perform computations in parallel to generate an output.
The first downstream processor 1010 includes a first downstream core 1082, the second downstream processor 1012 includes a second downstream core 1084, the third downstream processor 1014 includes a third downstream core 1086, and the fourth downstream processor 1016 includes a fourth downstream core 1088. Each of the downstream cores 1082, 1084, 1086, 1088 may operate in parallel to generate an output. In some examples, cores 1030, 1032, 1034, 1036, 1038, 1040, 1042, 1044, 1046, 1048, 1050, 1052, 1054, 1056, 1058, 1060 may operate in parallel with downstream cores 1082, 1084, 1086, 1088.
The cores 1030, 1032, 1034, 1036, 1038, 1040, 1042, 1044, 1046, 1048, 1050, 1052, 1054, 1056, 1058, 1060 and the downstream cores 1082, 1084, 1086, 1088 may be configured to perform computations based on the locations of the separators. For example, the cores 1030, 1032, 1034, 1036, 1038, 1040, 1042, 1044, 1046, 1048, 1050, 1052, 1054, 1056, 1058, 1060 may begin computation at the data immediately following a separator. The downstream cores 1082, 1084, 1086, 1088 may operate in a similar manner. In some examples, fewer than all of the cores 1030, 1032, 1034, 1036, 1038, 1040, 1042, 1044, 1046, 1048, 1050, 1052, 1054, 1056, 1058, 1060 may be used.
The output of the computation performed by the first processor 1002 is placed into the first output buffer 1070. The output of the computations performed by the second processor 1004 are placed into a second output buffer 1073. The output of the computations performed by the third processor 1006 is placed into a third output buffer 1075. The output of the computation performed by the fourth processor 1008 is placed into a fourth output buffer 1077. The output from the first downstream processor 1010 is placed into a first downstream output buffer 1072. The output from the second downstream processor 1012 is placed into a second downstream output buffer 1074. The output from the third downstream processor 1014 is placed into a third downstream output buffer 1076. The output from the fourth downstream processor 1016 is placed into a fourth downstream output buffer 1078.
The outputs in the output buffers 1070, 1073, 1075, 1077 and downstream output buffers 1072, 1074, 1076, 1078 are aggregated (e.g., by a downstream processor, such as a processor in the storage computing module 508) into an aggregated output buffer 1080 for output and/or additional processing.
Because incomplete host data units located at the ends of the input buffers are put back together in the downstream input buffers based on the locations of the delimiters, the apparatus 1000 provides efficient parallel processing of host data stored in a manner that is not aligned with the parallel processing pipeline.
Fig. 11 is a diagram illustrating an example of an apparatus 1100 that performs delimiter-based parallel computation, in which a computing module that operates in parallel directly accesses input buffers of other computing modules based on the position of the delimiter. The device 1100 is a computing storage device and may correspond to the storage device 500 of fig. 5. The apparatus 1100 includes a first processor 1110 associated with a first input buffer 1120, a second processor 1112 associated with a second input buffer 1122, a third processor 1114 associated with a third input buffer 1124, and a fourth processor 1116 associated with a fourth input buffer 1126. In some examples, the first processor 1110 and the first input buffer 1120 correspond to the first channel calculation module 518, the second processor 1112 and the second input buffer 1122 correspond to the second channel calculation module 520, the third processor 1114 and the third input buffer 1124 correspond to the third channel calculation module 526, and the fourth processor 1116 and the fourth input buffer 1126 correspond to the fourth channel calculation module 528. In other examples, processors 1110, 1112, 1114, 1116 and input buffers 1120, 1122, 1124, 1126 correspond to different computing modules in storage 500.
In the apparatus 1100, the input buffers 1120, 1122, 1124, 1126 are ranges within a common buffer space. Data from the first storage source 1102 is copied (e.g., by a DMA operation initiated by the first processor 1110) to the first input buffer 1120. Similarly, data from the second storage source 1104 is copied to the second input buffer 1122, data from the third storage source 1106 is copied to the third input buffer 1124, and data from the fourth storage source 1108 is copied to the fourth input buffer 1126. The first processor 1110 identifies the locations of the first and last separators within the first input buffer 1120, the second processor 1112 identifies the locations of the first and last separators within the second input buffer 1122, the third processor 1114 identifies the locations of the first and last separators within the third input buffer 1124, and the fourth processor 1116 identifies the locations of the first and last separators within the fourth input buffer 1126.
In operation, the first processor 1110 begins processing data after the first separator in the first input buffer 1120 and continues into the second input buffer 1122 until the first separator in the second input buffer 1122 is reached. Similarly, the second processor 1112 begins processing data that occurs after the first separator in the second input buffer 1122 and continues processing into the third input buffer 1124 until the first separator in the third input buffer is reached. Similarly, the third processor 1114 begins processing data that occurs after the first separator in the third input buffer 1124 and continues processing until the first separator in the fourth input buffer 1126 is reached. Similarly, the fourth processor 1116 begins processing data that occurs after the first separator in the fourth input buffer 1126 and, upon reaching the end of the fourth input buffer 1126, resumes processing at the beginning of the first input buffer 1120 and continues until the first separator in the first input buffer 1120 is reached.
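The shared-buffer variant can be sketched as follows (illustrative Python; the offsets, names, and newline delimiter are assumptions): each processor's span starts after the first delimiter of its own range and ends at the first delimiter of the next range, with the last processor wrapping to the start of the common buffer.

    DELIM = b"\n"

    def processor_spans(common: bytes, ranges):
        """ranges: (start, end) offsets into the common buffer; >=1 delimiter per range."""
        n = len(ranges)
        spans = []
        for i, (start, end) in enumerate(ranges):
            begin = common.find(DELIM, start, end) + 1      # just past this range's first delimiter
            nstart, nend = ranges[(i + 1) % n]
            stop = common.find(DELIM, nstart, nend) + 1     # first delimiter of the next range
            if i < n - 1:
                spans.append(common[begin:stop])
            else:
                spans.append(common[begin:] + common[:stop])  # wrap from last range to first
        return spans

    common = b"zz\naa\nbb" b"cc\ndd\nee" b"\nff\nggzz"   # three 8-byte ranges
    print(processor_spans(common, [(0, 8), (8, 16), (16, 24)]))
    # [b'aa\nbbcc\n', b'dd\nee\n', b'ff\nggzzzz\n']; no record is split across processors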
The processors 1110, 1112, 1114, 1116 may operate in parallel. Further, because a processor may read across the boundary of its associated input buffer, host data units may be processed by a single processor rather than split across processors. Thus, apparatus 1100 may provide parallel processing of data in a system where host data units are not aligned with parallel processing pipelines. It should be noted that the input buffer of device 1100 may not include additional buffer space as in some other implementations described herein.
Fig. 12 is a flow chart of a method 1200 for separator-based parallel processing. Method 1200 may be performed by system 100, by storage 500, by device 700, by device 1000, or by device 1100.
The method 1200 includes: at 1202, first data is received from a first storage source at a first buffer. For example, the first buffer 106 may receive first data including the first delimiter 113 and the first portion 114 from the first storage source 110. As another example, the first buffer 106 may receive the first data from the first storage source 110 via the third processor 302 and the third buffer 304. As another example, the first input buffer 710 may receive data from the first storage source 702. As another example, the first downstream input buffer 1011 may receive data from the first storage source 1020 via the first input buffer 1062 and the first processor 1002. As another example, the first input buffer 1120 may receive data from the first storage source 1102.
The method 1200 further includes: at 1204, second data is received from a second storage source at a second buffer, the second data including a first portion and a second portion separated by a separator. For example, the second buffer 108 may receive second data including a second portion 116, a second separator 118, a third portion 120, and a third separator 121 from the second storage source 112. As another example, the second input buffer 712 may receive data from the second storage source 704. As another example, the second input buffer 1064 may receive data from the second storage source 1022. As another example, the second input buffer 1122 may receive data from the second storage source 1104.
The method 1200 further includes: at 1206, a first operation is performed on a second portion of the second data at a second processor associated with the second buffer. For example, the second processor 104 may perform a calculation on the third portion 120 in response to the third portion 120 falling between the first and last separators (e.g., the second and third separators 118, 121) in the second buffer 108. As another example, the second processor 720 may perform a calculation on data in the second input buffer 712 between the first delimiter and the last delimiter that falls within the second input buffer. As another example, the second processor 1004 may perform a calculation based on data in the second input buffer 1064 between a first separator and a last separator within the second input buffer 1064. As another example, the second processor 1112 may perform a calculation based on data in the second input buffer 1122 between the first separator and the last separator that fall within the second input buffer 1122.
The method 1200 further includes: at 1208, at a first processor associated with the first buffer, a second operation is performed on the first portion of the second data and the first data. For example, the second processor 104 may copy the second portion 116 into the first buffer 106 in response to the second portion 116 being located before the first delimiter (e.g., the second delimiter 118) in the second buffer 108. The first processor 102 may perform calculations based on the first portion 114 and the second portion 116. As another example, the first processor 102 may perform a calculation that starts after a first separator (e.g., first separator 113) in the first buffer 106 and continues into the second buffer 108 until reaching the first separator (e.g., second separator 118) within the second buffer 108. As another example, the second processor 720 may copy data located before the first delimiter in the second input buffer 712 into the first input buffer 710, and the first processor 718 may perform calculations based on the data in the first input buffer 710. In another example, the second processor 1004 copies data before the first delimiter in the second input buffer 1064 to the first downstream input buffer 1011, and the first downstream processor 1010 performs computations based on the data in the first downstream input buffer 1011. In another example, the first processor 1110 performs a calculation on data that begins after a first delimiter in the first input buffer 1120 and continues into the second input buffer 1122 until the first delimiter within the second input buffer 1122 is reached.
Method 1200 may be used to perform parallel processing in systems where host data units are not aligned with storage data units and/or parallel processing pipelines.
In some examples, X corresponds to Y based on X and Y matching. For example, the first ID may be determined to correspond to a second ID that matches (e.g., has the same value as) the first ID. In other examples, X corresponds to Y based on X being associated with Y (e.g., X being linked to Y). For example, X may be associated with Y by a mapping data structure.
Particular embodiments may be implemented in one or a combination of hardware, firmware, and software. Other embodiments may also be implemented as instructions stored on a computer-readable storage device, which may be read and executed by at least one processor to perform the operations described herein. Computer-readable storage devices may include any non-transitory memory mechanism for storing information in a form readable by a machine (e.g., a computer). For example, computer-readable storage devices may include read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and other storage devices and media.
As used in this document, the term "communication" is intended to include transmission, or reception, or both transmission and reception. This may be particularly useful in claims when describing the organization of data sent by one device and received by another device, but only requiring the function of one of these devices to infringe the claim. Similarly, when only the function of one of those devices is claimed, the bidirectional data exchange between two devices (both devices transmitting and receiving during the exchange) may be described as "communication". The term "communication" as used herein with respect to wireless communication signals includes transmitting wireless communication signals and/or receiving wireless communication signals. For example, a wireless communication unit capable of transmitting wireless communication signals may include a wireless transmitter for transmitting wireless communication signals to at least one other wireless communication unit, and/or a wireless communication receiver for receiving wireless communication signals from at least one other wireless communication unit.
Some embodiments may be used in conjunction with various devices and systems, such as the following: personal Computers (PCs), desktop computers, mobile computers, laptop computers, notebook computers, tablet computers, server computers, handheld devices, personal Digital Assistant (PDA) devices, handheld PDA devices, on-board devices, off-board devices, hybrid devices, in-vehicle devices, off-board devices, mobile or portable devices, consumer devices, non-mobile or non-portable devices, wireless communication stations, wireless communication devices, wireless Access Points (APs), wired or wireless routers, wired or wireless modems, video devices, audio-video (a/V) devices, wired or wireless networks, wireless area networks, wireless Video Area Networks (WVAN), local Area Networks (LANs), wireless LANs (WLANs), personal Area Networks (PANs), wireless PANs (WPANs), and the like.
Some embodiments may be used in conjunction with unidirectional and/or bidirectional radio communication systems, cellular radiotelephone communication systems, mobile telephones, cellular telephones, wireless telephones, personal Communication Systems (PCS) devices, PDA devices that include wireless communication devices, mobile or portable Global Positioning System (GPS) devices, devices that include GPS receivers or transceivers or chips, devices that include Radio Frequency Identification (RFID) elements or chips, multiple Input Multiple Output (MIMO) transceivers or devices, single Input Multiple Output (SIMO) transceivers or devices, multiple Input Single Output (MISO) transceivers or devices, devices with one or more internal and/or external antennas, digital Video Broadcasting (DVB) devices or systems, multi-standard radio devices or systems, wired or wireless handheld devices (e.g., smart phones), wireless Application Protocol (WAP) devices, and so forth.
Some embodiments may be used in conjunction with one or more types of wireless communication signals and/or systems compliant with one or more wireless communication protocols, e.g., Radio Frequency (RF), Infrared (IR), Frequency Division Multiplexing (FDM), Orthogonal FDM (OFDM), Time Division Multiplexing (TDM), Time Division Multiple Access (TDMA), Extended TDMA (E-TDMA), General Packet Radio Service (GPRS), Extended GPRS, Code Division Multiple Access (CDMA), Wideband CDMA (WCDMA), CDMA 2000, single-carrier CDMA, Multi-carrier Modulation (MDM), Discrete Multi-Tone (DMT), Bluetooth™, Global Positioning System (GPS), Wi-Fi, Wi-Max, ZigBee™, Ultra-Wideband (UWB), Global System for Mobile communications (GSM), 2G, 2.5G, 3G, 3.5G, 4G, Fifth Generation (5G) mobile networks, 3GPP, Long Term Evolution (LTE), LTE-Advanced, Enhanced Data rates for GSM Evolution (EDGE), etc. Other embodiments may be used in various other devices, systems, and/or networks.
Although an example processing system has been described above, embodiments of the subject matter and functional operations described herein may be implemented in other types of digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
Embodiments of the subject matter and operations described herein may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described herein may be implemented as one or more computer programs (i.e., one or more components of computer program instructions), encoded on a computer storage medium, for execution by, or to control the operation of, information/data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal (e.g., a machine-generated electrical, optical, or electromagnetic signal) that is generated to encode information/data for transmission to suitable receiver apparatus for execution by information/data processing apparatus. The computer storage medium may be a computer readable storage device, a computer readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them, or the computer storage medium may be included in a computer readable storage device, a computer readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Furthermore, while the computer storage medium is not a propagated signal, the computer storage medium may be a source or destination of computer program instructions encoded in an artificially generated propagated signal. Computer storage media may also be or be included in one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices).
The operations described herein may be implemented as operations performed by an information/data processing apparatus on information/data stored on one or more computer-readable storage devices or received from other sources.
The term "data processing apparatus" includes all kinds of apparatus, devices, and machines for processing data, including by way of example the aforementioned programmable processor, computer, system-on-a-chip, or multiple programmable processors, computers, systems-on-a-chip, or combinations. The device may include dedicated logic circuitry (e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit)). In addition to hardware, the device may include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The devices and execution environments may implement a variety of different computing model infrastructures (such as web services, distributed computing, and grid computing infrastructures).
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a component, assembly, subroutine, object, or other unit suitable for use in a computing environment. The computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or information/data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store portions of one or more components, sub-programs, or code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described herein can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input information/data and generating output. Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and information/data from a read-only memory or a random access memory or both. Elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive information/data from or transfer information/data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, the computer need not have such devices. Devices suitable for storing computer program instructions and information/data include all forms of non-volatile memory, media and memory devices including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disk; CD-ROM and DVD-ROM discs. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information/data to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other types of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and may receive input from a user in any form including acoustic, speech, or tactile input. Further, the computer may interact with the user by sending and receiving documents to and from the device used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.
Embodiments of the subject matter described herein can be implemented in a computing system that includes a back-end component (e.g., as an information/data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an embodiment of the subject matter described herein), or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital information/data communication (e.g., a communication network). Examples of communication networks include local area networks ("LANs") and wide area networks ("WANs"), internetworks (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, the server sends information/data (e.g., HTML pages) to the client device (e.g., for the purpose of displaying information/data to and receiving user input from a user interacting with the client device). Information/data generated at the client device (e.g., results of user interactions) may be received at the server from the client device.
While this specification contains many specific embodiment details, these should not be construed as limitations on the scope of any embodiments or of what may be claimed, but rather as descriptions of features specific to particular embodiments. Particular features described herein in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a combination can in some cases be excised from the claimed combination and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described herein. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In particular embodiments, multitasking and parallel processing may be advantageous.
Many modifications and other embodiments of the disclosure set forth herein will come to mind to one skilled in the art to which these embodiments pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the embodiments are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (20)

1. A storage device, comprising:
a first storage source storing first data;
a second storage source storing second data, the second data including a first portion and a second portion separated by a separator;
a first buffer configured to receive the first data;
a second buffer configured to receive the second data;
a first processor associated with the first buffer; and
a second processor associated with the second buffer, wherein the second processor is configured to perform a first operation on the second portion of the second data, and wherein the first processor is configured to perform, based on the separator, a second operation on the first portion of the second data and the first data.
2. The storage device of claim 1, wherein the second processor is configured to copy the first portion of the second data from the second buffer to the first buffer.
3. The storage device of claim 1, wherein the separator corresponds to a first separator in the second buffer, and wherein the second processor is configured to copy the first portion of the second data to the first buffer in response to the first portion being located before the first separator within the second buffer.
4. The storage device of claim 3, wherein the first buffer comprises a first additional buffer space in the first buffer before the first data and a second additional buffer space in the first buffer after the first data, and wherein the second processor is configured to copy the first portion to the second additional buffer space.
5. The storage device of claim 4, further comprising:
a third buffer storing third data; and
a third processor associated with the third buffer, wherein the third processor is configured to copy the third data to the first additional buffer space in response to the third data being located after the final separator in the third buffer.
6. The storage device of claim 3, wherein the first data follows a final separator in the first buffer, and wherein the first processor is configured to perform the second operation in response to the first buffer receiving the first portion of the second data.
7. The storage device of claim 1, wherein the storage device comprises a buffer space comprising the first buffer and the second buffer, wherein the first processor is configured to stop processing in the second buffer based on the position of the separator, and wherein the second processor is configured to start processing in the second buffer based on the position of the separator.
8. The storage device of any one of claims 1 to 7, wherein the first storage source comprises a storage channel, a storage medium device, or a group of storage channels.
9. The storage device of any one of claims 1 to 7, wherein the first storage source comprises a NAND flash channel, a NAND flash chip, or a group of NAND flash channels.
10. The storage device of any one of claims 1 to 3, further comprising:
a third buffer configured to receive the first data from the first storage source; and
a third processor associated with the third buffer and configured to copy the first data from the third buffer to the first buffer in response to the first data being located after the final separator in the third buffer.
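For readers skimming the claims, the mechanism recited in claims 1 to 10 is a record-stitching scheme: delimited data is striped across storage sources, and the fragment of a record that sits before a buffer's first separator is copied into spare space after the preceding buffer's data, so that each processor operates only on whole records. The following host-side simulation is a minimal illustrative sketch of that idea, not the claimed on-device implementation; the newline separator, the function names, and the record-counting operation are all assumptions made for illustration.

```python
# Minimal sketch (assumptions: newline-delimited records, host-side Python
# simulation); the claims describe an on-device mechanism, not this API.

SEPARATOR = b"\n"  # assumed record separator

def split_at_first_separator(buf: bytes):
    """Return (first_portion, remainder): the fragment up to and including
    the buffer's first separator, and everything after it (cf. claim 3)."""
    i = buf.find(SEPARATOR)
    if i < 0:
        return buf, b""  # no separator: the whole buffer is one fragment
    return buf[: i + 1], buf[i + 1 :]

def stitch(buffers):
    """Copy each buffer's first portion into the additional buffer space
    after the preceding buffer's data (cf. claims 2 and 4), so each
    processing region holds only whole, separator-terminated records."""
    regions = [buffers[0]]
    for buf in buffers[1:]:
        first_portion, remainder = split_at_first_separator(buf)
        regions[-1] += first_portion  # hand the fragment to the predecessor
        regions.append(remainder)
    return regions

def operation(region: bytes) -> int:
    """Stand-in for a per-processor operation (here: count records)."""
    return region.count(SEPARATOR)

# Two storage sources; the record b"beta\n" spans the buffer boundary.
first_data, second_data = b"alpha\nbe", b"ta\ngamma\n"
print([operation(r) for r in stitch([first_data, second_data])])  # [2, 1]
```

In the claimed device the copy happens between per-channel input buffers inside the storage device, and the operations run on separate processors in parallel; the sequential loop above only illustrates the data movement.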
11. A method for parallel processing, comprising:
receiving, at a first buffer, first data from a first storage source;
receiving, at a second buffer, second data from a second storage source, the second data comprising a first portion and a second portion separated by a separator;
performing, at a second processor associated with the second buffer, a first operation on the second portion of the second data; and
performing, at a first processor associated with the first buffer, a second operation on the first portion of the second data and the first data.
12. The method of claim 11, further comprising copying the first portion of the second data from the second buffer to the first buffer.
13. The method of claim 11, wherein the separator corresponds to a first separator in the second buffer, the method further comprising copying the first portion of the second data to the first buffer in response to the first portion being located before the first separator within the second buffer.
14. The method of claim 13, wherein the first buffer comprises a first additional buffer space in the first buffer before the first data and a second additional buffer space in the first buffer after the first data, and wherein copying the first portion of the second data to the first buffer comprises copying the first portion to the second additional buffer space.
15. The method of claim 14, further comprising:
storing third data in a third buffer associated with a third processor; and
copying the third data to the first additional buffer space in response to the third data being located after the final separator in the third buffer.
16. The method of claim 11, wherein the first buffer and the second buffer are included in a common buffer space, and wherein performing the second operation comprises:
performing, by the first processor, an operation that starts in the first buffer and stops in the second buffer based on the position of the separator.
17. The method of any of claims 11 to 16, wherein the first storage source comprises a storage channel, a storage medium device, or a group of storage channels.
18. The method of any of claims 11 to 16, wherein the first storage source comprises a NAND flash channel, a NAND flash chip, or a group of NAND flash channels.
19. The method of any of claims 11 to 13, further comprising:
receiving, at a third buffer, the first data from the first storage source; and
copying the first data from the third buffer to the first buffer in response to the first data being located after the final separator in the third buffer.
20. A storage device, comprising:
a first storage channel including a first media device storing first data;
a second storage channel comprising a second media device storing second data, the second data comprising a first portion and a second portion separated by a separator;
a first computing module associated with the first storage channel and including a first processor and a first input buffer; and
a second computing module associated with the second storage channel and comprising a second processor and a second input buffer, wherein the second processor is configured to perform a first operation on the second portion of the second data, and wherein the first processor is configured to perform a second operation on the first portion of the second data and the first data based on the separator.
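Claims 7 and 16 recite a variant in which the first and second buffers occupy one common buffer space and no copy is needed: the first processor runs past the end of its own buffer up to the first separator in the second buffer, where the second processor takes over. The sketch below illustrates that boundary adjustment under the same illustrative assumptions as the previous example (newline separators, hypothetical names, a record-counting stand-in operation); it is not the claimed implementation.

```python
# Minimal sketch of the common-buffer-space variant (cf. claims 7 and 16).
# Assumes every buffer contains at least one separator.

from concurrent.futures import ThreadPoolExecutor

SEPARATOR = b"\n"

def record_aligned_marks(space: bytes, buffer_offsets):
    """Turn raw buffer start offsets into processing boundaries: each
    processor stops, and the next one starts, just past the first
    separator at or after the next buffer's start offset."""
    marks = [0]
    for offset in buffer_offsets[1:]:
        marks.append(space.index(SEPARATOR, offset) + 1)
    marks.append(len(space))
    return marks

space = b"alpha\nbeta\ngamma\n"              # first + second buffer, contiguous
marks = record_aligned_marks(space, [0, 8])  # second buffer assumed at byte 8

def operation(bounds):
    start, end = bounds
    return space[start:end].count(SEPARATOR)  # stand-in per-processor work

with ThreadPoolExecutor() as pool:            # one worker per aligned region
    print(list(pool.map(operation, zip(marks, marks[1:]))))  # [2, 1]
```

Because the work ranges meet exactly at separator positions, neither processor sees a partial record, which is the effect the copy-based claims achieve with additional buffer space.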
CN202211708820.7A 2022-03-03 2022-12-29 Memory device and method for parallel processing Pending CN116700603A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/316,307 2022-03-03
US17/856,823 2022-07-01
US17/856,823 US20230280936A1 (en) 2022-03-03 2022-07-01 Parallel processing in computational storage

Publications (1)

Publication Number Publication Date
CN116700603A 2023-09-05

Family

ID=87834564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211708820.7A Pending CN116700603A (en) 2022-03-03 2022-12-29 Memory device and method for parallel processing

Country Status (1)

Country Link
CN (1) CN116700603A (en)

Legal Events

Date Code Title Description
PB01 Publication