US11010431B2 - Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD - Google Patents


Info

Publication number
US11010431B2
Authority
US
United States
Prior art keywords
data
storage device
data storage
memory array
feature
Prior art date
Legal status
Active, expires
Application number
US15/472,061
Other versions
US20180189635A1 (en)
Inventor
Sompong P. Olarig
Fred WORLEY
Nazanin Farahpour
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to US15/472,061
Assigned to SAMSUNG ELECTRONICS CO., LTD. Assignors: FARAHPOUR, NAZANIN; OLARIG, SOMPONG P.; WORLEY, FRED
Priority to KR1020170139006A (published as KR102449191B1)
Publication of US20180189635A1
Priority to US17/322,601 (published as US20210279285A1)
Application granted
Publication of US11010431B2
Legal status: Active
Expiration: adjusted

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/903: Querying
    • G06F 16/9038: Presentation of query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval characterised by using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/95: Retrieval from the web
    • G06F 16/951: Indexing; Web crawling techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655: Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656: Data buffering arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671: In-line storage system
    • G06F 3/0673: Single storage device
    • G06F 3/0679: Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0454
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/048: Activation functions
    • G06N 3/0481
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/20: Processor architectures; Processor configuration, e.g. pipelining
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/001: Texturing; Colouring; Generation of texture or colour

Definitions

  • the present disclosure relates generally to data storage devices, and more particularly to data storage devices that are capable of performing data matching and machine learning on the data stored thereon.
  • Non-volatile memory express (NVMe) over Fabrics (NVMeoF) is a new industry standard.
  • NVMeoF defines a common architecture that supports an NVMe block storage protocol over a wide range of storage networking fabrics such as Ethernet, Fibre Channel, InfiniBand, and other network fabrics.
  • an X86-based central processing unit (CPU) on a motherboard is no longer required to move data between an initiator (e.g., host software) and a target device (i.e., an NVMeoF device) because the target device is capable of moving data by itself.
  • The term "fabric" represents a network topology in which network nodes can pass data to each other through a variety of interconnecting protocols, ports, and switches.
  • Ethernet-attached SSDs may attach directly to a fabric, and in this case the fabric is the Ethernet.
  • the standard form factor of NVMeoF devices is the same as or very similar to that of a standard solid-state drive (SSD) or hard disk drive (HDD) to enable quick and easy deployment in existing rack systems in an enterprise or a datacenter.
  • the NVMeoF devices provide a high-capacity, low-latency data storage and operating environment for enterprise or datacenter applications.
  • NVMeoF devices are not optimized for data-centric applications such as machine learning and data mining applications.
  • NVMeoF devices including fabric-attached SSDs (eSSDs) merely respond to a request by an application running on a host computer and provide data requested by the application or perform only basic operations on the data stored thereon.
  • Most of the data matching or machine learning capabilities are performed by CPUs and/or graphics processing units (GPUs) on a host computer that are external to the NVMeoF devices.
  • a data storage device includes a memory array for storing data; a host interface for providing an interface with a host computer running an application; a central control unit configured to receive a command in a submission queue from the application and initiate a search process in response to a search query command; a preprocessor configured to reformat data contained in the search query command based on a type of the data and generate a reformatted data; and one or more data processing units configured to extract one or more features from the reformatted data and perform a data operation on the data stored in the memory array in response to the search query command and return matching data from the data stored in the memory array to the application via the host interface.
  • a method for operating a data storage device includes: receiving a command in a submission queue from an application running on a host computer; initiating a search process in response to a search query command; generating a reformatted data by changing a format of data contained in the search query command based on a type of the data; extracting one or more features from the reformatted data; performing a data operation on data stored in a memory array of the data storage device in response to the search query command; and returning matching data from the data stored in the memory array to the application via a host interface established between the host computer and the data storage device.
  • FIG. 1 shows a block diagram of an example data storage device, according to one embodiment
  • FIG. 2 is a block diagram illustrating a behavioral view of an example neural code accelerator, according to one embodiment
  • FIG. 3 shows an example GPU configured to implement one or more convolution engines (CEs), according to one embodiment
  • FIG. 4 shows an example data storage device including an XOR engine, according to one embodiment
  • FIG. 5 shows an example data storage device including a GPU for machine learning, according to one embodiment
  • FIGS. 6A and 6B show a flowchart for an example image search query and retrieval process, according to one embodiment.
  • Machine learning can include algorithms that can learn from data including artificial intelligence, getting computers to act without being explicitly programmed, automated reasoning, automated adaptation, automated decision making, automated learning, the ability for a computer to learn without being explicitly programmed, artificial intelligence (AI), or combination thereof.
  • Machine learning can be considered a type of artificial intelligence (AI).
  • Machine learning can include classification, regression, feature learning, online learning, unsupervised learning, supervised learning, clustering, dimensionality reduction, structured prediction, anomaly detection, neural nets, or combination thereof.
  • a learning system can include machine learning systems that can process or analyze “big data.”
  • Parallel or distributed storage devices with in-storage-computing (ISC) can accelerate big data machine learning and analytics.
  • the parallel or distributed learning system can offload functions to ISC for additional bandwidth and reduce input and output (I/O) for the storage and host processor.
  • This parallel or distributed learning system can provide machine learning with ISC.
  • a parallel or distributed learning system can be implemented with in-storage-computing (ISC), a scheduler, or combination thereof.
  • scheduler can intelligently assign data, tasks, functions, operations, or combination thereof.
  • a host computer and one or more data storage devices can collectively perform data matching or machine learning operations.
  • the data matching or machine learning operations can be partitioned into host operations and device operations.
  • the partitioning into the host operations and the device operations can depend on the optimization of a computational time and power efficiency for operating on a specific usage model. If a specific part of the data matching or machine learning operations performed by a subsystem (either the host computer or the data storage device) can result in a faster and more efficient execution, that specific part of the operations can be partitioned into the corresponding subsystem.
  • the dataset of trained faces may be stored in a data storage device.
  • the dataset of trained faces may include binary codes or feature vectors extracted from the trained face images.
  • all or part of the newly trained facial dataset or data model can be copied from the data storage device to a memory of the host computer.
  • the host computer can perform the new facial training operation using the dataset copied to the host computer's memory. That is, the data storage device may receive the data of the new face and send corresponding neural binary codes or feature vectors to facilitate the new facial training operation performed by the host computer.
  • the host computer can keep the newly trained facial recognition model in the host computer's memory for additional training or copy the newly trained facial recognition model back to the data storage device to update the dataset of trained faces. This process can repeat for a newly received facial dataset and training for a new model based on the facial dataset.
  • the host computer can perform the data matching or machine learning operations in a framework that supports coordination with the data storage device that stores the dataset.
  • the performance of the framework can be highly dependent on the usage model and deployment parameters, such as a size of the images, a number of training iterations, a size of the dataset, a training algorithm, a floating-point performance, etc.
  • the size of the dataset of trained faces may get larger over time.
  • the data storage device can partially or fully perform the facial recognition operation instead of copying the dataset stored in the data storage device to the memory of the host computer to perform the data matching or machine learning operations in the host computer.
  • the present disclosure provides a data storage device that can internally perform data matching or machine learning operations.
  • the data storage device can be any of a solid-state drive (SSD), a hard disk drive (HDD), an NVMe device that is compatible with the NVMe standard, an NVMeoF device that is compatible with the NVMeoF standard, or any other fabric-attached SSDs (eSSDs). It is noted that any other type of devices that can store data and perform data matching or machine learning can be used without deviating from the scope of the present disclosure.
  • FIG. 1 shows a block diagram of an example data storage device, according to one embodiment.
  • the data storage device 100 includes a central control unit (CCU) 111 , a preprocessor 112 , an embedded DRAM 113 , a signature thresholding engine 114 , a direct memory access (DMA) engine 115 , a controller 116 , an input buffer 117 , a weight buffer 118 , an output buffer 119 , one or more processing units 120 , and a memory array 130 .
  • Various images, text, video, audio, or other data can be stored in the memory array 130 .
  • the memory array 130 is shown to be local to the data storage device 100 , it is noted that the memory array 130 can be remotely connected to the data storage device 100 via the fabrics such as the Ethernet.
  • the memory array 130 can be a flash array that may reside in another NVMeoF device.
  • the placement and physical attachment of the memory array 130 may not be a physical limitation in that the data stored in the memory array 130 can be accessed by any host computer accessible by the NVMeoF protocols.
  • the controller 116 and the one or more data processing units 120 of one data storage device can operate on data stored in the memory array 130 of itself or another data storage device over the fabrics.
  • the data storage device 100 can be integrated circuits, integrated circuit cores, integrated circuit components, microelectromechanical system (MEMS), passive devices, or a combination thereof having a form factor compatible with the NVMe and/or NVMeoF standards.
  • various form factors of the data storage device 100 can be used without deviating from the scope of the present disclosure.
  • the data storage device 100 is an NVMeoF device, and the connection between a host computer (not shown) and the fabric attached to the NVMeoF device is an Ethernet connection.
  • the host computer can send NVMeoF commands directly to the NVMeoF device over the Ethernet connection.
  • various other fabrics such as Fibre Channel, InfiniBand, and other network fabrics can be used to establish the communication between the data storage device 100 and the host computer.
  • the data storage device 100 can receive a command 150 from an application running on the host computer.
  • the command 150 can be a vendor-specific fabric command (e.g., an NVMeoF command).
  • the command 150 can be a normal read/write operation command, an image search inquiry command, or a machine learning command to operate on the data stored in the memory array 130 .
  • the command 150 can be received in a submission queue (SQ).
  • One submission queue can include several commands 150 .
  • a single submission queue can include the same or similar type of commands 150 , for example, read/write operation commands. Similar or same commands 150 can be sorted by the application and packaged in different submission queues for efficient delivery and processing of the commands 150 .
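  • The following is a minimal, hypothetical sketch of how an application might sort similar commands into per-type submission queues before handing them to the device, as described above. The command dictionaries and opcode names are illustrative assumptions, not part of the NVMe/NVMeoF specification.

```python
from collections import defaultdict

def batch_into_submission_queues(commands):
    """Group commands by opcode so that each submission queue holds
    commands of the same or similar type."""
    queues = defaultdict(list)
    for cmd in commands:
        queues[cmd["opcode"]].append(cmd)
    return dict(queues)

# Hypothetical command stream from the application.
commands = [
    {"opcode": "READ", "lba": 0x1000},
    {"opcode": "WRITE", "lba": 0x2000},
    {"opcode": "READ", "lba": 0x3000},
    {"opcode": "IMAGE_SEARCH", "payload": b"<image bytes>"},
]
sqs = batch_into_submission_queues(commands)
# sqs now maps "READ" -> two commands, "WRITE" -> one, "IMAGE_SEARCH" -> one.
```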
  • the controller 116 is configured to perform various data operations including data matching and/or machine learning operations on the data stored in the memory array 130 .
  • the controller 116 can run a state machine or perform data matching operations in conjunction with the CCU 111 .
  • the data storage device 100 can internally perform the data operations with no or minimal interaction with the application or with the host computer. In this case, the latency to complete the requested operation can be improved with less power consumed due to less data movement between the host and the data storage device.
  • the data storage device 100 provides the matching data 151 to the application running on the host computer.
  • the CCU 111 can decode the command 150 received from the host computer and generate one or more neural binary codes for internal and external consumption. For example, in response to an image search query command including an image data, the CCU 111 initializes the preprocessor 112 to operate on the received image data.
  • the data storage device 100 can receive only the command 150 from the application to perform a data operation on the dataset that is stored in the memory array 130 instead of receiving both the command and dataset from the host computer. Examples of such data include, but are not limited to, image data, text data, video data, and audio data.
  • the preprocessor 112 can convert the format of the image data and create a fixed-size RGB format data. The converted image data in the RGB format may further be scaled up or down for facilitating the extraction of various features from the image data.
  • the analysis-ready fixed-size image data are saved in the DRAM 113 for the data operation as instructed by the command 150 .
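  • As a rough illustration of this preprocessing step, the sketch below decodes an incoming image, converts it to RGB, and rescales it to a fixed size so that downstream feature extraction sees a uniform input. It assumes the Pillow library; the function name and the 256×256 target (the size used in the example later in this disclosure) are illustrative.

```python
import io
from PIL import Image

def preprocess_image(raw_bytes, size=(256, 256)):
    """Normalize an input image to a fixed-size RGB bitmap."""
    img = Image.open(io.BytesIO(raw_bytes))
    img = img.convert("RGB")   # normalize the color format (e.g., YUV -> RGB)
    img = img.resize(size)     # scale up or down to the fixed dimension
    return img
```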
  • the data processing unit 120 is a neural code accelerator.
  • Several data processing units 120 may be required to process each of the received images. For example, if ten images are received from the application in the command 150 , a total of ten data processing units 120 can be invoked by the CCU 111 . The number of invoked data processing units 120 may not necessarily match the number of received images. Depending on the current workload and the availability of the data processing units 120 , the CCU 111 can invoke a certain number of data processing units 120 . In some embodiments, the data processing can be divided, grouped, or performed in parallel or in series depending on the workload and the availability of the data processing units 120 .
  • each of the data processing units 120 can incorporate one or more convolution engines (CEs).
  • the image data (e.g., a facial image) received from the application are input to the data processing units 120 in batches, and each of the convolution engines can extract feature sets for each dataset that is grouped in batches.
  • the feature sets that are extracted in parallel can be connected based on their weights.
  • the convolution weight parameters for each feature set can be loaded from the DRAM 113 via the weight buffer 118 .
  • the data processing unit 120 can also have adder trees, optional pooling, and a rectified linear unit 121 to compute and connect the feature sets into the fully-connected neural layers.
  • the data processing unit 120 can generate a feature vector 152 and send the feature vector 152 to the DRAM 113 via the DMA 115 .
  • the feature vector 152 can be converted to binary vectors and saved to the memory array 130 .
  • the feature vector 152 can be fetched by another processing unit (e.g., the signature thresholding engine 114 ) for compressing the feature vector 152 and comparing the extracted features (e.g., binary codes) with the saved features for the database images stored in the memory array 130 .
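  • A minimal sketch of signature thresholding, assuming NumPy: each activation of a real-valued feature vector is compared against a threshold, and the resulting bits are packed into a compact binary code suitable for storage and comparison. The threshold of 0.0 is an assumed default.

```python
import numpy as np

def signature_threshold(feature_vector, threshold=0.0):
    """Quantize a feature vector into a packed binary code."""
    bits = (np.asarray(feature_vector) > threshold).astype(np.uint8)
    return np.packbits(bits)   # pack the 0/1 bits into bytes

fv = np.array([0.7, -0.2, 1.3, 0.05, -0.9, 0.4, 0.0, 2.1])
code = signature_threshold(fv)   # one byte: 0b10110101
```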
  • the rectified linear unit 121 implemented in each data processing engine 120 can provide a rectifier by employing an activation function for given inputs received from the associated convolution engines.
  • the rectified linear units 121 are applicable to computer vision using deep neural nets.
  • FIG. 2 is a block diagram illustrating a behavioral view of an example neural code accelerator, according to one embodiment.
  • the neural code accelerator 200 includes convolution and pooling filters 203 , a fully connected layer 204 , and a signature thresholding unit 205 .
  • the neural code accelerator 200 can optionally have a principal component analysis (PCA) layer 206 .
  • the neural code accelerator may be the data processing unit 120 shown in FIG. 1 .
  • the neural code accelerator 200 can receive an input data 201 from a data buffer (e.g., a submission queue) from a host computer or a central control unit (e.g., CCU 111 shown in FIG. 1 ).
  • a preprocessor 202 (e.g., the preprocessor 112 shown in FIG. 1 ) can preprocess the input data 201 before it is fed to the neural code accelerator 200 .
  • the input data 201 can be any type of data including, but not limited to, text, image, video, and audio data.
  • the preprocessor 202 can be a part of or integrated into the neural code accelerator 200 .
  • the preprocessor 202 can perform initial processing of input data 201 to convert it to a raw RGB format and scale the image up or down to a fixed dimension.
  • the convolution and pooling filters 203 can perform data processing on the converted and/or scaled data with a set of convolution filters.
  • the output from the convolution and pooling filters 203 can be one or more features 207 .
  • the features 207 are fed to the fully connected layer 204 .
  • the fully connected layer 204 can generate a feature vector 208 based on the features 207 and feed the feature vector 208 to the signature thresholding unit 205 and optionally to the PCA layer 206 .
  • the signature thresholding unit 205 can generate one or more neural binary codes 210 .
  • the fully connected layer 204 can generate the feature vector 208 , and the signature thresholding unit 205 can generate the neural binary codes 210 by finalizing activations of the feature vector 208 based on a predetermined threshold.
  • the threshold may be fine-tuned by a user, training, or machine learning.
  • the PCA layer 206 can condense the output feature vector 208 to generate compressed feature vectors 211 .
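  • The optional PCA layer can be sketched as follows, assuming NumPy: feature vectors are centered and projected onto their top principal components (obtained here via SVD) to produce compressed feature vectors. The dimensions and names are illustrative assumptions.

```python
import numpy as np

def pca_compress(features, n_components=64):
    """Project (num_samples, dim) feature vectors onto the
    top n_components principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]        # principal directions
    return centered @ components.T        # compressed feature vectors

feats = np.random.randn(1000, 4096).astype(np.float32)
compressed = pca_compress(feats)          # shape (1000, 64)
```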
  • the convolution engines can internally perform data matching and/or deep learning operations inside a data storage device.
  • the data storage device can have a plurality of GPUs, and each of the GPUs can include one or more convolution engines that are grouped in an input layer, a hidden layer, and an output layer.
  • FIG. 3 shows an example GPU configured to implement one or more convolution engines (CEs), according to one embodiment.
  • the GPU 300 can run one or more CEs that are pre-programmed with fixed algorithms such as K-means or regression.
  • a first group of the CEs are implemented as an input layer 301
  • a second group of the CEs are implemented as a hidden layer 302
  • a third group of the CEs are implemented as an output layer 303 .
  • the data paths from the input layer 301 to the output layer 303 through the hidden layer 302 provide a forward path or an inference path.
  • the data paths from the output layer 303 to the input layer 301 through the hidden layer 302 provide a backward or training path.
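  • A minimal sketch of the forward (inference) and backward (training) paths through an input, hidden, and output layer, mirroring FIG. 3. A plain two-weight-matrix network stands in for the grouped convolution engines; the layer sizes, loss, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(16, 32))   # input layer -> hidden layer
w2 = rng.normal(scale=0.1, size=(32, 4))    # hidden layer -> output layer

def forward(x):
    """Forward/inference path: input -> hidden -> output."""
    h = np.maximum(x @ w1, 0.0)             # rectified linear activation
    return h, h @ w2

def backward(x, h, y_pred, y_true, lr=1e-2):
    """Backward/training path: output -> hidden -> input."""
    global w1, w2
    grad_out = y_pred - y_true              # gradient of a squared-error loss
    grad_h = (grad_out @ w2.T) * (h > 0)    # propagate back through the ReLU
    w2 -= lr * (h.T @ grad_out)
    w1 -= lr * (x.T @ grad_h)

x = rng.normal(size=(8, 16)); y = rng.normal(size=(8, 4))
h, y_pred = forward(x)
backward(x, h, y_pred, y)
```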
  • the application can directly utilize the GPU 300 without having to download a specific algorithm because the CEs implemented in the GPU 300 are preprogrammed with algorithms such as K-means or regression that are applicable to a variety of analysis and deep learning.
  • the data storage device implementing the GPU 300 can be a consumer device or a home device that can feature a machine learning capability.
  • the data storage device incorporates an additional data matching logic (e.g., XOR) and DMA engines.
  • the data storage device can perform data matching in real time or as a background task.
  • the application can provide one or more parameters for matching data (e.g., raw binary values) to the data storage device, and the data storage device can internally execute and complete the pattern matching for the data, and return the matching data stored in the memory array to the application.
  • the data storage device can have one data matching (XOR) engine per bank/channel of a NAND array. For example, if the data storage device employs N independent channels/banks of the memory array (e.g., a NAND array), a total of N XOR engines can be used to match the data from each NAND channel, where N is an integer.
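  • The per-channel XOR matching can be sketched as follows, assuming NumPy: a stored block matches the requested raw binary pattern exactly when the bitwise XOR of the two is all zeros. Running one matcher per channel mirrors the N-engine arrangement above; the channel layout and block alignment are illustrative assumptions.

```python
import numpy as np

def xor_match(channel_data, pattern):
    """Return block offsets in channel_data whose contents equal pattern."""
    blk = len(pattern)
    hits = []
    for off in range(0, len(channel_data) - blk + 1, blk):
        window = channel_data[off:off + blk]
        if not np.any(np.bitwise_xor(window, pattern)):  # all-zero XOR => match
            hits.append(off)
    return hits

channels = [np.frombuffer(b"abcdXXXXabcd", dtype=np.uint8) for _ in range(4)]
pattern = np.frombuffer(b"abcd", dtype=np.uint8)
matches = [xor_match(ch, pattern) for ch in channels]  # one result per channel
```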
  • FIG. 4 shows an example data storage device including an XOR engine, according to one embodiment.
  • the data storage device 400 can be an NVMeoF device that is capable of processing data and moving it to and from a host computer by itself.
  • the host computer can run an application and communicate with the storage device 400 via the fabric interface.
  • the data storage device 400 can include a host manager 402 that interfaces with the host computer via a host interface 401 (e.g., U.2 connector), a buffer manager 403 including a DRAM controller and a memory interface (e.g., DDR3 and DDR4), a memory manager 404 (e.g., flash manager) including a DMA engine 406 and an XOR engine 407 , a CPU subsystem 405 , and a memory array 410 (e.g., flash memory).
  • a host manager 402 that interfaces with the host computer via a host interface 401 (e.g., U.2 connector), a buffer manager 403 including a DRAM controller and a memory interface (e.g., DDR3 and DDR4), a memory manager 404 (e.g., flash manager) including a DMA engine 406 and an XOR engine 407 , a CPU subsystem 405 , and a memory array 410 (e.g., flash memory).
  • the XOR engine 407 is configured to perform in-line data matching in response to a data matching request received from the application. After performing the in-line data matching operation, the data storage device 400 can provide the matching data to the requesting application via the connector 401 .
  • the XOR engine 407 may be implemented in an existing hardware logic of the memory manager 404 . This is cost-effective because additional hardware logic to implement the XOR engine 407 is not necessary.
  • the DMA engine 406 can be used to transfer the matching data to the requesting application.
  • FIG. 5 shows an example data storage device including a GPU for machine learning, according to one embodiment.
  • the data storage device 500 can be an NVMeoF device that is capable of processing data and moving it to and from a host computer by itself.
  • the host computer can run an application and communicate with the data storage device 500 via the fabric, which provides a communication path between the two.
  • the data storage device 500 can include a CPU subsystem 505 , a host manager 502 that interfaces with the host computer via a host interface 501 (e.g., U.2 connector), a buffer manager 503 including a DRAM controller and a memory interface (e.g., DDR3 and DDR4), and a memory manager 504 (e.g., flash manager) including a DMA engine 506 , an XOR engine 507 , and one or more GPUs 508 a - 508 n .
  • the memory manager 504 can control access to the memory array 510 (e.g., flash memory) using the DMA engine 506 , the XOR engine 507 , and the GPUs 508 .
  • the GPUs can be configured to function as a convolution engine (CE) in an input layer, a hidden layer, and an output layer as shown in FIG. 3 .
  • the data storage device implementing the GPUs can be a consumer device or a home device that can feature a machine learning capability.
  • the present data storage device can store images in the memory array and internally run an image retrieval application in response to an image retrieval request from a host computer.
  • the storage device may store other types of data, such as text, audio, or video, among others.
  • In the following description, image data will be used as an example; however, the teachings are applicable to other data types as well.
  • the data storage device extracts features from received image data using one or more convolutional neural network (CNN) engines.
  • the CNN refers to the collection of the convolution engines contained in the data storage device 100 .
  • the memory array 130 of the data storage device 100 can contain images with searchable image descriptor index values and other values to compute Hamming distances with respect to a requested image data during an image query search and retrieval process.
  • the CNN engines can create a binary neural code (herein also referred to as a binary key) that can be used for interrogation and/or comparison against the database images stored in the data storage device.
  • the binary neural code refers to a key that is stored in the metadata of each stored image.
  • the CNN engines can provide a key of better quality.
  • the key can be created by deep learning performed elsewhere to generate a partial result. As more deep learning or image processing occurs, more refined keys can be generated, and the image retrieval process be become faster and more efficient.
  • the creation of the binary keys can use various forms, sizes, and types of the input data.
  • the preprocessor of the data storage device can convert the format of an input image to a fixed-size format (e.g., a 256×256 RGB bitmap).
  • other reformatting or normalization processes may be performed by the preprocessor to format the input data, depending on the data type, as would be recognized by one having skill in the art.
  • the preprocessed input image (or other data) is fed to the CNN engines.
  • the CNN engines process the input image in an iterative loop, extracting one or more binary codes.
  • the extracted binary codes are placed in the metadata associated with the input image (or other data) for searching and selecting matching data stored in the data storage device.
  • the search and the selection process can accept a search value, create a search signature, and compare the search signature with an existing signature.
  • the search process can calculate Hamming distances, and select matching data that has a Hamming distance less than a threshold value.
  • the binary search and selection algorithm based on a k-nearest neighbor search in a Hamming space is well known in the art, for example, from the article "Fast Exact Search in Hamming Space with Multi-Index Hashing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, June 2014, pp. 1107-1119.
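  • The core of this search-and-select step can be sketched as follows, assuming NumPy and the packed binary codes from the thresholding sketch above: the Hamming distance between the query code and each stored code is computed with an XOR followed by a population count, and entries under the threshold are selected.

```python
import numpy as np

# 256-entry lookup table of per-byte population counts.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_distance(a, b):
    """Hamming distance between two packed uint8 binary codes."""
    return int(POPCOUNT[np.bitwise_xor(a, b)].sum())

def select_matches(query_code, stored_codes, threshold):
    """Indices of stored codes within the Hamming-distance threshold."""
    return [i for i, code in enumerate(stored_codes)
            if hamming_distance(query_code, code) < threshold]
```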
  • FIGS. 6A and 6B show a flowchart for an example image search query and retrieval process, according to one embodiment.
  • An application 600 submits a sample image to a data storage device.
  • the application 600 submits a search request to the data storage device by placing an image and associated query information in a submission queue (SQ).
  • a submission queue can include several search requests submitted from the application 600 , and the application 600 can submit several submission queues to the data storage device.
  • the submission queues can be submitted to the data storage device in various ways. For example, the submission queues can be consumed by the data storage device on a predetermined interval or on a first-come first-serve basis. In another example, the data storage device can be notified that the submission queues are ready for serving via a message from the application 600 , and the data storage device can serve the submission queues one at a time. Depending on the urgency or priority, submission queues can be reordered, and search requests in a submission queue can be reordered.
  • a submission queue from the application 600 can include a command associated with a request contained in the submission queue.
  • the data storage device is an NVMeoF device, and the NVMeoF device determines that the command is an NVMeoF command for an image search query ( 601 ).
  • the NVMeoF device can receive various NVMeoF commands from the application, and the NVMeoF device can determine an image search query from the received NVMeoF commands for further processing.
  • the data storage device can arbitrate among the submission queues received from the application 600 ( 602 ). In one embodiment, the arbitration and the subsequent selection of an associated command can be performed by a central control unit 111 shown in FIG. 1 .
  • the arbitration by the data storage device determines a submission queue to extract from the submission queues. Once a submission queue is extracted by arbitration, the data storage device determines a proper I/O request command to process the request contained in the submission queue ( 603 ). If the I/O request command to process the submission queue is an image search query command ( 604 ), the data storage device starts a preprocessing process 606 to retrieve images that match the image search request.
  • the preprocessing process 606 is performed by an integrated preprocessor, for example, the preprocessor 112 shown in FIG. 1 . If the I/O request command to process the submission queue is not an image search query command, for example, a read/write request to the data stored in the data storage device, the data storage device treats the request associated with the submission queue as a normal request and processes it normally ( 605 ).
  • the data storage device preprocesses the image search request to reduce the size of the image received in the submission queue to a normalized format in preparation for processing. For example, the data storage device converts the image from a YUV color format to an RGB color format and further scales the converted image to a 256×256-pixel image ( 607 ). The data storage device can further convert this intermediate image to a monochromatic bitmap image and sharpen edges to generate a normalized image ( 608 ). The data storage device places the normalized image and associated search criteria into an input buffer ( 609 ) to start a feature extraction process 610 .
  • the feature extraction process 610 starts with an activation of a CNN engine that is internal to the data storage device ( 611 ).
  • the CNN engine corresponds to the convolution engine shown in FIG. 1 .
  • the CNN engine submits the normalized image to one or more procedural reduction layers, herein referred to as CNN layers, to extract specific features from the normalized image.
  • the CNN engine computes a neural descriptor ( 613 ), compresses the neural descriptor ( 614 ), and stores the result in an output buffer ( 615 ).
  • These processes 613 - 615 iteratively continue for all CNN layers.
  • the data storage device (e.g., the processing unit 120 of FIG. 1 or the CNN engine) then performs a search process 616 .
  • the search process 616 starts with fetching an image from the dataset stored in the data storage device ( 618 ).
  • the search process 616 parses metadata from the fetched image ( 619 ) and extracts stored features ( 620 ).
  • the stored features for the fetched image can be partial features extracted by the extraction process 610 .
  • the search process 616 utilizes the extracted features in the feature extraction process 610 as key values in combination with the query's associated search criteria.
  • the data storage device can successively examine candidate database images from the dataset stored in the memory array of the data storage device, compare the stored features (e.g., keys) of the candidate database images with the extracted features of the image data in the feature extraction process 610 based on the search criteria ( 621 ) and calculate Hamming distances to determine a closeness of match for each of the candidate database images ( 622 ).
  • the calculated Hamming distances are stored in an output buffer ( 623 ) for a selection process 624 .
  • These processes 618 - 623 repeat to generate a list of candidate query responses that the data storage device can algorithmically examine using Hamming distances as closeness of match.
  • the search process 616 can process search queries in various ways depending on various parameters including, but not limited to, the size of the dataset and a number of nearest matches to return.
  • the search process 616 searches binary codes.
  • the search process 616 uses a search query's K nearest neighbors (K being the number of nearest neighbors) within a Hamming distance as a binary code similarity measure.
  • Binary codes are not necessarily distributed uniformly over a Hamming space. Therefore, the search process 616 may not be able to set a fixed Hamming radius that ensures finding the K matching data.
  • the maximum Hamming radius used in the search process 616 may depend on the K data, the length of the binary code, and the image query. Generally, the longer the binary code, the larger the maximum radius.
  • the search process 616 can employ several methods to ensure finding the K data, where:
  • Q is the length of a binary code;
  • S is a substring of the binary code;
  • R is the Hamming radius;
  • N is the size of the dataset; and
  • K is the number of nearest neighbors to search for.
  • the search process 616 applies a parallel linear scan with or without re-ranking.
  • the search process 616 compares all binary codes until K neighbors within the Hamming radius R are found.
  • an optimal Hamming radius R can be obtained by: [R - log10(K)]/Q ≈ 0.1.
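  • A sketch of the linear scan under the radius relation above. Note that the relation is reconstructed here as (R - log10(K))/Q ≈ 0.1, i.e., R ≈ 0.1·Q + log10(K), since the original rendering was garbled; that reading is an assumption. For 64-bit codes and K = 100, this gives R ≈ 8.

```python
import math

def optimal_radius(q_bits, k):
    # Assumed reading of the relation: (R - log10(K)) / Q ~= 0.1.
    return round(0.1 * q_bits + math.log10(k))

def linear_scan(query, codes, k, q_bits, dist):
    """Scan all codes; keep up to k neighbors within radius R, re-ranked."""
    r = optimal_radius(q_bits, k)
    hits = ((dist(query, c), i) for i, c in enumerate(codes))
    return sorted(h for h in hits if h[0] <= r)[:k]

# e.g., optimal_radius(64, 100) == 8  (0.1 * 64 + 2, rounded)
```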
  • the search process 616 uses a single hash table. If a short binary code length (e.g., 16 to 32 bits) is used, and if the dataset is large (e.g., larger than 10^9 images), the search process 616 can use a single hash table for binary code indexing. The search process 616 searches for the entries that have the same binary code or are within a Hamming radius of 1. In this case, many entries might be empty.
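  • The single-hash-table scheme can be sketched as follows: every stored image is indexed by its exact (short) binary code, and a query probes its own code plus every code at Hamming distance 1, i.e., with one bit flipped. The table layout and bit width are assumptions.

```python
def build_table(codes):
    """Index dataset entries by their exact binary code (a small int)."""
    table = {}
    for idx, code in enumerate(codes):
        table.setdefault(code, []).append(idx)
    return table

def probe_radius_1(table, query, bits=32):
    """Entries whose code equals the query or differs in exactly one bit."""
    hits = list(table.get(query, []))
    for b in range(bits):
        hits.extend(table.get(query ^ (1 << b), []))
    return hits
```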
  • the search process 616 uses a multi-index hash table.
  • the search process 616 can create multiple small hash tables and index them based on binary code substrings.
  • the search process 616 divides the Q bits of the binary code into M disjoint substrings.
  • For example, suppose the dataset includes approximately 2 million images, a 64-bit binary code is used, and the search process 616 searches for 100 nearest neighbors.
  • the search process 616 divides the binary code into 21-bit (log2(2M) ≈ 21) substrings, so 3 hash tables are required.
  • the query binary code is divided into 3 pieces, and the search process 616 searches within a Hamming radius of 3 in each hash table.
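  • The worked example above can be sketched as follows: the 64-bit code is split into M = 3 substrings of roughly 21-22 bits, each indexed in its own hash table. By pigeonhole, a code within a small full-code radius of the query must be close to it in at least one substring, so probing each table yields a candidate set to verify with the full Hamming distance. For brevity, only the exact-substring probe is shown; enumerating all substrings within radius 3 follows the same pattern. The splitting details are assumptions.

```python
def split_code(code, q_bits=64, m=3):
    """Split a q_bits-wide code into m disjoint substrings."""
    width = q_bits // m + (1 if q_bits % m else 0)   # ~22 bits for 64/3
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_multi_index(codes, q_bits=64, m=3):
    """One hash table per substring position."""
    tables = [{} for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split_code(code, q_bits, m)):
            table.setdefault(sub, []).append(idx)
    return tables

def candidates(tables, query, q_bits=64, m=3):
    """Union of entries matching the query in at least one substring."""
    found = set()
    for table, sub in zip(tables, split_code(query, q_bits, m)):
        found.update(table.get(sub, []))
    return found
```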
  • the data storage system can include a plurality of data storage devices.
  • a front-end data storage device can act like a cache and keep the hash tables. Based on the returned data from searching the hash tables, the search process 616 can access the main data storage for retrieving images that match the image search request.
  • After the data storage device examines all the stored candidate images ( 617 ), a selection process 624 starts.
  • the selection process 624 compares the computed Hamming distances for the candidate images to a threshold value ( 626 ).
  • the threshold value used in the selection process 624 can be an arbitrary constant or provided with the image search query. If the Hamming distance is shorter than the threshold ( 627 ), the image is selected ( 628 ), and the selected image is stored in the output buffer ( 629 ).
  • These processes 625 - 629 repeat to create a list of selected images, and the data storage device returns the list to the requesting application 600 .
  • a data storage device includes a memory array for storing data; a host interface for providing an interface with a host computer running an application; a central control unit configured to receive a command in a submission queue from the application and initiate a search process in response to a search query command; a preprocessor configured to reformat data contained in the search query command based on a type of the data and generate a reformatted data; and one or more data processing units configured to extract one or more features from the reformatted data and perform a data operation on the data stored in the memory array in response to the search query command and return matching data from the data stored in the memory array to the application via the host interface.
  • the data storage device may be a non-volatile memory express over fabrics (NVMeoF) device.
  • the search query command may be an NVMeoF command received over an Ethernet connection.
  • the search query command may include an image data, and the preprocessor may be configured to convert the image data into an RGB format.
  • the one or more data processing units may be configured to generate a plurality of binary codes corresponding to the one or more extracted features, and store the one or more extracted features in an input buffer.
  • Each of the one or more data processing units may include one or more convolution engines (CEs).
  • Each of the one or more convolution engines may be configured to extract the one or more features in a convolutional neural network (CNN) layer.
  • Each of the one or more convolution engines may be configured to compute a neural descriptor for each convolutional neural network (CNN) layer, compress the neural descriptor, and store the one or more extracted features in an output buffer.
  • the one or more data processing units may further be configured to extract stored features for the data stored in the memory array and compare the stored features with the extracted features of the reformatted data.
  • the one or more data processing units may further be configured to calculate a Hamming distance for each of the data stored in the memory array.
  • the one or more data processing units may further be configured to select the matching data from the data stored in the memory array based on the Hamming distance, store the matching data in the output buffer, and transmit the matching data to the application over the host interface.
  • the data storage device may further include a memory manager including one or more graphics processing units (GPUs) configured to perform machine learning on the data stored in the memory array.
  • a method for operating a data storage device includes: receiving a command in a submission queue from an application running on a host computer; initiating a search process in response to a search query command; generating a reformatted data by changing a format of data contained in the search query command based on a type of the data; extracting one or more features from the reformatted data; performing a data operation on data stored in a memory array of the data storage device in response to the search query command; and returning matching data from the data stored in the memory array to the application via a host interface established between the host computer and the data storage device.
  • the data storage device may be a non-volatile memory express over fabrics (NVMeoF) device.
  • the search query command may be an NVMeoF command received over an Ethernet connection.
  • the method may further include converting an image data included in the search query command into an RGB format.
  • the method may further include: generating a plurality of binary codes corresponding to the one or more extracted features; and storing the one or more extracted features in an input buffer.
  • the features may be extracted using one or more convolution engines (CEs) of the data storage device.
  • Each of the one or more convolution engines may extract the one or more features in a convolutional neural network (CNN) layer.
  • Each of the one or more convolution engines may be configured to compute a neural descriptor for each convolutional neural network (CNN) layer, compress the neural descriptor, and store the one or more extracted features in an output buffer.
  • the method may further include: extracting stored features for the data stored in the memory array; and comparing the stored features with the extracted features of the reformatted data.
  • the method may further include: calculating a Hamming distance for each of the data stored in the memory array.
  • the method may further include: selecting the matching data from the data stored in the memory array based on the Hamming distance; storing the matching data in the output buffer; and transmitting the matching data to the application over the host interface.
  • the data storage device may further include a memory manager including one or more graphics processing units (GPUs) configured to perform machine learning on the data stored in the memory array.
  • the resulting method, process, apparatus, device, product, and/or system is straightforward, cost-effective, uncomplicated, highly versatile, accurate, sensitive, and effective, and can be implemented by adapting known components for ready, efficient, and economical manufacturing, application, and utilization.
  • Another important aspect of an embodiment of the present disclosure is that it valuably supports and services the historical trend of reducing costs, simplifying systems, and increasing performance.

Abstract

A data storage device includes a memory array for storing data; a host interface for providing an interface with a host computer running an application; a central control unit configured to receive a command in a submission queue from the application and initiate a search process in response to a search query command; a preprocessor configured to reformat data contained in the search query command and generate a reformatted data; and one or more data processing units configured to extract one or more features from the reformatted data and perform a data operation on the data stored in the memory array in response to the search query command and return matching data from the data stored in the memory array to the application via the host interface.

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)
This application claims the benefit of and priority to U.S. Provisional Patent Application Ser. No. 62/441,073 filed Dec. 30, 2016, the disclosure of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates generally to data storage devices, and more particularly to data storage devices that are capable of performing data matching and machine learning on the data stored thereon.
BACKGROUND
Non-volatile memory express (NVMe) over Fabrics (NVMeoF) is a new industry standard. NVMeoF defines a common architecture that supports an NVMe block storage protocol over a wide range of storage networking fabrics such as Ethernet, Fibre Channel, InfiniBand, and other network fabrics. For an NVMeoF-based system, an X86-based central processing unit (CPU) on a motherboard is no longer required to move data between an initiator (e.g., host software) and a target device (i.e., an NVMeoF device) because the target device is capable of moving data by itself. The term "fabric" represents a network topology in which network nodes can pass data to each other through a variety of interconnecting protocols, ports, and switches. For example, Ethernet-attached SSDs may attach directly to a fabric, and in this case the fabric is the Ethernet.
The standard form factor of NVMeoF devices is the same as or very similar to that of a standard solid-state drive (SSD) or hard disk drive (HDD) to enable quick and easy deployment in existing rack systems in an enterprise or a datacenter. The NVMeoF devices provide a high-capacity, low-latency data storage and operating environment for enterprise or datacenter applications.
The NVMeoF devices are not optimized for data-centric applications such as machine learning and data mining applications. Currently, NVMeoF devices including fabric-attached SSDs (eSSDs) merely respond to a request by an application running on a host computer and provide data requested by the application or perform only basic operations on the data stored thereon. Most of the data matching or machine learning capabilities are performed by CPUs and/or graphics processing units (GPUs) on a host computer that are external to the NVMeoF devices.
SUMMARY
According to one embodiment, a data storage device includes a memory array for storing data; a host interface for providing an interface with a host computer running an application; a central control unit configured to receive a command in a submission queue from the application and initiate a search process in response to a search query command; a preprocessor configured to reformat data contained in the search query command based on a type of the data and generate a reformatted data; and one or more data processing units configured to extract one or more features from the reformatted data and perform a data operation on the data stored in the memory array in response to the search query command and return matching data from the data stored in the memory array to the application via the host interface.
A method for operating a data storage device includes: receiving a command in a submission queue from an application running on a host computer; initiating a search process in response to a search query command; generating a reformatted data by changing a format of data contained in the search query command based on a type of the data; extracting one or more features from the reformatted data; performing a data operation on data stored in a memory array of the data storage device in response to the search query command; and returning matching data from the data stored in the memory array to the application via a host interface established between the host computer and the data storage device.
The above and other preferred features, including various novel details of implementation and combination of events, will now be more particularly described with reference to the accompanying figures and pointed out in the claims. It will be understood that the particular systems and methods described herein are shown by way of illustration only and not as limitations. As will be understood by those skilled in the art, the principles and features described herein may be employed in various and numerous embodiments without departing from the scope of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included as part of the present specification, illustrate the presently preferred embodiment and together with the general description given above and the detailed description of the preferred embodiment given below serve to explain and teach the principles described herein.
FIG. 1 shows a block diagram of an example data storage device, according to one embodiment;
FIG. 2 is a block diagram illustrating a behavioral view of an example neural code accelerator, according to one embodiment;
FIG. 3 shows an example GPU configured to implement one or more convolution engines (CEs), according to one embodiment;
FIG. 4 shows an example data storage device including an XOR engine, according to one embodiment;
FIG. 5 shows an example data storage device including a GPU for machine learning, according to one embodiment; and
FIGS. 6A and 6B show a flowchart for an example image search query and retrieval process, according to one embodiment.
The figures are not necessarily drawn to scale and elements of similar structures or functions are generally represented by like reference numerals for illustrative purposes throughout the figures. The figures are only intended to facilitate the description of the various embodiments described herein. The figures do not describe every aspect of the teachings disclosed herein and do not limit the scope of the claims.
DETAILED DESCRIPTION
In an embodiment of the present disclosure, a data storage device capable of data matching and machine learning is disclosed. Machine learning can include algorithms that can learn from data including artificial intelligence, getting computers to act without being explicitly programmed, automated reasoning, automated adaptation, automated decision making, automated learning, the ability for a computer to learn without being explicitly programmed, artificial intelligence (AI), or combination thereof. Machine learning can be considered a type of artificial intelligence (AI). Machine learning can include classification, regression, feature learning, online learning, unsupervised learning, supervised learning, clustering, dimensionality reduction, structured prediction, anomaly detection, neural nets, or combination thereof.
In an embodiment of the present disclosure, a learning system can include machine learning systems that can process or analyze “big data.” Parallel or distributed storage devices with in-storage-computing (ISC) can accelerate big data machine learning and analytics. The parallel or distributed learning system can offload functions to ISC for additional bandwidth and reduce input and output (I/O) for the storage and host processor. This parallel or distributed learning system can provide machine learning with ISC.
In an embodiment of the present disclosure, a parallel or distributed learning system can be implemented with in-storage-computing (ISC), a scheduler, or combination thereof. ISC can provide significant improvements in the learning system including parallel or distributed learning. ISC can provide another processor for machine learning, an accelerator for assisting a host central processing unit, or combination thereof, such as preprocessing at an ISC to relieve a bandwidth bottleneck once detected. The scheduler can intelligently assign data, tasks, functions, operations, or combination thereof.
The following embodiments are described in sufficient detail to enable those skilled in the art to make and use the present disclosure. It is to be understood that other embodiments would be evident based on the present disclosure, and that system, process, or mechanical changes may be made without departing from the scope of an embodiment of the present disclosure.
In the following description, numerous specific details are given to provide a thorough understanding of the present disclosure. However, it will be apparent that the present disclosure may be practiced without these specific details. In order to avoid obscuring an embodiment of the present disclosure, some well-known circuits, system configurations, and process steps are not disclosed in detail.
The drawings showing embodiments of the system are semi-diagrammatic, and not to scale and, particularly, some of the dimensions are for the clarity of presentation and are shown exaggerated in the drawing figures. Similarly, although the views in the drawings for ease of description generally show similar orientations, this depiction in the figures is arbitrary for the most part. Generally, the present disclosure can be operated in any orientation. The embodiments have been numbered first embodiment, second embodiment, etc. as a matter of descriptive convenience and are not intended to have any other significance or provide limitations for an embodiment of the present disclosure.
According to one embodiment, a host computer and one or more data storage devices can collectively perform data matching or machine learning operations. Depending on the datasets and the algorithms that are employed, the data matching or machine learning operations can be partitioned into host operations and device operations. The partitioning into the host operations and the device operations can depend on the optimization of a computational time and power efficiency for operating on a specific usage model. If a specific part of the data matching or machine learning operations performed by a subsystem (either the host computer or the data storage device) can result in a faster and more efficient execution, that specific part of the operations can be partitioned into the corresponding subsystem.
For example, in a facial recognition operation, the dataset of trained faces may be stored in a data storage device. The dataset of trained faces may include binary codes or feature vectors extracted from the trained face images. For training of a new face, the entire or a part of the newly trained facial dataset or data model can be copied from the data storage device to a memory of the host computer. The host computer can perform the new facial training operation using the dataset copied to the host computer's memory. That is, the data storage device may receive the data of the new face and send corresponding neural binary codes or feature vectors to facilitate the new facial training operation performed by the host computer. Once a new facial recognition model is completed, the host computer can keep the newly trained facial recognition model in the host computer's memory for additional training or copy the newly trained facial recognition model back to the data storage device to update the dataset of trained faces. This process can repeat for each newly received facial dataset, training a new model based on that facial dataset.
According to one embodiment, the host computer can perform the data matching or machine learning operations in a framework that supports coordination with the data storage device that stores the dataset. The performance of the framework can be highly dependent on the usage model and deployment parameters, such as a size of the images, a number of training iterations, a size of the dataset, a training algorithm, a floating-point performance, etc. For example, in the case of facial recognition, the size of the dataset of trained faces may get larger over time. Because the memory of the host computer is costly, the data storage device can partially or fully perform the facial recognition operation instead of copying the dataset stored in the data storage device to the memory of the host computer to perform the data matching or machine learning operations in the host computer.
According to one embodiment, the present disclosure provides a data storage device that can internally perform data matching or machine learning operations. The data storage device can be any of a solid-state drive (SSD), a hard disk drive (HDD), an NVMe device that is compatible with the NVMe standard, an NVMeoF device that is compatible with the NVMeoF standard, or any other fabric-attached SSD (eSSD). It is noted that any other type of device that can store data and perform data matching or machine learning can be used without deviating from the scope of the present disclosure.
FIG. 1 shows a block diagram of an example data storage device, according to one embodiment. The data storage device 100 includes a central control unit (CCU) 111, a preprocessor 112, an embedded DRAM 113, a signature thresholding engine 114, a direct memory access (DMA) engine 115, a controller 116, an input buffer 117, a weight buffer 118, an output buffer 119, one or more processing units 120, and a memory array 130. Various images, text, video, audio, or other data can be stored in the memory array 130. Although the memory array 130 is shown to be local to the data storage device 100, it is noted that the memory array 130 can be remotely connected to the data storage device 100 via fabrics such as Ethernet. For example, the memory array 130 can be a flash array that may reside in another NVMeoF device. In particular, according to the NVMeoF standard, the placement and physical attachment of the memory array 130 is not a physical limitation because the data stored in the memory array 130 can be accessed by any host computer using the NVMeoF protocols. In this way, the controller 116 and the one or more data processing units 120 of one data storage device can operate on data stored in its own memory array 130 or that of another data storage device over the fabrics.
According to one embodiment, the data storage device 100 can be integrated circuits, integrated circuit cores, integrated circuit components, microelectromechanical system (MEMS), passive devices, or a combination thereof having a form factor compatible with the NVMe and/or NVMeoF standards. However, it is noted that various form factors of the data storage device 100 can be used without deviating from the scope of the present disclosure.
According to one embodiment, the data storage device 100 is an NVMeoF device, and the connection between a host computer (not shown) and the fabric attached to the NVMeoF device is an Ethernet connection. In this case, the host computer can send NVMeoF commands directly to the NVMeoF device over the Ethernet connection. It is noted that various other fabrics such as Fibre Channel, InfiniBand, and other network fabrics can be used to establish the communication between the data storage device 100 and the host computer.
The data storage device 100 can receive a command 150 from an application running on the host computer. According to one embodiment, the command 150 can be a vendor-specific fabric command (e.g., an NVMeoF command). The command 150 can be a normal read/write operation command, an image search inquiry command, or a machine learning command to operate on the data stored in the memory array 130. The command 150 can be received in a submission queue (SQ). One submission queue can include several commands 150. In some embodiments, a single submission queue can include the same or similar type of commands 150, for example, read/write operation commands. Similar or same commands 150 can be sorted by the application and packaged in different submission queues for efficient delivery and processing of the commands 150.
The controller 116 is configured to perform various data operations including data matching and/or machine learning operations on the data stored in the memory array 130. For example, the controller 116 can run a state machine or perform data matching operations in conjunction with the CCU 111. The data storage device 100 can internally perform the data operations with no or minimal interaction with the application or with the host computer. In this case, the latency to complete the requested operation can be improved with less power consumed due to less data movement between the host and the data storage device. When the requested data operation is completed, the data storage device 100 provides the matching data 151 to the application running on the host computer.
According to one embodiment, the CCU 111 can decode the command 150 received from the host computer and generate one or more neural binary codes for internal and external consumption. For example, in response to an image search query command including an image data, the CCU 111 initializes the preprocessor 112 to operate on the received image data. In some embodiments, the data storage device 100 can receive only the command 150 from the application to perform a data operation on the dataset that is stored in the memory array 130 instead of receiving both the command and the dataset from the host computer. Examples of such data include, but are not limited to, image data, text data, video data, and audio data. For example, for image data, the preprocessor 112 can convert the format of the image data and create a fixed-size RGB format data. The converted image data in the RGB format may further be scaled up or down to facilitate the extraction of various features from the image data. The analysis-ready fixed-size image data are saved in the DRAM 113 for the data operation as instructed by the command 150.
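For illustration only, the following is a minimal Python sketch (not part of the original disclosure) of the kind of normalization the preprocessor 112 may perform on image data, assuming the Pillow library and a hypothetical 256×256 target size:

```python
import io
from PIL import Image

def preprocess_image(raw_bytes: bytes, size=(256, 256)) -> Image.Image:
    """Convert arbitrary input image data to a fixed-size RGB bitmap."""
    img = Image.open(io.BytesIO(raw_bytes))
    img = img.convert("RGB")   # normalize the color format
    return img.resize(size)    # scale up or down to the fixed dimension
```

The resulting analysis-ready image would then be buffered (e.g., in the DRAM 113) for feature extraction.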
According to one embodiment, the data processing unit 120 is a neural code accelerator. Several data processing units 120 may be required to process each of the received image data. For example, if ten images are received from the application in the command 150, a total of ten data processing units 120 can be invoked by the CCU 111. The number of invoked data processing units 120 may not necessarily match the number of received image data. Depending on the current workload and the availability of the data processing units 120, the CCU 111 can invoke a certain number of data processing units 120. In some embodiments, the data processing can be divided, grouped, or performed in parallel or in series depending on the workload and the availability of the data processing units 120.
According to one embodiment, each of the data processing units 120 can incorporate one or more convolution engines (CEs). The image data (e.g., a facial image) received from the application are input to the data processing units 120 in batches, and each of the convolution engines can extract feature sets for each dataset that is grouped in batches. The feature sets that are extracted in parallel can be connected based on their weights. During the reconnection of the feature sets, the convolution weight parameters for each feature set can be loaded from the DRAM 113 via the weight buffer 118.
The data processing unit 120 can also have adder trees, optional pooling, and a rectified linear unit 121 to connect the feature sets and compute the fully-connected neural layers. Using the fully-connected neural layers, herein also referred to as convolution neural network (CNN) layers, the data processing unit 120 can generate a feature vector 152 and send the feature vector 152 to the DRAM 113 via the DMA 115. The feature vector 152 can be converted to binary vectors and saved to the memory array 130. The feature vector 152 can be fetched by another processing unit (e.g., the signature thresholding engine 114) for compressing the feature vector 152 and comparing the extracted features (e.g., binary codes) with the saved features for the database images stored in the memory array 130. For hierarchical data retrieval, both binary and actual feature vectors can be saved to the memory array 130.
According to one embodiment, the rectified linear unit 121 implemented in each data processing engine 120 can provide a rectifier by employing an activation function for given inputs received from the associated convolution engines. For example, the rectified linear unit 121 is applicable to computer vision using deep neural nets.
FIG. 2 is a block diagram illustrating a behavioral view of an example neural code accelerator, according to one embodiment. The neural code accelerator 200 includes convolution and pooling filters 203, a fully connected layer 204, and a signature thresholding unit 205. The neural code accelerator 200 can optionally have a principal component analysis (PCA) layer 206. The neural code accelerator 200 may be the data processing unit 120 shown in FIG. 1.
The neural code accelerator 200 can receive an input data 201 from a data buffer (e.g., a submission queue) from a host computer or a central control unit (e.g., CCU 111 shown in FIG. 1). A preprocessor 202 (e.g., preprocessor 112 shown in FIG. 1) can receive the input data 201. The input data 201 can be any type of data including, but not limited to, text, image, video, and audio data. According to some embodiments, the preprocessor 202 can be a part of or integrated into the neural code accelerator 200. For image data, the preprocessor 202 can perform initial processing of the input data 201 to convert it to a raw RGB format and scale the image up or down to a fixed dimension. The convolution and pooling filters 203 can perform data processing on the converted and/or scaled data with a set of convolution filters. The output from the convolution and pooling filters 203 can be one or more features 207. The features 207 are fed to the fully connected layer 204. The fully connected layer 204 can generate a feature vector 208 based on the features 207 and feed the feature vector 208 to the signature thresholding unit 205 and optionally to the PCA layer 206. The signature thresholding unit 205 can generate one or more neural binary codes 210. For example, for an image data that is input to the neural code accelerator 200, the fully connected layer 204 can generate the feature vector 208, and the signature thresholding unit 205 can generate the neural binary codes 210 by finalizing activations of the feature vector 208 based on a predetermined threshold. The threshold may be fine-tuned by a user, training, or machine learning. The PCA layer 206 can condense the output feature vector 208 to generate compressed feature vectors 211.
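As a purely illustrative sketch (not part of the original disclosure), the signature thresholding step can be modeled in Python as binarizing the activations of a feature vector against a threshold; the 0.5 threshold and 64-element vector below are placeholder assumptions:

```python
import numpy as np

def signature_threshold(feature_vector, threshold=0.5) -> bytes:
    """Binarize feature-vector activations into a compact binary code."""
    bits = (np.asarray(feature_vector) > threshold).astype(np.uint8)
    return np.packbits(bits).tobytes()  # e.g., 64 activations -> 8 bytes

feature_vector = np.random.rand(64)   # stand-in for a fully-connected output
binary_code = signature_threshold(feature_vector)
```

A longer feature vector yields a longer binary code, and the threshold can be tuned by a user or by training, as noted above.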
According to one embodiment, the convolution engines can internally perform data matching and/or deep learning operations inside a data storage device. For example, the data storage device can have a plurality of GPUs, and each of the GPUs can include one or more convolution engines that are grouped in an input layer, a hidden layer, and an output layer.
FIG. 3 shows an example GPU configured to implement one or more convolution engines (CEs), according to one embodiment. The GPU 300 can run one or more CEs that are pre-programmed with fixed algorithms such as K-means or regression. A first group of the CEs is implemented as an input layer 301, a second group of the CEs is implemented as a hidden layer 302, and a third group of the CEs is implemented as an output layer 303. The data paths from the input layer 301 to the output layer 303 through the hidden layer 302 provide a forward or inference path. The data paths from the output layer 303 to the input layer 301 through the hidden layer 302 provide a backward or training path. The application can directly utilize the GPU 300 without having to download a specific algorithm because the CEs implemented in the GPU 300 are preprogrammed with algorithms such as K-means or regression that are applicable to a variety of analysis and deep learning. The data storage device implementing the GPU 300 can be a consumer device or a home device that can feature a machine learning capability.
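As a conceptual illustration only (not from the original disclosure), the forward or inference path through the three groups of CEs can be sketched in Python as successive layer applications; all weights, sizes, and names are hypothetical:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0)

def forward(x, w_in, w_hidden, w_out):
    """Inference path: input layer -> hidden layer -> output layer."""
    h1 = relu(x @ w_in)        # input-layer CEs
    h2 = relu(h1 @ w_hidden)   # hidden-layer CEs
    return h2 @ w_out          # output-layer CEs

# Toy usage with random placeholder weights:
x = np.random.rand(16)
y = forward(x, np.random.rand(16, 8), np.random.rand(8, 8), np.random.rand(8, 4))
```

The training path would traverse these layers in reverse, propagating gradients from the output layer back to the input layer.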
According to one embodiment, the data storage device incorporates additional data matching logic (e.g., XOR) and DMA engines. The data storage device can perform data matching in real time or as a background task. The application can provide one or more parameters for a matching data (e.g., raw binary values) to the data storage device, and the data storage device can internally execute and complete the pattern matching for the data, and return the matching data stored in the memory array to the application. In some embodiments, the data storage device can have one data matching (XOR) engine per bank/channel of a NAND array. For example, if the data storage device employs N independent channels/banks of the memory array (e.g., a NAND array), a total of N XOR engines can be used to match the data from each NAND channel, where N is an integer number.
FIG. 4 shows an example data storage device including an XOR engine, according to one embodiment. The data storage device 400 can be an NVMeoF device that is capable of processing and moving data itself to and from a host computer. The host computer can run an application and communicate with the storage device 400 via the fabric interface. The data storage device 400 can include a host manager 402 that interfaces with the host computer via a host interface 401 (e.g., U.2 connector), a buffer manager 403 including a DRAM controller and a memory interface (e.g., DDR3 and DDR4), a memory manager 404 (e.g., flash manager) including a DMA engine 406 and an XOR engine 407, a CPU subsystem 405, and a memory array 410 (e.g., flash memory).
The XOR engine 407 is configured to perform in-line data matching in response to a data matching request received from the application. After performing the in-line data matching operation, the data storage device 400 can provide the matching data to the requesting application via the connector 401. The XOR engine 407 may be implemented in an existing hardware logic of the memory manager 404. This is cost effective because additional hardware logic to implement the XOR engine 407 is not necessary. The DMA engine 406 can be used to transfer the matching data to the requesting application.
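For illustration only, a minimal Python sketch (not from the original disclosure) of XOR-based in-line matching: XOR-ing a search pattern against a stored block yields all-zero bytes exactly when the two are equal. All names are hypothetical:

```python
def xor_match(pattern: bytes, block: bytes) -> bool:
    """True if the pattern equals the block, tested byte-by-byte via XOR."""
    return len(pattern) == len(block) and all(
        a ^ b == 0 for a, b in zip(pattern, block)
    )

# With one XOR engine per NAND channel, each channel's blocks could be
# scanned independently; here a simple list stands in for the channels:
channels = [b"hello", b"world", b"hello"]
hits = [i for i, blk in enumerate(channels) if xor_match(b"hello", blk)]  # [0, 2]
```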
FIG. 5 shows an example data storage device including a GPU for machine learning, according to one embodiment. The data storage device 500 can be an NVMeoF device that is capable of processing and moving data itself to and from a host computer. The host computer can run an application and communicate with the data storage device 500 via the fabrics that provide a communication path. The data storage device 500 can include a CPU subsystem 505, a host manager 502 that interfaces with the host computer via a host interface 501 (e.g., U.2 connector), a buffer manager 503 including a DRAM controller and a memory interface (e.g., DDR3 and DDR4), and a memory manager 504 (e.g., flash manager) including a DMA engine 506, an XOR engine 507, and one or more GPUs 508 a-508 n. The memory manager 504 can control access to the memory array 510 (e.g., flash memory) using the DMA engine 506, the XOR engine 507, and the GPUs 508. The GPUs can be configured to function as convolution engines (CEs) in an input layer, a hidden layer, and an output layer as shown in FIG. 3. The data storage device implementing the GPUs can be a consumer device or a home device that can feature a machine learning capability.
According to one embodiment, the present data storage device can store images in the memory array and internally run an image retrieval application in response to an image retrieval request from a host computer. In other embodiments, the storage device may store other types of data, such as text, audio, or video, among others. Herein, the case of image data will be used as an example. One skilled in the art will recognize that the teachings are applicable to other data types as well.
According to one embodiment, the data storage device extracts features from a received image data using one or more convolution neural network (CNN) engines. Referring to FIG. 1, the CNN refers to the collection of the convolution engines contained in the data storage device 100. The memory array 130 of the data storage device 100 can contain images with searchable image descriptor index values and other values to compute Hamming distances with respect to a requested image data during an image query search and retrieval process.
The CNN engines can create a binary neural code (herein also referred to as a binary key) that can be used for interrogation and/or comparison against the database images stored in the data storage device. In one embodiment, the binary neural code refers to a key that is stored in a metadata of each stored image data. The CNN engines can provide a key of better quality. In one embodiment, the key can be created by deep learning performed elsewhere to generate a partial result. As more deep learning or image processing occurs, more refined keys can be generated, and the image retrieval process can become faster and more efficient.
According to one embodiment, the creation of the binary keys can use various forms, sizes, and types of the input data. For example, the preprocessor of the data storage device can convert the format of an input image to a fixed-size format (e.g., RGB 256×256 bitmap). For other types of data (such as text, audio, or video data), other reformatting or normalization processes may be performed by the preprocessor to format the input data, depending on the data type, as would be recognized by one having skill in the art. The preprocessed input image (or other data) is fed to the CNN engines. The CNN engines process the input image iteratively, extracting one or more binary codes. The extracted binary codes are placed in the metadata associated with the input image (or other data) for searching and selecting matching data stored in the data storage device.
According to one embodiment, the search and the selection process can accept a search value, create a search signature, and compare the search signature with an existing signature. In one embodiment, the search process can calculate Hamming distances and select matching data that has a Hamming distance less than a threshold value. The binary search and selection algorithm based on a k-nearest neighbor search in a Hamming space is well known in the art, for example, in the article entitled "Fast Exact Search in Hamming Space with Multi-Index Hashing," IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 36, Issue 6, June 2014, pages 1107-1119.
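As an illustrative sketch only (not from the original disclosure), the compare-and-select step can be expressed in Python as a popcount of the XOR of two binary signatures followed by a threshold test; the toy 8-bit codes are placeholders:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits, i.e., the popcount of a XOR b."""
    return bin(a ^ b).count("1")

def select_matches(query_code: int, stored_codes, threshold: int):
    """Indices of stored signatures within the Hamming threshold."""
    return [i for i, code in enumerate(stored_codes)
            if hamming_distance(query_code, code) < threshold]

stored = [0b10110010, 0b10110011, 0b01001100]
hits = select_matches(0b10110010, stored, threshold=2)  # -> [0, 1]
```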
FIGS. 6A and 6B show a flowchart for an example image search query and retrieval process, according to one embodiment. An application 600 submits a sample image to a data storage device. The application 600 submits a search request to the data storage device by placing an image and associated query information in a submission queue (SQ).
A submission queue can include several search requests submitted from the application 600, and the application 600 can submit several submission queues to the data storage device. The submission queues can be submitted to the data storage device in various ways. For example, the submission queues can be consumed by the data storage device on a predetermined interval or on a first-come first-serve basis. In another example, the data storage device can be notified that the submission queues are ready for serving via a message from the application 600, and the data storage device can serve the submission queues one at a time. Depending on the urgency or priority, submission queues can be reordered, and search requests in a submission queue can be reordered.
A submission queue from the application 600 can include a command associated with a request contained in the submission queue. For example, the data storage device is an NVMeoF device, and the NVMeoF device determines that the command is an NVMeoF command for an image search query (601). The NVMeoF device can receive various NVMeoF commands from the application, and the NVMeoF device can determine an image search query from the received NVMeoF commands for further processing.
According to one embodiment, the data storage device can arbitrate among the submission queues received from the application 600 (602). In one embodiment, the arbitration and the subsequent selection of an associated command can be performed by the central control unit 111 shown in FIG. 1. The arbitration by the data storage device determines a submission queue to extract from the submission queues. Once a submission queue is extracted by arbitration, the data storage device determines a proper I/O request command to process the request contained in the submission queue (603). If the I/O request command to process the submission queue is an image search query command (604), the data storage device starts a preprocessing process 606 to retrieve images that match the image search request. In one embodiment, the preprocessing process 606 is performed by an integrated preprocessor, for example, the preprocessor 112 shown in FIG. 1. If the I/O request command to process the submission queue is not an image search query command, for example, a read/write request to the data stored in the data storage device, the data storage device treats the request associated with the submission queue as a normal request and processes it normally (605).
According to one embodiment, the data storage device preprocesses the image search request to reduce the size of the image received in the submission queue to a normalized format in preparation for processing. For example, the data storage device converts the image from a YUV color format to an RGB color format and further scales the converted image to a 256×256 pixel image (607). The data storage device can further convert this intermediate image to a monochromatic bitmap image and sharpen edges to generate a normalized image (608). The data storage device places the normalized image and associated search criteria into an input buffer (609) to start a feature extraction process 610.
The feature extraction process 610 starts with an activation of a CNN engine that is internal to the data storage device (611). For example, the CNN engine corresponds to the convolution engine shown in FIG. 1. The CNN engine submits the normalized image to one or more procedural reduction layers, herein referred to as CNN layers, to extract specific features from the normalized image. For each CNN layer, the CNN engine computes a neural descriptor (613), compresses the neural descriptor (614), and stores the result in an output buffer (615). These processes 613-615 iteratively continue for all CNN layers.
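For illustration only, a schematic Python sketch (not from the original disclosure) of the iterative loop over CNN layers in steps 613-615; the layer and compression callables are hypothetical stand-ins:

```python
def extract_features(normalized_image, cnn_layers, compress):
    """Run each CNN layer, compress its neural descriptor, and buffer it."""
    output_buffer = []
    activation = normalized_image
    for layer in cnn_layers:                        # repeat until done (612)
        activation = layer(activation)              # compute descriptor (613)
        output_buffer.append(compress(activation))  # compress (614), store (615)
    return output_buffer
```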
When all the CNN layers are complete (612), the data storage device (e.g., the processing unit 120 of FIG. 1 or the CNN engine) initiates a search process 616. The search process 616 starts with fetching an image from the dataset stored in the data storage device (618). After fetching the image, the search process 616 parses metadata from the fetched image (619) and extracts stored features (620). The stored features for the fetched image can be partial features extracted by the extraction process 610. The search process 616 utilizes the extracted features in the feature extraction process 610 as key values in combination with the query's associated search criteria.
The data storage device can successively examine candidate database images from the dataset stored in the memory array of the data storage device, compare the stored features (e.g., keys) of the candidate database images with the extracted features of the image data in the feature extraction process 610 based on the search criteria (621) and calculate Hamming distances to determine a closeness of match for each of the candidate database images (622). The calculated Hamming distances are stored in an output buffer (623) for a selection process 624. These processes 618-623 repeat to generate a list of candidate query responses that the data storage device can algorithmically examine using Hamming distances as closeness of match.
The search process 616 can process search queries in various ways depending on various parameters including, but not limited to, the size of the dataset and a number of nearest matches to return. According to one embodiment, the search process 616 searches binary codes. For example, the search process 616 uses a search query's K-nearest neighbors (K being a number of nearest neighbors) within a Hamming distance for a binary code similarity measure. Binary codes are not necessarily distributed uniformly over a Hamming space. Therefore, the search process 616 may not be able to set a fixed Hamming radius to ensure finding of the K number of matching data. The maximum Hamming radius used in the search process 616 may depend on the K data, the length of the binary code, and the image query. Generally, the longer the binary code is, the larger the maximum radius is.
According to one embodiment, the search process 616 can employ several methods to ensure finding of the K data. In describing the search process 616, the following terminology will be used in accordance with the definitions set out below. Q is the length of a binary code, S is the length of a substring of the binary code, R is a Hamming radius, N is the size of the dataset, and K is the number of nearest neighbors to search.
According to one embodiment, the search process 616 applies a parallel linear scan with or without re-ranking. During the parallel linear scan, the search process 616 compares all binary codes until the K number of neighbors with less than the Hamming radius R is searched. For example, an optimal Hamming radius R can be obtained by:
[R − log10(K)] / Q ≅ 0.1.
However, the search process 616 may tweak and adapt the constant 0.1 based on the dataset and the search criterion. For example, if a binary code is 64 bits long and the search process 616 searches for 10 nearest neighbors, the search process 616 can collect data with a Hamming distance up to:
R − log10(10) = 0.1 × 64 → R = 7.
In another example, if a binary code is 128 bits long and the search process 616 searches for 1000 nearest neighbors, the search process 616 can collect data with a Hamming radius up to:
R − log10(1000) = 0.1 × 128 → R = 15.
In this case, the search process 616 may or may not return the 1000 nearest neighbors because the search process 616 is greedy to find the first K neighbors. Although this process may not be efficient, it may be the simplest approach, requiring the least storage space for the binary codes and the image's address. In one embodiment, the search process 616 may employ multiple parallel XOR engines to work on different binary code chunks so as to efficiently and quickly search for the matching data.
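As a worked illustration only (not from the original disclosure), the radius heuristic above can be coded directly; floor rounding reproduces the two examples in the text:

```python
import math

def hamming_radius(q: int, k: int, c: float = 0.1) -> int:
    """Solve [R - log10(K)] / Q ~= c for R; c = 0.1 is the stated default."""
    return math.floor(c * q + math.log10(k))

hamming_radius(64, 10)     # 0.1*64 + 1  -> 7, as in the first example
hamming_radius(128, 1000)  # 0.1*128 + 3 -> 15, as in the second example
```

The constant c would be tweaked per dataset and search criterion, as noted above.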
According to one embodiment, the search process 616 uses a single hash table. If a short binary code length (e.g., 16 to 32 bits) is used, and if the dataset is large (e.g., larger than 10^9 images), the search process 616 can use a single hash table for binary code indexing. The search process 616 searches for the entries that have the same binary code and a Hamming radius of 1. In this case, many entries might be empty.
According to one embodiment, the search process 616 uses a multi-index hash table. In this case, the search process 616 can create multiple small hash tables and index them based on binary code substrings. In one embodiment, the search process 616 divides the Q bits of the binary code into M disjoint substrings. The substring length S is chosen based on the dataset size N, where:
S = log2(N) and M = Q/S.
When a search query arrives, the search process 616 checks the query binary code against all M hash tables. In each hash table, the search process 616 can check entries within the distance R′ = R/M.
For example, suppose the dataset includes approximately 2 million images, a 64-bit binary code is used, and the search process 616 searches for 100 nearest neighbors. In this case, the search process 616 divides the binary code into 21-bit substrings (log2(2M) ≈ 21), so 3 hash tables are required. The optimal R value can be obtained by R = 0.1 × 64 + log10(100) ≈ 9 bits, and R′ can be 9/3 = 3. In this case, the query binary code is divided into 3 pieces, and the search process 616 searches within a Hamming radius of 3 in each hash table.
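For illustration only, a minimal Python sketch (not from the original disclosure) of the multi-index idea: the code is split into M disjoint substrings, each indexed in its own small hash table, and a query probes every table with the reduced radius R′. For clarity, the sketch assumes M divides Q evenly:

```python
from collections import defaultdict

def split_code(code: int, q: int, m: int):
    """Split a Q-bit code into M disjoint substrings of length S = Q/M."""
    s = q // m
    bits = format(code, f"0{q}b")
    return [bits[i * s:(i + 1) * s] for i in range(m)]

def build_tables(codes, q: int, m: int):
    """One small hash table per substring position, mapping substring -> ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split_code(code, q, m)):
            table[sub].append(idx)
    return tables
```

A query within overall radius R must match at least one of its M substrings within R′ = R/M, so probing the small tables quickly narrows the candidate set before exact Hamming distances are computed.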
According to one embodiment, the data storage system can include a plurality of data storage devices. A front-end data storage device can act like a cache and keep the hash tables. Based on the returned data from searching the hash tables, the search process 616 can access the main data storage for retrieving images that match the image search request.
After the data storage device examines all the stored candidate images (617), a selection process 624 starts. The selection process 624 compares the computed Hamming distances for the candidate images to a threshold value (626). The threshold value used in the selection process 624 can be an arbitrary constant or provided with the image search query. If the Hamming distance is shorter than the threshold (627), the image is selected (628), and the selected image is stored in the output buffer (629). These processes 625-629 repeat to create a list of selected images, and the data storage device returns the list to the requesting application 600.
According to one embodiment, a data storage device includes a memory array for storing data; a host interface for providing an interface with a host computer running an application; a central control unit configured to receive a command in a submission queue from the application and initiate a search process in response to a search query command; a preprocessor configured to reformat data contained in the search query command based on a type of the data and generate a reformatted data; and one or more data processing units configured to extract one or more features from the reformatted data and perform a data operation on the data stored in the memory array in response to the search query command and return matching data from the data stored in the memory array to the application via the host interface.
The data storage device may be a non-volatile memory express over fabrics (NVMeoF) device.
The search query command may be an NVMeoF command received over an Ethernet connection.
The search query command may include an image data, and the preprocessor may be configured to convert the image data in an RGB format.
The one or more data processing units may be configured to generate a plurality of binary codes corresponding to the one or more extracted features, and store the one or more extracted features in an input buffer.
Each of the one or more data processing units may include one or more convolution engines (CEs). Each of the one or more convolution engines may be configured to extract the one or more features in a convolutional neural network (CNN) layer.
Each of the one or more convolution engines may be configured to compute a neural descriptor for each convolutional neural network (CNN) layer, compress the neural descriptor, and store the one or more extracted features in an output buffer.
The one or more data processing units may further be configured to extract stored features for the data stored in the memory array and compare the stored features with the extracted features of the reformatted data.
The one or more data processing units may further be configured to calculate a Hamming distance for each of the data stored in the memory array.
The one or more data processing units may further be configured to select the matching data from the data stored in the memory array based on the Hamming distance, store the matching data in the output buffer, and transmit the matching data to the application over the host interface.
The data storage device may further include a memory manager including one or more graphics processing units (GPUs) configured to perform machine learning on the data stored in the memory array.
A method for operating a data storage device includes: receiving a command in a submission queue from an application running on a host computer; initiating a search process in response to a search query command; generating a reformatted data by changing a format of data contained in the search query command based on a type of the data; extracting one or more features from the reformatted data; performing a data operation on data stored in a memory array of the data storage device in response to the search query command; and returning matching data from the data stored in the memory array to the application via a host interface established between the host computer and the data storage device.
The data storage device may be a non-volatile memory express over fabrics (NVMeoF) device.
The search query command may be an NVMeoF command received over an Ethernet connection.
The method may further include converting an image data included in the search query command in an RGB format.
The method may further include: generating a plurality of binary codes corresponding to the one or more extracted features; and storing the one or more extracted features in an input buffer.
The features may be extracted using one or more convolution engines (CEs) of the data storage device. Each of the one or more convolution engines may extract the one or more features in a convolutional neural network (CNN) layer.
Each of the one or more convolution engines may be configured to compute a neural descriptor for each convolutional neural network (CNN) layer, compress the neural descriptor, and store the one or more extracted features in an output buffer.
The method may further include: extracting stored features for the data stored in the memory array; and comparing the stored features with the extracted features of the reformatted data.
The method may further include: calculating a Hamming distance for each of the data stored in the memory array.
The method may further include: selecting the matching data from the data stored in the memory array based on the Hamming distance; storing the matching data in the output buffer; and transmitting the matching data to the application over the host interface.
The data storage device may further include a memory manager including one or more graphics processing units (GPUs) configured to perform machine learning on the data stored in the memory array.
The resulting method, process, apparatus, device, product, and/or system is straightforward, cost-effective, uncomplicated, highly versatile, accurate, sensitive, and effective, and can be implemented by adapting known components for ready, efficient, and economical manufacturing, application, and utilization. Another important aspect of an embodiment of the present disclosure is that it valuably supports and services the historical trend of reducing costs, simplifying systems, and increasing performance. These and other valuable aspects of an embodiment of the present disclosure consequently further the state of the technology to at least the next level.
While the present disclosure has been described in conjunction with a specific best mode, it is to be understood that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, it is intended to embrace all such alternatives, modifications, and variations that fall within the scope of the included claims. All matters set forth herein or shown in the accompanying drawings are to be interpreted in an illustrative and non-limiting sense.

Claims (22)

What is claimed is:
1. A data storage device comprising:
a memory array for storing data;
a host interface providing a data communication path between a host computer running an application and the data storage device, the data communication path being compatible with non-volatile memory express over fabrics (NVMeoF);
a central control unit implemented in the data storage device and receiving a command from the application via the communication path and placing the command in a submission queue and initiating a search process in response to a search query command queued in the submission queue;
a preprocessor implemented in the data storage device and reformatting data contained in the search query command based on a type of the data and generating a reformatted data; and
one or more data processing units implemented in the data storage device and extracting a feature from the reformatted data, performing a data operation on the data stored in the memory array in response to the search query command, and returning matching data from the data stored in the memory array to the application via the host interface.
2. The data storage device of claim 1, wherein the data storage device is directly attachable to a fabric compatible with NVMeoF.
3. The data storage device of claim 2, wherein the search query command is an NVMeoF command received over an Ethernet connection.
4. The data storage device of claim 1, wherein the search query command includes an image data, and wherein the preprocessor converts the image data in an RGB format.
5. The data storage device of claim 1, wherein the one or more data processing units generate a binary code corresponding to the feature, and store the feature in an input buffer.
6. The data storage device of claim 1, wherein at least one of the one or more data processing units includes a convolution engine (CE) that extracts the feature in a convolutional neural network (CNN) layer.
7. The data storage device of claim 6, wherein the at least one of the one or more convolution engines computes a neural descriptor for a convolutional neural network (CNN) layer, compresses the neural descriptor, and stores the feature in an output buffer.
8. The data storage device of claim 1, wherein the one or more data processing units further extract a stored feature for the data stored in the memory array and compare the stored feature with the feature of the reformatted data.
9. The data storage device of claim 8, wherein the one or more data processing units further calculate a Hamming distance for the data stored in the memory array.
10. The data storage device of claim 9, wherein the one or more data processing units further select the matching data from the data stored in the memory array based on the Hamming distance, store the matching data in the output buffer, and transmit the matching data to the application over the data communication path.
11. The data storage device of claim 1, further comprising a memory manager including one or more graphics processing units (GPUs) that performs machine learning on the data stored in the memory array.
12. A method for operating a data storage device comprising:
receiving a command from an application running on a host computer via a data communication path between the host computer and the data storage device, the data communication path being compatible with non-volatile memory express over fabrics (NVMeoF);
placing the command in a submission queue;
initiating, in the data storage device, a search process in response to a search query command queued in the submission queue;
generating, in the data storage device, a reformatted data by changing a format of data contained in the search query command based on a type of the data;
extracting, in the data storage device, a feature from the reformatted data;
performing, in the data storage device, a data operation on data stored in a memory array of the data storage device in response to the search query command; and
returning matching data from the data stored in the memory array to the application via the data communication path between the host computer and the data storage device.
13. The method of claim 12, wherein the data storage device is directly attachable to a fabric compatible with NVMeoF.
14. The method of claim 13, wherein the search query command is an NVMeoF command received over an Ethernet connection.
15. The method of claim 12, further comprising converting an image data included in the search query command in an RGB format.
16. The method of claim 12, further comprising:
generating a binary code corresponding to the feature; and
storing the feature in an input buffer.
17. The method of claim 12, wherein the feature is extracted using a convolution engine of the data storage device, and wherein the convolution engine extracts the feature in a convolutional neural network (CNN) layer.
18. The method of claim 17, wherein the convolution engine computes a neural descriptor for a convolutional neural network (CNN) layer, compresses the neural descriptor, and stores the feature in an output buffer.
19. The method of claim 12, further comprising:
extracting a stored feature for the data stored in the memory array; and
comparing the stored feature with the feature of the reformatted data.
20. The method of claim 19, further comprising:
calculating a Hamming distance for the data stored in the memory array.
21. The method of claim 20, further comprising:
selecting the matching data from the data stored in the memory array based on the Hamming distance;
storing the matching data in the output buffer; and
transmitting the matching data to the application over the data communication path.
22. The method of claim 12, wherein the data storage device includes a memory manager including one or more graphics processing units (GPUs) that performs machine learning on the data stored in the memory array.
US15/472,061 2016-12-30 2017-03-28 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD Active 2039-11-18 US11010431B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US15/472,061 US11010431B2 (en) 2016-12-30 2017-03-28 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD
KR1020170139006A KR102449191B1 (en) 2016-12-30 2017-10-25 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet solid state drive
US17/322,601 US20210279285A1 (en) 2016-12-30 2021-05-17 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet ssd

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662441073P 2016-12-30 2016-12-30
US15/472,061 US11010431B2 (en) 2016-12-30 2017-03-28 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/322,601 Continuation US20210279285A1 (en) 2016-12-30 2021-05-17 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet ssd

Publications (2)

Publication Number Publication Date
US20180189635A1 US20180189635A1 (en) 2018-07-05
US11010431B2 true US11010431B2 (en) 2021-05-18

Family

ID=62711728

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/472,061 Active 2039-11-18 US11010431B2 (en) 2016-12-30 2017-03-28 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD
US17/322,601 Pending US20210279285A1 (en) 2016-12-30 2021-05-17 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet ssd

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/322,601 Pending US20210279285A1 (en) 2016-12-30 2021-05-17 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet ssd

Country Status (2)

Country Link
US (2) US11010431B2 (en)
KR (1) KR102449191B1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220075724A1 (en) * 2020-09-09 2022-03-10 Micron Technology, Inc. Memory controllers including examples of calculating hamming distances for neural network and data center applications
US11586380B2 (en) 2020-09-09 2023-02-21 Micron Technology, Inc. Memory systems including examples of calculating hamming distances for neural network and data center applications
US11636285B2 (en) 2020-09-09 2023-04-25 Micron Technology, Inc. Memory including examples of calculating hamming distances for neural network and data center applications
US20230274193A1 * 2020-08-27 2023-08-31 Google Llc Data Management Forecasting from Distributed Tracing
US11966827B2 (en) * 2023-05-10 2024-04-23 Google Llc Data management forecasting from distributed tracing

Families Citing this family (116)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573652B (en) * 2015-01-04 2017-12-22 华为技术有限公司 Determine the method, apparatus and terminal of the identity of face in facial image
US11417235B2 (en) * 2017-05-25 2022-08-16 Baidu Usa Llc Listen, interact, and talk: learning to speak via interaction
US10652206B1 (en) 2017-10-27 2020-05-12 EMC IP Holding Company LLC Storage system with network-wide configurable device names
US11436483B2 (en) * 2018-01-17 2022-09-06 Mediatek Inc. Neural network engine with tile-based execution
US10698766B2 (en) * 2018-04-18 2020-06-30 EMC IP Holding Company LLC Optimization of checkpoint operations for deep learning computing
US10757189B2 (en) 2018-04-30 2020-08-25 EMC IP Holding Company LLC Service level objection based input-output selection utilizing multi-path layer of host device
US10380997B1 (en) * 2018-07-27 2019-08-13 Deepgram, Inc. Deep learning internal state index-based search and classification
US11080337B2 (en) * 2018-07-31 2021-08-03 Marvell Asia Pte, Ltd. Storage edge controller with a metadata computational engine
US11050660B2 (en) 2018-09-28 2021-06-29 EMC IP Holding Company LLC Host device with multi-path layer implementing path selection based at least in part on fabric identifiers
US10754572B2 (en) 2018-10-09 2020-08-25 EMC IP Holding Company LLC Migrating control of a multi-path logical device from a current MPIO driver to a target MPIO driver
US11044313B2 (en) 2018-10-09 2021-06-22 EMC IP Holding Company LLC Categorizing host IO load pattern and communicating categorization to storage system
CN111078291B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
US10635355B1 (en) * 2018-11-13 2020-04-28 Western Digital Technologies, Inc. Bandwidth limiting in solid state drives
KR20200057475A (en) 2018-11-16 2020-05-26 삼성전자주식회사 Memory device including arithmetic circuit and neural network system including the same
US10880217B2 (en) 2018-12-24 2020-12-29 EMC IP Holding Company LLC Host device with multi-path layer configured for detection and resolution of oversubscription conditions
US10897627B2 (en) 2019-01-11 2021-01-19 Western Digital Technologies, Inc. Non-volatile memory system including a partial decoder and event detector for video streams
US11625487B2 (en) * 2019-01-24 2023-04-11 International Business Machines Corporation Framework for certifying a lower bound on a robustness level of convolutional neural networks
US10754559B1 (en) 2019-03-08 2020-08-25 EMC IP Holding Company LLC Active-active storage clustering with clock synchronization
US10929058B2 (en) * 2019-03-25 2021-02-23 Western Digital Technologies, Inc. Enhanced memory device architecture for machine learning
US11783176B2 (en) * 2019-03-25 2023-10-10 Western Digital Technologies, Inc. Enhanced storage device memory architecture for machine learning
US11157692B2 (en) * 2019-03-29 2021-10-26 Western Digital Technologies, Inc. Neural networks using data processing units
US11080152B2 (en) 2019-05-15 2021-08-03 Western Digital Technologies, Inc. Optimized neural network data organization
US11615340B2 (en) 2019-05-23 2023-03-28 EMC IP Holding Company LLC Methods and apparatus for application prediction through machine learning based analysis of IO patterns
US11016699B2 (en) 2019-07-19 2021-05-25 EMC IP Holding Company LLC Host device with controlled cloning of input-output operations
US11016783B2 (en) 2019-07-25 2021-05-25 EMC IP Holding Company LLC Secure storage access utilizing multi-path layer of host device to identify processes executed on the host device with authorization to access data of a storage system
US11169737B2 (en) * 2019-08-13 2021-11-09 Micron Technology, Inc. Speculation in memory
US11055003B2 (en) 2019-08-20 2021-07-06 Micron Technology, Inc. Supplemental AI processing in memory
KR20210032222A (en) * 2019-09-16 2021-03-24 에스케이하이닉스 주식회사 Memory controller and operating method thereof
US11086785B2 (en) 2019-09-24 2021-08-10 EMC IP Holding Company LLC Host device with storage cache aware processing of input-output operations in multi-path layer
US11012510B2 (en) 2019-09-30 2021-05-18 EMC IP Holding Company LLC Host device with multi-path layer configured for detecting target failure status and updating path availability
US10884935B1 (en) 2019-09-30 2021-01-05 EMC IP Holding Company LLC Cache allocation for controller boards based on prior input-output operations
US10936522B1 (en) 2019-09-30 2021-03-02 EMC IP Holding Company LLC Performing input-output multi-pathing from user space
US11379325B2 (en) 2019-10-04 2022-07-05 EMC IP Holding Company LLC Path failure information sharing between host devices connected to a storage system
US11366590B2 (en) 2019-10-11 2022-06-21 EMC IP Holding Company LLC Host device with multi-path layer providing dynamic control of one or more path selection algorithms
US11064194B2 (en) 2019-10-31 2021-07-13 Western Digital Technologies, Inc. Encoding digital videos using controllers of data storage devices
US11023161B1 (en) 2019-11-25 2021-06-01 EMC IP Holding Company LLC Host device with multi-path layer implementing efficient load balancing for active-active configuration
US11106381B2 (en) 2019-11-27 2021-08-31 EMC IP Holding Company LLC Automated seamless migration of logical storage devices
CN111045732B (en) * 2019-12-05 2023-06-09 腾讯科技(深圳)有限公司 Data processing method, chip, device and storage medium
US10841645B1 (en) 2019-12-09 2020-11-17 Western Digital Technologies, Inc. Storage system and method for video frame segregation to optimize storage
US11256421B2 (en) 2019-12-11 2022-02-22 EMC IP Holding Company LLC Path selection modification for non-disruptive upgrade of a host device
US11093155B2 (en) 2019-12-11 2021-08-17 EMC IP Holding Company LLC Automated seamless migration with signature issue resolution
US11372951B2 (en) 2019-12-12 2022-06-28 EMC IP Holding Company LLC Proxy license server for host-based software licensing
US11277335B2 (en) 2019-12-26 2022-03-15 EMC IP Holding Company LLC Host device with path selection modification responsive to mismatch in initiator-target negotiated rates
US11099755B2 (en) 2020-01-06 2021-08-24 EMC IP Holding Company LLC Multipath device pseudo name to logical volume mapping for host devices
US11461284B2 (en) * 2020-01-15 2022-10-04 EMC IP Holding Company LLC Method, device and computer program product for storage management
US11231861B2 (en) 2020-01-15 2022-01-25 EMC IP Holding Company LLC Host device with active-active storage aware path selection
US11461026B2 (en) 2020-01-21 2022-10-04 EMC IP Holding Company LLC Non-disruptive update of host multipath device dependency
US11520671B2 (en) 2020-01-29 2022-12-06 EMC IP Holding Company LLC Fast multipath failover
US11050825B1 (en) 2020-01-30 2021-06-29 EMC IP Holding Company LLC Storage system port usage information sharing between host devices
US11175840B2 (en) 2020-01-30 2021-11-16 EMC IP Holding Company LLC Host-based transfer of input-output operations from kernel space block device to user space block device
US11562018B2 (en) 2020-02-04 2023-01-24 Western Digital Technologies, Inc. Storage system and method for optimized surveillance search
US11093144B1 (en) 2020-02-18 2021-08-17 EMC IP Holding Company LLC Non-disruptive transformation of a logical storage device from a first access protocol to a second access protocol
US11449257B2 (en) 2020-02-21 2022-09-20 EMC IP Holding Company LLC Host device with efficient automated seamless migration of logical storage devices across multiple access protocols
US11204699B2 (en) 2020-03-05 2021-12-21 EMC IP Holding Company LLC Storage system port maintenance information sharing with host device
US11397589B2 (en) 2020-03-06 2022-07-26 EMC IP Holding Company LLC Snapshot transmission from storage array to cloud using multi-path input-output
US11042327B1 (en) 2020-03-10 2021-06-22 EMC IP Holding Company LLC IO operation cloning using change information sharing with a storage system
US11328511B2 (en) 2020-03-13 2022-05-10 Western Digital Technologies, Inc. Storage system and method for improved playback analysis
US11265261B2 (en) 2020-03-18 2022-03-01 EMC IP Holding Company LLC Access path management based on path condition
US11368399B2 (en) 2020-03-27 2022-06-21 EMC IP Holding Company LLC Congestion aware multipathing based on network congestion notifications
US11080215B1 (en) 2020-03-31 2021-08-03 EMC IP Holding Company LLC Host device providing automated prediction of change intervals to reduce adverse impacts on applications
US11636291B1 (en) * 2020-04-06 2023-04-25 Amazon Technologies, Inc. Content similarity determination
US11169941B2 (en) 2020-04-09 2021-11-09 EMC IP Holding Company LLC Host device with automated connectivity provisioning
US11366756B2 (en) 2020-04-13 2022-06-21 EMC IP Holding Company LLC Local cached data coherency in host devices using remote direct memory access
US11561699B2 (en) 2020-04-24 2023-01-24 EMC IP Holding Company LLC Input-output path selection using switch topology information
US11216200B2 (en) 2020-05-06 2022-01-04 EMC IP Holding Company LLC Partition utilization awareness of logical units on storage arrays used for booting
US11099754B1 (en) 2020-05-14 2021-08-24 EMC IP Holding Company LLC Storage array with dynamic cache memory configuration provisioning based on prediction of input-output operations
US11175828B1 (en) 2020-05-14 2021-11-16 EMC IP Holding Company LLC Mitigating IO processing performance impacts in automated seamless migration
US11012512B1 (en) 2020-05-20 2021-05-18 EMC IP Holding Company LLC Host device with automated write throttling responsive to storage system write pressure condition
US11023134B1 (en) 2020-05-22 2021-06-01 EMC IP Holding Company LLC Addition of data services to an operating system running a native multi-path input-output architecture
US11151071B1 (en) 2020-05-27 2021-10-19 EMC IP Holding Company LLC Host device with multi-path layer distribution of input-output operations across storage caches
US11226851B1 (en) 2020-07-10 2022-01-18 EMC IP Holding Company LLC Execution of multipath operation triggered by container application
US11256446B1 (en) 2020-08-03 2022-02-22 EMC IP Holding Company LLC Host bus adaptor (HBA) virtualization aware multi-pathing failover policy
US11916938B2 (en) 2020-08-28 2024-02-27 EMC IP Holding Company LLC Anomaly detection and remediation utilizing analysis of storage area network access patterns
US11157432B1 (en) 2020-08-28 2021-10-26 EMC IP Holding Company LLC Configuration of block devices based on provisioning of logical volumes in a storage system
US11392459B2 (en) 2020-09-14 2022-07-19 EMC IP Holding Company LLC Virtualization server aware multi-pathing failover policy
US11320994B2 (en) 2020-09-18 2022-05-03 EMC IP Holding Company LLC Dynamic configuration change control in a storage system using multi-path layer notifications
US11397540B2 (en) 2020-10-12 2022-07-26 EMC IP Holding Company LLC Write pressure reduction for remote replication
US11032373B1 (en) 2020-10-12 2021-06-08 EMC IP Holding Company LLC Host-based bandwidth control for virtual initiators
US11630581B2 (en) 2020-11-04 2023-04-18 EMC IP Holding Company LLC Host bus adaptor (HBA) virtualization awareness for effective input-output load balancing
US11543971B2 (en) 2020-11-30 2023-01-03 EMC IP Holding Company LLC Array driven fabric performance notifications for multi-pathing devices
US11397539B2 (en) 2020-11-30 2022-07-26 EMC IP Holding Company LLC Distributed backup using local access
US11204777B1 (en) 2020-11-30 2021-12-21 EMC IP Holding Company LLC Boot from SAN operation support on multi-pathing devices
US11385824B2 (en) 2020-11-30 2022-07-12 EMC IP Holding Company LLC Automated seamless migration across access protocols for a logical storage device
US11620240B2 (en) 2020-12-07 2023-04-04 EMC IP Holding Company LLC Performance-driven access protocol switching for a logical storage device
US11409460B2 (en) 2020-12-08 2022-08-09 EMC IP Holding Company LLC Performance-driven movement of applications between containers utilizing multiple data transmission paths with associated different access protocols
US11455116B2 (en) 2020-12-16 2022-09-27 EMC IP Holding Company LLC Reservation handling in conjunction with switching between storage access protocols
US11651066B2 (en) 2021-01-07 2023-05-16 EMC IP Holding Company LLC Secure token-based communications between a host device and a storage system
US11308004B1 (en) 2021-01-18 2022-04-19 EMC IP Holding Company LLC Multi-path layer configured for detection and mitigation of slow drain issues in a storage area network
US11494091B2 (en) 2021-01-19 2022-11-08 EMC IP Holding Company LLC Using checksums for mining storage device access data
US11449440B2 (en) 2021-01-19 2022-09-20 EMC IP Holding Company LLC Data copy offload command support across multiple storage access protocols
US11467765B2 (en) 2021-01-20 2022-10-11 EMC IP Holding Company LLC Detection and mitigation of slow drain issues using response times and storage-side latency view
US11386023B1 (en) 2021-01-21 2022-07-12 EMC IP Holding Company LLC Retrieval of portions of storage device access data indicating access state changes
US11640245B2 (en) * 2021-02-17 2023-05-02 EMC IP Holding Company LLC Logical storage device access in an encrypted storage environment
US11755222B2 (en) 2021-02-26 2023-09-12 EMC IP Holding Company LLC File based encryption for multi-pathing devices
US11797312B2 (en) 2021-02-26 2023-10-24 EMC IP Holding Company LLC Synchronization of multi-pathing settings across clustered nodes
US11928365B2 (en) 2021-03-09 2024-03-12 EMC IP Holding Company LLC Logical storage device access using datastore-level keys in an encrypted storage environment
US11294782B1 (en) 2021-03-22 2022-04-05 EMC IP Holding Company LLC Failover affinity rule modification based on node health information
US11782611B2 (en) 2021-04-13 2023-10-10 EMC IP Holding Company LLC Logical storage device access using device-specific keys in an encrypted storage environment
US11422718B1 (en) 2021-05-03 2022-08-23 EMC IP Holding Company LLC Multi-path layer configured to provide access authorization for software code of multi-path input-output drivers
US11550511B2 (en) 2021-05-21 2023-01-10 EMC IP Holding Company LLC Write pressure throttling based on service level objectives
US11822706B2 (en) 2021-05-26 2023-11-21 EMC IP Holding Company LLC Logical storage device access using device-specific keys in an encrypted storage environment
US11625232B2 (en) 2021-06-07 2023-04-11 EMC IP Holding Company LLC Software upgrade management for host devices in a data center
US11526283B1 (en) 2021-06-08 2022-12-13 EMC IP Holding Company LLC Logical storage device access using per-VM keys in an encrypted storage environment
US11762588B2 (en) 2021-06-11 2023-09-19 EMC IP Holding Company LLC Multi-path layer configured to access storage-side performance metrics for load balancing policy control
US11954344B2 (en) 2021-06-16 2024-04-09 EMC IP Holding Company LLC Host device comprising layered software architecture with automated tiering of logical storage devices
US20210318805A1 (en) * 2021-06-25 2021-10-14 Intel Corporation Method and apparatus to perform batching and striping in a stochastic associative memory to avoid partition collisions
US11750457B2 (en) 2021-07-28 2023-09-05 Dell Products L.P. Automated zoning set selection triggered by switch fabric notifications
US11625308B2 (en) 2021-09-14 2023-04-11 Dell Products L.P. Management of active-active configuration using multi-pathing software
US11586356B1 (en) 2021-09-27 2023-02-21 Dell Products L.P. Multi-path layer configured for detection and mitigation of link performance issues in a storage area network
US11656987B2 (en) 2021-10-18 2023-05-23 Dell Products L.P. Dynamic chunk size adjustment for cache-aware load balancing
US11418594B1 (en) 2021-10-20 2022-08-16 Dell Products L.P. Multi-path layer configured to provide link availability information to storage system for load rebalancing
US11567669B1 (en) 2021-12-09 2023-01-31 Dell Products L.P. Dynamic latency management of active-active configurations using multi-pathing software
US11620054B1 (en) 2022-04-21 2023-04-04 Dell Products L.P. Proactive monitoring and management of storage system input-output operation limits
US11789624B1 (en) 2022-05-31 2023-10-17 Dell Products L.P. Host device with differentiated alerting for single points of failure in distributed storage systems
US11886711B2 (en) 2022-06-16 2024-01-30 Dell Products L.P. Host-assisted IO service levels utilizing false-positive signaling
US11934659B1 (en) 2022-09-28 2024-03-19 Dell Products L.P. Host background copy process with rate adjustment utilizing input-output processing pressure feedback from storage system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3861157B2 (en) * 2004-02-27 2006-12-20 Hiroshima University Reference data optimization device and pattern recognition system
KR100939364B1 (en) * 2007-03-22 2010-01-29 Han Dong-il The system and method for searching image
KR101329102B1 (en) * 2012-02-28 2013-11-14 K3I Co., Ltd. Augmented reality image retrieval system using layout descriptor and image feature
KR102336443B1 (en) * 2015-02-04 2021-12-08 Samsung Electronics Co., Ltd. Storage device and user device supporting virtualization function
US20160259754A1 (en) * 2015-03-02 2016-09-08 Samsung Electronics Co., Ltd. Hard disk drive form factor solid state drive multi-card adapter

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7567252B2 (en) 2003-12-09 2009-07-28 Microsoft Corporation Optimizing performance of a graphics processing unit for efficient execution of general matrix operations
US20050125369A1 (en) 2003-12-09 2005-06-09 Microsoft Corporation System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit
US7401253B2 (en) * 2005-05-09 2008-07-15 International Business Machines Corporation Convolution-encoded data storage on a redundant array of independent devices
US20120096237A1 (en) 2010-10-13 2012-04-19 Riverbed Technology, Inc. Method of improving performance of a data storage device
US9171264B2 (en) 2010-12-15 2015-10-27 Microsoft Technology Licensing, Llc Parallel processing machine learning decision tree training
US9182912B2 (en) 2011-08-03 2015-11-10 Avago Technologies General Ip (Singapore) Pte. Ltd. Method to allow storage cache acceleration when the slow tier is on independent controller
US9158540B1 (en) 2011-11-14 2015-10-13 Emc Corporation Method and apparatus for offloading compute resources to a flash co-processing appliance
US9053008B1 (en) 2012-03-26 2015-06-09 Western Digital Technologies, Inc. Systems and methods for providing inline parameter service in data storage devices
US8792279B2 (en) 2012-11-09 2014-07-29 Sandisk Technologies Inc. Architectures for data analytics using computational NAND memory
US8634247B1 (en) 2012-11-09 2014-01-21 Sandisk Technologies Inc. NAND flash based content addressable memory
US20150019506A1 (en) 2013-07-15 2015-01-15 International Business Machines Corporation Optimizing digest based data matching in similarity based deduplication
US20150100860A1 (en) 2013-10-03 2015-04-09 Futurewei Technologies, Inc. Systems and Methods of Vector-DMA cache-XOR for MPCC Erasure Coding
US9330143B2 (en) 2013-10-24 2016-05-03 Western Digital Technologies, Inc. Data storage device supporting accelerated database operations
US20150149695A1 (en) 2013-11-27 2015-05-28 Jawad B. Khan System and method for computing message digests
US10210196B2 (en) * 2013-11-28 2019-02-19 Samsung Electronics Co., Ltd. Data storage device having internal hardware filter, data storage method and data storage system
US9396415B2 (en) * 2014-04-01 2016-07-19 Superfish Ltd. Neural network image representation
US20150317176A1 (en) 2014-05-02 2015-11-05 Cavium, Inc. Systems and methods for enabling value added services for extensible storage devices over a network via nvme controller
US20160170892A1 (en) 2014-12-11 2016-06-16 HGST Netherlands B.V. Expression pattern matching in a storage subsystem
US20160188207A1 (en) 2014-12-31 2016-06-30 Samsung Electronics Co., Ltd. Electronic system with learning mechanism and method of operation thereof
US20160224544A1 (en) 2015-02-04 2016-08-04 Oracle International Corporation Sparse and data-parallel inference method and system for the latent dirichlet allocation model
US9450606B1 (en) 2015-10-01 2016-09-20 Seagate Technology Llc Data matching for hardware data compression
US10545861B2 (en) * 2016-10-04 2020-01-28 Pure Storage, Inc. Distributed integrated high-speed solid-state non-volatile random-access memory
US20180167352A1 (en) * 2016-12-12 2018-06-14 Samsung Electronics Co., Ltd. Method and apparatus for reducing ip addresses usage of nvme over fabrics devices

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Alzu'bi, Ahmad et al.; Semantic content-based image retrieval: A comprehensive study; Elsevier; J. Vis. Commun. Image R. 32 (2015) 20-54. (Year: 2015). *
Babenko, Artem et al.; Aggregating Deep Convolutional Features for Image Retrieval; 2015 IEEE International Conference on Computer Vision; pp. 1269-1277. (Year: 2015). *
Babenko, Artem et al.; Neural Codes for Image Retrieval; Springer International Publishing Switzerland 2014; ECCV 2014, Part I, LNCS 8689, pp. 584-599. (Year: 2014). *
Castelli, Vittorio et al.; Image Databases—Search and Retrieval of Digital Imagery; 2002; John Wiley & Sons, Inc.; 595 pages. (Year: 2002). *
Novak, David et al.; Large-scale Image Retrieval using Neural Net Descriptors; 2 pages. SIGIR'15, Aug. 9-13, 2015, Santiago, Chile. ACM 978-1-4503-3621-5/15/08. DOI: http://dx.doi.org/10.1145/2766462.2767868. (Year: 2015). *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230274193A1 (en) * 2020-08-27 2023-08-31 Google Llc Data Management Forecasting from Distributed Tracing
US20220075724A1 (en) * 2020-09-09 2022-03-10 Micron Technology, Inc. Memory controllers including examples of calculating Hamming distances for neural network and data center applications
US11586380B2 (en) 2020-09-09 2023-02-21 Micron Technology, Inc. Memory systems including examples of calculating Hamming distances for neural network and data center applications
US11609853B2 (en) * 2020-09-09 2023-03-21 Micron Technology, Inc. Memory controllers including examples of calculating Hamming distances for neural network and data center applications
US11636285B2 (en) 2020-09-09 2023-04-25 Micron Technology, Inc. Memory including examples of calculating Hamming distances for neural network and data center applications
US11966827B2 (en) * 2023-05-10 2024-04-23 Google Llc Data management forecasting from distributed tracing

Also Published As

Publication number Publication date
US20210279285A1 (en) 2021-09-09
KR102449191B1 (en) 2022-09-29
US20180189635A1 (en) 2018-07-05
KR20180079168A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
US20210279285A1 (en) 2021-09-09 Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD
Latif et al. Content-based image retrieval and feature extraction: a comprehensive review
JP6721681B2 (en) Method and apparatus for performing parallel search operations
Kong et al. Isotropic hashing
Liu et al. Hashing with graphs
Zhang et al. Image retrieval with geometry-preserving visual phrases
Zhang et al. QsRank: Query-sensitive hash code ranking for efficient ε-neighbor search
Safadi et al. Descriptor optimization for multimedia indexing and retrieval
EP3020203A1 (en) Compact and robust signature for large scale visual search, retrieval and classification
WO2022007596A1 (en) Image retrieval system, method and apparatus
Zhang et al. A prior-free weighting scheme for binary code ranking
Sun et al. Search by detection: Object-level feature for image retrieval
Hare et al. Practical scalable image analysis and indexing using Hadoop
KR101435010B1 (en) Method for learning of sequential binary code using features and apparatus for the same
CN110442749B (en) Video frame processing method and device
Elleuch et al. Multi-index structure based on SIFT and color features for large scale image retrieval
Zhao et al. Fast covariant VLAD for image search
Peng et al. Image retrieval based on convolutional neural network and kernel-based supervised hashing
WO2015176840A1 (en) Offline, hybrid and hybrid with offline image recognition
Yan et al. Fast approximate matching of binary codes with distinctive bits
Bouhlel et al. Semantic-aware framework for mobile image search
Hinami et al. Large-scale r-cnn with classifier adaptive quantization
Ma et al. Fast search with data-oriented multi-index hashing for multimedia data.
CN116547647A (en) Search device and search method
Yu et al. Error-correcting output hashing in fast similarity search

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:OLARIG, SOMPONG P.;WORLEY, FRED;FARAHPOUR, NAZANIN;REEL/FRAME:041810/0150

Effective date: 20170327

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE