CN114385554A - DL training data reading method based on index shuffle - Google Patents
Info
- Publication number
- CN114385554A CN114385554A CN202210062232.4A CN202210062232A CN114385554A CN 114385554 A CN114385554 A CN 114385554A CN 202210062232 A CN202210062232 A CN 202210062232A CN 114385554 A CN114385554 A CN 114385554A
- Authority
- CN
- China
- Prior art keywords
- shuffle
- index
- data
- array
- array index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012549 training Methods 0.000 title claims abstract description 46
- 238000000034 method Methods 0.000 title claims abstract description 36
- 230000002085 persistent effect Effects 0.000 claims abstract description 6
- 238000013528 artificial neural network Methods 0.000 claims description 18
- 238000003491 array Methods 0.000 claims description 9
- 239000000523 sample Substances 0.000 description 13
- 230000006870 function Effects 0.000 description 5
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000003321 amplification Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000001934 delay Effects 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 230000006993 memory improvement Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/176—Support for shared access to files; File sharing support
- G06F16/1767—Concurrency control, e.g. optimistic or pessimistic approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
- G06F2212/1021—Hit rate improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1041—Resource optimization
- G06F2212/1044—Space efficiency improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1048—Scalability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1056—Simplification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a DL training data reading method based on an index shuffle, comprising the following steps: persisting the data into non-volatile memory and constructing an array index; partitioning the array index and performing a multi-threaded lock-free parallel shuffle to obtain a shuffled array index; and traversing the shuffled array index and pre-reading the data from the non-volatile memory into DRAM accordingly. The method simplifies the file system, improves the read performance of the data set, and ultimately improves the training speed of DNNs. The index-shuffle-based DL training data reading method can be widely applied in the field of computer systems.
Description
Technical Field
The invention relates to the field of computer systems, in particular to a DL training data reading method based on an index shuffle.
Background
At present, shuffle strategies for existing DNN training data sets mainly suffer from the following disadvantages: 1) the index structure of a default file system is complex, and scalability to metadata-intensive, large-scale DNN data sets is poor; 2) shuffling the raw data itself places too heavy a load on memory and the CPU; 3) shuffling based on metadata makes disk I/O the main bottleneck; 4) the cache hit rate is extremely low, causing unexpected disk I/O that delays the data reading process; 5) a single-threaded shuffle is inefficient, while a multi-threaded shuffle incurs lock overhead.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a DL training data reading method based on an index shuffle that simplifies the file system, improves the reading performance of the data set, and ultimately improves the training speed of DNNs.
The first technical scheme adopted by the invention is as follows: a DL training data reading method based on an index shuffle comprises the following steps:
s1, persisting the data into non-volatile memory and constructing an array index;
s2, partitioning the array index and performing a multi-threaded lock-free parallel shuffle to obtain a shuffled array index;
s3, traversing the shuffled array index and pre-reading the data from the non-volatile memory into DRAM accordingly.
Further, the step of persisting the data into the non-volatile memory and constructing the array index specifically includes:
s11, acquiring a data set of the deep neural network;
s12, loading the data set into nonvolatile memory and recording the address of each sample in an array, one data set corresponding to one array;
and S13, obtaining an array index.
Further, the step of partitioning the array index and performing a multi-threaded lock-free parallel shuffle to obtain a shuffled array index specifically includes:
s21, randomly dividing the array index and generating a plurality of threads in the shuffle stage of each epoch in deep neural network training;
and S22, based on the threads, shuffling the array according to the array index to obtain a shuffled array index.
Further, still include:
and S4, if the target accuracy of the deep neural network training has not been reached, entering the next epoch and returning to step S21.
Further, the step of randomly dividing the array index and generating a plurality of threads in a shuffle stage of each epoch in deep neural network training specifically includes:
s211, randomly dividing the array index to generate random numbers and obtain a plurality of sub-arrays;
and S212, generating a corresponding number of threads according to the random number in the shuffle stage of each epoch in the deep neural network training.
Further, the random number equals the number of sub-arrays, each thread is responsible for the shuffle of only one sub-array, and the threads are isolated from each other.
Further, the traversal of the array index is a serpentine traversal, specifically:
in the first epoch, reading data with a forward traversal of the index, and recording the starting subscript of the cached data;
in the second epoch, using a reverse traversal of the index and reading all samples in descending subscript order;
in the third epoch, reading data with a forward traversal of the index;
and repeating these traversal steps until the target accuracy of the deep neural network training is reached, the traversal order of each epoch being opposite to that of the previous one.
The method has the following beneficial effects: addressing the shuffle and data-reading inefficiencies of DNN training computation frameworks, the invention provides a dedicated file system; designs an array index structure better suited to the data-set and access characteristics of DNN training; realizes, on top of this array structure, an efficient multi-threaded lock-free parallel shuffle strategy; and, through a serpentine traversal, preferentially reads the cache and initiates pre-reading to improve read throughput. The ultimate aim is to relieve the data-reading bottleneck in DNN training and improve the execution efficiency of DNN training jobs.
Drawings
FIG. 1 is a flowchart illustrating steps of a DL training data reading method based on an index shuffle according to the present invention;
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Based on a traditional Deep Neural Network (DNN) computing framework, the method moves the shuffle operation on the data set from the application layer down to the file system layer, introduces non-volatile memory (NVM), and exploits NVM byte addressability to design an efficient index structure, thereby simplifying metadata, realizing a multi-threaded lock-free parallel shuffle, and making the file system better match the data access pattern of DNN training. In traditional DNN computation frameworks, dedicated modules are responsible for shuffling the data set.
As shown in fig. 1, the present invention provides a DL training data reading method based on an index shuffle, which includes the following steps:
s1, persisting the data into non-volatile memory and constructing an array index;
s11, acquiring a data set of the deep neural network;
s12, loading the data set into nonvolatile memory and recording the address of each sample in an array, one data set corresponding to one array;
and S13, obtaining an array index.
Specifically, the index structure in the invention is designed as follows: when the data set is loaded into nonvolatile memory, the address of each sample is recorded in an array, with one data set corresponding to one array. Each sample needs only 4 bytes of metadata, so the total space required by the whole index structure is the number of samples multiplied by 4 bytes. The address space of the array in memory is contiguous, and each element can be located directly by its subscript, so there is no extra pointer overhead. Because there are no parent-child or similar semantics between elements, swapping any two elements cannot break the index structure. The greatest advantage of this design is that the index itself can be shuffled directly; there is no redundant metadata beyond the index, maximizing space utilization.
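As a minimal illustration of this index layout, the following sketch packs per-sample byte offsets into a flat, contiguous 4-bytes-per-entry array; `build_array_index` and the sample sizes are hypothetical names, not from the patent.

```python
from array import array

def build_array_index(sample_sizes):
    """Build a flat array index over samples packed back-to-back in NVM.

    Each entry is the byte offset of one sample, stored as an unsigned
    32-bit integer ("I"), i.e. about 4 bytes of metadata per sample as
    in the description above. `sample_sizes` is a hypothetical list of
    per-sample byte lengths.
    """
    offsets = array("I")        # contiguous, subscript-addressable, no pointers
    pos = 0
    for size in sample_sizes:
        offsets.append(pos)     # locate each sample directly by its subscript
        pos += size
    return offsets

index = build_array_index([128, 256, 64])
print(list(index))                    # → [0, 128, 384]
print(len(index) * index.itemsize)    # total metadata bytes (itemsize is 4 on common platforms)
```

Because the entries carry no parent-child relationships, any two can be swapped freely, which is what makes the in-place index shuffle possible.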
S2, partitioning the array index and performing a multi-threaded lock-free parallel shuffle to obtain a shuffled array index;
s21, randomly dividing the array index and generating a plurality of threads in the shuffle stage of each epoch in deep neural network training;
s211, randomly dividing the array index to generate random numbers and obtain a plurality of sub-arrays;
and S212, generating a corresponding number of threads according to the random number in the shuffle stage of each epoch in the deep neural network training.
And S22, based on the threads, shuffling the array according to the array index to obtain a shuffled array index.
Specifically, based on this index structure, the invention designs a multi-threaded lock-free parallel shuffle with the following strategy: the array is divided into several sub-arrays, and a shuffle thread is started for each; each thread is responsible for only one sub-array, the threads are isolated from one another, and no message passing or data sharing occurs. Each element can therefore be accessed by only one thread, no locking is needed during access, and the parallelism is extremely efficient. The shuffle algorithm itself is: traverse the sub-array and swap each element with an element at a random later position, with complexity O(n). Taken alone, this strategy is a pseudo-shuffle, because each element can only appear within its own sub-array. Therefore, to better guarantee the randomness of sample reading and hence the final accuracy of model training, the sub-arrays are divided randomly. Concretely: when dividing, a random number is generated; this number is the number of sub-arrays, i.e. the number of threads, and the array is then split evenly into that many groups of adjacent elements. In this way each element can land in a different position in different epochs, and, given enough epochs, every element has a probability of reaching any position in the whole array.
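The strategy above can be sketched as follows: a random thread count, equal contiguous sub-arrays, and a per-thread in-place Fisher-Yates pass. This is an illustrative sketch, not the patent's implementation; `lockfree_shuffle` and `shuffle_slice` are hypothetical names.

```python
import random
import threading

def shuffle_slice(arr, lo, hi, rng):
    # Fisher-Yates within [lo, hi): swap each element with one at a
    # random later position inside its own sub-array, O(n) overall.
    for i in range(lo, hi - 1):
        j = rng.randint(i, hi - 1)
        arr[i], arr[j] = arr[j], arr[i]

def lockfree_shuffle(index):
    """Pick a random thread count t, split the index into t contiguous
    sub-arrays, and shuffle each in its own thread. No locks are
    needed: the slices are disjoint, so no element is ever shared."""
    n = len(index)
    t = random.randint(2, 8)                 # random number = sub-array/thread count
    step = (n + t - 1) // t
    threads = []
    for k in range(t):
        lo, hi = k * step, min((k + 1) * step, n)
        rng = random.Random()                # per-thread RNG, nothing shared
        th = threading.Thread(target=shuffle_slice, args=(index, lo, hi, rng))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    return index

idx = lockfree_shuffle(list(range(100)))
print(sorted(idx) == list(range(100)))   # still a permutation → True
```

Because the sub-array boundaries change with the random `t` each epoch, an element's reachable positions differ from epoch to epoch, which is how the pseudo-shuffle of one epoch becomes a full shuffle over many epochs.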
In DNN training, the data-set access pattern in each epoch has two features: first, the access order of the samples is random; second, each sample is accessed exactly once. That is, each element of the index is read once per epoch. A tree structure does not support an in-place shuffle, so its elements cannot all be read by a single traversal; each sample access becomes a random read of the index with time complexity O(log n). With the array, however, all elements can be scanned in full simply by traversing the array, and the per-access time complexity drops to O(1).
S3, traversing the shuffled array index and pre-reading the data from the non-volatile memory into DRAM accordingly.
And S4, if the target accuracy of the deep neural network training has not been reached, entering the next epoch and returning to step S21.
Further, as a preferred embodiment of the method, the traversal of the array index is a serpentine traversal, specifically:
in the first epoch, reading data with a forward traversal of the index, and recording the starting subscript of the cached data;
in the second epoch, using a reverse traversal of the index and reading all samples in descending subscript order;
in the third epoch, reading data with a forward traversal of the index;
and repeating these traversal steps until the target accuracy of the deep neural network training is reached, the traversal order of each epoch being opposite to that of the previous one.
In particular, DRAM read performance is superior to that of NVM. To exploit this fully, the invention additionally provides a traversal mode for the index when reading the data set. The system still uses the operating system's default LRU cache replacement policy, so in each epoch the last group of samples accessed is cached in memory, and in the next epoch that cached group is read first. Concretely: in the first epoch, the index is traversed forward to read the data, and the starting subscript of the cached data is recorded; during the shuffle, the data are divided into a cached part and an uncached part, and the shuffle strategy is executed independently on each. In the second epoch, since the samples with the largest subscripts are all in the cache, all samples are read by a reverse traversal in descending subscript order. In the third epoch, the cache contents have been replaced by the samples accessed last in the second epoch, i.e. those with the smallest subscripts, so a forward traversal is used; subsequent epochs follow by analogy. Each traversal is thus in the opposite order to the previous one, forming a serpentine traversal. In every epoch the cached data are read first, raising the cache hit rate and thereby improving read performance. To fetch still more data from DRAM, the method further implements a pre-reading strategy: the data reading order is obtained from the index, and the data about to be accessed are read into DRAM in advance, guaranteeing the data supply to the GPU, reducing GPU wait time, and improving resource utilization.
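The alternating direction described above can be sketched in a few lines. This shows only the direction flip per epoch; in the actual scheme the subscripts would come from the shuffled index, and `serpentine_order` is a hypothetical name.

```python
def serpentine_order(n_samples, epoch):
    """Yield sample subscripts for one epoch: forward on even-numbered
    epochs (0, 2, ...), reverse on odd ones, so the tail of each pass -
    still resident in the DRAM cache under LRU - is read first on the
    next pass."""
    subscripts = range(n_samples)
    return list(subscripts) if epoch % 2 == 0 else list(reversed(subscripts))

print(serpentine_order(5, 0))  # → [0, 1, 2, 3, 4]
print(serpentine_order(5, 1))  # → [4, 3, 2, 1, 0]
```

A pre-reader would walk the same order slightly ahead of the consumer, copying upcoming samples from NVM into DRAM before the GPU asks for them.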
The invention implements a file system with the above functionality that exposes a shuffle(dataset) interface to upper-level PyTorch-based deep learning applications, where dataset specifies the data set used by the DNN training task. A developer calls this function at the start of each epoch to complete the shuffle of the data set.
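The per-epoch call pattern might look like the following toy sketch. `ToyFS`, its method names, and the data-set id are hypothetical stand-ins for the real file-system interface; only the shape of the loop (shuffle once per epoch, read each sample once) reflects the description.

```python
import random

class ToyFS:
    """Minimal in-memory stand-in for the dedicated file system."""
    def __init__(self, datasets):
        self.datasets = datasets

    def shuffle(self, dataset_id):
        # Placeholder for the lock-free index shuffle: return a fresh
        # permutation of sample subscripts for this epoch.
        index = list(range(len(self.datasets[dataset_id])))
        random.shuffle(index)
        return index

    def read_sample(self, dataset_id, subscript):
        return self.datasets[dataset_id][subscript]

def train_one_epoch(fs, dataset_id, step):
    # Called at each epoch start, as the description suggests.
    for subscript in fs.shuffle(dataset_id):
        step(fs.read_sample(dataset_id, subscript))

fs = ToyFS({"cifar": ["s0", "s1", "s2", "s3"]})
seen = []
train_one_epoch(fs, "cifar", seen.append)
print(sorted(seen))    # every sample read exactly once → ['s0', 's1', 's2', 's3']
```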
In summary, addressing the shuffle and data-reading inefficiencies of DNN training computation frameworks, the invention provides a dedicated file system; designs an array index structure better suited to the data-set and access characteristics of DNN training; realizes, on top of this array structure, an efficient multi-threaded lock-free parallel shuffle strategy; and, through a serpentine traversal, preferentially reads the cache and initiates pre-reading to improve read throughput. The ultimate aim is to relieve the data-reading bottleneck in DNN training and improve the execution efficiency of DNN training jobs.
The beneficial effects of the invention specifically comprise the following:
1. Improved deep learning training speed: without affecting the randomness of sample reading or the final accuracy of DNN training, a lock-free parallel shuffle is realized, improving shuffle efficiency by a multiple that scales with the number of threads. Data are read by traversing the array, and each index lookup is O(1), which is simple and fast. With the cache-first and pre-reading strategies, all data can be obtained from DRAM, raising the speed and throughput with which the GPU obtains data. The shuffle and data-reading stages are no longer bottlenecks, and the overall performance of DNN training improves.
2. Space savings: at the application layer, the path information of each sample no longer needs to be maintained; only the data set id is needed. At the file-system layer, the address space of the array structure is contiguous and no extra space is needed for pointers, so a large amount of memory can be saved.
3. Improved resource utilization: with faster data reading, the idle time the GPU spends waiting for data shrinks, and GPU resources are used more fully; the multi-threaded shuffle effectively exploits multi-core CPUs, improving CPU utilization; and the memory saved likewise raises memory utilization.
4. Improved system scalability: a metadata-based shuffle touches little data, and the array structure has low time and space overhead, can store large amounts of data, is easy to extend and maintain, and adapts well to data-set growth. The saved memory and CPU cycles can be used for data preprocessing or for storing intermediate data, so the system also accommodates the growth of DNN models well.
A DL training data reading device based on index shuffle:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one program causes the at least one processor to implement the index shuffle-based DL training data reading method as described above.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
A storage medium having stored therein instructions executable by a processor, the storage medium comprising: the processor-executable instructions, when executed by the processor, are for implementing an index shuffle-based DL training data reading method as described above.
The contents in the above method embodiments are all applicable to the present storage medium embodiment, the functions specifically implemented by the present storage medium embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present storage medium embodiment are also the same as those achieved by the above method embodiments.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A DL training data reading method based on an index shuffle is characterized by comprising the following steps:
s1, persisting the data into non-volatile memory and constructing an array index;
s2, partitioning the array index and performing a multi-threaded lock-free parallel shuffle to obtain a shuffled array index;
s3, traversing the shuffled array index and pre-reading the data from the non-volatile memory into DRAM accordingly.
2. The method as claimed in claim 1, wherein the step of persisting data into non-volatile memory and constructing an array index includes:
s11, acquiring a data set of the deep neural network;
s12, loading the data set into nonvolatile memory and recording the address of each sample in an array, one data set corresponding to one array;
and S13, obtaining an array index.
3. The method for reading DL training data based on the index shuffle as claimed in claim 2, wherein the step of partitioning the array index and performing a multi-threaded lock-free parallel shuffle to obtain a shuffled array index comprises:
s21, randomly dividing the array index and generating a plurality of threads in the shuffle stage of each epoch in deep neural network training;
and S22, based on the threads, shuffling the array according to the array index to obtain a shuffled array index.
4. The method for reading DL training data based on the index shuffle of claim 3, further comprising:
and S4, if the target accuracy of the deep neural network training has not been reached, entering the next epoch and returning to step S21.
5. The method as claimed in claim 4, wherein the step of randomly dividing the array index and generating a plurality of threads in the shuffle stage of each epoch in the deep neural network training specifically includes:
s211, randomly dividing the array index to generate random numbers and obtain a plurality of sub-arrays;
and S212, generating a corresponding number of threads according to the random number in the shuffle stage of each epoch in the deep neural network training.
6. The method as claimed in claim 5, wherein the random number corresponds to the number of sub-arrays, each thread is responsible for only one sub-array and the threads are isolated from each other.
7. The method as claimed in claim 6, wherein the traversal of the array index is a serpentine traversal, specifically:
in the first epoch, reading data with a forward traversal of the index, and recording the starting subscript of the cached data;
in the second epoch, using a reverse traversal of the index and reading all samples in descending subscript order;
in the third epoch, reading data with a forward traversal of the index;
and repeating these traversal steps until the target accuracy of the deep neural network training is reached, the traversal order of each epoch being opposite to that of the previous one.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210062232.4A CN114385554A (en) | 2022-01-19 | 2022-01-19 | DL training data reading method based on index shuffle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210062232.4A CN114385554A (en) | 2022-01-19 | 2022-01-19 | DL training data reading method based on index shuffle |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114385554A true CN114385554A (en) | 2022-04-22 |
Family
ID=81202918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210062232.4A Pending CN114385554A (en) | 2022-01-19 | 2022-01-19 | DL training data reading method based on index shuffle |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114385554A (en) |
-
2022
- 2022-01-19 CN CN202210062232.4A patent/CN114385554A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11349639B2 (en) | Circuit and method for overcoming memory bottleneck of ASIC-resistant cryptographic algorithms | |
US11086792B2 (en) | Cache replacing method and apparatus, heterogeneous multi-core system and cache managing method | |
US4991088A (en) | Method for optimizing utilization of a cache memory | |
US11526960B2 (en) | GPU-based data join | |
Helman et al. | Designing practical efficient algorithms for symmetric multiprocessors | |
CN1831824A (en) | Buffer data base data organization method | |
US7702875B1 (en) | System and method for memory compression | |
Kim et al. | Efficient multi-GPU memory management for deep learning acceleration | |
EP0974907A2 (en) | A method for determining an optimized data organization | |
Li et al. | A multi-hashing index for hybrid dram-nvm memory systems | |
CN1818887A (en) | Built-in file system realization based on SRAM | |
US11429299B2 (en) | System and method for managing conversion of low-locality data into high-locality data | |
Wang et al. | Circ-Tree: A B+-Tree variant with circular design for persistent memory | |
US20230385258A1 (en) | Dynamic random access memory-based content-addressable memory (dram-cam) architecture for exact pattern matching | |
Dasari et al. | High performance implementation of planted motif problem using suffix trees | |
Nakano et al. | The random address shift to reduce the memory access congestion on the discrete memory machine | |
CN114385554A (en) | DL training data reading method based on index shuffle | |
Chacón et al. | FM-index on GPU: A cooperative scheme to reduce memory footprint | |
US11816025B2 (en) | Hardware acceleration | |
US10579519B2 (en) | Interleaved access of memory | |
Nakano et al. | The super warp architecture with random address shift | |
Cheng et al. | Alleviating bottlenecks for dnn execution on gpus via opportunistic computing | |
CN112433672A (en) | Solid state disk reading method and device | |
US20230393746A1 (en) | Hardware revocation engine for temporal memory safety | |
CN110334251B (en) | Element sequence generation method for effectively solving rehash conflict |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||