CN114816322A - External sorting method and device of SSD and SSD memory - Google Patents


Info

Publication number
CN114816322A
Authority
CN
China
Prior art keywords
data
ordered
result
flash memory
ssd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210466201.5A
Other languages
Chinese (zh)
Other versions
CN114816322B (en)
Inventor
肖侬
欧洋
陈文汉
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202210466201.5A
Publication of CN114816322A
Application granted
Publication of CN114816322B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/22 Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
    • G06F7/24 Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers sorting methods in general
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G06F12/0882 Page mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an external sorting method and device for an SSD, a computer device, and a storage medium. The method comprises: acquiring small files read into a memory, sorting the small files to obtain ordered intermediate results, and writing the ordered intermediate results back to a flash memory, wherein the small files are obtained by segmenting large file data; constructing an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate results and the position information of the data page on the flash memory, the index table comprising index information corresponding to the position information; and, during data merging, sorting the index table by the minimum value of the data pages, merging the data according to the sorted index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory. The method can improve the read-write multichannel concurrency of the SSD and its channel resource utilization.

Description

External sorting method and device of SSD and SSD memory
Technical Field
The present application relates to the field of nonvolatile storage technologies, and in particular, to an external sorting method and apparatus for an SSD, and an SSD memory.
Background
With the development of new nonvolatile memory technologies, flash memory has been widely used in large servers, personal mobile devices, embedded sensing devices, and high performance computing systems. Flash memory is block-addressed persistent external storage whose minimum read/write granularity is a flash memory block, whereas newer nonvolatile memories offer a finer read/write granularity. By type, flash memory can be classified into NAND Flash and NOR Flash. A flash-based Solid State Disk (SSD) is a semiconductor storage device built from flash memory chips, and Active SSDs are intelligent SSDs capable of executing part of the computation themselves. ActiveSort is an external sorting algorithm based on Active SSDs. It is divided into two stages: a stage that generates ordered intermediate results and a merging stage. Its main idea is to construct a merging module inside the Active SSD, which effectively reduces data transmission and the number of read and write operations on the SSD. When the host initiates a query request, ActiveSort executes a merge-on-the-fly operation inside the Active SSD and, after the ordered intermediate results are merged, returns the final ordered result to the host.
However, ActiveSort has two main problems. First, ActiveSort does not store the final ordered result on the SSD, so the SSD must perform the merge operation again whenever the host initiates a query request, which increases the computation overhead. Second, ActiveSort cannot predict the order in which data blocks are read in the merging stage: only after a data block in the cache has been processed can the next data block be read from the intermediate result file it came from. The internal read-write multichannel concurrency of the SSD therefore cannot be fully exploited. In particular, when the data are partially ordered, data blocks with larger values remain in the DRAM of the SSD and occupy cache resources, which limits the execution efficiency of the merging algorithm.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an external sorting method and apparatus for SSD and an SSD memory.
An external sorting method for an SSD, the method comprising:
acquiring small files read into a memory, sorting the small files to obtain ordered intermediate results, and writing the ordered intermediate results back to a flash memory, wherein the small files are obtained by segmenting large file data;
constructing an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate results and the position information at which the ordered intermediate results are written back to the flash memory, the index table comprising index information corresponding to the position information;
and, during data merging, sorting the index table by the minimum value of the data pages, merging the data according to the sorted index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory.
In one embodiment, the method further comprises: sequentially reading data pages into an input cache of the SSD according to the sorted index table and the index information; merging the data pages to obtain an ordered result; recording the minimum value of the next data page to be merged; and writing the ordered result back to the flash memory in batches when the values of the data in the input cache do not exceed that minimum value.
In one embodiment, the method further comprises: when data in the input cache are larger than that minimum value, writing the data page containing the smallest data in the ordered result back to the flash memory, reading the next data page to be merged into the input cache, and iteratively merging the data pages in the input cache.
In one embodiment, the method further comprises: providing a read flash channel and a write flash channel between the flash memory and the input cache.
In one embodiment, the method further comprises: the read flash channel processes the read requests of data merging, and data are read through the read flash channel into the input cache.
In one embodiment, the method further comprises: the write flash channel processes the write-back requests of data merging, and the ordered result is written back to the flash memory through the write flash channel.
In one embodiment, the method further comprises: distributing the ordered intermediate results, with the data page as the basic unit, in a staggered manner across the flash memory channels between the memory and the flash memory, and writing the ordered intermediate results back to the flash memory through those channels.
In one embodiment, the number of data pages of the ordered intermediate result is equal to the number of data pages that the SSD input cache can accommodate.
An external sorting apparatus for an SSD, the apparatus comprising:
an ordered intermediate result generation module, used for acquiring small files read into the memory, sorting the small files to obtain ordered intermediate results, and writing the ordered intermediate results back to the flash memory, wherein the small files are obtained by segmenting large file data;
an index table building module, used for building an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate results and the position information of the data page on the flash memory, the index table comprising index information corresponding to the position information;
and a data merging module, used for sorting the index table by the minimum value of the data pages during data merging, merging the data according to the sorted index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring small files read into the memory, sorting the small files to obtain ordered intermediate results, and writing the ordered intermediate results back to the flash memory, wherein the small files are obtained by segmenting large file data;
building an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate results and the position information of the data page on the flash memory, the index table comprising index information corresponding to the position information;
and, during data merging, sorting the index table by the minimum value of the data pages, merging the data according to the sorted index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the following steps:
acquiring small files read into the memory, sorting the small files to obtain ordered intermediate results, and writing the ordered intermediate results back to the flash memory, wherein the small files are obtained by segmenting large file data;
building an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate results and the position information of the data page on the flash memory, the index table comprising index information corresponding to the position information;
and, during data merging, sorting the index table by the minimum value of the data pages, merging the data according to the sorted index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory.
The external sorting method, device, computer device, and storage medium for an SSD divide a large data file into small files that fit in the host-side memory and read each small file into the memory in turn for sorting, obtaining ordered intermediate results that are written back to the SSD in units of data pages. An index table is constructed in the memory of the SSD from the minimum value and the position information of each data page and is sorted by the minimum value of the data pages. The index table determines the order in which data pages are read in the data merging stage, which prevents data pages with larger values from staying resident in memory and improves the execution efficiency of the algorithm. The data are merged according to the sorted index table and the index information to obtain an ordered result, which is written back to the flash memory, so that when the host frequently queries the ordered result it can be output directly without re-executing the merging operation. The embodiments of the invention can improve the read-write multichannel concurrency of the SSD and its channel resource utilization.
Drawings
FIG. 1 is a flow diagram illustrating an external ordering method for an SSD in one embodiment;
FIG. 2 is a schematic diagram illustrating reading and writing of a data page during a data merge phase in IndexSort according to an embodiment;
FIG. 3 is a diagram illustrating the performance impact of different ratios of read and write channels on write-back policy in one embodiment;
FIG. 4 is a schematic illustration of the merge stage of IndexSort in another embodiment;
FIG. 5 is a graphical illustration of the performance impact of different data set sizes on Isort and Baseline in one embodiment;
FIG. 6 is a graphical representation of the performance impact of different DRAM sizes on Isort and Baseline in one embodiment;
FIG. 7 is a block diagram of an external sorting device of the SSD in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and do not limit it. Building on external sorting for Active SSDs, the invention provides an external sorting algorithm called IndexSort.
In one embodiment, as shown in fig. 1, there is provided an external sorting method of an SSD, comprising the steps of:
Step 102: acquire the small files read into the memory, sort the small files to obtain ordered intermediate results, and write the ordered intermediate results back to the flash memory.
The small files are obtained by segmenting the large file data. Because memory space is limited, the large file data are segmented into small files that fit in the memory. Each small file is read into the memory and sorted there to obtain an ordered intermediate result, which is written back to the flash memory of the SSD in units of data pages. The ordered intermediate results are later merged to obtain the final ordered result.
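The run-generation step can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `make_sorted_runs`, the parameters `memory_capacity` and `page_size` (both counted in records), and the use of in-memory lists in place of files and flash pages are all assumptions made for the example.

```python
def make_sorted_runs(records, memory_capacity, page_size):
    """Cut `records` into memory-sized chunks, sort each chunk, split it into pages.

    Returns a list of runs (ordered intermediate results); each run is a list
    of pages, and each page is a sorted list of at most `page_size` records.
    Names and units are hypothetical, chosen only for this sketch.
    """
    if memory_capacity <= 0 or page_size <= 0:
        raise ValueError("memory_capacity and page_size must be positive")
    runs = []
    for start in range(0, len(records), memory_capacity):
        # One "small file": as many records as the host memory can hold.
        chunk = sorted(records[start:start + memory_capacity])
        # Write-back happens in units of data pages.
        pages = [chunk[i:i + page_size] for i in range(0, len(chunk), page_size)]
        runs.append(pages)
    return runs


runs = make_sorted_runs([9, 2, 7, 4, 8, 1, 6, 3, 5], memory_capacity=4, page_size=2)
print(runs)
```

With these inputs the nine records fall into three chunks of at most four records, so the sketch produces three runs whose pages are each internally sorted.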
Step 104: construct an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate results and the position information of the data page on the flash memory.
The index table comprises index information corresponding to the position information. The minimum value of each data page in the ordered intermediate results is detected, and a minimum-value index table of the data pages is constructed from these minimum values and the pages' positions on the flash memory. The index table is sorted in ascending order of page minimum value. Its main function is to determine the order in which data blocks are read from the flash memory into the input cache of the SSD during the merging stage; fixing this read order in advance accelerates the external sort.
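A possible shape for such an index table is sketched below in Python. The `IndexEntry` fields are assumptions: the source only says that each entry pairs a page's minimum value with its position on flash, so `run_id` and `page_id` stand in here for the physical position information.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    min_value: int  # smallest record in the data page
    run_id: int     # hypothetical stand-in for the flash position: which run
    page_id: int    # hypothetical stand-in for the flash position: which page


def build_index_table(runs):
    """Return one entry per non-empty page, sorted by ascending page minimum."""
    entries = [
        IndexEntry(page[0], run_id, page_id)  # pages are sorted, so page[0] is the minimum
        for run_id, run in enumerate(runs)
        for page_id, page in enumerate(run)
        if page
    ]
    entries.sort()  # tuples compare by min_value first
    return entries


table = build_index_table([[[2, 4], [7, 9]], [[1, 3], [6, 8]]])
for entry in table:
    print(entry)
```

Reading pages in the order of `table` is what lets the merging stage know its read sequence before any page has been fetched.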
Step 106: during data merging, sort the index table by the minimum value of the data pages, merge the data according to the sorted index table and the index information to obtain an ordered result, and write the ordered result back to the flash memory.
The data merging of the external sort is carried out inside the SSD, which reduces the amount of data transferred and the number of read and write operations on the SSD, while the internal concurrency and high I/O bandwidth of the SSD improve the execution performance of the algorithm. The merged ordered result is written back to the flash memory and stored there, so when the host frequently queries the ordered result the merging operation does not need to be executed again, which saves time.
In the external sorting method of the SSD, a large data file is divided into small files that fit in the host-side memory, and each small file is read into the memory in turn and sorted to obtain an ordered intermediate result, which is written back to the SSD in units of data pages. An index table is constructed in the memory of the SSD from the minimum value and the position information of each data page and is sorted by the minimum value of the data pages. The order in which data blocks will be read can be predicted from the index table, so the data can be prefetched, which improves the execution performance of the algorithm. The data are merged according to the sorted index table and the index information to obtain an ordered result, which is written back to the flash memory, so that when the host frequently queries the ordered result it can be output directly without re-executing the merging operation. The embodiments of the invention can improve the read-write multichannel concurrency of the SSD and its channel resource utilization.
In one embodiment, merging data according to the sorted index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory, includes: sequentially reading the data pages into an input cache of the SSD according to the sorted index table and the index information; merging the data pages to obtain an ordered result; recording the minimum value of the next data page to be merged; and writing the ordered result back to the flash memory in batches when the values of the data in the input cache do not exceed that minimum value. In this embodiment, merging the data pages according to the index table ensures that the data pages holding the smallest data in the large file are read into the input cache first to participate in merging. Writing back the small-valued data in batches shortens the time that larger-valued data pages stay in memory, frees memory space, and accelerates the merging stage.
In one embodiment, the method further comprises: when data in the input cache are larger than that minimum value, writing the data page containing the smallest data in the ordered result back to the flash memory, reading the next data page to be merged into the input cache, and iteratively merging the data pages in the input cache.
In this embodiment, take as an example the data page read-write diagram of the IndexSort data merging stage shown in fig. 2. Following the order of the index table, pages b1, b2, a1, a2, a3, c1, c2, and a4 are read in sequence and merged by comparison, and the minimum value of the next data page, b3, is recorded as a threshold. If all the data in the data buffer are smaller than the threshold, they are also the smallest data not yet written back to the flash memory, so they can be written back to the flash memory in bulk once sorted. This improves the concurrency of the multiple write channels of the flash memory and the utilization of flash channel resources. It also frees space in the data buffer, so that several data pages can be read at once according to the buffer's current free space, which improves the read multichannel concurrency of the flash memory and the utilization of flash channel resources.
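A single threshold check of this kind might look as follows in Python. This is only a sketch of one step: the function name `split_at_threshold` is hypothetical, and treating records equal to the threshold as safe to write back is an assumption (it is safe because no unread page can contain a smaller record).

```python
import bisect


def split_at_threshold(buffered_records, threshold):
    """Separate buffered records into those safe to write back and those to keep.

    `threshold` is the minimum value of the next unread page in the index table.
    Every unread record is >= threshold, so records <= threshold can be written
    back in one batch without violating the final order.
    """
    ordered = sorted(buffered_records)
    cut = bisect.bisect_right(ordered, threshold)
    return ordered[:cut], ordered[cut:]


ready, kept = split_at_threshold([12, 2, 200, 4], threshold=25)
print(ready, kept)
```

Here the three small records form one write-back batch, while the large record stays in the buffer until a later threshold covers it.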
In one embodiment, the method further comprises: providing a read flash channel and a write flash channel between the flash memory and the input cache; the read flash channel processes the read requests of data merging, and data are read through the read flash channel into the input cache; the write flash channel processes the write-back requests of data merging, and the ordered result is written back to the flash memory through the write flash channel. In this embodiment, a read-write channel separation policy is used to process the read and write requests of data merging. Under this policy, the read flash channel is dedicated to the read requests of data merging, with the input cache reading data through it, and the write flash channel mainly handles the write-back requests of data merging, writing the merged ordered result back to flash. This can raise the channel resource utilization of the SSD, reduce channel idle time, improve read-write multichannel concurrency, and accelerate the algorithm.
In a specific embodiment, fig. 3 shows the performance impact of different read/write channel ratios on the write-back policy. In the figure, base refers to ActiveSort, RT to read time, WT to write-back time, and Avg to average. The write-back policy is either the read/write channel mixing policy or the read/write channel separation policy. Under the mixing policy, every flash memory channel handles both the read requests and the write requests of data merging. The abscissa of fig. 3 gives the ratio of read channels to write channels under the separation policy. The experiments use SSDSim, a configurable, modular solid-state disk simulator, set up with 16 flash memory channels in total; its parameter configuration is shown in the following table:
SSDSim simulator parameter settings
[Table image not reproduced in the source text]
The abscissa value 16:16 denotes the average processing delay of read and write requests under the mixing policy. This policy performs best: all 16 channels serve both the read and the write requests of the merging stage, which reduces idle time on the SSD's flash memory channels, raises flash channel resource utilization, and accelerates the merging stage. The abscissa value 12:4 means that 12 channels serve read requests and 4 channels serve write requests in the merging stage. The merged ordered result is written back through the 4 dedicated channels and does not occupy read-channel resources; however, while the merging stage is reading and merging, the write channels sit idle, so the channel resource utilization of the SSD is not improved much. Under the separation policy, as the number of read channels increases and the number of write channels decreases, the average response time of read requests in the line graph gradually increases and the average processing time of write requests decreases. Overall, the mixing policy makes good use of the read-write multichannel concurrency of the SSD and improves channel resource utilization.
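The two policies differ only in which channels a request may use, which the following Python sketch makes concrete. It is an illustrative model, not SSDSim code: the function name `eligible_channels`, the convention that `read_channels=None` selects the mixing policy, and the assignment of the lowest-numbered channels to reads are all assumptions.

```python
def eligible_channels(kind, total_channels=16, read_channels=None):
    """Return the channel ids allowed to serve a merge request of the given kind.

    kind: "read" or "write".
    read_channels=None models the mixing policy (every channel serves both kinds).
    Otherwise the first `read_channels` ids serve reads and the rest serve writes,
    e.g. read_channels=12 models the 12:4 separation ratio.
    """
    if kind not in ("read", "write"):
        raise ValueError("kind must be 'read' or 'write'")
    channels = list(range(total_channels))
    if read_channels is None:
        return channels
    if not 0 < read_channels < total_channels:
        raise ValueError("separation needs at least one read and one write channel")
    return channels[:read_channels] if kind == "read" else channels[read_channels:]


print(len(eligible_channels("read")), len(eligible_channels("write")))  # mixing policy
print(eligible_channels("write", read_channels=12))                     # 12:4 separation
```

Under the mixing policy both request kinds see all 16 channels, whereas under 12:4 separation write-back is confined to four channels that do nothing while the merge is only reading.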
In one embodiment, writing the ordered intermediate results back to flash memory comprises: and distributing the ordered intermediate results to each flash memory channel between the memory and the flash memory in a staggered manner by taking the data page as a basic unit, and writing the ordered intermediate results back to the flash memory through the flash memory channels. In this embodiment, the ordered intermediate result is written back to the flash memory by using a staggered data placement strategy, which means that the ordered intermediate result is distributed to each channel of the SSD in a staggered manner with the data page as a basic unit, so that the multi-channel write concurrency performance of the SSD can be fully exerted, the channel resource utilization rate is improved, and the efficiency of writing back to the flash memory can be improved.
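One simple way to realize a staggered placement is round-robin striping of pages over channels, sketched below in Python. Round-robin is an assumption; the source does not specify the exact mapping, and the function name `interleave_pages` is hypothetical.

```python
def interleave_pages(num_pages, num_channels):
    """Assign consecutive data pages to flash channels in round-robin order.

    Returns a dict mapping channel id -> list of page ids written through it.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    placement = {channel: [] for channel in range(num_channels)}
    for page_id in range(num_pages):
        placement[page_id % num_channels].append(page_id)
    return placement


print(interleave_pages(num_pages=6, num_channels=4))
```

Because neighbouring pages land on different channels, consecutive page writes (and later reads) can proceed in parallel instead of queueing on one channel.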
In one embodiment, the number of data pages of the ordered intermediate results is equal to the number of data pages that the SSD input cache can accommodate. In this embodiment, the correctness of the algorithm is guaranteed only if the number of data pages the input cache can hold equals the number of data pages of the ordered intermediate result.
In one embodiment, fig. 4 shows the merging stage of IndexSort. In a conventional merging stage, one data page is read from each of the 6 intermediate results into the buffer for merge comparison. When the data are not evenly distributed, a small-valued data page such as (2, 4, 12) and a large-valued data page such as (200, 400, 612) are read into the buffer together. The small values are written back to the flash memory channel first and the large values last, so the large-valued page (200, 400, 612) stays in the buffer, is repeatedly compared against smaller data, and occupies DRAM space until the end of the merging stage, when it is finally written back. In fig. 4, Isort refers to IndexSort, sort refers to sorting, and output buffer refers to the output buffer in flash memory. In the merging stage of IndexSort, 6 ordered intermediate results, called runs, are stored on the flash memory of the SSD. Each run contains three data pages whose values are ordered from small to large, and the input buffer in the SSD has room for 6 data pages. The first 6 data pages are read into the input buffer in the order given by the index table (page_min_index) and sorted; these 6 pages, which hold the smallest minimum values, may come from the same run. The minimum value of the seventh data page in the index table is 25, so the data in the buffer smaller than 25 can be written back to the flash memory directly: because those values are below 25, they are the smallest values not yet written back and can be written back in batches.
In this embodiment, the IndexSort algorithm solves the problem of large-valued data pages staying in memory when the data are partially ordered, and improves the resource utilization of the DRAM. Data below the threshold can be written back to the flash memory in batches, and reads are not limited to one data page at a time. The threshold is the minimum value of the first data page in the index table that has not yet been read into the DRAM. During IndexSort merging, the data pages in the DRAM that fall below the threshold are written back to the flash memory concurrently, which frees buffer space for reading the next data pages from the flash memory. This improves the multichannel resource utilization of the SSD and the concurrency of multichannel reads and writes.
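The whole merging loop can be simulated in Python as below. This is a hedged sketch under stated assumptions, not the patent's implementation: runs are in-memory lists, the function name `index_sort_merge` is hypothetical, the buffer capacity is treated as a soft limit (at least one page is always read per round so the loop makes progress), and the example data are invented for illustration rather than taken from fig. 4.

```python
import heapq


def index_sort_merge(runs, buffer_pages):
    """Merge sorted runs by reading pages in index-table order.

    Returns (output, batches): the fully ordered records, and the list of
    write-back batches in the order they would be flushed to flash.
    """
    # Index table: (page minimum, run id, page id), sorted by page minimum.
    index = sorted(
        (page[0], run_id, page_id)
        for run_id, run in enumerate(runs)
        for page_id, page in enumerate(run)
        if page
    )
    page_size = max((len(page) for run in runs for page in run), default=0)
    capacity = buffer_pages * page_size  # buffer size in records (soft limit)
    buffer, output, batches = [], [], []
    next_entry = 0
    while next_entry < len(index) or buffer:
        # Read phase: fetch pages in index order while they fit in the buffer.
        pages_read = 0
        while next_entry < len(index):
            _, run_id, page_id = index[next_entry]
            page = runs[run_id][page_id]
            if pages_read and len(buffer) + len(page) > capacity:
                break
            for record in page:
                heapq.heappush(buffer, record)
            next_entry += 1
            pages_read += 1
        # Threshold: minimum of the next unread page (None once all pages are read).
        threshold = index[next_entry][0] if next_entry < len(index) else None
        # Write-back phase: flush every buffered record not above the threshold.
        batch = []
        while buffer and (threshold is None or buffer[0] <= threshold):
            batch.append(heapq.heappop(buffer))
        output.extend(batch)
        batches.append(batch)
    return output, batches


example_runs = [
    [[2, 4, 12], [30, 41, 50]],
    [[5, 9, 200], [400, 612, 700]],
    [[1, 3, 25], [26, 27, 28]],
]
merged, batches = index_sort_merge(example_runs, buffer_pages=3)
print(merged)
print([len(batch) for batch in batches])
```

In this example the first round reads the three pages with the smallest minimums, takes 26 as the threshold, and flushes eight records at once; only the record 200 remains buffered, and it leaves in the next batch rather than waiting for the end of the merge.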
In one embodiment, as shown in fig. 5, a schematic diagram of the performance impact of different data set sizes on isoport and Baseline is provided. In the figure, isoport refers to IndexSort, Baseline refers to ActiveSort, RT refers to read time, WT refers to write-back time, and Avg refers to average. The time on the ordinate is the average response time of read and write requests, which is a good measure of the response time of the external sorting algorithm during the merging performed inside the SSD. The data volumes are set to 100M, 200M, 400M, 800M and 1GB. The average response time of read and write requests increases with the data volume for both algorithms, but IndexSort outperforms Baseline (ActiveSort) at every data volume.
In a specific embodiment, as shown in fig. 6, a schematic diagram of the performance impact of different DRAM sizes on abort and Baseline is provided, where abort refers to IndexSort, Baseline refers to ActiveSort, RT refers to read time, WT refers to write-back time, and Avg refers to average. The capacity of the DRAM inside the SSD is limited, but the DRAM can buffer part of the read and write requests and thereby shorten their processing time inside the SSD. As the DRAM capacity increases, the average response time of read and write requests decreases for both algorithms. At every DRAM size, the average response time of read and write requests of IndexSort is lower than that of ActiveSort, and the performance is improved by nearly 36%.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time and may be performed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, there is provided an external sorting apparatus of an SSD, including: an ordered intermediate result generation module 702, an index table construction module 704, and a data merge module 706, wherein:
the ordered intermediate result generation module 702 is configured to obtain the small files read into the memory, sort the small files to obtain ordered intermediate results, and write the ordered intermediate results back to the flash memory, wherein the small files are obtained by segmenting the data of the large file;
an index table constructing module 704, configured to construct an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate result and the location information of the data page on the flash memory; the index table comprises index information corresponding to the position information;
and the data merging module 706 is configured to sort the index table according to the minimum value of the data page when merging the data, merge the data according to the sorting result of the index table and the index information to obtain an ordered result, and write the ordered result back to the flash memory.
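The division of work among the three modules can be sketched as three small functions. This is a simplified in-memory model under stated assumptions: the function names, PAGE_SIZE and RUN_PAGES are hypothetical, Python lists stand in for flash pages, a heap stands in for the input cache, and the sketch does not enforce a buffer capacity.

```python
import heapq

PAGE_SIZE = 3   # records per data page (assumed)
RUN_PAGES = 2   # data pages per ordered intermediate result (assumed)


def generate_runs(data, page_size=PAGE_SIZE, run_pages=RUN_PAGES):
    """Role of module 702: split the input, sort each piece, cut it into pages."""
    chunk = page_size * run_pages
    runs = []
    for start in range(0, len(data), chunk):
        ordered = sorted(data[start:start + chunk])
        runs.append([ordered[i:i + page_size]
                     for i in range(0, len(ordered), page_size)])
    return runs


def build_index(runs):
    """Role of module 704: one (page_min, run_id, page_no) entry per page."""
    return [(page[0], run_id, page_no)
            for run_id, run in enumerate(runs)
            for page_no, page in enumerate(run)]


def merge(runs, index):
    """Role of module 706: read pages in page-minimum order, flush up to the threshold."""
    order = sorted(index)
    buffer_heap, output = [], []
    for pos, (_, run_id, page_no) in enumerate(order):
        for value in runs[run_id][page_no]:
            heapq.heappush(buffer_heap, value)
        # Threshold: minimum of the next unread page, or None after the last page.
        threshold = order[pos + 1][0] if pos + 1 < len(order) else None
        while buffer_heap and (threshold is None or buffer_heap[0] <= threshold):
            output.append(heapq.heappop(buffer_heap))
    return output


data = [9, 1, 7, 3, 8, 2, 6, 5, 4, 10]
runs = generate_runs(data)
result = merge(runs, build_index(runs))
```

The merge is correct because every value still on flash is at least the threshold, so anything at or below it can be emitted without further comparison against unread pages.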
In one embodiment, the ordered intermediate result generating module 702 is further configured to distribute the ordered intermediate results to each flash memory channel between the memory and the flash memory in a staggered manner by using the data page as a basic unit, and write the ordered intermediate results back to the flash memory through the flash memory channel.
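One plausible reading of the staggered distribution is round-robin striping of data pages over the flash channels. The sketch below assumes that reading; the function name stripe_pages and the channel count are hypothetical.

```python
def stripe_pages(pages, num_channels):
    """Assign page i to channel i % num_channels (round-robin striping).

    Returns one list per channel holding (page_no, page) pairs.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    channels = [[] for _ in range(num_channels)]
    for page_no, page in enumerate(pages):
        channels[page_no % num_channels].append((page_no, page))
    return channels


# Five pages of one ordered intermediate result spread over four channels.
run_pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
layout = stripe_pages(run_pages, 4)
```

With this layout, consecutive pages of a run sit on different channels, so several of them can be read or written at the same time.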
In one embodiment, the data merging module 706 is further configured to read data pages into the input cache of the SSD in sequence according to the sorting result of the index table and the index information; merge the data pages to obtain an ordered result and record the minimum value of the next data page to be merged; and write the ordered result back to the flash memory in batches when the values of the data in the input cache do not exceed that minimum value.
In one embodiment, the apparatus is further configured to: when the maximum value of one ordered interval is less than the minimum value of another ordered interval, merge the two ordered intervals to obtain a merged ordered interval, and update the index table according to the position information of the merged ordered interval.
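The interval-merging condition can be sketched as a pass over index entries sorted by minimum value. The Interval type, its field names, and the location strings are hypothetical; the sketch also assumes that the maximum of each interval is available alongside its minimum, and it only coalesces neighbours in index order.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    lo: int        # minimum value of the ordered interval
    hi: int        # maximum value of the ordered interval (assumed to be tracked)
    pages: list    # flash locations of the interval's pages, in order


def coalesce(intervals):
    """Join neighbours when the earlier one's maximum is below the later one's minimum.

    Such intervals do not overlap, so concatenating their page lists yields
    one longer ordered interval without comparing any records.
    """
    merged = []
    for item in sorted(intervals, key=lambda iv: iv.lo):
        if merged and merged[-1].hi < item.lo:
            last = merged[-1]
            merged[-1] = Interval(last.lo, item.hi, last.pages + item.pages)
        else:
            merged.append(Interval(item.lo, item.hi, list(item.pages)))
    return merged


table = [
    Interval(10, 20, ["run1/page0"]),
    Interval(1, 9, ["run0/page0"]),
    Interval(15, 30, ["run2/page0"]),
]
updated = coalesce(table)
```

Here the first two intervals in minimum order are disjoint and become one entry, while the third overlaps the merged range and stays a separate entry that still needs a real merge.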
In one embodiment, the apparatus is further configured to: when the values of the data in the input cache are larger than the minimum value, write the data page containing the smallest data in the ordered result back to the flash memory, read the next data page to be merged into the input cache, and iteratively merge the data pages in the input cache.
In one embodiment, the apparatus is further configured to set a read flash channel and a write flash channel between the flash memory and the input cache.
In one embodiment, the apparatus is further configured such that the read flash channel processes the read requests of data merging, and data is read from the read flash channel into the input cache.
In one embodiment, the apparatus is further configured such that the write flash channel processes the write-back requests of data merging, the ordered result is written to the write flash channel, and the ordered result is written back to the flash memory through the write flash channel.
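The separation of read and write flash channels can be sketched as two disjoint sets of request queues. The class name ChannelSplit, the channel numbers, and the least-loaded queue selection are all assumptions made for illustration; the text only states that read requests and write-back requests use different channels.

```python
from collections import deque


class ChannelSplit:
    """Route merge-phase requests to dedicated read or write channels."""

    def __init__(self, read_channels, write_channels):
        self.read_queues = {ch: deque() for ch in read_channels}
        self.write_queues = {ch: deque() for ch in write_channels}

    def submit(self, kind, page_id):
        """Queue a request on the least-loaded channel of the matching kind."""
        if kind == "read":
            queues = self.read_queues
        elif kind == "write":
            queues = self.write_queues
        else:
            raise ValueError("kind must be 'read' or 'write'")
        channel = min(queues, key=lambda ch: len(queues[ch]))
        queues[channel].append(page_id)
        return channel


split = ChannelSplit(read_channels=[0, 1], write_channels=[2, 3])
read_a = split.submit("read", "run0/page0")
read_b = split.submit("read", "run1/page0")
write_a = split.submit("write", "out/page0")
```

Because the two queue sets never share a channel, a burst of write-backs in this model cannot delay the reads that refill the input cache.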
In one embodiment, the apparatus is further configured such that the number of data pages of each ordered intermediate result is equal to the number of data pages that the SSD input cache can accommodate.
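The sizing rule above fixes the number of ordered intermediate results once the input size is known. The arithmetic can be sketched as follows; the 16 KiB page size and the 1024-page input cache are hypothetical figures chosen only for the example.

```python
def plan_runs(total_bytes, page_bytes, cache_pages):
    """Return (total_pages, pages_per_run, num_runs) when each run fills the input cache."""
    if page_bytes <= 0 or cache_pages <= 0:
        raise ValueError("page_bytes and cache_pages must be positive")
    total_pages = -(-total_bytes // page_bytes)   # ceiling division
    pages_per_run = cache_pages                   # run size equals cache capacity
    num_runs = -(-total_pages // pages_per_run)
    return total_pages, pages_per_run, num_runs


# 100 MiB of input, 16 KiB pages, an input cache of 1024 pages (all assumed).
plan = plan_runs(100 * 1024 * 1024, 16 * 1024, 1024)
```

In this example the input spans 6400 pages, which yields six full runs plus one partial run.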
For specific limitations of the external sorting apparatus of the SSD, reference may be made to the above limitations of the external sorting method of the SSD, which are not repeated here. The modules in the external sorting apparatus of the SSD described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In an embodiment, a computer device is provided, comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of the method in the above embodiments when executing the computer program.
In one embodiment, an SSD memory is provided, the SSD memory being obtained by performing the steps of the external ordering method of the SSD in the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. The non-volatile memory, including the SSD memory, referred to in the embodiments provided herein may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but any combination of these technical features should be considered to fall within the scope of the present specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of external ordering of an SSD, the method comprising:
acquiring small files read into a memory, sequencing the small files to obtain an ordered intermediate result, and writing the ordered intermediate result back into a flash memory, wherein the small files are obtained by segmenting large file data;
constructing an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate result and the position information of the data page on the flash memory; the index table comprises index information corresponding to the position information;
and when data merging is carried out, sorting the index table according to the minimum value of the data page, carrying out data merging according to the sorting result of the index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory.
2. The method of claim 1, wherein merging data according to the sorting result of the index table and the index information to obtain an ordered result, and writing back the ordered result to the flash memory comprises:
sequentially reading data pages to an input cache of the SSD according to the sorting result of the index table and the index information;
merging the data pages to obtain an ordered result, recording the minimum value of a data page to be merged, and writing the ordered result back to the flash memory in batches when the size of data in the input cache does not exceed the minimum value.
3. The method of claim 2, further comprising: and when the size of the data in the input cache is larger than the minimum value, writing the data page containing the minimum data in the ordered result back to the flash memory, reading the next data page to be merged into the input cache, and iteratively merging the data pages in the input cache.
4. The method according to any one of claims 1-3, further comprising:
and setting a flash reading channel and a flash writing channel between the flash memory and the input buffer.
5. The method of claim 4, wherein the read flash channel processes a read request for merging data, and reads data from the read flash channel to the input buffer.
6. The method of claim 4, wherein the write flash channel processes a write-back request for merging data, writes the ordered result back to the write flash channel, and writes the ordered result back to flash via the write flash channel.
7. The method of claim 1, wherein writing the ordered intermediate results back to flash comprises:
and distributing the ordered intermediate results to each flash memory channel between the memory and the flash memory in a staggered manner by taking the data page as a basic unit, and writing the ordered intermediate results back to the flash memory through the flash memory channels.
8. The method of any of claims 1, 2, 3, or 7, wherein the number of data pages of the ordered intermediate result is equal to the number of data pages that can be accommodated by the SSD input cache.
9. An external ordering apparatus of an SSD, the apparatus comprising:
the ordered intermediate result generation module is used for acquiring small files read into the memory, sequencing the small files to obtain ordered intermediate results, and writing the ordered intermediate results back to the flash memory, wherein the small files are obtained by segmenting large file data;
the index table building module is used for building an index table in the memory of the SSD according to the minimum value of each data page in the ordered intermediate result and the position information of the data page on the flash memory; the index table comprises index information corresponding to the position information;
and the data merging module is used for sorting the index table according to the minimum value of the data page during data merging, merging the data according to the sorting result of the index table and the index information to obtain an ordered result, and writing the ordered result back to the flash memory.
10. An SSD memory, characterized in that said SSD memory is obtained by performing the steps of the external ordering method of the SSD according to any of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210466201.5A CN114816322B (en) 2022-04-29 2022-04-29 SSD external ordering method, SSD external ordering device and SSD memory

Publications (2)

Publication Number Publication Date
CN114816322A (en) 2022-07-29
CN114816322B (en) 2024-08-27

Family

ID=82509764

Country Status (1)

Country Link
CN (1) CN114816322B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150220583A1 (en) * 2014-01-31 2015-08-06 Microsoft Corporation External data access with split index
WO2020041928A1 (en) * 2018-08-27 2020-03-05 深圳市锐明技术股份有限公司 Data storage method and system and terminal device
US20200183604A1 (en) * 2018-12-07 2020-06-11 Samsung Electronics Co., Ltd. Partitioning graph data for large scale graph processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汤显;孟小峰;: "FClock:一种面向SSD的自适应缓冲区管理算法", 计算机学报, no. 08, 15 August 2010 (2010-08-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303140A (en) * 2023-05-19 2023-06-23 珠海妙存科技有限公司 Hardware-based sorting algorithm optimization method and device
CN116303140B (en) * 2023-05-19 2023-08-29 珠海妙存科技有限公司 Hardware-based sorting algorithm optimization method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant