CN113220241A

CN113220241A - Cross-layer design-based hybrid SSD performance and service life optimization method

Info

Publication number: CN113220241A
Application number: CN202110582732.6A
Authority: CN
Inventors: 顾能华; 吕梅蕾; 陈勇; 叶文通; 朱秋琴; 徐拥华
Original assignee: Quzhou University
Current assignee: Quzhou University
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2021-08-06

Abstract

The invention discloses a hybrid SSD performance and service life optimization method based on cross-layer design, which comprises the following steps: centralizing the identification of load characteristics, traditionally dispersed across the BML and the FTL, to form a new WAL layer, and designing and optimizing the WAL layer; forming a page/block adaptive BML by adding a Normal region between the Hot region and the Cold region; and designing and optimizing the FTL, which comprises splitting the CMT of the DFTL into an H-CMT, a CMT and an S-CMT, optimizing the distribution of FTL layer data between SLC and MLC and the wear balance between them, and adding a programming mode selection module to the FTL layer so that writes make use of deep-level flash memory characteristics. According to the invention, by adopting an SLC and MLC hybrid chip structure in the NAND flash memory array layer, the firmware design problem of the SSD based on the SLC and MLC hybrid structure is solved, a compromise among the performance, the service life and the cost of the SSD based on the hybrid structure is realized, the read-write performance of the hybrid SSD based on the cross-layer design is greatly improved, the service life is greatly prolonged, and the cost is greatly reduced.

Description

Cross-layer design-based hybrid SSD performance and service life optimization method

Technical Field

The invention belongs to the technical field of SSD storage, and particularly relates to a hybrid SSD performance and service life optimization method based on cross-layer design.

Background

Over the last half century, with continuous progress in computer architecture and chip fabrication technology, the gap between CPU performance and the input/output (IO) performance of computer systems has kept widening. The bottleneck of IO performance is the hard disk drive (HDD). Although HDD capacity has increased greatly over the years, access speed has improved only to a limited extent because of the mechanical rotating structure. For example, over the past 20 years CPU frequency has increased by about 600 times, while hard disk speed has increased by only 20 times. Compared with a magnetic disk, flash memory is a high-speed, low-power, shock-resistant, small, light and portable chip-level storage medium, and is considered a key component for improving the IO performance of computer systems. In the last 10 years, through the combined efforts of industry and academia, flash memory technology has advanced greatly, and a major shift in which solid state drives (SSDs) based on NAND flash memory replace HDDs is taking place in the storage field.

An SSD based on NAND flash memory generally consists of a host interface layer, a Buffer Management Layer (BML), a Flash Translation Layer (FTL), and a NAND Flash Array Layer (FAL). The host interface layer is responsible for communicating with the host. The BML manages the data buffer of the SSD and is a key component for improving performance and prolonging service life. The FTL makes the SSD appear as a traditional hard disk that supports only read and write operations, so that it works with current file systems; it generally consists of three modules, namely address mapping, garbage collection and wear leveling, of which address mapping is the core. The FAL is responsible for the actual physical data storage and consists of a plurality of NAND flash memory chips. The BML and the FTL are mostly software and are the key firmware in SSD design.

Currently, prior-art SSD firmware design suffers from the following three drawbacks: (1) the FTL and the BML are designed separately, without cooperative design; (2) firmware design for hybrid SSDs is lacking; (3) SSD firmware design makes little use of deep-level flash memory characteristics.

Disclosure of Invention

The invention aims to provide a cross-layer design-based hybrid SSD performance and service life optimization method which, by optimizing the WAL layer, the BML layer and the FTL layer, solves the problems that existing methods lack cooperative design of the FTL and the BML, lack hybrid SSD firmware design, and lack SSD firmware design that uses deep-level flash memory characteristics.

In order to solve the technical problems, the invention is realized by the following technical scheme:

the invention relates to a hybrid SSD performance and service life optimization method based on cross-layer design, which comprises the following steps:

S1, centralizing the identification of load characteristics, traditionally dispersed across the BML and the FTL, to form a new WAL layer, and designing and optimizing the WAL layer;

S2, designing and optimizing the BML;

S3, designing and optimizing the FTL;

the optimization method of step S1 is as follows:

S11, for any request, first judging whether it hits in the BML; if so, updating its access mode according to the access history;

S12, if it does not hit in the BML, judging whether it hits in the FTL; if it hits in the FTL, using the previous access mode to predict the current mode; otherwise, identifying a rough access mode according to the request size or the relation between the logical page number of the current request and the logical page numbers of the requests in the BML;

S13, when the BML eliminates a data item, sending its access mode to the FTL as a parameter;

S14, the FTL adding a 2-bit parameter to each logical mapping item for recording the access mode;

the BML in step S2 is a page/block adaptive BML, and is divided into a Hot region, a Normal region, and a Cold region; the data page migration and elimination method among the Hot region, the Normal region and the Cold region is as follows:

the data pages of two adjacent regions can be migrated to each other; that is, only two types of data migration exist: between the Hot region and the Normal region, and between the Normal region and the Cold region;

S21, when a request does not hit in the BML, processing it according to the access mode identified in step S1: sequential accesses are loaded into the Cold region, and other accesses are loaded into the Normal region;

S22, after a request hits in the BML, judging which of the Hot region, the Normal region and the Cold region it hits, and then carrying out the corresponding data processing;

S23, when the space of the three regions is insufficient, performing page/block elimination;

S24, when selecting a block to eliminate from the Cold region, considering not only the least-recent-access principle of the Cold region but also the garbage collection efficiency of each block, so as to reduce the garbage collection cost of the FTL;

the optimization method of step S3 is as follows:

S31, classifying data by splitting the CMT of the DFTL into an H-CMT, a CMT and an S-CMT, wherein the H-CMT caches frequently accessed mapping items, the S-CMT caches sequentially accessed mapping items, and the CMT caches ordinary randomly accessed mapping items; the H-CMT is managed at fine granularity by single mapping item, and the CMT and the S-CMT are clustered by translation page;

S32, optimizing the distribution of FTL layer data between SLC and MLC and the wear balance between them;

S33, adding a programming mode selection module to the FTL layer of step S32, so that writes are aware of, and make use of, deep-level flash memory characteristics.
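Steps S11 to S14 above can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the four-value `AccessMode` encoding, the `SEQ_SIZE_PAGES` threshold, the "two hits means hot" rule and the dictionary stand-ins for the BML and the FTL mapping table are all assumptions made for the example.

```python
from enum import IntEnum

class AccessMode(IntEnum):
    """Four values fit in the 2-bit field of step S14 (this encoding is an assumption)."""
    UNKNOWN = 0
    RANDOM = 1
    SEQUENTIAL = 2
    HOT = 3

SEQ_SIZE_PAGES = 8  # hypothetical: requests at least this large count as sequential

def identify_mode(lpn, size_pages, bml, ftl_modes):
    """Return the access mode of a request starting at logical page `lpn`.

    bml:       dict lpn -> hit count of buffered pages (stand-in for the BML)
    ftl_modes: dict lpn -> AccessMode kept with each FTL mapping item
    """
    if lpn in bml:                      # S11: hit in BML, update from access history
        bml[lpn] += 1
        return AccessMode.HOT if bml[lpn] >= 2 else AccessMode.RANDOM
    if lpn in ftl_modes:                # S12: hit in FTL, reuse the previous mode
        return ftl_modes[lpn]
    # S12, miss in both: rough guess from request size or adjacency to a buffered page
    if size_pages >= SEQ_SIZE_PAGES or (lpn - 1) in bml:
        return AccessMode.SEQUENTIAL
    return AccessMode.RANDOM

def eliminate_from_bml(lpn, mode, bml, ftl_modes):
    """S13/S14: on elimination, pass the mode to the FTL, which stores it in 2 bits."""
    bml.pop(lpn, None)
    ftl_modes[lpn] = AccessMode(mode & 0b11)
```

Keeping the mode in the mapping item means a page that returns after elimination can be classified from its history instead of from the rough size-based guess.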

Further, the load characteristics are divided into macroscopic characteristics and microscopic characteristics in the step S1; the load macroscopic characteristic analysis is realized by adopting a sectional statistical method;

the microscopic characteristics of the load are analyzed by a machine learning based hot data identification algorithm; the hot data identification algorithm is divided into an off-line learning stage and an on-line learning stage.

Further, in the off-line learning stage, feature modeling and classification are manually carried out on each request, so that a training set is obtained, then machine learning is carried out by using the training set, and finally effective features and model parameters are output;

in the on-line learning stage, when each request arrives, feature extraction is firstly carried out, then the trained classification model is directly used for classification, and a small training sample set is collected on line for on-line machine learning to obtain model parameters, so that the classification model can adapt to the change of load characteristics.
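The two learning stages can be illustrated with a small sketch. The text does not name a model or a feature set, so the logistic-regression classifier trained by stochastic gradient descent, the two normalised features and the learning rate below are all assumptions chosen only to show the off-line/on-line split.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class HotDataClassifier:
    """Binary hot/cold classifier trained by stochastic gradient descent."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def is_hot(self, x):
        return self.predict_proba(x) >= 0.5

    def update(self, x, label):
        """One SGD step on a single labelled sample (1 = hot, 0 = cold)."""
        err = self.predict_proba(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def fit_offline(self, samples, epochs=200):
        for _ in range(epochs):
            for x, label in samples:
                self.update(x, label)

# Hypothetical features: (normalised update frequency, normalised recency)
training_set = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
clf = HotDataClassifier(n_features=2)
clf.fit_offline(training_set)      # off-line stage: learn from the labelled set
hot = clf.is_hot((0.85, 0.9))      # on-line stage: classify an arriving request
clf.update((0.85, 0.9), 1)         # on-line stage: adapt with a freshly collected sample
```

The same `update` step serves both stages, which is one simple way to let the model follow changes in the load.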

Furthermore, the Hot region, the Normal region and the Cold region all adopt different organization modes;

the Hot region is organized by page, which is fine-grained, and is sorted by priority value; the Normal region is organized by virtual block, which is medium-grained; the Cold region is organized by logical block, which is coarse-grained.

Further, the priority value of the Hot zone is calculated as follows:

P1 = f1(ti, tl)

where P1 is defined as the priority value, ti is defined as the page/block average update interval, and tl is defined as the last update time.

Further, when a request in step S22 hits in the first half of the Normal region, the request is migrated from the Normal region to the Hot region, otherwise, the request is reordered according to the least recently accessed virtual block rule;

if the request hits the Cold region, it is migrated to the Normal region, which is sorted by logical block least recently accessed.
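The hit and miss handling across the three regions can be sketched as below. This is a simplified model under stated assumptions: every region is a plain list ordered from most to least recently used, a single key stands in for a page, virtual block or logical block, the Hot region is refreshed by recency rather than by the priority value P1, and capacity limits and elimination are left out.

```python
class ThreeRegionBuffer:
    """Hot/Normal/Cold queues; index 0 is the head (most recently used)."""

    def __init__(self):
        self.hot, self.normal, self.cold = [], [], []

    def access(self, key, sequential=False):
        """Serve one request for `key` and return the region it ends up in."""
        if key in self.hot:                      # hit in Hot: refresh its position
            self.hot.remove(key)
            self.hot.insert(0, key)
            return "hot"
        if key in self.normal:
            first_half = self.normal.index(key) < (len(self.normal) + 1) // 2
            self.normal.remove(key)
            if first_half:                       # hit in first half: Normal -> Hot
                self.hot.insert(0, key)
                return "hot"
            self.normal.insert(0, key)           # otherwise re-sort within Normal
            return "normal"
        if key in self.cold:                     # hit in Cold: Cold -> Normal
            self.cold.remove(key)
            self.normal.insert(0, key)
            return "normal"
        # miss: sequential accesses load into Cold, all others into Normal
        target = self.cold if sequential else self.normal
        target.insert(0, key)
        return "cold" if sequential else "normal"
```

Data only ever moves between adjacent regions here, which matches the two migration types the text allows.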

Further, when the page/block elimination is performed in step S23, the Hot area and the Normal area eliminate the page/block at the end of the queue to the next area, the Cold area selects an elimination block according to the idea of step S24, and sends the block data to the FTL, which determines to write to a suitable flash memory location;

the elimination order of the blocks in step S24 is calculated as follows:

P2 = f2(tl, n, D)

wherein, P2 is defined as the culling priority of Cold blocks, tl is defined as the last update time of the blocks, n is defined as the number of dirty pages contained, and D is defined as the location distribution of the dirty pages in the flash memory.

Further, during the classification processing of step S31, when a request does not hit in the H-CMT, the CMT or the S-CMT, sequential requests are loaded into the S-CMT and other requests into the CMT according to the request access mode identified in step S1; when a request hits in the CMT or the S-CMT, its mapping item is promoted to the H-CMT;

during elimination, the H-CMT adopts a simple least-recent-access principle to eliminate the mapping item at the end of the queue into the CMT, and the CMT and the S-CMT adopt a batch elimination principle;

when the CMT and S-CMT remove the translation pages in batches, the removal priority P3 of the translation pages is calculated as follows:

P3 = f3(tl, n), where tl is defined as the last access time of the translation page and n is defined as the number of dirty mapping items in the page.
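Batch elimination by translation page can be sketched as follows. The text leaves f3 unspecified, so the linear form used here, its weights, and the choice that both an older last access and more dirty mapping items raise the priority are assumptions, as is the figure of 512 mapping items per translation page.

```python
from collections import defaultdict

ENTRIES_PER_TRANSLATION_PAGE = 512  # hypothetical number of mapping items per page

def f3(tl, n, now, w_age=1.0, w_dirty=1.0):
    """Hypothetical P3: older last access and more dirty items -> eliminate sooner."""
    return w_age * (now - tl) + w_dirty * n

def pick_translation_page(cmt, now):
    """cmt: dict lpn -> (ppn, dirty, last_access). Return (victim_page, lpns)."""
    groups = defaultdict(list)
    for lpn in cmt:
        groups[lpn // ENTRIES_PER_TRANSLATION_PAGE].append(lpn)

    def priority(page):
        lpns = groups[page]
        tl = max(cmt[l][2] for l in lpns)        # last access time of the page
        n = sum(1 for l in lpns if cmt[l][1])    # dirty mapping items in the page
        return f3(tl, n, now)

    victim = max(groups, key=priority)
    return victim, sorted(groups[victim])

def eliminate_batch(cmt, now):
    """Drop every cached item of the victim page and return its dirty items,
    which would be written back in a single translation-page update."""
    victim, lpns = pick_translation_page(cmt, now)
    dirty = {l: cmt[l][0] for l in lpns if cmt[l][1]}
    for l in lpns:
        del cmt[l]
    return victim, dirty
```

Grouping by translation page is what lets several dirty mapping items share one flash write instead of one write each.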

Further, in order to map the data corresponding to part of the H-CMT mapping items to the SLC, a variable is added to each mapping item of the H-CMT to record the number of updates (writes) of that mapping item; wear balance inside the SLC and inside the MLC is realized by an existing wear balance algorithm, and the normalized wear degrees of the SLC and the MLC are calculated as follows:

rw_s = (total_e_s × l_m) / (n_s × l_s)
rw_m = total_e_m / n_m

wherein rw_s and rw_m are the relative wear degrees of the SLC and the MLC respectively, total_e_s and total_e_m are the total erase counts of the SLC and the MLC respectively, n_s and n_m are the total block numbers of the SLC and the MLC respectively, and l_s and l_m are the wear upper limits of the SLC and the MLC respectively.
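A short worked example may help with the two wear formulas. It reads the first formula with the product of the SLC block count and the SLC wear upper limit as the denominator, which is an assumption about the original typesetting, and every number below is hypothetical.

```python
def relative_wear(total_es, ns, ls, total_em, nm, lm):
    """Return (rws, rwm): SLC wear rescaled to the MLC limit, and MLC wear per block."""
    rws = total_es * lm / (ns * ls)
    rwm = total_em / nm
    return rws, rwm

# Hypothetical device: 100 SLC blocks rated for 100,000 erases each,
# 900 MLC blocks rated for 10,000 erases each.
rws, rwm = relative_wear(
    total_es=2_000_000, ns=100, ls=100_000,
    total_em=900_000, nm=900, lm=10_000,
)
# SLC: 20,000 erases per block is 20% of its limit, i.e. 2,000 MLC-equivalent erases.
# MLC: 1,000 erases per block.
```

Under this reading both values are expressed in MLC-equivalent erase counts per block, so they can be compared directly.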

The invention has the following beneficial effects:

1. according to the invention, by adopting the SLC and MLC mixed chip structure in the NAND flash memory array layer, the firmware design problem of the SSD based on the SLC and MLC mixed structure is solved, the compromise among the performance, the service life and the cost of the SSD based on the mixed structure is realized, the read-write performance of the mixed SSD based on the cross-layer design is greatly improved, the service life is greatly prolonged, and the cost is greatly reduced.

2. According to the invention, the design of the BML and the FTL is optimized by adopting a cross-layer design method, the macro/micro characteristics of the load based on the cooperation of the BML and the FTL are identified in real time, and the BML design sensed by the FTL and the FTL design sensed by the flash memory array layer are designed, so that the performance of the hybrid SSD is greatly improved.

3. The invention provides wider space for the read-write performance optimization and the service life extension of the hybrid SSD by utilizing the compromise among the P/E times, the data storage time and the programming speed based on the programming mode selection of the deep level characteristics of the flash memory.

Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a block diagram of a hybrid SSD based on a cross-layer design;

FIG. 2 is a block diagram of machine learning based hot data identification;

FIG. 3 is a general structural diagram of the FTL.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1-3, the present invention is a hybrid SSD performance and lifetime optimization method based on cross-layer design, the optimization method includes:

s2, designing and optimizing the BML;

s3, designing and optimizing the FTL;

the optimization method of step S1 is as follows:

S14, the FTL adds a 2-bit parameter to each logical mapping item for recording the access mode; it can be seen that, through the cooperation of the BML and the FTL, the accuracy of the BML in identifying access modes is exploited and the FTL can store more access modes, at a cost of only 2 extra bits of storage per mapping item;

the optimization method of step S3 is as follows:

S31, classifying data by splitting the CMT of the DFTL into an H-CMT, a CMT and an S-CMT, wherein the H-CMT caches frequently accessed mapping items, the S-CMT caches sequentially accessed mapping items, and the CMT caches ordinary randomly accessed mapping items; the H-CMT is managed at fine granularity by single mapping item, and the CMT and the S-CMT are clustered by translation page, that is, mapping items belonging to the same translation page are clustered together for management;

S32, optimizing the distribution of FTL layer data between SLC and MLC and the wear balance between them, so that hot data are stored in the SLC and cold data are stored in the MLC;

S33, adding a programming mode selection module to the FTL layer of step S32, so that writes are aware of, and make use of, deep-level flash memory characteristics. The module trades off the number of P/E cycles, the programming speed and the data storage (retention) time: when the load is heavy, hot data are written to the flash memory with fast writes, which improves the write performance of the SSD; when the load is light, cold data are written with slow writes, which increases the number of P/E cycles the SSD can sustain. A mechanism is also designed to ensure that, if fast-written data are still valid when their storage time is about to expire, the data are rewritten, so that the guaranteed data storage time remains unchanged.
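The programming mode selection of step S33 can be sketched as follows. The storage times, the refresh margin, the queue-depth test for a heavy load, the fallback to slow writes in the cases the text does not cover, and the use of a slow write for the refresh are all assumptions made for illustration.

```python
from dataclasses import dataclass

DAY = 24 * 3600
FAST_RETENTION = 30 * DAY    # hypothetical storage time after a fast write
SLOW_RETENTION = 365 * DAY   # hypothetical storage time after a slow write
REFRESH_MARGIN = 1 * DAY     # hypothetical: rewrite this long before expiry

@dataclass
class PageRecord:
    mode: str          # "fast" or "slow"
    written_at: float  # seconds
    valid: bool = True

    @property
    def expires_at(self):
        retention = FAST_RETENTION if self.mode == "fast" else SLOW_RETENTION
        return self.written_at + retention

def select_mode(is_hot, queue_depth, heavy_load=16):
    """Fast write for hot data under heavy load, slow write otherwise."""
    return "fast" if is_hot and queue_depth >= heavy_load else "slow"

def write(pages, lpn, is_hot, queue_depth, now):
    pages[lpn] = PageRecord(select_mode(is_hot, queue_depth), now)

def refresh_expiring(pages, now):
    """Rewrite fast-written pages that are still valid and about to expire."""
    refreshed = []
    for lpn, rec in list(pages.items()):
        if rec.valid and rec.mode == "fast" and now >= rec.expires_at - REFRESH_MARGIN:
            pages[lpn] = PageRecord("slow", now)  # assumption: refresh uses a slow write
            refreshed.append(lpn)
    return refreshed
```

The refresh pass is what keeps the overall data storage time unchanged even though fast writes individually hold data for a shorter period.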

In the embodiment provided by the present invention, the load characteristics in step S1 are divided into macroscopic characteristics and microscopic characteristics; the macroscopic characteristics of the load are analyzed by a sectional statistical method; specifically, the servicing of N access requests is taken as one period, the operation types and access modes of the N requests are counted, the macroscopic characteristics of the load in that period are calculated when the sampling interval is reached, and the currently counted macroscopic characteristics are used to predict those of the next period;

the load microscopic characteristic analysis is carried out by adopting a machine learning-based hot data identification algorithm, and essentially, hot data identification is a two-classification problem, namely, after a request comes, whether the hot data belongs to a hot data class or a cold data class is judged; the hot data identification algorithm is divided into an off-line learning stage and an on-line learning stage.
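The sectional statistical method for the macroscopic characteristics can be sketched as a per-period counter. The text only says that operation types and access modes are counted over N requests; the two ratios chosen as macroscopic characteristics and the use of the last period's values as the prediction are assumptions.

```python
from collections import Counter

class MacroStats:
    """Count operation types and access modes over periods of N requests."""

    def __init__(self, period):
        self.period = period        # N requests per sampling period
        self.counts = Counter()
        self.seen = 0
        self.prediction = None      # macroscopic characteristics predicted for the current period

    def record(self, op, mode):
        """Record one request; return the prediction currently in force."""
        self.counts[("op", op)] += 1
        self.counts[("mode", mode)] += 1
        self.seen += 1
        if self.seen == self.period:             # sampling interval reached
            self.prediction = {
                "write_ratio": self.counts[("op", "write")] / self.period,
                "sequential_ratio": self.counts[("mode", "sequential")] / self.period,
            }
            self.counts.clear()
            self.seen = 0
        return self.prediction
```

A caller would feed every serviced request into `record` and use the returned dictionary to tune buffer or mapping policies for the next N requests.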

In the implementation mode provided by the invention, in the off-line learning stage, feature modeling and classification are manually carried out on each request so as to obtain a training set, then machine learning is carried out by utilizing the training set, and finally effective features and model parameters are output;

In the embodiment provided by the invention, the Hot region, the Normal region and the Cold region all adopt different organization modes;

In the embodiment provided by the present invention, the priority value of the Hot zone is calculated as follows:

P1 = f1(ti, tl)

wherein, P1 is defined as priority value, ti is defined as average updating interval of page/block, and tl is defined as last updating time; and the Hot area is sorted according to the priority value, and the access frequency and the access recency of the page are considered.
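Since f1 is not given in closed form, the sketch below uses one hypothetical choice that rewards both frequent updates (small ti) and recent updates (tl close to the current time). The functional form, the `now` argument and the sample values are assumptions for illustration only.

```python
def f1(ti, tl, now):
    """Hypothetical P1: higher for a short average update interval and a recent update."""
    return 1.0 / (1.0 + ti + (now - tl))

# page -> (average update interval ti, last update time tl), hypothetical values
pages = {"A": (2.0, 98.0), "B": (10.0, 99.0), "C": (2.0, 50.0)}
now = 100.0

# Hot region order: highest priority first; the tail is the elimination candidate.
hot_order = sorted(pages, key=lambda p: f1(*pages[p], now), reverse=True)
tail = hot_order[-1]
```

Page C is updated as often as page A but was last touched long ago, so it sinks to the tail, which shows how recency and frequency are combined.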

In the embodiment provided by the present invention, when a request in step S22 hits in the first half of Normal region, the request is migrated from Normal region to Hot region, otherwise, the request is reordered according to the least recent access rule of the virtual block;

In the embodiment provided by the present invention, when performing page/block elimination in step S23, the Hot area and the Normal area eliminate the page/block at the end of the queue to the next area, the Cold area selects an elimination block according to the idea in step S24, sends the block data to the FTL, and the FTL determines to write the block data to a suitable flash memory location;

the elimination order of the blocks in step S24 is calculated as follows:

P2 = f2(tl, n, D)

wherein P2 is defined as the elimination priority of a Cold block, tl is defined as the last update time of the block, n is defined as the number of dirty pages it contains, and D is defined as the position distribution of the dirty pages in the flash memory; specifically, obtaining the dirty page position distribution D requires the cooperation of the FTL, which is rarely considered in conventional BML design. Qualitatively, a block should be eliminated preferentially when its last update time tl is earlier, its number of dirty pages n is larger, or its dirty page position distribution D is more concentrated. This is because an earlier tl indicates poorer locality of the block, while a larger n or a more concentrated D leads to more efficient subsequent garbage collection.
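The qualitative rules for P2 can be turned into a small scoring sketch. The text gives only the direction of each factor, so the linear form, the weights and the concentration measure (dirty pages per distinct physical block, as reported by the FTL) are assumptions.

```python
def f2(tl, n, d, now, w_age=1.0, w_dirty=1.0, w_conc=1.0):
    """Hypothetical P2: higher means eliminate sooner.

    tl: last update time of the block
    n:  number of dirty pages in the block
    d:  physical block numbers holding the old copies of the dirty pages
    """
    concentration = n / len(set(d)) if d else 0.0
    return w_age * (now - tl) + w_dirty * n + w_conc * concentration

now = 100
cold_blocks = {
    "X": (10, 4, [7, 7, 7, 7]),   # old, dirty pages concentrated in one flash block
    "Y": (10, 4, [1, 2, 3, 4]),   # old, dirty pages scattered over four flash blocks
    "Z": (80, 4, [7, 7, 7, 7]),   # recently updated
}
victim = max(cold_blocks, key=lambda b: f2(*cold_blocks[b], now))
```

Blocks X and Y differ only in D, so the choice of X over Y shows the effect of letting the FTL report where the dirty pages live.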

In the embodiment provided by the present invention, when the step S31 performs the classification processing, when the request is not hit in the H-CMT, the CMT and the S-CMT, the request access mode identified in the step S1 is used to load the consecutive requests to the S-CMT, and the other requests to the CMT; when the request hits in CMT and S-CMT, it is promoted to H-CMT;

during elimination, the H-CMT adopts a simple least-recent-access principle to eliminate the queue tail mapping items into the CMT, and the CMT and the S-CMT adopt a batch elimination principle, namely dirty mapping items belonging to the same translation page are updated into a translation block of a flash memory in batch at one time so as to optimize the updating method of the traditional DFTL according to a single mapping item;

P3 = f3(tl, n), where tl is defined as the last access time of the translation page and n is defined as the number of dirty mapping items in the page; in order to better exploit the spatial locality of accesses, a prefetching strategy is adopted when a new mapping item is read after a request misses in the H-CMT, the CMT and the S-CMT; the prefetch size depends on the access mode: for sequential requests more mapping items are prefetched, into the S-CMT, and for random requests fewer mapping items are prefetched.
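The access-mode-dependent prefetch can be sketched as follows. The prefetch counts, the 512 mapping items per translation page, and the decision to stop at the translation page boundary (so that a single flash read suffices) are assumptions, not values from the text.

```python
ENTRIES_PER_TRANSLATION_PAGE = 512  # hypothetical
SEQ_PREFETCH = 64                   # hypothetical prefetch count for sequential requests
RAND_PREFETCH = 4                   # hypothetical prefetch count for random requests

def load_on_miss(lpn, sequential):
    """Return (target table, logical page numbers whose mapping items are loaded)."""
    count = SEQ_PREFETCH if sequential else RAND_PREFETCH
    page_end = (lpn // ENTRIES_PER_TRANSLATION_PAGE + 1) * ENTRIES_PER_TRANSLATION_PAGE
    target = "S-CMT" if sequential else "CMT"
    return target, list(range(lpn, min(lpn + count, page_end)))
```

For example, `load_on_miss(10, False)` loads four mapping items into the CMT, while a sequential miss at the same address loads 64 into the S-CMT.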

In the embodiment provided by the invention, in order to map the data corresponding to part of the H-CMT mapping items to the SLC, a variable is added to each mapping item of the H-CMT to record the number of updates (writes) of that mapping item, and the normalized wear degrees of the SLC and the MLC are calculated. The assignment threshold assign_th of the SLC and the MLC is then dynamically adjusted according to these normalized wear degrees: data whose mapping item has been updated fewer than assign_th times are assigned to the MLC (cold data), and otherwise are mapped to the SLC (hot data). In addition, when the normalized SLC wear degree exceeds the MLC wear degree by a certain amount, assign_th is increased so that less data are assigned to the SLC; when the normalized SLC wear degree is lower than the MLC wear degree by a certain amount, assign_th is decreased so that more data are assigned to the SLC. Wear balance inside the SLC and inside the MLC is realized by an existing wear balance algorithm. The normalized wear degrees are calculated as follows:

rw_s = (total_e_s × l_m) / (n_s × l_s)
rw_m = total_e_m / n_m
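The dynamic adjustment of assign_th can be sketched as follows. The text only says "by a certain amount", so the relative margin, the step size and the bounds are assumptions; the direction of adjustment follows from the rule that mapping items updated fewer than assign_th times go to the MLC.

```python
def adjust_assign_th(assign_th, rws, rwm, margin=0.1, step=1, lo=1, hi=1000):
    """Move the SLC/MLC assignment threshold according to the normalized wear degrees."""
    if rws > rwm * (1 + margin):
        return min(assign_th + step, hi)   # SLC wears faster: send less data to the SLC
    if rws < rwm * (1 - margin):
        return max(assign_th - step, lo)   # SLC wears slower: send more data to the SLC
    return assign_th

def assign_target(update_count, assign_th):
    """Updated fewer than assign_th times -> MLC (cold data), otherwise SLC (hot data)."""
    return "MLC" if update_count < assign_th else "SLC"
```

Raising the threshold shrinks the set of mapping items that qualify as hot, which is how the SLC is relieved when it wears faster than the MLC.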

In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A hybrid SSD performance and lifetime optimization method based on a cross-layer design, the optimization method comprising:

s2, designing and optimizing the BML;

s3, designing and optimizing the FTL;

the optimization method of step S1 is as follows:

the optimization method of step S3 is as follows:

2. The method of claim 1, wherein the load characteristics are divided into macroscopic characteristics and microscopic characteristics in step S1; the load macroscopic characteristic analysis is realized by adopting a sectional statistical method;

3. The method of claim 2, wherein in the off-line learning stage, feature modeling and classification are performed manually on each request to obtain a training set, and then the training set is used for machine learning, and finally valid features and model parameters are output;

4. The method of claim 1, wherein the Hot region, the Normal region and the Cold region are organized differently;

5. The method of claim 4, wherein the priority value of the Hot region is calculated as follows:

P1 = f1(ti, tl)

6. The method of claim 1, wherein when a request in step S22 hits in the first half of Normal region, the request is migrated from Normal region to Hot region, otherwise, the request is reordered according to the least recent access rule of the virtual block;

7. The method of claim 1, wherein when page/block elimination is performed in step S23, the Hot region and the Normal region eliminate the page/block at the end of the queue to the next region, and the Cold region selects a block to eliminate according to step S24 and sends the block data to the FTL, which decides a suitable flash memory location to write it to;

the elimination order of the blocks in step S24 is calculated as follows:

P2 = f2(tl, n, D)

8. The method of claim 1, wherein during the classification processing of step S31, when a request does not hit in the H-CMT, the CMT or the S-CMT, sequential requests are loaded into the S-CMT and other requests into the CMT according to the request access mode identified in step S1; when a request hits in the CMT or the S-CMT, its mapping item is promoted to the H-CMT;

9. The method of claim 8, wherein in order to map data corresponding to part of the H-CMT mapping entries into SLCs, a variable is added to each mapping entry of H-CMT, the number of updates (writes) is recorded, and the normalized wear level of SLCs and MLCs is calculated; the wear balance inside the SLC and the MLC is realized by adopting the existing wear balance algorithm:

rw_s = (total_e_s × l_m) / (n_s × l_s)
rw_m = total_e_m / n_m

wherein rw_s and rw_m are the relative wear degrees of the SLC and the MLC respectively, total_e_s and total_e_m are the total erase counts of the SLC and the MLC respectively, n_s and n_m are the total block numbers of the SLC and the MLC respectively, and l_s and l_m are the wear upper limits of the SLC and the MLC respectively.