Background
Compared with traditional hard disks, flash memory offers higher read and write speeds and lower power consumption, and it is being used more and more widely as manufacturing processes improve and costs fall.
Because flash memory cannot be overwritten in place once data has been written, the written data must be reorganized using additional flash memory blocks; this process is called flash garbage collection.
Cold data is data that is rarely updated after being written into the flash memory. Once such data has been reorganized it is unlikely to be updated again, so the benefit of garbage collection is highest. Hot data, by contrast, is rewritten frequently after being written into the flash memory, so it has to be collected again shortly after each collection, which reduces recovery efficiency.
In the prior art, one method for identifying cold and hot data is to count the number of writes to each logical address: hot data is written more times and cold data fewer times. However, file systems differ from system to system (for example FAT, FAT32, NTFS and EXT4), so a judgment based on the number of writes is only relative, which may cause hot data to be classified as cold data.
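The prior-art approach described above can be illustrated with a minimal sketch. The threshold value, the function name and the write log are assumptions made purely for illustration; the source does not specify them.

```python
from collections import Counter

# Hypothetical threshold: the source gives no concrete value.
HOT_WRITE_THRESHOLD = 4


def classify_by_write_count(write_log, threshold=HOT_WRITE_THRESHOLD):
    """Return the logical addresses written more than `threshold` times."""
    counts = Counter(write_log)
    return {lpa for lpa, n in counts.items() if n > threshold}


# Logical address 7 is updated repeatedly, but only 3 times in this window,
# so a fixed threshold of 4 still treats it as cold data.
log = [1, 1, 1, 1, 1, 7, 2, 7, 3, 7]
hot = classify_by_write_count(log)
print(hot)  # {1}
```

Whether address 7 counts as hot depends entirely on the chosen threshold and on how the file system issues writes, which is the relativity problem noted above.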
Disclosure of Invention
In view of the above, an object of the present invention is to provide a method for identifying flash memory hot data that further optimizes data classification on the basis of write-count threshold classification, identifies hot data within cold data blocks, and improves recovery efficiency.
To solve the above technical problem, a flash memory hot data identification method is provided, which includes the following steps:
S100: searching the flash memory for physical data blocks in which the ratio of the number of valid data pages to the total number of pages of the physical data block is greater than a preset value;
S110: extracting the logical addresses of the invalid data pages in all the physical data blocks found in step S100;
S120: when a logical address extracted in S110 has not already been determined to be a hot data logical address, setting that logical address as a hot data logical address.
The preset value is four fifths, three quarters or one half.
S120 includes a step of adding the logical address of the extracted invalid data page to a hot data logical address list.
The flash memory hot data identification method provided by the present application has the following technical effects. Cold and hot data are re-screened at a later stage by examining the ratio of the number of valid data pages to the total number of pages in each physical data block. This can overcome the mixing of cold and hot data caused by misclassification based on write counts, and it further optimizes the classification method. In particular, as the system runs, the method works together with the write-count threshold to identify cold and hot data accurately, further improves recovery efficiency on top of write-count threshold classification, and reduces the pressure of garbage collection.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.
A cold data block is a data block in which data is rarely updated; usually most of its data is valid and only a little is invalid. A hot data block, by contrast, is a data block in which data is frequently updated; usually most of its data is invalid and only a little is valid.
Through intensive study, the inventor of the present application found that judging cold and hot data by counting the number of writes to each logical address raises the problem of the judgment standard, namely how many writes count as "more" for hot data. Because file systems differ from system to system, this method can only make a coarse distinction. In essence, the purpose of distinguishing cold and hot data is to store together the large amount of data that does not need updating, and to store together the data that is updated frequently. In particular, hot data whose logical address has a low write count gets written into a cold data block, interspersed among a large amount of cold data. Such hot data cannot be identified simply by the number of writes, and the classification method requires further optimization.
As shown in figs. 1 and 2, the flash memory has physical data blocks BLK 0, BLK 1, BLK 2, … BLK N, … BLK M (BLK denotes Block). The data in page N+2 of the physical data block BLK N was written into a cold data block because its write count was below the threshold for determining hot data, and it was later updated at another physical address. The physical data block BLK N is a cold data block: most of its data is valid and only a little is invalid. In fig. 2 only one page is invalid, namely page N+2. As is well known, if the same logical address is written many times (more times than the hot data threshold), that logical address is determined to be a hot data address. In the block holding such data, only one page is valid and the other pages are invalid because the data has been updated, so the block can be recovered by moving only its valid data pages. When the invalid data of page N+2 appears in the physical data block BLK N, however, 255 valid data pages must be moved to recover that one flash memory physical data block. In other words, moving 255 valid data pages yields only one additional usable page, so the recovery efficiency is very low.
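The cost comparison above can be made concrete with a short calculation. This sketch assumes a block of 256 pages, which is implied by the figure of 255 valid pages plus one invalid page but is not stated explicitly in the source; the function name is hypothetical.

```python
def pages_moved_per_page_gained(total_pages, invalid_pages):
    """Valid pages that must be copied for each page reclaimed from a block."""
    if invalid_pages <= 0:
        raise ValueError("a block with no invalid pages yields nothing when recovered")
    valid_pages = total_pages - invalid_pages
    return valid_pages / invalid_pages


# Cold block polluted by one misjudged hot page: 255 copies for 1 page gained.
cold_cost = pages_moved_per_page_gained(256, 1)

# Hot block where only one page is still valid: 1 copy for 255 pages gained.
hot_cost = pages_moved_per_page_gained(256, 255)

print(cold_cost)            # 255.0
print(hot_cost < 0.01)      # True
```

The large gap between the two ratios is why a single misplaced hot page in a cold block is costly to recover.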
To address the low recovery efficiency that results when hot data with a low logical address write count is written into cold data blocks, interspersed among a large amount of cold data, the following flash memory hot data identification method is provided. As shown in fig. 3, the method comprises the following steps:
S100: searching the flash memory for physical data blocks in which the ratio of the valid page count (hereinafter abbreviated as VPC) to the total number of pages of the physical data block is greater than a preset value;
S110: extracting the logical addresses of the invalid data pages in all the physical data blocks found in step S100;
S120: when a logical address extracted in S110 has not already been determined to be a hot data logical address, setting that logical address as a hot data logical address.
The preset value in S100 is, for example, four fifths, three quarters or one half.
Steps S110 and S120 may be performed on the physical data blocks one by one, in descending order of the ratio of VPC to total number of pages, until all invalid data pages of the found physical data blocks have been traversed.
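Steps S100 to S120 can be sketched roughly as follows. This is an illustration under stated assumptions, not the controller implementation: the `Page` and `Block` classes, the function name and the in-memory representation are all hypothetical, and the blocks are visited in descending order of the ratio of VPC to total pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lpa: int      # logical address recorded for this physical page (assumed layout)
    valid: bool   # False once the data has been updated elsewhere


@dataclass
class Block:
    pages: list

    @property
    def vpc(self):
        """Valid page count of the block."""
        return sum(1 for p in self.pages if p.valid)

    @property
    def valid_ratio(self):
        return self.vpc / len(self.pages)


def identify_hot_addresses(blocks, hot_lpas, preset=0.75):
    """Add misjudged hot logical addresses to `hot_lpas`; return the new ones."""
    # S100: find blocks whose VPC / total pages exceeds the preset value.
    found = [b for b in blocks if b.pages and b.valid_ratio > preset]
    found.sort(key=lambda b: b.valid_ratio, reverse=True)

    newly_hot = []
    for block in found:
        for page in block.pages:
            # S110: take the logical address of each invalid page.
            # S120: mark it hot unless it is already marked.
            if not page.valid and page.lpa not in hot_lpas:
                hot_lpas.add(page.lpa)
                newly_hot.append(page.lpa)
    return newly_hot


# A mostly valid (cold) block with one invalid page at logical address 42.
cold = Block([Page(i, True) for i in range(7)] + [Page(42, False)])
known_hot = {5}
print(identify_hot_addresses([cold], known_hot))  # [42]
```

A set is used for the hot addresses here only so that the "already determined" check in S120 is a constant-time lookup; the source does not prescribe a data structure.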
As shown in fig. 1, the flash memory has physical data blocks BLK 0, BLK 1, BLK 2, … BLK N, … BLK M (BLK denotes Block). The physical data block BLK N is a cold data block: most of its data is valid data (also referred to as valid pages), and only one page holds invalid data (also referred to as an invalid page), namely page N+2.
By combining the flash memory hot data identification method with write-count threshold classification, the already classified physical data blocks are further optimized: the physical data blocks are searched and screened, and misjudged hot data addresses are found and corrected.
In one specific application, the VPCs of all physical data blocks in the flash memory are first sorted, since the VPC of a cold data block is usually large. Next, the cold data blocks whose ratio of VPC to total number of pages is greater than the preset value are selected; these are the physical data blocks in which misjudged hot data has been updated. Then the logical addresses of the invalid data pages in all the physical data blocks found in the previous step are read; the logical address is generally stored in the redundant space of the physical page. Each logical address is then checked to see whether it has already been determined to be a hot data logical address, and if not, it is added to the hot data logical address list. The previous step is repeated until the logical addresses of all invalid data pages in the cold data blocks whose ratio of VPC to total number of pages exceeds the preset value have been set as hot data logical addresses.
In the step of checking whether a logical address has already been determined to be a hot data logical address and, if not, adding it to the hot data logical address list, the hot data logical address list resides in the RAM of the controller, as shown in fig. 4. By way of example, the schematic shows a stored hot data logical address record LPA X. Once a new hot data logical address (LPA X+1) is found by scanning, it is appended to the hot data logical address list, and data for that logical address is subsequently written into a hot data block, so that the classification is more accurate and recovery efficiency can be further improved.
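The role of the hot data logical address list can be sketched as follows. The class name, its methods and the "hot"/"cold" return values are hypothetical and chosen for illustration; the source states only that the list resides in controller RAM and that data for listed addresses is written into a hot data block.

```python
class HotAddressList:
    """In-RAM record of logical addresses judged to be hot (illustrative only)."""

    def __init__(self, initial=()):
        self._lpas = set(initial)

    def add_if_new(self, lpa):
        """Record `lpa` as hot; return True only if it was not already listed."""
        if lpa in self._lpas:
            return False
        self._lpas.add(lpa)
        return True

    def route_write(self, lpa):
        """Choose the kind of block that a write to `lpa` should go to."""
        return "hot" if lpa in self._lpas else "cold"


hot_list = HotAddressList([10])      # stands in for the existing record LPA X
print(hot_list.route_write(11))      # cold
hot_list.add_if_new(11)              # newly scanned address, like LPA X+1
print(hot_list.route_write(11))      # hot
```

After the scan adds an address, later writes to it are steered away from cold data blocks, which is the effect described above.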
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.