CN104951403A - Low-overhead and error-free cold and hot data recognition method - Google Patents


Info

Publication number
CN104951403A
Authority
CN
China
Legal status
Granted
Application number
CN201510395697.1A
Other languages
Chinese (zh)
Other versions
CN104951403B (en)
Inventor
许胤龙
沈标标
李永坤
潘玉彪
王能
Current Assignee
Pingkai Star Beijing Technology Co ltd
Original Assignee
University of Science and Technology of China USTC
Priority date
2015-07-06
Filing date
2015-07-06
Publication date
2015-09-30
Application filed by University of Science and Technology of China (USTC)
Priority to CN201510395697.1A
Publication of CN104951403A
Application granted
Publication of CN104951403B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a low-overhead, error-free method for identifying hot and cold data. The method comprises a storage structure design, an address grouping process, an aging mechanism, and a hot/cold identification process for data pages. It identifies hot and cold data accurately and effectively with low time and space overhead, and it can easily be extended to finer-grained, multi-level hot/cold identification. Compared with conventional hot/cold identification methods, it has lower running-time and space overhead, avoids misidentifying hot data, is suitable for deployment in existing storage systems, and greatly improves overall system performance.

Description

Low-overhead, error-free hot and cold data identification method
Technical field
The invention belongs to the technical field of computer data storage, and specifically relates to a method that uses grouping to identify hot and cold data quickly, with low overhead and without misidentification.
Background
Real-world workloads usually exhibit strong data-access locality: some data are accessed frequently and are called hot data, while other data are accessed rarely or not at all and are called cold data. Taking data temperature into account when designing modern storage systems, that is, identifying hot and cold data and placing them separately, can effectively improve the overall performance of a storage system. The key to realizing such a design, however, is an effective and lightweight (low time and space overhead) hot/cold identification method. The multiple-hash-function method presented in ACM Transactions on Storage (TOS), Vol. 2, No. 1, 2006, pp. 22-40, is the existing hot/cold identification method with the lowest time and space overhead. That method, however, produces a certain amount of misidentification (classifying rarely or never accessed data as hot), and its misidentification rate varies with the workload. This makes it unsuitable for some workloads and greatly weakens the performance gain that introducing hot/cold separation is meant to bring to the storage system.
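To show how a multi-hash scheme can misidentify data, here is a minimal sketch of a generic counting-filter approach. It is a simplified stand-in, not the exact algorithm from the cited TOS paper: each access increments the counters at several hashed slots, and a page is reported hot when all of its slots reach a threshold. The table size `M`, the two hash functions in `slots`, and `THRESHOLD` are assumptions chosen so that a collision is easy to see.

```python
from typing import List

M = 8          # number of shared counters (assumed, deliberately tiny)
THRESHOLD = 4  # hot threshold (assumed)


def slots(lpn: int) -> List[int]:
    """Two hypothetical hash functions mapping an LPN to counter slots."""
    return [lpn % M, (lpn // M) % M]


class CountingFilter:
    """Stores only counters, never the LPNs themselves."""

    def __init__(self) -> None:
        self.counters = [0] * M

    def record(self, lpn: int) -> None:
        for s in slots(lpn):
            self.counters[s] += 1

    def is_hot(self, lpn: int) -> bool:
        # Hot only if every hashed counter has reached the threshold.
        return all(self.counters[s] >= THRESHOLD for s in slots(lpn))


f = CountingFilter()
for _ in range(4):
    f.record(1)   # slots [1, 0]
    f.record(10)  # slots [2, 1]

print(f.is_hot(1))   # True: genuinely hot
print(f.is_hot(16))  # True: never accessed, but its slots [0, 2] collide
print(f.is_hot(3))   # False: slot 3 was never touched
```

Because only counts are kept, page 16 cannot be told apart from a hot page. Recording the exact LPN, as the method below does, rules out this kind of false positive.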
Summary of the invention
The object of the invention is to propose a low-overhead, error-free method for identifying hot and cold data that overcomes the above defect of existing methods and identifies hot and cold data effectively while keeping time and space overhead small.
The low-overhead, error-free hot and cold data identification method of the invention is characterized by comprising the following steps:
Step 1: Storage structure design
A set of lists, each holding a fixed number of metadata items, records the access history of data pages. The number of lists is denoted K, and the number of metadata items in each list is denoted N. Each metadata item records the logical page number (LPN) of a data page and the access count (counter) of the data page with that LPN, occupying 32 bits and 4 bits of storage, respectively. When a list is full and a new metadata item must be added, metadata items are replaced using the least recently used (LRU) algorithm; each list is therefore called an LRU table.
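One metadata item can be pictured as a 36-bit value. The following minimal sketch assumes the 32-bit LPN and the 4-bit count are packed into a single integer (LPN in the high bits) and that the initial LPN of -1 is stored as the all-ones 32-bit pattern, so a real LPN of 2^32 - 1 would be ambiguous. The names `pack`, `unpack`, and `bump` are hypothetical; `bump` applies the saturating increment described in Step 4.

```python
from typing import Tuple

LPN_BITS = 32
COUNT_BITS = 4
LPN_MASK = (1 << LPN_BITS) - 1
COUNT_MASK = (1 << COUNT_BITS) - 1  # 15, the largest 4-bit count
EMPTY_LPN = LPN_MASK  # assumption: the -1 sentinel is stored as all ones


def pack(lpn: int, count: int) -> int:
    """Pack one item: LPN in the high 32 bits, count in the low 4 bits."""
    if not 0 <= count <= COUNT_MASK:
        raise ValueError("count must fit in 4 bits")
    return ((lpn & LPN_MASK) << COUNT_BITS) | count


def unpack(item: int) -> Tuple[int, int]:
    """Return (lpn, count), mapping the all-ones LPN back to -1."""
    lpn = (item >> COUNT_BITS) & LPN_MASK
    return (-1 if lpn == EMPTY_LPN else lpn), item & COUNT_MASK


def bump(item: int) -> int:
    """Increment the access count, saturating at 15."""
    count = item & COUNT_MASK
    return (item & ~COUNT_MASK) | min(count + 1, COUNT_MASK)


item = pack(1024, 14)
item = bump(bump(item))
print(unpack(item))  # (1024, 15)
```

Whether a real implementation bit-packs items like this or pads each one to a byte boundary is not stated in the source.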
Step 2: Address grouping
The entire logical address space of the storage system is partitioned by the hash function f(x) = x % K, so that the metadata items of different logical addresses are stored, group by group, in different LRU tables; this groups the LPNs. Here K is the number of LRU tables, x is the LPN of a data page, and % is the modulo operation.
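A minimal sketch of the grouping step, assuming non-negative integer LPNs and 0-based table indices (the figures number the tables from 1). With K = 256, as in the embodiment, every LPN used in the Fig. 2 and Fig. 3 examples lands in index 0, which is why they all appear in LRU table 1. The bit-mask variant is an assumed optimization, valid only when K is a power of two.

```python
K = 256  # number of LRU tables used in the embodiment


def table_index(lpn: int, k: int = K) -> int:
    """0-based LRU table index for an LPN, using f(x) = x % K."""
    return lpn % k


def table_index_mask(lpn: int, k: int = K) -> int:
    """Same result via a bit mask; valid only when K is a power of two."""
    if k <= 0 or k & (k - 1):
        raise ValueError("K must be a positive power of two")
    return lpn & (k - 1)


for lpn in (0, 256, 1024, 2048, 8192, 257):
    print(lpn, table_index(lpn))
```

Consecutive LPNs fall into different tables, so each table stays small and a lookup never touches more than N items.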
Step 3: Hot/cold identification of data pages
When an access request arrives, a hash value is first computed from the request's LPN to determine which LRU table it belongs to, and that LRU table is searched for a metadata item for the LPN. If the item exists, its access count is incremented by 1 and the new value is compared with a predefined threshold: if it is greater than the threshold, the requested data are considered hot; otherwise they are considered cold. By setting multiple thresholds and comparing the request's access count against each of them, a finer-grained, multi-level hot/cold identification can be achieved. Before the identification result is returned, the corresponding LRU table is updated. If the request's LPN is present in the LRU table, its metadata item is moved to the head of the table to preserve the LRU ordering. If it is not present, the table is searched from head to tail for a metadata item whose access count is 0. If such an item is found, all metadata items in front of it are shifted back by one position, and a metadata item for the request is inserted at the head of the table with its access count set to 1. If no such item is found, the item at the tail of the LRU table is dropped with a probability of 50%, the other metadata items in the table are shifted back by one position, and a metadata item for the request is inserted at the head of the table with its access count set to 1.
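The lookup and update rules of Step 3 can be sketched for a single LRU table as follows (in the full method, f(x) = x % K first selects the table). Several points are assumptions or choices the source leaves open: the hot test uses >= because the embodiment treats a count equal to the threshold of 4 as hot, while the wording above says "greater than"; the zero-count search scans from tail to head as in the embodiment's description of Fig. 2, whereas this step describes a head-to-tail scan; and when the 50% coin flip keeps the tail item, this sketch assumes the new LPN is simply not recorded. The injectable `rng` is a hypothetical hook for reproducible tests.

```python
import random
from typing import List, Optional

MAX_COUNT = 15  # 4-bit counter ceiling from Step 4
THRESHOLD = 4   # hot threshold used in the embodiment


class LRUTable:
    """One LRU table of [lpn, count] items; index 0 is the head (most recent).

    LPNs are assumed non-negative, so -1 can mark an empty slot.
    """

    def __init__(self, n: int = 4, rng: Optional[random.Random] = None) -> None:
        self.items: List[List[int]] = [[-1, 0] for _ in range(n)]
        self.rng = rng if rng is not None else random.Random()

    def access(self, lpn: int, threshold: int = THRESHOLD) -> bool:
        """Record one access to `lpn`, update the table, return True if hot."""
        for i, item in enumerate(self.items):
            if item[0] == lpn:
                # Hit: increment (saturating) and move the item to the head;
                # the items that were in front of it shift back one slot.
                item[1] = min(item[1] + 1, MAX_COUNT)
                self.items.insert(0, self.items.pop(i))
                return item[1] >= threshold

        # Miss: find a zero-count item, scanning from tail to head.
        victim = next((i for i in range(len(self.items) - 1, -1, -1)
                       if self.items[i][1] == 0), None)
        if victim is None and self.rng.random() < 0.5:
            victim = len(self.items) - 1  # no zero-count item: drop the tail
        if victim is not None:
            self.items.pop(victim)
            self.items.insert(0, [lpn, 1])
        # Assumption: if the coin flip keeps the tail, the LPN is not admitted.
        return 1 >= threshold  # a first-seen page has count 1


table = LRUTable()
table.items = [[256, 1], [0, 3], [1024, 1], [-1, 0]]  # initial state in Fig. 2
print(table.access(0), table.items)
# True [[0, 4], [256, 1], [1024, 1], [-1, 0]]
```

Starting from the same initial state, requests for LPNs 256, 1024, and 8192 reproduce the other Fig. 2 transitions, and a subsequent request for LPN 2048 with a coin flip below 0.5 reproduces Fig. 3.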
Step 4: Aging mechanism
Each metadata item records the access count of the data page corresponding to an LPN. Every access request to the data page increments the count in its metadata item by 1; once the count reaches 15, the maximum value its storage space can hold, it is no longer incremented. After a fixed number of requests has been processed, the access counts of the metadata items in all LRU tables are halved.
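A minimal sketch of the periodic halving, assuming that halving rounds down (implemented as a right shift). Under that assumption the Fig. 4 values are reproduced: counts 1, 1, 3, 1 become 0, 0, 1, 0. The `AgingClock` name and the request-counting approach are illustrative; the source only states that counts are halved after a fixed number of requests (4096 in the embodiment).

```python
from typing import List

AGING_PERIOD = 4096  # requests between halvings, as in the embodiment


class AgingClock:
    """Counts processed requests and halves every access count once per period."""

    def __init__(self, tables: List[List[List[int]]], period: int = AGING_PERIOD) -> None:
        self.tables = tables  # each table is a list of [lpn, count] items
        self.period = period
        self.seen = 0

    def tick(self) -> bool:
        """Call once per processed request; return True when aging ran."""
        self.seen += 1
        if self.seen < self.period:
            return False
        self.seen = 0
        for table in self.tables:
            for item in table:
                item[1] >>= 1  # floor halving; the item itself is not moved
        return True


# LRU table 1 in the state reached after the LPN 8192 request (Fig. 2),
# aged once; a period of 1 is used here only to trigger aging immediately.
tables = [[[8192, 1], [256, 1], [0, 3], [1024, 1]]]
AgingClock(tables, period=1).tick()
print(tables[0])  # [[8192, 0], [256, 0], [0, 1], [1024, 0]]
```

Halving also feeds back into replacement: items whose counts drop to 0 become the first eviction candidates in Step 3.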
Compared with traditional hot/cold identification methods, the low-overhead, error-free identification algorithm of the invention identifies hot and cold data effectively and accurately with low time and space overhead, and it extends easily to finer-grained, multi-level hot/cold classification. It keeps both running time and space overhead low, avoids misidentifying hot data, is suitable for deployment in existing storage systems, and greatly improves overall system performance. Because the method maps logical addresses into groups stored in different LRU tables, it identifies data faster than traditional hot/cold identification methods. At identification times of the same order of magnitude, the method also retains the LPN of each request, which avoids misidentifying hot data and yields better identification results than traditional methods.
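The multi-level extension mentioned above (comparing the access count against several thresholds, Step 3) could look like the following minimal sketch. The threshold values and level names are hypothetical, since the source specifies neither; `bisect_right` returns how many thresholds the count meets or exceeds, matching the embodiment's ">=" convention.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical ascending thresholds and level names; the source gives none.
THRESHOLDS = (2, 4, 8)
LEVELS = ("cold", "warm", "hot", "very hot")


def classify(count: int,
             thresholds: Sequence[int] = THRESHOLDS,
             levels: Sequence[str] = LEVELS) -> str:
    """Return levels[i], where i is how many thresholds `count` meets or exceeds."""
    if len(levels) != len(thresholds) + 1:
        raise ValueError("need exactly one more level than thresholds")
    return levels[bisect_right(thresholds, count)]


print([classify(c) for c in (1, 2, 4, 15)])
# ['cold', 'warm', 'hot', 'very hot']
```

With `thresholds=(4,)` and `levels=("cold", "hot")`, this reduces to the single-threshold rule used in the embodiment.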
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure for identifying hot and cold data according to the method of the invention;
Fig. 2 is a schematic diagram of how the state of an LRU table is updated at a given moment after different requests arrive;
Fig. 3 is a schematic diagram of the state update when an LRU table removes its tail metadata item with a certain probability;
Fig. 4 is a schematic diagram of the LRU table update after aging.
Embodiment
The low-overhead, error-free hot and cold data identification method of the invention is described in further detail below by means of a specific embodiment, with reference to the accompanying drawings.
Embodiment 1:
The operating procedure of a specific embodiment of the low-overhead, error-free hot and cold data identification method of the invention is as follows:
Step 1: Storage structure design
Fig. 1 shows the overall structure with which a concrete instance of the method identifies hot and cold data. In Fig. 1, each row of the vertical list on the right consists of a small rectangle and a cell and forms one metadata item. The 4 rows enclosed by each brace form one LRU table, and each run of 4 consecutive rows filled with a different pattern corresponds to a different LRU table: the 4 rows enclosed by the dashed box at the top of the list correspond to LRU table 1, followed in order by LRU tables 2 to 256. The storage structure uses 256 LRU tables in total, each holding 4 metadata items, and each metadata item stores a 32-bit LPN and a 4-bit access count. When hot/cold identification starts, all LPNs in all LRU tables are initialized to -1 and all access counts to 0.
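The embodiment's parameters imply a very small metadata footprint. The quick check below assumes items are bit-packed with no padding (the source gives only the per-field widths); the nested Python lists only mirror the initialization and are not a memory-efficient layout.

```python
K = 256             # LRU tables in the embodiment
N = 4               # metadata items per table
ITEM_BITS = 32 + 4  # 32-bit LPN + 4-bit access count

# Initialization as described: every LPN is -1 and every access count is 0.
# Each row is a fresh list, so the tables do not alias one another.
tables = [[[-1, 0] for _ in range(N)] for _ in range(K)]

total_bits = K * N * ITEM_BITS
print(total_bits, total_bits // 8)  # 36864 4608  (4.5 KiB if tightly bit-packed)
```

Each identification touches only the N = 4 items of a single table, which is the bound stated at the end of this embodiment.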
Step 2: Address grouping process
To reduce the time needed to identify hot and cold data, the method maps requests for different logical addresses to different LRU tables, which reduces the number of metadata items that each identification must search and move. In Fig. 1, the first cell on the left corresponds to the LPN x of the current access request, and the rectangle its arrow points to corresponds to the hash function f(x) = x % 256. The hash function divides the entire logical address space of the storage system into 256 groups. For each access request, the group is computed from the request's logical address (page number), and hot/cold identification of the requested data is then confined to that one group.
Step 3: Hot/cold identification process for data pages
To decide whether a data page is hot, the method first determines the LRU table it belongs to using the address grouping of the previous step, and then searches that LRU table for the page's metadata item. If the item exists and its access count (after the increment) is greater than or equal to the preset threshold, the page is identified as hot; otherwise it is identified as cold. Fig. 2 shows the hot/cold identification process when specific requests arrive, together with the corresponding updates to the LRU table. The dashed box on the left of Fig. 2 shows the state of LRU table 1 at a given moment (the initial state): its four rows hold the metadata (256,1), (0,3), (1024,1), and (-1,0). A pair such as (256,1) denotes one metadata item: the first number is the LPN of the access request, and the second is the access count of the data page with that LPN. The LPN of the fourth row is -1, which shows that only three LPNs (256, 0, and 1024) have been inserted into this table so far. The first row of LRU table 1 is the head of the table and the fourth row is the tail; when the table is updated, new metadata items are always inserted at the head and discarded items are removed from the tail.

When a request for LPN 0 arrives while LRU table 1 is in the initial state, the table is updated to (0,4), (256,1), (1024,1), (-1,0); the dashed arrow between the two tables in Fig. 2 represents this update. Because LRU table 1 contains a metadata item with LPN 0, the items in front of it are first shifted one position toward the tail, the item for LPN 0 is placed at the head, and its access count is incremented by 1. The count now equals the preset threshold of 4, so the data of this request are identified as hot.

When a request for LPN 256 arrives in the initial state, the table is updated to (256,2), (0,3), (1024,1), (-1,0); when a request for LPN 1024 arrives in the initial state, it is updated to (1024,2), (256,1), (0,3), (-1,0). In both cases the updated access count is below the threshold of 4, so the corresponding data pages are identified as cold.

When a request for LPN 8192 arrives in the initial state, the table is updated to (8192,1), (256,1), (0,3), (1024,1). Because LRU table 1 contains no metadata item with LPN 8192, the table is first searched from tail to head for an item whose access count is 0. Such an item exists, so the items in front of it are shifted one position toward the tail, a new metadata item with LPN 8192 is inserted at the head, and its access count is set to 1. The count is below the threshold of 4, so the data page for LPN 8192 is identified as cold.

As shown in Fig. 3, suppose a request for LPN 2048 arrives while LRU table 1 is in the state reached after the LPN 8192 request. The table contains neither a metadata item for LPN 2048 nor an item with access count 0, so the tail item is removed with a probability of 50% and a new metadata item for LPN 2048 is inserted at the head. After the tail item is removed, the four rows hold (2048,1), (8192,1), (256,1), and (0,3). The forked arrow drawn from the earlier table represents the probabilistic decision and the LRU table update that follow the arrival of the request for LPN 2048.
Step 4: Aging mechanism
To keep hot-data identification time-sensitive, the access counts of the metadata items in all LRU tables are halved after every 4096 access requests. Fig. 4 shows the state of LRU table 1 after aging is applied to it in the state reached after the LPN 8192 request: its four rows hold (8192,0), (256,0), (0,1), and (1024,0), and the dashed arrow between the two tables represents the aging process. Aging only updates the access counts of the metadata items in the LRU tables; it does not move any metadata item.
In this embodiment, a single hot/cold identification of a data page searches or moves at most 4 metadata items within one LRU table, which makes each identification much faster than with traditional hot/cold identification algorithms. Because the storage structure records the LPN of each access request, hot data are not misidentified, so hot data can be identified effectively and the overall performance of the storage system improved.

Claims (1)

1. A low-overhead, error-free hot and cold data identification method, characterized by comprising the following steps:
Step 1: Storage structure design
A set of lists, each holding a fixed number of metadata items, records the access history of data pages, wherein the number of lists is denoted K and the number of metadata items in each list is denoted N. Each metadata item records the LPN of a data page and the access count of the data page with that LPN, occupying 32 bits and 4 bits of storage, respectively. When a list is full and a new metadata item must be added, metadata items are replaced using the least recently used algorithm; each list is called an LRU table.
Step 2: Address grouping
The entire logical address space of the storage system is partitioned by the hash function f(x) = x % K, so that the metadata items of different logical addresses are stored, group by group, in different LRU tables, thereby grouping the LPNs, wherein K is the number of LRU tables, x is the LPN of a data page, and % is the modulo operation.
Step 3: Hot/cold identification of data pages
When an access request arrives, a hash value is first computed from the request's LPN to determine which LRU table it belongs to, and that LRU table is searched for a metadata item for the LPN. If the item exists, its access count is incremented by 1 and the new value is compared with a predefined threshold: if it is greater than the threshold, the requested data are considered hot; otherwise they are considered cold. By setting multiple thresholds and comparing the request's access count against each of them, a finer-grained, multi-level hot/cold identification is achieved. Before the identification result is returned, the corresponding LRU table is updated. If the request's LPN is present in the LRU table, its metadata item is moved to the head of the table to preserve the LRU ordering. If it is not present, the table is searched from head to tail for a metadata item whose access count is 0. If such an item is found, all metadata items in front of it are shifted back by one position, and a metadata item for the request is inserted at the head of the table with its access count set to 1. If no such item is found, the item at the tail of the LRU table is dropped with a probability of 50%, the other metadata items in the table are shifted back by one position, and a metadata item for the request is inserted at the head of the table with its access count set to 1.
Step 4: Aging mechanism
Each metadata item records the access count of the data page corresponding to an LPN. Every access request to the data page increments the count in its metadata item by 1; once the count reaches 15, the maximum value its storage space can hold, it is no longer incremented. After a fixed number of requests has been processed, the access counts of the metadata items in all LRU tables are halved.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510395697.1A CN104951403B (en) 2015-07-06 2015-07-06 Low-overhead, error-free hot and cold data identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510395697.1A CN104951403B (en) 2015-07-06 2015-07-06 Low-overhead, error-free hot and cold data identification method

Publications (2)

Publication Number Publication Date
CN104951403A 2015-09-30
CN104951403B 2018-01-30

Family

ID=54166069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510395697.1A Active CN104951403B (en) 2015-07-06 2015-07-06 Low-overhead, error-free hot and cold data identification method

Country Status (1)

Country Link
CN (1) CN104951403B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569962A (en) * 2016-10-19 2017-04-19 暨南大学 Identification method of hot data based on temporal locality enhancement
CN106874213A (en) * 2017-01-12 2017-06-20 杭州电子科技大学 Solid state disk hot data identification method fusing multiple machine learning algorithms
CN109783443A (en) * 2018-12-25 2019-05-21 西安交通大学 The cold and hot judgment method of mass data in a kind of distributed memory system
CN109885574A (en) * 2019-02-22 2019-06-14 广州荔支网络技术有限公司 A kind of data query method and device
CN116303119A (en) * 2023-05-19 2023-06-23 珠海妙存科技有限公司 Method, system and storage medium for identifying cold and hot data
WO2024066575A1 (en) * 2022-09-26 2024-04-04 华为技术有限公司 Method and device for distinguishing cold and hot physical pages, and chip and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20100153630A1 (en) * 2005-07-13 2010-06-17 Samsung Electronics Co., Ltd. Data storage system with complex memory and method of operating the same
CN101753625A (en) * 2009-12-28 2010-06-23 北京理工大学 Method for deployment of copy service and copy establishment in peer-to-peer network environment
CN102117309A (en) * 2010-01-06 2011-07-06 卓望数码技术(深圳)有限公司 Data caching system and data query method
CN102170468A (en) * 2011-04-07 2011-08-31 江苏省电力公司 Content similarity-based and distributed storage replica replacement algorithm


Cited By (8)

Publication number Priority date Publication date Assignee Title
CN106569962A (en) * 2016-10-19 2017-04-19 暨南大学 Identification method of hot data based on temporal locality enhancement
CN106874213A (en) * 2017-01-12 2017-06-20 杭州电子科技大学 Solid state disk hot data identification method fusing multiple machine learning algorithms
CN106874213B (en) * 2017-01-12 2020-03-20 杭州电子科技大学 Solid state disk hot data identification method fusing multiple machine learning algorithms
CN109783443A (en) * 2018-12-25 2019-05-21 西安交通大学 The cold and hot judgment method of mass data in a kind of distributed memory system
CN109885574A (en) * 2019-02-22 2019-06-14 广州荔支网络技术有限公司 A kind of data query method and device
WO2024066575A1 (en) * 2022-09-26 2024-04-04 华为技术有限公司 Method and device for distinguishing cold and hot physical pages, and chip and storage medium
CN116303119A (en) * 2023-05-19 2023-06-23 珠海妙存科技有限公司 Method, system and storage medium for identifying cold and hot data
CN116303119B (en) * 2023-05-19 2023-08-11 珠海妙存科技有限公司 Method, system and storage medium for identifying cold and hot data

Also Published As

Publication number Publication date
CN104951403B (en) 2018-01-30

Similar Documents

Publication Publication Date Title
CN104951403A (en) Low-overhead and error-free cold and hot data recognition method
US10706101B2 (en) Bucketized hash tables with remap entries
CN102521269B (en) Index-based computer continuous data protection method
JP6356675B2 (en) Aggregation / grouping operation: Hardware implementation of hash table method
CN101655861B (en) Hashing method based on double-counting bloom filter and hashing device
KR101620773B1 (en) Data migration for composite non-volatile storage device
JP6764359B2 (en) Deduplication DRAM memory module and its memory deduplication method
WO2019127104A1 (en) Method for resource adjustment in cache, data access method and device
CN104731794B (en) A kind of cold and hot data fragmentation excavates storage method
CN109977129A (en) Multi-stage data caching method and equipment
CN106407224A (en) Method and device for file compaction in KV (Key-Value)-Store system
US10007615B1 (en) Methods and apparatus for performing fast caching
CN105677580A (en) Method and device for accessing cache
JP2017208096A5 (en)
US20170123979A1 (en) Systems, devices, and methods for handling partial cache misses
CN105117351A (en) Method and apparatus for writing data into cache
CN108027713A (en) Data de-duplication for solid state drive controller
WO2016107182A1 (en) Multi-path set-connection cache and processing method therefor
CN104158744A (en) Method for building table and searching for network processor
CN104750432B (en) A kind of date storage method and device
CN112148217B (en) Method, device and medium for caching deduplication metadata of full flash memory system
CN105956032A (en) Cache data synchronization method, system and apparatus
US9940069B1 (en) Paging cache for storage system
CN108733584B (en) Method and apparatus for optimizing data caching
CN104426774A (en) High-speed routing lookup method and device simultaneously supporting IPv4 and IPv6

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220829

Address after: 100192 207, Floor 2, Building C-1, Zhongguancun Dongsheng Science and Technology Park, No. 66, Xixiaokou Road, Haidian District, Beijing

Patentee after: Pingkai star (Beijing) Technology Co.,Ltd.

Address before: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Patentee before: University of Science and Technology of China