CN104881369B - Low-memory-overhead hot data identification method for hybrid storage systems - Google Patents


Info

Publication number
CN104881369B
Authority
CN
China
Prior art keywords
page
temperature
bit array
value
heat degree
Legal status
Active
Application number
CN201510236366.3A
Other languages
Chinese (zh)
Other versions
CN104881369A (en)
Inventor
肖侬
陈志广
卢宇彤
周恩强
张伟
董勇
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN201510236366.3A
Publication of CN104881369A
Application granted
Publication of CN104881369B
Legal status: Active


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a low-memory-overhead hot data identification method for hybrid storage systems, comprising the following steps: 1) define a Bloom filter with ultra-large counting capacity (UBF) to record the number of accesses to each page; 2) when page x is accessed, query the hotness of page x; 3) if x has not yet been recorded in the UBF, record x by flipping some of the bits corresponding to x in the UBF from 0 to 1; if x is already recorded in the UBF, flip one 0 bit corresponding to x in the UBF to 1 with a certain probability, thereby increasing the hotness of x; 4) compare the hotness of page x with a specified hotness threshold; if the hotness of page x exceeds the threshold, identify the data corresponding to page x as hot data; then return to step 2). The invention has low memory overhead, can identify hot data over large data volumes, and achieves high identification accuracy.

Description

Low-memory-overhead hot data identification method for hybrid storage systems
Technical field
The present invention relates to the field of large-scale hybrid storage, and in particular to a low-memory-overhead hot data identification method for hybrid storage systems.
Background
Large-scale storage systems today face ever higher demands on both performance and cost, and a hybrid storage architecture is one solution that can meet both at once. A hybrid storage system uses smaller, faster high-end devices to guarantee performance and larger, cheaper low-end devices to reduce cost. In such a heterogeneous system, the high-end devices usually hold frequently accessed hot data, while the low-end devices hold rarely accessed cold data. Realizing this optimization, however, depends heavily on identifying hot data accurately.
Hot data is usually identified by statistically analyzing the history of data accesses, and practitioners have designed many hot data identification methods on this principle. In principle, most cache replacement policies are hot data identification methods, the most representative being the LFU (Least Frequently Used) policy. LFU maintains a counter for each piece of data and increments it by 1 whenever the data is accessed; data with larger counter values are hot. However, this method must maintain an integer counter for every piece of data, so its memory overhead is large.
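A minimal Python sketch of the per-page counters that LFU relies on, assuming pages are identified by integers (the class and method names are illustrative, not from the text). The point is the cost model: one counter per distinct page. With 4-byte counters (an assumption), tracking 100 million pages needs roughly 400 MB for the counters alone, before any index overhead.

```python
from collections import Counter


class LFUHotness:
    """LFU-style hotness tracking: one integer counter per page."""

    def __init__(self) -> None:
        self.counts = Counter()  # page id -> access count

    def access(self, page: int) -> None:
        # every access increments the page's own counter
        self.counts[page] += 1

    def hottest(self, n: int) -> list:
        # the pages with the largest counters are treated as hot
        return [page for page, _ in self.counts.most_common(n)]


lfu = LFUHotness()
for page in [1, 2, 1, 3, 1, 2]:
    lfu.access(page)
print(lfu.hottest(2))  # [1, 2]
```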
A Bloom filter is a data structure with low memory overhead that can record access history, so it can be used for hot data identification. A Bloom filter maintains a bit array and k hash functions, with all bits of the array initialized to 0. When a page is accessed, the k hash functions compute k values $h_1(x), h_2(x), \ldots, h_k(x)$ from the page number x; these k values correspond to k positions in the bit array, and setting all k bits to 1 records page x in the Bloom filter. A Bloom filter can record many accessed pages with very little space, but it cannot be used directly for hot data identification, for the following two reasons:
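A minimal sketch of the standard Bloom filter described above, for illustration only. The hash family (k salted SHA-256 digests reduced modulo m) is an assumption, not something the text specifies, and the bit array stores one bit per byte for readability.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit for readability; all start at 0

    def positions(self, page: int) -> list:
        # h_1(x)..h_k(x): k salted SHA-256 digests reduced modulo m
        # (an illustrative hash family, not one specified in the text)
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{page}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, page: int) -> None:
        for p in self.positions(page):
            self.bits[p] = 1  # setting all k bits records the page

    def contains(self, page: int) -> bool:
        # false positives are possible, false negatives are not
        return all(self.bits[p] for p in self.positions(page))


bf = BloomFilter(m=1024, k=4)
bf.add(42)
snapshot = bytes(bf.bits)
bf.add(42)                         # a second access to the same page...
print(bytes(bf.bits) == snapshot)  # True: ...leaves the array unchanged
```

Adding the same page a second time leaves the bit array exactly as it was, so the filter cannot tell one access from many.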
1) A Bloom filter cannot record repeated accesses. No matter how many times page x is accessed, the Bloom filter only sets the k bits $h_1(x), h_2(x), \ldots, h_k(x)$ corresponding to x to 1 and takes no measure to record how many times x has been accessed. Without access counts, a Bloom filter cannot identify hot data from the access history.
2) A Bloom filter can only keep recording new pages; it cannot delete any page already recorded in it, so the recorded access history grows ever longer. In fact, early access history has no value for hot data identification, and recording an overly long history only increases memory overhead.
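The memory cost of an ever-longer history can be made concrete with the standard Bloom filter approximation (textbook analysis, not part of the patent text). With an m-bit array, k hash functions and n recorded pages, a given bit is still 0 with probability about $e^{-kn/m}$, so the false-positive rate is approximately:

```latex
\Pr[\text{false positive}] \approx \left(1 - e^{-kn/m}\right)^{k}
```

For example, with 10 bits per page (m = 10n) and k = 7 this is roughly 0.8%, but if the recorded history doubles without enlarging the array (m = 5n) it rises to roughly 14%. Keeping accuracy constant therefore requires m to grow in proportion to n.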
To overcome these two defects of the Bloom filter, researchers have proposed recording access history with multiple Bloom filters, known as the Multiple Bloom Filters method, or MBFs. When page x is accessed, the method selects, from the Bloom filters it maintains, one that has not yet recorded x, and records x in it. A page accessed repeatedly is thus recorded in several Bloom filters and can be identified as hot data. In addition, MBFs periodically selects one Bloom filter and clears all bits of its bit array, deleting the access history of every page recorded in that filter; this is how early history is discarded.
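A minimal sketch of the MBFs scheme as described, assuming the same illustrative SHA-256 hash family as a plain Bloom filter and a round-robin choice of which filter to clear (the text only says one is selected periodically). Names such as `expire_oldest` are hypothetical.

```python
import hashlib


class MultipleBloomFilters:
    """Sketch of the MBFs scheme: n Bloom filters sharing k hash functions."""

    def __init__(self, n: int, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.filters = [bytearray(m) for _ in range(n)]  # one bit per byte for clarity
        self.next_to_clear = 0  # round-robin victim (an assumed selection policy)

    def _positions(self, page: int) -> list:
        # illustrative hash family: k salted SHA-256 digests modulo m
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{page}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    @staticmethod
    def _records(f: bytearray, pos: list) -> bool:
        return all(f[p] for p in pos)

    def access(self, page: int) -> None:
        pos = self._positions(page)
        for f in self.filters:
            if not self._records(f, pos):  # first filter that has not recorded x yet
                for p in pos:
                    f[p] = 1
                return
        # every filter already records x: this access cannot be counted

    def count(self, page: int) -> int:
        # access count = number of filters that record the page
        pos = self._positions(page)
        return sum(self._records(f, pos) for f in self.filters)

    def expire_oldest(self) -> None:
        # periodically zero one filter, discarding the history it holds
        self.filters[self.next_to_clear][:] = bytes(self.m)
        self.next_to_clear = (self.next_to_clear + 1) % len(self.filters)


mbf = MultipleBloomFilters(n=4, m=4096, k=3)
for _ in range(10):
    mbf.access(7)
print(mbf.count(7))  # 4: accesses beyond n are lost
mbf.expire_oldest()
print(mbf.count(7))  # 3
```

Ten accesses are recorded as only four, because with n filters the count saturates at n.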
Although MBFs can record page access counts with multiple Bloom filters and discard early history by periodically clearing one of them, and can therefore identify hot data effectively from the access history, it still has two defects. First, because it uses multiple Bloom filters, its memory overhead remains high. Second, the range of access counts it can record is limited: if MBFs maintains n Bloom filters, the largest access count it can record for any page is n; accesses beyond n cannot be recorded, so part of the hotness information is lost. The two defects cannot be remedied at the same time, because enlarging the count range n inevitably increases memory overhead; MBFs is therefore hard to apply to large-scale hybrid storage systems.
Summary of the invention
The technical problem addressed by the present invention is to overcome the above deficiencies of the prior art by providing a hot data identification method for hybrid storage systems that is simple to implement, has low memory overhead, can identify hot data over large data volumes, and achieves high identification accuracy.
To solve the above technical problem, the present invention proposes the following technical solution:
A low-memory-overhead hot data identification method for hybrid storage systems, implemented by the following steps:
1) Define a Bloom filter with ultra-large counting capacity, the UBF (Ultra-counting Bloom Filter). The UBF records every access to a page through a bit array and k hash functions that map each page onto the bit array. Initialize the bit array to all zeros and set up the hash functions;
2) When page x is accessed, query the hotness of page x through the bit array. The hotness of page x is the number of 1 bits among the k bits corresponding to x in the bit array;
3) Determine whether the hotness of page x is less than a preset first-record threshold t, where t is less than the number k of hash functions. If so, page x is judged not yet recorded in the UBF and is recorded by flipping t of its k corresponding bits from 0 to 1. Otherwise, page x is judged already recorded in the UBF, and its repeated access is counted by flipping one of its k corresponding bits whose value is 0 to 1 with a first preset probability;
4) Compare the hotness of page x with a specified hotness threshold. If the hotness of page x exceeds the threshold, identify the data corresponding to page x as hot data; then return to step 2).
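Steps 1) to 4) can be sketched as follows. This is a minimal illustration, not the patented implementation: it omits the byte-clearing refinement and threshold adjustment, uses the first preset probability $1/2^{h-t}$ given below, assumes an illustrative SHA-256 hash family, sets the first t hash positions on a first record, picks a random 0 bit on a repeated access, and compares the hotness after the update with the threshold. All of these choices are assumptions, marked in the code.

```python
import hashlib
import random


class UBF:
    """Minimal sketch of steps 1)-4) (no byte clearing, fixed threshold)."""

    def __init__(self, m_bits: int, k: int, t: int, threshold: int, seed: int = 0) -> None:
        assert 0 < t < k, "the first-record threshold t must be smaller than k"
        self.m, self.k, self.t, self.threshold = m_bits, k, t, threshold
        self.bits = bytearray(m_bits)  # step 1): all bits start at 0 (one byte per bit here)
        self.rng = random.Random(seed)

    def positions(self, page: int) -> list:
        # the k hash functions h_1..h_k (illustrative SHA-256 family, an assumption)
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{page}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def hotness(self, page: int) -> int:
        # step 2): number of the page's k bits that are 1
        return sum(self.bits[p] for p in self.positions(page))

    def access(self, page: int) -> bool:
        pos = self.positions(page)
        h = sum(self.bits[p] for p in pos)
        if h < self.t:
            # step 3), first record: set t of the k bits (here h_1..h_t, an assumed choice)
            for p in pos[: self.t]:
                self.bits[p] = 1
        else:
            # step 3), repeated access: flip one 0 bit with probability 1/2^(h-t)
            zeros = [p for p in pos if self.bits[p] == 0]
            if zeros and self.rng.random() < 1.0 / 2 ** (h - self.t):
                self.bits[self.rng.choice(zeros)] = 1  # which 0 bit: random (assumption)
        # step 4): hot if the updated hotness exceeds the threshold (assumed ordering)
        return self.hotness(page) > self.threshold


ubf = UBF(m_bits=1 << 16, k=16, t=4, threshold=6, seed=1)
first = ubf.access(123)        # first record: sets t = 4 bits
print(ubf.hotness(123) >= 4)   # True
for _ in range(200):
    hot = ubf.access(123)      # each repeat may add one more 1 bit
```

In a hybrid storage system, a `True` result would mark the page's data as a candidate for the high-end devices.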
Preferably, the first preset probability in step 3) is $1/2^{h-t}$, where h is the hotness of page x and t is the first-record threshold.
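A short derivation of what this probability buys, assuming each flip raises the page's hotness by exactly one and ignoring bit collisions with other pages and the byte-clearing step: at hotness h (with h ≥ t), a repeated access flips a bit with probability $2^{-(h-t)}$, so the number of accesses needed to go from h to h+1 is geometric with mean $2^{h-t}$. Counting the first access, which records the page at hotness t, the expected number of accesses $A_h$ needed to reach hotness h is:

```latex
\mathbb{E}[A_h] = 1 + \sum_{j=t}^{h-1} 2^{\,j-t} = 1 + \left(2^{\,h-t} - 1\right) = 2^{\,h-t}
```

Hotness therefore grows roughly as $t + \log_2 A$ in the number of accesses A. With k = 16 and t = 4, for instance, a page's k bits saturate only after about $2^{12} = 4096$ accesses.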
Preferably, before a bit whose value is 0 is flipped to 1 in step 3), the method further includes a step of randomly clearing some 1 bits, specifically: count the number m of 1 bits in the byte containing the bit to be flipped to 1, and, according to m, clear every bit of that byte with a second preset probability.
Preferably, the second preset probability is $1/(8-m)$, where m is the number of 1 bits in the byte containing the bit to be flipped to 1.
Preferably, step 2) specifically comprises:
2.1) When page x receives an access request, initialize a counter i and the hotness h to 0, and go to step 2.2);
2.2) Compute the value $h_i(x)$ of the i-th hash function for page x and check the value of bit $h_i(x)$ in the bit array. If it is 1, go to step 2.3); if it is 0, go to step 2.4);
2.3) Increment the hotness h by 1 and go to step 2.4);
2.4) Increment the counter i by 1 and return to step 2.2), until i equals the number k of hash functions; then output h as the hotness of page x and go to step 3).
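Steps 2.1) to 2.4) translate almost line for line into a loop. In this sketch `hash_fn(i, page)` is a hypothetical stand-in for $h_i(x)$, and `demo_hash` is a toy function for an 8-bit array, used only to make the example concrete.

```python
def query_hotness(bits, page: int, k: int, hash_fn) -> int:
    """Steps 2.1)-2.4): count how many of page x's k bits are 1."""
    i, h = 0, 0                          # 2.1) counter i and hotness h start at 0
    while i < k:                         # 2.4) stop once i equals k
        if bits[hash_fn(i, page)] == 1:  # 2.2) inspect bit h_i(x)
            h += 1                       # 2.3) a 1 bit adds to the hotness
        i += 1                           # 2.4) move on to the next hash function
    return h                             # output h as the hotness of page x


def demo_hash(i: int, page: int) -> int:
    # hypothetical toy h_i for an 8-bit array, for illustration only
    return (page + i) % 8


bits = [0, 1, 1, 0, 1, 0, 0, 0]
print(query_hotness(bits, page=1, k=4, hash_fn=demo_hash))  # 3
```

Page 1 maps to bits 1, 2, 3 and 4, which hold 1, 1, 0 and 1, so its hotness is 3.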
Preferably, initializing the bit array in step 1) specifically comprises:
1.1) Set the number k of hash functions and the length n of the access history to be recorded;
1.2) Calculate the storage size required by the bit array from k and n, and allocate a block of memory of that size for the bit array;
1.3) Initialize the memory corresponding to the bit array to 0.
Preferably, the storage required by the bit array is proportional to k × n, where k is the number of hash functions and n is the length of the access history to be recorded.
Preferably, step 4) is followed by a step 5) of adjusting the specified hotness threshold according to the hotness of page x by formula (1), for use in the next round of hot data identification;
where h denotes the hotness of page x, NewThreshold the adjusted hotness threshold, Threshold the hotness threshold before adjustment, and DesiredHotPages the preset number of hot pages to be identified.
Preferably, step 5) further includes, when page x is identified as hot data, raising the adjusted hotness threshold by formula (2) to obtain the final adjusted hotness threshold for use in the next round of hot data identification;
where h denotes the hotness of page x, NewThreshold the adjusted hotness threshold, NewThreshold' the final adjusted hotness threshold, and DesiredHotPages the preset number of hot pages to be identified.
Compared with the prior art, the present invention has the following advantages:
1) The invention records every access to a page with a single data structure, the UBF. A page can be recorded with only t bits of information, fewer than the number of hash functions, which reduces memory overhead, and repeated accesses to a page are counted by flipping one of the page's 0 bits to 1 with a certain probability. Maintaining a single UBF is therefore enough to record multiple accesses to each page, so hot data can be identified over large data volumes with low memory overhead, effectively improving the accuracy and efficiency of hot data identification.
2) Before flipping a bit from 0 to 1, the invention further clears, with a certain probability, the entire byte containing the bit to be flipped. The hotness of infrequently accessed data therefore declines gradually until it disappears from the access history, effectively discarding historical data that has not been accessed for a long time.
3) The invention further adjusts the hotness threshold dynamically according to page hotness. Whenever a page is accessed, the hotness threshold Threshold is adjusted according to the hotness h of the accessed page, so that Threshold tracks the average hotness of the pages appearing in the workload: when h is greater than Threshold, Threshold is raised; when h is less than Threshold, Threshold is lowered. Hot data identification thus adapts to changes in the workload, which makes it more flexible.
4) The invention further raises the hotness threshold Threshold whenever a page is identified as hot data; with this continually rising threshold, the hot data identified are the hottest data in the whole workload.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the low-memory-overhead hot data identification method for hybrid storage systems in this embodiment.
Detailed description of embodiments
The invention is further described below with reference to the drawings and specific preferred embodiments, which do not limit the scope of protection of the invention.
As shown in Fig. 1, the low-memory-overhead hot data identification method for hybrid storage systems of this embodiment is implemented by the following steps:
1) Define and initialize the UBF: define a Bloom filter with ultra-large counting capacity (UBF), which records page access counts through a bit array and k hash functions that map each page onto the bit array; initialize the bit array to all zeros and set up the hash functions;
2) Query page hotness: when page x receives an access request, query its hotness through the bit array; the hotness of page x is the number of 1 bits among its k corresponding bits in the bit array;
3) Add the page to the UBF: determine whether the hotness of page x is less than the preset first-record threshold t, where t is less than the number k of hash functions. If so, page x is judged not yet recorded in the UBF and is recorded by flipping t of its k corresponding bits from 0 to 1; otherwise, page x is judged already recorded, and its repeated access is counted by flipping one of its k corresponding bits whose value is 0 to 1 with the first preset probability;
4) Identify hot data: compare the hotness of page x with the specified hotness threshold. If the hotness of page x exceeds the threshold, the data corresponding to page x are identified as hot data; otherwise they are identified as cold data. Then return to step 2).
This embodiment designs a data structure that records access history with very low memory overhead. It records the access count of each page through k bits of a bit array and has an ultra-large counting capacity, hence its name: Bloom filter with ultra-large counting capacity, or UBF (Ultra-counting Bloom Filter). Maintaining a single UBF is enough to record page access history at very large scale, so frequently accessed hot data can be identified dynamically, guiding data placement in the hybrid storage system.
In this embodiment, initializing the bit array in step 1) specifically comprises:
1.1) Set the number k of hash functions and the length n of the access history to be recorded;
1.2) Calculate the storage size required by the bit array from k and n, and allocate a block of memory of that size for the bit array;
1.3) Initialize the memory corresponding to the bit array to 0.
The number k of hash functions and the access-history length n can be set according to practical requirements such as available memory and the needs of upper-layer applications. When data in the workload are accessed many times, a larger k can be set to record more accesses; provided memory is sufficient, the larger n is, the better, where n is the number of pages contained in the recorded access history.
In this embodiment, the storage required by the bit array is proportional to k × n; that is, the number of bits in the bit array is proportional to k × n, so the size of the bit array required by the UBF can be calculated from k and n. A block of memory is allocated as the bit array and all of its bits are initialized to 0. For example, if the bit array contains 2k × n bits, a block of 2k × n / 8 bytes is allocated and zeroed. At the same time, the k hash functions $h_1(x), h_2(x), \ldots, h_k(x)$ that map each page x are set up, completing the initialization of the UBF.
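A minimal sketch of steps 1.1) to 1.3) with a packed bit array, assuming the 2k × n sizing used in the example above and least-significant-bit-first ordering within each byte (an illustrative choice). The function names are hypothetical.

```python
def ubf_size_bytes(k: int, n: int) -> int:
    # example sizing from the text: 2k x n bits, rounded up to whole bytes
    return (2 * k * n + 7) // 8


def init_bit_array(k: int, n: int) -> bytearray:
    # 1.2) allocate the block, 1.3) zero it (bytearray(size) is zero-filled)
    return bytearray(ubf_size_bytes(k, n))


def get_bit(bits: bytearray, pos: int) -> int:
    byte_index, offset = divmod(pos, 8)  # bit `pos` lives in byte pos // 8
    return (bits[byte_index] >> offset) & 1


def set_bit(bits: bytearray, pos: int) -> None:
    byte_index, offset = divmod(pos, 8)
    bits[byte_index] |= 1 << offset


print(ubf_size_bytes(k=16, n=10_000_000))  # 40000000, i.e. about 40 MB
```

Packing eight bits per byte also makes "the byte containing a bit" well defined, which the byte-clearing step relies on.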
In this embodiment, step 2) specifically comprises:
2.1) When page x is accessed, initialize a counter i and the hotness h to 0, and go to step 2.2);
2.2) Compute the value $h_i(x)$ of the i-th hash function for page x and check the value of bit $h_i(x)$ in the bit array. If it is 1, go to step 2.3); if it is 0, go to step 2.4);
2.3) Increment the hotness h by 1 and go to step 2.4);
2.4) Increment the counter i by 1 and return to step 2.2), until i equals the number k of hash functions; then output h as the hotness of page x and go to step 3).
Page x corresponds to k bits in the UBF's bit array, and the number of 1 bits among these k bits is defined as the hotness of page x. After a request from the upper-layer application for page x is received, the k bits of page x in the bit array are checked one by one, and the number of 1 bits among them is counted to obtain the hotness h; page x is then added to the bit array to record the current access.
Adding page x to the UBF's bit array involves two cases. In the first case, page x is not yet recorded in the bit array and is added to the UBF for the first time (a first record). In the second case, page x is already recorded in the bit array, and its repeated-access count must be increased. In this embodiment, step 3) decides which case applies by comparing the hotness h of the current page with the preset first-record threshold t; that is, t is used to determine whether a page has already been recorded in the bit array, and t can be set considerably smaller than k.
When the hotness h of the page is less than t, the first case applies; otherwise, the second case applies. In the first case, page x is not yet recorded in the UBF, so it is recorded by flipping t of its k corresponding bits from 0 to 1; after page x has been added, at least t of its k bits are 1. When page x is accessed again, its hotness h is no longer less than t, so the second case applies. In the second case, page x has already been recorded and only the repeated access needs to be registered: one of its k corresponding bits whose value is 0 is flipped to 1 with the first preset probability, which counts the repeated access to page x.
With this method, this embodiment needs only t bits of information to indicate whether a page x is recorded in the UBF. By contrast, the traditional LFU policy maintains an integer counter for each page to record its access count, and each integer counter occupies tens of bits, so this embodiment reduces memory overhead effectively. This embodiment also maintains only one UBF to record every access to a page, so its memory overhead is also considerably lower than that of the MBFs scheme, which maintains multiple Bloom filters.
This embodiment counts repeated accesses by flipping one of the 0 bits among the page's k bits to 1 with a certain probability, so every access to page x is reflected in its k bits in the UBF. Even if the number of accesses to page x grows exponentially, the number of 1 bits corresponding to page x in the UBF grows only linearly; linearly growing bit information can thus record exponentially growing access counts. This gives an ultra-large counting range, enabling hot data identification over large data volumes with low memory overhead, and the greatly enlarged counting range also significantly improves the accuracy and efficiency of hot data identification. By contrast, the traditional MBFs scheme maintains n Bloom filters and can record at most n accesses per page; when page x is accessed more than n times, subsequent accesses cannot be recorded in MBFs, and hotness information is lost.
In this embodiment, the first preset probability is $1/2^{h-t}$, where h is the hotness of the current page and t is the first-record threshold. That is, when a repeated access to page x is to be counted, a 0 bit among its k corresponding bits is flipped to 1 with probability $1/2^{h-t}$, so that for a given page x just a few bits can record hundreds of accesses by the application.
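The claim that a few bits can record hundreds of accesses can be checked with a small Monte Carlo sketch. It models only the $1/2^{h-t}$ flip rule for a single page, assuming no collisions with other pages and no byte clearing, and tracks h − t directly instead of a real bit array. The function name is hypothetical.

```python
import random


def accesses_to_reach(extra_bits: int, rng: random.Random) -> int:
    """Accesses a fresh page needs before its hotness reaches t + extra_bits,
    modelling only the 1/2^(h-t) flip rule (no collisions, no byte clearing)."""
    gained = 0     # h - t once the page has been recorded
    accesses = 1   # the first access records the page at hotness t
    while gained < extra_bits:
        accesses += 1
        if rng.random() < 1 / 2 ** gained:  # flip probability 1/2^(h-t)
            gained += 1
    return accesses


rng = random.Random(0)
trials = [accesses_to_reach(8, rng) for _ in range(200)]
print(sum(trials) / len(trials))  # close to 2**8 = 256 on average
```

Eight extra bits thus correspond to a few hundred accesses on average, while each additional bit roughly doubles the count it can represent.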
In this embodiment, before a 0 bit is flipped to 1 in step 3), the method further includes a step of randomly clearing some 1 bits, specifically: count the number m of 1 bits in the byte containing the bit to be flipped to 1, and, according to m, clear every bit of that byte with the second preset probability. Pages that once appeared in the access history but have not been accessed for a long time have their corresponding 1 bits slowly flipped back to 0, so their hotness declines gradually until they eventually disappear from the access history. This effectively discards early access history that has not been accessed for a long time and no longer has any value.
In this embodiment, the second preset probability is $1/(8-m)$, where m is the number of 1 bits in the byte containing the bit to be flipped to 1. That is, before a particular bit is flipped from 0 to 1, the number m of 1 bits in its byte is counted first; all bits of that byte are then cleared with probability $1/(8-m)$; finally, the bit is flipped from 0 to 1.
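A minimal sketch of the clear-then-flip step on a packed bit array, assuming least-significant-bit-first ordering within each byte (an illustrative choice). The function name `flip_with_clearing` is hypothetical.

```python
import random


def flip_with_clearing(bits: bytearray, pos: int, rng: random.Random) -> bool:
    """Flip bit `pos` from 0 to 1, first clearing its whole byte with
    probability 1/(8 - m), m = number of 1 bits in that byte.
    Returns True if the byte was cleared."""
    byte_index, offset = divmod(pos, 8)  # LSB-first bit order (an assumption)
    assert ((bits[byte_index] >> offset) & 1) == 0, "target bit must currently be 0"
    m = bin(bits[byte_index]).count("1")  # m <= 7 because the target bit is 0
    cleared = rng.random() < 1 / (8 - m)
    if cleared:
        bits[byte_index] = 0           # forget every page sharing this byte
    bits[byte_index] |= 1 << offset    # then perform the 0 -> 1 flip
    return cleared


full = bytearray([0b0111_1111])        # m = 7, so the byte is cleared with probability 1
print(flip_with_clearing(full, 7, random.Random(0)), bin(full[0]))  # True 0b10000000
```

One possible reading of the choice $1/(8-m)$, offered as an interpretation rather than a statement from the text: the expected number of 1 bits removed per flip is $m/(8-m)$. This is below 1 when m < 4, exactly 1 at m = 4, and above 1 when m > 4, so on average the bit array tends to hover around half full.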
This embodiment determines whether a page is hot data using the specified hotness threshold Threshold. If the hotness h of page x is greater than Threshold, page x is a hot page; its data are hot data and should be stored on the high-end devices. If h is less than Threshold, page x is an infrequently accessed page; its data are cold data and should be stored on the low-end devices. The identification result is returned to the hybrid storage system to guide data placement.
In this embodiment, step 4) is followed by a step 5) of adjusting the specified hotness threshold according to the hotness of page x by formula (1), for use in the next round of hot data identification;
where h denotes the hotness of page x, NewThreshold the adjusted hotness threshold, Threshold the hotness threshold before adjustment, and DesiredHotPages the preset number of hot pages to be identified. For example, in a two-level tiered storage system where the application needs to keep its 1000 hottest pages in the first tier, DesiredHotPages is set to 1000.
In this embodiment, the specified hotness threshold Threshold is adjusted dynamically according to the hotness of page x; formula (1) expresses the first rule of this dynamic adjustment. Under the first rule, whenever a page is accessed, Threshold is adjusted by formula (1) according to the hotness of page x: when h is greater than Threshold, Threshold is raised, i.e., the new threshold increases; conversely, when h is less than Threshold, Threshold is lowered, i.e., the new threshold decreases. The first rule drives Threshold toward the average hotness of the pages appearing in the workload, so the threshold adapts dynamically to different workloads, improving the flexibility of hot data identification.
In this embodiment, step 5) further includes, when page x is identified as hot data, raising the adjusted hotness threshold by formula (2) to obtain the final adjusted hotness threshold for use in the next round of hot data identification;
where h denotes the hotness of page x, NewThreshold the adjusted hotness threshold, NewThreshold' the final adjusted hotness threshold, and DesiredHotPages the preset number of hot pages to be identified.
Formula (2) expresses the second rule of the dynamic adjustment in this embodiment: when a page is identified as hot data, i.e., when its hotness h exceeds Threshold, Threshold is raised. If a large number of pages are identified as hot data, Threshold is too low and should be increased appropriately so that the hotter pages can be singled out from among them. The second rule therefore increases Threshold each time a hot page is found, raising it step by step; some less-hot data previously identified as hot may then be reclassified as cold data. In this way the hottest data can finally be filtered out from a large number of hot pages, ensuring that the adjusted Threshold identifies the hottest data among the hot data. The traditional MBFs method, by contrast, can only set a static threshold and therefore lacks flexibility: any page whose access count exceeds the static threshold is identified as hot data, and the threshold cannot be updated to single out the hottest data.
The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make many possible changes and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, falls within the scope of protection of the technical solution of the invention.

Claims (9)

  1. A low-memory-overhead hot data identification method for hybrid storage systems, characterized in that the implementation steps are:
    1) defining a Bloom filter with ultra-large counter capacity (UBF), which records the access count of each page by means of a bit array and k hash functions that map each page onto the bit array; initializing the bit array to 0 and setting up each hash function;
    2) when page x is accessed, querying the heat of page x through the bit array, the heat of page x being the number of bits whose value is 1 among the k bits of the bit array that correspond to page x;
    3) judging whether the heat of page x is less than a preset first recording threshold t, where t is less than the number k of hash functions; if so, determining that page x has not yet been recorded in the UBF, and recording page x by flipping t of its k corresponding bits in the bit array from 0 to 1; otherwise, determining that page x is already recorded in the UBF, and counting the repeated access of page x by flipping one of its k corresponding bits whose value is 0 to 1 with a first preset probability;
    4) comparing the heat of page x with a specified heat threshold; if the heat of page x exceeds the heat threshold, identifying the data corresponding to page x as hot data; then returning to step 2).
  2. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 1, characterized in that the first preset probability in step 3) is 1/2^(h-t), where h is the heat of page x and t is the first recording threshold.
  3. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 2, characterized in that, in step 3), before a bit whose value is 0 is flipped to 1, the method further comprises a random reset step, specifically: counting the number m of bits whose value is 1 in the byte containing the bit to be flipped to 1, and, according to the counted m, clearing every bit of that byte to 0 with a second preset probability.
  4. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 3, characterized in that the second preset probability is 1/(8-m), where m is the number of bits whose value is 1 in the byte containing the bit to be flipped to 1.
  5. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 4, characterized in that step 2) specifically comprises:
    2.1) when page x is accessed, initializing a counter i and the heat h to 0, and going to step 2.2);
    2.2) calculating the value h_i(x) of the i-th hash function for page x and checking the value of bit h_i(x) in the bit array; if that bit is 1, going to step 2.3); if it is 0, going to step 2.4);
    2.3) incrementing the heat h by 1 and going to step 2.4);
    2.4) incrementing the counter i by 1 and returning to step 2.2) until i equals the number k of hash functions; then outputting h as the heat of page x and proceeding to step 3).
  6. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 5, characterized in that initializing the bit array in step 1) specifically comprises:
    1.1) setting the number k of hash functions and the access history length n to be recorded;
    1.2) calculating the memory space required by the bit array from the set number k of hash functions and the access history length n, and allocating a block of memory of that size for the bit array;
    1.3) initializing the memory space of the bit array to 0.
  7. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 6, characterized in that the memory space required by the bit array is proportional to k × n, where k is the number of hash functions and n is the access history length to be recorded.
  8. The low-memory-overhead hot data identification method for hybrid storage systems according to any one of claims 1 to 7, characterized in that step 4) is further followed by a step 5) of adjusting the specified heat threshold by formula (1) according to the heat of page x, for use in the next round of hot data identification;
    where h denotes the heat of page x, NewThreshold the adjusted heat threshold, Threshold the heat threshold before adjustment, and DesiredHotPages the preset number of hot pages to be identified.
  9. The low-memory-overhead hot data identification method for hybrid storage systems according to claim 8, characterized in that step 5) further comprises, when page x is identified as hot data, raising the adjusted heat threshold by formula (2) to obtain the final adjusted heat threshold for use in the next round of hot data identification;
    where h denotes the heat of page x, NewThreshold the adjusted heat threshold, NewThreshold' the final adjusted heat threshold, and DesiredHotPages the preset number of hot pages to be identified.
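For illustration, the UBF of claims 1 to 7 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the BLAKE2b-based hash family, the string-convertible page IDs, the hypothetical `bits_per_entry` constant for the k × n sizing of claim 7, the choice to set the first t of a new page's k bits in step 3), returning the post-update heat for step 4), and reading claim 3 as zeroing the whole byte are all our own choices. The class name `UBF` and its methods are illustrative.

```python
import hashlib
import random


class UBF:
    """Sketch of a Bloom filter with ultra-large counter capacity (UBF).

    Assumptions (not from the patent): BLAKE2b-derived hash functions,
    string-convertible page IDs, and a hypothetical `bits_per_entry`
    constant for the k * n sizing rule of claim 7.
    """

    def __init__(self, k, n, t, bits_per_entry=1, seed=None):
        if not 0 < t < k:
            raise ValueError("claim 1 requires 0 < t < k")
        self.k, self.t = k, t
        # Claims 6-7: size proportional to k * n; the constant is assumed.
        self.nbits = max(8, k * n * bits_per_entry)
        self.bits = bytearray((self.nbits + 7) // 8)  # initialized to 0
        self.rng = random.Random(seed)

    def _positions(self, x):
        # h_i(x) for i = 0..k-1, one salted BLAKE2b hash per function (assumed).
        key = str(x).encode()
        return [
            int.from_bytes(
                hashlib.blake2b(key, digest_size=8,
                                salt=i.to_bytes(16, "little")).digest(),
                "little") % self.nbits
            for i in range(self.k)
        ]

    def _get(self, p):
        return (self.bits[p >> 3] >> (p & 7)) & 1

    def _set(self, p):
        self.bits[p >> 3] |= 1 << (p & 7)

    def heat(self, x):
        # Claim 5: number of x's k positions whose bit is 1.
        return sum(self._get(p) for p in self._positions(x))

    def _maybe_reset_byte(self, p):
        # Claims 3-4 as read here: with probability 1/(8 - m), where m is the
        # count of 1s in the byte holding bit p, zero that whole byte.
        # Called only when bit p is 0, so m <= 7 and 8 - m >= 1.
        m = bin(self.bits[p >> 3]).count("1")
        if self.rng.random() < 1.0 / (8 - m):
            self.bits[p >> 3] = 0

    def access(self, x):
        # Steps 2)-3); returns the heat after the update for use in step 4).
        pos = self._positions(x)
        h = sum(self._get(p) for p in pos)
        if h < self.t:
            # Not yet recorded: set the first t of x's k bits (assumed choice).
            for p in pos[: self.t]:
                self._set(p)
        else:
            zeros = [p for p in pos if not self._get(p)]
            # Claim 2: flip one 0 bit to 1 with probability 1/2^(h - t).
            if zeros and self.rng.random() < 1.0 / 2 ** (h - self.t):
                p = self.rng.choice(zeros)
                self._maybe_reset_byte(p)
                self._set(p)
        return self.heat(x)
```

A caller would invoke `access(x)` on every page access and compare the returned heat with the heat threshold as in step 4). Ignoring hash collisions and random resets, a page at heat h ≥ t needs on average 2^(h-t) further accesses to reach h+1, so climbing from t to the maximum heat k takes about 2^(k-t) - 1 accesses on average; this exponential growth is what lets k bits per page stand in for a much larger counter.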
CN201510236366.3A 2015-05-11 2015-05-11 Towards the low memory cost hotspot data identification method of mixing storage system Active CN104881369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510236366.3A CN104881369B (en) 2015-05-11 2015-05-11 Towards the low memory cost hotspot data identification method of mixing storage system

Publications (2)

Publication Number Publication Date
CN104881369A (en) 2015-09-02
CN104881369B (en) 2017-12-12

Family

ID=53948869

Country Status (1)

Country Link
CN (1) CN104881369B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant