CN111444036B - Data-relevance-aware erasure code memory replacement method, device and memory system - Google Patents


Info

Publication number
CN111444036B
Authority
CN
China
Prior art keywords
data blocks
cold data
stripe
memory
cold
Legal status
Active
Application number
CN202010196333.1A
Other languages
Chinese (zh)
Other versions
CN111444036A (en)
Inventor
黄建忠
曹强
廖宝忠
王程锦
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202010196333.1A
Publication of CN111444036A
Application granted
Publication of CN111444036B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data-relevance-aware erasure code memory replacement method, device and memory system in the field of computer storage. The method comprises the following steps: (1) loading data into memory in blocks in the order of the read requests and storing the blocks as replicas; (2) if the number of executed read requests reaches a threshold K2, performing garbage collection and then going to step (4); otherwise, going to step (3); (3) if the number reaches a threshold K1, performing erasure code archiving so that associated data blocks are placed in the same stripe, and then going to step (4); otherwise, going directly to step (4); (4) if all user requests have been executed, ending; otherwise, taking the next unexecuted read request as the current read request and returning to step (1). By increasing the probability that the data blocks in the same stripe are evicted at the same time, the invention reduces the update overhead caused by replacement after archiving and lowers user access latency.

Description

Data-relevance-aware erasure code memory replacement method, device and memory system
Technical Field
The invention belongs to the field of computer storage, and particularly relates to a data-relevance-aware erasure code memory replacement method, an erasure code memory replacement device, and an erasure code memory system.
Background
To increase read speed and reduce user access latency, more and more data is kept in memory. For example, scientific computing places intermediate result sets in memory for low-latency access. Memory is volatile: its contents are lost on transient failures such as power loss or system crashes, so fault tolerance must be provided by replication or by erasure coding. Replication is simple and efficient and improves access parallelism, but its space utilization is low; for the same fault tolerance, erasure coding achieves higher space utilization but lower access parallelism. To obtain both high fault tolerance and high storage efficiency in cluster memory, data of different access heat (hot, warm and cold data) is stored with different redundancy schemes: hot data is usually replicated to ensure high access performance; warm data is erasure-coded to ensure high storage efficiency; and cold data is persisted to disk to save memory space.
Because memory space is limited, when free memory runs short an eviction policy (such as LRU) removes part of the data from memory and writes it back to disk to reclaim space. Since hot data in memory is replicated and cold data blocks are erasure-coded, an evicted block may be stored either as replicas or inside an erasure code stripe. If the block is replicated, it can be written back to disk directly and all of its in-memory replicas deleted. If the block belongs to an erasure code stripe, it cannot simply be dropped: to preserve the fault tolerance of the stripe, it must be replaced by a block from another stripe that is not being evicted. Replacement inside an erasure code stripe is therefore equivalent to an update: besides substituting a new block for the evicted one, the parity blocks of the stripe must also be updated, which makes replacement expensive.
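The update cost of an in-stripe replacement can be illustrated with a minimal sketch that assumes the simplest erasure code, a single XOR parity block per stripe (RAID-5 style). The patent does not specify the code; with Reed-Solomon, each parity block would instead be patched with a coefficient-scaled delta. The function names are hypothetical. Replacing one data block requires reading the old block and the old parity and writing a new parity block, following P' = P xor D_old xor D_new.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_blocks: list[bytes]) -> bytes:
    """Single XOR parity block over all data blocks of a stripe (assumed code)."""
    return reduce(xor_bytes, data_blocks)


def replace_block(stripe: list[bytes], parity: bytes, idx: int, new_block: bytes) -> bytes:
    """Put new_block at position idx (in place) and return the updated parity.

    Delta rule P' = P ^ D_old ^ D_new: only the evicted block, the incoming
    block and the old parity are read; the other data blocks are untouched.
    """
    old_block = stripe[idx]
    stripe[idx] = new_block
    return xor_bytes(xor_bytes(parity, old_block), new_block)


# Usage: evict block 1 of a three-block stripe and patch its parity.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = encode_parity(stripe)
parity = replace_block(stripe, parity, 1, b"\x0f\x0f")
assert parity == encode_parity(stripe)  # same result as re-encoding the stripe
```

Every eviction from a partly hot stripe pays this read-modify-write on the parity, which is the overhead the method tries to avoid by making whole stripes go cold together.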
The process of converting data from replicas to erasure codes is called erasure-coded archiving. Existing erasure code archiving schemes focus only on reducing network traffic during archiving and ignore the update overhead incurred after archiving, when cold data blocks are evicted because memory is short. Existing update schemes in erasure-coded memory systems all target updates caused by write requests and are not suited to updates caused by replacement.
Disclosure of Invention
In view of the above defects and improvement needs of the prior art, the invention provides a data-relevance-aware erasure code memory replacement method, device and memory system, which aim to reduce the update overhead caused by replacement after archiving, and thereby the user access latency, by increasing the probability that the data blocks in the same stripe are evicted at the same time.
To achieve the above object, according to a first aspect of the present invention, a data-relevance-aware erasure code memory replacement method is provided, comprising:
(1) loading data into memory in blocks in the order of the current read request, and storing the blocks as replicas;
(2) judging whether the number of executed read requests has reached a garbage collection threshold K2; if so, performing garbage collection to evict the data blocks with low access frequency from memory and, once garbage collection is finished, going to step (4); otherwise, going to step (3);
(3) judging whether the number of executed read requests has reached an archiving threshold K1; if so, performing erasure code archiving according to the association between data blocks so that associated data blocks are placed in the same stripe and, once archiving is finished, going to step (4); otherwise, going directly to step (4);
(4) judging whether all user requests have been executed; if so, ending; otherwise, taking the next unexecuted read request as the current read request and returning to step (1);
where 0 < K1 < K2.
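The control flow of steps (1) to (4) can be sketched as a simple loop. This is a minimal sketch under stated assumptions: the text only says the count "reaches" K1 or K2, so the thresholds are assumed here to fire periodically (whenever the executed-request count is a multiple of the threshold), and `load`, `archive` and `garbage_collect` are hypothetical callbacks standing in for the real modules.

```python
from typing import Callable, Iterable


def run_replacement_loop(
    requests: Iterable[str],
    k1: int,
    k2: int,
    load: Callable[[str], None],
    archive: Callable[[], None],
    garbage_collect: Callable[[], None],
) -> list[str]:
    """Drive steps (1)-(4) and return a log of the maintenance actions taken."""
    if not 0 < k1 < k2:
        raise ValueError("requires 0 < K1 < K2")
    log: list[str] = []
    executed = 0
    for req in requests:          # step (4): take the next unexecuted request
        load(req)                 # step (1): load its blocks as replicas
        executed += 1
        # Assumption: thresholds are checked as multiples of the request count.
        if executed % k2 == 0:    # step (2): garbage collection takes priority
            garbage_collect()
            log.append(f"gc@{executed}")
        elif executed % k1 == 0:  # step (3): erasure code archiving
            archive()
            log.append(f"archive@{executed}")
    return log                    # all user requests executed: finish
```

With K1 = 3 and K2 = 6, twelve requests produce archiving at requests 3 and 9 and garbage collection at 6 and 12; when both thresholds coincide, only garbage collection runs, because step (2) jumps straight to step (4).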
During erasure code archiving, stripes are built according to the association between data blocks so that associated blocks share a stripe. Associated blocks are highly likely to become cold at the same time, so this raises the probability that the blocks of one stripe are evicted together, which reduces the update overhead caused by replacement after archiving and lowers user access latency.
Further, in step (3), when the number of executed read requests reaches the archiving threshold K1, performing erasure code archiving according to the association between data blocks so that associated data blocks are placed in the same stripe comprises:
(31) sorting the data blocks in memory by access frequency in descending order and, after sorting, taking the top n% most frequently accessed blocks as hot data blocks and the remaining blocks as cold data blocks;
(32) selecting all cold data blocks that do not yet belong to a stripe to form the set coldlist;
(33) selecting all associated K-itemsets from the set coldlist through association analysis, sorting them by occurrence frequency in descending order, and forming the set allCklist from the sorted associated K-itemsets;
(34) traversing the set allCklist; if none of the cold data blocks in the current associated K-itemset already belongs to a stripe, forming a stripe from the cold data blocks of that itemset; otherwise, skipping that itemset;
(35) after the traversal of allCklist is finished, removing the cold data blocks that now belong to stripes from the set coldlist, and forming stripes from the cold data blocks remaining in coldlist;
where 0 < n < 100; K is the number of data blocks in one stripe; each associated K-itemset consists of K cold data blocks whose support and confidence satisfy support ≥ min_sup and confidence ≥ min_conf, min_sup and min_conf being preset thresholds.
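Steps (31), (32) and (34) can be sketched as a greedy pass over allCklist. This is a minimal sketch, assuming that block access frequencies are available as a dictionary, that allCklist has already been produced and sorted by step (33), and that `striped` holds the blocks already placed in stripes by earlier archiving rounds; all names are hypothetical. Step (35) is left to the caller, which receives the leftover cold blocks.

```python
def classify_hot_cold(freq: dict[str, int], n_percent: float) -> tuple[set[str], set[str]]:
    """Step (31): the top n% blocks by access frequency are hot, the rest cold."""
    ranked = sorted(freq, key=lambda b: freq[b], reverse=True)
    n_hot = int(len(ranked) * n_percent / 100)
    return set(ranked[:n_hot]), set(ranked[n_hot:])


def assemble_stripes(
    freq: dict[str, int],
    striped: set[str],
    all_ck_list: list[tuple[str, ...]],
    n_percent: float = 20,
) -> tuple[list[tuple[str, ...]], set[str]]:
    """Steps (31), (32) and (34); returns the new stripes and the leftover coldlist.

    all_ck_list is assumed to be sorted by occurrence frequency, highest first.
    """
    _, cold = classify_hot_cold(freq, n_percent)
    coldlist = cold - striped  # step (32): cold blocks not yet in any stripe
    used: set[str] = set()
    stripes: list[tuple[str, ...]] = []
    for itemset in all_ck_list:  # step (34): most frequent itemsets first
        members = set(itemset)
        # Accept only itemsets whose blocks are all still-unstriped cold blocks.
        if members <= coldlist and not (members & used):
            stripes.append(itemset)
            used |= members
    return stripes, coldlist - used  # leftovers are handled by step (35)
```

Because the traversal is greedy, an itemset that overlaps a more frequent, already-accepted itemset is skipped rather than split, which matches the rule in step (34).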
After association analysis identifies the associated K-itemsets among the cold data blocks, they are traversed in descending order of occurrence frequency, and stripes are preferentially formed from itemsets that occur more often and whose cold blocks do not belong to other stripes. This maximizes the association among the cold data blocks in each erasure code stripe, and hence the probability that the blocks of one stripe are evicted at the same time.
During erasure code archiving, only the blocks currently identified as cold are archived: hot data blocks remain replicated and cold data blocks are erasure-coded, which preserves fault tolerance and reduces memory overhead without degrading system performance.
Further, in step (35), forming stripes from the cold data blocks remaining in the set coldlist comprises:
sorting the remaining cold data blocks in coldlist by timestamp in ascending order and, after sorting, taking K cold data blocks at a time from coldlist to form a stripe, until every cold data block in coldlist belongs to a stripe.
For data blocks without identified association, sorting by timestamp and taking K consecutive blocks per stripe exploits the temporal locality of data access, which also raises the probability that the blocks of one stripe are evicted at the same time.
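Step (35) reduces to a sort followed by fixed-size chunking. This is a minimal sketch, assuming each leftover block carries a numeric timestamp (the text does not say whether it is the load time or the last access time) and that a final stripe with fewer than K blocks is acceptable, for example by zero-padding it when parity is computed; the function name is hypothetical.

```python
def stripes_by_timestamp(leftovers: dict[str, float], k: int) -> list[list[str]]:
    """Step (35): sort remaining cold blocks by timestamp (oldest first) and cut them into stripes of K."""
    if k <= 0:
        raise ValueError("K must be positive")
    ordered = sorted(leftovers, key=lambda b: leftovers[b])
    # Assumption: the last stripe may hold fewer than K blocks (zero-padded when encoded).
    return [ordered[i:i + k] for i in range(0, len(ordered), k)]
```

Blocks touched close together in time end up in the same stripe, so they tend to age out of memory together.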
Further, in step (33), selecting all associated K-itemsets from the set coldlist through association analysis comprises:
dividing the K1 executed user requests into w groups set_1 ~ set_w according to the group size groupsize, filtering the hot data blocks out of each group, and collecting all groups into the set set_all;
running the FP-Growth algorithm on set_all with min_sup as its minimum support threshold and min_conf as its minimum confidence threshold, thereby selecting all associated K-itemsets from the cold data blocks in coldlist;
where groupsize is a positive integer.
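Building set_all amounts to turning each group of requests into one "transaction" of cold blocks for the association analysis. This is a minimal sketch, assuming each executed request is represented by the list of block IDs it read and that the last group may be shorter when K1 is not a multiple of groupsize; the function name is hypothetical.

```python
def build_set_all(
    requests: list[list[str]], groupsize: int, hot: set[str]
) -> list[set[str]]:
    """Split executed requests into groups set_1 ... set_w, one transaction per group.

    Hot blocks are filtered out so only cold blocks take part in the analysis.
    """
    if groupsize <= 0:
        raise ValueError("groupsize must be a positive integer")
    set_all: list[set[str]] = []
    for start in range(0, len(requests), groupsize):
        group: set[str] = set()
        for blocks in requests[start:start + groupsize]:
            group.update(blocks)  # every block read by a request in this group
        set_all.append(group - hot)
    return set_all
```

A larger groupsize keeps associated blocks from being split across groups, at the cost of larger transactions for the association algorithm to process.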
Using the FP-Growth algorithm to find the associated K-itemsets among cold data blocks fully exploits the association between data blocks, so the blocks within each associated K-itemset are strongly associated.
Further, the occurrence frequency of an associated K-itemset is the total number of times it occurs across the w groups set_1 ~ set_w.
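Counting occurrences across groups and producing allCklist can be sketched as follows. This sketch swaps in brute-force enumeration of every K-subset of each group in place of FP-Growth: it yields the same occurrence counts but scales exponentially, which is what FP-Growth's prefix tree avoids. Because the text does not define the confidence of a whole K-itemset, the sketch assumes the all-confidence measure, occurrences divided by the largest occurrence count of any single member. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def associated_k_itemsets(
    set_all: list[set[str]],
    coldlist: set[str],
    k: int,
    min_sup: float,
    min_conf: float,
) -> list[tuple[str, ...]]:
    """Brute-force stand-in for FP-Growth; returns allCklist, most frequent first.

    Occurrence frequency = number of groups containing the itemset;
    support = occurrences / w; confidence = all-confidence (an assumption).
    """
    transactions = [t & coldlist for t in set_all]
    w = len(transactions)
    if w == 0:
        return []
    item_count = Counter(b for t in transactions for b in t)
    itemset_count: Counter = Counter()
    for t in transactions:
        # Enumerate every K-subset of the group: fine for a sketch only.
        itemset_count.update(combinations(sorted(t), k))
    scored = []
    for itemset, occ in itemset_count.items():
        support = occ / w
        confidence = occ / max(item_count[b] for b in itemset)
        if support >= min_sup and confidence >= min_conf:
            scored.append((itemset, occ))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # descending frequency
    return [itemset for itemset, _ in scored]
```

For four groups {c,d}, {c,d,e}, {c,d}, {e,f} with K = 2, min_sup = 0.25 and min_conf = 0.5, the pair (c, d) occurs three times and ranks first, (e, f) passes with confidence 0.5, and (c, e) and (d, e) are rejected on confidence.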
Grouping the executed read requests and using the total number of occurrences of an associated K-itemset across all groups as its frequency provides a reliable basis for maximizing the association among the cold data blocks within a stripe.
Further, in step (2), when the number of executed read requests reaches the garbage collection threshold K2, performing garbage collection comprises:
(21) sorting the data blocks in memory by access frequency in descending order and, after sorting, taking the last m% (least frequently accessed) blocks as cold data blocks and the remaining blocks as hot data blocks, all cold data blocks forming the set replacelist;
(22) traversing the set replacelist; if the current cold data block is stored as replicas, writing it back to the storage device and directly deleting all of its in-memory replicas; if it is stored in an erasure code stripe, leaving it untouched;
(23) counting the cold data blocks in every stripe in memory, sorting the stripes by that count in descending order, and forming the set stripelist from the sorted stripes;
(24) traversing the set stripelist; if all data blocks in the current stripe are cold, writing all of them back to the storage device and deleting the stripe; if only some of its data blocks are cold, exchanging its hot data blocks with cold data blocks from other stripes, updating the parity blocks of the stripe, then writing all data blocks of the stripe back to the storage device and deleting the stripe; if the stripe contains no cold data block, leaving it untouched;
where 0 < m < 100.
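Steps (21) to (24) can be sketched over a simple in-memory model. This is a minimal sketch under assumptions not stated in the text: replicated blocks are tracked as a set, stripes as a dictionary from a hypothetical stripe ID to its list of data-block IDs, "writing back" is only recorded in a list, donor cold blocks are taken in first-found order, and a partly cold stripe is kept when other stripes cannot supply enough cold blocks. Each hot/cold swap is counted because it is a replacement that forces parity updates.

```python
def garbage_collect(
    freq: dict[str, int],
    replicated: set[str],
    stripes: dict[str, list[str]],
    m_percent: float,
) -> tuple[list[str], int]:
    """Run steps (21)-(24) in place; return (blocks written back, number of swaps)."""
    # Step (21): the least frequently accessed m% of blocks form replacelist.
    ranked = sorted(freq, key=lambda b: freq[b], reverse=True)
    n_cold = int(len(ranked) * m_percent / 100)
    replacelist = set(ranked[len(ranked) - n_cold:])

    written_back: list[str] = []
    # Step (22): replicated cold blocks: write one copy back, drop all in-memory copies.
    for block in sorted(replacelist & replicated):
        written_back.append(block)
        replicated.discard(block)

    # Step (23): stripes with more cold blocks are processed first.
    stripelist = sorted(
        stripes, key=lambda sid: sum(b in replacelist for b in stripes[sid]), reverse=True
    )

    swaps = 0
    for sid in stripelist:  # step (24)
        blocks = stripes[sid]
        hot_pos = [i for i, b in enumerate(blocks) if b not in replacelist]
        if len(hot_pos) == len(blocks):
            continue  # no cold block in this stripe: leave it alone
        # Cold blocks in other stripes that can be exchanged for our hot ones.
        donors = [
            (other, j)
            for other in stripes
            if other != sid
            for j, b in enumerate(stripes[other])
            if b in replacelist
        ]
        if len(donors) < len(hot_pos):
            continue  # assumption: too few cold blocks elsewhere, keep the stripe
        for i, (other, j) in zip(hot_pos, donors):
            blocks[i], stripes[other][j] = stripes[other][j], blocks[i]
            swaps += 1  # a replacement: parity of the affected stripes must be updated
        written_back.extend(blocks)  # every block in the stripe is now cold
        del stripes[sid]
    return written_back, swaps
```

Processing fully cold stripes first means they are evicted with no swaps at all, and the donor cold blocks for partly cold stripes come from stripes that are colder than the ones left behind.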
The invention divides data blocks into hot and cold blocks by access frequency and treats the less frequently accessed cold blocks as eviction candidates. Memory is then reclaimed in the following order: replicated cold blocks first, then stripes whose data blocks are all cold, and finally stripes that are only partly cold. Neither of the first two categories involves any block replacement during garbage collection. Moreover, because stripes are built so that associated data blocks share a stripe, the blocks of one stripe are very likely to become cold and be evicted together, so only a few blocks per stripe need to be replaced at eviction time. This greatly reduces the replacement overhead and effectively lowers data access latency.
According to a second aspect of the present invention, a data-relevance-aware erasure code memory replacement device is provided, comprising a loading module, a first judging module, a garbage collection module, a second judging module, an erasure code archiving module and a loop control module;
the loading module is configured to load data into memory in blocks in the order of the current read request, store the blocks as replicas, and trigger the first judging module once loading is finished;
the first judging module is configured to judge whether the number of executed read requests has reached the garbage collection threshold K2; if so, it triggers the garbage collection module; otherwise, it triggers the second judging module;
the garbage collection module is configured to perform garbage collection to evict the data blocks with low access frequency from memory, and to trigger the loop control module once garbage collection is finished;
the second judging module is configured to judge whether the number of executed read requests has reached the archiving threshold K1; if so, it triggers the erasure code archiving module; otherwise, it triggers the loop control module;
the erasure code archiving module is configured to perform erasure code archiving according to the association between data blocks so that associated data blocks are placed in the same stripe, and to trigger the loop control module once archiving is finished;
the loop control module is configured to judge whether all user requests have been executed; if so, it ends the operation; otherwise, it takes the next unexecuted read request as the current read request and triggers the loading module;
where 0 < K1 < K2.
According to a third aspect of the present invention, a memory system is provided, comprising a memory and the data-relevance-aware erasure code memory replacement device provided by the second aspect of the invention.
In general, the above technical solution conceived by the present invention achieves the following beneficial effects:
(1) During erasure code archiving, stripes are built according to the association between data blocks so that associated blocks share a stripe. Since associated blocks very likely become cold at the same time, the probability that the blocks of one stripe are evicted together increases, which reduces the update overhead caused by replacement after archiving and lowers user access latency.
(2) After association analysis identifies the associated K-itemsets among the cold data blocks, the itemsets are traversed in descending order of occurrence frequency, and stripes are preferentially formed from itemsets that occur more often and whose cold blocks do not belong to other stripes. This maximizes the association among the cold blocks in each stripe, and thus the probability that they are evicted together.
(3) During erasure code archiving, only the blocks currently identified as cold are archived, so hot data blocks remain replicated and cold data blocks are erasure-coded. This preserves fault tolerance and reduces memory overhead without degrading system performance.
(4) For data blocks without identified association, sorting by timestamp and taking K consecutive blocks per stripe exploits the temporal locality of data access to raise the probability that the blocks of one stripe are evicted together.
(5) Data blocks are evicted in the order: replicated cold blocks, stripes whose blocks are all cold, and stripes that are only partly cold. This greatly reduces the replacement overhead and effectively lowers data access latency.
Drawings
Fig. 1 is a flowchart of the data-relevance-aware erasure code memory replacement method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the erasure code archiving process according to an embodiment of the present invention;
Fig. 3 is a flowchart of garbage collection according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the distribution of data blocks in memory before archiving according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the distribution of data blocks in memory after archiving according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the heat distribution of the data blocks in each stripe before garbage collection according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the distribution of data blocks in memory after garbage collection according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Before explaining the technical scheme of the invention in detail, the key technical terms related to the invention are briefly introduced as follows:
Erasure code archiving: data popularity naturally changes over time. Data that is popular during a given period can be stored as multiple replicas to improve access efficiency. As time passes, its popularity may drop, and to save memory space it can be converted to erasure-coded storage. The process of reorganizing data stored as multiple replicas in a cluster into erasure-coded form is called erasure code archiving.
FP-Growth algorithm: FP-Growth is a classic association analysis algorithm for discovering hidden relationships between different features or items in large data sets. The association analysis uses two parameters, support and confidence: a sequence (a set of several elements) is considered associated only when its support and confidence are at least the support threshold min_sup and the confidence threshold min_conf, respectively (strictly, FP-Growth mines the itemsets that meet min_sup, and confidence is then evaluated on the association rules derived from them);
Support: the fraction of all transactions that contain both A and B, i.e., how often elements A and B occur together. For example, if there are 5 shopping records and milk and bread appear together in 2 of them, the support of <milk, bread> is 2/5 = 0.4;
Confidence: the fraction of the transactions containing A that also contain B, i.e., the number of transactions containing both A and B divided by the number of transactions containing A. For example, if there are 5 shopping records, milk appears in 3 of them and <milk, bread> appears in 2 of them, the confidence of <milk, bread> (the rule milk → bread) is 2/3 ≈ 0.67;
Stripe: a group of data blocks together with their parity blocks that forms an independent unit, within which failed data can be recovered without involving other stripes.
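The support and confidence definitions above can be checked with a few lines of code. This is a minimal sketch; the five shopping records are hypothetical, chosen only so that the counts match the worked example (milk in 3 records, milk and bread together in 2).

```python
def count(records: list[set[str]], itemset: set[str]) -> int:
    """Number of records (transactions) containing every item of the itemset."""
    return sum(itemset <= r for r in records)


def support(records: list[set[str]], itemset: set[str]) -> float:
    """Fraction of all records that contain the itemset."""
    return count(records, itemset) / len(records)


def confidence(records: list[set[str]], a: set[str], b: set[str]) -> float:
    """Fraction of records containing A that also contain B (rule A -> B)."""
    return count(records, a | b) / count(records, a)


# Five hypothetical shopping records matching the example in the text.
records = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"eggs"},
    {"butter"},
]
print(support(records, {"milk", "bread"}))       # 0.4
print(confidence(records, {"milk"}, {"bread"}))  # 0.6666666666666666
```

Note that confidence is directional: the rule bread → milk has confidence 2/2 = 1 on the same records, because every record with bread also contains milk.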
To reduce the update overhead caused by replacement after archiving and to lower user access latency, the data-relevance-aware erasure code memory replacement method provided by the invention, as shown in Fig. 1, comprises:
(1) loading data into memory in blocks in the order of the current read request, and storing the blocks as replicas;
the number of replicas can be chosen according to the required space efficiency and access performance: more replicas give higher parallelism and better access performance but lower space efficiency; conversely, fewer replicas give higher space efficiency but worse access performance;
without loss of generality, the embodiment of the invention uses two-replica storage;
(2) judging whether the number of executed read requests has reached the garbage collection threshold K2; if so, performing garbage collection to evict the data blocks with low access frequency from memory and, once garbage collection is finished, going to step (4); otherwise, going to step (3);
(3) judging whether the number of executed read requests has reached the archiving threshold K1; if so, performing erasure code archiving according to the association between data blocks so that associated data blocks are placed in the same stripe and, once archiving is finished, going to step (4); otherwise, going directly to step (4);
multiple data blocks with similar semantics and access characteristics are called associated data blocks and can be identified with an association analysis algorithm; in this embodiment, associated data (i.e., associated data blocks) are organized into the same erasure code stripe; since associated blocks share the same access trend, they are likely to cool down together, which raises the probability that the blocks of a stripe are evicted simultaneously, reducing update overhead and user access latency;
(4) judging whether all user requests have been executed; if so, ending; otherwise, taking the next unexecuted read request as the current read request and returning to step (1);
where 0 < K1 < K2 ensures that archiving is performed before garbage collection; typically, K2 is 2 or 3 times K1.
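The replica-count trade-off in step (1) and the background claim that erasure codes are more space-efficient at equal fault tolerance can be quantified with a short calculation. This is a minimal sketch, assuming an MDS code (such as Reed-Solomon) with k data blocks and m parity blocks per stripe; the (4, 1) configuration below is a hypothetical example, since the text fixes neither the code nor the number of parity blocks.

```python
from fractions import Fraction


def replication(copies: int) -> tuple[Fraction, int]:
    """(space efficiency, tolerated block losses) of `copies`-way replication."""
    return Fraction(1, copies), copies - 1


def mds_code(k: int, m: int) -> tuple[Fraction, int]:
    """(space efficiency, tolerated block losses) of a k-data, m-parity MDS code."""
    return Fraction(k, k + m), m


print(replication(2))  # (Fraction(1, 2), 1)
print(mds_code(4, 1))  # (Fraction(4, 5), 1)
```

Both layouts survive the loss of one block, but four blocks stored as two replicas occupy 8 block slots, while a (4, 1) stripe occupies 5. The catch is the one the method addresses: the replicated layout can drop a cold block freely, whereas the stripe must patch its parity on every replacement.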
In an optional implementation, in step (3) of the data-relevance-aware erasure code memory replacement method, performing erasure code archiving according to the association between data blocks when the number of executed read requests reaches the archiving threshold K1, so that associated data blocks are placed in the same stripe, comprises, as shown in Fig. 2:
(31) sorting the data blocks in memory by access frequency in descending order and, after sorting, taking the top n% most frequently accessed blocks as hot data blocks and the remaining blocks as cold data blocks;
(32) selecting all cold data blocks that do not yet belong to a stripe to form the set coldlist;
(33) selecting all associated K-itemsets from the set coldlist through association analysis, sorting them by occurrence frequency in descending order, and forming the set allCklist from the sorted associated K-itemsets;
optionally, in step (33), selecting all associated K-itemsets from the set coldlist through association analysis comprises:
dividing the K1 user requests into w groups set_1 ~ set_w according to the group size groupsize, filtering the hot data blocks out of each group, and collecting all groups into the set set_all; groupsize is a positive integer whose value depends on the number K of items in an associated K-itemset: to keep associated data blocks from being scattered across different groups, groupsize should be much larger than K (for example, with K = 4, groupsize is generally at least 40); in the embodiment of the invention, groupsize is set to 100, and w = K1/groupsize;
running the FP-Growth algorithm on set_all with min_sup as its minimum support threshold and min_conf as its minimum confidence threshold, thereby selecting all associated K-itemsets from the cold data blocks in coldlist; in a specific implementation, the association among the data blocks in coldlist is analyzed by calling a function FP-Growth(min_sup, min_conf, set_all);
it should be noted that FP-Growth is only one optional association algorithm of the invention; the above description of association analysis should not be understood as the only limitation of the invention, and other algorithms capable of association analysis may also be applied; in this embodiment, FP-Growth is used to find the associated K-itemsets among cold data blocks, which fully exploits the association between data blocks so that the blocks in each associated K-itemset are strongly associated;
accordingly, the occurrence frequency of an associated K-itemset is the total number of times it occurs across the w groups set_1 ~ set_w; computing the frequency of every associated K-itemset this way provides a reliable basis for maximizing the association among the cold data blocks within a stripe;
(34) traverse the set allCklist; if none of the cold data blocks in the traversed associated K-item set has yet been placed in a stripe, form a stripe from the cold data blocks of that associated K-item set; otherwise, skip that associated K-item set;
(35) after the traversal of allCklist is finished, remove the cold data blocks that have been placed in stripes from the set coldlist, and form stripes from the cold data blocks remaining in coldlist;
optionally, in step (35), forming stripes from the remaining cold data blocks in the set coldlist includes:
sorting the remaining cold data blocks in coldlist by timestamp in ascending order, then repeatedly taking the next K cold data blocks from coldlist to form a stripe until every cold data block in coldlist belongs to a stripe; this exploits the temporal locality of data access to increase the probability that data blocks in the same stripe are evicted at the same time;
wherein 0 < n < 100, and n is typically 20, i.e. the 20% of data blocks with the highest access frequency are hot data blocks and the remaining 80% are cold data blocks, reflecting the 80/20 distribution that hot and cold data generally follow; K denotes the number of data blocks in one stripe; each associated K-item set consists of K cold data blocks whose support and confidence satisfy support ≥ min_sup and confidence ≥ min_conf, where min_sup and min_conf are preset thresholds.
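Steps (31) and (33) can be illustrated with the small Python sketch below. It is not the invention's implementation, and it makes several assumptions. FP-Growth is replaced by brute-force enumeration of K-combinations within each group, which is only practical for small groups. Each group of `groupsize` requests is treated as one transaction, so an item set is counted once per group that contains all of its blocks. `min_count` stands in for min_sup as an absolute count, and min_conf is not modeled. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def split_hot_cold(freq, n):
    """Step (31): the top n% of blocks by access frequency are hot, the rest cold."""
    ranked = sorted(freq, key=lambda b: -freq[b])  # stable sort: ties keep dict order
    n_hot = math.ceil(len(ranked) * n / 100)       # rounding up is an assumption
    return set(ranked[:n_hot]), set(ranked[n_hot:])

def ranked_k_itemsets(requests, coldlist, k, groupsize, min_count):
    """Step (33) sketch: group requests, keep coldlist blocks, count K-item sets.

    Brute-force stand-in for FP-Growth. Each group is one transaction, so an
    item set is counted once per group containing all of its blocks.
    min_count acts as an absolute min_sup; min_conf is not modeled here.
    """
    counts = Counter()
    for start in range(0, len(requests), groupsize):   # groups set_1 ... set_w
        group = set(requests[start:start + groupsize])
        cold = sorted(group & coldlist)                # keep only coldlist blocks
        for itemset in combinations(cold, k):
            counts[itemset] += 1
    frequent = [(s, c) for s, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda sc: (-sc[1], sc[0]))      # most frequent first
    return frequent                                    # plays the role of allCklist

freq = {"D1": 5, "D2": 4, "D3": 4, "D7": 9, "D8": 1}
hot, cold = split_hot_cold(freq, n=20)
requests = ["D1", "D2", "D3", "D7", "D1", "D2", "D3", "D8"]
allCklist = ranked_k_itemsets(requests, cold, k=3, groupsize=4, min_count=2)
```

Here D7 is the only hot block. <D1, D2, D3> is the only 3-item set of cold blocks that appears in both groups, so it is the only entry that survives the threshold.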
In this embodiment, stripes are formed preferentially from associated K-item sets that occur frequently and whose cold data blocks do not yet belong to any other stripe. This maximizes the association among the cold data blocks in each erasure-coded stripe, and therefore the probability that the data blocks of the same stripe are evicted together. Hot data blocks remain stored as replicas and cold data blocks are stored with erasure coding, which reduces memory overhead while preserving fault tolerance and without degrading system performance.
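Steps (34) and (35), including the optional timestamp ordering of the leftover cold blocks, can be sketched as a greedy pass in Python. This is a sketch under stated assumptions. `form_stripes` is a hypothetical name, and `allCklist` is assumed to hold only the item sets (tuples of block IDs), already sorted by descending occurrence frequency. It is also assumed that the last leftover stripe may hold fewer than K blocks; the text only says stripes are formed until every cold block belongs to one.

```python
def form_stripes(allCklist, coldlist, timestamp, k):
    """Steps (34)-(35): stripes from ranked K-item sets, then from leftovers.

    allCklist: K-item sets (tuples of block ids), highest occurrence frequency first
    coldlist:  cold block ids not yet in any stripe
    timestamp: block id -> access timestamp used to order the leftovers
    """
    used, stripes = set(), []
    for itemset in allCklist:
        if used.isdisjoint(itemset):       # none of its blocks is in a stripe yet
            stripes.append(list(itemset))
            used.update(itemset)
    # (35) leftovers sorted by ascending timestamp, cut into groups of K
    rest = sorted((b for b in coldlist if b not in used), key=lambda b: timestamp[b])
    for i in range(0, len(rest), k):
        stripes.append(rest[i:i + k])      # final stripe may hold fewer than K blocks
    return stripes

allCklist = [("A", "B", "C"), ("C", "D", "E"), ("D", "E", "F")]
coldlist = ["A", "B", "C", "D", "E", "F", "G", "H"]
timestamp = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 8, "H": 7}
stripes = form_stripes(allCklist, coldlist, timestamp, k=3)
```

<C, D, E> is skipped because C already belongs to the first stripe. The leftovers G and H are ordered by timestamp, so H comes before G.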
In an optional implementation, in step (2) of the data-relevance-aware erasure code memory replacement method, garbage collection is performed when the number of executed read requests reaches the garbage collection threshold K2. As shown in fig. 3, it specifically includes:
(21) sort the data blocks in memory by access frequency in descending order; after sorting, take the last m% of the data blocks as cold data blocks and the rest as hot data blocks, and collect all cold data blocks into a set replacelist;
(22) traverse replacelist; if a traversed cold data block is stored as replicas, delete all of its copies directly and write it back to the storage device, with no data block replacement involved in this process; if a traversed cold data block is stored with erasure coding, perform no operation on it;
in the embodiment of the present invention, the storage device is specifically a magnetic disk;
(23) count the cold data blocks in each stripe in memory, sort the stripes by this count in descending order, and collect the sorted stripes into a set stripelist;
(24) traverse stripelist; if all data blocks in the traversed stripe are cold, write all of them back to the storage device and delete the stripe, with no data block replacement involved in this process; if only some data blocks in the traversed stripe are cold, swap its hot data blocks with cold data blocks in other stripes, update the check blocks of the stripes involved in the swap, write all data blocks of the stripe back to the storage device, and delete the stripe; if the traversed stripe contains no cold data block, perform no operation;
wherein 0 < m < 100; m% is the proportion of cold data blocks in the garbage collection stage, i.e. the proportion of data blocks evicted and hence the proportion of memory space reclaimed. Different proportions lead to different replacement overheads, and m is usually set to 10, 20, 30, 40, etc. for comparison tests.
The invention classifies data blocks as hot or cold according to access frequency and treats the less frequently accessed cold data blocks as the blocks to be evicted. Memory space is then reclaimed by evicting, in order, cold data blocks stored as replicas, stripes whose data blocks are all cold, and stripes that contain only some cold data blocks. Evicting the first two categories involves no data block replacement. Because stripe construction places associated data blocks in the same stripe, the data blocks of a stripe are very likely to become cold and be evicted at the same time, so only a few data blocks need to be swapped when a stripe is evicted. This greatly reduces replacement overhead and effectively reduces data access latency.
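The garbage collection steps (21) to (24) can be sketched in Python under simplifying assumptions. Check blocks are implicit, so a swap is only counted rather than recomputing parity. Cold donor blocks are taken from stripes that appear later in stripelist. A partially cold stripe that cannot find enough donors is left in memory. `garbage_collect` and its parameters are hypothetical names. The function mutates the stripe lists in place to model the swaps.

```python
import math

def garbage_collect(freq, replicated, stripes, m):
    """Sketch of steps (21)-(24); mutates `replicated` and the stripe lists in place.

    freq:       block id -> access count
    replicated: set of block ids currently stored as replicas
    stripes:    list of stripes, each a list of data-block ids (check blocks implicit)
    Returns (evicted block ids, stripes kept in memory, number of swaps).
    """
    # (21) the last m% of blocks by access frequency form replacelist
    ranked = sorted(freq, key=lambda b: -freq[b])
    n_cold = math.floor(len(ranked) * m / 100)      # rounding down is an assumption
    replacelist = set(ranked[len(ranked) - n_cold:])

    evicted = []
    # (22) cold replicated blocks: drop all copies, no swapping involved
    for b in sorted(replacelist & replicated):
        replicated.discard(b)
        evicted.append(b)

    # (23) stripes ordered by their number of cold blocks, most first
    stripelist = sorted(stripes, key=lambda s: -sum(b in replacelist for b in s))

    kept, swaps = [], 0
    for i, stripe in enumerate(stripelist):
        hot = [b for b in stripe if b not in replacelist]
        if not hot:                                  # all cold: evict the whole stripe
            evicted.extend(stripe)
        elif len(hot) == len(stripe):                # no cold block: leave it alone
            kept.append(stripe)
        else:
            # partly cold: swap each hot block with a cold block of a later stripe
            donors = [(t, j) for t in stripelist[i + 1:]
                      for j, b in enumerate(t) if b in replacelist]
            if len(donors) < len(hot):               # not enough cold blocks (assumption)
                kept.append(stripe)
                continue
            for h, (t, j) in zip(hot, donors):
                pos = stripe.index(h)
                stripe[pos], t[j] = t[j], h          # t's check block must be updated
                swaps += 1
            evicted.extend(stripe)                   # now all cold: write back, delete
    return evicted, kept, swaps

freq = {"D1": 9, "D2": 8, "D3": 1, "D4": 2, "D5": 1,
        "D6": 7, "D7": 1, "D8": 2, "D9": 1, "D10": 0}
replicated = {"D10"}
stripes = [["D1", "D2", "D3"], ["D4", "D5", "D6"], ["D7", "D8", "D9"]]
evicted, kept, swaps = garbage_collect(freq, replicated, stripes, m=70)
```

With these inputs, the replicated cold block D10 is dropped first. The all-cold stripe <D7, D8, D9> is then evicted without swaps. The hot block D6 is swapped into the remaining stripe in place of D3, which takes exactly one swap.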
Corresponding to the data-relevance-aware erasure code memory replacement method, the invention also provides a data-relevance-aware erasure code memory replacement device, comprising: a loading module, a first judgment module, a garbage collection module, a second judgment module, an erasure code archiving module and a loop control module;
the loading module is configured to load the data into memory in blocks in the order of the current read request, store the data as replicas, and trigger the first judgment module after loading is finished;
the first judgment module is configured to judge whether the number of executed read requests has reached the garbage collection threshold K2; if so, it triggers the garbage collection module; otherwise, it triggers the second judgment module;
the garbage collection module is configured to perform garbage collection to evict data blocks with low access frequency from memory, and to trigger the loop control module after garbage collection is finished;
the second judgment module is configured to judge whether the number of executed read requests has reached the archiving threshold K1; if so, it triggers the erasure code archiving module; otherwise, it triggers the loop control module;
the erasure code archiving module is configured to perform erasure code archiving according to the association among data blocks so that associated data blocks are placed in the same stripe, and to trigger the loop control module after archiving is finished;
the loop control module is configured to judge whether all user requests have been executed; if so, the operation ends; otherwise, it takes the next unexecuted read request as the current read request and triggers the loading module;
wherein 0< K1< K2;
in the embodiments of the present invention, the detailed implementation of each module may refer to the description of the method embodiments, and will not be repeated here.
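The interplay of the loading, judgment and loop control modules can be sketched as a simple Python driver loop. It assumes that the thresholds are applied periodically, so garbage collection runs every K2 requests and archiving every K1 requests, with the K2 check made first as in step (2). The text does not say whether the counter resets, so this reading is an assumption. `run_controller` and the callback names are hypothetical.

```python
from typing import Callable, Iterable

def run_controller(requests: Iterable, K1: int, K2: int,
                   load: Callable, archive: Callable, collect: Callable) -> None:
    """Steps (1)-(4): load each request, then check K2 before K1."""
    if not 0 < K1 < K2:
        raise ValueError("require 0 < K1 < K2")
    executed = 0
    for req in requests:              # (4) take the next unexecuted read request
        load(req)                     # (1) load blocks and keep them as replicas
        executed += 1
        if executed % K2 == 0:        # (2) GC threshold reached (periodic: assumption)
            collect()
        elif executed % K1 == 0:      # (3) archiving threshold reached
            archive()

events = []
run_controller(range(12), K1=3, K2=6,
               load=lambda req: None,
               archive=lambda: events.append("archive"),
               collect=lambda: events.append("gc"))
```

Because garbage collection is checked first, a request count that is a multiple of both thresholds triggers only garbage collection, which mirrors the "otherwise" branch of step (2).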
The invention also provides a memory system, comprising: a memory and the data-relevance-aware erasure code memory replacement device.
Application example:
Data blocks in memory are stored either with RS(4,3) coding or as two replicas. A stripe produced by RS(4,3) coding consists of 3 data blocks and 1 check block; the i-th data block (i ∈ {1,2,3}) is denoted Di, the j-th check block (j ∈ {1}) is denoted Pj, and a replica of a data block is labeled with the name of that data block.
The memory is managed according to the data-relevance-aware erasure code memory replacement method. Data blocks are loaded into memory in read-request order and stored as two replicas, placed as shown in fig. 4: D1 is stored randomly on nodes Node1 and Node4, D2 on nodes Node2 and Node4, and so on;
in this example, since RS(4,3) coding is used, only associated 3-item sets are of interest. After erasure code archiving is triggered, association analysis with the FP-Growth algorithm yields the associated 3-item sets <D1, D2, D3>, <D4, D5, D6> and <D7, D8, D9>. These are sorted by occurrence frequency in descending order; suppose that after sorting the occurrence frequencies of <D1, D2, D3>, <D4, D5, D6> and <D7, D8, D9> decrease in that order. The three associated 3-item sets are then traversed: if none of the data blocks in the traversed set has yet been placed in a stripe, the current associated 3-item set forms a stripe; otherwise, it does not form a stripe.
In this application example, as shown in fig. 5, three stripes are constructed from the associated 3-item sets: stripe 1 from D1, D2 and D3, stripe 2 from D4, D5 and D6, and stripe 3 from D7, D8 and D9.
After garbage collection is triggered, the hot/cold distribution of the data blocks in each stripe is as shown in fig. 6: stripes 1, 2 and 3 contain 1, 2 and 3 cold data blocks respectively. Stripe 3 is traversed first; all of its data blocks are cold, so they are written directly back to the disk and the stripe is deleted. Stripe 2 is traversed next; only some of its data blocks are cold, so its hot data block D6 is swapped with the cold data block D3 in stripe 1, the check block of stripe 1 is updated, the data blocks of stripe 2 are written back to the disk, and stripe 2 is deleted. Finally, stripe 1 is traversed; it now contains no cold data block, so no operation is performed and stripe 1 remains in memory. The distribution of data blocks in memory after garbage collection is shown in fig. 7.
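A stripe with one check block tolerates the loss of any single block. The Python sketch below illustrates this with bytewise XOR as the parity function. This is an assumption made for clarity: XOR is a common choice for a single check block, but the exact RS(4,3) arithmetic used by the invention is not specified here. `xor_blocks` is a hypothetical helper.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

D1, D2, D3 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
P1 = xor_blocks(D1, D2, D3)          # the stripe's single check block
rebuilt_D2 = xor_blocks(D1, D3, P1)  # recover a lost data block from the other three
```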
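The random two-replica placement can be sketched in a few lines of Python. The sketch assumes four nodes and assumes only that both copies of a block land on distinct, randomly chosen nodes; the actual placement policy behind fig. 4 is not given. `place_replicas` is a hypothetical name, and the seed is only there to make runs repeatable.

```python
import random

def place_replicas(blocks, nodes, copies=2, seed=None):
    """Put `copies` replicas of each block on distinct, randomly chosen nodes."""
    rng = random.Random(seed)
    return {b: rng.sample(nodes, copies) for b in blocks}

nodes = ["Node1", "Node2", "Node3", "Node4"]
placement = place_replicas(["D1", "D2", "D3"], nodes, seed=7)
```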
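Updating the check block of stripe 1 after D3 is swapped out for D6 does not require reading the other data blocks if parity is linear. The Python sketch below assumes XOR parity for the single check block, which is an illustrative assumption rather than the invention's stated code. Under that assumption, the new check block is P1' = P1 ⊕ D3 ⊕ D6. `xor_blocks` and `update_parity` are hypothetical helpers.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def update_parity(parity: bytes, old: bytes, new: bytes) -> bytes:
    """Delta update after replacing `old` by `new` in a stripe: P' = P ^ old ^ new."""
    return xor_blocks(parity, old, new)

D1, D2, D3, D6 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
P1 = xor_blocks(D1, D2, D3)
P1_new = update_parity(P1, D3, D6)      # stripe 1 after D3 is swapped out for D6
```

The delta update gives the same result as recomputing the parity from D1, D2 and D6.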
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the scope of protection of the present invention.

Claims (7)

1. A data-relevance-aware erasure code memory replacement method, characterized by comprising the following steps:
(1) loading the data into memory in blocks in the order of the current read request, and storing the data as replicas;
(2) judging whether the number of executed read requests has reached a garbage collection threshold K2; if so, performing garbage collection to evict data blocks with low access frequency from memory and, after garbage collection is finished, going to step (4); otherwise, going to step (3);
(3) judging whether the number of executed read requests has reached an archiving threshold K1; if so, performing erasure code archiving according to the association among data blocks so that associated data blocks are placed in the same stripe and, after erasure code archiving is finished, going to step (4); otherwise, going directly to step (4);
(4) judging whether all user requests have been executed; if so, ending the operation; otherwise, taking the next unexecuted read request as the current read request and going to step (1);
wherein 0< K1< K2;
in step (3), performing erasure code archiving according to the association among data blocks when the number of executed read requests reaches the archiving threshold K1, so that associated data blocks are placed in the same stripe, comprises:
(31) sorting the data blocks in memory by access frequency in descending order and, after sorting, taking the first n% of data blocks, which have the highest access frequency, as hot data blocks and the remaining blocks as cold data blocks;
(32) selecting all cold data blocks that have not yet been placed in a stripe to form a set coldlist;
(33) selecting all associated K-item sets from the set coldlist through association analysis, sorting all associated K-item sets by occurrence frequency in descending order, and forming a set allCklist from the sorted associated K-item sets;
(34) traversing the set allCklist, and if none of the cold data blocks in the traversed associated K-item set has been placed in a stripe, forming a stripe from the cold data blocks of that associated K-item set; otherwise, skipping that associated K-item set;
(35) after the traversal of allCklist is finished, removing the cold data blocks that have been placed in stripes from the set coldlist, and forming stripes from the cold data blocks remaining in coldlist;
wherein 0 < n < 100; K denotes the number of data blocks in one stripe; each associated K-item set consists of K cold data blocks whose support and confidence satisfy support ≥ min_sup and confidence ≥ min_conf, where min_sup and min_conf are preset thresholds.
2. The method according to claim 1, wherein forming stripes from the remaining cold data blocks in the set coldlist in step (35) comprises:
sorting the remaining cold data blocks in coldlist by timestamp in ascending order and, after sorting, repeatedly taking the next K cold data blocks from coldlist to form a stripe until every cold data block in coldlist belongs to a stripe.
3. The method according to claim 1, wherein selecting all associated K-item sets from the set coldlist through association analysis in step (33) comprises:
dividing the K1 user requests into w groups set_1 ~ set_w of size groupsize; after the hot data blocks in each group are filtered out, all the groups form a set set_all;
using min_sup as the minimum support threshold and min_conf as the minimum confidence threshold of the FP-Growth algorithm, and performing association analysis on set_all with the FP-Growth algorithm, thereby selecting all associated K-item sets from the cold data blocks in the set coldlist;
wherein, groupsize is a positive integer.
4. The method according to claim 3, wherein the occurrence frequency of an associated K-item set is its total number of occurrences in the w groups set_1 ~ set_w.
5. The data-relevance-aware erasure code memory replacement method according to any one of claims 1-4, wherein performing garbage collection in step (2) when the number of executed read requests reaches the garbage collection threshold K2 comprises:
(21) sorting the data blocks in memory by access frequency in descending order and, after sorting, taking the last m% of data blocks as cold data blocks and the remaining data blocks as hot data blocks, all cold data blocks forming a set replacelist;
(22) traversing the set replacelist, and if a traversed cold data block is stored as replicas, directly deleting all of its copies and writing it back to the storage device; if a traversed cold data block is stored with erasure coding, performing no operation on it;
(23) counting the cold data blocks in all stripes in memory, sorting the stripes by this count in descending order, and forming a set stripelist from the sorted stripes;
(24) traversing the set stripelist: if all data blocks in the traversed stripe are cold, writing all of them back to the storage device and deleting the stripe; if only some data blocks in the traversed stripe are cold, swapping its hot data blocks with cold data blocks in other stripes, updating the check blocks of the stripes involved in the swap, writing all data blocks of the stripe back to the storage device and deleting the stripe; if the traversed stripe contains no cold data block, performing no operation;
wherein 0< m < 100.
6. A data-relevance-aware erasure code memory replacement device, comprising: a loading module, a first judgment module, a garbage collection module, a second judgment module, an erasure code archiving module and a loop control module;
the loading module is configured to load the data into memory in blocks in the order of the current read request, store the data as replicas, and trigger the first judgment module after loading is finished;
the first judgment module is configured to judge whether the number of executed read requests has reached a garbage collection threshold K2, and if so, trigger the garbage collection module; otherwise, trigger the second judgment module;
the garbage collection module is configured to perform garbage collection to evict data blocks with low access frequency from memory, and trigger the loop control module after garbage collection is finished;
the second judgment module is configured to judge whether the number of executed read requests has reached an archiving threshold K1, and if so, trigger the erasure code archiving module; otherwise, trigger the loop control module;
the erasure code archiving module is configured to perform erasure code archiving according to the association among data blocks so that associated data blocks are placed in the same stripe, and trigger the loop control module after erasure code archiving is finished;
the loop control module is configured to judge whether all user requests have been executed, and if so, end the operation; otherwise, take the next unexecuted read request as the current read request and trigger the loading module;
wherein 0< K1< K2;
wherein the erasure code archiving module performing erasure code archiving according to the association among data blocks so that associated data blocks are placed in the same stripe comprises:
sorting the data blocks in memory by access frequency in descending order and, after sorting, taking the first n% of data blocks, which have the highest access frequency, as hot data blocks and the remaining blocks as cold data blocks;
selecting all cold data blocks that have not yet been placed in a stripe to form a set coldlist;
selecting all associated K-item sets from the set coldlist through association analysis, sorting all associated K-item sets by occurrence frequency in descending order, and forming a set allCklist from the sorted associated K-item sets;
traversing the set allCklist, and if none of the cold data blocks in the traversed associated K-item set has been placed in a stripe, forming a stripe from the cold data blocks of that associated K-item set; otherwise, skipping that associated K-item set;
after the traversal of allCklist is finished, removing the cold data blocks that have been placed in stripes from the set coldlist, and forming stripes from the cold data blocks remaining in coldlist;
wherein 0 < n < 100; K denotes the number of data blocks in one stripe; each associated K-item set consists of K cold data blocks whose support and confidence satisfy support ≥ min_sup and confidence ≥ min_conf, where min_sup and min_conf are preset thresholds.
7. A memory system, comprising: a memory and the data-relevance-aware erasure code memory replacement device according to claim 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010196333.1A CN111444036B (en) 2020-03-19 2020-03-19 Data relevance perception erasure code memory replacement method, equipment and memory system

Publications (2)

Publication Number Publication Date
CN111444036A CN111444036A (en) 2020-07-24
CN111444036B true CN111444036B (en) 2021-04-20

Family

ID=71629463

Country Status (1)

Country Link
CN (1) CN111444036B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799872B (en) * 2021-02-19 2022-08-12 Shanghai Jiao Tong University Erasure code encoding method and device based on a key-value storage system

Citations (6)

Publication number Priority date Publication date Assignee Title
CN104391759A (en) * 2014-11-11 2015-03-04 Huazhong University of Science and Technology Load-aware data archiving method in erasure-coded storage
CN104657405A (en) * 2013-11-15 2015-05-27 International Business Machines Corporation Priority based reliability mechanism for archived data
CN107273048A (en) * 2017-06-08 2017-10-20 Zhejiang Dahua Technology Co., Ltd. Data writing method and device
CN107667363A (en) * 2015-06-26 2018-02-06 Intel Corporation Object-based storage cluster with plurality of optional data processing policy
CN109783016A (en) * 2018-12-25 2019-05-21 Xi'an Jiaotong University Elastic multi-dimensional redundancy method in a distributed storage system
CN110032338A (en) * 2019-03-20 2019-07-19 Huazhong University of Science and Technology Data replica placement method and system for erasure codes

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN102902628B (en) * 2012-09-18 2016-06-01 Ramaxel Technology (Shenzhen) Co., Ltd. Flash-memory-based method and system for automatic separation of hot and cold data, and flash memory
US9501353B2 (en) * 2015-01-28 2016-11-22 Quantum Corporation Erasure code prioritization

Non-Patent Citations (2)

Title
An Erasure Coded Archival Storage System; Prateep Misra et al.; 2012 IEEE 18th International Conference on Parallel and Distributed Systems; 2013-01-17; full text *
Concurrent node reconstruction for erasure-coded storage clusters; Huang Jianzhong et al.; Journal of Computer Research and Development; 2016-09-15; full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant