CN115793984A - Data storage method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN115793984A, CN202310001205.0A
- Authority
- CN
- China
- Prior art keywords
- data
- lrc
- encoding
- switching
- block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Error Detection And Correction (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a data storage method and apparatus, a computer device, and a storage medium. The method comprises: periodically checking data marks in a storage system, wherein the data marks comprise cold data and hot data, and the storage system is configured to store data marked as cold data based on a first encoding and to store data marked as hot data based on a second encoding; and, in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode, so that the corresponding data is stored based on the switched encoding mode. The scheme of the invention reduces the overall degraded read latency and storage overhead of the storage system.
Description
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a data storage method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of communication and network technology, the volume of digital information is growing exponentially, which poses great challenges for data storage technology. The reliability of data in storage systems and the power consumption of storage systems are of increasing concern. At such a data scale, the reliability of data in a storage system is inversely proportional to the number of components the system contains: the more components the storage system has, the lower the reliability of its data. According to related research, about 30 disks fail each month in an Internet data center of 600 disks. The loss of data reliability caused by disk failures is a serious problem in large-scale storage systems and has motivated research into fault-tolerance techniques.
Erasure coding is a forward error correction technique from coding theory. It was first applied in the communications field to address the loss of data during transmission, and was later introduced into the storage field because it is very effective at preventing data loss. Erasure codes can substantially reduce storage overhead while guaranteeing the same reliability, and are therefore widely used in large storage systems and data centers.
Erasure Coding (EC) is a data protection method that partitions data into fragments, expands and encodes them with redundant data, and stores them in different locations, such as disks, storage nodes, or other geographic locations. The original data is divided into k data blocks, m coding blocks are generated according to a coding matrix, and the n (n = k + m) blocks are distributed across different servers. Any k of the n blocks are sufficient to recover the original data.
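The relationship n = k + m and the "any k blocks suffice" property can be illustrated with the simplest possible erasure code, a single XOR coding block (m = 1). This is only a minimal sketch for intuition, not the RS or LRC encoding of the invention; the helper names `xor_blocks`, `encode`, and `recover` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    # k data blocks plus m = 1 coding block, so n = k + 1.
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stripe, lost_index):
    # The k surviving blocks are enough to rebuild the single missing block.
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

data = [b"abcd", b"efgh", b"ijkl"]   # k = 3
stripe = encode(data)                # n = 4
rebuilt = recover(stripe, 1)         # pretend block 1 was lost
```

With m = 1 only one lost block can be rebuilt; tolerating more failures requires codes such as RS, which use finite-field arithmetic rather than plain XOR.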
Most storage systems using erasure codes store data using only one erasure code. However, it is difficult to reduce the degraded read latency while keeping the storage overhead low by using only one erasure code.
Disclosure of Invention
In view of the above, the present invention provides a data storage method, an apparatus, a computer device, and a storage medium, which store data in multiple encoding manners, and select an appropriate encoding for the characteristics of the data to reduce the degraded read latency while keeping the storage overhead low.
Based on the above object, an aspect of the embodiments of the present invention provides a data storage method, which specifically includes the following steps:
periodically checking data marks in a storage system, wherein the data marks comprise cold data and hot data, and the storage system is configured to store data marked as cold data based on a first encoding and to store data marked as hot data based on a second encoding;
and in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode, so that the corresponding data is stored based on the switched encoding mode.
In some embodiments, the first encoding comprises RS encoding and the second encoding comprises LRC encoding.
In some embodiments, the RS encoding is a (k_RS, g_RS) RS encoding and the LRC encoding is a (k_LRC, l_LRC, g_LRC) LRC encoding, where k_RS denotes the number of data blocks in the RS encoding, g_RS denotes the number of global check blocks in the RS encoding, k_LRC denotes the number of data blocks in the LRC encoding, l_LRC denotes the number of local check blocks in the LRC encoding, and g_LRC denotes the number of global check blocks in the LRC encoding, with k_RS = k_LRC and g_RS = g_LRC + 1.
In some embodiments, in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode comprises:
in response to the checked data mark being hot data, determining whether the number of accesses to the data reaches a first threshold;
and in response to the number of accesses not reaching the first threshold, switching the data mark from hot data to cold data, and switching the encoding mode of the data from the second encoding to the first encoding.
In some embodiments, in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode comprises:
in response to the checked data mark being cold data, determining whether the number of accesses to the data reaches a second threshold;
and in response to the number of accesses reaching the second threshold, switching the data mark from cold data to hot data, and switching the encoding mode of the data from the first encoding to the second encoding.
In some embodiments, switching the encoding mode of the data from the second encoding to the first encoding comprises:
switching the encoding mode of the data from the second encoding to the first encoding based on a first encoding switching algorithm.
In some embodiments, switching the encoding mode of the data from the first encoding to the second encoding comprises:
switching the encoding mode of the data from the first encoding to the second encoding based on a second encoding switching algorithm.
In some embodiments, switching the encoding mode of the data from the second encoding to the first encoding based on the first encoding switching algorithm comprises:
obtaining the RS-encoded data block based on the LRC-encoded data block;
and obtaining the global check block of the RS code based on the global check block and the local check block of the LRC code.
In some embodiments, deriving the RS-encoded data block based on the LRC-encoded data block comprises:
and acquiring the LRC coded data block, and taking the acquired data block as the RS coded data block.
In some embodiments, obtaining the RS-encoded global parity chunks based on the LRC-encoded global parity chunks and the local parity chunks comprises:
taking the g_LRC global check blocks of the LRC encoding as the first g_RS - 1 global check blocks of the RS encoding;
performing an XOR calculation on the l_LRC local check blocks of the LRC encoding, and taking the XOR result as the last global check block of the RS encoding.
In some embodiments, switching the encoding of the data from the first encoding to the second encoding based on the second encoding switching algorithm comprises:
obtaining the LRC coded data block based on the RS coded data block;
obtaining a global check block of the LRC code based on the global check block of the RS code;
and obtaining the local check block of the LRC code based on the data block of the RS code and the last global check block.
In some embodiments, deriving the LRC encoded data block based on the RS encoded data block comprises:
and acquiring the RS-coded data block, and taking the acquired data block as the LRC-coded data block.
In some embodiments, obtaining the LRC encoded global parity chunks based on the RS encoded global parity chunks comprises:
taking the first g_RS - 1 global check blocks of the RS encoding as the g_LRC global check blocks of the LRC encoding.
In some embodiments, obtaining the local check blocks of the LRC encoding based on the RS-encoded data blocks and the last global check block comprises:
obtaining the last l_LRC - 1 local check blocks of the LRC encoding based on the RS-encoded data blocks, and XORing the last global check block of the RS encoding with the data blocks corresponding to the last l_LRC - 1 local check blocks of the LRC encoding to obtain the 1st local check block.
In some embodiments, obtaining the last l_LRC - 1 local check blocks of the LRC encoding based on the RS-encoded data blocks comprises:
obtaining the last l_LRC - 1 local check blocks of the LRC encoding based on the last (k_RS / l_LRC) * (l_LRC - 1) data blocks of the RS encoding.
In some embodiments, the last global check block of the RS encoding is obtained by XORing the k_RS data blocks.
In another aspect of the embodiments of the present invention, there is also provided a data storage device, including:
a checking module configured to periodically check data marks in a storage system, wherein the data marks comprise cold data and hot data, and the storage system is configured to store data marked as cold data based on a first encoding and to store data marked as hot data based on a second encoding;
and a switching module configured to, in response to a checked data mark triggering a mark switching condition, switch that data mark and correspondingly switch the encoding mode, so that the corresponding data is stored based on the switched encoding mode.
In some embodiments, the first encoding comprises RS encoding and the second encoding comprises LRC encoding.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing a computer program executable on the processor, the computer program when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, the computer program implementing the above method steps when executed by a processor.
The invention has at least the following beneficial technical effects: data marks in the storage system are checked periodically, wherein the data marks comprise cold data and hot data, and the storage system is configured to store data marked as cold data based on a first encoding and to store data marked as hot data based on a second encoding; and, in response to a checked data mark triggering a mark switching condition, that data mark is switched and the encoding mode is switched correspondingly, so that the corresponding data is stored based on the switched encoding mode. This reduces the overall degraded read latency and storage overhead of the storage system.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other embodiments from these drawings without creative effort.
FIG. 1 is a block diagram of an embodiment of a data storage method provided by the present invention;
FIG. 2 is a schematic diagram of RS encoding based on a Vandermonde matrix according to the present invention;
FIG. 3 is a schematic diagram of RS encoding based on a Cauchy matrix;
FIG. 4 is a schematic diagram of data recovery with RS encoding;
FIG. 5 is a diagram of an embodiment of a (6,3) RS encoding according to the present invention;
FIG. 6 is a block diagram of one embodiment of a (6,3,2) LRC encoding provided by the present invention;
FIG. 7 is a diagram of an embodiment of a (6,3,2) LRC encoding to (6,3) RS encoding switch provided by the present invention;
FIG. 8 is a diagram of an embodiment of switching from (6,3) RS encoding to (6,3,2) LRC encoding provided by the present invention;
FIG. 9 is a schematic structural diagram of a data storage device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an embodiment of a computer device provided in the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two different entities or parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
In view of the above object, a first aspect of the embodiments of the present invention proposes an embodiment of a data storage method. As shown in fig. 1, it includes the following steps:
S10, periodically checking data marks in a storage system, wherein the data marks comprise cold data and hot data, and the storage system is configured to store data marked as cold data based on a first encoding and to store data marked as hot data based on a second encoding;
S20, in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode, so that the corresponding data is stored based on the switched encoding mode.
Specifically, the data stored in the storage system, which comprises cold data and hot data, is checked periodically. When the number of accesses to a piece of hot data within a given time does not reach a set value, the hot data is re-marked as cold data; when the number of accesses to a piece of cold data within a given time reaches the set value, the cold data is re-marked as hot data. Switching the hot/cold mark also switches the encoding: in this embodiment, cold data is stored based on a first encoding, hot data is stored based on a second encoding, and the first and second encodings are different types of erasure code. The two encodings are chosen according to the characteristics of the data. Hot data is accessed frequently, so the second encoding is an erasure code with low recovery latency; cold data is accessed rarely, so the first encoding is an erasure code with low storage overhead. This reduces both the degraded read latency and the storage overhead of the storage system.
In one embodiment, data may be marked as hot by default when it first enters the storage system. Data marked as hot is stored based on the second encoding, which has low degraded read latency, and data marked as cold is stored based on the first encoding, which has low storage overhead. The data stored in the storage system is checked periodically. When the number of accesses to a piece of hot data within a given time does not reach a set value, the hot data is re-marked as cold data; when the number of accesses to a piece of cold data within a given time reaches the set value, the cold data is re-marked as hot data. Switching the hot/cold mark switches the corresponding encoding at the same time, which reduces the degraded read latency and storage overhead of the storage system as a whole.
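The periodic check described above can be sketched roughly as follows. This is an illustrative sketch only: the `Item` class, the `periodic_check` function, the two threshold parameters, and the choice to reset the access counter after each period are all assumptions, not details given in the text.

```python
from dataclasses import dataclass

HOT, COLD = "hot", "cold"
# Hot data uses the second encoding, cold data the first encoding.
ENCODING_FOR = {HOT: "second", COLD: "first"}

@dataclass
class Item:
    name: str
    mark: str = HOT      # newly written data is marked hot by default
    accesses: int = 0    # accesses counted during the current period

    @property
    def encoding(self):
        return ENCODING_FOR[self.mark]

def periodic_check(items, first_threshold, second_threshold):
    """Run one check pass; return (name, old_encoding, new_encoding) per switch."""
    switched = []
    for item in items:
        old = item.encoding
        if item.mark == HOT and item.accesses < first_threshold:
            item.mark = COLD     # hot data that was accessed too rarely
        elif item.mark == COLD and item.accesses >= second_threshold:
            item.mark = HOT      # cold data that became popular
        if item.encoding != old:
            switched.append((item.name, old, item.encoding))
        item.accesses = 0        # assumption: counters restart each period
    return switched

items = [Item("a", HOT, 1), Item("b", COLD, 9), Item("c", HOT, 7)]
result = periodic_check(items, first_threshold=5, second_threshold=5)
```

In a real system each entry of `result` would trigger the actual re-encoding of the stored stripe; here the function only reports which items need it.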
In some embodiments, the first encoding comprises RS encoding and the second encoding comprises LRC encoding.
Specifically, hot data, which is accessed more frequently, uses the (k, l, r-1) LRC encoding with low recovery latency, and cold data, which is accessed less frequently, uses the (k, r) RS encoding, where k denotes the number of data blocks, l denotes the number of local check blocks, and r denotes the number of global check blocks of the RS encoding (the LRC encoding has r-1). With this choice, the storage system can tolerate the failure of r blocks for both hot and cold data.
RS encoding (Reed-Solomon codes) is a forward error correction channel code characterized by two parameters, k and r. Given two positive integers k and r, an RS code encodes k data blocks into r additional check blocks, and the encoding can be built on a Vandermonde matrix or a Cauchy matrix. FIG. 2 and FIG. 3 show RS encoding based on the Vandermonde matrix and the Cauchy matrix, respectively. The upper k rows of the matrix correspond to the k original data blocks and the lower r rows form the coding matrix; multiplying the coding matrix by the original data D_1 to D_k yields the r new check blocks P_1 to P_r. When up to r blocks are corrupted or lost in transmission and must be corrected, the inverse of the matrix formed by the rows corresponding to the surviving blocks is multiplied by those blocks to recover the original data blocks D_1 to D_k. FIG. 4 shows the decoding process for the case where D_1 to D_r are lost. The (k, r) RS code is an MDS (Maximum Distance Separable) code: when any r blocks fail, they can be recovered from the remaining data blocks and check blocks.
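The matrix description above can be written compactly as follows. The symbols I_k (the k-by-k identity), G (the r-by-k Vandermonde or Cauchy coding matrix), B, B', and S are introduced here only for illustration and do not appear in the original text; all arithmetic is over a finite field.

```latex
B \, D =
\begin{bmatrix} I_k \\ G \end{bmatrix}
\begin{bmatrix} D_1 \\ \vdots \\ D_k \end{bmatrix}
=
\begin{bmatrix} D_1 \\ \vdots \\ D_k \\ P_1 \\ \vdots \\ P_r \end{bmatrix},
\qquad
\begin{bmatrix} D_1 \\ \vdots \\ D_k \end{bmatrix} = (B')^{-1} S
```

Here S stacks any k surviving blocks and B' consists of the corresponding k rows of B. The MDS property amounts to every such B' being invertible, which is why any k of the k + r blocks suffice for decoding.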
LRC encoding (Local Reconstruction Codes, a type of local erasure code): a (k, l, r-1) LRC divides the k data blocks into l local groups. It computes l local check blocks, one per local group, and r-1 global check blocks. Any single data block failure can be decoded from the k/l blocks in its local group. The LRC encoding can tolerate any r block failures, and can also tolerate failure patterns beyond r (up to l + r - 1), provided they are information-theoretically decodable. The LRC also keeps storage overhead low: among all codes that can decode a single data block failure from k/l blocks and tolerate r failures, it requires the fewest check blocks. At the same time, the LRC has a lower reconstruction cost than the RS code, and therefore a lower recovery latency.
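To make the trade-off concrete, the following sketch compares storage overhead and the number of blocks read to rebuild one lost data block for the (6,3) RS and (6,3,2) LRC configurations used in the figures. The function names are hypothetical, and the repair cost of k blocks for RS is the standard figure for a degraded read under RS rather than something stated in the text.

```python
def rs_profile(k, r):
    """(k, r) RS: k data blocks, r global check blocks."""
    return {
        "blocks": k + r,
        "overhead": (k + r) / k,   # stored blocks per data block
        "repair_reads": k,         # a lost block is rebuilt from k others
    }

def lrc_profile(k, l, g):
    """(k, l, g) LRC: k data blocks, l local and g global check blocks."""
    return {
        "blocks": k + l + g,
        "overhead": (k + l + g) / k,
        "repair_reads": k // l,    # rebuilt inside its local group
    }

rs = rs_profile(6, 3)
lrc = lrc_profile(6, 3, 2)
```

For these parameters the LRC stores 11 blocks instead of 9 but reads only 2 blocks for a single-block repair instead of 6, which is why it is assigned to hot data while RS is assigned to cold data.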
Therefore, by combining the two different erasure codes in a hybrid storage strategy, this embodiment reduces the overall degraded read latency and storage overhead of the storage system.
In one embodiment, because switching encodings by re-encoding would itself require a very large amount of data transmission, an efficient encoding switching algorithm is provided to reduce the data transmission and computation caused by encoding switching. Based on the properties of LRC encoding and RS encoding, the embodiment of the invention provides an efficient switching algorithm between the (k, l, g-1) LRC encoding and the (k, g) RS encoding, which ensures that hot/cold switching does not become a bottleneck in the system and reduces the data transmission and computation during encoding switching.
In some embodiments, the RS encoding is a (k_RS, g_RS) RS encoding and the LRC encoding is a (k_LRC, l_LRC, g_LRC) LRC encoding, where k_RS denotes the number of data blocks in the RS encoding, g_RS denotes the number of global check blocks in the RS encoding, k_LRC denotes the number of data blocks in the LRC encoding, l_LRC denotes the number of local check blocks in the LRC encoding, and g_LRC denotes the number of global check blocks in the LRC encoding, with k_RS = k_LRC and g_RS = g_LRC + 1.
Specifically, in order to ensure efficient switching between RS coding and LRC coding, RS coding and LRC coding need to satisfy the following conditions:
1) k_RS = k_LRC and g_RS = g_LRC + 1;
2) the last of the g_RS global check blocks of the RS encoding is obtained by XORing all k_RS data blocks;
3) the g_LRC global check blocks of the LRC encoding are identical to the first g_RS - 1 global check blocks of the RS encoding.
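The three conditions can be checked mechanically on a pair of stripes, as in the sketch below. The function `switch_conditions_hold` and its argument layout are hypothetical, and the first two global check blocks in the example are arbitrary placeholder bytes rather than real RS check blocks, since computing those needs finite-field arithmetic that is out of scope here.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def switch_conditions_hold(rs_data, rs_globals, lrc_data, lrc_globals):
    # Condition 1: same number of data blocks, one more global block in RS.
    same_k = len(rs_data) == len(lrc_data)
    one_more_global = len(rs_globals) == len(lrc_globals) + 1
    # Condition 2: the last RS global block is the XOR of all data blocks.
    last_is_xor = bool(rs_globals) and rs_globals[-1] == xor_blocks(rs_data)
    # Condition 3: the LRC globals equal the first g_RS - 1 RS globals.
    shared_globals = list(lrc_globals) == list(rs_globals[:-1])
    return same_k and one_more_global and last_is_xor and shared_globals

data = [bytes([i] * 4) for i in range(1, 7)]   # k = 6
placeholders = [b"\xa1" * 4, b"\xa2" * 4]      # stand-ins for real RS globals
rs_globals = placeholders + [xor_blocks(data)]
ok = switch_conditions_hold(data, rs_globals, data, placeholders)
bad = switch_conditions_hold(data, placeholders + [b"\x00" * 4], data, placeholders)
```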
In some embodiments, in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode comprises:
in response to the checked data mark being hot data, determining whether the number of accesses to the data reaches a first threshold;
and in response to the number of accesses not reaching the first threshold, switching the data mark from hot data to cold data, and switching the encoding mode of the data from the second encoding to the first encoding.
Specifically, in order to reduce the overall degraded read latency and storage overhead of the storage system, when the data mark is switched from hot data to cold data, the encoding mode used to store the data is switched accordingly.
In some embodiments, in response to a checked data mark triggering a mark switching condition, switching that data mark and correspondingly switching the encoding mode comprises:
in response to the checked data mark being cold data, determining whether the number of accesses to the data reaches a second threshold;
and in response to the number of accesses reaching the second threshold, switching the data mark from cold data to hot data, and switching the encoding mode of the data from the first encoding to the second encoding.
Specifically, in order to reduce the overall degraded read latency and storage overhead of the storage system, when the data mark is switched from cold data to hot data, the encoding mode used to store the data is switched accordingly.
In some embodiments, switching the encoding mode of the data from the second encoding to the first encoding comprises:
switching the encoding mode of the data from the second encoding to the first encoding based on a first encoding switching algorithm.
In some embodiments, switching the encoding mode of the data from the first encoding to the second encoding comprises:
switching the encoding mode of the data from the first encoding to the second encoding based on a second encoding switching algorithm.
In some embodiments, switching the encoding mode of the data from the second encoding to the first encoding based on the first encoding switching algorithm comprises:
obtaining the RS encoded data block based on the LRC encoded data block;
and obtaining the global check block of the RS code based on the global check block and the local check block of the LRC code.
In some embodiments, deriving the RS-encoded data block based on the LRC-encoded data block comprises:
and acquiring the LRC coded data block, and taking the acquired data block as the RS coded data block.
In some embodiments, obtaining the RS-encoded global parity chunks based on the LRC-encoded global parity chunks and the local parity chunks comprises:
taking the g_LRC global check blocks of the LRC encoding as the first g_RS - 1 global check blocks of the RS encoding;
performing an XOR calculation on the l_LRC local check blocks of the LRC encoding, and taking the XOR result as the last global check block of the RS encoding.
In some embodiments, switching the encoding of the data from the first encoding to the second encoding based on the second encoding switching algorithm comprises:
obtaining the LRC coded data block based on the RS coded data block;
obtaining a global check block of the LRC code based on the global check block of the RS code;
and obtaining the local check block of the LRC code based on the data block of the RS code and the last global check block.
In some embodiments, deriving the LRC encoded data block based on the RS encoded data block comprises:
and acquiring the RS coded data block, and taking the acquired data block as the LRC coded data block.
In some embodiments, obtaining the LRC encoded global parity chunks based on the RS encoded global parity chunks comprises:
taking the first g_RS - 1 global check blocks of the RS encoding as the g_LRC global check blocks of the LRC encoding.
In some embodiments, obtaining the local check blocks of the LRC encoding based on the RS-encoded data blocks and the last global check block comprises:
obtaining the last l_LRC - 1 local check blocks of the LRC encoding based on the RS-encoded data blocks, and XORing the last global check block of the RS encoding with the data blocks corresponding to the last l_LRC - 1 local check blocks of the LRC encoding to obtain the 1st local check block.
In some embodiments, obtaining the last l_LRC - 1 local check blocks of the LRC encoding based on the RS-encoded data blocks comprises:
obtaining the last l_LRC - 1 local check blocks of the LRC encoding based on the last (k_RS / l_LRC) * (l_LRC - 1) data blocks of the RS encoding.
In some embodiments, the last global check block of the RS encoding is obtained by XORing the k_RS data blocks.
In one embodiment, hot data, which is accessed more frequently, uses (k_LRC, l_LRC, g_LRC) LRC encoding, and cold data, which is accessed less frequently, uses (k_RS, g_RS) RS encoding, where k_RS and k_LRC denote the number of data blocks, l_LRC denotes the number of local check blocks, and g_RS and g_LRC denote the number of global check blocks.
As can be seen from FIG. 5 and FIG. 6, when k_RS = k_LRC and g_RS = g_LRC + 1, and the third global redundant block of the RS code is the XOR of all k data blocks, the corresponding RS code and LRC code can share the same RS code. Under this condition, switching between RS encoding and LRC encoding is quite efficient; the switching algorithm is described below. To avoid confusion, the algorithm writes k = k_RS = k_LRC and g = g_RS = g_LRC + 1, so the two codes used in the storage system are the (k, g) RS code and the (k, l, g-1) LRC code.
s1: for thermal data with higher frequency of being accessed, adoptk,l,g-1) LRC coding; for cold data with lower frequency of access, adoptk,g) RS encoding, and satisfyk RS =k LRC Andg RS =g LRC +1 and RS encoding the last global redundant blockf g (x)Resulting from the exclusive or of all k data blocks.
S2: when the data mark is converted from hot data to cold data, the encoding mode is converted from LRC encoding to RS encoding, and the step is switched to S3; when the data mark is converted from cold data to hot data, the encoding mode is switched from RS encoding to LRC encoding, and the step is switched to S4.
S3: LRC coding is switched to RS coding fork,g) RS code and(k,l,g-1)LRC,parameter(s)k RS =k LRC Andg RS =g LRC +1. To (a)k,g) Last global redundancy block of RS codef g (x)Need to be (A)k,l,g-1) LRC encodedlThe local check blocks are obtained by XOR, and then (a)k,l,g-1) LRC encodedlAnd deleting the local check blocks. Examples are as follows:
as shown in fig. 7, for the (6,3,2) LRC code, when it needs to switch to the (6,3) RS code, the data of the second and third local check blocks are transmitted to the first local check block, and the xor is performed to obtain the dataf 3 (x)And then deleting the second and third local check blocks.
S4: RS encoding is switched to LRC encoding. For the $(k, g)$ RS code and the $(k, l, g-1)$ LRC code, the parameters satisfy $k = k_{RS} = k_{LRC}$ and $g_{RS} = g_{LRC} + 1$; therefore the first $g-1$ global check blocks of the $(k, g)$ RS code remain unchanged. The last $(l-1)$ local check blocks are each obtained by XORing their corresponding data blocks, and the 1st local check block is obtained by XORing the last global check block $f_g(x)$ with all the data blocks corresponding to the last $(l-1)$ local check blocks. An example follows:
As shown in Fig. 8, when the (6,3) RS code is switched to the (6,3,2) LRC code, the first 2 global check blocks remain unchanged; the last 2 local check blocks are each obtained by XORing their corresponding data blocks, and the 1st local check block is obtained by XORing the last global check block $f_3(x)$ with all the data blocks corresponding to the last 2 local check blocks.
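Steps S3 and S4 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names (`xor_blocks`, `split_groups`, `lrc_to_rs`, `rs_to_lrc`) are hypothetical, the local groups are assumed to be contiguous runs of $k/l$ data blocks, and the two unchanged global check blocks are placeholder byte strings, since real values would come from Galois-field RS encoding that is not reproduced here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_groups(data, l):
    """Split the k data blocks into l local groups of k/l blocks (assumed layout)."""
    if len(data) % l != 0:
        raise ValueError("k must be divisible by l")
    size = len(data) // l
    return [data[i * size:(i + 1) * size] for i in range(l)]


def lrc_to_rs(lrc_globals, lrc_locals):
    """S3: return the g global check blocks of the (k, g) RS code."""
    f_g = xor_blocks(lrc_locals)       # XOR of the l local check blocks
    return list(lrc_globals) + [f_g]   # the local check blocks can now be deleted


def rs_to_lrc(data, rs_globals, l):
    """S4: return (g-1 global check blocks, l local check blocks) of the LRC code."""
    groups = split_groups(data, l)
    # The last l-1 local check blocks are the XOR of their own data groups.
    later_locals = [xor_blocks(group) for group in groups[1:]]
    # The 1st local check block is f_g XORed with the data of the last l-1 groups.
    later_data = [block for group in groups[1:] for block in group]
    first_local = xor_blocks([rs_globals[-1]] + later_data)
    return list(rs_globals[:-1]), [first_local] + later_locals


# Toy (6, 3) RS <-> (6, 3, 2) LRC stripe with 2-byte blocks.
data = [bytes([i, 16 * i]) for i in range(1, 7)]
g1, g2 = b"\xaa\xbb", b"\xcc\xdd"  # placeholders for the unchanged global check blocks
lrc_locals = [xor_blocks(group) for group in split_groups(data, 3)]

rs_globals = lrc_to_rs([g1, g2], lrc_locals)              # hot -> cold
new_globals, new_locals = rs_to_lrc(data, rs_globals, 3)  # cold -> hot
```

In this toy stripe, switching to RS and back reproduces the original local check blocks, and neither direction touches the first two global check blocks.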
With this scheme, when the system needs to convert hot data into cold data, the $(k, l, g-1)$ LRC code is switched to the $(k, g)$ RS code. If a re-encoding algorithm is used, all data blocks must be fetched before re-encoding and the new redundancy blocks must be sent to each node after encoding, so at least $(k + g - 1)$ blocks must be transmitted. If the switching algorithm of the embodiment of the invention is used, only the last $l-1$ local check blocks of the $(k, l, g-1)$ LRC code need to be sent to the first local check block, so the data transmission amount of the switching algorithm is $(l-1)$ blocks.
When the system needs to convert cold data into hot data, the $(k, g)$ RS code is switched to the $(k, l, g-1)$ LRC code. If a re-encoding algorithm is used, at least $(k + l + g - 1)$ blocks must be transmitted. If the algorithm proposed herein is used, $(k/l) \cdot (l-1)$ blocks must be transmitted. When $k = 6$, $l = 3$ and $g = 3$, the switching algorithm of the embodiment of the invention transfers 4 blocks, whereas re-encoding transfers 11 blocks.
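The four transfer counts above can be tabulated with a small helper. This is only a restatement of the formulas in the text; the function name `transfer_costs` and the dictionary keys are hypothetical.

```python
def transfer_costs(k, l, g):
    """Blocks transmitted when switching a (k, g) RS code and a (k, l, g-1) LRC code."""
    if k % l != 0:
        raise ValueError("k must be divisible by l")
    return {
        # hot -> cold: (k, l, g-1) LRC to (k, g) RS
        "hot_to_cold": {"reencode": k + g - 1, "switch": l - 1},
        # cold -> hot: (k, g) RS to (k, l, g-1) LRC
        "cold_to_hot": {"reencode": k + l + g - 1, "switch": (k // l) * (l - 1)},
    }


costs = transfer_costs(k=6, l=3, g=3)
```

For $k = 6$, $l = 3$, $g = 3$ the cold-to-hot entries are 11 and 4 blocks, matching the figures in the text; applying the hot-to-cold formulas to the same parameters gives 8 blocks for re-encoding and 2 blocks for switching.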
Taking the two switching directions together, the switching algorithm provided by the invention transmits significantly less data than a re-encoding algorithm. Note that, because the amount of data in a system generally keeps growing and data access exhibits temporal locality, hot data turning cold should occur far more often than cold data turning hot. Because the switching algorithm provided by the invention yields a clear saving when hot data turns cold, the overall performance gain is considerable.
In terms of computational efficiency, the algorithm provided by the invention needs only XOR operations over less than the whole data group, which is a clear improvement over re-encoding, which requires Galois-field operations over all of the data. Although computation time is not the bottleneck of code switching, the improved computational efficiency helps the system save computing resources, so that more of them can be allocated to upper-layer applications, further improving overall system performance.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 9, an embodiment of the present invention also provides a data storage device, including:
a checking module 110, wherein the checking module 110 is configured to check a storage system data flag periodically, wherein the storage system data flag includes cold data and hot data, and the storage system is configured to store the data marked as cold data based on a first encoding and store the data marked as hot data based on a second encoding;
a switching module 120, wherein the switching module 120 is configured to, in response to a detected data flag meeting a flag switching condition, switch the detected data flag and correspondingly switch the encoding mode, so as to store the corresponding data based on the switched encoding mode.
In some embodiments, the first encoding comprises RS encoding and the second encoding comprises LRC encoding.
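The behaviour of the checking and switching modules can be sketched as follows, using the threshold comparison recited in claims 4 and 5. The class `DataItem`, the function `check_and_switch`, the field names, and the reset of the access counter at the end of each period are all assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    flag: str          # "hot" or "cold"
    encoding: str      # "LRC" for hot data, "RS" for cold data
    access_count: int  # accesses observed in the current checking period


def check_and_switch(item, first_threshold, second_threshold):
    """One periodic check: switch the flag and the encoding if the condition is met."""
    if item.flag == "hot" and item.access_count < first_threshold:
        # Hot data that did not reach the first threshold becomes cold: LRC -> RS.
        item.flag, item.encoding = "cold", "RS"
    elif item.flag == "cold" and item.access_count >= second_threshold:
        # Cold data that reached the second threshold becomes hot: RS -> LRC.
        item.flag, item.encoding = "hot", "LRC"
    item.access_count = 0  # assumption: the counter restarts every period
    return item


cooled = check_and_switch(DataItem("hot", "LRC", 1), first_threshold=5, second_threshold=10)
warmed = check_and_switch(DataItem("cold", "RS", 12), first_threshold=5, second_threshold=10)
```

A real system would trigger the block-level switching procedure at the points where this sketch only updates the `encoding` field.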
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 10, the embodiment of the present invention further provides a computer device 30, in which the computer device 30 comprises a processor 310 and a memory 320, the memory 320 stores a computer program 321 that can run on the processor, and the processor 310 executes the program to perform the steps of the above method.
The memory, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the data storage method in the embodiments of the present application. The processor executes various functional applications and data processing of the device by executing nonvolatile software programs, instructions and modules stored in the memory, namely, the data storage method of the above method embodiment.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 11, an embodiment of the present invention further provides a computer-readable storage medium 40, where the computer-readable storage medium 40 stores a computer program 410, which when executed by a processor, performs the above method.
Finally, it should be noted that, as understood by those skilled in the art, all or part of the processes in the methods of the embodiments described above may be implemented by instructing relevant hardware by a computer program, and the program may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments corresponding thereto.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing are exemplary embodiments of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (20)
1. A method of storing data, comprising:
periodically checking a storage system data flag, wherein the storage system data flag includes cold data and hot data, and the storage system is configured to store data flagged as cold data based on a first encoding and to store data flagged as hot data based on a second encoding;
and in response to a detected data flag meeting a flag switching condition, switching the detected data flag and correspondingly switching the encoding mode, so as to store the corresponding data based on the switched encoding mode.
2. The method of claim 1, wherein the first encoding comprises RS encoding and the second encoding comprises LRC encoding.
3. The method of claim 2, wherein the RS encoding is a $(k_{RS}, g_{RS})$ RS encoding and the LRC encoding is a $(k_{LRC}, l_{LRC}, g_{LRC})$ LRC encoding, wherein $k_{RS}$ represents the number of data blocks in the RS encoding, $g_{RS}$ represents the number of global check blocks in the RS encoding, $k_{LRC}$ represents the number of data blocks in the LRC encoding, $l_{LRC}$ represents the number of local check blocks in the LRC encoding, and $g_{LRC}$ represents the number of global check blocks in the LRC encoding, and wherein $k_{RS} = k_{LRC}$ and $g_{RS} = g_{LRC} + 1$.
4. The method of claim 3, wherein switching the detected data flag in response to the detected data flag triggering a flag switching condition and switching the encoding scheme accordingly comprises:
in response to the detected data flag being hot data, determining whether the access count of the data reaches a first threshold;
and in response to the access count of the data not reaching the first threshold, switching the data flag from hot data to cold data, and switching the encoding mode of the data from the second encoding to the first encoding.
5. The method of claim 3, wherein switching the detected data flag in response to the detected data flag triggering a flag switching condition, and correspondingly switching the encoding scheme comprises:
in response to the detected data flag being cold data, determining whether the access count of the data reaches a second threshold;
and in response to the access count of the data reaching the second threshold, switching the data flag from cold data to hot data, and switching the encoding mode of the data from the first encoding to the second encoding.
6. The method of claim 4, wherein switching the encoding mode of the data from the second encoding to the first encoding comprises:
switching the encoding mode of the data from the second encoding to the first encoding based on a first encoding switching algorithm.
7. The method of claim 5, wherein switching the encoding mode of the data from a first encoding to a second encoding comprises:
and switching the coding mode of the data from the first coding to the second coding based on a second coding switching algorithm.
8. The method of claim 6, wherein switching the encoding mode of the data from the second encoding to the first encoding based on the first encoding switching algorithm comprises:
obtaining the RS-encoded data block based on the LRC-encoded data block;
and obtaining the global check block of the RS code based on the global check block and the local check block of the LRC code.
9. The method of claim 8, wherein deriving the RS encoded data block based on the LRC encoded data block comprises:
and acquiring the LRC coded data block, and taking the acquired data block as the RS coded data block.
10. The method of claim 8, wherein obtaining the RS encoded global parity chunks based on the LRC encoded global parity chunks and the local parity chunks comprises:
taking the $g_{LRC}$ global check blocks of the LRC encoding as $g_{RS} - 1$ global check blocks of the RS encoding;
and performing an XOR calculation on the $l_{LRC}$ local check blocks of the LRC encoding, and taking the result of the XOR calculation as the last global check block of the RS encoding.
11. The method of claim 7, wherein switching the encoding mode of the data from the first encoding to the second encoding based on the second encoding switching algorithm comprises:
obtaining the LRC coded data block based on the RS coded data block;
obtaining a global check block of the LRC code based on the global check block of the RS code;
and obtaining the LRC coded local check block based on the RS coded data block and the last global check block.
12. The method of claim 11, wherein deriving the LRC encoded data block based on the RS encoded data block comprises:
and acquiring the RS coded data block, and taking the acquired data block as the LRC coded data block.
13. The method of claim 11, wherein deriving the LRC encoded global parity chunks based on the RS encoded global parity chunks comprises:
taking $g_{RS} - 1$ global check blocks of the RS encoding as the $g_{LRC}$ global check blocks of the LRC encoding.
14. The method of claim 11, wherein obtaining the LRC encoded local parity block based on the RS encoded data block and a last global parity block comprises:
obtaining the last $l_{LRC} - 1$ local check blocks of the LRC encoding based on the RS-encoded data blocks, and XORing the last global check block of the RS encoding with the data blocks corresponding to the last $l_{LRC} - 1$ local check blocks of the LRC encoding to obtain the 1st local check block.
15. The method of claim 14, wherein obtaining the last $l_{LRC} - 1$ local check blocks of the LRC encoding based on the RS-encoded data blocks comprises:
obtaining the last $l_{LRC} - 1$ local check blocks of the LRC encoding based on the last $(k_{RS} / l_{LRC}) \cdot (l_{LRC} - 1)$ data blocks of the RS encoding.
16. The method of claim 3, wherein the last global check block of the RS encoding is obtained by an XOR calculation over the $k_{RS}$ data blocks.
17. A data storage device, comprising:
a checking module configured to check a storage system data flag on a periodic basis, wherein the storage system data flag includes cold data and hot data, and the storage system is configured to store the data marked as cold data based on a first encoding and store the data marked as hot data based on a second encoding;
and a switching module configured to, in response to a detected data flag meeting a flag switching condition, switch the detected data flag and correspondingly switch the encoding mode, so as to store the corresponding data based on the switched encoding mode.
18. The apparatus of claim 17, wherein the first encoding comprises RS encoding and the second encoding comprises LRC encoding.
19. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any of claims 1 to 16.
20. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 16.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310001205.0A CN115793984B (en) | 2023-01-03 | 2023-01-03 | Data storage method, device, computer equipment and storage medium |
PCT/CN2023/121857 WO2024146186A1 (en) | 2023-01-03 | 2023-09-27 | Data storage method and apparatus, and computer device and non-volatile readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310001205.0A CN115793984B (en) | 2023-01-03 | 2023-01-03 | Data storage method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115793984A (en) | 2023-03-14 |
CN115793984B CN115793984B (en) | 2023-04-28 |
Family
ID=85428487
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310001205.0A Active CN115793984B (en) | 2023-01-03 | 2023-01-03 | Data storage method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115793984B (en) |
WO (1) | WO2024146186A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024146186A1 (en) * | 2023-01-03 | 2024-07-11 | 苏州元脑智能科技有限公司 | Data storage method and apparatus, and computer device and non-volatile readable storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956128A (en) * | 2016-05-09 | 2016-09-21 | 南京大学 | Self-adaptive encoding storage fault-tolerant method based on simple regenerating code |
CN106100801A (en) * | 2016-08-29 | 2016-11-09 | 湖南大学 | A kind of non-homogeneous erasure code method of cloud storage system |
US20160380650A1 (en) * | 2015-06-26 | 2016-12-29 | Microsoft Technology Licensing, Llc | Flexible erasure coding with enhanced local protection group structures |
CN106649406A (en) * | 2015-11-04 | 2017-05-10 | 华为技术有限公司 | Method and device for storing file in self-adaption mode |
CN106844098A (en) * | 2016-12-29 | 2017-06-13 | 中国科学院计算技术研究所 | A kind of fast data recovery method and system based on right-angled intersection erasure code |
CN107357685A (en) * | 2017-07-11 | 2017-11-17 | 清华大学 | A kind of Tolerate and redundance method and apparatus of data storage |
CN110764950A (en) * | 2019-10-31 | 2020-02-07 | 深圳信息职业技术学院 | Hybrid coding method, data restoration method and system based on RS (Reed-Solomon) code and regeneration code |
CN111149093A (en) * | 2018-09-03 | 2020-05-12 | 深圳花儿数据技术有限公司 | Data coding, decoding and repairing method of distributed storage system |
CN114253917A (en) * | 2021-12-06 | 2022-03-29 | 北京信息科技大学 | Distributed self-adaptive storage method and system based on file access characteristics |
CN114281270A (en) * | 2022-03-03 | 2022-04-05 | 山东云海国创云计算装备产业创新中心有限公司 | Data storage method, system, equipment and medium |
CN115454712A (en) * | 2022-11-11 | 2022-12-09 | 苏州浪潮智能科技有限公司 | Check code recovery method, system, electronic equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9595979B2 (en) * | 2015-01-20 | 2017-03-14 | International Business Machines Corporation | Multiple erasure codes for distributed storage |
CN115357425A (en) * | 2022-07-13 | 2022-11-18 | 阿里巴巴(中国)有限公司 | Code configuration conversion method, erasure code coding method, device and system |
CN115793984B (en) * | 2023-01-03 | 2023-04-28 | 苏州浪潮智能科技有限公司 | Data storage method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2024146186A1 (en) | 2024-07-11 |
CN115793984B (en) | 2023-04-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||