CN115686382A - Data storage and reading method - Google Patents

Info

Publication number
CN115686382A
Authority
CN
China
Prior art keywords
data
block
reading
frequency
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211713250.0A
Other languages
Chinese (zh)
Other versions
CN115686382B (en)
Inventor
王晓强
林振仪
古妍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Whale Shark Data Technology Co ltd
Original Assignee
Nanjing Whale Shark Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Whale Shark Data Technology Co ltd
Priority to CN202211713250.0A (patent CN115686382B)
Publication of CN115686382A
Application granted
Publication of CN115686382B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data storage and reading method, which comprises the following steps: receiving the stored data and obtaining the type of the stored data; setting an initial frequency identifier of stored data; if the frequency is high, dividing the storage data into a plurality of data blocks, and storing a plurality of copies in different nodes of the first storage device by each data block; if the intermediate frequency is the frequency, dividing the stored data into a plurality of data blocks, generating a first check block, and storing the data blocks and the first check block on different nodes of the second storage device; if the frequency is low, dividing the storage data into a plurality of data blocks, compressing to obtain a plurality of compressed data blocks, generating a second check block, and storing the compressed data blocks and the second check block on different nodes of a second storage device; the read frequency of the stored data is monitored. The invention selects the most appropriate storage mode aiming at the data with different reading frequency requirements, and greatly reduces the storage cost on the premise of not reducing the access efficiency.

Description

Data storage and reading method
Technical Field
The invention relates to the technical field of data access, in particular to a data storage and reading method.
Background
As information technology has entered the data age, the dramatic increase in data volume has driven continual growth in the number and capacity of storage systems, and large distributed storage systems have emerged and developed rapidly. A distributed storage system adopts a scalable architecture, uses a plurality of storage servers to share the storage load, and uses a location server to locate stored information. It offers high storage reliability, availability, and scalability, and has become the dominant way of storing data today.
At present, data of a distributed storage system is generally randomly distributed to a flash memory hard disk area or a magnetic hard disk area of a storage server for storage. However, the types of data are various, such as audio, video, pictures, text, and the like, the reading and writing speed and the reading and writing times required for each type of data are different, and the reading frequency of each type of data is also different and dynamically changes with time. Therefore, the random storage mode is not beneficial to improving the read-write speed and the storage space utilization rate of the storage server, and reasonable matching of the storage data and the storage resources cannot be realized.
Disclosure of Invention
In view of the above problems, the present invention provides a data storage and reading method, which solves the problems that the storage server has a slow read-write speed and a low utilization rate of storage space, and cannot reasonably match the storage data with the storage resources.
To solve the above technical problems, the invention adopts the following technical solution: a data storage method comprising the following steps: receiving storage data and acquiring the type of the storage data, wherein the type comprises audio and video, pictures, and text; setting an initial frequency identification of the storage data according to the type, wherein the frequency identification is one of high frequency, intermediate frequency, and low frequency; if the frequency identification is high frequency, dividing the storage data into a plurality of data blocks, storing a plurality of copies of each data block on different nodes of a first storage device, and storing a first mapping table in memory, wherein the first mapping table records the mapping between the data blocks and the nodes of the first storage device; if the frequency identification is intermediate frequency, dividing the storage data into a plurality of data blocks, generating a first check block from the data blocks, storing the data blocks and the first check block on different nodes of a second storage device, and storing a second mapping table in memory, wherein the second mapping table records the mapping between the data blocks and the first check block, on the one hand, and the nodes of the second storage device, on the other; if the frequency identification is low frequency, dividing the storage data into a plurality of data blocks, compressing the data blocks to obtain a plurality of compressed data blocks, generating a second check block from the compressed data blocks, storing the compressed data blocks and the second check block on different nodes of the second storage device, and generating a third mapping table in memory, wherein the third mapping table records the mapping between the compressed data blocks and the second check block, on the one hand, and the nodes of the second storage device, on the other; and monitoring the reading frequency of the storage data, changing the frequency identification of the storage data according to a set frequency threshold, and switching the data storage mode accordingly.
As a preferred scheme, in the process of storing the data block and/or the compressed data block in the storage device, the method further includes: and caching the data stored in the storage device, and sending the data when the data accumulation amount exceeds a set length threshold value.
Preferably, the reading frequency is calculated by a formula that appears in the original publication only as an image (Figure 100002_DEST_PATH_IMAGE001) and is not reproduced here. The symbols in that formula, likewise given as images, denote, in order: the reading frequency of the data; the initial frequency of the data; the time of the last scan of the data; the time of the last access of the data; and the current scan time of the data. m is the number of data accesses.
Preferably, the compressing the plurality of data blocks to obtain a plurality of compressed data blocks includes: counting the occurrence times of each data in each data block, and recording the data with the maximum occurrence probability and the corresponding probability value; judging whether the maximum probability value exceeds a set frequency threshold, if so, replacing the data with a marker and then outputting the data, and if not, normally outputting the data; and sequentially circulating until the compression of all the data blocks is completed.
As a preferred scheme, generating the first check block according to the data blocks includes: dividing k data blocks into a groups, wherein each group comprises k/a data blocks, calculating one local check block for each group based on an encoding equation, and calculating r global check blocks from all the data blocks, wherein the encoding equation is a Vandermonde matrix or a Cauchy matrix.
As a preferred scheme, generating the second check block according to the compressed data blocks includes: calculating b redundant blocks from all the compressed data blocks based on an encoding equation, wherein the encoding equation is a Vandermonde matrix or a Cauchy matrix.
Preferably, the first storage device is a flash hard disk region, and the second storage device is a magnetic hard disk region.
The invention also provides a data reading method, which is applied to storage data stored on the first storage device or the second storage device by the above data storage method, and comprises the following steps: acquiring the storage data according to a reading instruction and judging the frequency identification of the storage data; if the frequency identification is high frequency, reading the data blocks, or copies of the data blocks, stored on the nodes of the first storage device according to the first mapping table, and outputting the reading result after reading is finished; if the frequency identification is intermediate frequency, reading the data blocks stored on the nodes of the second storage device according to the second mapping table, and outputting the reading result after reading is finished; and if the frequency identification is low frequency, reading the compressed data blocks stored on the nodes of the second storage device according to the third mapping table, decompressing them after reading to obtain the data blocks, and splicing the data blocks to output the reading result.
Preferably, the reading operation further includes deciding, according to the load degree of the node read request, whether to read a reconstructed file or to read the source file directly: if the load degree exceeds a set load threshold, the reconstructed file is selected; otherwise the source file is read directly.
Preferably, the load degree of the node read request is calculated by two formulas that appear in the original publication only as images (Figure 100002_DEST_PATH_IMAGE007 and Figure 271480DEST_PATH_IMAGE008) and are not reproduced here. The symbols in those formulas, likewise given as images, denote, in order: the load degree of the j-th data node; the load degree of the i-th data block of the j-th data node in time period t; a quantity associated with the i-th data block; the time interval since the load was last calculated; the load degree of the i-th data block of the j-th data node during a time period whose symbol is given only as an image; and the length of the i-th data block. n is the number of data blocks, and O is a given value.
Compared with the prior art, the invention has the following beneficial effects. By distinguishing the types of stored data and assigning a frequency identification according to the reading frequency of the data, the storage mode and storage location of the data can be adjusted dynamically, so that the storage server adapts to the actual data access pattern, the effective utilization of its storage space improves, and stored data is reasonably matched to storage resources. Different data protection modes are adopted for data with different reading frequencies. For data read at high frequency, multiple copies are kept as backups; no check calculation is needed, data reconstruction performs well, and read-write speed is high, which meets the requirements of high-frequency data. For data read at intermediate frequency, local check blocks and global check blocks are generated with the encoding equation, so lost data can be reconstructed quickly at a storage cost lower than that of the copy mode, which meets the requirements of intermediate-frequency data. For data read at low frequency, redundant blocks are generated with the encoding equation and used to reconstruct lost data; although reconstruction speed is sacrificed, the storage cost is reduced, and compressing the data blocks reduces it further, which suits the requirements of low-frequency data.
Drawings
The disclosure of the present invention is illustrated with reference to the accompanying drawings. It is to be understood that the drawings are designed solely for the purposes of illustration and not as a definition of the limits of the invention. In the drawings, like reference numerals are used to refer to like parts. Wherein:
FIG. 1 is a flow chart of a data storage method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data reading method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process of generating a first parity chunk according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a process of generating a second parity chunk according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a process of reconstructing data based on a second parity chunk according to an embodiment of the present invention.
Detailed Description
It is easily understood that, based on the technical solution of the present invention, a person skilled in the art can propose various alternative structures and implementations without departing from the spirit of the present invention. Therefore, the following detailed description and the accompanying drawings merely illustrate the technical solution of the present invention and should not be construed as the whole of the present invention or as limiting its technical solution.
An embodiment according to the present invention is shown in connection with fig. 1. A method of data storage comprising the steps of:
S11: Receiving the storage data and acquiring the type of the storage data. For example, the storage data types include audio and video, pictures, and text.
S12: Setting the initial frequency identification of the stored data according to the type, wherein the frequency identification is one of high frequency, intermediate frequency, and low frequency. By default, audio and video is low frequency, pictures are intermediate frequency, and text is high frequency; the defaults can be adjusted manually according to actual requirements.
S13: if the frequency identification is high frequency, the storage data is divided into a plurality of data blocks, each data block stores a plurality of copies on different nodes of the first storage device, and a first mapping table is stored in the memory and is a mapping relation between the data block and the first storage device node. The first storage device is a flash hard disk region.
For example: for a certain piece of stored data, the data is divided into data blocks with the size of 1MB by default, each data block is copied into 3 copies, and then the copies are stored on different nodes of the first storage device according to a certain distributed storage algorithm (a consistent hash algorithm, a hash remainder algorithm, a hash slot algorithm and the like). And randomly distributing the copy data in different nodes to realize automatic data balance and horizontal expansion. When the disk or the node is damaged due to a fault, the system can automatically reestablish a new data copy according to the mapping table, so that the reliability of the data is ensured.
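The high-frequency path described above (1 MB blocks, 3 copies on different nodes, a mapping table kept in memory) might look roughly as follows. The sketch assumes a consistent-hash ring with virtual nodes as the placement algorithm; the class and function names, the SHA-256 hash, and the number of virtual nodes are illustrative choices, not details taken from the disclosure.

```python
import bisect
import hashlib

BLOCK_SIZE = 1 << 20  # 1 MB default block size from the example above
REPLICAS = 3          # 3 copies per data block from the example above


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide the stored data into fixed-size data blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes: list[str], vnodes: int = 64):
        self._ring = sorted((_hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self._keys = [h for h, _ in self._ring]
        self._node_count = len(set(nodes))

    def place(self, block_id: str, replicas: int = REPLICAS) -> list[str]:
        """Pick `replicas` distinct nodes by walking the ring clockwise."""
        if replicas > self._node_count:
            raise ValueError("not enough nodes for the requested replica count")
        start = bisect.bisect(self._keys, _hash(block_id))
        chosen: list[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen


def store_high_frequency(name: str, data: bytes, ring: HashRing) -> dict[str, list[str]]:
    """Return the first mapping table: block id -> nodes holding its copies."""
    return {f"{name}/{i}": ring.place(f"{name}/{i}") for i, _ in enumerate(split_blocks(data))}
```

If a node fails, the mapping table identifies which block ids had a copy on it, and a new copy can be placed on the next distinct node along the ring.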
S14: if the frequency identification is the intermediate frequency, dividing the stored data into a plurality of data blocks, generating a first check block according to the data blocks, storing the plurality of data blocks and the first check block on different nodes of the second storage device, and simultaneously storing a second mapping table in the memory, wherein the second mapping table is a mapping relation between the data blocks and the nodes of the first check block and the second storage device. The second storage device is a magnetic hard disk region.
In this embodiment of the present invention, the first check block includes local check blocks and global check blocks. With k data blocks, a local check blocks, and r global check blocks, generating the first check block according to the data blocks includes: dividing the k data blocks into a groups, wherein each group comprises k/a data blocks, calculating one local check block for each group based on an encoding equation, and calculating r global check blocks from all the data blocks, wherein the encoding equation is a Vandermonde matrix or a Cauchy matrix.
For example: as shown in FIG. 3, the storage data is divided into 6 data blocks, which are split into two groups (X0, X1, X2) and (Y0, Y1, Y2). The first group generates one local check block PX from the three data blocks (X0, X1, X2), the second group generates one local check block PY from the three data blocks (Y0, Y1, Y2), and two global check blocks P0 and P1 are then generated from all 6 data blocks. If the X1 data block is lost, it can be recovered from X0, X2, and PX alone, so reconstruction is fast.
Compared with traditional erasure codes (EC), this recovery method can greatly reduce the cost of data reconstruction and improve reconstruction speed. When any single data block is lost, it can be recovered by reading only the nearby (a + r - 1) blocks, which reduces the reconstruction delay and the consumption of bandwidth, disk I/O, and other resources, and improves the overall reliability of the system.
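The local recovery of X1 from X0, X2, and PX can be illustrated with a deliberately simplified local check block. The sketch below assumes the local check block is the bytewise XOR of its group, which is the simplest possible coding row; the disclosure specifies a Vandermonde or Cauchy encoding equation, so XOR here is a stand-in and the block contents are made up.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One local group (X0, X1, X2) with its local check block PX.
x0, x1, x2 = b"alpha---", b"bravo---", b"charlie-"
px = xor_blocks(x0, x1, x2)

# X1 is lost: rebuild it from the other members of the same group only.
recovered = xor_blocks(x0, x2, px)
print(recovered == x1)
```

Only three blocks of the same group are read; the other group and the global check blocks are not touched, which is the source of the reduced reconstruction cost.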
S15: if the frequency identification is low frequency, dividing the stored data into a plurality of data blocks, compressing the plurality of data blocks to obtain a plurality of compressed data blocks, generating a second check block according to the compressed data blocks, storing the plurality of compressed data blocks and the second check block on different nodes of the second storage device, and generating a third mapping table in the memory, wherein the third mapping table is a mapping relation between the compressed data blocks and the second check block as well as the nodes of the second storage device. The second storage device is a magnetic hard disk region.
Compressing the plurality of data blocks to obtain a plurality of compressed data blocks includes: counting the number of occurrences of each data value in each data block, and recording the value with the highest occurrence probability together with that probability; judging whether the maximum probability exceeds a set frequency threshold, and if so, replacing that value with a marker before output, otherwise outputting the data normally; and repeating in sequence until all data blocks have been compressed. With this compression mode, the compressed data occupies less, or far less, storage space than the data before compression. When the probability of a value in a data block is higher than the set frequency threshold, marker replacement is used for encoding; otherwise normal encoding is used and the data size is the same before and after compression. This ensures that, for every data block, the compressed data occupies no more storage space than the data before compression, so an overall compression effect is obtained.
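One way to realize the marker replacement is sketched below. The disclosure does not specify the marker format, so everything about the layout is an assumption made for illustration: a one-byte mode flag, then (in marker mode) the dominant byte value, a 4-byte block length, a one-bit-per-position bitmap marking where the dominant value occurred, and finally the remaining bytes. Note that the one-byte flag means the "normal" output here is one byte longer than the input, a small departure from the equal-size behaviour described above.

```python
from collections import Counter


def compress_block(block: bytes, threshold: float = 0.5) -> bytes:
    """Replace the dominant byte with 1-bit markers if it is frequent enough."""
    if not block:
        return b"\x00"
    value, count = Counter(block).most_common(1)[0]
    if count / len(block) <= threshold:
        return b"\x00" + block  # normal output: flag + raw bytes
    bitmap = bytearray((len(block) + 7) // 8)
    rest = bytearray()
    for i, byte in enumerate(block):
        if byte == value:
            bitmap[i // 8] |= 1 << (i % 8)  # marker bit for the dominant value
        else:
            rest.append(byte)
    return b"\x01" + bytes([value]) + len(block).to_bytes(4, "big") + bytes(bitmap) + bytes(rest)


def decompress_block(data: bytes) -> bytes:
    """Inverse of compress_block."""
    if data[0] == 0:
        return data[1:]
    value = data[1]
    length = int.from_bytes(data[2:6], "big")
    bitmap_len = (length + 7) // 8
    bitmap = data[6:6 + bitmap_len]
    rest = iter(data[6 + bitmap_len:])
    out = bytearray()
    for i in range(length):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            out.append(value)
        else:
            out.append(next(rest))
    return bytes(out)


sparse = b"\x00" * 900 + bytes(range(1, 101))
packed = compress_block(sparse)
print(len(sparse), len(packed), decompress_block(packed) == sparse)
```

With this layout the marker mode pays off whenever the dominant value covers clearly more than one eighth of the block, which is why a threshold well above that (0.5 in the sketch) is a safe choice.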
In this embodiment of the present invention, the second check block consists of redundant blocks. With j compressed data blocks and b redundant blocks, generating the second check block according to the compressed data blocks includes: calculating b redundant blocks from all the compressed data blocks based on an encoding equation, wherein the encoding equation is a Vandermonde matrix or a Cauchy matrix. This mode tolerates the loss of at most b blocks and improves the reliability of data storage.
For example: referring to FIG. 4, the matrix G is a Cauchy matrix, the matrix D contains 5 compressed data blocks (D1, D2, D3, D4, D5), and the matrix E contains the matrix D and 2 redundant blocks (C1, C2). Referring to FIG. 5, if data blocks D2 and D5 are lost, the corresponding row vectors can be selected and the inverse of the resulting matrix computed (the matrix symbol appears only as an image in the original publication, Figure 834563DEST_PATH_IMAGE016); combining this inverse with the remaining matrix (Figure DEST_PATH_IMAGE017) reconstructs the original data D.
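The encode-and-reconstruct cycle of FIG. 4 and FIG. 5 can be sketched numerically. The sketch makes several assumptions that are not in the disclosure: arithmetic is done in the prime field GF(257) rather than the GF(2^w) fields usually used in practice, each block is represented by a single symbol, and the Cauchy matrix uses the points x_i = k + i and y_j = j. It requires Python 3.8 or later for the three-argument `pow` with a negative exponent.

```python
P = 257  # prime modulus; an assumption made to keep the arithmetic simple


def generator(k: int, b: int) -> list[list[int]]:
    """Identity rows (data blocks) stacked on a b x k Cauchy matrix (redundant blocks)."""
    identity = [[1 if r == c else 0 for c in range(k)] for r in range(k)]
    cauchy = [[pow((k + i) - j, -1, P) for j in range(k)] for i in range(b)]
    return identity + cauchy


def encode(data: list[int], b: int) -> list[int]:
    """E = G * D: the k data symbols followed by b redundant symbols."""
    return [sum(g * d for g, d in zip(row, data)) % P for row in generator(len(data), b)]


def invert(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion modulo P."""
    n = len(matrix)
    a = [row[:] + [1 if r == c else 0 for c in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] % P)
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], -1, P)
        a[col] = [v * inv % P for v in a[col]]
        for r in range(n):
            if r != col and a[r][col]:
                factor = a[r][col]
                a[r] = [(v - factor * w) % P for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def reconstruct(survivors: dict[int, int], k: int, b: int) -> list[int]:
    """Rebuild D from any k surviving symbols, given as {row index: symbol}."""
    rows = sorted(survivors)[:k]
    g = generator(k, b)
    inverse = invert([g[i] for i in rows])
    return [sum(c * survivors[i] for c, i in zip(row, rows)) % P for row in inverse]


data = [10, 20, 30, 40, 50]           # stands in for D1..D5
coded = encode(data, b=2)             # D1..D5 followed by C1, C2
survivors = {i: s for i, s in enumerate(coded) if i not in (1, 4)}  # D2 and D5 lost
print(reconstruct(survivors, k=5, b=2) == data)
```

Because every square submatrix of a Cauchy matrix is invertible, any 5 of the 7 coded symbols suffice, which matches the statement that up to b blocks may be lost.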
S16: Monitoring the reading frequency of the stored data, changing the frequency identification of the stored data according to a set frequency threshold, and switching the data storage mode accordingly.
Specifically, the reading frequency is calculated by a formula that appears in the original publication only as an image (Figure 823379DEST_PATH_IMAGE001) and is not reproduced here. The symbols in that formula, likewise given as images, denote, in order: the reading frequency of the data; the initial frequency of the data; the time of the last scan of the data; the time of the last access of the data; and the current scan time of the data. m is the number of data accesses.
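The threshold comparison in S16 can be sketched independently of how the reading frequency itself is computed. In the sketch below the reading frequency is simply passed in as a number; the two threshold values and all names are hypothetical, and the formula from the omitted image is not implemented.

```python
# Hypothetical frequency thresholds separating the three identifications.
LOW_TO_INTERMEDIATE = 1.0
INTERMEDIATE_TO_HIGH = 10.0


def classify(read_frequency: float) -> str:
    """Map a monitored reading frequency to a frequency identification."""
    if read_frequency >= INTERMEDIATE_TO_HIGH:
        return "high"
    if read_frequency >= LOW_TO_INTERMEDIATE:
        return "intermediate"
    return "low"


def update_identification(current: str, read_frequency: float) -> tuple[str, bool]:
    """Return the new identification and whether the storage mode must be switched."""
    new = classify(read_frequency)
    return new, new != current


print(update_identification("low", 12.5))
print(update_identification("high", 12.5))
```

When the second element of the result is true, the data would be re-stored using the mode of the new identification (copies, local and global check blocks, or compression with redundant blocks).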
Further, in the process of storing the data blocks and/or the compressed data blocks to the storage device, the method further includes: caching the data to be stored in the storage device, and sending the data when the accumulated amount exceeds a set length threshold. When the size of the locally created temporary cache file exceeds one block (usually 64 MB), the node of the storage device is contacted and a storage location on a data storage node is allocated to complete the write operation of the file. Caching the stored data in this way greatly reduces the pressure on the server side.
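The write caching described above amounts to buffering until a length threshold is exceeded. A minimal sketch follows, assuming an in-memory buffer and a caller-supplied `send` callback; both are illustrative choices, since the text describes a local temporary file and does not define the sending interface.

```python
from typing import Callable


class WriteBuffer:
    """Accumulate writes and send them once the length threshold is exceeded."""

    def __init__(self, send: Callable[[bytes], None], threshold: int = 64 * 1024 * 1024):
        self._send = send            # hypothetical callback that ships data to a storage node
        self._threshold = threshold  # 64 MB by default, matching the block size mentioned above
        self._chunks: list[bytes] = []
        self._size = 0

    def write(self, data: bytes) -> None:
        self._chunks.append(data)
        self._size += len(data)
        if self._size > self._threshold:
            self.flush()

    def flush(self) -> None:
        """Send whatever has accumulated, for example at end of file."""
        if self._chunks:
            self._send(b"".join(self._chunks))
            self._chunks.clear()
            self._size = 0


sent: list[bytes] = []
buffer = WriteBuffer(sent.append, threshold=8)
buffer.write(b"12345")   # below the threshold: nothing is sent yet
buffer.write(b"6789")    # accumulated size 9 > 8: one combined send
print(sent)
```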
The invention selects the most appropriate storage mode aiming at the data with different reading frequency requirements, greatly reduces the storage cost on the premise of not reducing the access efficiency, and provides a better solution for long-term storage and instant access of a large amount of data.
Referring to FIG. 2, the present invention further provides a data reading method, which is applied to storage data stored on the first storage device or the second storage device by any one of the data storage methods described above, and includes the following steps:
S21: Acquiring the storage data according to the reading instruction and judging the frequency identification of the storage data. The frequency identification is one of high frequency, intermediate frequency, and low frequency.
S22: If the frequency identification is high frequency, reading the data blocks, or copies of the data blocks, stored on the nodes of the first storage device according to the first mapping table, and outputting the reading result after reading is finished.
S23: If the frequency identification is intermediate frequency, reading the data blocks stored on the nodes of the second storage device according to the second mapping table, and outputting the reading result after reading is finished.
S24: If the frequency identification is low frequency, reading the compressed data blocks stored on the nodes of the second storage device according to the third mapping table, decompressing them after reading to obtain the data blocks, and splicing the data blocks to output the reading result.
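Steps S21 to S24 can be sketched as a single read routine driven by the mapping table. The sketch assumes that a mapping table is an ordered list of (block id, node list) pairs, that `fetch(node, block_id)` raises `KeyError` when a node cannot serve a block, and that `zlib` stands in for the disclosure's own compression; all three are assumptions made for illustration.

```python
import zlib
from typing import Callable

Fetch = Callable[[str, str], bytes]


def read_block(block_id: str, nodes: list[str], fetch: Fetch) -> bytes:
    """Read a block from the first node that can serve it (a copy, if the first fails)."""
    for node in nodes:
        try:
            return fetch(node, block_id)
        except KeyError:
            continue
    raise KeyError(block_id)


def read_data(frequency: str, mapping: list[tuple[str, list[str]]], fetch: Fetch) -> bytes:
    """Read all blocks listed in the mapping table and splice them in order."""
    blocks = []
    for block_id, nodes in mapping:
        raw = read_block(block_id, nodes, fetch)
        if frequency == "low":
            raw = zlib.decompress(raw)  # low-frequency blocks are stored compressed
        blocks.append(raw)
    return b"".join(blocks)


# Toy in-memory "cluster": (node, block id) -> stored bytes.
store = {("n2", "f/0"): b"hello ", ("n1", "f/1"): b"world"}
table = [("f/0", ["n1", "n2"]), ("f/1", ["n1", "n3"])]
print(read_data("high", table, lambda node, block: store[(node, block)]))
```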
When the reading operation is carried out, the method also comprises the steps of judging and selecting the reconstructed file or directly reading the source file according to the load degree of the node reading request, if the load degree exceeds a set load threshold value, selecting the reconstructed file, and if not, directly reading the source file.
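The choice between reading the source and reading a reconstruction can be sketched as follows. The load degree is taken as a given number per node, since the load formula is not reproduced, and reconstruction uses a bytewise XOR check block as a stand-in for the disclosure's encoding equation; the node names and the threshold value are hypothetical.

```python
from functools import reduce
from typing import Callable


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def read_with_load_check(
    target: str,
    peers: list[str],
    loads: dict[str, float],
    load_threshold: float,
    fetch: Callable[[str], bytes],
) -> bytes:
    """Read directly from `target` unless it is overloaded; otherwise rebuild from peers."""
    if loads[target] <= load_threshold:
        return fetch(target)
    # Overloaded node: reconstruct its block from the other blocks of the group
    # (peer data blocks plus the check block) without touching the busy node.
    return reduce(xor_bytes, (fetch(node) for node in peers))


blocks = {"a": b"\x01\x02", "b": b"\x10\x20"}
blocks["p"] = xor_bytes(blocks["a"], blocks["b"])  # check block of the group
touched: list[str] = []


def fetch(node: str) -> bytes:
    touched.append(node)
    return blocks[node]


loads = {"a": 0.9, "b": 0.1, "p": 0.2}
result = read_with_load_check("a", ["b", "p"], loads, load_threshold=0.8, fetch=fetch)
print(result == blocks["a"], touched)
```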
The load degree of the node read request is calculated by two formulas that appear in the original publication only as images (Figure 320143DEST_PATH_IMAGE007 and Figure 439409DEST_PATH_IMAGE008) and are not reproduced here. The symbols in those formulas, likewise given as images, denote, in order: the load degree of the j-th data node; the load degree of the i-th data block of the j-th data node in time period t; a quantity associated with the i-th data block; the time interval since the load was last calculated; the load degree of the i-th data block of the j-th data node during a time period whose symbol is given only as an image; and the length of the i-th data block. n is the number of data blocks, and O is a given value.
In summary, the beneficial effects of the invention are as follows. By distinguishing the types of stored data and assigning a frequency identification according to the reading frequency of the data, the storage mode and storage location of the data can be adjusted dynamically, so that the storage server adapts to the actual data access pattern, the effective utilization of its storage space improves, and stored data is reasonably matched to storage resources. Different data protection modes are adopted for data with different reading frequencies. For data read at high frequency, multiple copies are kept as backups; no check calculation is needed, data reconstruction performs well, and read-write speed is high, which meets the requirements of high-frequency data. For data read at intermediate frequency, local check blocks and global check blocks are generated with the encoding equation, so lost data can be reconstructed quickly at a storage cost lower than that of the copy mode, which meets the requirements of intermediate-frequency data. For data read at low frequency, redundant blocks are generated with the encoding equation and used to reconstruct lost data; although reconstruction speed is sacrificed, the storage cost is reduced, and compressing the data blocks reduces it further, which suits the requirements of low-frequency data.
It should be understood that the integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The technical scope of the present invention is not limited to the above description, and those skilled in the art can make various changes and modifications to the above-described embodiments without departing from the technical spirit of the present invention, and such changes and modifications should fall within the protective scope of the present invention.

Claims (10)

1. A method of storing data, comprising the steps of:
receiving storage data, and acquiring the type of the storage data, wherein the type comprises audio and video, pictures and texts;
setting initial frequency identification of stored data according to the type, wherein the frequency identification comprises high frequency, intermediate frequency and low frequency;
if the frequency identification is high frequency, dividing the stored data into a plurality of data blocks, wherein each data block stores a plurality of copies on different nodes of the first storage device, and simultaneously stores a first mapping table in a memory, wherein the first mapping table is a mapping relation between the data block and the first storage device node;
if the frequency identification is the intermediate frequency, dividing the stored data into a plurality of data blocks, generating a first check block according to the data blocks, storing the plurality of data blocks and the first check block on different nodes of a second storage device, and simultaneously storing a second mapping table in a memory, wherein the second mapping table is a mapping relation between the data blocks and the nodes of the first check block and the nodes of the second storage device;
if the frequency identification is low frequency, dividing the stored data into a plurality of data blocks, compressing the plurality of data blocks to obtain a plurality of compressed data blocks, generating a second check block according to the compressed data blocks, storing the plurality of compressed data blocks and the second check block on different nodes of a second storage device, and generating a third mapping table in the memory, wherein the third mapping table is a mapping relation between the compressed data blocks and the second check block as well as the nodes of the second storage device;
and monitoring the reading frequency of the stored data, changing the frequency identification of the stored data according to a set frequency threshold value, and correspondingly switching the data storage mode.
2. The data storage method according to claim 1, wherein the storing of the data block and/or the compressed data block to the storage device further comprises: and caching the data stored in the storage device, and sending the data when the data accumulation amount exceeds a set length threshold value.
3. The data storage method of claim 1, wherein the reading frequency is calculated by a formula that appears in the original publication only as an image (Figure DEST_PATH_IMAGE001) and is not reproduced here; the symbols in that formula, likewise given as images, denote, in order: the reading frequency of the data; the initial frequency of the data; the time of the last scan of the data; the time of the last access of the data; and the current scan time of the data; and m is the number of data accesses.
4. The data storage method of claim 1, wherein compressing the plurality of data blocks to obtain a plurality of compressed data blocks comprises:
counting the number of occurrences of each data value in each data block, and recording the data value with the highest occurrence probability together with the corresponding probability value;
judging whether the maximum probability value exceeds a set frequency threshold; if so, replacing that data value with a marker before output, and if not, outputting the data unchanged;
and repeating these steps in sequence until all the data blocks have been compressed.
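The claim does not specify the marker format, so the following is only one possible reading, sketched under stated assumptions: data values are bytes, the marker is a one-bit-per-position bitmask, and a one-byte flag records whether substitution was applied. The function names and the on-disk layout are hypothetical.

```python
from collections import Counter


def compress_block(block, threshold=0.5):
    """Replace the dominant byte with a bitmask marker when it is frequent enough."""
    if not block:
        return b"\x00"
    value, count = Counter(block).most_common(1)[0]
    if count / len(block) <= threshold:
        return b"\x00" + block  # flag 0: block is output unchanged
    mask = bytearray((len(block) + 7) // 8)
    rest = bytearray()
    for i, byte in enumerate(block):
        if byte == value:
            mask[i // 8] |= 1 << (i % 8)  # mark position of the dominant byte
        else:
            rest.append(byte)
    # flag 1, dominant byte, original length, position mask, remaining bytes
    return (b"\x01" + bytes([value]) + len(block).to_bytes(4, "big")
            + bytes(mask) + bytes(rest))


def decompress_block(data):
    """Invert compress_block."""
    if data[0] == 0:
        return data[1:]
    value = data[1]
    length = int.from_bytes(data[2:6], "big")
    mask_len = (length + 7) // 8
    mask = data[6:6 + mask_len]
    rest = iter(data[6 + mask_len:])
    out = bytearray()
    for i in range(length):
        if (mask[i // 8] >> (i % 8)) & 1:
            out.append(value)
        else:
            out.append(next(rest))
    return bytes(out)
```

With this layout a block pays off only when the dominant byte covers well over one eighth of the positions, which is why the threshold check comes first.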
5. The data storage method according to claim 1, wherein the first check block comprises local check blocks and global check blocks, and, given k data blocks, a local check blocks, and r global check blocks, generating the first check block according to the data blocks comprises: dividing the k data blocks into a groups of k/a data blocks each, calculating one local check block within each group based on an encoding equation, and calculating the r global check blocks from all the data blocks, wherein the encoding equation is based on a Vandermonde matrix or a Cauchy matrix.
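The local/global layout of claim 5 can be sketched as follows, assuming arithmetic in GF(2^8) with the common reduction polynomial 0x11d. The local check block uses the all-ones Vandermonde row (plain XOR) and global row j uses coefficients (i+1)^(j+1); these coefficient choices and all function names are illustrative assumptions, and no particular fault-tolerance guarantee is claimed for them.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduction polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def combine(blocks, coeffs):
    """Byte-wise linear combination of equally sized blocks over GF(2^8)."""
    out = bytearray(len(blocks[0]))
    for block, c in zip(blocks, coeffs):
        for i, byte in enumerate(block):
            out[i] ^= gf_mul(c, byte)
    return bytes(out)


def lrc_encode(data_blocks, a, r):
    """Return (local check blocks, global check blocks) for k data blocks."""
    k = len(data_blocks)
    assert k % a == 0, "k must be divisible by the number of groups"
    size = k // a
    groups = [data_blocks[g * size:(g + 1) * size] for g in range(a)]
    local = [combine(group, [1] * size) for group in groups]
    global_ = [
        combine(data_blocks, [gf_pow(i + 1, j + 1) for i in range(k)])
        for j in range(r)
    ]
    return local, global_


def repair_from_local(group, lost_index, local_check):
    """Rebuild one lost block using only its own group and local check block."""
    survivors = [b for i, b in enumerate(group) if i != lost_index]
    return combine(survivors + [local_check], [1] * (len(survivors) + 1))
```

The point of the grouping is visible in `repair_from_local`: a single lost block is rebuilt from k/a blocks instead of all k.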
6. The data storage method of claim 1, wherein the second check block comprises redundant blocks, and, given j compressed data blocks and b redundant blocks, generating the second check block from the compressed data blocks comprises: calculating the b redundant blocks from all the compressed data blocks based on an encoding equation, wherein the encoding equation is based on a Vandermonde matrix or a Cauchy matrix.
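For the Cauchy-matrix variant of claim 6, a small sketch follows. It assumes GF(2^8) with reduction polynomial 0x11d, equally sized blocks, and matrix entries 1/(x XOR y) with x in 0..b-1 and y in b..b+j-1 (so b + j must not exceed 256). Only single-block recovery from one redundant block is shown; function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reduction polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254 in GF(2^8), valid for a != 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def combine(blocks, coeffs):
    out = bytearray(len(blocks[0]))
    for block, c in zip(blocks, coeffs):
        for i, byte in enumerate(block):
            out[i] ^= gf_mul(c, byte)
    return bytes(out)


def cauchy_matrix(b, j):
    """b x j Cauchy matrix; x and y index sets are disjoint, so x ^ y != 0."""
    return [[gf_inv(x ^ (b + y)) for y in range(j)] for x in range(b)]


def encode(blocks, b):
    """Compute b redundant blocks from all (compressed) data blocks."""
    return [combine(blocks, row) for row in cauchy_matrix(b, len(blocks))]


def recover_one(blocks, lost, redundant, row):
    """Solve redundant = sum(row[i] * blocks[i]) for the single missing block."""
    kept = [(blk, c) for i, (blk, c) in enumerate(zip(blocks, row)) if i != lost]
    partial = combine([redundant] + [blk for blk, _ in kept],
                      [1] + [c for _, c in kept])
    inv = gf_inv(row[lost])
    return bytes(gf_mul(inv, v) for v in partial)
```

Recovering several lost blocks at once would require inverting a submatrix over GF(2^8), which is omitted here.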
7. The data storage method of claim 1, wherein the first storage device is a flash hard disk region and the second storage device is a magnetic hard disk region.
8. A data reading method applied to the data stored in the first storage device or the second storage device according to the data storage method of any one of claims 1 to 7, comprising the steps of:
acquiring the stored data according to a read instruction, and determining the frequency identification of the stored data;
if the frequency identification is high frequency, reading the data blocks, or copies of the data blocks, of the data stored on the nodes of the first storage device according to the first mapping table, and outputting a reading result after the reading is finished;
if the frequency identification is intermediate frequency, reading the data blocks of the data stored on the nodes of the second storage device according to the second mapping table, and outputting a reading result after the reading is finished;
and if the frequency identification is low frequency, reading the compressed data blocks of the data stored on the nodes of the second storage device according to the third mapping table, decompressing them after reading to obtain the data blocks, and splicing the data blocks to output a reading result.
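The three read paths of claim 8 can be sketched with in-memory dictionaries standing in for storage nodes and mapping tables. Everything here is an assumption for illustration: the table layout (one list of candidate locations per block), the node and block identifiers, and the use of `zlib` as a stand-in for the compression scheme of claim 4.

```python
import zlib


def read_block(locations, nodes):
    """Return the first reachable copy among the candidate (node, block) locations."""
    for node_id, block_id in locations:
        node = nodes.get(node_id)
        if node is not None and block_id in node:
            return node[block_id]
    raise IOError("no reachable copy of block")


def read_data(name, frequency_id, mapping_tables, nodes):
    """Read and splice all blocks of `name` according to its frequency identification.

    mapping_tables[frequency_id][name] is an ordered list with one entry per
    block; each entry lists the locations holding that block (several for the
    replicated high-frequency tier, one otherwise).
    """
    table = mapping_tables[frequency_id][name]
    blocks = [read_block(locations, nodes) for locations in table]
    if frequency_id == "low":
        # Low-frequency data is stored compressed; zlib is only a stand-in.
        blocks = [zlib.decompress(block) for block in blocks]
    return b"".join(blocks)
```

For the high-frequency tier the replica list lets the reader skip an unavailable node without touching any check block.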
9. The data reading method of claim 8, further comprising: during the read operation, determining, according to the load degree of the node read requests, whether to read a reconstructed file or to read the source file directly, wherein the reconstructed file is selected if the load degree exceeds a set load threshold, and the source file is read directly otherwise.
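A sketch of the decision in claim 9, under simplifying assumptions: the load degree of each node is supplied as a number (the formula of claim 10 is not modelled), and reconstruction uses a single XOR check block as a stand-in for the check blocks of claims 5 and 6. The names `read_with_load_check`, `stripe`, and the node identifiers are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equally sized blocks byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def read_with_load_check(block_id, stripe, nodes, load, load_threshold):
    """Read one block directly, or rebuild it when its node is overloaded.

    stripe maps every block id of one stripe (data blocks plus the XOR check
    block) to the node holding it.
    """
    target_node = stripe[block_id]
    if load[target_node] <= load_threshold:
        return nodes[target_node][block_id], "direct"
    # Overloaded node: rebuild the block from the rest of the stripe.
    others = [nodes[n][b] for b, n in stripe.items() if b != block_id]
    return xor_blocks(others), "reconstructed"
```

Reconstruction costs more total I/O, so it only helps when the target node is the bottleneck, which is what the threshold test expresses.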
10. The data reading method according to claim 9, wherein the calculation formula of the load degree of the node read request is as follows:
[formula image not reproduced]
[formula image not reproduced]
where [symbol] is the load degree of the jth data node, [symbol] is the load degree of the ith data block of the jth data node in time period t, n is the number of data blocks, [symbol] pertains to the ith data block, O is a given value, [symbol] is the interval since the load degree was last calculated, [symbol] is the load degree of the ith data block of the jth data node during the time period [symbol], and [symbol] is the length of the ith data block.
CN202211713250.0A 2022-12-30 2022-12-30 Data storage and reading method Active CN115686382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211713250.0A CN115686382B (en) 2022-12-30 2022-12-30 Data storage and reading method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211713250.0A CN115686382B (en) 2022-12-30 2022-12-30 Data storage and reading method

Publications (2)

Publication Number Publication Date
CN115686382A true CN115686382A (en) 2023-02-03
CN115686382B CN115686382B (en) 2023-03-21

Family

ID=85056359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211713250.0A Active CN115686382B (en) 2022-12-30 2022-12-30 Data storage and reading method

Country Status (1)

Country Link
CN (1) CN115686382B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521063A (en) * 2023-03-31 2023-08-01 北京瑞风协同科技股份有限公司 Efficient test data reading and writing method and device for HDF5

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118133A (en) * 2013-02-28 2013-05-22 浙江大学 Mixed cloud storage method based on file access frequency
CN106527993A (en) * 2016-11-09 2017-03-22 北京搜狐新媒体信息技术有限公司 Mass file storage method and device for distributed type system
CN109144791A (en) * 2018-09-30 2019-01-04 北京金山云网络技术有限公司 Data conversion storage method, apparatus and data management server
CN111858169A (en) * 2020-07-10 2020-10-30 山东云海国创云计算装备产业创新中心有限公司 Data recovery method, system and related components
CN113391764A (en) * 2021-06-09 2021-09-14 北京沃东天骏信息技术有限公司 Information processing method and device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521063A (en) * 2023-03-31 2023-08-01 北京瑞风协同科技股份有限公司 Efficient test data reading and writing method and device for HDF5
CN116521063B (en) * 2023-03-31 2024-03-26 北京瑞风协同科技股份有限公司 Efficient test data reading and writing method and device for HDF5

Also Published As

Publication number Publication date
CN115686382B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
CN108170555B (en) Data recovery method and equipment
CN105404469B (en) A kind of storage method and system of video data
US9817715B2 (en) Resiliency fragment tiering
US9665427B2 (en) Hierarchical data storage architecture
US20210263795A1 (en) Efficient compaction on log-structured distributed file system using erasure coding for resource consumption reduction
CN110750382B (en) Minimum storage regeneration code coding method and system for improving data repair performance
US11422703B2 (en) Data updating technology
CN114415976B (en) Distributed data storage system and method
CN110442535B (en) Method and system for improving reliability of distributed solid-state disk key value cache system
WO2019001521A1 (en) Data storage method, storage device, client and system
US9734014B2 (en) Virtual memory mapping in a dispersed storage network
US9207870B2 (en) Allocating storage units in a dispersed storage network
US11025965B2 (en) Pre-fetching content among DVRs
CN107729536B (en) Data storage method and device
JP2018508073A (en) Data removal, allocation and reconstruction
CN115686382B (en) Data storage and reading method
CN106027638B (en) A kind of hadoop data distributing method based on hybrid coding
CN113608701A (en) Data management method in storage system and solid state disk
WO2022105442A1 (en) Erasure code-based data reconstruction method and appratus, device, and storage medium
CN117075821B (en) Distributed storage method and device, electronic equipment and storage medium
EP4170499A1 (en) Data storage method, storage system, storage device, and storage medium
US11422737B2 (en) Data storage system for data distribution and data restoration based on compressibility ratio of data and operating method of controller for controlling the data distribution and data restoration
CN113485874B (en) Data processing method and distributed storage system
CN113157715A (en) Erasure code data center rack collaborative updating method
CN108932176B (en) Data degradation storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant