CN104572987B - Method and system for improving simple regenerating code storage efficiency through compression - Google Patents

Info

Publication number
CN104572987B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510002948.5A
Other languages
Chinese (zh)
Other versions
CN104572987A (en)
Inventor
尹建伟
黄晓成
邓水光
李莹
吴健
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN201510002948.5A
Publication of CN104572987A
Application granted
Publication of CN104572987B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/13: File access structures, e.g. distributed indices
    • G06F16/17: Details of further file system functions
    • G06F16/174: Redundancy elimination performed by the file system
    • G06F16/1744: Redundancy elimination performed by the file system using compression, e.g. sparse files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention applies to the field of data processing and provides a method and system for improving the storage efficiency of simple regenerating codes (SRC) through compression. The method includes: obtaining a file to be compressed, and processing the file according to the relationship between its size, a preset file size threshold, and the data block size; setting a compression algorithm according to the size of the data blocks of the file to be compressed; compressing, according to the compression algorithm, the check blocks obtained by applying simple regenerating code encoding to the file to be compressed; and storing the compressed check blocks on the storage nodes. Embodiments of the invention can improve the storage efficiency of simple regenerating codes while preserving the fault tolerance and reliability of the system.

Description

A method and system for improving simple regenerating code storage efficiency through compression
Technical Field
The present invention belongs to the field of data processing, and in particular relates to a method and system for improving the storage efficiency of simple regenerating codes through compression.
Background
Traditional file storage keeps data centrally on a single storage node. The processing capacity of a single node is, however, very limited: it easily becomes a performance bottleneck, the reliability and security of the system are relatively low, and it cannot meet the needs of large-scale storage applications.
The concept of distributed storage was proposed in response. This approach spreads data across multiple storage nodes that communicate over a computer network, are managed centrally by a distributed storage system, and expose a simple access interface. A distributed storage system typically creates multiple replicas of each file and stores them on different nodes, so that multiple storage nodes share the load of user access requests. This effectively reduces response time and provides good fault tolerance and scalability.
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is designed to be deployed on inexpensive hardware, is highly fault tolerant, provides high throughput for application data, and allows data in the file system to be accessed as a stream.
The high fault tolerance of HDFS is usually achieved through multi-replica backup. However, as the volume of data on the Internet and worldwide grows at a nearly exponential rate, the low storage efficiency of replication leads to high storage overhead.
To retain high fault tolerance while reducing the storage overhead of HDFS, software RAID (redundant array of inexpensive disks) schemes based on erasure codes were developed. With multi-replica backup, at least three replicas must exist at the same time to ensure that no data is lost on inexpensive hardware. With erasure codes, only one copy of the file needs to be kept, plus a small number of check files. This greatly improves the storage efficiency of HDFS, and it has been proven theoretically that the scheme still maintains high reliability.
SRC is one kind of erasure code. It processes data blocks in groups of 10, pads any group that has fewer than 10 blocks with all-zero data, and produces 10 corresponding check blocks per group. When HDFS implements this scheme, the group of check blocks obtained by SRC encoding is written together into a single file of the distributed storage system and, following the principle of software RAID, the check blocks are stored on nodes that are not all the same.
This implementation does not take the data redundancy of the check blocks into account. By the nature of SRC, when a file contains 4 data blocks, two of the 10 check blocks produced by SRC encoding are obtained by XORing two padded all-zero data blocks, so these two check blocks can be compressed to a very high degree. The other check blocks produced by SRC also contain more or less redundant information, so compression can save some storage space there as well. On the other hand, if the compression mechanism built into HDFS is used to directly compress the single file produced by SRC encoding, information from multiple check blocks may end up stored in one block, which violates the principle of software RAID and reduces the fault tolerance of the system.
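The compressibility argument above can be checked with a small experiment. The following is a minimal Python sketch, not the patent's implementation: it assumes a 1 MiB buffer as a stand-in for an HDFS data block, uses the standard-library `bz2` module, and introduces a hypothetical helper `xor_blocks`.

```python
import bz2
import os

BLOCK_SIZE = 1 << 20  # 1 MiB stand-in for an HDFS data block (assumption)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized blocks (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    value = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return value.to_bytes(len(a), "big")


# A check block built from two padded all-zero data blocks is itself all zeros.
zero_block = bytes(BLOCK_SIZE)
padded_check_block = xor_blocks(zero_block, zero_block)

# A check block built from two blocks of random data, for comparison.
random_check_block = xor_blocks(os.urandom(BLOCK_SIZE), os.urandom(BLOCK_SIZE))

compressed_zero = bz2.compress(padded_check_block)
compressed_random = bz2.compress(random_check_block)

print(len(compressed_zero), len(compressed_random))
```

The all-zero check block shrinks to a tiny fraction of its size, while the random one does not shrink at all, which is the gap the invention exploits by compressing check blocks individually.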
Summary of the Invention
In view of this, the present invention provides a method and system for improving the storage efficiency of simple regenerating codes through compression, to solve the technical problem that simple regenerating codes have low storage efficiency in the prior art.
An embodiment of the present invention is realized as a method for improving the storage efficiency of simple regenerating codes through compression, the method comprising the following steps:
obtaining a file to be compressed, and processing the file to be compressed according to the relationship between the size of the file, a preset file size threshold, and the data block size;
setting a compression algorithm according to the size of the data blocks of the file to be compressed;
compressing, according to the compression algorithm, the check blocks obtained by applying simple regenerating code encoding to the file to be compressed;
performing a compression check on the compression type code and temporary file corresponding to each check block;
storing the compressed check blocks on the storage nodes.
An embodiment of the present invention also provides a system for improving the storage efficiency of simple regenerating codes through compression, the system comprising:
a processing unit, configured to obtain a file to be compressed and to process the file to be compressed according to the relationship between the size of the file, a preset file size threshold, and the data block size;
a compression algorithm setting unit, configured to set a compression algorithm according to the size of the data blocks of the file to be compressed processed by the processing unit;
a compression unit, configured to compress, according to the compression algorithm set by the compression algorithm setting unit, the check blocks obtained by applying simple regenerating code encoding to the file to be compressed;
a compression checking unit, configured to perform a compression check on the compression type code and temporary file corresponding to each check block;
a storage unit, configured to store the check blocks compressed by the compression unit on the storage nodes.
In embodiments of the present invention, the file to be compressed is obtained and processed according to its size; a compression algorithm is set according to the data block size of the file; the check blocks obtained by SRC encoding the file are compressed according to that algorithm; and the compressed check blocks are stored on the storage nodes. This effectively improves the storage efficiency of simple regenerating codes while maintaining the high fault tolerance and reliability of the original system.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for improving simple regenerating code storage efficiency through compression provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the system for improving simple regenerating code storage efficiency through compression provided by an embodiment of the present invention.
Detailed Description
To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
The technical solutions of the present invention are illustrated below through specific embodiments.
Embodiment One
Fig. 1 shows the flow chart of the method for improving simple regenerating code storage efficiency through compression provided by an embodiment of the present invention. The method comprises the following steps:
Step S101: obtain a file to be compressed, and process the file to be compressed according to the relationship between the size of the file, the preset file size threshold, and the data block size.
In this embodiment, the processing system first obtains the file to be compressed and then processes it according to the relationship between its size, the preset file size threshold, and the data block size. Specifically:
1. If the size of the file to be compressed is greater than the preset file size threshold, the file to be compressed is not processed; or
In this embodiment, the preset file size threshold is taken to be 2 GB as an example. (Files upload to the distributed file system at roughly tens of MB per second; assuming an upload speed of 32 MB/s, about 1 minute of uploading yields a file of roughly 2 GB. This value is provisional; in a real deployment it can be chosen according to the number and size distribution of the files, and the total encoding time of the encoder increases as the value increases.) If the file to be compressed is larger than 2 GB, its data block size is not adjusted, because the time overhead of adjusting the data block size would be too large.
2. If the size of the file to be compressed is greater than one group of data blocks and less than the preset file size threshold, the number of data blocks of the file to be compressed is determined, and the file is processed again according to that number; or
In this embodiment, if the file to be compressed is larger than one group of data blocks and smaller than the preset file size threshold, it is processed again according to its number of data blocks. The step of processing again according to the number of data blocks comprises:
(1) if the number of data blocks of the file to be compressed is not a multiple of 10, increasing the data block size of the file so that its total number of data blocks becomes a multiple of 10; or
(2) if the number of data blocks of the file to be compressed is a multiple of 10, not processing the file to be compressed.
In this embodiment, if the total number of data blocks of the file is not a multiple of 10, the file is downloaded to local storage and uploaded to HDFS again, and the data block size is increased during the re-upload so that the total number of data blocks becomes a multiple of 10. For example, a 1000 MB file with an 80 MB data block size has exactly 10 data blocks once the data block size is adjusted to 100 MB. If the total number of data blocks is already a multiple of 10, no adjustment is needed.
3. If the size of the file to be compressed is less than one group of data blocks and less than the preset file size threshold, the file to be compressed is not processed.
In this embodiment, if the file size is less than or equal to the size of one group of data blocks (for example, a 512 MB file with a 64 MB data block size, 8 data blocks in total) and does not exceed 2 GB, the data block size of the file is likewise not adjusted.
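The three cases of step S101 can be sketched as a single decision function. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the function name `plan_block_size`, the constants, and the choice to round the block count down to the nearest multiple of 10 are all hypothetical. The text only says that the block size is increased until the count is a multiple of 10.

```python
import math

MB = 1024 * 1024
GROUP_SIZE = 10                   # SRC encodes data blocks in groups of 10
FILE_SIZE_THRESHOLD = 2048 * MB   # the 2 GB example value from the text


def plan_block_size(file_size: int, block_size: int) -> int:
    """Return the data block size to use for the file.

    Returning the unchanged block_size corresponds to "the file is not processed".
    Sizes are in bytes; file_size is assumed to be far larger than the block count.
    """
    # Case 1: file larger than the threshold, adjusting would cost too much time.
    if file_size > FILE_SIZE_THRESHOLD:
        return block_size
    # Case 3: file fits within one group of data blocks.
    if file_size <= GROUP_SIZE * block_size:
        return block_size
    # Case 2: more than one group, below the threshold.
    block_count = math.ceil(file_size / block_size)
    if block_count % GROUP_SIZE == 0:
        return block_size
    # Assumption: pick the largest multiple of 10 below the current block count,
    # so the block size grows by the smallest amount that reaches a multiple of 10.
    target_count = (block_count // GROUP_SIZE) * GROUP_SIZE
    return math.ceil(file_size / target_count)


# The example from the text: a 1000 MB file with 80 MB blocks (13 blocks).
new_size = plan_block_size(1000 * MB, 80 * MB)
print(new_size // MB, math.ceil(1000 * MB / new_size))
```

In a real HDFS deployment the new block size would be applied when the file is uploaded again, as the text describes; that step is outside this sketch.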
Step S102: set a compression algorithm according to the size of the data blocks of the file to be compressed.
In this embodiment, the system needs to set a compression type for the file to be compressed. The step of setting the compression algorithm according to the data block size of the file to be compressed comprises:
1. Processing the information of each check block in turn, and deciding whether to compress the check block according to whether it is a data-redundant block.
In this embodiment, whether a check block is compressed can be decided from the number it is assigned by SRC encoding. In each group, the check blocks numbered 6 to 9 are produced by the RS algorithm. A check block produced by the RS algorithm is obtained from multiple data blocks through multiple XOR operations and contains very little data redundancy. The check block numbered 0 is produced by XORing, as one group, the last two check blocks produced by the RS algorithm, and likewise has essentially no data redundancy. None of these check blocks is therefore compressed. If the file is of a type that cannot be compressed, such as video (judged by the file name suffix), no compression is performed either. In these cases the compression type code is set to "not compressed".
2. If the check block is a data-redundant block, compressing the check block and setting the compression algorithm according to the size of the check block.
In this embodiment, for content that can be compressed (for example, the other SRC check blocks, which are each obtained by a single XOR of different data blocks and therefore still contain data redundancy, and which may also result from XORing all-zero data blocks), the compression algorithm is determined by the data block size and the corresponding compression type code is set to "BZIP2" or "LZO". If the data block is small (less than 128 MB), the BZIP2 algorithm, which has high compression efficiency, is used (BZIP2 decompresses at roughly 100 MB per second, so a data block under 128 MB can be decompressed in about 1 s). Otherwise the LZO algorithm, which compresses quickly, is used (LZO decompresses at several hundred MB per second, so the decompressed information can still be obtained quickly for larger data blocks).
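The selection rules of step S102 can be summarized in a few lines. This is a hedged Python sketch: the enum `CompressionType`, its numeric values, the function name `choose_compression`, and the list of video suffixes are assumptions made for illustration. Only the block numbers (0 and 6 to 9 uncompressed), the 128 MB limit, and the three type names come from the text.

```python
import os
from enum import Enum

MB = 1024 * 1024
SMALL_BLOCK_LIMIT = 128 * MB                          # below this, BZIP2 is used
LOW_REDUNDANCY_INDEXES = frozenset({0, 6, 7, 8, 9})   # RS blocks and the XOR of two RS blocks
INCOMPRESSIBLE_SUFFIXES = frozenset({".mp4", ".avi", ".mkv"})  # hypothetical list


class CompressionType(Enum):
    """Compression type codes; the numeric values are assumptions."""
    NONE = 0
    BZIP2 = 1
    LZO = 2


def choose_compression(block_index: int, block_size: int, file_name: str) -> CompressionType:
    """Pick the compression type code for one check block of an SRC group."""
    suffix = os.path.splitext(file_name)[1].lower()
    # Check blocks with little redundancy and incompressible file types are skipped.
    if block_index in LOW_REDUNDANCY_INDEXES or suffix in INCOMPRESSIBLE_SUFFIXES:
        return CompressionType.NONE
    # Small blocks favor compression efficiency, large blocks favor speed.
    if block_size < SMALL_BLOCK_LIMIT:
        return CompressionType.BZIP2
    return CompressionType.LZO


print(choose_compression(3, 64 * MB, "access.log"))
```

With three possible values, the type code fits in two bits, which matches the remark made later in the description.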
Step S103: compress, according to the compression algorithm, the check blocks obtained by applying simple regenerating code encoding to the file to be compressed.
In this embodiment, the encoder groups the file by data block size (data blocks are grouped in order, 10 per group) and applies SRC encoding to each group of data blocks to generate the check block information. The encoding preparation step in the figure refers to operations such as creating the input and output streams. After 4 check blocks have been produced by the RS algorithm, the 10 data blocks are paired with their neighbours into 5 groups of two; together with the last two check blocks produced by the RS algorithm as one more group, these six groups are SRC encoded to produce the remaining six check blocks. The information of each check block is compressed according to the compression type that was set and written to its own temporary file.
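The XOR part of the encoding described above can be sketched as follows. This is a minimal Python illustration under assumptions, not the encoder used in HDFS: the Reed-Solomon computation is omitted and the four RS check blocks are passed in by the caller, the names `pad_group`, `xor_blocks`, and `src_xor_check_blocks` are hypothetical, and the order in which the data block pairs map to check block numbers 1 to 5 is assumed.

```python
from typing import List

GROUP_SIZE = 10  # data blocks per SRC group


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def pad_group(blocks: List[bytes], block_size: int) -> List[bytes]:
    """Pad a short group with all-zero blocks up to 10 data blocks."""
    return blocks + [bytes(block_size)] * (GROUP_SIZE - len(blocks))


def src_xor_check_blocks(data: List[bytes], rs_check: List[bytes]) -> List[bytes]:
    """Return the six XOR check blocks of one group.

    Index 0 is the XOR of the last two RS check blocks; indexes 1 to 5 are the
    XORs of adjacent data block pairs (pair order is an assumption).
    """
    if len(data) != GROUP_SIZE or len(rs_check) != 4:
        raise ValueError("expected 10 data blocks and 4 RS check blocks")
    pairs = [(rs_check[2], rs_check[3])]
    pairs += [(data[i], data[i + 1]) for i in range(0, GROUP_SIZE, 2)]
    return [xor_blocks(a, b) for a, b in pairs]


# A file with 4 data blocks of 4 bytes each, padded to a full group.
group = pad_group([b"\x01" * 4, b"\x02" * 4, b"\x0f" * 4, b"\xf0" * 4], 4)
placeholder_rs = [bytes([i]) * 4 for i in range(4)]  # stand-in values, not real RS output
check_blocks = src_xor_check_blocks(group, placeholder_rs)
print([block.hex() for block in check_blocks])
```

The check blocks that come from padded all-zero pairs are themselves all zeros, which is why compressing each check block separately pays off for short groups.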
Step S104: perform a compression check on the compression type code and temporary file corresponding to each check block.
In this embodiment, the compression type code and temporary file of each check block are checked. If the compression type code is "not compressed", the information of the check block can be read directly from the temporary file. If the compression type code is "BZIP2" or "LZO" and the size of the temporary file is greater than or equal to the data block size minus the size of the compression type code (the invention involves only three compression types, so two bits are enough to represent the compression type), compressing the check block was ineffective. In that case the content of the temporary file is decompressed with the algorithm indicated by the compression type code to recover the original check block, and the corresponding compression type code is changed to "not compressed". In all other cases the check block information can be read directly from the temporary file.
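The check in step S104 amounts to a size comparison followed by an optional rollback. The sketch below is an assumption-laden Python illustration: the function name `finalize_check_block`, the string type codes, and the one-byte `TYPE_CODE_SIZE` are hypothetical (the text only says two bits suffice), and only BZIP2 is handled because LZO is not part of the Python standard library.

```python
import bz2
from typing import Tuple

TYPE_CODE_SIZE = 1  # bytes reserved for the type code; an assumption for this sketch


def finalize_check_block(temp_content: bytes, type_code: str, block_size: int) -> Tuple[str, bytes]:
    """Validate a compressed check block and undo the compression if it did not help.

    temp_content is the content of the temporary file written in the previous step.
    """
    if type_code == "NONE":
        return type_code, temp_content
    # Compression is ineffective if the result plus its type code is not
    # smaller than the uncompressed data block.
    if len(temp_content) >= block_size - TYPE_CODE_SIZE:
        if type_code == "BZIP2":
            original = bz2.decompress(temp_content)
        else:
            # LZO would need a third-party library; left out of this sketch.
            raise NotImplementedError("LZO decompression is not available in the standard library")
        return "NONE", original
    return type_code, temp_content


# An all-zero block compresses well, so the compressed form is kept.
zeros = bytes(4096)
print(finalize_check_block(bz2.compress(zeros), "BZIP2", len(zeros))[0])
```

Rolling back to the uncompressed block guarantees that a stored check block never takes more space than it would have without this scheme.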
Step S105: store the compressed check blocks on the storage nodes.
If the compression type code is "not compressed", the processed information is written directly to the check file. If the compression type code is "BZIP2" or "LZO", the compression type code is first written to the check file as a code value, and the compressed information is written after it.
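The write rule above can be sketched with an in-memory stream. This Python sketch is illustrative only: the numeric code values, the one-byte encoding of the type code, and the function names are assumptions. The reader side is also an inference rather than something the text states: because step S104 keeps a compressed block only when it is smaller than the data block minus the type code, a stored block shorter than the data block size is taken to be compressed.

```python
import io
from typing import BinaryIO, Tuple

NOT_COMPRESSED, BZIP2, LZO = 0, 1, 2  # code values are assumptions


def write_check_block(out: BinaryIO, type_code: int, payload: bytes) -> None:
    """Write one check block; compressed blocks are prefixed with their type code."""
    if type_code != NOT_COMPRESSED:
        out.write(bytes([type_code]))
    out.write(payload)


def read_check_block(stored: bytes, block_size: int) -> Tuple[int, bytes]:
    """Split a stored check block into (type code, payload).

    Assumes uncompressed check blocks are exactly block_size bytes long.
    """
    if len(stored) >= block_size:
        return NOT_COMPRESSED, stored
    return stored[0], stored[1:]


buffer = io.BytesIO()
write_check_block(buffer, BZIP2, b"compressed-bytes")
print(read_check_block(buffer.getvalue(), block_size=64))
```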
In this embodiment of the present invention, the file to be compressed is obtained and processed according to its size; a compression algorithm is set according to the data block size of the file; the check blocks obtained by SRC encoding the file are compressed according to that algorithm; and the compressed check blocks are stored on the storage nodes. This effectively improves the storage efficiency of simple regenerating codes while maintaining the high fault tolerance and reliability of the original system.
Embodiment Two
Fig. 2 shows the structural diagram of the system for improving simple regenerating code storage efficiency through compression provided by an embodiment of the present invention. For ease of description, only the parts relevant to the embodiment of the present invention are shown, including:
A processing unit 201, configured to obtain a file to be compressed and to process the file to be compressed according to the relationship between the size of the file, the preset file size threshold, and the data block size.
In this embodiment, the processing system first obtains the file to be compressed and then processes it according to the relationship between its size, the preset file size threshold, and the data block size. The processing unit 201 includes:
A first processing subunit 2011, configured not to process the file to be compressed if its size is greater than the preset file size threshold; or
In this embodiment, the preset file size threshold is taken to be 2 GB as an example. (Files upload to the distributed file system at roughly tens of MB per second; assuming an upload speed of 32 MB/s, about 1 minute of uploading yields a file of roughly 2 GB. This value is provisional; in a real deployment it can be chosen according to the number and size distribution of the files, and the total encoding time of the encoder increases as the value increases.) If the file to be compressed is larger than 2 GB, its data block size is not adjusted, because the time overhead of adjusting the data block size would be too large.
A second processing subunit 2012, configured to determine the number of data blocks of the file to be compressed and to process the file again according to that number, if the size of the file is greater than one group of data blocks and less than the preset file size threshold; or
In this embodiment, if the file to be compressed is larger than one group of data blocks and smaller than the preset file size threshold, it is processed again according to its number of data blocks. The second processing subunit 2012 includes:
An increasing subunit 20121, configured to increase the data block size of the file to be compressed so that its total number of data blocks becomes a multiple of 10, if the number of data blocks of the file is not a multiple of 10; or
A non-processing subunit 20122, configured not to process the file to be compressed if its number of data blocks is a multiple of 10.
In this embodiment, if the total number of data blocks of the file is not a multiple of 10, the file is downloaded to local storage and uploaded to HDFS again, and the data block size is increased during the re-upload so that the total number of data blocks becomes a multiple of 10. For example, a 1000 MB file with an 80 MB data block size has exactly 10 data blocks once the data block size is adjusted to 100 MB. If the total number of data blocks is already a multiple of 10, no adjustment is needed.
A third processing subunit 2013, configured not to process the file to be compressed if its size is less than one group of data blocks and less than the preset file size threshold.
In this embodiment, if the file size is less than or equal to the size of one group of data blocks (for example, a 512 MB file with a 64 MB data block size, 8 data blocks in total) and does not exceed 2 GB, the data block size of the file is likewise not adjusted.
A compression algorithm setting unit 202, configured to set a compression algorithm according to the size of the data blocks of the file to be compressed processed by the processing unit 201.
In this embodiment, the system needs to set a compression type for the file to be compressed. The compression algorithm setting unit 202 includes:
A check block information processing subunit 2021, configured to process the information of each check block in turn and to decide whether to compress the check block according to whether it is a data-redundant block.
In this embodiment, whether a check block is compressed can be decided from the number it is assigned by SRC encoding. In each group, the check blocks numbered 6 to 9 are produced by the RS algorithm. A check block produced by the RS algorithm is obtained from multiple data blocks through multiple XOR operations and contains very little data redundancy. The check block numbered 0 is produced by XORing, as one group, the last two check blocks produced by the RS algorithm, and likewise has essentially no data redundancy. None of these check blocks is therefore compressed. If the file is of a type that cannot be compressed, such as video (judged by the file name suffix), no compression is performed either. In these cases the compression type code is set to "not compressed".
A compression algorithm setting subunit 2022, configured to compress the check block and to set the compression algorithm according to the size of the check block, if the check block processed by the check block information processing subunit 2021 is a data-redundant block.
In this embodiment, for content that can be compressed (for example, the other SRC check blocks, which are each obtained by a single XOR of different data blocks and therefore still contain data redundancy, and which may also result from XORing all-zero data blocks), the compression algorithm is determined by the data block size and the corresponding compression type code is set to "BZIP2" or "LZO". If the data block is small (less than 128 MB), the BZIP2 algorithm, which has high compression efficiency, is used (BZIP2 decompresses at roughly 100 MB per second, so a data block under 128 MB can be decompressed in about 1 s). Otherwise the LZO algorithm, which compresses quickly, is used (LZO decompresses at several hundred MB per second, so the decompressed information can still be obtained quickly for larger data blocks).
A compression unit 203, configured to compress, according to the compression algorithm set by the compression algorithm setting unit 202, the check blocks obtained by applying simple regenerating code encoding to the file to be compressed.
In this embodiment, the encoder groups the file by data block size (data blocks are grouped in order, 10 per group) and applies SRC encoding to each group of data blocks to generate the check block information. The encoding preparation step in the figure refers to operations such as creating the input and output streams. After 4 check blocks have been produced by the RS algorithm, the 10 data blocks are paired with their neighbours into 5 groups of two; together with the last two check blocks produced by the RS algorithm as one more group, these six groups are SRC encoded to produce the remaining six check blocks. The information of each check block is compressed according to the compression type that was set and written to its own temporary file.
A compression checking unit 204, configured to perform a compression check on the compression type code and temporary file corresponding to each check block.
In this embodiment, the compression type code and temporary file of each check block are checked. If the compression type code is "not compressed", the information of the check block can be read directly from the temporary file. If the compression type code is "BZIP2" or "LZO" and the size of the temporary file is greater than or equal to the data block size minus the size of the compression type code (the invention involves only three compression types, so two bits are enough to represent the compression type), compressing the check block was ineffective. In that case the content of the temporary file is decompressed with the algorithm indicated by the compression type code to recover the original check block, and the corresponding compression type code is changed to "not compressed". In all other cases the check block information can be read directly from the temporary file.
A storage unit 205, configured to store the check blocks compressed by the compression unit 203 on the storage nodes.
If the compression type code is "not compressed", the processed information is written directly to the check file. If the compression type code is "BZIP2" or "LZO", the compression type code is first written to the check file as a code value, and the compressed information is written after it.
In this embodiment of the present invention, the file to be compressed is obtained and processed according to its size; a compression algorithm is set according to the data block size of the file; the check blocks obtained by SRC encoding the file are compressed according to that algorithm; and the compressed check blocks are stored on the storage nodes. This effectively improves the storage efficiency of simple regenerating codes while maintaining the high fault tolerance and reliability of the original system.
A person of ordinary skill in the art will appreciate that the units included in Embodiment Two above are divided according to functional logic, but the division is not limited to the one described as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and do not limit the scope of protection of the present invention.
A person of ordinary skill in the art will further appreciate that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

  1. A method for improving the storage efficiency of simple regenerating codes through compression, characterized in that the method comprises the following steps:
    obtaining a file to be compressed, and processing the file to be compressed according to the relationship between its size and a preset file size threshold and the data block size;
    setting a compression algorithm according to the size of the data blocks of the file to be compressed;
    compressing, according to the compression algorithm, the check blocks obtained by simple-regenerating-code encoding of the file to be compressed;
    performing a compression check on the compression type code and the temporary file corresponding to the check blocks;
    storing the compressed check blocks on each storage node;
    wherein processing the file to be compressed according to the relationship between its size and the preset file size threshold and the data block size is specifically:
    if the size of the file to be compressed is greater than the preset file size threshold, the file to be compressed is not processed; or
    if the size of the file to be compressed is greater than one group of data blocks and less than the preset file size threshold, the number of data blocks of the file to be compressed is determined, and the file is processed further according to that number; or
    if the size of the file to be compressed is less than one group of data blocks and less than the preset file size threshold, the file to be compressed is not processed.
  2. The method according to claim 1, characterized in that the step of processing further according to the number of data blocks comprises:
    if the number of data blocks of the file to be compressed is not a multiple of 10, increasing the size of the data blocks of the file to be compressed so that its total number of data blocks becomes a multiple of 10; or
    if the number of data blocks of the file to be compressed is a multiple of 10, not processing the file to be compressed.
  3. The method according to claim 1, characterized in that the step of setting a compression algorithm according to the size of the data blocks of the file to be compressed comprises:
    processing the information of each check block in turn, and deciding whether to compress the check block according to whether it is a data redundancy block;
    if the check block is a data redundancy block, compressing the check block, and setting the compression algorithm according to the size of the check block.
  4. A system for improving the storage efficiency of simple regenerating codes through compression, characterized in that the system comprises:
    a processing unit, configured to obtain a file to be compressed and to process the file to be compressed according to the relationship between its size and a preset file size threshold and the data block size;
    a compression algorithm setting unit, configured to set a compression algorithm according to the size of the data blocks of the file to be compressed as processed by the processing unit;
    a compression unit, configured to compress, according to the compression algorithm set by the compression algorithm setting unit, the check blocks obtained by simple-regenerating-code encoding of the file to be compressed;
    a compression verification unit, configured to perform a compression check on the compression type code and the temporary file corresponding to the check blocks;
    a storage unit, configured to store the check blocks compressed by the compression unit on each storage node;
    wherein the processing unit comprises:
    a first processing subunit, configured not to process the file to be compressed if its size is greater than the preset file size threshold; or
    a second processing subunit, configured to determine the number of data blocks of the file to be compressed, and to process the file further according to that number, if its size is greater than one group of data blocks and less than the preset file size threshold; or
    a third processing subunit, configured not to process the file to be compressed if its size is less than one group of data blocks and less than the preset file size threshold.
  5. The system according to claim 4, characterized in that the second processing subunit comprises:
    an increasing subunit, configured to increase the size of the data blocks of the file to be compressed so that its total number of data blocks becomes a multiple of 10, if the number of data blocks of the file to be compressed is not a multiple of 10; or
    a non-processing subunit, configured not to process the file to be compressed if its number of data blocks is a multiple of 10.
  6. The system according to claim 4, characterized in that the compression algorithm setting unit comprises:
    a check block information processing subunit, configured to process the information of each check block in turn and to decide whether to compress the check block according to whether it is a data redundancy block;
    a compression algorithm setting subunit, configured to compress the check block, and to set the compression algorithm according to the size of the check block, if the check block processed by the check block information processing subunit is a data redundancy block.
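The per-check-block decision in claims 3 and 6 can be illustrated with the short Python sketch below. The claims state only that a data redundancy block is compressed and that the algorithm is set according to the block's size; they do not say which size selects which algorithm. The `CheckBlock` type, the cutoff `SIZE_CUTOFF`, and the mapping of larger blocks to LZO and smaller blocks to BZIP2 are therefore purely hypothetical choices made so that the example runs.

```python
from dataclasses import dataclass

# Hypothetical cutoff in bytes; the source gives no concrete value.
SIZE_CUTOFF = 64 * 1024 * 1024


@dataclass
class CheckBlock:
    size: int            # size of the check block in bytes
    is_redundancy: bool  # whether it is a data redundancy block


def choose_compression(block: CheckBlock, size_cutoff: int = SIZE_CUTOFF) -> str:
    """Return the compression type label chosen for one check block."""
    if not block.is_redundancy:
        # Not a data redundancy block: the block is not compressed.
        return "none"
    # Assumed mapping, for illustration only: algorithm picked by block size.
    return "LZO" if block.size >= size_cutoff else "BZIP2"
```

Calling `choose_compression` on each check block in turn mirrors the "process the information of each check block in turn" wording of the claims.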
CN201510002948.5A 2015-01-04 2015-01-04 A kind of method and system that simple regeneration code storage efficiency is improved by compressing Active CN104572987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510002948.5A CN104572987B (en) 2015-01-04 2015-01-04 A kind of method and system that simple regeneration code storage efficiency is improved by compressing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510002948.5A CN104572987B (en) 2015-01-04 2015-01-04 A kind of method and system that simple regeneration code storage efficiency is improved by compressing

Publications (2)

Publication Number Publication Date
CN104572987A CN104572987A (en) 2015-04-29
CN104572987B true CN104572987B (en) 2017-12-22

Family

ID=53089049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510002948.5A Active CN104572987B (en) 2015-01-04 2015-01-04 A kind of method and system that simple regeneration code storage efficiency is improved by compressing

Country Status (1)

Country Link
CN (1) CN104572987B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956128B (en) * 2016-05-09 2019-09-17 南京大学 A kind of adaptive coding storage fault-tolerance approach based on simple regeneration code
CN110945792A (en) * 2018-10-31 2020-03-31 华为技术有限公司 Method for compressing data, method for decompressing data and related device
EP3863018A4 (en) 2018-10-31 2021-12-01 Huawei Technologies Co., Ltd. Data compression method and related apparatus, and data decompression method and related apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177111A (en) * 2013-03-29 2013-06-26 西安理工大学 System and method for deleting repeating data
CN103995855A (en) * 2014-05-14 2014-08-20 华为技术有限公司 Method and device for storing data
CN104123300A (en) * 2013-04-26 2014-10-29 上海云人信息科技有限公司 Data distributed storage system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8874518B2 (en) * 2007-06-06 2014-10-28 International Business Machines Corporation System, method and program product for backing up data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于再生码的云存储系统——Ustor;柳青 等;《通信学报》;20140430;第35卷(第4期);第166-173页 *

Also Published As

Publication number Publication date
CN104572987A (en) 2015-04-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant