CN102843212B

CN102843212B - Coding and decoding processing method and device

Info

Publication number: CN102843212B
Application number: CN201210275397.6A
Authority: CN
Inventors: 孙崎; 迟恩宇
Original assignee: Nanjing ZTE New Software Co Ltd
Current assignee: ZTE Corp
Priority date: 2012-08-03
Filing date: 2012-08-03
Publication date: 2016-10-26
Anticipated expiration: 2032-08-03
Also published as: CN102843212A; WO2014019549A1

Abstract

The invention provides a coding and decoding processing method and device. The method includes: performing multidimensional formatting on the data to be coded and decoded, where multidimensional means at least two-dimensional; and, in a predetermined order, performing Reed-Solomon (RS) erasure coding and decoding on at least two of the dimensions of the data after the multidimensional formatting. The invention solves the problem in the related art that resisting more data corruption requires more calculation and degrades the codec rate and performance. It thereby greatly increases the fault tolerance of the system, and improves both the encoding speed and the decoding speed, without reducing the original coding rate or the storage space utilization.

Description

Encoding and decoding processing method and device

Technical Field

The present invention relates to the field of communications, and in particular, to a method and an apparatus for encoding and decoding.

Background

Cloud storage refers to a system that uses application software to make a large number of storage devices of various types in a network work cooperatively, through functions such as cluster applications, grid technology, or a distributed file system, and that provides data storage and service access functions externally. In a cloud computing environment, files are typically sharded across multiple cloud storage servers. During data communication, the data to be communicated is likewise divided into a plurality of fragments, which are transmitted to the other party one by one.

In data storage and communication, Reed-Solomon (RS) erasure code (EC) technology is generally adopted to solve the reliability problem. After a file is encoded, it is divided into m fragments and n check fragments of the same size, and the m fragments and n check fragments are stored or communicated separately. The file storage or communication receiver can restore the original file or data by decoding as long as any m fragments are obtained, so damage to or loss of up to n fragments can be tolerated and the reliability of the system is greatly improved. For computer file storage, the storage space utilization of an erasure code system is m/(m + n), far higher than that of replica storage. RS erasure codes thus trade computing capacity for storage capacity, which significantly reduces storage cost and operation and maintenance cost.
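As a quick check of the m/(m + n) figure, the following sketch compares it with plain replication. The function names and the example values (m = 10, n = 6, three replicas) are illustrative assumptions, not values prescribed by the invention.

```python
from fractions import Fraction

def erasure_utilization(m: int, n: int) -> Fraction:
    """Share of raw storage that holds original data with m data and n check fragments."""
    return Fraction(m, m + n)

def replication_utilization(copies: int) -> Fraction:
    """Share of raw storage that holds original data when each block is stored `copies` times."""
    return Fraction(1, copies)

print(erasure_utilization(10, 6))    # 5/8
print(replication_utilization(3))    # 1/3
```

In this example the erasure-coded layout tolerates the loss of any 6 fragments while using 5/8 of the raw space for data, whereas three replicas tolerate the loss of 2 copies and use 1/3.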

In 1960, I. S. Reed and G. Solomon proposed a method of constructing an erasure code; a code built with this method is called a Reed-Solomon code, or RS code for short. An erasure code constructed with the RS coding technique is called an RS erasure code. An (n, k) erasure code encodes k source data into n (n > k) data, so that the original k source data can be reconstructed from any k of the n data. In this notation, the erasure code system using m fragments and n check fragments is an (m + n, m) erasure code.
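The "any m of m + n" property can be illustrated with a toy systematic code. The sketch below is an assumption-laden simplification: it works over the prime field GF(257) with polynomial interpolation, rather than over GF(2^8) with a Vandermonde or Cauchy generator matrix as practical RS implementations typically do, and the names `rs_encode` and `rs_decode` are hypothetical.

```python
P = 257  # prime modulus; an assumption chosen so that plain modular arithmetic suffices

def interpolate(points, x):
    """Value at x of the unique polynomial through `points` [(xi, yi)] over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def rs_encode(data, n):
    """Return the m data symbols followed by n check symbols (requires m + n <= P)."""
    points = list(enumerate(data))
    m = len(data)
    return list(data) + [interpolate(points, x) for x in range(m, m + n)]

def rs_decode(received, m):
    """Recover the m data symbols; erased positions in `received` are None."""
    points = [(x, y) for x, y in enumerate(received) if y is not None][:m]
    if len(points) < m:
        raise ValueError("fewer than m symbols received")
    return [interpolate(points, x) for x in range(m)]

code = rs_encode([10, 20, 30], 2)          # a (5, 3) code
damaged = [None, code[1], None, code[3], code[4]]
print(rs_decode(damaged, 3))               # [10, 20, 30]
```

Note that a check symbol in this toy field can take the value 256, which does not fit in a byte; that is one reason real systems prefer GF(2^8).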

Reed-Solomon codes mainly comprise codes generated from Vandermonde matrices, called Vandermonde codes, and codes generated from Cauchy matrices, called Cauchy codes. Their arithmetic is performed over a finite field (Galois field). The values of m and n can be set arbitrarily in an implementation, so a high storage utilization rate can be obtained.

However, RS erasure code systems based on the Vandermonde matrix and on the Cauchy matrix share a common disadvantage: a large amount of calculation and low encoding and decoding speeds. According to publicly known mathematics, the amount of calculation and the time complexity of both kinds of RS erasure code are O(m^2) during encoding and decoding. Solving the inverse of the generator matrix costs O(m^3), even if Gaussian elimination is taken as the best available algorithm. If k redundant blocks are used during decoding, the amount of calculation of the decoding algorithm is O(mk); for a file of length L, it is O(Lk). Decoding time therefore grows in proportion to the number k of redundant blocks used, so in actual use that number cannot be too large. In current commercial systems the number m of fragments is generally not more than 10, and the number n of check fragments is generally not more than 6. To apply RS erasure code systems more effectively in the field of computer communication, special hardware is generally adopted to implement the coding and decoding functions and improve the coding and decoding speed.

On the other hand, in a cloud storage system built on low-cost consumer-grade hard disks, and in a P2P dynamic storage and communication environment, it is desirable to resist more data corruption without affecting the encoding rate or the decoding performance. That is, the number n of check fragments must be sufficiently large while the performance of the RS erasure coding algorithm does not degrade. Simply raising the value of n is not feasible in this setting: the amount of calculation increases rapidly and performance drops to an impractical level.

Therefore, the related art has the problem that resisting more data corruption requires more calculation, which degrades the codec rate and performance.

Disclosure of Invention

The invention provides a coding and decoding processing method and a coding and decoding processing device, which at least solve the problem in the related art that resisting more data corruption requires more calculation and degrades the codec rate and performance.

According to an aspect of the present invention, there is provided a codec processing method, including: carrying out multidimensional formatting processing on data to be coded and decoded, wherein the multidimensional at least is two-dimensional; and according to a preset sequence, performing Reed-Solomon RS erasure coding and decoding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing.

Preferably, performing the multidimensional formatting processing on the data to be coded and decoded comprises: determining the data block size used to format the data; when encoding is performed, padding and segmenting the data to be encoded according to the determined data block size; and when decoding is performed, storing the data to be decoded at the corresponding positions of a data block of the determined size for decoding.

Preferably, according to a predetermined sequence, the performing, by the RS erasure correcting coding and decoding process, at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting process includes: under the condition of executing coding processing, carrying out RS erasure code coding processing on each dimension of the data to be coded and decoded after multidimensional formatting processing according to the multidimensional step-by-step dimension removing mode; and under the condition of executing decoding processing, performing RS erasure code decoding processing on each dimension of the data to be coded and decoded after multidimensional formatting processing according to the mode of adding the dimension step by step.

Preferably, after performing the RS erasure correcting coding and decoding processing on at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting processing according to a predetermined order, the method further includes: storing data obtained after the RS erasure code processing is performed according to physical resources of a storage server; or, transmitting data obtained after the RS erasure correction processing is performed.

Preferably, storing, according to physical resources of a storage server, data obtained after the RS erasure coding process is performed includes: and storing part of check data in the data obtained after the RS erasure code processing on an independent storage node.

Preferably, when the multidimensional data is three-dimensional, the RS erasure coding and decoding process is performed on at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting process by at least one of the following methods in a predetermined order: the same file access client FAC positioned in the storage server carries out RS erasure correcting coding and decoding processing on each dimensionality of the data to be coded and decoded after multidimensional formatting processing; after the FAC finishes the first-stage coding and decoding corresponding to the first dimension, the computing node in the storage domain finishes the second-stage coding and decoding corresponding to the second dimension and the third-stage coding and decoding corresponding to the third dimension on the data after the first-stage coding and decoding; after the FAC finishes the first-level coding and decoding corresponding to the first dimension, the computing node in the storage domain finishes the second-level coding and decoding corresponding to the second dimension on the data after the first-level coding and decoding, and the storage node finishes the third-level coding and decoding corresponding to the third dimension on the data after the second-level coding and decoding.

According to another aspect of the present invention, there is provided a codec processing apparatus including: the first processing module is used for carrying out multi-dimensional formatting processing on data to be coded and decoded, wherein the multi-dimension is at least two-dimension; and the second processing module is used for carrying out Reed-Solomon RS erasure coding and decoding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing according to a preset sequence.

Preferably, the first processing module comprises: a first determining unit, configured to determine the data block size used to format the data; and a first processing unit, configured to pad and segment the data to be encoded according to the determined data block size when encoding is performed, and to store the data to be decoded at the corresponding positions of a data block of the determined size for decoding when decoding is performed.

Preferably, the second processing module includes: a second processing unit, configured to, when encoding is performed, perform the RS erasure code encoding on at least two dimensions of the data to be coded and decoded after the multidimensional formatting processing, in a multidimensional step-by-step dimension-removing manner; and a third processing unit, configured to, when decoding is performed, perform RS erasure code decoding on each dimension of the data to be coded and decoded after the multidimensional formatting processing, in a multidimensional step-by-step dimension-adding manner.

Preferably, the second processing module comprises at least one of the following units, each of which performs, in a predetermined order, the RS erasure coding and decoding processing on at least two dimensions of the data to be coded and decoded after the multidimensional formatting processing: a fourth processing unit, configured to, when the multidimensional format is three-dimensional, perform the RS erasure coding and decoding processing on each dimension of the formatted data through one and the same file access client (FAC) located in the storage server; a fifth processing unit, configured to, when the multidimensional format is three-dimensional, complete the first-level coding and decoding corresponding to the first dimension through the FAC, after which a computing node in the storage domain completes, on the data after the first-level coding and decoding, the second-level coding and decoding corresponding to the second dimension and the third-level coding and decoding corresponding to the third dimension; and a sixth processing unit, configured to, when the multidimensional format is three-dimensional, complete the first-level coding and decoding corresponding to the first dimension through the FAC, after which a computing node in the storage domain completes the second-level coding and decoding corresponding to the second dimension on the data after the first-level coding and decoding, and a storage node completes the third-level coding and decoding corresponding to the third dimension on the data after the second-level coding and decoding.

According to the invention, the data to be coded and decoded is subjected to multidimensional formatting treatment, wherein the multidimensional at least is two-dimensional; according to the preset sequence, Reed-Solomon RS erasure coding and decoding processing is carried out on at least two dimensions in each dimension of the data to be coded and decoded after multidimensional formatting processing, the problems that in the prior art, when more data damage is allowed to resist, the calculated amount needs to be increased, and the coding and decoding rate and performance are influenced are solved, and the effects of greatly increasing the fault-tolerant capability of the system and simultaneously improving the coding speed and the decoding speed are achieved on the premise of not reducing the original coding rate and the utilization rate of the storage space.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

Fig. 1 is a flowchart of a codec processing method according to an embodiment of the present invention;

Fig. 2 is a block diagram of a codec processing apparatus according to an embodiment of the present invention;

Fig. 3 is a block diagram of a preferred structure of the first processing module 22 in the codec processing apparatus according to an embodiment of the present invention;

Fig. 4 is a first block diagram of a preferred structure of the second processing module 24 in the codec processing apparatus according to an embodiment of the present invention;

Fig. 5 is a second block diagram of a preferred structure of the second processing module 24 in the codec processing apparatus according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a multi-level coding and decoding system based on RS erasure codes according to an embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

In this embodiment, a codec processing method is provided. Fig. 1 is a flowchart of the codec processing method according to an embodiment of the present invention. As shown in fig. 1, the flow includes the following steps:

Step S102: perform multidimensional formatting on the data to be coded and decoded, where multidimensional means at least two-dimensional;

Step S104: in a predetermined order, perform Reed-Solomon (RS) erasure coding and decoding on at least two dimensions of the data to be coded and decoded after the multidimensional formatting.

Through the above steps, the data to be coded and decoded is processed in stages. Compared with coding and decoding the data directly, as in the related art, the staged approach makes it easy to use the computing and storage capacity of a storage system or distributed communication system, and the stages are independent of one another. This solves the problem in the related art that resisting more data corruption requires more calculation and degrades the codec speed and performance. The fault tolerance of the system is greatly increased, and the encoding and decoding speeds are improved, without reducing the original coding rate or the storage space utilization.

When the data to be coded and decoded is subjected to multidimensional formatting, the data block size used for formatting can first be determined according to physical resources; subsequent coding and decoding then operate in units of that block size. The operations differ between encoding and decoding. When encoding is performed, the data to be encoded is segmented according to the determined data block size. For example, if the block size after formatting and segmentation is a × b × c, data smaller than this size is completed by zero padding, and data larger than this size is segmented into blocks of this size for encoding. When decoding is performed, the data to be decoded is stored at the corresponding positions of a data block of the determined size for decoding. Continuing the example: for stored data, a block of size a × b × c is read from its stored location, and when more data is available than is needed, the data-part (non-check) fragments can be preferentially selected; for received data, the data can be placed by number at the corresponding positions of a logical data block of size a × b × c for decoding.

It should be noted that, according to a predetermined sequence, RS erasure coding is performed on at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting processing, and different sequences are executed according to different coding and decoding processing, for example, in the case of executing coding processing, RS erasure coding processing is performed on each dimension of the data to be coded and decoded after the multidimensional formatting processing in a multidimensional step-by-step dimensionality removal manner; and under the condition of executing the decoding processing, performing RS erasure code decoding processing on each dimension of the data to be coded and decoded after the multidimensional formatting processing according to a multidimensional step-by-step dimension adding mode.

Preferably, after performing RS erasure coding and decoding processing on at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting processing according to a predetermined order, the method further includes: storing data obtained after the RS erasure code processing is performed according to physical resources of a storage server; or, data obtained after the RS erasure correction processing is transmitted. In addition, partial verification data in the data obtained after the RS erasure code processing can also be stored on a separate storage node.

As for the main body for performing encoding and decoding, there are various ways, and the following description is given by taking the above-mentioned multidimensional as a three-dimensional example, and when the data to be encoded and decoded is normalized to three-dimensional, the RS erasure coding and decoding process may be performed for each dimension of the data to be encoded and decoded after the multidimensional formatting process by at least one of the following ways in a predetermined order: (1) the same file access client FAC positioned in the storage server carries out RS erasure correcting coding and decoding processing on each dimensionality of the data to be coded and decoded after multidimensional formatting processing; (2) after the FAC finishes the first-level coding and decoding corresponding to the first dimension, the computing node in the storage domain finishes the second-level coding and decoding corresponding to the second dimension and the third-level coding and decoding corresponding to the third dimension on the data after the first-level coding and decoding; (3) after the FAC finishes the first-level coding and decoding corresponding to the first dimension, the computing node in the storage domain finishes the second-level coding and decoding corresponding to the second dimension on the data after the first-level coding and decoding, and the storage node finishes the third-level coding and decoding corresponding to the third dimension on the data after the second-level coding and decoding. The various processing modes can be flexibly selected according to specific requirements.

In this embodiment, a coding and decoding processing apparatus is further provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 2 is a block diagram of a codec processing apparatus according to an embodiment of the present invention, and as shown in fig. 2, the apparatus includes a first processing module 22 and a second processing module 24, which will be described below.

The first processing module 22 is configured to perform multidimensional formatting on data to be encoded and decoded, where the multidimensional data is at least two-dimensional; and a second processing module 24, connected to the first processing module 22, configured to perform reed-solomon RS erasure coding and decoding processing on at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting processing according to a predetermined order.

Fig. 3 is a block diagram of a preferred structure of the first processing module 22 in the codec processing apparatus according to an embodiment of the present invention. As shown in fig. 3, the first processing module 22 includes a first determining unit 32 and a first processing unit 34, which are described below.

A first determining unit 32, configured to determine the data block size used to format the data; a first processing unit 34, connected to the first determining unit 32, configured to pad and segment the data to be encoded according to the determined data block size when encoding is performed, and to store the data to be decoded at the corresponding positions of a data block of the determined size for decoding when decoding is performed.

Fig. 4 is a first block diagram of a preferred structure of the second processing module 24 in the codec processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the second processing module 24 includes a second processing unit 42 and a third processing unit 44, which are described below.

A second processing unit 42, configured to, when encoding is performed, perform RS erasure code encoding on at least two dimensions of the data to be coded and decoded after the multidimensional formatting processing, in a multidimensional step-by-step dimension-removing manner; a third processing unit 44, connected to the second processing unit 42, configured to, when decoding is performed, perform RS erasure code decoding on each dimension of the data to be coded and decoded after the multidimensional formatting processing, in a multidimensional step-by-step dimension-adding manner.

Fig. 5 is a second block diagram of a preferred structure of the second processing module 24 in the codec processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the second processing module 24 includes at least one of the following: a fourth processing unit 52, a fifth processing unit 54, and a sixth processing unit 56, which are described below.

A fourth processing unit 52, configured to, when the multidimensional format is three-dimensional, perform the RS erasure coding and decoding processing on at least two dimensions of the data to be coded and decoded after the multidimensional formatting processing through one and the same file access client (FAC) located in the storage server; a fifth processing unit 54, configured to, when the multidimensional format is three-dimensional, complete the first-level coding and decoding corresponding to the first dimension through the FAC, after which a computing node in the storage domain completes, on the data after the first-level coding and decoding, the second-level coding and decoding corresponding to the second dimension and the third-level coding and decoding corresponding to the third dimension; and a sixth processing unit 56, configured to, when the multidimensional format is three-dimensional, complete the first-level coding and decoding corresponding to the first dimension through the FAC, after which a computing node in the storage domain completes the second-level coding and decoding corresponding to the second dimension on the data after the first-level coding and decoding, and a storage node completes the third-level coding and decoding corresponding to the third dimension on the data after the second-level coding and decoding.

In this embodiment, a multi-stage coding and decoding method based on RS erasure codes is provided when storing or communicating files, and the method is used to implement multi-stage coding and decoding of data or files. In the RS erasure code-based multi-level coding and decoding method, multi-dimensional formatting, such as two-dimensional or three-dimensional formatting, is performed on coded data, then RS erasure code coding operation is sequentially performed on each dimension according to requirements to form multi-dimensional data groups and corresponding verification blocks, preferably, the multi-dimensional data groups and the corresponding verification blocks can be stored according to the physical resource condition of cloud storage, and therefore part of verification data can be stored on separate storage nodes. In the present embodiment, three-stage encoding and decoding are taken as an example for explanation.

Step S1, encoding: the three-level coding in the present embodiment includes the following steps:

(1) Three-dimensional formatting of the data to be encoded

The number mi of fragments and the number ni of check fragments for each of the three coding levels (i = 1, 2, 3) can be determined according to the physical resources of the cloud storage: m1 + n1 is the total number of disks in a single storage server, m2 + n2 is the number of storage servers in a single cabinet, and m3 + n3 is the number of cabinets in the same storage domain.

The data to be encoded is formatted in three dimensions, that is, in units of m1 × m2 × m3, into data blocks that are logically cuboids. If the data is smaller than m1 × m2 × m3, the remainder is filled with zeros; if the file is larger, it is cut into multiple logical cuboids of size m1 × m2 × m3, where the length is m1, the width is m2, and the height is m3.
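To make the relation between physical resources and coding parameters concrete, the sketch below derives (mi, ni) for each level from a unit count and a chosen number of check units, then computes the combined storage utilization. How many check units to reserve per level is not specified in the text, so the values used here (2, 2 and 1) and the function names are assumptions.

```python
from fractions import Fraction
from math import prod

def level_parameters(units: int, checks: int):
    """Split the physical units at one level into (m_i, n_i), with m_i + n_i = units."""
    if not 0 < checks < units:
        raise ValueError("need 0 < checks < units")
    return units - checks, checks

def overall_utilization(levels) -> Fraction:
    """Share of raw capacity holding original data across all coding levels."""
    return Fraction(prod(m for m, _ in levels), prod(m + n for m, n in levels))

# Assumed example: 12 disks per server, 10 servers per cabinet, 5 cabinets per storage domain.
levels = [level_parameters(12, 2), level_parameters(10, 2), level_parameters(5, 1)]
print(levels)                        # [(10, 2), (8, 2), (4, 1)]
print(overall_utilization(levels))   # 8/15
```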
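The formatting step can be sketched as follows. This is a minimal illustration that treats each cell of the cuboid as a single byte; in a real system each cell would more likely be a fixed-size fragment, and the original length would have to be recorded elsewhere so the padding can be removed after decoding. The function name `format_3d` is hypothetical.

```python
def format_3d(data: bytes, m1: int, m2: int, m3: int):
    """Split `data` into zero-padded logical cuboids, indexed as cuboid[z][y][x].

    x runs along the length (m1), y along the width (m2), z along the height (m3).
    """
    size = m1 * m2 * m3
    padded = data + bytes(-len(data) % size)   # zero padding up to a multiple of size
    cuboids = []
    for offset in range(0, len(padded), size):
        chunk = padded[offset:offset + size]
        cuboids.append([
            [list(chunk[(z * m2 + y) * m1:(z * m2 + y + 1) * m1]) for y in range(m2)]
            for z in range(m3)
        ])
    return cuboids

blocks = format_3d(bytes(range(1, 11)), 2, 2, 2)   # 10 bytes into 2 x 2 x 2 cuboids
print(len(blocks))       # 2
print(blocks[1][0])      # [[9, 10], [0, 0]]
```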

(2) Plane coding

The data is partitioned by plane into layers of m1 × m2 data each, and RS(m3 + n3, m3) encoding is performed across the m3 layers. After encoding, the height of the data block changes from m3 to m3 + n3; that is, n3 additional layers of check data are generated.

(3) Column coding

For each of the m3 + n3 layers, RS(m2 + n2, m2) encoding is performed in the column direction on each column of the layer's m2 rows. After encoding, the number of rows per layer changes from m2 to m2 + n2; that is, n2 additional rows of check data are generated.

(4) Line coding

Each of the m3 + n3 layers now has m2 + n2 rows, and RS(m1 + n1, m1) encoding is performed on each row in the row direction. After encoding, each row holds m1 + n1 data, and the final data block size is (m1 + n1) × (m2 + n2) × (m3 + n3).

For the above three-level coding, a single File Access Client (FAC for short) of the cloud storage may perform all of the coding. Alternatively, the FAC may perform the first-level coding and then send the coded data to the computing nodes in each cabinet of the storage domain, which perform the second-level and third-level processing. A computing node can complete both the second-level and third-level coding itself, or complete only the second-level coding and send the data row by row to the storage nodes, which then perform the third-level coding and store the data on their disks.
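The plane, column and row coding steps can be sketched as three passes of the same one-dimensional code along different axes. The example below is a simplification under stated assumptions: symbols live in the prime field GF(257) and check symbols come from polynomial interpolation, not from the Vandermonde or Cauchy construction over a Galois field that a production system would use. The cube is a dictionary keyed by (x, y, z), and all function names are hypothetical.

```python
from itertools import product

P = 257  # assumed prime field, chosen for readability

def interpolate(points, x):
    """Value at x of the unique polynomial through `points` over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode_axis(cube, dims, axis, n):
    """Append n check symbols to every line of the cube along `axis`; return new dims."""
    m = dims[axis]
    other_ranges = [range(d) for i, d in enumerate(dims) if i != axis]
    for rest in product(*other_ranges):
        def key(t):
            coords = list(rest)
            coords.insert(axis, t)
            return tuple(coords)
        points = [(t, cube[key(t)]) for t in range(m)]
        for t in range(m, m + n):
            cube[key(t)] = interpolate(points, t)
    return tuple(d + n if i == axis else d for i, d in enumerate(dims))

def encode_3d(data, m, n):
    """Encode m1*m2*m3 symbols (each < P) with m=(m1, m2, m3) and n=(n1, n2, n3)."""
    m1, m2, m3 = m
    cube = {(x, y, z): data[(z * m2 + y) * m1 + x]
            for z in range(m3) for y in range(m2) for x in range(m1)}
    dims = (m1, m2, m3)
    for axis in (2, 1, 0):   # plane coding, then column coding, then row coding
        dims = encode_axis(cube, dims, axis, n[axis])
    return cube, dims

cube, dims = encode_3d(list(range(8)), (2, 2, 2), (1, 1, 1))
print(dims, len(cube))   # (3, 3, 3) 27
```

Because each pass only reads the lines of one axis, the three passes could run on different machines, which matches the division of work between the FAC, computing nodes and storage nodes described in the text.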

(5) Data storage or communication

For the cloud storage mode, the m3 + n3 coded layers are transmitted to the cabinets in the same storage domain, one layer per cabinet. Each cabinet stores the (m1 + n1) × (m2 + n2) data it receives row by row, one row per storage server in the cabinet. Each storage server stores its row of m1 + n1 data across its m1 + n1 disks.

For the data communication mode, all coded data blocks are numbered in sequence and sent to the other party one by one.
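The placement rule amounts to reading the three cube coordinates as cabinet, server and disk indices. The sketch below assumes the coded block is held as a dictionary keyed by (x, y, z), with x along a row, y the row index within a layer and z the layer index; the function name `distribute` and this representation are illustrative assumptions.

```python
from collections import defaultdict

def distribute(cube):
    """Group coded symbols {(x, y, z): value} as layout[cabinet z][server y][disk x]."""
    layout = defaultdict(lambda: defaultdict(dict))
    for (x, y, z), value in cube.items():
        layout[z][y][x] = value
    return layout

# Assumed toy block: 3 disks per server, 2 servers per cabinet, 2 cabinets.
cube = {(x, y, z): 100 * z + 10 * y + x
        for z in range(2) for y in range(2) for x in range(3)}
layout = distribute(cube)
print(len(layout), len(layout[0]), len(layout[0][0]))   # 2 2 3
print(layout[1][0][2])                                   # 102
```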

Step S2, decoding: the three-level decoding in the present embodiment includes the following steps:

(1) Fetching the data to be decoded

For decoding in cloud storage, when a file access client (FAC) needs to decode, data is fetched by number, according to the metadata of the data or file, from the m3 + n3 cabinets, the m2 + n2 storage servers of each cabinet, and the m1 + n1 disks of each storage server. For each row of data on a storage server, at least m1 data are fetched; if more than m1 are available, the m1 lowest-numbered data are preferred (that is, original data is selected as far as possible). For each cabinet, at least m2 complete rows are fetched, m2 × m1 data in total, with the first m2 rows preferred. Across cabinets, the complete m1 × m2 data of at least m3 cabinets is returned, with the first m3 cabinets preferred.

For the data communication mode, all received data blocks are placed by number at the corresponding positions of a logical (m1 + n1) × (m2 + n2) × (m3 + n3) cuboid and then decoded.

(2) Row decoding

For each received (m1+n1) × (m2+n2) × (m3+n3) cuboid, row decoding is performed first: each row is decoded with RS (m1+n1, m1) to restore the data of that row. If fewer than m1 data are received for a row, decoding of that row fails.

(3) Column decoding

For each plane, the complete data of at least any m2 rows must have been recovered by row decoding before column decoding can be performed. If a plane has m2 rows of complete data, then all the original data in the plane can be obtained by applying RS (m2+n2, m2) decoding to each column of those rows. If fewer than m2 rows of complete data are available, column decoding fails and the original data in the plane cannot be obtained.

(4) Plane decoding

For the data of the m3+n3 planes, the data of at least m3 planes must have been recovered by column decoding before plane decoding can be performed. If the data of m3 planes are available, the data of all m3 original planes, i.e., all the original data, can be obtained by RS (m3+n3, m3) decoding.
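The success conditions of steps (2) to (4) can be summarized in a short check. The sketch below is illustrative only: the function name and the nested-list representation `available[p][r][d]` are assumptions, and it models a single row, column, plane pass exactly as described above rather than any iterative decoding.

```python
def can_decode(available, m1, m2, m3):
    """available[p][r][d] is True if the block on plane p, row r,
    position d was fetched successfully.

    Returns True if row -> column -> plane decoding succeeds.
    """
    planes_ok = 0
    for plane in available:
        # Row decoding: a row is recoverable if at least m1 of its blocks survive.
        rows_ok = sum(1 for row in plane if sum(row) >= m1)
        # Column decoding: a plane is recoverable if at least m2 rows were recovered.
        if rows_ok >= m2:
            planes_ok += 1
    # Plane decoding: all data are recoverable if at least m3 planes were recovered.
    return planes_ok >= m3
```

With m1 = m2 = m3 = 2 and one check block per dimension, losing the eight blocks in the corner formed by the first two planes, rows and positions makes the check fail, while restoring any one of them makes it succeed, in line with (n1+1)(n2+1)(n3+1) - 1 = 7 tolerated losses.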

For the above three-level coding system, in cloud storage the three-level decoding may be performed entirely by a single file access client (FAC). Alternatively, the computing nodes of the storage cabinets may perform the third-level and second-level decoding (i.e., row decoding and column decoding), after which the FAC performs the first-level decoding (plane decoding); or the storage nodes of the storage cabinets may perform the third-level decoding (row decoding), the computing nodes the second-level decoding (column decoding), and the FAC the first-level decoding (plane decoding). In practical applications, the number of coding and decoding levels used depends on the actual operating environment. For example, if only two-level coding is needed, the FAC simply divides the data into m3 planes during coding, and the second-level coding is performed as row coding, with no column coding needed.

For the RS erasure code based multi-level codec system adopted in the embodiment of the present invention, the coding rate and storage space utilization is P3 = m1m2m3 / ((m1+n1)(m2+n2)(m3+n3)). The system tolerates n3 cabinet failures, n2 storage node failures per cabinet, and n1 disk failures per node. The number of arbitrary disk failures that can always be tolerated is S3min = (n1+1)(n2+1)(n3+1) - 1, i.e., any S3min disks may fail. The maximum number of disk failures that can be tolerated is S3max = (m1+n1)(m2+n2)(m3+n3) - m1m2m3.

For a two-level codec system, the coding rate and storage space utilization is P2 = m1m2 / ((m1+n1)(m2+n2)); the system tolerates at least S2min and at most S2max disk failures, where S2min = (n1+1)(n2+1) - 1 and S2max = (m1+n1)(m2+n2) - m1m2. For a k-level codec system, Pk = (m1 × ... × mk) / ((m1+n1) × ... × (mk+nk)), Skmin = (n1+1) × ... × (nk+1) - 1, and Skmax = (m1+n1) × ... × (mk+nk) - m1 × ... × mk.
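The k-level quantities Pk, Skmin and Skmax can be evaluated with a few lines of code. This is a minimal sketch; the function name `multilevel_stats` and the list-of-pairs input format are assumptions made for illustration.

```python
from math import prod


def multilevel_stats(levels):
    """levels is a list of (m_i, n_i) pairs, one per coding level.

    Returns (P_k, S_kmin, S_kmax) as defined in the text.
    """
    data_blocks = prod(m for m, _ in levels)
    total_blocks = prod(m + n for m, n in levels)
    rate = data_blocks / total_blocks          # coding rate / space utilization
    s_min = prod(n + 1 for _, n in levels) - 1  # any s_min failures are tolerated
    s_max = total_blocks - data_blocks          # best-case tolerated failures
    return rate, s_min, s_max
```

For the three-level parameters used in this description (m1=21, n1=3, m2=7, n2=1, m3=7, n3=1), `multilevel_stats([(21, 3), (7, 1), (7, 1)])` gives a rate of about 0.6699 with Skmin = 15 and Skmax = 507; for the two-level parameters (m1=21, n1=3, m2=61, n2=3) it gives about 0.8340, 15 and 255.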

In a typical cloud storage system, a cabinet holds either 16 2U storage servers with 12 SATA hard disks each, or 8 4U storage servers with 24 SATA hard disks each, plus 4 computing nodes. A storage domain may include 8 to 16 cabinets. If a three-level codec system is used with m3=7, n3=1, m2=7, n2=1, m1=21, n1=3, the coding rate is P3 = (7 × 7 × 21)/(8 × 8 × 24) = 66.99%, and the system tolerates at least any 2 × 2 × 4 - 1 = 15 disk failures and at most 8 × 8 × 24 - 7 × 7 × 21 = 507 disk failures. If a two-level coding system is adopted, in which the 8 storage servers of each of 8 cabinets are combined into a second level of 64 storage servers, with m1=21, n1=3, m2=61, n2=3, the coding rate is P2 = (61 × 21)/(64 × 24) = 83.40%, and the system tolerates at least 4 × 4 - 1 = 15 disk failures and at most 64 × 24 - 61 × 21 = 255 disk failures.

For the three-level storage system in the above example, if RS erasure codes are used at every level, it is known that when k1, k2 and k3 blocks of check data are used at the three decoding levels respectively, the file decoding computation amounts are O (k1), O (k2) and O (k3). In the above example, the total decoding computation while tolerating at least 15 disk losses is O (1) + O (1) + O (3), which is equivalent to the decoding computation O (5) of a single-level RS codec. In the above two-level RS codec example, the total decoding computation while tolerating at least 15 disk losses is O (3) + O (3), which is equivalent to the decoding computation O (6) of a single-level RS codec. A single-level RS codec tolerating 15 disk losses would instead need 15 check blocks, i.e., a decoding computation of O (15). Therefore, the decoding speed can be greatly improved by adopting a multi-level storage system. Conversely, in a single-level RS erasure codec system, supporting the loss of any 15 disks at a coding rate of 83.40% would require RS (90,15) coding and decoding (90 blocks in total, 15 of them check blocks), for which solving the inverse matrix during decoding is slow and decoding performance is very low.

Because the multi-level codec system of the RS erasure code based method provided in the embodiment of the present invention can be combined with the cloud storage hardware, distributed cluster coding and decoding can be formed, making full use of the computing capabilities of the computing nodes and storage nodes in the storage cabinets. (In this and the following paragraph the levels are numbered by the index of their RS code, so the third level is the plane coding performed by the FAC.) During encoding, the FAC performs only the third-level RS (m3+n3, m3) encoding; once the m3+n3 planes of data have been generated, encoding success is returned to the application, and the computing nodes and storage nodes in the storage cabinets then perform the second-level and first-level encoding. Thus, from the application's point of view, only the RS (m3+n3, m3) encoding is needed before success is returned, and only a fraction n3/(m3+n3) of the content is additionally encoded. During decoding, the computing nodes and storage nodes in the storage cabinets complete the second-level and first-level decoding, generate complete data planes, and transmit them to the FAC for the third-level decoding. Each storage node decodes only 1/(m3 × m2) of the data of the whole file, and each computing node decodes only 1/m3 of it, so the second-level and first-level decoding take far less time than the third-level decoding performed by the FAC. Thus, in the distributed multi-level codec system, the application-perceived encoding time is the RS (m3+n3, m3) encoding time, and the decoding time is slightly greater than that of RS (m3+n3, m3), i.e., slightly greater than O (n3).

For the two-level codec system, the computing nodes of the storage cabinets complete the first-level coding and decoding, and the FAC completes the second-level coding and decoding. From the application's point of view, the encoding time is the RS (m2+n2, m2) encoding time, and the decoding time is slightly greater than that of RS (m2+n2, m2), i.e., slightly greater than O (n2). The multi-level codec approach provided by the embodiment of the present invention can also be used in the data communication mode: the FAC, acting as the third level, decomposes the coding and decoding tasks and sends the second-level and first-level tasks to other computers in a distributed system for processing, forming a more efficient distributed multi-level codec system that withstands more data damage or loss.

In actual use, the coded data of each plane can be stored in a separate storage cabinet, so that the m3+n3 planes of data are stored in m3+n3 storage cabinets, with the redundant data of the n3 check planes stored in n3 separate storage cabinets. Within one storage cabinet, the m2+n2 rows of data are stored on m2+n2 storage servers respectively, with the n2 rows of check data stored on n2 storage servers. When the system is in a read-only mode with no writes and the number of disk errors is small, the check data cabinets and the check storage servers in each rack can be shut down or run at reduced speed, saving system operating cost.

In the RS erasure code based multi-level codec system provided in the embodiment of the present invention, the RS coding and decoding algorithm of each level is independent, so the following variants can be adopted to further improve performance. (1) Reduce the check number n of the FAC codec level: for a two-level codec system, reduce n2; for a three-level codec system, reduce n3. The application-perceived coding and decoding time is reduced accordingly. (2) Use a more efficient coding and decoding algorithm at the FAC level, since the application-perceived coding and decoding time is mainly determined by that level. For example, at the FAC level, the corresponding check fragment number (n2 for a two-level scheme, n3 for a three-level scheme) is set to 1, so that the RS erasure coding algorithm can be replaced by an XOR-based algorithm. The XOR algorithm is simple and efficient to implement, and further improves the application-perceived coding and decoding efficiency.
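The XOR variant for a level with a single check fragment (n = 1) can be sketched as follows. The function names are hypothetical, and the code covers only this single-parity special case, not a general RS erasure code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def xor_encode(data_blocks):
    """Append a single check block: the XOR of all data blocks."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def xor_recover(blocks):
    """Recover at most one missing block (marked None) from m+1 coded blocks."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("a single XOR check block can only repair one erasure")
    if missing:
        survivors = [b for b in blocks if b is not None]
        blocks = list(blocks)
        # The XOR of all surviving blocks equals the missing block.
        blocks[missing[0]] = xor_blocks(survivors)
    return blocks[:-1]  # drop the check block, return the original data
```

Encoding and repair both reduce to one pass of XOR operations over the blocks, with no matrix inversion, which is why this variant is attractive at the level whose latency the application perceives.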

The RS erasure code based multi-level codec systems provided in the foregoing embodiments and preferred embodiments can be implemented in software. They perform multi-level coding and decoding on data using mutually independent multi-level RS erasure code systems, and encode and decode data or files in units of groups; data are also redundantly backed up across groups. The embodiment of the present invention does not require the number of groups to equal the number of data in each group, and the two can be combined freely. Preferably, the number of groups and the number of data in each group correspond to the domain division of the cloud storage physical equipment, which facilitates the management and implementation of the cloud storage system. With the multi-level codec system provided by the embodiment of the present invention, the computing and storage capacity of a cloud storage system or a distributed communication system can readily be used to form a distributed codec system, effectively improving coding and decoding performance. In addition, the multi-level codec system places no restriction on the type of erasure code, and the codec systems at the different levels are implemented independently and do not affect one another. Therefore, an XOR-based optimized coding and decoding algorithm can be adopted according to the user's performance requirements, further improving coding and decoding performance.

Compared with the original single-level RS codec system, the present invention greatly increases the fault tolerance of the system and improves both coding speed and decoding speed without reducing the original coding rate or storage space utilization, and is well suited to cloud storage systems and P2P dynamic storage systems that use inexpensive consumer-grade disks. In addition, with the multi-level coding and decoding method provided by the embodiment of the present invention, the check data can be concentrated on certain cabinets and racks of the cloud storage. When the system is in a read-only mode with no writes and the number of disk errors is small, the check data cabinets and the check storage servers in each rack can be shut down or run at reduced speed, saving system operating cost.

The following describes embodiments of the present invention with reference to the drawings.

Fig. 6 is a schematic structural diagram of a multi-level codec system based on RS erasure codes according to an embodiment of the present invention. As shown in Fig. 6, the system processes data using three-level coding and decoding.

In the codec system using three-level RS erasure codes, the three levels use RS (m1+n1, m1), RS (m2+n2, m2) and RS (m3+n3, m3) erasure codes respectively. The data to be encoded is logically partitioned into groups of m1m2m3 original data: m1 per row, m2 per column, and m3 planes in total. If the last block of the file holds fewer than m1m2m3 data, it can be padded with zeros to fill an m1m2m3 partition, or the tail of the file can be stored separately as replicated copies.
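The logical partitioning with zero padding can be sketched as below. The function name and the `block_size` parameter are hypothetical (the description does not fix a block size here), and the sketch implements only the zero-fill option, not the alternative of storing the file tail as copies.

```python
def partition(data: bytes, block_size: int, m1: int, m2: int, m3: int):
    """Split data into cuboids of m3 planes x m2 rows x m1 blocks,
    zero-padding the tail to a whole cuboid."""
    group_bytes = block_size * m1 * m2 * m3
    padded_len = -(-len(data) // group_bytes) * group_bytes  # round up
    data = data.ljust(padded_len, b"\x00")
    cuboids = []
    for g in range(0, padded_len, group_bytes):
        # Cut one group into blocks, then blocks into rows, then rows into planes.
        blocks = [data[g + i:g + i + block_size]
                  for i in range(0, group_bytes, block_size)]
        rows = [blocks[i:i + m1] for i in range(0, len(blocks), m1)]
        planes = [rows[i:i + m2] for i in range(0, len(rows), m2)]
        cuboids.append(planes)
    return cuboids
```

A real system would also need to record the original length, for example in the file metadata, so that the padding can be stripped after decoding; that bookkeeping is omitted here.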

When the file is encoded, the first-level plane coding is performed first, using RS (m3+n3, m3) coding. After this coding, n3 check data planes are generated, each containing m1m2 check data; each check datum is computed, according to the RS erasure code, from the data at the corresponding position in the m3 original data planes. The first-level plane coding is performed by the FAC. If the FAC is used only for the first-level coding, a coding success message may be returned to the application upon its completion.

The second-level column coding is then performed, using RS (m2+n2, m2) coding. The m1 × m2 original data on each plane form m1 separate columns. RS (m2+n2, m2) coding is performed on each column, generating n2 check data per column. The n3 check planes generated by the first-level plane coding also undergo the second-level column coding to generate second-level check data. After the second-level column coding, each plane has n2 additional rows of data. The second-level coding can be completed by the FAC; alternatively, after the FAC completes the first-level coding, the data of each plane are sent to the computing node of the corresponding storage cabinet, which completes the second-level coding.

Finally, the third-level row coding is performed, using RS (m1+n1, m1) coding. The m1 × m2 original data on each plane form m2 separate rows. RS (m1+n1, m1) coding is performed on each row, generating n1 check data per row. The n2 additional check rows generated by the second-level column coding undergo the third-level row coding in the same way, and the n3 check planes generated by the first-level plane coding also undergo the third-level row coding after their second-level column coding is completed. After the third-level coding, the data block has grown from m1m2m3 to (m1+n1)(m2+n2)(m3+n3) data. The third-level row coding can be completed by the FAC or by the computing node of each storage cabinet, or each row of data of each plane can be sent to a storage node of the storage cabinet and computed there. After the computation, the m1+n1 data of each row are stored on m1+n1 independent disks of the storage node. Thus, the data of each data plane are stored in a different storage cabinet; the rows of one data plane are stored on different storage nodes in the same cabinet; and the data within one row are stored on different disks of the same storage node.
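The plane, column, row order of the three coding levels can be illustrated with a small sketch. To stay short it swaps in a single XOR check symbol per line (n1 = n2 = n3 = 1) as a stand-in for the RS (m+n, m) encoders, and it treats each datum as one integer; the function names and the `encode_line` hook are assumptions, and a real implementation would plug in an RS encoder over a finite field.

```python
from functools import reduce
from operator import xor


def xor_parity(line):
    """Stand-in line encoder with n = 1: append the XOR of the line's symbols."""
    return list(line) + [reduce(xor, line)]


def encode_cuboid(cube, encode_line=xor_parity):
    """cube[p][r][d]: m3 planes x m2 rows x m1 symbols. Returns the coded cuboid."""
    m2, m1 = len(cube[0]), len(cube[0][0])
    # Level 1 (plane coding): encode each (r, d) position across the planes.
    lines = [[encode_line([plane[r][d] for plane in cube]) for d in range(m1)]
             for r in range(m2)]
    cube = [[[lines[r][d][p] for d in range(m1)] for r in range(m2)]
            for p in range(len(lines[0][0]))]
    out = []
    for plane in cube:
        # Level 2 (column coding): every column of every plane, check planes included.
        cols = [encode_line([row[d] for row in plane]) for d in range(m1)]
        plane = [[cols[d][r] for d in range(m1)] for r in range(len(cols[0]))]
        # Level 3 (row coding): every row, check rows included.
        out.append([encode_line(row) for row in plane])
    return out
```

Because the check rows and check planes are themselves encoded at the later levels, the output has (m1+1)(m2+1)(m3+1) symbols and every line in every dimension is a valid codeword, which is the property the row, column and plane decoding relies on.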

The decoding process of the file is the reverse of the encoding process. The third-level row decoding is performed first: decoding of a row can be completed as long as the data of m1 disks are obtained on the storage node, yielding the original data of that row. In the second-level column decoding, decoding of the columns of a plane can be completed as long as m2 rows have completed row decoding, yielding the original data of that plane. In the first-level plane decoding, all the original data can be obtained as long as the data of m3 planes are available.

The multi-level RS codec system provided by the embodiment of the present invention tolerates data damage or power failure in at most n3 cabinets. Within each cabinet, it tolerates damage or power failure of at most n2 storage nodes. Within each storage node, it tolerates damage of at most n1 disks. If the cabinets and storage nodes are all working normally, the whole system tolerates any (n1+1)(n2+1)(n3+1) - 1 disk failures. Taking m3=7, n3=1, m2=7, n2=1, m1=21 and n1=3, the whole system has 8 × 8 × 24 = 1536 disks, the coding rate is P = (7 × 7 × 21)/(8 × 8 × 24) = 66.99%, and the system tolerates at least any 15 disk failures and at most 8 × 8 × 24 - 7 × 7 × 21 = 507 disk failures, i.e., at most 33% of the disks. For the three-level codec system of this embodiment, the computation amounts of the three levels are O (1), O (1) and O (3), and the total computation is equal to the O (5) of a single-level codec system, while the number of tolerated disk failures increases from 5 to 15.

If a two-level coding system is adopted, the 8 storage servers of each of 8 cabinets are combined into a second level of 64 storage servers, and the whole system has 64 × 24 = 1536 disks. With m1=21, n1=3, m2=61 and n2=3, the coding rate is P2 = (61 × 21)/(64 × 24) = 83.40%, and the system tolerates at least 4 × 4 - 1 = 15 disk failures and at most 64 × 24 - 61 × 21 = 255 disk failures, i.e., at most 16.6% of the disks. Compared with the three-level coding system, for the same guaranteed tolerance of 15 disk failures, the coding rate is higher and the maximum number of tolerated disk failures is lower. For the two-level codec system of this example, the computation amounts of the two levels are O (3) and O (3), and the total computation is equal to the O (6) of a single-level codec system, while the number of tolerated disk failures increases from 6 to 15.

Therefore, the above embodiment and preferred embodiments greatly improve both the coding and decoding rate and the fault tolerance of the system.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated separately into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A coding/decoding processing method, comprising:

carrying out multidimensional formatting processing on data to be coded and decoded, wherein the multiple dimensions are at least two dimensions;

and according to a preset sequence, performing Reed-Solomon RS erasure coding and decoding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing.

2. The method of claim 1, wherein performing multidimensional formatting on the data to be encoded and decoded comprises:

determining the size of data blocks for formatting the data to be coded;

under the condition of executing coding processing, performing complementary segmentation processing on the data to be coded according to the determined data block size; and in the case of executing the decoding processing, storing the data to be decoded into the corresponding position of the data block with the determined data block size for decoding processing.

3. The method of claim 1, wherein performing the RS erasure coding and decoding process on at least two dimensions of each dimension of the data to be coded and decoded after the multidimensional formatting process in a predetermined order comprises:

under the condition of executing coding processing, carrying out RS erasure code coding processing on each dimension of the data to be coded and decoded after multidimensional formatting processing according to the multidimensional step-by-step dimension removing mode;

and under the condition of executing decoding processing, performing RS erasure code decoding processing on each dimension of the data to be coded and decoded after multidimensional formatting processing according to the mode of adding the dimension step by step.

4. The method according to claim 1, wherein after performing the RS erasure coding and decoding process for at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting process in a predetermined order, further comprising:

storing data obtained after the RS erasure code processing is performed according to physical resources of a storage server; or,

and sending the data obtained after the RS erasure code processing is carried out.

5. The method of claim 4, wherein storing the data obtained after the RS erasure coding process according to physical resources of a storage server comprises: and storing part of check data in the data obtained after the RS erasure code processing on an independent storage node.

6. The method according to any one of claims 1 to 5, wherein, in the case where the multi-dimension is three-dimensional, the RS erasure coding and decoding process is performed on at least two dimensions in each dimension of the data to be coded and decoded after the multi-dimensional formatting process by at least one of:

the same file access client FAC positioned in the storage server performs the RS erasure correcting code coding and decoding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing;

after the FAC finishes the first-stage coding and decoding corresponding to the first dimension, the computing node in the storage domain finishes the second-stage coding and decoding corresponding to the second dimension and the third-stage coding and decoding corresponding to the third dimension on the data after the first-stage coding and decoding;

after the FAC finishes the first-level coding and decoding corresponding to the first dimension, the computing node in the storage domain finishes the second-level coding and decoding corresponding to the second dimension on the data after the first-level coding and decoding, and the storage node finishes the third-level coding and decoding corresponding to the third dimension on the data after the second-level coding and decoding.

7. An encoding/decoding processing apparatus, comprising:

the first processing module is used for carrying out multidimensional formatting processing on data to be coded and decoded, wherein the multiple dimensions are at least two dimensions;

and the second processing module is used for carrying out Reed-Solomon RS erasure coding and decoding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing according to a preset sequence.

8. The apparatus of claim 7, wherein the first processing module comprises:

a first determining unit, configured to determine a data block size for formatting the data to be encoded;

the first processing unit is used for performing complementary segmentation processing on the data to be coded according to the determined data block size under the condition of executing coding processing; and in the case of executing the decoding processing, storing the data to be decoded into the corresponding position of the data block with the determined data block size for decoding processing.

9. The apparatus of claim 7, wherein the second processing module comprises:

the second processing unit is used for performing the RS erasure code coding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing according to the multidimensional step-by-step dimensionality removing mode under the condition of executing the coding processing;

a third processing unit, configured to, in a case of performing decoding processing, perform, in a multi-dimensional step-by-step dimension adding manner, the RS erasure code decoding processing on at least two dimensions of each dimension of the to-be-coded and decoded data after the multi-dimensional formatting processing.

10. The apparatus of any of claims 7 to 9, wherein the second processing module comprises at least one of the following units for performing, according to a preset sequence, the RS erasure coding and decoding processing on at least two dimensions in each dimension of the data to be coded and decoded after the multidimensional formatting processing:

a fourth processing unit, configured to perform, when the multidimensional format is three-dimensional, the RS erasure coding and decoding processing on each dimension of the to-be-coded and decoded data after the multidimensional formatting processing through a same file access client FAC located in the storage server;

a fifth processing unit, configured to, when the multi-dimension is three-dimensional, complete a first-level codec corresponding to the first dimension through the FAC, where a computing node in the storage domain completes a second-level codec corresponding to the second dimension and a third-level codec corresponding to the third dimension on data subjected to the first-level codec;

and the sixth processing unit is used for completing the first-stage coding and decoding corresponding to the first dimension through the FAC under the condition that the multi-dimension is three-dimensional, completing the second-stage coding and decoding corresponding to the second dimension by the computing node in the storage domain on the data after the first-stage coding and decoding, and completing the third-stage coding and decoding corresponding to the third dimension by the storage node on the data after the second-stage coding and decoding.