CN103838649B

CN103838649B - Method for reducing calculation amount in binary coding storage system

Info

Publication number: CN103838649B
Application number: CN201410079582.7A
Authority: CN
Inventors: 蒋海波; 陈建中; 周星梅; 李娜; 王晓京; 肖宜龙; 李�范
Original assignee: Chengdu Institute of Biology of CAS
Current assignee: Zhongke Xinghe Shandong Intelligent Technology Co ltd
Priority date: 2014-03-06
Filing date: 2014-03-06
Publication date: 2017-04-12
Anticipated expiration: 2034-03-06
Also published as: CN103838649A

Abstract

The invention discloses a method for reducing the amount of calculation in a binary coding storage system. Compared with the prior art, the method optimizes the encoding process and reduces the amount of calculation during encoding. When data in the storage system are encoded for storage, the original calculation order of the check data blocks can be changed according to the characteristics of each row vector in the coding matrix, which reduces the number of calculations performed during encoding. The method applies to all binary matrices and to any related process that computes with binary matrices. This includes reconstructing lost data blocks with a binary check matrix when data blocks are lost. The method therefore has value for wider adoption.

Description

A method for reducing the amount of calculation in a binary coding storage system

Technical field

The present invention relates to a method of computing related data with a binary matrix. More particularly, it relates to a method for reducing the amount of calculation in a binary coding storage system.

Background art

Computer technology and related sensor technology are now widely applied across all industries. As a result, information that senses the world is generated at every point, every second. Internet services with hundreds of millions of users continuously produce new data, and historical records of people's lives are also growing explosively. This rapid growth of data inevitably drives a continual increase in storage devices. To meet ever-expanding storage requirements, the architecture of data storage systems keeps evolving, from traditional centralized storage to distributed storage. In recent years, new mass-storage models such as cloud storage have also appeared. Storage systems keep growing in scale. How to reduce data redundancy while keeping data highly reliable, and thereby reduce hardware consumption, has therefore become a focus of attention in the field of information storage.

Unlike the traditional multi-replica strategy, a new kind of storage system built around coding redundancy has been developed in recent years. A coding-redundancy storage system provides the same system reliability as replication while greatly reducing data redundancy, which saves substantial hardware investment and power. Coding redundancy is, however, more complex to manage than replication. Most importantly, data must be encoded when it is stored in order to generate the redundant data.

Encoding consumes a certain amount of computation. When the system's computing performance is low, or other tasks need the computing resources, encoding slows down markedly, which in turn reduces the storage speed and efficiency of the system. Reducing the computation required to encode files for storage has therefore long been both a focus and a difficulty of erasure-code storage technology. To address this problem, researchers have proposed storage strategies based on binary coding matrices. In practice, however, it is difficult to construct directly a binary coding matrix that both guarantees the erasure-correcting capability of the system and has minimal computation. Binary coding matrices are therefore constructed to meet the erasure-tolerance requirements of the storage system, without considering whether they minimize the computation of encoding. A method that reduces the computation of binary coding matrices is therefore urgently needed.

Summary of the invention

The purpose of the present invention is to solve the above problems by providing a method for reducing the amount of calculation in a binary coding storage system.

The present invention achieves the above purpose through the following technical solution:

The present invention comprises the following steps:

(1) Let G_{r·m} be an arbitrary binary coding matrix composed of "0" and "1" elements. The matrix is used to generate redundant data and can be written in terms of its row vectors as:

G_{r \cdot m} = \begin{bmatrix} l_{1} \\ l_{2} \\ \vdots \\ l_{r \cdot m} \end{bmatrix}

(2) For each row vector l_1, l_2, ..., l_{r·m} of the binary coding matrix, count the number of "1"s. This count determines the number of XOR operations needed to compute a check bit from that vector. Also compute the number of bit positions in which any two vectors l_a and l_b differ.

(3) If vector l_a contains k elements equal to "1", then generating redundant data from that vector requires k-1 XOR operations.
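Steps (2) and (3) reduce to two counts. The first is the weight of a row vector, which gives its direct XOR cost (weight minus 1). The second is the number of positions where two rows differ. The following is a minimal Python sketch. It assumes that rows are lists of 0/1 integers. The function names are illustrative and do not come from the patent.

```python
def xor_count(row):
    """XORs needed to build one check block directly from a 0/1 row.

    A row with k ones combines k data blocks, which takes k - 1 XORs.
    An all-zero row needs no XOR, so the count is clamped at 0.
    """
    k = sum(row)
    return max(k - 1, 0)


def differing_bits(a, b):
    """Number of positions where two equal-length 0/1 rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Example with two small hypothetical rows.
l_a = [1, 1, 0, 1]
l_b = [1, 0, 0, 1]
print(xor_count(l_a), xor_count(l_b), differing_bits(l_a, l_b))  # 2 1 1
```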

Further, the optimization flow for encoding an original file with the whole coding matrix G_{r·m} is as follows:

A: Use the number of "1"s in each row vector of the coding matrix G_{r·m} to determine the number of XORs needed to compute a check bit from that row vector. Let k be the number of "1"s in the row vector. Computing the check bits from that row vector then takes (k-1)m XORs, where m is the size of each original data block that participates in the check calculation;

B: Compare every pair of row vectors in the coding matrix. Count the positions where their elements are identical and the positions where they differ, and write the result as (e/d). Here e is the number of positions with identical elements in the two vectors, and d is the number of positions with different elements;
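Steps A and B produce the per-row entries l_a(b) and the pairwise (e/d) counts that make up a relationship table, such as Table 1 of the embodiment. The sketch below assumes that rows are equal-length 0/1 lists. It also assumes that m is measured in bytes, so (k-1)m counts byte-level XORs. The function `relation_table` and the example matrix are hypothetical.

```python
from itertools import combinations


def relation_table(rows, block_size):
    """Per-row costs (step A) and pairwise (e/d) counts (step B).

    rows:       equal-length 0/1 lists, the row vectors l_1 .. l_n
    block_size: m, the size of each original data block (assumed bytes)

    costs[i] = (b, b * m) for row l_i, where b = (number of 1s) - 1.
    pairs[(i, j)] = (e, d) for rows l_i and l_j (1-based labels).
    """
    costs = {}
    for i, row in enumerate(rows, start=1):
        b = max(sum(row) - 1, 0)        # the "b" in l_a(b)
        costs[i] = (b, b * block_size)  # (k-1) and (k-1)*m
    pairs = {}
    for (i, a), (j, c) in combinations(enumerate(rows, start=1), 2):
        d = sum(x != y for x, y in zip(a, c))
        pairs[(i, j)] = (len(a) - d, d)  # (identical, differing)
    return costs, pairs


# Hypothetical 3-row matrix over 4 data blocks of 1024 bytes each.
rows = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]
costs, pairs = relation_table(rows, block_size=1024)
for i, (b, total) in costs.items():
    print(f"l_{i}({b}) -> {total} byte-XORs")
for (i, j), (e, d) in pairs.items():
    print(f"l_{i} vs l_{j}: ({e}/{d})")
```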

C: Consider a row vector l_i (1 ≤ i ≤ r·m). If the number of XORs it requires is less than or equal to its differing-bit count d from step B, relative to the other row vectors, then the check data block for that row is computed directly from the vector. The vector is then denoted l_j;

D: Starting from the vector l_j determined in step C, use the ratio of identical bits to differing bits from step B to determine the next row vector to compute. Suppose a row vector l_k differs from l_j in fewer positions than it agrees with it. Suppose also that the differing-bit count between l_k and l_j is the smallest among the remaining vectors. Then the check data determined by l_k are computed from the check data already computed for l_j;

E: If some check bits have still not been computed, apply the rule of step D with l_k as the base vector. Find the next vector to be computed, and return to step D;

F: Determine whether the complete check-bit calculation process has been established. If so, save the sequence of check-bit calculations. If not, compute according to the original correspondence.
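Steps C to F can be approximated by the sketch below. It swaps in a different rule from the e/d ratio test of step D. It runs a Prim-style minimum-spanning-tree greedy. Each row's check block is produced either directly, at a cost of (weight - 1) XORs, or from one check block that is already computed, at a cost of d XORs. The row that is currently cheapest is always fixed next. The sketch counts whole-block XORs and ignores m. The function names are hypothetical. A computed block is only reused when reuse is strictly cheaper, so the resulting schedule never costs more than direct encoding. Replaying the saved order corresponds loosely to step F.

```python
def schedule(rows):
    """Greedy check-block schedule (Prim-style minimum spanning tree).

    Each row's check block is obtained either directly from the data
    blocks (weight - 1 XORs) or from an already computed check block
    (d XORs, d = number of differing bits).
    Returns (order, parent, total): parent[k] is None for a direct
    computation, otherwise the index of the row whose block is reused.
    """
    n = len(rows)
    key = [max(sum(r) - 1, 0) for r in rows]  # start with direct costs
    parent = [None] * n
    done = [False] * n
    order, total = [], 0
    for _ in range(n):
        # Fix the pending row that is currently cheapest (lowest index on ties).
        k = min((i for i in range(n) if not done[i]), key=lambda i: key[i])
        done[k] = True
        order.append(k)
        total += key[k]
        # Row k's check block can now serve as a base for pending rows.
        for x in range(n):
            if not done[x]:
                d = sum(a != b for a, b in zip(rows[k], rows[x]))
                if d < key[x]:
                    key[x], parent[x] = d, k
    return order, parent, total


def encode_with_schedule(rows, blocks, order, parent):
    """Replay a schedule on equal-length bytes blocks."""
    out = [None] * len(rows)
    for k in order:
        j = parent[k]
        if j is None:
            # Direct: XOR of the data blocks selected by row k
            # (starts from a zero block for simplicity).
            acc = bytes(len(blocks[0]))
            picks = [i for i, bit in enumerate(rows[k]) if bit]
        else:
            # Derived: p_k = p_j XOR data blocks where rows j and k differ.
            acc = out[j]
            picks = [i for i, (a, b) in enumerate(zip(rows[j], rows[k])) if a != b]
        for i in picks:
            acc = bytes(p ^ q for p, q in zip(acc, blocks[i]))
        out[k] = acc
    return out


rows = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 0], [0, 0, 1, 1, 1, 1]]
order, parent, total = schedule(rows)
print(order, parent, total)  # [0, 1, 2] [None, 0, None] 7
```

In this hypothetical example, direct encoding would cost 3 + 4 + 3 = 10 XORs. Deriving the second block from the first brings the total down to 7.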

The beneficial effects of the present invention are as follows:

Compared with the prior art, the present invention optimizes the encoding process and reduces its computation. When a storage system encodes data for storage, the original calculation order of the check data blocks can be changed according to the characteristics of each row vector in the coding matrix. This reduces the number of calculations in encoding. The calculation order produced by optimizing a coding matrix with the proposed method can be stored in the computer, and every later calculation can follow the optimized rule.

The proposed encoding optimization method applies to all binary matrices. In particular, it suits any related process that computes with binary matrices. This covers not only encoding during data storage but also reconstruction: when data blocks are lost, they can be rebuilt with a binary check matrix. The method therefore has value for wider adoption.

Description of the drawings

Fig. 1 is a schematic diagram of the storage flow of the (6,3,4) binary Vandermonde systematic code;

Fig. 2 is a schematic diagram of the calculation corresponding to one row vector;

Fig. 3 is a schematic diagram of the optimized calculation process.

Specific embodiments

The invention is further described below with reference to the accompanying drawings:


Embodiment 1: This embodiment illustrates how a file to be stored is divided into blocks and encoded to generate redundant data. The encoding uses the (6,3,4) Vandermonde systematic code constructed over the binary field {0,1}. Because a (6,3,4) Vandermonde systematic code is used, the original file (for example, an image) is divided into 9 micro data blocks: d_{1,1}, d_{1,2}, d_{1,3}, d_{2,1}, d_{2,2}, d_{2,3}, d_{3,1}, d_{3,2}, d_{3,3}. These blocks are arranged in order and correspond in turn to the positions of the 9 elements in each row of the coding matrix G. From the 9 original data blocks, the (6,3,4) Vandermonde systematic code generates 9 additional check data blocks: p_{1,1}, p_{1,2}, p_{1,3}, p_{2,1}, p_{2,2}, p_{2,3}, p_{3,1}, p_{3,2}, p_{3,3}.

The following groups each form one macro data block:
- {d_{1,1}, d_{1,2}, d_{1,3}}
- {d_{2,1}, d_{2,2}, d_{2,3}}
- {d_{3,1}, d_{3,2}, d_{3,3}}
- {p_{1,1}, p_{1,2}, p_{1,3}}
- {p_{2,1}, p_{2,2}, p_{2,3}}
- {p_{3,1}, p_{3,2}, p_{3,3}}

Each macro data block is stored as a unit on a separate storage node of the system. The computation needed to generate the 9 check data blocks depends on two factors: the number of "1" elements in each row vector of the generator matrix, and the overall distribution of "1" elements across the row vectors. The 0-1 pattern in each row of G determines the generation rule of one coded data block. The file data blocks at the positions where that row of G holds "1" are summed modulo 2 (XOR between data blocks), and the result is the coded data block determined by that row, as shown in Fig. 2. The overall computation of the data blocks with matrix G is shown in Fig. 1.
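The per-row rule of Fig. 2 XORs together the data blocks selected by the 1s of a row. It can be sketched as follows. The sketch assumes that data blocks are equal-length Python `bytes` objects. The helper name is hypothetical. The example uses the row l_1 = [0 1 0 1 0 0 1 0 1], which is given for this embodiment, together with made-up one-byte blocks.

```python
def encode_check_blocks(G, data_blocks):
    """Direct encoding: for each row of G, XOR together the data blocks
    at the positions where that row holds a 1 (mod-2 sum of blocks)."""
    size = len(data_blocks[0])
    checks = []
    for row in G:
        acc = bytearray(size)  # start from an all-zero block
        for bit, block in zip(row, data_blocks):
            if bit:
                for t in range(size):
                    acc[t] ^= block[t]
        checks.append(bytes(acc))
    return checks


# Nine one-byte stand-ins for d_{1,1} ... d_{3,3} (made-up values 1..9).
data = [bytes([v]) for v in range(1, 10)]
l1 = [0, 1, 0, 1, 0, 0, 1, 0, 1]
print(encode_check_blocks([l1], data))  # [b'\x08']
```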

To describe the calculating process of the cataloged procedure optimized algorithm on { 0,1 } symbol field in detail, the present invention is given and entered by two The low amount of calculation optimization process that system (6,3,4) vandermonde systematic code determines：

The vectors l_1, l_2, ..., l_9 are determined from the submatrix V' of the coding matrix G of the (6,3,4) Vandermonde systematic code over {0,1}. The relationship table of the coding vectors is then determined according to the first steps (A and B) of the optimization algorithm.

The table uses two kinds of entries. In an entry l_a(b), a is the index of the row vector in matrix V', and b is the number of XORs needed to generate the check bit directly from row vector a. In an entry (e/d), e is the number of positions where two row vectors of V' have identical elements, and d is the number of positions where they differ.

The first row vector of the coding matrix V' is [0 1 0 1 0 0 1 0 1], so it appears in the table as l_1(3). The second row vector appears as l_2(4). The 6th element of l_1 equals the 6th element of l_2, and the 7th element of l_1 equals the 7th element of l_2. All other elements differ. Vectors l_1 and l_2 therefore agree in 2 positions and differ in 7, which is written (2/7) in the table. The coding vector relationship table is shown in Table 1:

Table 1: Coding vector relationship table
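The two entries quoted above can be checked directly. The text does not print l_2 itself. However, it states that l_2 agrees with l_1 only at positions 6 and 7, and that fixes l_2 as the complement of l_1 everywhere else. The sketch below reconstructs l_2 on that basis and recomputes l_1(3), l_2(4) and (2/7). The variable and function names are illustrative.

```python
l1 = [0, 1, 0, 1, 0, 0, 1, 0, 1]
# l_2 agrees with l_1 only at positions 6 and 7 (1-based), so every
# other position is the complement of l_1's bit.
l2 = [b if pos in (6, 7) else 1 - b for pos, b in enumerate(l1, start=1)]


def xor_cost(row):
    """The b in l_a(b): XORs for a direct computation (number of 1s minus 1)."""
    return max(sum(row) - 1, 0)


e = sum(x == y for x, y in zip(l1, l2))  # identical positions
d = len(l1) - e                          # differing positions
print(l2)                                                    # [1, 0, 1, 0, 1, 0, 1, 1, 0]
print(f"l_1({xor_cost(l1)}) l_2({xor_cost(l2)}) ({e}/{d})")  # l_1(3) l_2(4) (2/7)
```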

As shown in Table 1, suppose check data are computed directly from the vectors of the coding matrix. Each check data bit determined by vector l_1 then needs on average 3 XORs, and each check data bit determined by l_2 needs 4. Table 1 also shows that computing check bits directly from vector l_3 takes fewer XORs than deriving them from any other check bits. The check bit p_{1,3} generated by l_3 is therefore computed directly from its row vector.

Likewise, the check bits corresponding to l_1, l_7 and l_8 are generated directly by XOR over their own row vectors.

Next, the check blocks obtained from l_1, l_3, l_7 and l_8 are used to compute the check data blocks for the remaining vectors. Table 1 gives the identical-element and differing-element counts between l_1 and the other vectors. By those counts, the check block determined by l_1 could be used to compute the check data block determined by l_9. However, computing that block from the check data blocks determined by l_3 or l_6 needs fewer XOR operations. The check data block determined by l_1 therefore does not serve as a base check data block for any other vector. Likewise, l_3 and l_7 do not serve as base check data blocks for the remaining check data blocks.

The check data block obtained from l_8 needs on average only three XORs to yield the check data block determined by l_4. The check block determined by l_8 is therefore used to compute the check data block determined by l_4.

In the same way, the check data block determined by l_6 can be computed.

Continuing the search in sequence, the check data block determined by l_5 is obtained from the check data block determined by l_4 after three XORs.

The check data block determined by l_6 then participates in computing the check data blocks determined by l_2 and l_9, respectively.

At this point, all check data blocks generated by the coding matrix have been computed. The overall calculation flow is shown in Fig. 3.
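Reuse rests on a simple identity. Both p_j and p_k are mod-2 sums of data blocks. Therefore p_j XOR p_k is the XOR of the data blocks at the positions where l_j and l_k differ, and p_k can be derived from p_j with d XORs. Reuse only pays off when d is smaller than the direct cost.

The sketch below checks this identity on l_1 and on l_2 = [1 0 1 0 1 0 1 1 0]. That row follows from the statement that l_2 agrees with l_1 only at positions 6 and 7. The helper names and the 2-byte blocks are hypothetical. Deriving l_2's block from l_1's costs 7 XORs, against 4 for direct computation, which is why l_1 is not used as the base for l_2.

```python
def xor_blocks(blocks, size):
    """XOR a list of equal-length byte blocks together."""
    acc = bytearray(size)
    for blk in blocks:
        for t in range(size):
            acc[t] ^= blk[t]
    return bytes(acc)


def direct(row, data):
    """Check block computed straight from the data blocks selected by row."""
    return xor_blocks([data[i] for i, bit in enumerate(row) if bit], len(data[0]))


def derive(p_j, row_j, row_k, data):
    """p_k = p_j XOR (data blocks at every position where row_j and row_k
    differ). Returns the block and the number of block-XORs used (d)."""
    diff = [data[i] for i, (a, b) in enumerate(zip(row_j, row_k)) if a != b]
    return xor_blocks([p_j] + diff, len(p_j)), len(diff)


l1 = [0, 1, 0, 1, 0, 0, 1, 0, 1]
l2 = [1, 0, 1, 0, 1, 0, 1, 1, 0]  # implied by "agrees only at positions 6 and 7"
data = [bytes([i, 3 * i]) for i in range(1, 10)]  # hypothetical 2-byte blocks
p1 = direct(l1, data)
p2, cost = derive(p1, l1, l2, data)
print(p2 == direct(l2, data), cost)  # True 7
```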

Consider the coding matrix constructed by the method above. Computing the complete set of check blocks directly from the original blocks requires 38 XOR operations. The optimized calculation method proposed here needs only 26 XOR operations, which saves 31.58% (12/38) of the total operations. After optimization, generating the 9 check data blocks takes 26 XOR operations, so each check data block needs on average 26/9 ≈ 2.89 XORs. The improved algorithm thus substantially reduces the computation required to produce check data, which saves CPU computation.
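A quick check of the savings arithmetic from the embodiment follows. The inputs are the figures stated above: 38 XORs for direct computation, 26 XORs after optimization, and 9 check data blocks. Twelve saved out of 38 is 31.58% when rounded to two decimals, and 26/9 rounds to 2.89.

```python
direct_xors, optimized_xors, check_blocks = 38, 26, 9
saving = (direct_xors - optimized_xors) / direct_xors
per_block = optimized_xors / check_blocks
print(f"saving: {saving:.2%}")                   # saving: 31.58%
print(f"XORs per check block: {per_block:.2f}")  # XORs per check block: 2.89
```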

Claims

1. A method for reducing the amount of calculation in a binary coding storage system, characterized by comprising the following steps:

(1) Let G_{r·m} be an arbitrary binary coding matrix determined by "0" and "1", that is, a binary matrix composed of "0" and "1" elements. The matrix is used to generate redundant data and can be written as:

G_{r \cdot m} = \begin{bmatrix} l_{1} \\ l_{2} \\ \vdots \\ l_{r \cdot m} \end{bmatrix}

(2) For each row vector l_1, l_2, ..., l_{r·m} of the binary coding matrix, count the number of "1"s. This count determines the number of XOR operations required to compute a check bit from that vector. Also compute the number of bit positions in which any two vectors l_a and l_b differ;

(3) If vector l_a contains k elements equal to "1", then generating redundant data from that vector requires k-1 XOR operations;

The optimization flow for encoding an original file with the whole coding matrix G_{r·m} is as follows:

A: Use the number of "1"s in each row vector of the coding matrix G_{r·m} to determine the number of XORs needed to compute a check bit from that row vector. Let k be the number of "1"s in the row vector. Computing the check bits from that row vector then takes (k-1)m XORs, where m is the size of each original data block that participates in the check calculation;

B: Compare every pair of row vectors in the coding matrix. Count the positions where their elements are identical and the positions where they differ, and write the result as (e/d). Here e is the number of positions with identical elements in the two vectors, and d is the number of positions with different elements;

C: Consider a row vector l_i (1 ≤ i ≤ r·m). If the number of XORs it requires is less than or equal to its differing-bit count d from step B, relative to the other row vectors, then the check data block for that row is computed directly from the vector. The vector is then denoted l_j;

D: Starting from the vector l_j determined in step C, use the ratio of identical bits to differing bits from step B to determine the next row vector to compute. Suppose a row vector l_k differs from l_j in fewer positions than it agrees with it. Suppose also that the differing-bit count between l_k and l_j is the smallest among the remaining vectors. Then the check data determined by l_k are computed from the check data already computed for l_j;

E: If some check bits have still not been computed, apply the rule of step D with l_k as the base vector. Find the next vector to be computed, and return to step D;

F: Determine whether the complete check-bit calculation process has been established. If so, save the sequence of check-bit calculations. If not, compute according to the original correspondence.