CN107153588A - data encoding storage method - Google Patents
- Publication number: CN107153588A
- Application number: CN201710331789.2A
- Authority
- CN
- China
- Prior art keywords
- file
- data
- block
- node
- piecemeal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1453—Management of the data involved in backup or backup restore using de-duplication of the data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data encoding and storage method comprising: encoding file data, then dividing it into blocks and encapsulating them; storing the encapsulated data on different nodes of a mirror subset of a video data storage system; and extending the mirror subsets according to storage capacity demand. The method recovers data while using as little of the storage node cluster's internal network bandwidth and computing capacity as possible, achieving high data availability and improved scalability.
Description
Technical field
The present invention relates to video processing, and in particular to a data encoding and storage method.
Background
With the continuing development of information technology, data has become an increasingly valuable resource in daily life. The explosive growth of data inevitably brings a continuing increase in storage equipment. A modern data center may contain anywhere from tens of thousands to hundreds of thousands of storage nodes, and in a storage environment of that scale, node abnormality or failure is a common occurrence. Data also becomes inaccessible or is lost from time to time because of network access equipment or other components of a storage node. For video encoding and storage, concerns such as low-computation encoding and decoding, and recovering lost data from the smallest possible amount of data, are local-time characteristics: factors such as the storage center's network bandwidth and CPU computing capacity affect how long it takes to store a video file under a coding-redundancy strategy. If the system has high-speed bandwidth and high-performance computing capacity, a file of a given video storage unit size takes less time to store. By contrast, concerns such as higher reliability with minimal data redundancy and lower power consumption are global-time characteristics, and they directly determine the system's equipment, management, and energy costs. To meet ever-growing storage demand, users place higher requirements on the reliability, availability, and related properties of video data storage, and achieving low-redundancy, high-reliability storage has become a major challenge for the industry.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a data encoding and storage method for storing video data, characterized by comprising:
Encoding file data, then dividing it into blocks and encapsulating them;
Storing the encapsulated data on different nodes of a mirror subset of a video data storage system;
Extending the mirror subsets according to storage capacity demand.
Preferably, the video data storage system includes a coordination server, and when a storage node joins, it provides its own resource list to the coordination server.
Preferably, the hash value of the file to be uploaded is calculated and uploaded to the coordination server; the coordination server has each storage node query the value, and when the value is found to exist, the coordination server updates the reference count of the file.
Preferably, when no identical hash is detected, the file is accepted from the embedded terminal, hashes are calculated for the file's blocks, and the blocks are stored in a distributed manner across the nodes of the mirror subset.
Preferably, calculating the hash value of the file to be uploaded further comprises: dividing the file into blocks and calculating the SHA value of each block, with the hash value of the whole file serving as the file's feature signature; the feature signature of each file, together with the file path and other related information, forms metadata that is kept in memory, while the signatures of its blocks are kept on disk; only when a node in the system becomes abnormal are the block signatures read into memory, so that data recovered by the embedded terminal can be checked against them.
Preferably, the location information of each block, its hash value, and its file block identifier are stored together in a single table.
Compared with the prior art, the present invention has the following advantages:
The present invention proposes a data encoding and storage method that recovers data while using as little of the video storage node cluster's internal network bandwidth and computing capacity as possible, achieving high data availability and improved scalability.
Brief description of the drawings
Fig. 1 is a flow chart of the data encoding and storage method according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawing that illustrates the principle of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set out in the following description to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention can be practiced according to the claims without some or all of them.
One aspect of the present invention provides a data encoding and storage method. Fig. 1 is a flow chart of the data encoding and storage method according to an embodiment of the present invention.
The video data storage system of the present invention adopts a storage node subset expansion strategy. The system software architecture uses a new data recovery mode that also draws on the computing capacity of the embedded terminals, so that lost data is recovered and rebuilt using as little of the storage node cluster's internal network bandwidth and computing capacity as possible. Part of the work of recovering lost data blocks is moved to the embedded terminal. In the video data storage system, the data of a single file is encoded, divided into blocks, encapsulated, and stored evenly across different nodes of a mirror subset. The system provides an expandable storage volume; data in the volume is organized in a hierarchical directory structure, and concurrent access by multiple processes on multiple machines is supported. The system extends the mirror subsets according to storage capacity demand, using the storage capacity of each mirror subset's nodes to expand on demand.
The mirror subsets form a unified single file mapping, and a consistent encoded storage view is formed across the subsets. The storage nodes within each subset store different blocks of the same file, and the system maintains the mapping between the different blocks of a file and the storage nodes. The mirror subsets are combined into a hierarchical tree that establishes the mapping between the stored file set and the storage devices. In addition, each storage node sets aside a separate region of storage for system-specific data, which avoids inconsistencies in the file directories stored by the nodes of a mirror subset. Each storage node in the system independently maintains its own storage resources and file metadata within the subset, and can provide a file block read service on its own. When a disk in a storage node fails, the node containing that disk recovers the data blocks: the file recovery node fetches, from the blocks distributed across the subset, the minimum set of check blocks needed for reconstruction and rebuilds the lost data blocks.
If multiple storage nodes become abnormal, the file server calculates the amount of computation required to recover all files and distributes tasks to the recovery nodes so that the computational load is balanced. When a node in the check-information subset becomes abnormal, the system fetches the source file, encodes it a second time, and redeploys the result on a newly added node.
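The balanced-load task distribution described above could be approximated with a simple greedy heuristic. The sketch below is an assumption rather than the patent's stated algorithm: the function name `assign_recovery_tasks` is hypothetical, and the per-file recovery cost is taken as already estimated.

```python
import heapq

def assign_recovery_tasks(costs, nodes):
    """Greedy balance: give each task (largest first) to the least-loaded node.

    costs: dict mapping file name -> estimated recovery computation (assumed known).
    nodes: list of recovery node names.
    Returns a dict mapping node -> list of file names assigned to it.
    """
    heap = [(0, n) for n in nodes]           # (current load, node)
    heapq.heapify(heap)
    plan = {n: [] for n in nodes}
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)     # least-loaded node so far
        plan[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return plan
```

Placing the largest tasks first tends to keep the final loads close together; a real scheduler would also need to account for node health and network cost.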
Storage nodes are organized in a peer-to-peer structure. When a storage node joins the video data storage system, it provides its own resource list to the coordination server. Any node in a group can act both as a requester and as a provider of file blocks. The video data storage system decides whether to start the next mirror subset according to the amount of stored data and the storage utilization. In any mirror subset $S_k(A;B)$ ($k = 1, 2, 3, \ldots$), the storage nodes are divided into file block storage nodes and coded check block storage nodes. The file block storage nodes form the information mirror subset $S_k(A)$, whose nodes $a_{k,i} \in S_k(A)$ ($k$, $i$ positive integers) store the blocks of the original file; the coded check block storage nodes $b_{k,i}$ ($k$, $i$ positive integers) form the check-information mirror subset $S_k(B)$.
To reduce the redundancy of files within the whole system, the architecture introduces file-level deduplication and applies the deduplication strategy at the data source. When the embedded terminal software runs, it uses the SHA algorithm to calculate the hash value of the file to be uploaded and uploads the value to the coordination server. The coordination server has each storage node query the value. If the value is found, the coordination server updates the file's reference count and notifies the embedded terminal that the data is already stored. If no identical hash is detected, the file is accepted from the embedded terminal, hashes are calculated for its blocks, and the blocks are stored in a distributed manner across the nodes of the mirror subset.
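A minimal sketch of this file-level deduplication check follows. The class `CoordinationServer`, its `offer` method, and the choice of SHA-256 are illustrative assumptions; the text specifies only "SHA" and a per-file reference count.

```python
import hashlib

class CoordinationServer:
    """Toy stand-in for the coordination server's file-level dedup index."""

    def __init__(self):
        self.ref_count = {}   # file hash -> reference count

    def offer(self, file_hash):
        """Return True if the file must be uploaded, False if already stored."""
        if file_hash in self.ref_count:
            self.ref_count[file_hash] += 1   # duplicate: only bump the count
            return False
        self.ref_count[file_hash] = 1        # new file: caller uploads blocks
        return True

def file_hash(data: bytes) -> str:
    # SHA-256 is an assumption; the source text says only "SHA".
    return hashlib.sha256(data).hexdigest()
```

Because only the hash crosses the network before the decision is made, a duplicate file costs one small message instead of a full upload.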
File-level identical-data detection finds identical files stored under different file names as well as identical files in different directories. The system divides each file into blocks and calculates the SHA value of each block, and uses the hash value of the whole file as the file's feature signature. The feature signature of each file, together with the file path and other related information, forms metadata that is kept in memory, while the signatures of its blocks are kept on disk. Only when a node in the system becomes abnormal are the block signatures read into memory, so that data recovered by the embedded terminal can be checked against them. To speed up block lookup, the system stores each block's location, hash value, and file block identifier together in a single table.
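The signature scheme and the unified block table might look like the sketch below. The function names, the table layout, and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib

def sign_file(data: bytes, block_size: int):
    """Return (file_signature, [block_signature, ...]) for one file."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    block_sigs = [hashlib.sha256(b).hexdigest() for b in blocks]
    return hashlib.sha256(data).hexdigest(), block_sigs

def build_block_table(file_id: str, block_sigs, node_of_block):
    """One table keyed by (file_id, block index) -> (storage node, block hash)."""
    return {(file_id, i): (node_of_block[i], sig)
            for i, sig in enumerate(block_sigs)}
```

Keeping location and hash in the same row means one lookup both finds a block and supplies the value needed to verify it after recovery.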
When a node in the system becomes abnormal and data is lost, and the embedded terminal needs to read that data, the embedded terminal downloads some of the original data blocks and some of the check blocks and rebuilds the lost data blocks locally. After recovery, while it uses the file, the embedded terminal sends the reconstructed data blocks back to the storage system once they pass data verification.
Because the data storage nodes in each mirror subset all hold the same directory information, the metadata file of the mirror subset is first divided evenly into blocks and stored across the storage nodes. When the embedded terminal needs metadata, each storage node of the mirror subset queries the metadata it stores. A metadata query is thus converted into separate queries by the storage nodes on their metadata sub-blocks.
When the embedded terminal sends a file storage request, the file server generates metadata according to the running status of the nodes in subset $S_k(A;B)$ and sends an instruction to an idle node in $S_k(A;B)$; the embedded terminal then exchanges data directly with that storage node. The node divides the file into blocks, encodes them to produce check data, and encapsulates the result according to the data encapsulation format. The original blocks of the file are sent to subset $S_k(A)$, and the check blocks are distributed across subset $S_k(B)$. The data blocks and check blocks of each file thus span all the storage nodes in the subset, and every node has the same file storage view. When the storage space in mirror subset $S_k(A;B)$ is about to be used up, the system starts the next mirror subset $S_{k+1}(A;B)$. To increase the utilization of the cluster's computing resources, the file server can also assign nodes in $S_1(B), S_2(B), \ldots, S_{k+1}(B)$ to perform coding computation for files stored subsequently.
When a file needs to be read, the embedded terminal requests the source data directly from the file server, and the file server sends the embedded terminal the mapping and matching information between the file to be read, the mirror subset, and the storage nodes. After receiving the response, the embedded terminal first converts the required file name and byte offset into a block index within the file, sends the storage nodes a request containing the file name and index, and sets up the file read directly with the target storage nodes. The embedded terminal sends each storage node in the file's mirror subset a request specifying the file block and the data range within the block. If check blocks stored in mirror subset $S_k(B)$ are downloaded, the original file is obtained by data reconstruction.
When the embedded terminal sends a file storage request and the storage system holds no identical data, the coordination server generates metadata according to the running status of the nodes in subset $S_k(B)$ and sends an instruction to an idle node in $S_k(B)$; the embedded terminal then exchanges the file directly with that storage node. The node divides the file into blocks, encodes them to produce check data, and encapsulates the result according to the data encapsulation format; it also calculates the hash value of each file information block and sends the hash values to the coordination server. The data blocks of the stored file are sent to subset $S_k(A)$, and the check blocks are distributed across subset $S_k(B)$. The data blocks and check blocks of each file thus span all the storage nodes in the subset, and every node has the same file storage view. When the storage space in mirror subset $S_k(A;B)$ is about to be used up, the system starts the next mirror subset $S_{k+1}(A;B)$.
When a file needs to be read, the embedded terminal requests the source data directly from the file server, and the file server sends the embedded terminal the mapping and matching information between the file to be read, the mirror subset, and the storage nodes. After receiving the response, the user first converts the required file name and byte offset into a block index within the file, sends the storage nodes a request containing the file name and index, and sets up the file read directly with the target storage nodes. The coordination server returns the index of the file blocks according to the query result, including the cluster subset holding the file and the locations of the data blocks. The embedded terminal sends each storage node in the file's mirror subset a request specifying the file block and the data range within the block, then reassembles the file blocks in order to obtain the original file.
The coordination server is responsible for distributing storage tasks in the cluster and for metadata management. It records the information of every node in the storage cluster, the subset division of the storage nodes, the directory information of stored files, the file-to-block mapping, and the hash value of each file block. It is also responsible for deciding the strategy for recovering lost file blocks, distributing recovery tasks, and managing file block migration. The system monitors node status by combining periodic heartbeats with event broadcasts. When a new node appears in the system, it broadcasts its information to the coordination server and to every storage node. Each storage node also periodically reports its current status to its corresponding file management node; if the file management node receives no heartbeat for a period of time, it considers the storage node abnormal.
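The heartbeat timeout rule can be sketched as follows. The class name `HeartbeatMonitor` and the explicit `now` parameter are illustrative assumptions; the text does not specify the timeout length or the message format.

```python
class HeartbeatMonitor:
    """Marks a node abnormal when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}   # node id -> time of last heartbeat

    def heartbeat(self, node: str, now: float):
        """Record a heartbeat (or a join broadcast) from `node` at time `now`."""
        self.last_seen[node] = now

    def abnormal_nodes(self, now: float):
        """Nodes whose last heartbeat is older than the timeout, sorted by name."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

Passing the clock value in explicitly keeps the sketch deterministic; a deployment would typically read a monotonic clock instead.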
During recovery from a node abnormality, the connections between the recovering node and the remaining nodes are tested first. When the node recovering the file has network connections of widely differing quality to the remaining nodes, it sends test packets over each link to evaluate the data paths and ranks them. It then calculates the minimum number of file blocks $p$ required for recovery and fetches the file blocks over the $p$ best network connections to recover the file. Suppose node $k$ is responsible for recovering the lost blocks of file $f$. Node $k$ sends file block read requests to the $p$ nodes it is connected to. After obtaining the required blocks, node $k$ restores the source file using the system's coding method; then, according to the identifiers of the lost nodes and file blocks, it encodes the source file a second time with the same coding method to regenerate the lost redundant blocks. It re-encapsulates them according to the file encapsulation protocol and the size of each lost block, and stores the $l$ recovered blocks of file $f$ evenly, in order, across the storage nodes. Each node sets aside a dedicated partition to hold reconstructed data blocks temporarily; after the abnormal node is replaced, the reconstructed data is placed according to the data block placement path given by the coordination server.
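Selecting the $p$ best links after probing could be sketched as below. The function name and the use of probe round-trip time as the ranking metric are assumptions; the text says only that test packets are sent and the links are ranked.

```python
def pick_best_links(latency_ms, p):
    """Rank links by measured probe latency and keep the p best.

    latency_ms: dict node -> round-trip time of the test packet (assumed metric).
    p: minimum number of file blocks needed for recovery.
    """
    if p > len(latency_ms):
        raise ValueError("not enough reachable nodes to recover the file")
    ranked = sorted(latency_ms, key=latency_ms.get)   # fastest link first
    return ranked[:p]
```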
The present invention recovers lost data blocks based on the usage demand of the embedded terminals. The large amount of computing resources scattered across embedded terminals takes part in rebuilding lost file blocks, in combination with centralized reconstruction inside the cluster, to rebuild the file blocks of abnormal nodes.
Suppose the original file is divided into $k$ blocks and encoding produces $n-k$ check blocks; then any $k$ of these $n$ blocks suffice to reconstruct the original file. When the cluster storage system runs, a rebuild threshold $k$ ($1<k<N-m$) is set. When the number of abnormal storage nodes in the cluster satisfies $k_f < k$, the cluster manager does not organize internal nodes to recover the data blocks of the abnormal nodes; instead, the required data blocks are recovered by users. When reading a file, the user downloads the remaining $n-k_f$ original data blocks and $k_f$ check blocks in the cluster together with the codeword information, reconstructs the $k_f$ abnormal data blocks, and splices the reconstructed blocks with the $n-k_f$ downloaded blocks into the original file. The embedded terminal also re-encapsulates the $k_f$ recovered data blocks in the cluster's data block encapsulation format and uploads them to the server cluster. The file server calculates the hash value of each uploaded block and compares it with the stored hash value of the original block; if they match, the block is stored, and otherwise the upload request is refused. When the number of abnormal nodes in the cluster exceeds the rebuild threshold $k$, the coordination server determines the in-cluster recovery strategy according to the data volume of the unrecovered file blocks and the running status of the cluster nodes. It calculates the computation required to rebuild all remaining file blocks, distributes tasks to the recovery nodes so that the computational load is balanced, and redeploys the recovered file blocks on storage nodes in the cluster.
In implementation, the embedded terminal first calculates the hash of the reconstructed data block and uploads the hash value to the coordination server. The coordination server holds the hash library saved when the file was first stored, and compares the uploaded value with the stored hash of the lost block. If the hash of the block reconstructed by the embedded terminal equals the hash of the lost block, the embedded terminal is allowed to upload the block. If the two differ, the rebuilt block is incorrect or the data has been maliciously tampered with, and the block is not accepted. For important data, the system performs a second check once the upload finishes: the management node recalculates the hash of the uploaded block and compares it again with the hash of the original block to detect whether the block was attacked or tampered with during upload.
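The server-side acceptance check reduces to a hash comparison, sketched below. The function name is hypothetical and SHA-256 is assumed; the same function could serve for both the pre-upload check and the second check after upload.

```python
import hashlib
import hmac

def accept_rebuilt_block(block: bytes, stored_hash: str) -> bool:
    """Accept an uploaded rebuilt block only if its hash matches the stored one."""
    digest = hashlib.sha256(block).hexdigest()   # SHA-256 is an assumption
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(digest, stored_hash)
```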
When no original block of the file is lost, the embedded terminal reads the file by downloading the original blocks directly. When the network is congested, the embedded terminal can obtain better read performance by reconstruction than by directly downloading the original data blocks. Let the file size be $M$, the number of information nodes be $k$, and the number of check nodes be $r$. Suppose that at some moment the available data read rate of an information node is $m_a$ and the available download rate of a check node is $m_b$, with $m_b > m_a$, and that the embedded terminal rebuilds a data block of size $M/k$ at rate $m_d$. If the rate condition relating these quantities holds (the inequality itself is missing from this text), the source file is obtained adaptively. The embedded terminal sends a file read request and measures its network connection to each node. When the connections to all nodes are equivalent, it reads the file directly from the original data blocks. When it finds a poor connection to a node storing an original data block, it calculates the time $t_r$ needed to obtain the file by downloading some check blocks and reconstructing, and the time $t_0$ needed to read the data block directly from the poorly connected node. When $t_r < t_0$, the file is obtained by reconstruction; when $t_r > t_0$, the ordinary download mode is used.
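The comparison of $t_r$ and $t_0$ can be illustrated with a simple timing model. The formulas below are assumptions, since the text does not define how the two times are computed: $t_0$ is taken as the time to read one block of size $M/k$ at rate $m_a$, and $t_r$ as the time to download one check block at rate $m_b$ plus the time to rebuild the block at rate $m_d$.

```python
def choose_read_mode(M, k, m_a, m_b, m_d):
    """Return (mode, t_r, t_0) under an assumed single-block timing model.

    M: file size; k: number of information nodes.
    m_a: read rate from the poorly connected information node.
    m_b: download rate from a check node; m_d: local rebuild rate.
    """
    block = M / k
    t_0 = block / m_a                  # direct read of the block
    t_r = block / m_b + block / m_d    # fetch a check block, then rebuild
    mode = "rebuild" if t_r < t_0 else "direct"
    return mode, t_r, t_0
```

For example, with `M=800`, `k=4`, `m_a=10`, `m_b=100`, and `m_d=50`, the model gives $t_0 = 20$ and $t_r = 6$, so reconstruction is chosen.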
To suit the way streaming media files are read, the present invention first divides the media file evenly into blocks and computes check blocks from them; it also replicates the first $t$ blocks of the streaming media file and stores the copies on the nodes of the storage cluster. Backup blocks are managed separately: each storage node stores backup copies of data blocks in a dedicated region of its disk. The system counts the number of times $x$ a file is read. If the read count per unit time exceeds a set value $y$, the file's data blocks are kept both replicated and coding-redundant. If the read count per unit time falls below a set value $z$, the system removes all replicated blocks of the file.
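The two-threshold replica policy might be expressed as in the sketch below. The function name and return labels are hypothetical, and the behavior for read counts between $z$ and $y$ is not stated in the text; leaving the current state unchanged is an assumption.

```python
def replication_action(reads_per_unit: int, y: int, z: int) -> str:
    """Decide what to do with a file's replicated blocks.

    reads_per_unit: read count x within the unit time.
    y: upper threshold (hot file); z: lower threshold (cold file), with z <= y.
    """
    if z > y:
        raise ValueError("expected z <= y")
    if reads_per_unit > y:
        return "keep_replicas"    # replication and coding redundancy coexist
    if reads_per_unit < z:
        return "drop_replicas"    # rely on coding redundancy only
    return "no_change"            # assumption: the text leaves this case open
```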
Nodes in the system are further divided into active storage nodes and dormant storage nodes. Active nodes store new files and serve user reads of the system's data. Preferably, the storage node subset $S_n(A)$ holding file information is set as active nodes, with disks kept active to serve the read requests of a large number of users, while the nodes of subset $S_n(B)$ holding check data are set as dormant (static) nodes, so that requests are directed only to part of the deployed storage nodes. The system uses these nodes to run distributed queries when looking up duplicate-data hash values. In addition, the file manager tracks how often each file is read; frequently read data is moved to active nodes, and rarely accessed data is moved to dormant storage nodes.
Logically, suppose the storage system has $n$ storage nodes and must provide erasure-coding fault tolerance such that any $r$ storage nodes may become abnormal. When the embedded terminal issues a file storage request, the system first divides the file into $k = n - r$ blocks and uses the Reed-Solomon encoding matrix $G$ to produce $r$ check blocks. $k$ nodes store the original blocks of the file, and the remaining $r$ nodes store the check blocks produced by the operation with $G$. The detailed process is:
Step one: When the system receives a file storage request, it divides the file directly into $m \times k$ file blocks; if the file size is not evenly divisible by $m \times k$, zeros are appended to the end of the file. The row vectors of the encoding matrix $G$ are then applied directly to the $m \times k$ data blocks, following the rule given by the positions of the "0" and "1" entries in each row of the generator matrix, to obtain the check blocks.
Step two: Denote the blocks of the original file by $D = (D_1, D_2, \ldots, D_k)^T$, where each $D_i$ is called a macro-block. $D_i$ consists of $m$ micro-blocks, and the $m$ data blocks $(d_{i,1}, d_{i,2}, \ldots, d_{i,m})^T$ of $D_i$ are called a micro-block group. Denote the generated check macro-block group by $P = (P_1, P_2, \ldots, P_r)^T$, where each check macro-block $P_i$ contains $m$ check micro-blocks. The set of original file blocks and check blocks is denoted $E = (D_1, D_2, \ldots, D_k \mid P_1, P_2, \ldots, P_r)^T$. Then $GD = E$.
The $m \times k$ data blocks of the whole file are written $d_{1,1}, d_{1,2}, \ldots, d_{1,m}, \ldots, d_{k,1}, d_{k,2}, \ldots, d_{k,m}$. Each check macro-block $P_i$ generated from the original file blocks contains $m$ check micro-blocks, so the check micro-blocks are written $p_{1,1}, p_{1,2}, \ldots, p_{1,m}, \ldots, p_{r,1}, p_{r,2}, \ldots, p_{r,m}$.
The Reed-Solomon encoding matrix is written $G = [I, V']^T$, where $I$ is the $(m \times k) \times (m \times k)$ identity matrix and $V'$ is an $(m \times r) \times (m \times k)$ matrix. Micro-block $p_{i,j}$ is generated as follows: the $m \times k$ data blocks of the file to be stored, $d_{1,1}, d_{1,2}, \ldots, d_{1,m}, \ldots, d_{k,1}, d_{k,2}, \ldots, d_{k,m}$, are arranged in order and matched one-to-one with the $mk$ elements in row $(i-1)m+j$ of $V'$. The distribution of 0s and 1s in row $(i-1)m+j$ determines the generation rule of check micro-block $p_{i,j}$: the file data blocks at the positions of all "1" elements in that row are summed modulo 2 (XORed), and the result is the check micro-block determined by that row. In this way the submatrix $V'$ of $G$ produces $rm$ check micro-blocks $p_{1,1}, p_{1,2}, \ldots, p_{1,m}, \ldots, p_{r,1}, p_{r,2}, \ldots, p_{r,m}$ for the original file, that is, $r$ check macro-blocks. The data blocks generated by the identity matrix $I$ are the original blocks of the file; splicing them together in order yields the original file.
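The row-by-row modulo-2 rule can be sketched as follows. The matrix used in the usage note is a toy 0/1 matrix chosen for illustration, not an actual Reed-Solomon-derived $V'$, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR (addition modulo 2) of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_parity(data_blocks, v_rows):
    """Compute one check micro-block per row of the binary matrix V'.

    data_blocks: list of equal-length bytes objects (the m*k data micro-blocks).
    v_rows: list of rows, each a list of 0/1 entries, one per data block.
    """
    size = len(data_blocks[0])
    parity = []
    for row in v_rows:
        acc = bytes(size)                    # all-zero accumulator
        for bit, block in zip(row, data_blocks):
            if bit:                          # only "1" positions contribute
                acc = xor_bytes(acc, block)
        parity.append(acc)
    return parity
```

With data blocks `b"\x01"`, `b"\x02"`, `b"\x04"` and rows `[1, 1, 0]` and `[0, 1, 1]`, the check blocks are `b"\x03"` and `b"\x06"`; XORing the first check block with the second data block gives back the first data block, which is the basis of reconstruction.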
The binary encoding matrix is then optimized. First, write the encoding matrix as:
$G = [I_{k \times m}, G_{r \times m}]^T$, where $G_{r \times m} = [l_{1,i}, l_{2,i}, \ldots, l_{r \times m,i}]^T$.
The number of "1" entries in each check-generating row vector $l_{1,i}, l_{2,i}, \ldots, l_{r \times m,i}$ determines the number of XOR operations needed to compute the check bit from that vector. The number of differing positions between any two vectors $l_{a,j}$ and $l_{b,j}$ is also computed. The check bit computation is optimized from these parameters as follows:
1. according to the number of " 1 " in each row vector in encoder matrix, determine and check bit institute is calculated according to the row vector
The XOR number of times needed;
2. comparing the number of the element identical bits position different from element in encoder matrix between any two row vector, it is designated as
(e/d), wherein e represents element identical position number in two vectors;D represents the different position number of element in two vectors;
If 3. row vector li(1<i<Rm the XOR number of times required for) is less than or equal in step 2 not isotopic number d, then directly
The verification data block according to corresponding to the vector calculates the row is connect, and the vector is designated as lj;
4. utilize the vectorial l determined in step 3j, according to identical digit in step 2 and not the ratio between isotopic number, determine next
Individual calculating row vector.As certain row vector lkWith vectorial ljIsotopic number is not less than identical digit, and lkWith vectorial ljNot isotopic number and its
Each remaining vector is not when isotopic number reaches minimum, then according to vectorial ljThe verification data that has calculated that is calculated by lkThe school of determination
Test data;
If not calculating check bit 5. still having, according to the computation rule in step 4, with lkBased on vector, find it is next
Vector to be calculated.
6. complete verification position calculating process is determined whether, if so, check bit calculating process successively is then preserved, if it is not, then
Calculated according to original corresponding relation.
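The flow above can be illustrated with a small greedy scheduler. This is a simplified sketch under stated assumptions, not the patent's exact procedure: data microblocks are modelled as integers so that XOR is the `^` operator, the first row computed is the one with the fewest "1" elements, and a row is derived from the previously computed row only when the two differ in fewer positions than they share and in fewer positions than the row has "1" elements. All function names are hypothetical.

```python
def xor_cost(row):
    """Step 1: the text counts the "1" elements of a row as its XOR cost."""
    return sum(row)


def same_diff(a, b):
    """Step 2: return (e, d), the counts of equal and differing positions."""
    d = sum(x != y for x, y in zip(a, b))
    return len(a) - d, d


def plan_check_order(rows):
    """Return [(row_index, base_index_or_None), ...] in calculation order."""
    remaining = set(range(len(rows)))
    first = min(remaining, key=lambda i: (xor_cost(rows[i]), i))
    plan = [(first, None)]          # computed directly from the data
    remaining.remove(first)
    base = first
    while remaining:
        # Steps 4 and 5: the remaining row closest to the current base row.
        nxt = min(remaining, key=lambda i: (same_diff(rows[i], rows[base])[1], i))
        e, d = same_diff(rows[nxt], rows[base])
        reuse = d < e and d < xor_cost(rows[nxt])
        plan.append((nxt, base if reuse else None))
        remaining.remove(nxt)
        base = nxt
    return plan


def compute_checks(rows, data, plan):
    """Evaluate the plan; derived rows start from the base row's check value."""
    checks = {}
    for idx, base in plan:
        if base is None:
            value = 0
            for bit, block in zip(rows[idx], data):
                if bit:
                    value ^= block
        else:
            value = checks[base]
            for a, b, block in zip(rows[idx], rows[base], data):
                if a != b:          # only the differing positions need an XOR
                    value ^= block
        checks[idx] = value
    return [checks[i] for i in range(len(rows))]


rows = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 0], [0, 0, 0, 1, 1, 1]]
data = [1, 2, 4, 8, 16, 32]
plan = plan_check_order(rows)
checks = compute_checks(rows, data, plan)
```

In this example row 0 differs from row 1 in a single position, so it is obtained from the check value of row 1 with one XOR instead of being recomputed from four data blocks.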
To describe the method in detail, assume that the nodes storing data blocks $D_1, D_2, \ldots, D_r$ become abnormal. The embedded terminal then obtains the original file as follows:
Step 1: From the encoding matrix $G = [I, V']^T$, directly obtain the check matrix $H = [V'^T, I_{m \cdot r}]^T$, which is used to rebuild the lost data blocks.
Step 2: From the storage nodes that are working normally, select any k storage nodes and download the k data blocks $D_{r+1}, D_{r+2}, \ldots, D_k, D_{k+1}, \ldots, D_{k+r-1}, D_{k+r}$.
Step 3: Denote the lost macroblocks $D_1, D_2, \ldots, D_r$ as $X_1, X_2, \ldots, X_r$, and let $\beta = [X_1, X_2, \ldots, X_r, D_{r+1}, \ldots, D_{k+r-1}, D_{k+r}]$, where $\beta_r = [X_1, X_2, \ldots, X_r]$ and $\beta_k = [D_{r+1}, \ldots, D_{k+r-1}, D_{k+r}]$, that is, $\beta = [\beta_r, \beta_k]$. The lost data blocks are then reconstructed from the relation $\beta H_{(k+r)r} = 0$.
Step 4: Let $H'_{r \cdot r}$ denote the submatrix of $H_{(k+r)r}$ formed by the vectors corresponding to the lost data blocks, and $H''_{k \cdot r}$ the submatrix formed by the vectors corresponding to the healthy data blocks. Then:
$\beta_{1 \times r} \cdot H'_{r \cdot r} = \beta_{1 \times k} \cdot H''_{k \cdot r}$
Here $\beta_{1 \times r}$ is unknown, and the lost data blocks $\beta_{1 \times r}$ can be solved from the formula above:
$\beta_{1 \times r} = \beta_{1 \times k} \cdot H''_{k \cdot r} (H'_{r \cdot r})^{-1}$
The resulting data blocks $[X_1, X_2, \ldots, X_r]$ are the lost data blocks $[D_1, D_2, \ldots, D_r]$.
Step 5: Combine the data blocks $[D_1, D_2, \ldots, D_r]$ with the data blocks $D_{r+1}, D_{r+2}, \ldots, D_k$ that were not lost in the system, in order, as $[D_1, D_2, \ldots, D_k]$; this combination of data blocks is the original file.
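The reconstruction in steps 3 and 4 can be tried out on a toy code over GF(2). The sketch below is an illustration under assumptions: blocks are modelled as integers combined with XOR, the check matrix is given as a list of 0/1 rows with one row per stored block, and the helper names (`gf2_inverse`, `vec_mat`, `recover`) and the example matrix are hypothetical.

```python
def gf2_inverse(m):
    """Invert a square 0/1 matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vec_mat(vec, mat):
    """Multiply a row vector of blocks by a 0/1 matrix, using XOR as addition."""
    out = []
    for c in range(len(mat[0])):
        acc = 0
        for block, row in zip(vec, mat):
            if row[c]:
                acc ^= block
        out.append(acc)
    return out


def recover(h, blocks, lost):
    """Solve beta_lost = beta_ok * H'' * (H')^-1 for the lost block indices."""
    surviving = [i for i in range(len(h)) if i not in lost]
    h_lost = [h[i] for i in lost]          # H', rows of the lost blocks
    h_ok = [h[i] for i in surviving]       # H'', rows of the healthy blocks
    rhs = vec_mat([blocks[i] for i in surviving], h_ok)
    return vec_mat(rhs, gf2_inverse(h_lost))


# Toy code with k = 3 data blocks and r = 2 check blocks.
# Rows 0..2 belong to the data blocks, rows 3..4 form the identity part.
h = [[1, 1], [1, 0], [0, 1], [1, 0], [0, 1]]
d = [0x12, 0x34, 0x56]
stored = d + [d[0] ^ d[1], d[0] ^ d[2]]    # data followed by the two checks
damaged = [None, None] + stored[2:]        # the first two blocks are lost
restored = recover(h, damaged, lost=[0, 1])
```

Over GF(2) subtraction and addition are both XOR, which is why the healthy part can be moved to the right-hand side of $\beta H = 0$ without a sign change.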
When network bandwidth in the storage system is constrained and lost data must be recovered reliably with low maintenance bandwidth, the following check-matrix-based optimized reconstruction method for lost data blocks is used. It selects the recovery matrix from $H_{(k+r)m \cdot rm}$ that needs the least reconstruction bandwidth. Specifically:
1. First count the number of "1" elements in each column vector of the check matrix $H_{(k+r)m \cdot rm}$.
2. Extract from the check matrix $H_{(k+r)m \cdot rm}$ the row vectors corresponding to the lost data blocks to form the matrix $H_{r'm \cdot rm}$. The remaining row vectors of $H_{(k+r)m \cdot rm}$ form the matrix $H_{(k+r-r')m \cdot rm}$, whose bottom rm vectors form an identity matrix. The upper part is denoted $H_{(k-r')m \cdot rm}$.
3. Determine in turn the number of "0" elements in each row vector of $H_{(k-r')m \cdot rm}$. When the number of "0" elements in a row vector is greater than or equal to r'm, record the column vectors in which those "0" elements lie. Then search, within the column vectors so identified, for a further row vector whose number of "0" elements is greater than or equal to r'm. If there is none, keep the column vectors recorded in the previous step; if there is one, determine the new column vectors. Repeat this cycle, recording the column vectors determined in each iteration.
4. After the chained search is finished, count the "1" elements in each group of column vectors, select the r'm column vectors with the fewest "1" elements, and make sure that the corresponding $H_{(r' \cdot m)(r' \cdot m)}$ has full rank, that is, the rank of the submatrix is r'm.
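The selection criterion in steps 1 and 4 (fewest "1" elements, subject to a full-rank submatrix) can be sketched as follows. This is a deliberately simplified illustration: it replaces the chained zero-element search of step 3 with a brute-force scan over column combinations, which is only practical for small matrices. The function names and the example matrix are assumptions.

```python
from itertools import combinations


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination."""
    if not matrix:
        return 0
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def pick_recovery_columns(h, lost_rows, n):
    """Pick n columns of h with the fewest "1" elements overall such that
    the rows of the lost blocks, restricted to those columns, have rank n."""
    weights = [sum(col) for col in zip(*h)]          # step 1: ones per column
    lost = [h[r] for r in lost_rows]                 # step 2: rows of lost blocks
    ordered = sorted(combinations(range(len(weights)), n),
                     key=lambda cols: sum(weights[c] for c in cols))
    for cols in ordered:
        sub = [[row[c] for c in cols] for row in lost]
        if gf2_rank(sub) == n:                       # step 4: full-rank check
            return cols
    return None


# Hypothetical 6 x 3 check matrix; rows 1 and 2 belong to the lost blocks.
h = [[1, 1, 1], [1, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
chosen = pick_recovery_columns(h, lost_rows=[1, 2], n=2)
```

Each "1" in a chosen column marks a block that takes part in that check equation, so lighter columns mean fewer blocks to download, but a lighter choice is rejected whenever it cannot determine all lost blocks.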
In a further aspect of the invention, an address index table AIT is introduced for the mirror subset as an extended addressing dimension. The address index table AIT is metadata describing the attributes of the addressing chained list ACT. The AIT divides the ACT into individually addressable logical components that can be accessed independently, so that the video data storage system with its ternary dynamic structure is capable of concurrent read and write access. Because each AIT pointer points directly at the target address location of an ACT logical component, random access is achieved quickly by comparison, without searching.
The address index table AIT is the set of addressing items AHT, that is, $AIT = \{AHT_1, \ldots, AHT_m, \ldots, AHT_M\}$,
where each $AHT_m$ has one input item and one corresponding output item. The input item is a combination of addressing-variable values, and the output item is the data index corresponding to that combination.
In the AIT of the video storage system's mirror subset, the input value of each addressing item AHT is the addressing-variable value of a group of data, that is, the logical address LA. Its output value consists of a pointer into the addressing chained list ACT corresponding to the LA value, an offset, and a data length. The ACT pointer points to the position, in the addressing chained list ACT, of the storage unit to be accessed for this group of data; the offset gives the starting address of the access within that storage unit; and the data length defines the access range. When the data length is omitted or is 0, the access runs to the end of the file. An access to the video data storage system can therefore use the logical address LA, formed from the addressing variables of the target data, to uniquely determine a position in the file-metadata addressing chained list ACT, and from that position read or write, on the storage node, the data length defined in the AHT.
Access through the address index table AIT of the video storage system's mirror subset is implemented by the following steps:
1. Look up the metadata address index table AIT with the addressing-variable value of the target data to obtain an addressing chained list ACT pointer, an offset, and a data length;
2. Use the ACT pointer to obtain the position, in the addressing chained list ACT, of the storage unit to be read or written for this group of data; use the offset to obtain the starting address of the read or write within that storage unit; and use the data length to obtain the read/write range. Perform the read or write of this group of data according to the position, the starting address, and the range;
3. When multiple threads read data, or when multiple threads write data without modifying the ACT pointer, the offset, or the data length, so that no new addressing chained list is generated, each thread performs steps (1) and (2) on its own, thereby achieving concurrent read and write operations on multiple groups of data;
4. When multiple threads write data in a way that modifies the ACT pointer, the offset, or the data length, the access steps are as follows:
(4-1) If the write by multiple threads does not modify the ACT pointer, write the data into the storage unit starting at the newly given offset; when the data length needs updating, calculate the new data length, and record the new offset and the new data length in the output item of the AHT;
(4-2) If the write by multiple threads modifies the ACT pointer, the access enters the flow that generates the address index table AIT.
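Steps 1, 2 and (4-1) above can be sketched with ordinary Python containers. This is a minimal single-threaded illustration under assumptions: the AIT is modelled as a dictionary keyed by logical address, the ACT as a list of byte buffers, and the ACT pointer as a list index. The class name `AddressingItem`, the tuple-shaped logical addresses and the sample data are hypothetical, and the end of a storage unit stands in for the end of the file.

```python
from dataclasses import dataclass


@dataclass
class AddressingItem:
    """Output item of one AHT entry."""
    act_pointer: int   # position of the storage unit in the ACT
    offset: int        # starting address inside the storage unit
    length: int = 0    # 0 (or omitted) means "read until the end"


def read(ait, act, logical_address):
    """Steps 1 and 2: look up the AHT entry, then read the addressed range."""
    item = ait[logical_address]
    unit = act[item.act_pointer]
    end = len(unit) if item.length == 0 else item.offset + item.length
    return bytes(unit[item.offset:end])


def write(ait, act, logical_address, data, new_offset):
    """Step (4-1): the ACT pointer is unchanged; write at the new offset and
    record the new offset and data length in the AHT output item."""
    item = ait[logical_address]
    unit = act[item.act_pointer]
    unit[new_offset:new_offset + len(data)] = data
    item.offset = new_offset
    item.length = len(data)


act = [bytearray(b"frame-0-data"), bytearray(b"frame-1-data")]
ait = {
    ("cam1", 0): AddressingItem(act_pointer=0, offset=6, length=1),
    ("cam1", 1): AddressingItem(act_pointer=1, offset=8),
}
first = read(ait, act, ("cam1", 0))     # one byte at offset 6
tail = read(ait, act, ("cam1", 1))      # length 0: read to the end
write(ait, act, ("cam1", 0), b"XY", new_offset=0)
updated = read(ait, act, ("cam1", 0))
```

Since the lookup is a direct key access rather than a traversal of the list, independent readers only touch their own entry, which is the property step 3 relies on for concurrent access; a real multi-threaded version would still need to guard the update in `write`.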
On the basis of the video data storage structure of the invention, the encoder applies fast motion estimation to the video frame data. Briefly: in addition to block motion search over the whole frame and motion search within a limited range, the search handles both large-range motion and small-range motion. The large-range search mode is iterative, with the position found by the previous search used as the starting point of the next search. When the search result meets a certain condition, namely that the previous search result and the next search result are identical, a small-range search is carried out starting from that position. The result of the small-range search is taken as the final result.
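The two-stage search just described can be sketched as follows. This is an illustration under assumptions: the matching cost is passed in as a function of the candidate offset (in practice it would be a block difference measure), the 3 × 3 candidate pattern and the step sizes 2 and 1 are hypothetical choices, and the iteration limit of 4 follows the value given later in the text.

```python
def fast_motion_search(cost, max_iterations=4, large_step=2, small_step=1):
    """Iterative large-range search followed by one small-range search.

    cost: function taking an offset (mx, my) and returning a matching cost.
    Returns (final_offset, converged).
    """
    def best_around(center, step):
        cx, cy = center
        candidates = [(cx + dx, cy + dy)
                      for dx in (-step, 0, step)
                      for dy in (-step, 0, step)]
        return min(candidates, key=cost)

    position = (0, 0)
    converged = False
    for _ in range(max_iterations):
        nxt = best_around(position, large_step)
        if nxt == position:      # same result twice in a row: converged
            converged = True
            break
        position = nxt           # previous result is the next starting point
    # Small-range refinement around the large-range result.
    return best_around(position, small_step), converged


# Synthetic cost with its minimum near (5, -2), used only for demonstration.
def demo_cost(v):
    return (v[0] - 4.6) ** 2 + (v[1] + 2) ** 2


vector, converged = fast_motion_search(demo_cost)
```

With the synthetic cost, the large-range stage moves from (0, 0) to (2, -2) to (4, -2), stops when the result repeats, and the small-range stage then refines the offset to (5, -2).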
A block classification function is added at the encoder: the invention divides the blocks within a frame into skip blocks and direct blocks. For a skip block the motion vector is 0 and the actual residual is close to 0, so only mode information is transmitted, not the motion vector or residual information. A skip block is identified as follows:
$D_m = \sum_{(i,j) \in \mathrm{block}_m} |X_{(i,j)} - Y_{(i,j)}| / N$
where $X_{(i,j)}$ is the pixel at (i, j) of the block at position m in the frame, $Y_{(i,j)}$ is the corresponding pixel in the reference frame, and N is the number of pixels in the block. When the result $D_m$ is less than a preset threshold, the block is set as a skip block, and only mode information is transmitted to the decoder.
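The skip-block test is a mean absolute difference compared against a threshold. A minimal sketch, assuming blocks are given as equally sized lists of pixel rows; the function names and the sample values are illustrative:

```python
def mean_abs_diff(block, ref_block):
    """D_m: sum of |X(i,j) - Y(i,j)| over the block, divided by N."""
    n = sum(len(row) for row in block)
    total = sum(abs(x - y)
                for row, ref_row in zip(block, ref_block)
                for x, y in zip(row, ref_row))
    return total / n


def is_skip_block(block, ref_block, threshold):
    """A block is a skip block when D_m is below the preset threshold."""
    return mean_abs_diff(block, ref_block) < threshold


current = [[10, 12], [14, 16]]
reference = [[10, 11], [15, 16]]
d_m = mean_abs_diff(current, reference)          # (0 + 1 + 1 + 0) / 4
skip = is_skip_block(current, reference, threshold=1.0)
```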
Residuals are computed using the decoded key frame as the reference frame; the decoded key frame is also used later to generate side information. Among the remaining blocks of the frame, the blocks belonging to direct mode are determined next. The residual of this type of block is close to 0, and only mode information and motion vector information are transmitted. To reduce encoder complexity, a fast motion block search algorithm can be used.
The maximum number of iterations of the large-range motion search is set to 4 and the number of small-range searches to 1, which corresponds to a maximum horizontal or vertical distance of (0,7) or (7,0) and to a fixed-length code of 3 bits. If the large-range vector search converges successfully, the resulting motion residual is compared with a threshold, which is the same as the skip-mode threshold. When the residual does not exceed the threshold, the block is determined to be a direct block, and mode information and motion vector information must be transmitted to the decoder.
When the information for skip blocks and direct blocks is transmitted, if the two are coded together, the skip block is represented by the zero motion vector (0,0). Motion vector information can be transmitted with block coding or index coding. The calculation flow is as follows:
Step 1. Encode the motion vector information with block coding and with index coding respectively, and take the smaller code-word length in K as rate1;
Step 2. Merge skip mode and direct mode into one class; the bit rate is then:
mode1 = mode(skip mode) ∪ mode(direct mode)
mode2 = mode(normal mode)
rate2 = ent(mode1, mode2) * code_length + 2 * num(mode(skip mode))
where the mode information mode() denotes the mode corresponding to the block type, ent() computes the entropy of the corresponding information, and code_length is the length of the code word to be encoded.
Step 3. The total bit rate is the sum of the two rates above:
total_rate = rate1 + rate2
At the decoder, once the corresponding mode information and motion vector information have been obtained, the skip blocks and direct blocks are reconstructed. For a skip block, the block at the same position in the previous reference frame is used directly as the final reconstructed block. For a direct block, the motion vector is used and the corresponding motion-compensated block is taken as the final reconstructed block. For the remaining blocks, which are in normal mode, side information and residual information must be generated at the decoder.
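The two reconstruction rules for skip and direct blocks can be sketched as follows. The sketch assumes frames are lists of pixel rows, that the first component of the motion vector is horizontal and the second vertical, and that the displaced block stays inside the frame; the function names and the mode strings are hypothetical.

```python
def extract_block(frame, top, left, size):
    """Copy a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def reconstruct_block(mode, reference, top, left, size, motion_vector=(0, 0)):
    """Rebuild a skip or direct block from the previous reference frame."""
    if mode == "skip":
        # Same position in the reference frame, no motion vector needed.
        return extract_block(reference, top, left, size)
    if mode == "direct":
        # Motion-compensated block: shift the position by the motion vector.
        vx, vy = motion_vector
        return extract_block(reference, top + vy, left + vx, size)
    raise ValueError("normal-mode blocks need side and residual information")


reference = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
skip_block = reconstruct_block("skip", reference, top=1, left=1, size=2)
direct_block = reconstruct_block("direct", reference, top=1, left=1, size=2,
                                 motion_vector=(1, -1))
```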
Generating side information from the decoded key frame comprises the following process:
Step 1. Obtain the initial motion vector. A parallel motion estimation algorithm is first used to compute the parallel motion vector. The motion matching search is expressed as:
$(v_x, v_y) = \arg\min_{m_x, m_y} \left( D(m_x, m_y) \cdot \left(1 + 0.05\,(m_x^2 + m_y^2)^{1/2}\right) \right)$
$(v_x, v_y) = \pm (v_x/2, v_y/2)$
where $x_{(i,j)}$ denotes a pixel of the reference block and $y_{(i+m_x, j+m_y)}$ denotes a pixel of the motion search block in the other frame. $\|m\|$ is the 0-norm and denotes the size of block m. The final motion estimation vector is half of the motion vector obtained from the first formula, with its direction either reversed or unchanged relative to the original direction of the estimate.
After the motion vectors of the blocks at corresponding positions in the preceding and following frames have been obtained, the two are converted into vectors of the same direction, that is, one of the motion vectors is negated. The two motion vectors are then averaged to obtain the initial motion vector for the bidirectional motion search.
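Step 1 can be illustrated with a short sketch. It assumes that the matching cost D is supplied as a function of the candidate offset, that the search covers a square range of integer offsets, and that the backward vector is the one that gets negated before averaging; the function names and the synthetic cost are hypothetical.

```python
import math


def penalized_search(cost, search_range):
    """Minimize D(mx, my) * (1 + 0.05 * sqrt(mx^2 + my^2)) over a square range."""
    candidates = [(mx, my)
                  for mx in range(-search_range, search_range + 1)
                  for my in range(-search_range, search_range + 1)]
    return min(candidates,
               key=lambda v: cost(v[0], v[1]) * (1 + 0.05 * math.hypot(v[0], v[1])))


def initial_bidirectional_vector(forward, backward):
    """Halve both vectors, negate the backward one, and average the pair."""
    half_fwd = (forward[0] / 2, forward[1] / 2)
    half_bwd = (-backward[0] / 2, -backward[1] / 2)
    return ((half_fwd[0] + half_bwd[0]) / 2, (half_fwd[1] + half_bwd[1]) / 2)


# Synthetic matching cost with its minimum at offset (2, 0).
def demo_cost(mx, my):
    return abs(mx - 2) + abs(my) + 1


best = penalized_search(demo_cost, search_range=3)
start = initial_bidirectional_vector(forward=(4, 2), backward=(-6, -2))
```

The distance penalty makes the search prefer shorter vectors when two candidates match almost equally well, which tends to give a smoother motion field.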
Step 2. For each block, take the average vector from step 1 as the initial vector, and assume that the block moves in uniform linear motion over a short time, that is, the motion vectors of the block in the preceding and following frames are equal in magnitude and opposite in direction. A bidirectional motion search is carried out within a preset range of the initial vector. First, centered on the initial vector, the search range is set to -3 to 3; if the two motion vectors from step 1 differ by more than 5 in either direction, the search range in that direction is expanded to -5 to 5. If the number of positions actually searched in this range is below a threshold, the search continues centered on the zero vector with a search range of -6 to 6, and the minimum of the two is taken as the motion search result. The residual of the motion search result is computed by the following formula:
$(v_x, v_y) = \arg\min_{m_x, m_y} D(m_x, m_y)$
where x and y denote the preceding and following frames.
When the computed sum of absolute residuals reaches its minimum, the motion vector result of the bidirectional motion estimation is obtained. From this motion vector, the corresponding side-information reference block sideblock, the residual estimation block residentblock, and the residual information resident are obtained:
$\mathrm{resident} = \min(D(m_x, m_y))$
$\mathrm{sideblock}_{(i,j)} = (x_{(i-v_x, j-v_y)} + y_{(i+v_x, j+v_y)})/2$
$\mathrm{residentblock}_{(i,j)} = (x_{(i-v_x, j-v_y)} - y_{(i+v_x, j+v_y)})/2$
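The sideblock and residentblock formulas translate directly into code. The sketch below assumes that x and y are the two neighbouring frames stored as lists of pixel rows, indexed as `frame[i][j]` exactly as in the formulas, and that all displaced positions stay inside the frames; the function name and the sample frames are illustrative.

```python
def side_information(x, y, top, left, size, v):
    """Return (sideblock, residentblock) for one block and motion vector v.

    sideblock(i,j)     = (x(i-vx, j-vy) + y(i+vx, j+vy)) / 2
    residentblock(i,j) = (x(i-vx, j-vy) - y(i+vx, j+vy)) / 2
    """
    vx, vy = v
    side, resident = [], []
    for i in range(top, top + size):
        side_row, resident_row = [], []
        for j in range(left, left + size):
            a = x[i - vx][j - vy]     # pixel displaced backwards in frame x
            b = y[i + vx][j + vy]     # pixel displaced forwards in frame y
            side_row.append((a + b) / 2)
            resident_row.append((a - b) / 2)
        side.append(side_row)
        resident.append(resident_row)
    return side, resident


# Two synthetic 4 x 4 frames: pixel value 10*i + j, the second one brighter by 4.
frame_x = [[10 * i + j for j in range(4)] for i in range(4)]
frame_y = [[10 * i + j + 4 for j in range(4)] for i in range(4)]
side, resident = side_information(frame_x, frame_y, top=1, left=1, size=2, v=(1, 0))
```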
Step 3. The motion vector estimate obtained in step 2 is processed further. When the magnitude of the motion vector exceeds a certain threshold, bidirectional parallel motion estimation compensation is performed. Using the bidirectional parallel motion estimation vectors computed in step 1, four motion compensation blocks are obtained. If all four lie within the image display range and the distance between the motion vectors of the same direction is less than a preset range, the side information block and the residual block of the block can be computed by:
$\mathrm{sideblock} = (\mathrm{block}_1 + \mathrm{block}_2 + \mathrm{block}_3 + \mathrm{block}_4)/4$
$\mathrm{residentblock} = (\mathrm{block}_1 + \mathrm{block}_2 - \mathrm{block}_3 - \mathrm{block}_4)/4$
where $\mathrm{block}_1$ and $\mathrm{block}_2$ belong to the frame preceding the current frame, and $\mathrm{block}_3$ and $\mathrm{block}_4$ belong to the following frame.
Step 4. For a block on a motion edge whose residual value exceeds the threshold, proceed as follows. First, take either of the two parallel motion estimation vectors obtained in step 1. If, in the direction opposite to the initial motion estimate, the position estimated by the vector lies beyond the image boundary, and the estimation residual of the motion vector in this direction is smaller than the estimation residual of the motion vector in the opposite direction, then the corresponding motion compensation block is represented by the parallel motion compensation block obtained in the estimation direction. In this case compensation uses a unidirectional search. If this first condition is not satisfied, the other motion vector is processed. The resulting motion compensation blocks are combined by weighted averaging.
In summary, the invention proposes a data encoding storage method that achieves data recovery while using as little as possible of the internal network bandwidth and computing capacity of the video storage node cluster, and that improves scalability while achieving high data availability.
Obviously, those skilled in the art should understand that the modules or steps of the invention described above can be implemented with a general-purpose computing system. They can be concentrated in a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. The invention is therefore not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the invention serve only to illustrate or explain the principles of the invention and do not limit it. Any modification, equivalent substitution, or improvement made without departing from the spirit and scope of the invention shall therefore fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within the equivalents of such scope and boundaries.
Claims (6)
1. A data encoding storage method for storing video data, characterized by comprising:
encoding file data and then encapsulating it into blocks;
storing the encapsulated data on different nodes of a mirror subset of a video data storage system;
extending the mirror subset according to the demand for storage capacity.
2. The method according to claim 1, characterized in that the video data storage system includes a coordination server, and when a storage node joins, it provides its own resource list to the coordination server.
3. The method according to claim 2, further comprising: calculating the hash value of a file to be uploaded and uploading the value to the coordination server; the coordination server coordinates the storage nodes to query for the value, and when the value is found to exist, the coordination server updates the reference count of the file.
4. The method according to claim 3, further comprising: when no identical hash is detected, the embedded terminal receives the file, computes hashes for the blocks of the file, and stores the blocks in a distributed manner on the nodes of the mirror subset.
5. The method according to claim 3, wherein calculating the hash value of the file to be uploaded further comprises: dividing the file into blocks and calculating the SHA value of each block, with the hash value of the whole file serving as the characteristic signature of the file; the characteristic signature of each file, together with the file path and other related information, forms the metadata and is kept in memory, while the signatures of its blocks are kept on disk; only when a node in the system is abnormal are the block signatures read into memory, so that the embedded terminal can verify and compare the data after recovering the lost data.
6. The method according to claim 5, characterized in that the position information of each block, its hash value, and the file block identifier are stored together in one table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710331789.2A CN107153588A (en) | 2017-05-12 | 2017-05-12 | data encoding storage method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107153588A true CN107153588A (en) | 2017-09-12 |
Family
ID=59792788
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710331789.2A Pending CN107153588A (en) | 2017-05-12 | 2017-05-12 | data encoding storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107153588A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170912 |