CN107153588A - data encoding storage method - Google Patents


Info

Publication number
CN107153588A
Authority
CN
China
Prior art keywords
file
data
block
node
piecemeal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710331789.2A
Other languages
Chinese (zh)
Inventor
许荣福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Excellent Information Technology Co Ltd
Original Assignee
Chengdu Excellent Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Excellent Information Technology Co Ltd
Priority to CN201710331789.2A
Publication of CN107153588A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data encoding storage method comprising: encoding file data and encapsulating it in blocks; storing the encapsulated data on different nodes of a mirror subset of a video data storage system; and extending the mirror subsets according to storage capacity demand. The method performs data recovery while using as little of the internal network bandwidth and computing capacity of the video storage node set as possible, achieving high data availability and improved scalability.

Description

Data encoding storage method
Technical field
The present invention relates to video processing, and more particularly to a data encoding storage method.
Background art
With the continuing development of information technology, data has become a valuable resource in daily life. Explosive data growth inevitably brings a continuing increase in storage devices. At present, a modern data center contains from tens of thousands to hundreds of thousands of storage nodes, and in a storage environment of that size, node anomalies and failures have become commonplace; data also becomes inaccessible or is lost from time to time because of faults in network access devices or other storage node components. For video coding and storage, low-computation encoding and decoding, and recovering lost data from the smallest possible amount of data, are local time characteristics: when video files are stored with a coding redundancy strategy, factors such as storage center network bandwidth and CPU computing capacity affect file storage time. If the system has high-speed bandwidth and high-performance computing capacity, storing a file of video storage unit size takes less time. High reliability, minimal data redundancy, and low system power consumption are, by contrast, global time characteristics, which directly determine the equipment, management, and energy costs of the system. To meet ever-growing storage demand, higher requirements are placed on the reliability, availability, and related properties of video data storage, and achieving low-redundancy, high-reliability data storage has become a major challenge for the industry.
Summary of the invention
To solve the problems of the prior art described above, the present invention proposes a data encoding storage method for storing video data, characterized in that it comprises:
encoding file data and encapsulating it in blocks;
storing the encapsulated data on different nodes of a mirror subset of a video data storage system;
extending the mirror subsets according to storage capacity demand.
Preferably, the video data storage system includes a coordination server, and when a storage node joins, it provides its own resource list to the coordination server.
Preferably, the hash value of the file to be uploaded is calculated and uploaded to the coordination server; the coordination server coordinates the storage nodes to query the value, and when the value is found to exist, the coordination server updates the reference count of the file.
Preferably, when no identical hash is detected, the file is accepted from the embedded terminal, hashes are computed for the blocks of the file, and the blocks are stored in a distributed manner on the nodes of the mirror subset.
Preferably, calculating the hash value of the file to be uploaded further comprises: dividing the file into blocks and calculating the SHA value of each block, and using the hash value of the whole file as the characteristic signature of the file; keeping the characteristic signature of each file, together with the file path and other related information, in memory as metadata, and keeping the signatures of its blocks on disk; and reading the block signatures into memory only when a node in the system is abnormal, so that they can be checked against the lost data after the embedded terminal recovers it.
Preferably, the location of each block, its hash value, and its file block identifier are stored together in a single table.
Compared with the prior art, the present invention has the following advantages:
The present invention proposes a data encoding storage method that performs data recovery while using as little of the internal network bandwidth and computing capacity of the video storage node set as possible, achieving high data availability and improved scalability.
Brief description of the drawings
Fig. 1 is a flow chart of the data encoding storage method according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawing that illustrates the principle of the invention. The invention is described in connection with these embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set out in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of them.
One aspect of the present invention provides a data encoding storage method. Fig. 1 is a flow chart of the data encoding storage method according to an embodiment of the present invention.
The video data storage system of the invention adopts a storage node subset expansion strategy. The software architecture uses a new data recovery mode that also draws on the computing capacity of the embedded terminals, so that the system uses as little of the storage node set's internal network bandwidth and computing capacity as possible to rebuild lost data. Part of the function of recovering lost data blocks is moved to the embedded terminal. In the video data storage system, the data of a single file is encoded, encapsulated in blocks, and stored evenly across the different nodes of a mirror subset. The system presents an extensible storage volume; data in the volume is organized in a hierarchical directory structure, and concurrent access from multiple machines and processes is supported. The system extends mirror subsets according to storage capacity demand, using the storage capacity of the nodes in each mirror subset to expand on demand.
The mirror subsets together form one unified file mapping, and a consistent coded storage view is maintained across subsets. The storage nodes within a subset store different blocks of the same file, and the system maintains the mapping between the blocks of a file and the storage nodes that hold them. The mirror subsets are combined into a hierarchical tree that establishes the mapping between the stored file set and the device cluster. Each storage node also sets aside a separate region of storage for system-specific data, which avoids inconsistencies between the file directories held by the storage nodes of a mirror subset. Each storage node independently maintains its subset's storage resources and its own file metadata, and can independently provide a file block read service. When a disk in a storage node fails, the node that owns the disk recovers the data blocks: the file recovery node schedules the minimum set of check blocks, distributed across the subset, that satisfies the reconstruction requirement, and rebuilds the lost data blocks. If several storage nodes become abnormal, the file server calculates the amount of computation needed to recover all files and distributes tasks to the recovery nodes on the principle of balanced computational load. When a node in the check information subset becomes abnormal, the system schedules the source file for secondary encoding and redeploys the result on newly added nodes.
Storage nodes are organized as peers. When a storage node joins the video data storage system, it provides its own resource list to the coordination server. Any node within a group can act both as a requester and as a provider of file blocks. The video data storage system decides whether to start the next mirror subset according to the amount of data stored and the utilization of the storage system. In any mirror subset $S_k(A;B)$ ($k = 1, 2, 3, \ldots$), the storage nodes are divided into file block storage nodes and coded check block storage nodes. The file block storage nodes form the information mirror subset $S_k(A)$, whose nodes $a_{k,i} \in S_k(A)$ ($k$, $i$ positive integers) store the blocks of the original file; the coded check block storage nodes $b_{k,i}$ ($k$, $i$ positive integers) form the check information mirror subset $S_k(B)$.
To reduce the redundancy of file data across the whole system, the system architecture introduces file-level deduplication and performs it at the data source. When the embedded terminal software runs, it uses the SHA algorithm to compute the hash value of the file to be uploaded and uploads that value to the coordination server. The coordination server coordinates the storage nodes to query the value. If the value is found, the coordination server updates the reference count of the file and notifies the embedded terminal that the data is already stored. If no identical hash is detected, the file is accepted from the embedded terminal, hashes are computed for its blocks, and the blocks are stored in a distributed manner on the nodes of the mirror subset.
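The deduplication handshake described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Coordinator` class, its `offer` method, and the choice of SHA-256 (the text says only "SHA") are assumptions made for the example.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Whole-file signature; SHA-256 is an assumed member of the SHA family."""
    return hashlib.sha256(data).hexdigest()


class Coordinator:
    """Hypothetical in-memory stand-in for the coordination server."""

    def __init__(self):
        self.ref_count = {}  # file hash -> number of references

    def offer(self, digest: str) -> bool:
        """Return True if the file is already stored, so no upload is needed."""
        if digest in self.ref_count:
            self.ref_count[digest] += 1  # duplicate: only bump the reference count
            return True
        self.ref_count[digest] = 1  # new file: the caller goes on to upload it
        return False


coordinator = Coordinator()
video = b"example video bytes"
first = coordinator.offer(file_hash(video))   # new content, must be uploaded
second = coordinator.offer(file_hash(video))  # same content again, deduplicated
```

Only the hash crosses the network before the decision is made, which is what lets the duplicate be rejected at the data source.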
File-level duplicate detection can identify the same file under different file names as well as the same file in different directories. The system divides a file into blocks and computes the SHA value of each block, and uses the hash value of the whole file as the file's characteristic signature. The characteristic signature of each file, together with the file path and other related information, forms the metadata kept in memory, while the signatures of the individual blocks are kept on disk. Only when a node in the system becomes abnormal are the block signatures read into memory, so that they can be checked against the data the embedded terminal recovers. To speed up block lookup, the system stores each block's location, hash value, and file block identifier together in a single table.
When a node becomes abnormal and data is lost while the embedded terminal needs to read it, the embedded terminal downloads some of the original data blocks and some of the check blocks and reconstructs the lost data blocks locally. After recovery, the embedded terminal uses the file and, at the same time, sends the reconstructed data blocks back to the storage system once they have passed data verification.
Because the data storage nodes in each mirror subset all hold the same directory information, the metadata file of the mirror subset is first split evenly into blocks that are stored on the individual storage nodes. When the embedded terminal needs metadata file information, each storage node of the mirror subset queries the metadata it stores locally. A metadata query is thus converted into separate queries by the storage nodes over metadata sub-blocks.
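A small sketch of the per-block signatures and the unified block table follows. The field names, the round-robin placement, and SHA-256 are assumptions for illustration; the text only states that location, hash, and block identifier live in one table.

```python
import hashlib


def split_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_block_table(file_id: str, data: bytes, block_size: int, nodes):
    """One row per block: identifier, assumed node location, and block hash."""
    table = []
    for index, block in enumerate(split_blocks(data, block_size)):
        table.append({
            "block_id": f"{file_id}:{index}",
            "node": nodes[index % len(nodes)],  # round-robin placement (assumed)
            "sha": hashlib.sha256(block).hexdigest(),
        })
    return table


table = build_block_table("f", b"abcdefgh", 3, ["n1", "n2"])
```

Keeping all three fields in one row means a single lookup answers both "where is this block" and "what should its hash be" during verification.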
When the embedded terminal sends a file storage request, the file server generates metadata according to the running state of the nodes in subset $S_k(A;B)$ and sends an instruction to an idle node in $S_k(A;B)$; the embedded terminal then exchanges data directly with that storage node. The node splits the file into blocks, encodes them to produce check data, and encapsulates the result according to the data encapsulation format. The original file blocks are sent to subset $S_k(A)$, and the check blocks are stored in a distributed manner on subset $S_k(B)$. The data blocks and check blocks of each file thus span all the storage nodes in the subset, and every node has the same file storage view. When the storage space in mirror subset $S_k(A;B)$ is about to be exhausted, the system starts the next mirror subset $S_{k+1}(A;B)$. To raise the utilization of the cluster's computing resources, the file server can also assign the nodes of subsets $S_1(B), S_2(B), \ldots, S_{k+1}(B)$ to encode subsequent files awaiting storage.
When a file needs to be read, the embedded terminal requests the source data directly from the file server, which sends the embedded terminal the mapping and matching information between the file to be read and its mirror subset and storage nodes. After receiving the response, the embedded terminal first converts the required file name and byte offset into a file index, sends the storage nodes a request containing the file name and index, and sets up the file read directly with the set of target storage nodes. The embedded terminal sends one request to each storage node in the file's mirror subset, specifying the file block and the data range within the block. If check blocks stored in mirror subset $S_k(B)$ are downloaded, the original file is obtained by data reconstruction.
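The conversion from a byte offset to block requests can be illustrated with a short sketch. The fixed block size and the `(block_index, start, stop)` tuple layout are assumptions; the source does not specify the index format.

```python
def locate(offset: int, length: int, block_size: int):
    """Map a byte range of a file to (block_index, start, stop) requests.

    start and stop are byte positions inside the block, stop exclusive.
    """
    requests = []
    end = offset + length
    while offset < end:
        index = offset // block_size
        start = offset % block_size
        stop = min(block_size, start + (end - offset))
        requests.append((index, start, stop))
        offset += stop - start  # advance by the bytes covered in this block
    return requests


ranges = locate(5, 10, 8)  # bytes 5..14 of a file stored in 8-byte blocks
```

Each tuple corresponds to one request sent to the storage node that holds that block.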
When the embedded terminal sends a file storage request and the storage system holds no identical data, the coordination server generates metadata according to the running state of the nodes in subset $S_k(B)$ and sends an instruction to an idle node in $S_k(B)$; the embedded terminal then exchanges the file directly with that storage node. The node splits the file into blocks, encodes them to produce check data, and encapsulates the result according to the data encapsulation format; it also computes the hash value of each file information block and sends the hash values to the coordination server. The data blocks of the stored file are sent to subset $S_k(A)$, and the check blocks are stored in a distributed manner on subset $S_k(B)$. The data blocks and check blocks of each file thus span all the storage nodes in the subset, and every node has the same file storage view. When the storage space in mirror subset $S_k(A;B)$ is about to be exhausted, the system starts the next mirror subset $S_{k+1}(A;B)$.
When a file needs to be read, the embedded terminal requests the source data directly from the file server, which sends the embedded terminal the mapping and matching information between the file to be read and its mirror subset and storage nodes. After receiving the response, the user first converts the required file name and byte offset into a file index, sends the storage nodes a request containing the file name and index, and sets up the file read directly with the set of target storage nodes. The coordination server returns the index of the file blocks according to the query result, including the cluster subset that holds the file and the positions of the data blocks. The embedded terminal sends one request to each storage node in the file's mirror subset, specifying the file block and the data range within the block. The embedded terminal then reassembles the file blocks in order to obtain the original file.
The coordination server is responsible for distributing storage tasks in the cluster and for metadata management. It records the information of each node in the storage cluster, the subset division of the storage nodes, the system's file directory information, the mapping from files to blocks, and the hash value of each file block. It is also responsible for deciding the strategy for recovering lost file blocks, distributing recovery tasks, and managing file block migration. The system monitors node state by combining periodic heartbeats with event broadcasts. When a new node appears, it broadcasts its information to the coordination server and to every storage node. Each storage node also periodically reports its state to its corresponding file management node; if the file management node receives no heartbeat for a period of time, it considers the storage node abnormal.
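The heartbeat half of this monitoring scheme can be sketched as a timeout check. The class name, the explicit `now` argument, and the timeout value are assumptions; the source does not state the heartbeat interval or how long "a period of time" is.

```python
class HeartbeatMonitor:
    """Hypothetical file management node that tracks storage node heartbeats."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the last heartbeat

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat (a newly broadcast node is registered the same way)."""
        self.last_seen[node] = now

    def abnormal(self, now: float):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.beat("a1", now=0.0)
monitor.beat("b1", now=0.0)
monitor.beat("a1", now=4.0)  # only a1 keeps reporting
suspects = monitor.abnormal(now=5.0)
```

Passing the clock in explicitly keeps the sketch deterministic; a real node would read a monotonic clock instead.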
During recovery of an abnormal node, the connections to the remaining nodes are probed first. When the node recovering a file has network connections of widely differing quality to the remaining nodes, it sends test packets over each link to evaluate the data path, ranks the links, and calculates the minimum number $p$ of file blocks required for recovery, so that the file blocks are fetched over the $p$ best network connections and the file is recovered. Suppose node $k$ is responsible for recovering the lost file blocks of file $f$. Node $k$ sends file block read requests to the $p$ nodes it is connected to. After obtaining the required file blocks, node $k$ recovers the source file with the system's coding method and then, according to the identifiers of the lost nodes and file blocks, re-encodes the source file with the same coding method to obtain the lost redundant file blocks. These are re-encapsulated according to the file encapsulation protocol at the size of each lost file block, and the $l$ recovered file blocks of file $f$ are stored evenly, in order, on the storage nodes. Each node sets aside a dedicated partition to hold reconstructed data blocks temporarily; after the abnormal node is replaced, the reconstructed data is placed according to the data block placement path given by the coordination server.
The invention recovers lost data blocks on the basis of the usage demand of the embedded terminals. The large amount of computing resources dispersed across the embedded terminals takes part in rebuilding lost file blocks, in combination with centralized reconstruction inside the cluster, to rebuild the file blocks of abnormal nodes.
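Choosing the $p$ best links after probing can be sketched as a simple ranking. The function name and the use of a single measured round-trip time per link are assumptions; the source does not say which metric the test packets produce.

```python
def pick_sources(probe_ms: dict, p: int):
    """Return the p peers with the lowest probe time, best first."""
    if p > len(probe_ms):
        raise ValueError("not enough reachable peers to recover the file")
    ranked = sorted(probe_ms, key=probe_ms.get)  # ascending probe time
    return ranked[:p]


# Hypothetical probe results in milliseconds for four surviving nodes.
probes = {"a1": 30.0, "a2": 5.0, "b1": 12.0, "b2": 48.0}
sources = pick_sources(probes, p=2)
```

The recovering node would then send its block read requests only to the peers in `sources`.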
Suppose the original file is divided into $k$ blocks and encoding generates $n-k$ check blocks; then any $k$ of these $n$ blocks suffice to reconstruct the original file. When the cluster storage system runs, a reconstruction threshold parameter $k$ ($1 < k < N-m$) is set. When the number of abnormal storage nodes $k_f$ in the cluster satisfies $k_f < k$, the cluster manager does not organize internal nodes to recover the data blocks on the abnormal storage nodes; instead, the required data blocks are recovered by the users. When reading a file, the user downloads the remaining $n-k_f$ original data blocks and $k_f$ check blocks from the cluster, together with the codeword information, reconstructs the $k_f$ abnormal data blocks, and splices the reconstructed blocks with the $n-k_f$ downloaded blocks into the original file. The embedded terminal also re-encapsulates the $k_f$ recovered data blocks in the cluster's data block encapsulation format and uploads them to the server cluster. The file server calculates the hash value of each uploaded block and compares it with the stored hash value of the original block; if they are identical the block is stored, otherwise the upload request for that block is refused. When the number of abnormal nodes in the cluster exceeds the set reconstruction threshold $k$, the coordination server decides the in-cluster recovery strategy according to the data volume of the file blocks not yet recovered and the running state of the cluster nodes. It calculates the amount of computation needed to rebuild all remaining file blocks, distributes tasks to the recovery nodes on the principle of balanced computational load, and redeploys the recovered file blocks on storage nodes in the cluster.
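The threshold decision and the balanced task distribution can be sketched together. Both functions are illustrative assumptions: the source does not say what happens when the abnormal count equals the threshold (treated here as in-cluster recovery), and the greedy largest-task-first assignment is one common way to balance load, not necessarily the one intended.

```python
import heapq


def recovery_mode(abnormal_nodes: int, threshold: int) -> str:
    """Terminal-side rebuild below the threshold, in-cluster rebuild otherwise."""
    return "terminal" if abnormal_nodes < threshold else "cluster"


def balance(tasks: dict, nodes):
    """Greedy balancing: give the largest remaining task to the least-loaded node."""
    heap = [(0, node) for node in nodes]  # (accumulated cost, node)
    heapq.heapify(heap)
    plan = {node: [] for node in nodes}
    for name, cost in sorted(tasks.items(), key=lambda item: -item[1]):
        load, node = heapq.heappop(heap)
        plan[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return plan


# Hypothetical rebuild costs per file, in arbitrary units of computation.
plan = balance({"f1": 5, "f2": 3, "f3": 2}, ["r1", "r2"])
```

With these costs both recovery nodes end up with a load of 5 units.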
In implementation, the embedded terminal first computes the hash value of the reconstructed data block and uploads it to the coordination server. Because the coordination server holds the hash library saved when the file was first stored, the uploaded value is compared with the hash of the lost block in that library. If the hash of the block reconstructed by the embedded terminal equals the hash of the lost block, the embedded terminal is allowed to upload the block. If the hashes differ, the reconstructed block is incorrect or the data has been maliciously tampered with, and the block is not accepted. For important data, once a data block has been uploaded the system performs a second check on it: the management node recomputes the hash of the block and compares it again with the hash of the original block to detect whether the block was attacked or tampered with during upload.
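The two-stage check can be sketched with the standard library. SHA-256 and the function names are assumptions; `hmac.compare_digest` is used only as a constant-time string comparison.

```python
import hashlib
import hmac


def block_digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def may_upload(claimed_digest: str, stored_digest: str) -> bool:
    """Stage 1: the terminal sends only the hash; compare it with the stored one."""
    return hmac.compare_digest(claimed_digest, stored_digest)


def verify_after_upload(block: bytes, stored_digest: str) -> bool:
    """Stage 2: recompute the hash of the received bytes on the server side."""
    return hmac.compare_digest(block_digest(block), stored_digest)


original = b"lost data block"
stored = block_digest(original)            # saved when the file was first stored
ok_claim = may_upload(block_digest(original), stored)
ok_bytes = verify_after_upload(original, stored)
tampered = verify_after_upload(b"lost data blocK", stored)
```

The second stage matters because the first one trusts a hash reported by the terminal, not the bytes that actually arrive.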
When no original block has been lost, the embedded terminal reads a file by downloading the original blocks directly. Under network congestion, reading by reconstruction can give the embedded terminal better file read performance than downloading the original data blocks directly. Let the file size be $M$, the number of information nodes $k$, and the number of check nodes $r$. Suppose that at some moment the available data read rate of an information node is $m_a$ and the available data download rate of a check node is $m_b$, with $m_b > m_a$, and that the embedded terminal rebuilds a data block of size $M/k$ at rate $m_d$. If the condition [formula missing from the source text] holds, the source file is obtained adaptively. The embedded terminal sends a file read request and probes the network connection to each node. When the connections to all nodes are equivalent, it reads the file directly from the original data blocks. When it finds a poor connection to a node storing an original data block, it calculates the time $t_r$ needed to obtain the file by downloading some check blocks and reconstructing, and the time $t_0$ needed to read the data block directly from the poorly connected node. When $t_r < t_0$, the file is obtained by reconstruction; when $t_r > t_0$, the ordinary download mode is used.
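The comparison of $t_r$ and $t_0$ can be sketched under a deliberately simple timing model. The model is an assumption, not the formula from the source (which is missing): it takes $t_0 = (M/k)/m_a$ and $t_r = (M/k)/m_b + (M/k)/m_d$, that is, one check block download followed by one block rebuild.

```python
def choose_read_mode(M: float, k: int, m_a: float, m_b: float, m_d: float):
    """Return (mode, t_r, t_0) for one block behind a poorly connected node.

    Assumed model: t_0 is the direct read time, t_r the check block
    download time plus the rebuild time for a block of size M / k.
    """
    block = M / k
    t_0 = block / m_a
    t_r = block / m_b + block / m_d
    mode = "rebuild" if t_r < t_0 else "direct"
    return mode, t_r, t_0


# Hypothetical sizes and rates in consistent units (e.g. MB and MB/s).
congested = choose_read_mode(M=800, k=4, m_a=10, m_b=50, m_d=100)
healthy = choose_read_mode(M=800, k=4, m_a=100, m_b=50, m_d=100)
```

In the congested case the slow information node makes reconstruction faster; in the healthy case the direct read wins.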
To suit the way streaming media files are read, the invention first splits the media file evenly into blocks and computes check blocks from them; it also replicates the first $t$ file blocks of the streaming media file and stores the copies on the nodes of the storage cluster. Backup blocks are managed separately: each storage node keeps them in a dedicated region of its disk. The system counts the number of times $x$ a file is read. If the read count per unit time exceeds a set value $y$, the file's data blocks are kept in a state where replication and coding redundancy coexist. If the read count per unit time falls below a set value $z$, the system removes all replicated blocks of the file.
The nodes in the system are further divided into active storage nodes and dormant storage nodes. Active nodes store new files and serve user reads of data held in the system. Preferably, the storage nodes of subset $S_n(A)$, which store file information, are set as active nodes with their disks kept active to serve the read requests of a large number of users, while the storage nodes of subset $S_n(B)$, which store check data, are set as dormant nodes, so that requests are directed to only part of the storage nodes. The system uses the nodes of this part for distributed queries when looking up duplicate-data hash values. The file manager also tracks how often each file is read: frequently accessed data is moved to active nodes, and rarely accessed data is moved to dormant storage nodes.
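The read-count policy for replicated blocks can be sketched as a two-threshold rule. The behaviour between $z$ and $y$ is not specified in the source; keeping the current state in that band is an assumption made here.

```python
def keep_replicas(reads_per_unit: int, y: int, z: int, currently_replicated: bool) -> bool:
    """Decide whether a file keeps its replicated blocks alongside coding redundancy.

    Above y: replicas are kept. Below z: replicas are removed.
    In between (assumed): leave the current state unchanged.
    """
    if reads_per_unit > y:
        return True
    if reads_per_unit < z:
        return False
    return currently_replicated


hot = keep_replicas(120, y=100, z=10, currently_replicated=False)
cold = keep_replicas(3, y=100, z=10, currently_replicated=True)
warm = keep_replicas(50, y=100, z=10, currently_replicated=True)
```

Using two thresholds rather than one avoids creating and deleting replicas repeatedly for files whose read rate hovers around a single cut-off.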
Logically, suppose the storage system has $n$ storage nodes and must reach an erasure-correcting capability that tolerates the failure of any $r$ storage nodes. When the embedded terminal issues a file storage request, the system first splits the file into $k = n - r$ blocks and uses the Reed-Solomon encoding matrix $G$ to generate $r$ check blocks. $k$ nodes store the original file blocks, and the remaining $r$ nodes store the check blocks produced by the operation with $G$. The detailed process is:
Step one: On receiving a file storage request, the system splits the file directly into $m \times k$ file blocks; if the file size is not divisible by $m \times k$, zeros are appended to the end of the file. Following the rule given by the positions of the "0" and "1" entries in each row vector of the generator matrix, the vectors of the encoding matrix $G$ are applied directly to the $m \times k$ data blocks to obtain the check blocks.
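The zero-padded split of step one can be sketched as follows. The function name is an assumption, and a real system would also need to record the original length so the padding can be removed on read, which the source does not discuss.

```python
def split_padded(data: bytes, m: int, k: int):
    """Split data into m * k equal blocks, zero-padding the tail if needed."""
    count = m * k
    size = max(1, -(-len(data) // count))  # ceiling division, at least 1 byte
    padded = data.ljust(size * count, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


blocks = split_padded(b"abcdefg", m=2, k=2)  # 7 bytes into 4 blocks of 2 bytes
```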
Step two: Let the blocks of the original file be $D = (D_1, D_2, \ldots, D_k)^T$, where each $D_i$ is called a macroblock. Each $D_i$ consists of $m$ microblocks, and its $m$ data blocks $(d_{i,1}, d_{i,2}, \ldots, d_{i,m})^T$ are called a microblock group. Let the generated check macroblocks be $P = (P_1, P_2, \ldots, P_r)^T$, where each check macroblock $P_i$ contains $m$ check microblocks. The set of original file blocks and check blocks is written $E = (D_1, D_2, \ldots, D_k \mid P_1, P_2, \ldots, P_r)^T$. Then $GD = E$.
The $m \times k$ data blocks of the whole file are written $d_{1,1}, d_{1,2}, \ldots, d_{1,m}, \ldots, d_{k,1}, d_{k,2}, \ldots, d_{k,m}$. Each check macroblock $P_i$ generated from the original file blocks contains $m$ check microblocks, so the check microblocks are written $p_{1,1}, p_{1,2}, \ldots, p_{1,m}, \ldots, p_{r,1}, p_{r,2}, \ldots, p_{r,m}$.
The Reed-Solomon encoding matrix is written $G = [I, V']^T$, where $I$ is an $m \times m$ identity matrix and $V'$ is an $(m \times r) \times (m \times k)$ matrix. The microblock $p_{i,j}$ is generated as follows: the $m \times k$ data blocks of the file to be stored, $d_{1,1}, d_{1,2}, \ldots, d_{1,m}, \ldots, d_{k,1}, d_{k,2}, \ldots, d_{k,m}$, are arranged in order and matched one by one with the $mk$ elements of row $(i-1)m+j$ of $V'$. The distribution of 0s and 1s in row $(i-1)m+j$ determines how the check microblock $p_{i,j}$ is generated: the file data blocks at the positions of all "1" elements in that row are accumulated modulo 2, and the result is the check microblock determined by the row. In this way the submatrix $V'$ of $G$ produces $rm$ check microblocks for the original file in total, $p_{1,1}, p_{1,2}, \ldots, p_{1,m}, \ldots, p_{r,1}, p_{r,2}, \ldots, p_{r,m}$, that is, $r$ check macroblocks. The data blocks generated by the identity matrix $I$ are the original blocks of the file; splicing these original file blocks in order yields the original file.
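The row-by-row generation of check microblocks by modulo-2 accumulation can be sketched with byte-wise XOR. The 0/1 rows below are a toy matrix chosen for readability, not rows of an actual Reed-Solomon (or Cauchy) bit matrix, and the function names are assumptions.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Modulo-2 addition of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(rows, data_blocks):
    """For each 0/1 row, XOR together the data blocks at its '1' positions."""
    size = len(data_blocks[0])
    parity = []
    for row in rows:
        acc = bytes(size)  # all-zero block is the identity for XOR
        for bit, block in zip(row, data_blocks):
            if bit:
                acc = xor_blocks(acc, block)
        parity.append(acc)
    return parity


data = [b"\x01", b"\x02", b"\x04"]          # three one-byte data microblocks
rows = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]    # toy stand-in for rows of V'
parity = encode_parity(rows, data)
recovered_d0 = xor_blocks(parity[0], data[1])  # p0 = d0 ^ d1, so d0 = p0 ^ d1
```

The last line shows why XOR parity is recoverable: any single missing term of a row can be obtained by XOR-ing the check microblock with the surviving terms.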
Code optimization is then carried out on the binary encoding matrix. The encoding matrix is first rewritten as:
$G = [I_{k \times m}, G_{r \times m}]^T$, where $G_{r \times m} = [l_{1,i}, l_{2,i}, \ldots, l_{r \times m, i}]^T$.
The number of "1" entries in each check-generating row vector $l_{1,i}, l_{2,i}, \ldots, l_{r \times m, i}$ determines the number of XOR operations required to compute the check bit from that vector. The number of bit positions in which any two vectors $l_{a,j}$ and $l_{b,j}$ differ is also computed. These parameters determine the optimized check-bit computation, which proceeds as follows:
1. From the number of "1" entries in each row vector of the encoding matrix, determine the number of XOR operations required to compute the check bit from that row vector;
2. For every pair of row vectors in the encoding matrix, count the positions where their elements are identical and the positions where they differ, recorded as (e/d), where e is the number of positions with identical elements and d is the number of positions with different elements;
3. If the number of XOR operations required by a row vector $l_i$ ($1 < i < rm$) is less than or equal to the number of differing positions d from step 2, compute the check data block for that row directly from the vector, and denote the vector $l_j$;
4. Using the vector $l_j$ determined in step 3, determine the next row vector to compute from the ratio of identical to differing positions found in step 2. When a row vector $l_k$ differs from $l_j$ in fewer positions than it shares with it, and the number of positions in which $l_k$ differs from $l_j$ is the smallest among all remaining vectors, compute the check data determined by $l_k$ from the check data already computed for $l_j$;
5. If some check bits have still not been computed, apply the rule of step 4 with $l_k$ as the base vector to find the next vector to compute;
6. Determine whether the whole check-bit computation has been completed. If so, save the order in which the check bits were computed; if not, compute according to the original correspondence.
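The idea behind steps 2 to 4, deriving one check block from another when their row vectors differ in only a few positions, can be sketched as follows. Blocks are modelled as Python integers so that `^` is the modulo-2 sum; all function names and the sample data are assumptions of this example, and the full ordering logic of steps 3 to 6 is not reproduced.

```python
def ones(row):
    """Number of '1' entries in a row vector (step 1)."""
    return sum(row)


def same_diff(a, b):
    """Return (e, d): counts of identical and differing positions (step 2)."""
    d = sum(x != y for x, y in zip(a, b))
    return len(a) - d, d


def direct(row, blocks):
    """Compute a check block directly from its row vector."""
    acc = 0
    for bit, blk in zip(row, blocks):
        if bit:
            acc ^= blk
    return acc


def derive(p_j, row_j, row_k, blocks):
    """Compute the check block of row_k from the known check block p_j of row_j (step 4)."""
    acc = p_j
    for x, y, blk in zip(row_j, row_k, blocks):
        if x != y:  # only positions where the rows differ contribute
            acc ^= blk
    return acc


blocks = [0b0001, 0b0010, 0b0100, 0b1000, 0b1111]
row_j = [1, 1, 1, 1, 0]
row_k = [1, 1, 1, 1, 1]
p_j = direct(row_j, blocks)
p_k = derive(p_j, row_j, row_k, blocks)
```

Here `row_k` contains five ones, but because it differs from `row_j` in a single position, `derive` obtains its check block by XORing just one block into `p_j`.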
To describe the method in detail, assume that the nodes storing data blocks $D_1, D_2, \ldots, D_r$ have failed. The embedded terminal then recovers the original file as follows:
Step 1: From the encoding matrix $G = [I, V']^T$, directly obtain the check matrix $H = [V'^T, I_{m \cdot r}]^T$, which is used to reconstruct the lost data blocks.
Step 2: From the storage nodes that are working normally, select any $k$ storage nodes and download the $k$ data blocks $D_{r+1}, D_{r+2}, \ldots, D_k, D_{k+1}, \ldots, D_{k+r-1}, D_{k+r}$.
Step 3: Denote the lost macroblocks $D_1, D_2, \ldots, D_r$ by $X_1, X_2, \ldots, X_r$, and let $\beta = [X_1, X_2, \ldots, X_r, D_{r+1}, \ldots, D_{k+r-1}, D_{k+r}]$, where $\beta_r = [X_1, X_2, \ldots, X_r]$ and $\beta_k = [D_{r+1}, \ldots, D_{k+r-1}, D_{k+r}]$, that is, $\beta = [\beta_r, \beta_k]$. The lost data blocks are then reconstructed from the relation $\beta H_{(k+r)r} = 0$.
Step 4: Let the submatrix of $H_{(k+r)r}$ formed by the vectors corresponding to the lost data blocks be $H'_{r \cdot r}$, and the submatrix formed by the vectors corresponding to the healthy data blocks be $H''_{k \cdot r}$. Then:
$$\beta_{1 \times r} \cdot H'_{r \cdot r} = \beta_{1 \times k} \cdot H''_{k \cdot r}$$
where $\beta_{1 \times r}$ is the unknown vector of lost data blocks. Solving the above for $\beta_{1 \times r}$ gives:
$$\beta_{1 \times r} = \beta_{1 \times k} \cdot H''_{k \cdot r} \, (H'_{r \cdot r})^{-1}$$
The data blocks $[X_1, X_2, \ldots, X_r]$ obtained in this way are the lost data blocks $[D_1, D_2, \ldots, D_r]$.
Step 5: Combine the data blocks $[D_1, D_2, \ldots, D_r]$ with the data blocks $D_{r+1}, D_{r+2}, \ldots, D_k$ that were not lost, in order, to form $[D_1, D_2, \ldots, D_k]$. This combination of data blocks is the original file.
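A minimal sketch of Steps 3 and 4 over GF(2) is given below, with blocks modelled as Python integers. The function names, the small check matrix `h` and the sample blocks are assumptions for illustration; instead of forming $(H')^{-1}$ explicitly, the sketch solves the equivalent linear system by Gaussian elimination, and it uses all surviving blocks rather than a chosen set of $k$.

```python
def solve_gf2(h_lost, syndromes):
    """Solve X * h_lost = syndromes over GF(2); the entries of X are blocks (ints)."""
    r = len(h_lost)
    # One equation per column of h_lost: coefficients of the unknowns, then the right-hand side.
    eqs = [[h_lost[i][c] for i in range(r)] + [syndromes[c]]
           for c in range(len(syndromes))]
    for col in range(r):
        pivot = next((e for e in range(col, len(eqs)) if eqs[e][col]), None)
        if pivot is None:
            raise ValueError("lost-block submatrix is singular")
        eqs[col], eqs[pivot] = eqs[pivot], eqs[col]
        for e in range(len(eqs)):
            if e != col and eqs[e][col]:
                eqs[e] = [a ^ b for a, b in zip(eqs[e], eqs[col])]
    return [eqs[i][r] for i in range(r)]


def reconstruct(h, beta, lost):
    """Recover the blocks at indices `lost` from beta * h = 0 (entries at `lost` are ignored)."""
    r = len(h[0])
    healthy = [i for i in range(len(h)) if i not in lost]
    syndromes = []
    for c in range(r):
        s = 0
        for i in healthy:  # beta_k * H'' for column c
            if h[i][c]:
                s ^= beta[i]
        syndromes.append(s)
    return solve_gf2([h[i] for i in lost], syndromes)


# Hypothetical code with k = 3, r = 2: P1 = D1 ^ D2 and P2 = D2 ^ D3.
h = [[1, 0], [1, 1], [0, 1],   # rows for D1, D2, D3 (V' transposed)
     [1, 0], [0, 1]]           # rows for P1, P2 (identity)
d1, d2, d3 = 0x11, 0x22, 0x33
beta = [None, None, d3, d1 ^ d2, d2 ^ d3]   # D1 and D2 are lost
recovered = reconstruct(h, beta, lost=[0, 1])
```

Over GF(2) subtraction and addition coincide, which is why the relation $\beta H = 0$ can be split into the two sides of the equation in Step 4 without a sign change.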
Where the network bandwidth of the storage system is constrained and lost data must be recovered reliably with low maintenance bandwidth, the following optimized reconstruction method based on the check matrix is used. It selects, from $H_{(k+r)m \cdot rm}$, the recovery matrix that requires the minimum reconstruction bandwidth, as follows:
1. First count the number of "1" elements in each column vector of the check matrix $H_{(k+r)m \cdot rm}$.
2. Extract from the check matrix $H_{(k+r)m \cdot rm}$ the row vectors corresponding to the lost data blocks, forming the matrix $H_{r'm \cdot rm}$. The remaining row vectors of $H_{(k+r)m \cdot rm}$ form the matrix $H_{(k+r-r')m \cdot rm}$, whose lower $rm$ vectors constitute an identity matrix. The upper part is denoted $H_{(k-r')m \cdot rm}$.
3. Determine in turn the number of "0" elements in each row vector of $H_{(k-r')m \cdot rm}$. When the number of "0" elements in a row vector is greater than or equal to $r'm$, record the column vectors in which those "0" elements lie. Then check whether, within the column vectors so identified, there is another row vector whose number of "0" elements is greater than or equal to $r'm$. If there is none, keep the column vectors recorded in the previous step; if there is, determine a new set of column vectors. Repeat this loop, recording the column vectors determined in each iteration.
4. After the chained search is finished, use the number of "1" elements in each group of column vectors to determine the $r'm$ column vectors whose total number of "1" elements is smallest, and confirm that the corresponding submatrix $H_{(r' \cdot m)(r' \cdot m)}$ has full rank, that is, its rank is $r'm$.
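The selection criterion of step 4 (fewest "1" elements, subject to full rank) can be illustrated with a brute-force search. This sketch deliberately replaces the chained search of step 3 with exhaustive enumeration, which is only practical for small matrices; the function names and the example matrix are assumptions.

```python
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of a 0/1 matrix given as a list of non-empty rows."""
    vals = [int("".join(map(str, r)), 2) for r in rows]
    rank = 0
    while vals:
        pivot = max(vals)          # the row with the highest leading bit
        vals.remove(pivot)
        if pivot == 0:
            break                  # only zero rows remain
        rank += 1
        top = pivot.bit_length() - 1
        vals = [v ^ pivot if (v >> top) & 1 else v for v in vals]
    return rank


def pick_columns(h, lost_rows):
    """Choose len(lost_rows) columns of h with the fewest ones whose lost-row submatrix has full rank."""
    need = len(lost_rows)
    weight = [sum(row[c] for row in h) for c in range(len(h[0]))]
    best = None
    for cols in combinations(range(len(h[0])), need):
        sub = [[h[i][c] for c in cols] for i in lost_rows]
        if gf2_rank(sub) != need:
            continue
        cost = sum(weight[c] for c in cols)
        if best is None or cost < best[0]:
            best = (cost, cols)
    return best


# Hypothetical check matrix for k = 3, r = 3 (m = 1): data rows first, then identity rows.
h = [[1, 0, 1], [1, 1, 1], [0, 1, 1],
     [1, 0, 0], [0, 1, 0], [0, 0, 1]]
one_lost = pick_columns(h, lost_rows=[0])
two_lost = pick_columns(h, lost_rows=[0, 1])
```

The number of ones in a column indicates how many blocks take part in the corresponding check equation, so a lower total suggests fewer blocks need to be transferred during reconstruction.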
In a further aspect of the invention, an address index table (AIT) is introduced for the mirror subset as an extended addressing dimension. The AIT is metadata describing the attributes of the addressing chain table (ACT). The AIT divides the ACT into individually addressable logical components that can be accessed independently, which gives the video data storage system, with its ternary dynamic structure, the ability to perform concurrent read and write access. In addition, the AIT pointers point directly to the target address locations of the ACT logical components, so random access is achieved quickly without searching or comparing.
The address index table AIT is a set of addressing items AHT, that is, $AIT = \{AHT_1, \ldots, AHT_m, \ldots, AHT_M\}$,
where each $AHT_m$ has one input item and one corresponding output item. The input item is a combination of addressing variable values, and the output item is the data index corresponding to that combination.
In the AIT of the mirror subset of the video storage system, the input value of each addressing item AHT is the set of addressing variable values of a group of data, namely the logical address LA. Its output value consists of a pointer into the addressing chain table ACT corresponding to the LA value, an offset, and a data length. The ACT pointer points to the position in the ACT of the storage unit to be accessed for this group of data; the offset determines the starting address of the access within that storage unit; the data length defines the extent of the access. When the data length is omitted or is 0, access continues to the end of the file. Access to the video data storage system can therefore use the logical address LA, formed from the addressing variables of the target data, to determine a unique position in the file metadata addressing chain table ACT, and then read or write the storage node from that position for the data length defined in the AHT.
Access through the address index table AIT of the mirror subset of the video storage system is carried out in the following steps:
1. Look up the metadata address index table AIT using the addressing variable values of the target data, obtaining an addressing chain table ACT pointer, an offset, and a data length;
2. Use the ACT pointer to obtain the position in the ACT of the storage unit to be read or written for this group of data, use the offset to obtain the starting address of the read or write within that storage unit, and use the data length to obtain the read/write range; then perform the read or write of this group of data according to the position, the starting address, and the range;
3. When multiple threads perform read operations, or multiple threads perform write operations that do not modify the ACT pointer, offset, or data length, no new addressing chain table needs to be generated. Each thread then performs steps (1) and (2) on its own, which achieves concurrent reading and writing of multiple groups of data;
4. When multiple threads perform write operations that modify the ACT pointer, the offset, or the data length, the access steps are as follows:
(4-1) If the write operations performed by multiple threads do not modify the ACT pointer, the data is written to the storage unit starting at the newly given offset. When the data length needs to be updated, the new data length is calculated, and the new offset and new data length are recorded in the AHT output item;
(4-2) If the write operations performed by multiple threads modify the ACT pointer, the access enters the flow that generates the address index table AIT.
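Steps 1 and 2 amount to a direct table lookup followed by a bounded read. The sketch below models this with a Python dictionary and list; the class `AddressingItem`, the function `read`, and the sample addresses are hypothetical names chosen for the example, and thread synchronization (steps 3 and 4) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class AddressingItem:
    """Output item of one AHT: ACT position, offset and data length."""
    act_index: int      # position of the storage unit in the ACT
    offset: int = 0     # starting address inside the storage unit
    length: int = 0     # 0 (or omitted) means "read to the end"


def read(ait, act, logical_address):
    """Resolve a logical address through the AIT and read the addressed range."""
    item = ait[logical_address]          # direct lookup, no search over the ACT
    unit = act[item.act_index]
    end = len(unit) if item.length == 0 else item.offset + item.length
    return unit[item.offset:end]


# Hypothetical storage units and index entries keyed by (camera, segment).
act = [b"frame-data-0", b"frame-data-1"]
ait = {
    ("cam1", 0): AddressingItem(act_index=0, offset=6, length=4),
    ("cam1", 1): AddressingItem(act_index=1, offset=6),
}
first = read(ait, act, ("cam1", 0))
second = read(ait, act, ("cam1", 1))
```

Because each AHT entry resolves independently, reads of different logical addresses do not touch shared state in this model, which is consistent with the concurrent access described in step 3.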
On the basis of the video data storage structure of the invention, video frame data is encoded using fast motion estimation at the encoder, briefly described as follows. In addition to block motion search over the whole frame and motion search over a restricted range, the method provides a corresponding large-range motion search and a corresponding small-range motion search. The large-range search is iterative: the position found by the previous search is used as the starting point of the next search. When the search result meets a certain condition, namely when the result of the previous search is identical to the result of the next search, a small-range search is performed starting from that position. The result of the small-range search is taken as the final result.
A block classification function is added at the encoder. The invention classifies blocks within a frame as skip blocks and direct blocks. For a skip block, the motion vector is 0 and the actual residual is close to 0, so only mode information is transmitted; neither the motion vector nor the residual information is transmitted. A skip block is identified as follows:
$$D_m = \sum_{(i,j) \in \text{block}_m} \left| X_{(i,j)} - Y_{(i,j)} \right| / N$$
Here $X_{(i,j)}$ denotes pixel $(i, j)$ of the block at position $m$ in the frame, $Y_{(i,j)}$ denotes the corresponding pixel in the reference frame, and $N$ denotes the number of pixels in the block. When $D_m$ is less than a preset threshold, the block is marked as a skip block, and only mode information is transmitted to the decoder.
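The skip-block test is a mean absolute difference compared against a threshold. A minimal sketch follows, with blocks given as lists of pixel rows; the function names and the sample values are assumptions.

```python
def mean_abs_diff(block, ref):
    """D_m: sum of absolute pixel differences divided by the number of pixels N."""
    n = sum(len(row) for row in block)
    total = sum(abs(a - b)
                for row_b, row_r in zip(block, ref)
                for a, b in zip(row_b, row_r))
    return total / n


def is_skip_block(block, ref, threshold):
    """A block is a skip block when D_m is below the preset threshold."""
    return mean_abs_diff(block, ref) < threshold


# Hypothetical 2 x 2 block and its co-located block in the reference frame.
block = [[10, 12], [11, 13]]
ref = [[10, 11], [11, 15]]
d_m = mean_abs_diff(block, ref)
```

With these values the absolute differences are 0, 1, 0 and 2, so $D_m = 3/4$ and the block counts as a skip block for any threshold above 0.75.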
When residuals are computed, the decoded key frame is used as the reference frame. The decoded key frame is also used below to generate side information. Among the remaining blocks of the frame, the blocks belonging to direct mode are determined next. The residual of a block of this type is close to 0, so only mode information and motion vector information are transmitted. To reduce the complexity at the encoder, a fast block motion search algorithm can be used.
The maximum number of iterations of the large-range motion search is set to 4 and the number of small-range searches is 1, corresponding to a maximum horizontal or vertical distance of (0,7) or (7,0) and a fixed-length code rate of 3 bits. If the large-range vector search converges, the resulting motion residual is compared with a threshold, which is the same as the skip-mode threshold. When the residual does not exceed the threshold, the block is classified as a direct block, and mode information and motion vector information must be transmitted to the decoder.
When information on skip blocks and direct blocks is transmitted and the two are coded together, a skip block is represented by the zero motion vector (0,0). The motion vector information can be transmitted using block coding or index coding. The calculation proceeds as follows:
Step 1. Encode the motion vector information with block coding and with index coding, and take the smaller of the resulting codeword lengths as rate1;
Step 2. Merge skip mode and direct mode into one class. The code rate is then:
mode1 = mode(skip mode) ∪ mode(direct mode)
mode2 = mode(normal mode)
rate2 = ent(mode1, mode2) * code_length + 2 * num(mode(skip mode))
where the mode information mode() denotes the mode corresponding to the block type, ent() computes the entropy of the corresponding information, and code_length is the codeword length to be encoded.
Step 3. The total code rate is the sum of the two code rates above:
total_rate = rate1 + rate2
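The rate formula can be exercised numerically as sketched below. Several points are assumptions of this example rather than statements from the text: ent() is taken to be the Shannon entropy in bits of the two-class split between mode1 and mode2, num() is taken to be a block count, and code_length is simply passed in as a parameter.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given by class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def total_rate(rate1, n_skip, n_direct, n_normal, code_length):
    """total_rate = rate1 + rate2, with skip and direct blocks merged into one class."""
    ent = entropy([n_skip + n_direct, n_normal])   # ent(mode1, mode2)
    rate2 = ent * code_length + 2 * n_skip
    return rate1 + rate2


# Hypothetical frame: 2 skip, 2 direct and 4 normal blocks.
example = total_rate(rate1=10, n_skip=2, n_direct=2, n_normal=4, code_length=8)
```

With an even split between the merged class and the normal class the entropy is 1 bit, so rate2 = 1 × 8 + 2 × 2 = 12 and the total is 22.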
At the decoder, once the corresponding mode information and motion vector information have been obtained, the skip blocks and direct blocks are reconstructed. For a skip block, the block at the same position in the previous reference frame is used directly as the final reconstructed block. For a direct block, the motion-compensated block given by the motion vector is used as the final reconstructed block. For the remaining blocks, in normal mode, side information and residual information must be generated at the decoder.
Generating the side information from decoded key frames comprises the following process:
Step 1. Obtain the initial motion vector. A parallel motion estimation algorithm is used first to calculate the parallel motion vector. The motion matching search is expressed as:
$$(v_x, v_y) = \arg\min_{m_x, m_y} \left( D(m_x, m_y) \cdot \left(1 + 0.05\,(m_x^2 + m_y^2)^{1/2}\right) \right)$$
$$(v_x, v_y) = \pm (v_x / 2,\; v_y / 2)$$
Here $x_{(i,j)}$ denotes a pixel of the reference block, and $y_{(i+m_x, j+m_y)}$ denotes a pixel of the motion search block in the other frame. $\|m\|$ is the 0-norm and denotes the size of block $m$. These symbols belong to the matching cost $D(m_x, m_y)$, whose defining formula is not reproduced in this text; by analogy with $D_m$ above, it is presumably a mean absolute difference between the two blocks. The final motion estimation vector is half of the motion vector computed in the first formula, and it is either negated or left unchanged depending on the original direction of the estimation.
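The penalized matching search can be sketched as an exhaustive search over a square window. This example assumes that $D(m_x, m_y)$ is the mean absolute difference between the current block and the displaced block, which the text does not state explicitly; the function names, the window radius and the test frames are also assumptions, and the sign handling of the second formula is omitted.

```python
import math


def mad(cur, ref, bx, by, size, mx, my):
    """Assumed D(mx, my): mean absolute difference against the displaced block."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[by + i][bx + j] - ref[by + i + my][bx + j + mx])
    return total / (size * size)


def penalized_search(cur, ref, bx, by, size, radius):
    """Minimize D * (1 + 0.05 * |(mx, my)|) and return half of the best vector."""
    h, w = len(ref), len(ref[0])
    best = None
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            # Skip displacements that would move the block outside the frame.
            if not (0 <= by + my and by + my + size <= h
                    and 0 <= bx + mx and bx + mx + size <= w):
                continue
            cost = mad(cur, ref, bx, by, size, mx, my) * (1 + 0.05 * math.hypot(mx, my))
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    _, vx, vy = best
    return vx / 2, vy / 2


# Hypothetical 6 x 6 frames: a 2 x 2 patch moves two pixels to the right.
cur = [[0] * 6 for _ in range(6)]
ref = [[0] * 6 for _ in range(6)]
for i in (1, 2):
    for j in (1, 2):
        cur[i][j] = 9
        ref[i][j + 2] = 9
vector = penalized_search(cur, ref, bx=1, by=1, size=2, radius=2)
```

The distance factor biases the search toward short vectors when several candidates have similar matching cost.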
After the motion vectors of the co-located blocks in the preceding and following frames have been obtained, the two are converted into motion vectors of the same direction, that is, one of the motion vectors is negated. The two motion vectors are then averaged to obtain the initial motion vector for the bidirectional motion search.
Step 2. For each block, the average vector from step 1 is used as the initial vector, on the assumption that the block moves in uniform linear motion over a short time, so that its motion vectors in the preceding and following frames are equal in magnitude and opposite in direction. A bidirectional motion search is carried out within a preset range around the initial vector. First, centered on the initial vector, the search range is set to -3 to 3; if the two motion vectors from step 1 differ by more than 5 in either direction, the search range in that direction is expanded to -5 to 5. If the actual number of search positions within this range is below a threshold, the search continues centered on the zero vector with a search range of -6 to 6, and the minimum of the two is taken as the motion search result. The residual of the motion search result is calculated as follows:
$$(v_x, v_y) = \arg\min_{m_x, m_y} D(m_x, m_y)$$
where x and y denote the two related frames, that is, the frames before and after the current frame.
When the computed sum of absolute residuals reaches its minimum, the motion vector result of the bidirectional motion estimation is obtained. From this motion vector, the corresponding side-information reference block sideblock, the residual estimation block residentblock, and the residual information resident are obtained:
$$\text{resident} = \min\left(D(m_x, m_y)\right)$$
$$\text{sideblock}_{(i,j)} = \left(x_{(i-v_x,\, j-v_y)} + y_{(i+v_x,\, j+v_y)}\right) / 2$$
$$\text{residentblock}_{(i,j)} = \left(x_{(i-v_x,\, j-v_y)} - y_{(i+v_x,\, j+v_y)}\right) / 2$$
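The two block formulas can be evaluated directly, as in the sketch below. The function name, the frame layout (lists of rows indexed as `frame[i][j]`) and the sample frames are assumptions; the index convention follows the formulas literally, with $v_x$ applied to the first index.

```python
def side_and_residual(prev, nxt, i0, j0, size, vx, vy):
    """Return (sideblock, residentblock) for the size x size block whose corner is (i0, j0)."""
    side, resid = [], []
    for i in range(i0, i0 + size):
        side_row, resid_row = [], []
        for j in range(j0, j0 + size):
            a = prev[i - vx][j - vy]    # x(i - vx, j - vy) in the preceding frame
            b = nxt[i + vx][j + vy]     # y(i + vx, j + vy) in the following frame
            side_row.append((a + b) / 2)
            resid_row.append((a - b) / 2)
        side.append(side_row)
        resid.append(resid_row)
    return side, resid


# Hypothetical 4 x 4 frames with a simple gradient; the following frame is brighter by 2.
prev = [[r * 10 + c for c in range(4)] for r in range(4)]
nxt = [[r * 10 + c + 2 for c in range(4)] for r in range(4)]
side, resid = side_and_residual(prev, nxt, i0=1, j0=1, size=2, vx=1, vy=0)
```

The side-information block is the average of the two motion-compensated blocks, and the residual estimate is half of their difference.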
Step 3. The motion vector estimate obtained in step 2 is processed further. When the magnitude of the motion vector exceeds a certain threshold, bidirectional parallel motion-estimation compensation is carried out. The bidirectional parallel motion estimation vectors calculated in step 1 yield four motion compensation blocks. If all four lie within the displayed image area, and the distance between motion vectors of the same direction is less than a preset range, the side information block and residual block corresponding to the block are calculated as follows:
$$\text{sideblock} = (\text{block}_1 + \text{block}_2 + \text{block}_3 + \text{block}_4) / 4$$
$$\text{residentblock} = (\text{block}_1 + \text{block}_2 - \text{block}_3 - \text{block}_4) / 4$$
where $\text{block}_1$ and $\text{block}_2$ belong to the frame preceding the current frame, and $\text{block}_3$ and $\text{block}_4$ belong to the following frame.
Step 4. A block lying on a motion edge is processed as follows if its residual value exceeds the threshold. First, take either of the two parallel motion estimation vectors obtained in step 1. If, in the direction opposite to the initial motion estimate, the position estimated by the vector lies beyond the image boundary, and the estimation residual of the motion vector in this direction is smaller than that of the motion vector in the opposite direction, then the motion compensation block is represented by the parallel motion compensation block obtained in the estimated direction. In this case, compensation uses a unidirectional search. If this condition is not satisfied, the other motion vector is processed in the same way. The motion compensation blocks obtained are combined by weighted averaging.
In summary, the present invention proposes a data encoding storage method that uses as little as possible of the internal network bandwidth and computing capacity of the set of video storage nodes to achieve data recovery, improving scalability while achieving high data availability.
Obviously, those skilled in the art should understand that the modules or steps of the invention described above can be implemented with a general-purpose computing system. They can be concentrated in a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. The invention is therefore not restricted to any specific combination of hardware and software.
It should be understood that the embodiments of the invention described above serve only to illustrate or explain the principles of the invention and do not limit it. Any modification, equivalent substitution, or improvement made without departing from the spirit and scope of the invention should therefore fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within equivalents of that scope and those boundaries.

Claims (6)

1. A data encoding storage method for storing video data, characterized by comprising:
encoding the file data and then encapsulating it in blocks;
storing the encapsulated data on different nodes of a mirror subset of a video data storage system;
extending the mirror subset according to storage capacity requirements.
2. The method according to claim 1, characterized in that the video data storage system includes a coordination server, and when a storage node joins, it provides its own resource list to the coordination server.
3. The method according to claim 2, further comprising: calculating the hash value of the file to be uploaded and uploading the value to the coordination server, the coordination server coordinating the storage nodes to query for the value, and, when the value is found to exist, the coordination server updating the reference count of the file.
4. The method according to claim 3, further comprising: when no identical hash is detected, the embedded terminal receiving the file, calculating a hash for each block of the file, and storing the blocks in distributed fashion on the nodes of the mirror subset.
5. The method according to claim 3, wherein calculating the hash value of the file to be uploaded further comprises: dividing the file into blocks and calculating the SHA value of each block, with the hash value of the whole file serving as the feature signature of the file; keeping in memory the feature signature of each file together with the file path and other related information, which form the metadata, while keeping the signatures of the individual blocks on disk; and reading the block signatures into memory only when a node of the system fails, so that they can be checked against the data after the embedded terminal has recovered the lost data.
6. The method according to claim 5, characterized in that the position information of each block, its hash value, and the file block identifier are stored together in one table.
CN201710331789.2A 2017-05-12 2017-05-12 data encoding storage method Pending CN107153588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710331789.2A CN107153588A (en) 2017-05-12 2017-05-12 data encoding storage method


Publications (1)

Publication Number Publication Date
CN107153588A true CN107153588A (en) 2017-09-12

Family

ID=59792788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710331789.2A Pending CN107153588A (en) 2017-05-12 2017-05-12 data encoding storage method

Country Status (1)

Country Link
CN (1) CN107153588A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140355679A1 (en) * 2012-01-20 2014-12-04 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding the significance map for residual coefficients of a transform unit
CN102752402A (en) * 2012-07-20 2012-10-24 广东威创视讯科技股份有限公司 Cloud storage method and cloud storage system
CN103873507A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 Data block uploading and storing system and method
US20140359054A1 (en) * 2013-05-29 2014-12-04 Microsoft Corporation Distributed Storage Defense in a Cluster
CN103916483A (en) * 2014-04-28 2014-07-09 中国科学院成都生物研究所 Self-adaptation data storage and reconstruction method for coding redundancy storage system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘仲: "基于对象存储结构的可伸缩集群存储系统研究", 《中国优秀博硕士学位论文全文数据库 (博士) 信息科技辑》 *
蒋海波: "海量数据存储系统的高可靠性关键技术研究与应用", 《中国博士学位论文全文数据库 信息科技辑》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943867A (en) * 2017-11-10 2018-04-20 中国电子科技集团公司第三十二研究所 High-performance hierarchical storage system supporting heterogeneous storage
CN107943867B (en) * 2017-11-10 2021-11-23 中国电子科技集团公司第三十二研究所 High-performance hierarchical storage system supporting heterogeneous storage
CN109194725A (en) * 2018-08-17 2019-01-11 杭州数梦工场科技有限公司 A kind of transmission method of mirror image, device, equipment and computer storage medium
CN110879807B (en) * 2018-09-06 2023-07-21 Sap欧洲公司 File format for quick and efficient access to data
CN110879807A (en) * 2018-09-06 2020-03-13 Sap欧洲公司 File format for quickly and efficiently accessing data
CN109815292A (en) * 2019-01-03 2019-05-28 广州中软信息技术有限公司 A kind of concerning taxes data collection system based on asynchronous message mechanism
WO2020215951A1 (en) * 2019-04-26 2020-10-29 深圳前海微众银行股份有限公司 Encoding and decoding method and apparatus, computer device and storage medium
CN111427718A (en) * 2019-12-10 2020-07-17 杭州海康威视数字技术股份有限公司 File backup method, recovery method and device
CN111427718B (en) * 2019-12-10 2024-01-23 杭州海康威视数字技术股份有限公司 File backup method, file recovery method and file recovery device
CN112685232A (en) * 2021-01-11 2021-04-20 河南大学 Computer backup data monitoring method and system
CN112685232B (en) * 2021-01-11 2022-03-01 河南大学 Computer backup data monitoring method and system
CN113051104B (en) * 2021-03-11 2022-10-11 重庆紫光华山智安科技有限公司 Method and related device for recovering data between disks based on erasure codes
CN113051104A (en) * 2021-03-11 2021-06-29 重庆紫光华山智安科技有限公司 Method and related device for recovering data between disks based on erasure codes


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170912
