CN107153588A

CN107153588A - data encoding storage method

Info

Publication number: CN107153588A
Application number: CN201710331789.2A
Authority: CN
Inventors: Xu Rongfu (许荣福)
Original assignee: Chengdu Excellent Information Technology Co Ltd
Current assignee: Chengdu Excellent Information Technology Co Ltd
Priority date: 2017-05-12
Filing date: 2017-05-12
Publication date: 2017-09-12

Abstract

The invention provides a data encoding storage method comprising: encoding file data, then dividing the encoded data into blocks and encapsulating them; storing the encapsulated data on different nodes of a mirror subset of a video data storage system; and expanding mirror subsets according to storage capacity demand. The proposed method recovers data using as little of the internal network bandwidth and computing capacity of the video storage node set as possible, achieving high data availability while improving scalability.

Description

Data encoding storage method

Technical field

The present invention relates to video processing, and in particular to a data encoding storage method.

Background

With the continuing development of information technology, data has become an increasingly valuable resource in daily life, and the explosive growth of data inevitably drives continued growth in storage equipment. In today's data storage environments, a modern data center may contain tens of thousands to hundreds of thousands of storage nodes, and in storage systems of this size abnormal or failed storage nodes have become commonplace; data that becomes inaccessible or lost because of network access devices or other components of a storage node is also not unusual. For video encoding and storage, a small amount of computation, low encoding and decoding complexity, and recovering lost data with the minimum amount of data all have local time characteristics: factors such as the network bandwidth of the storage center and CPU computing capability affect the time needed to store a video file when a coding-redundancy strategy is used, and a system with high-speed bandwidth and high-performance computing capability takes less time to store a video storage unit of a given size. Higher reliability, lower system power consumption, and minimal data redundancy, by contrast, have global time characteristics and directly determine the equipment, management, and energy costs of the system. To meet ever-growing data storage requirements, users demand more of video data storage in terms of reliability, availability, and related properties, and achieving low-redundancy, high-reliability data storage has become a major challenge for the industry.

Summary of the invention

To solve the above problems of the prior art, the present invention proposes a data encoding storage method, comprising:

A data encoding storage method for storing video data, characterized by comprising:

encoding file data, then dividing the encoded data into blocks and encapsulating the blocks;

storing the encapsulated data on different nodes of a mirror subset of the video data storage system;

expanding the mirror subsets according to storage capacity demand.

Preferably, the video data storage system includes a coordination server, and when a storage node joins, it provides its own resource list to the coordination server.

Preferably, a hash value of the file to be uploaded is calculated and uploaded to the coordination server; the coordination server has the storage nodes look up the value, and if the value is found to exist, the coordination server updates the reference count of the file.

Preferably, when no identical hash is detected, the file is accepted for storage: a hash is computed for each block of the file, and the blocks are stored in a distributed manner on the nodes of the mirror subset.

Preferably, calculating the hash value of the file to be uploaded further comprises: dividing the file into blocks, calculating the SHA value of each block, and using the hash value of the whole file as the file's feature signature; the feature signature of each file, together with the file path and other related information, forms metadata that is kept in memory, while the signatures of the individual blocks are kept on disk and are read into memory only when a node in the system becomes abnormal, so that the embedded terminal can verify recovered data against them after recovering lost data.

Preferably, the location information, hash value, and block identifier of each block are stored together in a single table.

Compared with the prior art, the present invention has the following advantages:

The proposed data encoding storage method recovers data using as little of the internal network bandwidth and computing capacity of the video storage node set as possible, achieving high data availability while improving scalability.

Brief description of the drawings

Fig. 1 is a flow chart of the data encoding storage method according to an embodiment of the present invention.

Embodiment

A detailed description of one or more embodiments of the invention is provided below together with accompanying drawings that illustrate the principles of the invention. The invention is described in connection with such embodiments, but is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of them.

One aspect of the present invention provides a data encoding storage method. Fig. 1 is a flow chart of the data encoding storage method according to an embodiment of the present invention.

The video data storage system of the present invention adopts a storage-node subset expansion strategy. Its software architecture uses a new data recovery mode that exploits the computing capacity of the embedded terminals, so that the system uses as little of the storage node set's internal network bandwidth and computing capacity as possible to restore and rebuild lost data; part of the work of recovering lost data blocks is moved to the embedded terminals. In the video data storage system, the data of a single file is encoded, divided into blocks, encapsulated, and stored uniformly on different nodes of a mirror subset. The system provides an expandable storage volume whose data is organized in a hierarchical directory structure and which supports concurrent access by multiple processes on multiple machines. The system expands mirror subsets according to storage capacity demand, using the storage capacity of the nodes in each mirror subset to achieve on-demand expansion.

The mirror subsets together form a single unified file mapping, and a consistent coded-storage view is maintained across subsets. The storage nodes within each subset store different blocks of the same file, and the system maintains the mapping between the blocks of a file and the storage nodes holding them. The mirror subsets are combined into a hierarchical tree to establish the mapping between stored file sets and the node cluster. In addition, each storage node reserves a separate storage area for data used for system-specific purposes, which keeps the file directories stored on the nodes of a mirror subset from becoming inconsistent. Each storage node independently maintains its own subset storage resources and file metadata and can provide a file-block read service on its own. When a disk in a storage node fails, the node hosting the disk recovers the data blocks: the file recovery node schedules, from the blocks distributed across the subset, the minimum set of check blocks that satisfies the reconstruction requirement and rebuilds the lost data blocks. If several storage nodes become abnormal, the file server calculates the amount of computation required to recover all affected files and distributes recovery tasks among the recovery nodes so that the computational load is balanced. When a node in the check-information subset becomes abnormal, the system schedules the source file for re-encoding and redeploys the result on newly added nodes.
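The rule that recovery tasks are distributed so that the computational load is balanced can be illustrated with a greedy longest-processing-time (LPT) assignment. The patent does not name a balancing algorithm, so this is a minimal sketch under assumptions: each task's cost is a single number (for example, the amount of data to decode), and the function name, task ids, and node ids are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_recovery_tasks(tasks: Dict[str, float],
                          nodes: List[str]) -> Dict[str, List[str]]:
    """Greedy longest-processing-time (LPT) assignment of recovery tasks.

    `tasks` maps a hypothetical task id to its estimated computation cost.
    Tasks are taken from largest to smallest, and each one goes to the
    recovery node with the smallest accumulated load so far.
    """
    if not nodes:
        raise ValueError("at least one recovery node is required")
    # Min-heap ordered by (accumulated load, node id).
    heap: List[Tuple[float, str]] = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for task, cost in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    return plan


plan = assign_recovery_tasks({"f1": 8, "f2": 5, "f3": 4, "f4": 3}, ["n1", "n2"])
print(plan)  # {'n1': ['f1', 'f4'], 'n2': ['f2', 'f3']}
```

With costs 8, 5, 4, and 3 on two nodes, the sketch ends with loads of 11 and 9. LPT is a simple heuristic and does not guarantee the best possible split.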

Storage nodes use a peer-to-peer design. When a storage node joins the video data storage system, it provides its own resource list to the coordination server, and any node within a group can act either as a requester or as a supplier of file blocks. The video data storage system decides whether to activate the next mirror subset based on the amount of stored data and the storage utilization. The storage nodes in any mirror subset $S_k(A; B)$ ($k = 1, 2, 3, \dots$) are divided into file-block storage nodes and coding check-block storage nodes. The file-block storage nodes form the information mirror subset $S_k(A)$, whose nodes $a_{k,i} \in S_k(A)$ ($k$, $i$ positive integers) store the original blocks, and the check-block storage nodes $b_{k,i}$ ($k$, $i$ positive integers) form the check-information mirror subset $S_k(B)$.
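A minimal sketch of the mirror-subset bookkeeping, assuming that each subset has a fixed capacity and that the next subset $S_{k+1}(A; B)$ is opened when the current one would overflow or has reached a utilization threshold. The class names, the 0.9 threshold, and the node naming scheme are hypothetical; the text only says that the decision depends on the stored data volume and storage utilization.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MirrorSubset:
    """Mirror subset S_k(A; B): data nodes a_{k,i} and check nodes b_{k,i}."""
    index: int
    data_nodes: List[str]
    check_nodes: List[str]
    capacity: int   # bytes the subset can hold (assumed to be a single number)
    used: int = 0

    def utilization(self) -> float:
        return self.used / self.capacity


@dataclass
class VideoStore:
    """Hypothetical store that opens S_{k+1} when S_k is (nearly) full."""
    data_per_subset: int
    check_per_subset: int
    subset_capacity: int
    threshold: float = 0.9   # assumed utilization trigger
    subsets: List[MirrorSubset] = field(default_factory=list)

    def _open_subset(self) -> MirrorSubset:
        k = len(self.subsets) + 1
        subset = MirrorSubset(
            index=k,
            data_nodes=[f"a_{k},{i}" for i in range(1, self.data_per_subset + 1)],
            check_nodes=[f"b_{k},{i}" for i in range(1, self.check_per_subset + 1)],
            capacity=self.subset_capacity,
        )
        self.subsets.append(subset)
        return subset

    def place(self, size: int) -> int:
        """Account for a file of `size` bytes; return the index k of its subset."""
        if size > self.subset_capacity:
            raise ValueError("file larger than one mirror subset")
        current = self.subsets[-1] if self.subsets else self._open_subset()
        if (current.used + size > current.capacity
                or current.utilization() >= self.threshold):
            current = self._open_subset()
        current.used += size
        return current.index


store = VideoStore(data_per_subset=4, check_per_subset=2, subset_capacity=100)
print([store.place(s) for s in (50, 45, 10)])  # [1, 1, 2]
```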

To reduce data redundancy among the files stored in the whole system, the system architecture introduces file-level deduplication, and the deduplication strategy is executed at the data source. When the embedded terminal software runs, it calculates the hash value of the file to be uploaded with the SHA algorithm and uploads that value to the coordination server. The coordination server has the storage nodes look up the value; if the value already exists, the coordination server updates the file's reference count and notifies the embedded terminal that the data is already stored. If no identical hash is detected, the file is accepted for storage: a hash is computed for each block of the file, and the blocks are stored in a distributed manner on the nodes of the mirror subset.
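The source-side deduplication handshake might look like the following sketch. It assumes SHA-256 (the text only says "SHA") and collapses the coordination server's query of all storage nodes into a single in-memory dictionary; `CoordinationServer` and `upload` are hypothetical names.

```python
import hashlib
from typing import Dict


class CoordinationServer:
    """Hypothetical coordinator holding a reference count per file hash.

    In the described system the lookup is fanned out to the storage nodes;
    here it is a single dictionary.
    """

    def __init__(self) -> None:
        self.ref_count: Dict[str, int] = {}

    def lookup_and_reference(self, digest: str) -> bool:
        """If the file is already stored, bump its reference count."""
        if digest in self.ref_count:
            self.ref_count[digest] += 1
            return True
        return False

    def register(self, digest: str) -> None:
        self.ref_count[digest] = 1


def upload(server: CoordinationServer, data: bytes) -> str:
    """Source-side deduplication: hash first, transfer the file only if unseen."""
    digest = hashlib.sha256(data).hexdigest()
    if server.lookup_and_reference(digest):
        return "already stored"      # the terminal is notified; nothing is sent
    # Here the file would be split, encoded and sent to the mirror subset.
    server.register(digest)
    return "stored"


server = CoordinationServer()
print(upload(server, b"clip-001"), upload(server, b"clip-001"))  # stored already stored
```

Hashing before any transfer means a duplicate costs one digest round-trip instead of a full upload, which is the point of deduplicating at the data source.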

File-level identical-data detection can find identical files with different file names as well as identical files in different directories. The system divides a file into blocks, computes the SHA value of each block, and uses the hash value of the whole file as its feature signature. The feature signature of each file, together with the file path and other related information, forms metadata that is kept in memory, while the signatures of the individual blocks are kept on disk and are read into memory only when a node in the system becomes abnormal, so that the embedded terminal can verify recovered data against them after recovering lost data. To speed up block lookup, the system stores the location information, hash value, and block identifier of each block together in a single table.
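A sketch of building the per-block signatures and the single lookup table described above, assuming fixed-size blocks, SHA-256, and round-robin placement on nodes (all three are assumptions; the record layout and names are hypothetical). It does not model the split between in-memory file signatures and on-disk block signatures.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockRecord:
    """One row of the block table: block identifier, location and hash."""
    file_sig: str   # SHA-256 of the whole file (its feature signature)
    number: int     # block number within the file
    node: str       # storage node holding the block (hypothetical id)
    offset: int     # byte offset of the block within the file
    digest: str     # SHA-256 of the block


def build_block_table(data: bytes, block_size: int,
                      nodes: List[str]) -> Tuple[str, List[BlockRecord]]:
    """Return the file signature and one table row per fixed-size block.

    Blocks are placed round-robin on `nodes`; the placement rule is an
    assumption made only so that the table has a location column.
    """
    file_sig = hashlib.sha256(data).hexdigest()
    table = []
    for number, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        table.append(BlockRecord(file_sig, number, nodes[number % len(nodes)],
                                 start, hashlib.sha256(block).hexdigest()))
    return file_sig, table


sig, table = build_block_table(b"abcdefghij", 4, ["n1", "n2"])
print([(r.number, r.node, r.offset) for r in table])
# [(0, 'n1', 0), (1, 'n2', 4), (2, 'n1', 8)]
```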

When a node in the system is abnormal, data has been lost, and the embedded terminal needs to read that data, the embedded terminal downloads some of the original data blocks and some check blocks and rebuilds the lost data blocks locally. After recovery, while using the file, the embedded terminal verifies the reconstructed data blocks and sends them back to the storage system.

Because the data storage nodes within each mirror subset all hold the same directory information, the metafile of the mirror subset is first divided uniformly into blocks and stored across the storage nodes. When the embedded terminal needs metafile information, each storage node of the mirror subset queries the metadata stored on that node, so that a metadata query is converted into separate queries of metadata sub-blocks on the individual storage nodes.

When the embedded terminal sends a file storage request, the file server generates metadata according to the running status of the nodes in subset $S_k(A; B)$ and sends an instruction to an idle node in $S_k(A; B)$, and the embedded terminal then exchanges data directly with that storage node. The node divides the file into blocks, encodes them to produce check data, and encapsulates them according to the data encapsulation format. The original blocks of the file are sent to subset $S_k(A)$, and the check blocks are distributed across subset $S_k(B)$. In this way the data blocks and check blocks of each file span all storage nodes in the subset, and every node has the same file storage view. When the storage space in mirror subset $S_k(A; B)$ is about to be used up, the system activates the next mirror subset $S_{k+1}(A; B)$. To increase the utilization of the cluster's computing resources, the nodes in subsets $S_1(B), S_2(B), \dots, S_{k+1}(B)$ can also be assigned by the file server to perform encoding computations for subsequent files to be stored.

When a file needs to be read, the embedded terminal requests the source data directly from the file server, and the file server sends the embedded terminal the mapping and matching information between the file to be read, the mirror subsets, and the storage nodes. After receiving the response, the embedded terminal first converts the required file name and byte offset into a file index, sends a request containing the file name and index to the storage nodes, and directly establishes a file-read interaction with the target set of storage nodes. The embedded terminal sends a request to each storage node in the file's mirror subset, specifying the file block and the data range within the block. If check blocks stored in mirror subset $S_k(B)$ are downloaded, the original file is obtained by data reconstruction.
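The step in which the embedded terminal converts a byte offset into block requests can be sketched as follows, assuming fixed-size blocks stored in file order; the function name and the `(index, start, end)` request tuple are hypothetical.

```python
from typing import List, Tuple


def byte_range_to_requests(offset: int, length: int,
                           block_size: int) -> List[Tuple[int, int, int]]:
    """Map a byte range of a file to (block index, start, end) requests.

    `start`/`end` are positions inside the block (end exclusive). Assumes
    fixed-size blocks laid out in file order.
    """
    if offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("need offset >= 0 and positive length/block_size")
    requests = []
    pos, stop = offset, offset + length
    while pos < stop:
        index = pos // block_size
        base = index * block_size
        requests.append((index, pos - base, min(block_size, stop - base)))
        pos = base + block_size
    return requests


print(byte_range_to_requests(offset=10, length=20, block_size=8))
# [(1, 2, 8), (2, 0, 8), (3, 0, 6)]
```

Reading 20 bytes at offset 10 with 8-byte blocks thus touches blocks 1, 2, and 3, covering bytes [2, 8), [0, 8), and [0, 6) of those blocks; each tuple maps onto one "file block plus data range within the block" request.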

When the embedded terminal sends a file storage request and no identical data exists in the storage system, the coordination server generates metadata according to the running status of the nodes in subset $S_k(B)$ and sends an instruction to an idle node in $S_k(B)$, and the embedded terminal then exchanges the file directly with that storage node. The node divides the file into blocks, encodes them to produce check data, and encapsulates them according to the data encapsulation format; at the same time it computes the hash value of each file block and sends the hash values to the coordination server. The data blocks of the stored file are sent to subset $S_k(A)$, and the check blocks are distributed across subset $S_k(B)$. In this way the data blocks and check blocks of each file span all storage nodes in the subset, and every node has the same file storage view. When the storage space in mirror subset $S_k(A; B)$ is about to be used up, the system activates the next mirror subset $S_{k+1}(A; B)$.

When a file needs to be read, the embedded terminal requests the source data directly from the file server, and the file server sends the embedded terminal the mapping and matching information between the file to be read, the mirror subsets, and the storage nodes. After receiving the response, the user first converts the required file name and byte offset into a file index, sends a request containing the file name and index to the storage nodes, and directly establishes a file-read interaction with the target set of storage nodes. The coordination server returns the index of the file's blocks according to the query result, including the cluster subset where the file resides and the positions of its data blocks. The embedded terminal sends a request to each storage node in the file's mirror subset, specifying the file block and the data range within the block, and then reassembles the file blocks in order to obtain the original file.

The coordination server is responsible for distributing cluster storage tasks and for metadata management. It records the information of each node in the storage cluster, the subset partitioning of the storage nodes, the directory information of files stored in the system, the file-to-block mapping, and the hash value of each file block; it is also responsible for deciding the recovery strategy when file blocks are lost, distributing recovery tasks, and managing file-block migration. The system monitors node status by combining periodic heartbeats with event broadcasts: when a new node appears, it broadcasts its information to the coordination server and each storage node. Meanwhile, each storage node periodically reports its status to its corresponding file management node, and if that file management node receives no heartbeat within a certain period, it considers the storage node abnormal.
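The combination of join broadcasts and periodic heartbeats can be sketched as a timeout monitor on the file management node. The timeout value, the class, and its method names are hypothetical; timestamps can be passed in through `now` so that the logic is easy to test.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Liveness view kept by a file management node (hypothetical API)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # seconds of silence tolerated
        self.last_seen: Dict[str, float] = {}

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def on_broadcast(self, node: str, now: Optional[float] = None) -> None:
        """Event path: a new node announces itself."""
        self.last_seen[node] = self._now(now)

    def on_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        """Periodic path: a known node reports that it is alive."""
        self.last_seen[node] = self._now(now)

    def abnormal_nodes(self, now: Optional[float] = None) -> List[str]:
        """Nodes whose last report is older than the timeout."""
        t = self._now(now)
        return sorted(n for n, seen in self.last_seen.items() if t - seen > self.timeout)


monitor = HeartbeatMonitor(timeout=5.0)
monitor.on_broadcast("n1", now=0.0)
monitor.on_broadcast("n2", now=0.0)
monitor.on_heartbeat("n1", now=4.0)
print(monitor.abnormal_nodes(now=6.0))  # ['n2']
```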

During recovery of an abnormal node, the connections between the recovering node and the remaining nodes are tested first. When the connection quality between the node recovering the file and the remaining nodes differs significantly, test packets are sent over each link to evaluate the data paths, and the links are ranked; the minimum number $p$ of file blocks needed for recovery is calculated, and the file blocks are obtained over the $p$ best network connections to recover the file. If node $k$ is responsible for recovering the blocks lost from file $f$, node $k$ sends read requests for the required file blocks to the $p$ nodes it is connected to and obtains them. After restoring the source file with the system's coding method, node $k$ uses the identifiers of the lost nodes and file blocks to re-encode the source file with the same coding method, obtains the lost redundant file blocks, re-encapsulates them according to the file encapsulation protocol and the size of each lost block, and stores the $l$ recovered file blocks of file $f$ uniformly and in order on the storage nodes. Each node reserves a dedicated partition to hold reconstructed data blocks temporarily; after the abnormal node is replaced, the reconstructed data is placed according to the data-block placement paths specified by the coordination server.
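Ranking links by probe results and reading from the $p$ best ones can be sketched as below. The `probe` callback stands in for sending test packets over a link and is hypothetical, as is the use of measured throughput as the single ranking score.

```python
from typing import Callable, List


def pick_recovery_sources(candidates: List[str], p: int,
                          probe: Callable[[str], float]) -> List[str]:
    """Probe every link once and keep the p nodes with the best score.

    `probe` is a hypothetical stand-in for sending test packets to a node;
    it returns measured throughput, where larger is better.
    """
    if p > len(candidates):
        raise ValueError("fewer surviving nodes than blocks needed for recovery")
    scores = {node: probe(node) for node in candidates}
    ranked = sorted(candidates, key=lambda node: scores[node], reverse=True)
    return ranked[:p]


measured = {"n1": 10.0, "n2": 50.0, "n3": 30.0, "n4": 5.0}
print(pick_recovery_sources(list(measured), 2, lambda node: measured[node]))
# ['n2', 'n3']
```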

Driven by the usage needs of the embedded terminals, the present invention recovers lost data blocks by combining two mechanisms: the large amount of computing resources distributed across embedded terminals participates in rebuilding lost file blocks, and reconstruction is also performed centrally inside the cluster. Together these rebuild the file blocks held on abnormal nodes.

Suppose the original file is divided into $k$ blocks and encoding generates $n - k$ check blocks; any $k$ of these $n$ blocks are then sufficient to reconstruct the original file. While the cluster storage system is running, a reconstruction threshold parameter $k$ ($1 < k < n - m$) is set. When the number $k_f$ of abnormal storage nodes in the cluster is less than $k$, the cluster manager does not organize internal nodes to recover the data blocks on the abnormal storage nodes; instead, the required data blocks are recovered by the users. When reading a file, a user downloads the remaining $n - k_f$ original data blocks in the cluster together with $k_f$ check blocks and the codeword information, reconstructs the $k_f$ abnormal data blocks, and splices the reconstructed blocks with the $n - k_f$ downloaded data blocks to form the original file. Meanwhile, the embedded terminal re-encapsulates the $k_f$ recovered data blocks in the cluster's data-block encapsulation format and uploads them to the server cluster. The file server calculates the hash value of each uploaded data block and compares it with the stored hash value of the original data block: if they are identical, the block is stored; otherwise the upload request for that block is rejected. When the number of abnormal nodes in the cluster exceeds the configured reconstruction threshold $k$, the coordination server determines the in-cluster recovery strategy according to the data volume of the unrecovered file blocks and the running status of the nodes in the cluster. It calculates the amount of computation required to rebuild all remaining file blocks, distributes tasks to the recovery nodes so that the computational load is balanced, and redeploys the recovered file blocks on storage nodes in the cluster.

In implementation, the embedded terminal first calculates the hash value of each reconstructed data block and uploads it to the coordination server. Because the coordination server holds the library of hash values recorded when the file was first stored, the uploaded value is compared with the hash value of the lost block in that library. If the hash value of the block reconstructed by the embedded terminal matches that of the lost data block, the embedded terminal is allowed to upload the block; if it differs, the rebuilt block is incorrect or the data has been maliciously tampered with, and the block is not accepted. For important data, after a block has been uploaded, the system checks it a second time: the management node recalculates the block's hash value and compares it again with the hash of the original block to detect whether the block was attacked or tampered with during upload.
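The threshold rule of the preceding paragraph and the hash check above can be sketched together. The function names and the `hash_library` dictionary keyed by block id are hypothetical, and SHA-256 is an assumption. Because the text says "less than" for terminal recovery and "exceeds" for cluster recovery, the sketch's treatment of the equal case is also an assumption.

```python
import hashlib
from typing import Dict


def recovery_mode(failed_nodes: int, threshold: int) -> str:
    """Who rebuilds lost blocks: embedded terminals or the cluster.

    Below the threshold, terminals rebuild on read; otherwise the
    coordination server schedules in-cluster recovery (equal case assumed).
    """
    return "terminal" if failed_nodes < threshold else "cluster"


def accept_rebuilt_block(block_id: str, rebuilt: bytes,
                         hash_library: Dict[str, str]) -> bool:
    """Accept an uploaded block only if its hash equals the hash recorded
    when the file was first stored; unknown blocks are rejected."""
    expected = hash_library.get(block_id)
    return expected is not None and hashlib.sha256(rebuilt).hexdigest() == expected


library = {"file7/block3": hashlib.sha256(b"original bytes").hexdigest()}
print(recovery_mode(1, 3), recovery_mode(4, 3))                          # terminal cluster
print(accept_rebuilt_block("file7/block3", b"original bytes", library))  # True
print(accept_rebuilt_block("file7/block3", b"tampered bytes", library))  # False
```

The same `accept_rebuilt_block` check could be reused by the management node for the second verification of important data after upload.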

When no original blocks have been lost, the embedded terminal reads the file by downloading the original blocks directly. Under network congestion, however, obtaining the file by reconstruction can give better read performance than downloading the original data blocks directly. Let the file size be $M$, the number of information nodes $k$, and the number of check nodes $r$. Suppose that at some moment the available data read rate of an information node is $m_a$ and the available data download rate of a check node is $m_b$, with $m_b > m_a$, and that the embedded terminal rebuilds a data block of size $M/k$ at rate $m_d$. If the condition [formula not reproduced] holds, the source file is obtained adaptively. The embedded terminal sends a file read request and checks the network connection to each node. When the connections to all nodes are equally good, it reads the file directly from the original data blocks. When the connection to a node storing an original data block is found to be poor, it calculates the time $t_r$ to obtain the file by reconstruction from downloaded check blocks and the time $t_0$ to read the data block directly from the poorly connected node. If $t_r < t_0$, the file is obtained by reconstruction; if $t_r > t_0$, the ordinary download mode is used.
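The choice between direct reading and reconstruction can be sketched with an assumed timing model, since the original condition is missing from the text. Here one block is $M/k$, $t_0$ is that block divided by the poorly connected node's rate ($m_a$), and $t_r$ is the time to download one check block at $m_b$ plus the time to rebuild at $m_d$. This model, the function name, and its parameters are hypothetical, and the model ignores the other blocks fetched in parallel.

```python
def choose_read_mode(file_size: float, k: int, slow_rate: float,
                     check_rate: float, rebuild_rate: float) -> str:
    """Return "reconstruct" if t_r < t_0, otherwise "direct".

    Assumed timing model (the original condition is missing):
      block = file_size / k
      t_0   = block / slow_rate                          direct read, slow node
      t_r   = block / check_rate + block / rebuild_rate  fetch check block, decode
    """
    block = file_size / k
    t_0 = block / slow_rate
    t_r = block / check_rate + block / rebuild_rate
    return "reconstruct" if t_r < t_0 else "direct"


print(choose_read_mode(100.0, 4, slow_rate=1.0, check_rate=10.0, rebuild_rate=10.0))   # reconstruct
print(choose_read_mode(100.0, 4, slow_rate=10.0, check_rate=10.0, rebuild_rate=10.0))  # direct
```

In the first call the slow node needs 25 time units for its block while fetching and decoding a check block needs 5; in the second the direct read (2.5) wins.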

To suit the way streaming media files are read, the present invention first divides a media file uniformly into blocks and then computes check blocks from them; in addition, the first $t$ file blocks of the streaming media file are replicated and stored on every node of the storage cluster. The system manages these backup blocks separately: each storage node stores their backup data in a separately reserved area of its disk. The system counts the number of times $x$ a file is read: if the number of reads per unit time exceeds a set value $y$, the file's data blocks keep a state in which replication and coding redundancy coexist; if the number of reads per unit time falls below a set value $z$, the system removes all replicated blocks of the file.
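The read-count rule for streaming files can be written as a small state update. The text does not say what happens when the count lies between $z$ and $y$, so keeping the current state there is an assumption, as are the function name and signature.

```python
def keep_replicas(x: int, y: int, z: int, has_replicas: bool) -> bool:
    """Decide whether a streaming file keeps replicas of its first t blocks.

    x > y: replication and coding redundancy coexist.
    x < z: all replicated blocks are removed.
    z <= x <= y: unspecified in the text; this sketch keeps the current state.
    """
    if x > y:
        return True
    if x < z:
        return False
    return has_replicas


print(keep_replicas(50, y=40, z=10, has_replicas=False))  # True
print(keep_replicas(5, y=40, z=10, has_replicas=True))    # False
print(keep_replicas(20, y=40, z=10, has_replicas=True))   # True
```

Keeping the current state in the middle band acts as hysteresis, so a file whose read rate hovers near one threshold does not have its replicas created and deleted repeatedly.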

Nodes in the system are further divided into active storage nodes and dormant storage nodes. Active nodes store new files and serve users' read requests for data in the system. Preferably, the storage nodes in subset $S_n(A)$, which store file information, are set as active nodes whose disks are kept active to meet the data read requests of large numbers of users, while the storage nodes in subset $S_n(B)$, which store check data, are set as dormant nodes, and requests are directed only to some of them. When looking up hash values for duplicate data, the system performs distributed queries using this group of nodes. Meanwhile, the system counts how often each file is read: the file manager migrates frequently accessed data to active nodes, and data with very low access frequency is migrated to dormant storage nodes.

Logically, suppose the storage system has $n$ storage nodes and must provide erasure-correction capability such that any $r$ storage nodes may become abnormal. When the embedded terminal submits a file storage request, the system first divides the file into $k = n - r$ blocks and uses the Reed-Solomon encoding matrix $G$ to produce $r$ check blocks. The $k$ original blocks of the file are stored on $k$ nodes, and the remaining $r$ nodes store the check blocks produced by the operation with $G$. The detailed process is as follows:

Step 1: When the system receives a file storage request, it divides the file directly into $m \times k$ file blocks; if the file size is not divisible by $m \times k$, the end of the file is padded with "0". Following the rule that the positions of the "0" and "1" entries in each row vector of the generator matrix determine which blocks are combined, each row vector of the encoding matrix $G$ is applied directly to the $m \times k$ partitioned data blocks to obtain the check blocks.

Step 2: Let the blocks of the original file be $D = (D_1, D_2, \dots, D_k)^T$, where each $D_i$ is called a macro block. Each $D_i$ consists of $m$ micro blocks, and the $m$ data blocks $(d_{i,1}, d_{i,2}, \dots, d_{i,m})^T$ of $D_i$ are called a micro-block group. Let the generated check macro-block group be $P = (P_1, P_2, \dots, P_r)^T$, where each check macro block $P_i$ contains $m$ check micro blocks. The set of original file blocks and check blocks is written $E = (D_1, D_2, \dots, D_k \mid P_1, P_2, \dots, P_r)^T$. Then $GD = E$.

The $m \times k$ data blocks of the whole file can be written as $d_{1,1}, d_{1,2}, \dots, d_{1,m}, \dots, d_{k,1}, d_{k,2}, \dots, d_{k,m}$. Each check macro block $P_i$ generated from the original file blocks contains $m$ check micro blocks, so the check micro blocks are written as $p_{1,1}, p_{1,2}, \dots, p_{1,m}, \dots, p_{r,1}, p_{r,2}, \dots, p_{r,m}$.

The Reed-Solomon encoding matrix $G$ is written as $G = [I, V']^T$, i.e. $I$ stacked above $V'$, where $I$ is the $mk \times mk$ identity matrix and $V'$ is an $(m \times r) \times (m \times k)$ matrix. The check micro block $p_{i,j}$ is generated as follows: the $m \times k$ data blocks $d_{1,1}, d_{1,2}, \dots, d_{1,m}, \dots, d_{k,1}, d_{k,2}, \dots, d_{k,m}$ of the file to be stored are arranged in order and matched one-to-one with the $mk$ elements of row $(i-1)m+j$ of $V'$. The 0-1 pattern of row $(i-1)m+j$ determines how $p_{i,j}$ is generated: the data blocks at all positions where that row holds "1" are accumulated modulo 2 (XORed), and the result is the check micro block determined by that row. In this way, the submatrix $V'$ of $G$ produces a total of $rm$ check micro blocks $p_{1,1}, p_{1,2}, \dots, p_{1,m}, \dots, p_{r,1}, p_{r,2}, \dots, p_{r,m}$ for the original file, i.e. $r$ check macro blocks. The identity matrix $I$ yields the original blocks of the file, and concatenating these original blocks in order gives the original file.
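The XOR rule for check micro blocks can be sketched as below. The example $V'$ is an assumption chosen for illustration: it is the $2 \times 2$ bit-matrix form of the GF(4) code $P_1 = D_1 + D_2$, $P_2 = D_1 + \alpha D_2$ (with $\alpha^2 = \alpha + 1$), which gives $k = r = m = 2$ and tolerates the loss of any two macro blocks. Zero-byte padding and the function names are also assumptions.

```python
from functools import reduce
from typing import List

# Illustrative V' for k = r = m = 2 (an assumption, not from the patent):
# bit-matrix form of P1 = D1 + D2, P2 = D1 + a*D2 over GF(4), a^2 = a + 1.
# Columns: d11, d12, d21, d22.  Rows: p11, p12, p21, p22.
V_PRIME = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_micro_blocks(data: bytes, k: int, m: int) -> List[bytes]:
    """Zero-pad the file and cut it into m*k micro blocks d11 .. dkm, in order."""
    count = m * k
    size = -(-len(data) // count)             # ceiling division
    padded = data.ljust(size * count, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


def encode_checks(micro: List[bytes], v_prime: List[List[int]]) -> List[bytes]:
    """Row (i-1)m+j of V' lists the data micro blocks XORed into p_ij."""
    zero = bytes(len(micro[0]))
    return [reduce(xor, (d for d, bit in zip(micro, row) if bit), zero)
            for row in v_prime]


micro = split_micro_blocks(b"ABCDEFG", k=2, m=2)
print(micro)                                                  # [b'AB', b'CD', b'EF', b'G\x00']
print(encode_checks(micro, V_PRIME)[0] == xor(b"AB", b"EF"))  # True
```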

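The encoding computation is then optimized for the binary encoding matrix. First, the encoding matrix is written again as:

$G = [I_{km}, G_{rm}]^T$, where $I_{km}$ is the $km \times km$ identity matrix and $G_{rm} = [l_1, l_2, \dots, l_{rm}]^T$ consists of the row vectors $l_i$ that generate the check bits.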

The number of "1"s in each row vector $l_1, l_2, \dots, l_{rm}$ that generates a check bit determines the number of XOR operations needed to compute that check bit from the vector; in addition, the number of positions in which any two vectors $l_a$ and $l_b$ differ is computed. The check-bit computation optimization is then determined from these parameters, as follows:

1. From the number of "1"s in each row vector of the encoding matrix, determine the number of XOR operations required to compute the check bit from that row vector;

2. For every pair of row vectors in the encoding matrix, count the positions where the elements are the same and where they differ, written (e/d), where e is the number of positions with identical elements and d is the number of positions with different elements;

3. If the number of XOR operations required by row vector $l_i$ ($1 < i < rm$) is less than or equal to its number of differing positions d from step 2, compute the check data block corresponding to that row directly from the vector, and denote this vector $l_j$;

4. Using the vector $l_j$ determined in step 3, select the next row vector to compute according to the ratio of identical to differing positions from step 2: when a row vector $l_k$ has a number of differing positions from $l_j$ that is not less than the number of identical positions, and its number of differing positions from $l_j$ is the smallest among all remaining vectors, compute the check data determined by $l_k$ from the check data already computed for $l_j$;

5. If check bits remain to be computed, take $l_k$ as the new base vector and find the next vector to compute using the rule in step 4;

6. Check whether all check bits have been computed; if so, save the resulting check-bit computation sequence; if not, compute the remaining check bits according to the original correspondence.
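Steps 1 to 6 reuse an already computed check block when a row differs from an earlier row in few positions. The sketch below is a greedy reading of that idea, not a verbatim transcription of the steps: each round picks the cheapest remaining row, computed either from data blocks alone (weight minus 1 XORs) or as an earlier check block XORed with the data blocks at the differing positions (d XORs). It reuses the illustrative $V'$ assumed for $k = r = m = 2$; the function names are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Optional, Tuple

V_PRIME = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]  # illustrative


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def differing(a: List[int], b: List[int]) -> int:
    """d in step 2: positions where two row vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_schedule(rows: List[List[int]]) -> Tuple[List[Tuple[int, Optional[int]]], int]:
    """Order the check rows and pick the cheapest way to compute each one.

    Returns ([(row, base)], xor_count). base None means "XOR the data blocks
    of the row" (weight - 1 XORs); otherwise the check block of row `base`
    is reused and only the differing data blocks are XORed in (d XORs).
    """
    done: List[int] = []
    plan: List[Tuple[int, Optional[int]]] = []
    total = 0
    remaining = list(range(len(rows)))
    while remaining:
        best = None                                   # (cost, row, base)
        for i in remaining:
            cand = (max(sum(rows[i]) - 1, 0), i, None)
            for j in done:
                if differing(rows[i], rows[j]) < cand[0]:
                    cand = (differing(rows[i], rows[j]), i, j)
            if best is None or cand[0] < best[0]:
                best = cand
        cost, i, base = best
        plan.append((i, base))
        total += cost
        done.append(i)
        remaining.remove(i)
    return plan, total


def run_schedule(plan, rows: List[List[int]], micro: List[bytes]) -> List[bytes]:
    """Execute the plan and return the check blocks in row order."""
    zero = bytes(len(micro[0]))
    checks: Dict[int, bytes] = {}
    for i, base in plan:
        if base is None:
            parts = [d for d, bit in zip(micro, rows[i]) if bit]
        else:
            parts = [checks[base]] + [d for d, a, b in zip(micro, rows[i], rows[base]) if a != b]
        checks[i] = reduce(xor, parts, zero)
    return [checks[i] for i in range(len(rows))]


plan, cost = greedy_schedule(V_PRIME)
print(plan, cost)   # [(0, None), (1, None), (2, None), (3, 1)] 4
```

For this $V'$ the greedy schedule needs 4 XOR operations instead of 5, computing $p_{2,2}$ as $p_{1,2} \oplus d_{2,1}$.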

To describe the method in detail, assume that the nodes storing data blocks $D_1, D_2, \dots, D_r$ become abnormal. The embedded terminal then obtains the original file as follows:

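Step 1: From the encoding matrix $G = [I, V']^T$, directly obtain the check matrix $H = [V'^{T}, I_{mr}]^T$, i.e. $V'^{T}$ stacked above the $mr \times mr$ identity matrix, which is used to rebuild the lost data blocks. Because the check blocks satisfy $P = V'D$, every codeword satisfies $E^T H = D^T V'^{T} + P^T = 0$ over GF(2).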

Step 2: From the normally working storage nodes, select any $k$ storage nodes and download $k$ blocks $D_{r+1}, D_{r+2}, \dots, D_k, D_{k+1}, \dots, D_{k+r-1}, D_{k+r}$, where $D_{k+i}$ denotes the check block $P_i$.

Step 3: Denote the lost macro blocks $D_1, D_2, \dots, D_r$ by $X_1, X_2, \dots, X_r$ and let $\beta = [X_1, X_2, \dots, X_r, D_{r+1}, \dots, D_{k+r-1}, D_{k+r}]$, with $\beta_r = [X_1, X_2, \dots, X_r]$ and $\beta_k = [D_{r+1}, \dots, D_{k+r-1}, D_{k+r}]$, so that $\beta = [\beta_r, \beta_k]$. The lost data blocks are then reconstructed from the relation $\beta H_{(k+r)\times r} = 0$.

Step 4: Denote by H′_{r×r} the rows of H_{(k+r)×r} corresponding to the lost data blocks, and by H″_{k×r} the rows corresponding to the healthy data blocks. Since addition and subtraction coincide in the binary (XOR) arithmetic used here, β·H_{(k+r)×r} = 0 gives:

$$\beta_{1\times r}\, H'_{r\times r} = \beta_{1\times k}\, H''_{k\times r}$$

where β_{1×r} is unknown. Provided H′_{r×r} is invertible, the lost data blocks β_{1×r} are solved from the formula above:

$$\beta_{1\times r} = \beta_{1\times k}\, H''_{k\times r}\, \left(H'_{r\times r}\right)^{-1}$$

The resulting blocks [X₁, X₂, ..., X_r] are the lost data blocks [D₁, D₂, ..., D_r].

Step 5: Combine the recovered blocks [D₁, D₂, ..., D_r] with the data blocks D_{r+1}, D_{r+2}, ..., D_k that were not lost, in order, as [D₁, D₂, ..., D_k]; this combination is the original file.
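Steps 1 to 5 can be sketched over GF(2) as follows. This is a minimal illustration under stated assumptions: each block is modelled as a Python int combined with bitwise XOR, H has one 0/1 row per block position (k data rows followed by r parity rows, matching H = [V′^T, I]^T), and the names encode_parity and recover_lost are hypothetical. Instead of forming (H′)^-1 explicitly, the sketch solves β_r·H′ = β_k·H″ by Gauss-Jordan elimination, which gives the same result when H′ has full rank.

```python
from typing import Dict, List

def encode_parity(data: List[int], v: List[List[int]]) -> List[int]:
    """Systematic XOR code: parity_c is the XOR of data_i wherever V'[i][c] == 1."""
    r = len(v[0])
    parity = [0] * r
    for i, block in enumerate(data):
        for c in range(r):
            if v[i][c]:
                parity[c] ^= block
    return parity

def recover_lost(h: List[List[int]], lost: List[int],
                 survivors: Dict[int, int]) -> Dict[int, int]:
    """Solve beta_r * H' = beta_k * H'' for the lost blocks (positions index rows of h)."""
    r = len(h[0])
    rhs = [0] * r                          # beta_k * H'': a 1 x r row of blocks
    for pos, block in survivors.items():
        for c in range(r):
            if h[pos][c]:
                rhs[c] ^= block
    n = len(lost)
    # One equation per column c: XOR_i X_i * H'[i][c] = rhs[c]
    eqs = [[h[lost[i]][c] for i in range(n)] + [rhs[c]] for c in range(r)]
    for col in range(n):                   # Gauss-Jordan elimination over GF(2)
        piv = next((row for row in range(col, len(eqs)) if eqs[row][col]), None)
        if piv is None:
            raise ValueError("H' is singular: the lost blocks cannot be recovered")
        eqs[col], eqs[piv] = eqs[piv], eqs[col]
        for row in range(len(eqs)):
            if row != col and eqs[row][col]:
                eqs[row] = [a ^ b for a, b in zip(eqs[row], eqs[col])]
    return {lost[i]: eqs[i][n] for i in range(n)}
```

With V′ = [[1,0],[1,1],[0,1]] and data blocks 5, 9, 12, losing D₁ and D₂ and downloading the remaining three blocks recovers 5 and 9.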

In a storage system with constrained network bandwidth, the following check-matrix-based optimized reconstruction method is used to recover lost data blocks reliably with low repair bandwidth. It selects, from H_{(k+r)m×rm}, the recovery matrix that requires the minimum reconstruction bandwidth, as follows:

1. First, count the number of "1" elements in each column vector of the check matrix H_{(k+r)m×rm}.

2. Extract from H_{(k+r)m×rm} the row vectors corresponding to the lost data blocks to form the matrix H_{r′m×rm}. The remaining row vectors of H_{(k+r)m×rm} form the matrix H_{(k+r-r′)m×rm}, whose bottom rm rows form an identity matrix and whose upper part is denoted H_{(k-r′)m×rm}.

3. Examine each row vector of H_{(k-r′)m×rm} in turn and count its "0" elements. When a row vector contains at least r′m zeros, record the column vectors in which each of its "0" elements lies. Then check whether, within the recorded columns, there is a row vector with at least r′m "0" elements: if not, keep the columns recorded in the previous step; if so, determine the new set of columns. Repeat this loop, recording the columns determined in each iteration.

4. After the iterative search finishes, use the number of "1" elements in each group of column vectors to select the r′m column vectors with the smallest total number of "1" elements, and verify that the corresponding submatrix H_{(r′m)×(r′m)} has full rank, i.e. rank r′m.
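The selection criterion of steps 1 to 4 can be sketched as a brute-force search. This is a minimal illustration, not the chained search of step 3: it assumes H is given as 0/1 rows, the lost rows are identified by index, r′m equals the number of lost rows, and the cost of a column set is its total count of "1" elements from step 1. Exhaustive enumeration with itertools.combinations is only practical for small matrices, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

def gf2_rank(matrix: Sequence[Sequence[int]]) -> int:
    """Rank over GF(2); each row is packed into an int bitmask."""
    rows = [sum(bit << k for k, bit in enumerate(r)) for r in matrix]
    rank = 0
    while rows:
        pivot = max(rows)                  # row with the highest leading bit
        if pivot == 0:
            break
        rows.remove(pivot)
        top = pivot.bit_length() - 1
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
        rank += 1
    return rank

def select_repair_columns(h: List[List[int]], lost_rows: List[int],
                          size: int) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Return (total_ones, columns) minimising the "1" count with a full-rank H'."""
    ncols = len(h[0])
    ones = [sum(row[c] for row in h) for c in range(ncols)]   # step 1
    best = None
    for cols in combinations(range(ncols), size):
        sub = [[h[r][c] for c in cols] for r in lost_rows]    # lost-row submatrix
        if gf2_rank(sub) == size:                             # full-rank check
            cost = sum(ones[c] for c in cols)
            if best is None or cost < best[0]:
                best = (cost, cols)
    return best
```

A production version would follow the pruning of step 3 rather than enumerating every combination.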

In a further aspect of the invention, an address reference table (AIT) is introduced into the mirror subset as an extended addressing dimension. The AIT is metadata that describes the attributes of the addressing chain table (ACT). The AIT divides the ACT into individually addressable logical components that can be accessed independently, so the video data storage system, with its ternary dynamic structure, supports concurrent reads and writes. Because AIT pointers point directly at the target address of an ACT logical component, random access is achieved quickly without searching or comparison.

The address reference table AIT is the set of addressing items AHT, i.e. AIT = {AHT₁, ..., AHT_m, ..., AHT_M},

where each AHT_m has an input item and a corresponding output item. The input item is one combination of addressing-variable values, and the output item is the data index corresponding to that combination.

In the AIT of a mirror subset of the video storage system, the input value of each addressing item AHT is the addressing-variable value of a group of data, i.e. the logical address LA. Its output value consists of a pointer into the addressing chain table ACT corresponding to that LA value, an offset, and a data length. The ACT pointer points to the position, within the ACT, of the storage unit that this group of data will access; the offset determines the starting address of the access within that storage unit; and the data length defines the access range. When the data length is omitted or 0, the access extends to the end of the file. Thus, an access to the video data storage system uses the logical address LA of the target data's addressing-variable combination to determine a unique position in the file-metadata addressing chain table ACT, and from that position reads or writes the storage node over the data length defined in the AHT.
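The AHT output item described above can be modelled as a small record. The sketch below assumes, purely for illustration, that the ACT is a list of in-memory storage units, that an ACT "pointer" is a list index, and that a Python dict stands in for the direct AIT lookup; AHTOutput and resolve are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class AHTOutput:
    act_index: int          # ACT "pointer", modelled as an index into the ACT list
    offset: int             # starting address inside the storage unit
    length: Optional[int]   # None or 0: access runs to the end of the file

def resolve(ait: Dict[str, AHTOutput], act: List[bytearray], la: str) -> bytes:
    """Map a logical address LA to the bytes it designates."""
    entry = ait[la]                        # one keyed lookup, no scan over the ACT
    unit = act[entry.act_index]
    end = len(unit) if not entry.length else entry.offset + entry.length
    return bytes(unit[entry.offset:end])
```

Because each LA resolves to exactly one (pointer, offset, length) triple, two logical addresses that refer to different storage units never contend for the same entry.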

Access to a mirror subset of the video storage system through the address reference table AIT is implemented in the following steps:

1. Look up the metadata address reference table AIT using the addressing-variable value of the target data, obtaining an ACT pointer, an offset, and a data length;

2. Use the ACT pointer to obtain the position in the ACT of the storage unit this group of data will read or write, use the offset to obtain the starting read/write address within that storage unit, and use the data length to obtain the read/write range; then perform the read or write of this group of data at that position, starting address, and range;

3. When multiple threads read data, or write data without modifying the ACT pointer, offset, or data length (so that no new addressing chain table is generated), each thread performs steps (1) and (2) independently, achieving concurrent reads and writes of multiple groups of data;

4. When multiple threads write data in a way that modifies the ACT pointer, the offset, or the data length, access proceeds as follows:

(4-1) If the multi-threaded write does not modify the ACT pointer, write the data into the storage unit starting at the newly given offset; when the data length must be updated, compute the new data length and record the new offset and new data length in the AHT output item;

(4-2) If the multi-threaded write modifies the ACT pointer, the access enters the procedure for generating the address reference table AIT.
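The access steps above can be sketched with a small in-memory model. This is a hedged illustration, not the patented system: MirrorSubsetStore and its methods are hypothetical, ACT pointers are list indexes, and the "AIT generation flow" of (4-2) is simplified to appending a new ACT unit and re-pointing the AHT entry. Reads and in-place writes take no AIT lock (step 3). Writes that change the offset, length, or pointer are serialised with a threading.Lock (step 4). Concurrent in-place writes to the same range are not guarded in this sketch.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class AHTOutput:
    act_index: int          # ACT "pointer", modelled as a list index (assumption)
    offset: int             # starting address inside the storage unit
    length: Optional[int]   # None or 0: access runs to the end of the file

class MirrorSubsetStore:
    """Hypothetical in-memory model of AIT-based access to one mirror subset."""

    def __init__(self, act: List[bytearray]) -> None:
        self.act = act
        self.ait: Dict[str, AHTOutput] = {}
        self._meta_lock = threading.Lock()   # serialises changes to AHT output items

    def _range(self, e: AHTOutput):
        unit = self.act[e.act_index]
        end = len(unit) if not e.length else e.offset + e.length
        return unit, e.offset, end

    def read(self, la: str) -> bytes:
        # Steps (1)-(2): look up the AIT, then access the unit within [offset, end).
        unit, start, end = self._range(self.ait[la])
        return bytes(unit[start:end])

    def read_many(self, las: List[str], workers: int = 4) -> List[bytes]:
        # Step 3: each thread runs steps (1)-(2) on its own; no AIT lock is taken.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.read, las))

    def write_in_place(self, la: str, data: bytes) -> None:
        # Step 3 (write case): pointer, offset and length stay the same.
        unit, start, end = self._range(self.ait[la])
        if len(data) != end - start:
            raise ValueError("an in-place write must keep the data length")
        unit[start:end] = data

    def write_at_new_offset(self, la: str, new_offset: int, data: bytes) -> None:
        # Step (4-1): same ACT pointer; write at the new offset, then record
        # the new offset and data length in the AHT output item.
        with self._meta_lock:
            e = self.ait[la]
            unit = self.act[e.act_index]
            if new_offset + len(data) > len(unit):
                unit.extend(bytes(new_offset + len(data) - len(unit)))
            unit[new_offset:new_offset + len(data)] = data
            self.ait[la] = AHTOutput(e.act_index, new_offset, len(data))

    def write_new_unit(self, la: str, data: bytes) -> None:
        # Step (4-2), simplified: append a new ACT unit and re-point the entry.
        with self._meta_lock:
            self.act.append(bytearray(data))
            self.ait[la] = AHTOutput(len(self.act) - 1, 0, len(data))
```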

On the basis of the video data storage organization of the present invention, fast motion estimation is applied to video frame data at the encoder. It is briefly described first. Besides full-frame block motion search and limited-range motion search, the method provides a search for large motion and a search for small motion. The large-range search is iterative: the position of the previous search result is used as the starting point of the next search. When the search result meets a certain condition, namely when the previous result is identical to the next one, a small-range search is performed starting from that position, and the small-range search result is taken as the final result.
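The iterative large-range search followed by a small-range search can be sketched as below. Assumptions, since the text does not fix them: the large-range pattern is the 8 neighbours at a step of 2 pixels, block matching uses the mean absolute difference (the same form as D_m in the skip-block test), frames are lists of rows, and the defaults of at most 4 large-range iterations and one ±1 small-range pass are taken from the parameters given for this search. The function names are hypothetical.

```python
import math
from typing import List

Frame = List[List[int]]

def mad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> float:
    """Mean absolute difference between the n x n block at (bx, by) in cur and the
    block displaced by (dx, dy) in ref; inf if the displaced block leaves ref."""
    h, w = len(ref), len(ref[0])
    x0, y0 = bx + dx, by + dy
    if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
        return math.inf
    total = sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                for j in range(n) for i in range(n))
    return total / (n * n)

def fast_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                step: int = 2, max_iters: int = 4):
    """Iterative large-range search, then one small-range pass.
    Returns (motion_vector, cost, converged)."""
    best = (0, 0)
    best_cost = mad(cur, ref, bx, by, 0, 0, n)
    pattern = [(-step, 0), (step, 0), (0, -step), (0, step),
               (-step, -step), (-step, step), (step, -step), (step, step)]
    converged = False
    for _ in range(max_iters):
        cx, cy = best                      # previous result is the next start point
        for dx, dy in pattern:
            c = mad(cur, ref, bx, by, cx + dx, cy + dy, n)
            if c < best_cost:
                best, best_cost = (cx + dx, cy + dy), c
        if best == (cx, cy):               # previous and next results agree
            converged = True
            break
    cx, cy = best
    for dx in (-1, 0, 1):                  # single small-range pass around the result
        for dy in (-1, 0, 1):
            c = mad(cur, ref, bx, by, cx + dx, cy + dy, n)
            if c < best_cost:
                best, best_cost = (cx + dx, cy + dy), c
    return best, best_cost, converged
```

The large steps move quickly toward the match, and the final ±1 pass refines positions that the 2-pixel grid cannot reach.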

The present invention adds a block classification function at the encoder, dividing the blocks within a frame into skip blocks and direct blocks. For a skip block, the motion vector is 0 and the actual residual is close to 0, so only mode information is transmitted, and no motion vector or residual information. A skip block is identified as follows:

$$D_m = \frac{1}{N}\sum_{(i,j)\in \mathrm{block}_m} \left|X_{(i,j)} - Y_{(i,j)}\right|$$

where X_{(i,j)} is the pixel at position (i, j) of block m in the current frame, Y_{(i,j)} is the corresponding pixel in the reference frame, and N is the number of pixels in the block. When D_m is less than a preset threshold, the block is marked as a skip block, and only its mode information is transmitted to the decoder.
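The skip-block test is a direct application of the D_m formula to co-located blocks. A minimal sketch, assuming frames are lists of rows whose dimensions are multiples of the block size n; skip_blocks is a hypothetical helper name and the threshold is left to the caller because the text does not specify its value.

```python
from typing import List, Tuple

def skip_blocks(cur: List[List[int]], ref: List[List[int]],
                n: int, threshold: float) -> List[Tuple[int, int]]:
    """Return (bx, by) of every n x n block whose D_m is below the threshold."""
    h, w = len(cur), len(cur[0])
    skips = []
    for by in range(0, h - n + 1, n):
        for bx in range(0, w - n + 1, n):
            # D_m: mean absolute difference against the co-located reference block
            d_m = sum(abs(cur[by + j][bx + i] - ref[by + j][bx + i])
                      for j in range(n) for i in range(n)) / (n * n)
            if d_m < threshold:
                skips.append((bx, by))
    return skips
```

A block with one pixel off by 1 in a 4 × 4 block gives D_m = 1/16 and is skipped, while a block brightened by 20 everywhere gives D_m = 20 and is not.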

For residual computation, the decoded key frame is used as the reference frame; the decoded key frame is also used below to generate side information. Among the remaining blocks in the frame, the blocks belonging to direct mode are then identified. The residual of such a block is close to 0, so only mode information and motion vector information are transmitted. To reduce encoder complexity, a fast block-motion search algorithm can be used.

The maximum number of large-range search iterations is set to 4 and the number of small-range searches to 1, corresponding to a maximum horizontal or vertical distance of (0, 7) or (7, 0), so the fixed-length code for the vector needs 3 bits. If the large-range vector search converges, the resulting motion residual is compared with a threshold equal to the skip-mode threshold. If the residual does not exceed the threshold, the block is set as a direct block, and its mode information and motion vector information must be transmitted to the decoder.
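The resulting three-way block decision can be sketched as a small function. This is a hedged illustration: the inputs are assumed to come from a zero-motion D_m computation and from a fast search that reports whether it converged. The check that each vector component fits the 3-bit range 0 to 7 is an added guard, and classify_block and BlockMode are hypothetical names.

```python
from enum import Enum
from typing import Optional, Tuple

class BlockMode(Enum):
    SKIP = "skip"
    DIRECT = "direct"
    NORMAL = "normal"

def classify_block(zero_mv_cost: float, search_mv: Tuple[int, int],
                   search_cost: float, converged: bool, threshold: float,
                   max_component: int = 7) -> Tuple[BlockMode, Optional[Tuple[int, int]]]:
    # Skip: zero-motion D_m below the threshold; only mode information is sent.
    if zero_mv_cost < threshold:
        return BlockMode.SKIP, (0, 0)
    # Direct: converged search whose residual does not exceed the same threshold,
    # with each component inside the 3-bit range; mode and motion vector are sent.
    if (converged and search_cost <= threshold
            and max(abs(search_mv[0]), abs(search_mv[1])) <= max_component):
        return BlockMode.DIRECT, search_mv
    # Everything else is a normal-mode block handled with side information.
    return BlockMode.NORMAL, None
```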

When transmitting the information for skip blocks and direct blocks, the two can be coded jointly, with a skip block represented by the zero motion vector (0, 0). The motion vector information can be transmitted with a block code or an exponential code; the rate is computed as follows:

Step 1. Encode the motion vector information of the K blocks with block coding and with exponential coding respectively, and take the shorter total code length as rate1;

Step 2. Merge skip mode and direct mode into a single class; the rate is then:

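$$\mathrm{mode}_1 = \mathrm{mode}(\text{skip}) \cup \mathrm{mode}(\text{direct})$$

$$\mathrm{mode}_2 = \mathrm{mode}(\text{normal})$$

$$\mathrm{rate}_2 = \mathrm{ent}(\mathrm{mode}_1, \mathrm{mode}_2)\cdot \mathrm{code\_length} + 2\cdot \mathrm{num}\big(\mathrm{mode}(\text{skip})\big)$$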

where mode() denotes the mode corresponding to a block type, ent() computes the entropy of the corresponding information, num() counts the blocks in a mode, and code_length is the length of the code word to be encoded.

Step 3. The total bit rate is the sum of the two rates above:

$$\mathrm{Total\_rate} = \mathrm{rate}_1 + \mathrm{rate}_2$$
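The rate computation in Steps 1 to 3 can be sketched as follows, under explicit assumptions that the text does not settle. The "exponential" code is taken to be signed Exp-Golomb with the H.264 mapping (v > 0 to 2v - 1, v ≤ 0 to -2v). The block code is taken to spend a fixed bits_per_component bits per vector component. ent() is read as the binary entropy, in bits, of the merged-versus-normal class distribution. rate1, rate2, and total_rate are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

def ue_bits(k: int) -> int:
    """Length in bits of the unsigned Exp-Golomb code for k >= 0."""
    return 2 * ((k + 1).bit_length() - 1) + 1

def se_bits(v: int) -> int:
    """Length in bits of the signed Exp-Golomb code (H.264-style mapping)."""
    return ue_bits(2 * v - 1 if v > 0 else -2 * v)

def rate1(mvs: List[Tuple[int, int]], bits_per_component: int = 4) -> int:
    """Step 1: shorter of the fixed-length block code and the Exp-Golomb code."""
    block = len(mvs) * 2 * bits_per_component
    expg = sum(se_bits(mx) + se_bits(my) for mx, my in mvs)
    return min(block, expg)

def rate2(modes: Sequence[str], code_length: int) -> float:
    """Step 2: ent(mode1, mode2) * code_length + 2 * num(skip blocks)."""
    merged = sum(m in ("skip", "direct") for m in modes)      # mode1 = skip U direct
    normal = len(modes) - merged                              # mode2
    ent = 0.0
    for count in (merged, normal):
        if count:
            p = count / len(modes)
            ent -= p * math.log2(p)
    return ent * code_length + 2 * modes.count("skip")

def total_rate(mvs, modes, code_length: int) -> float:
    """Step 3: Total_rate = rate1 + rate2."""
    return rate1(mvs) + rate2(modes, code_length)
```

For the vectors (0, 0) and (3, -2), Exp-Golomb needs 2 + 10 = 12 bits against 16 for the 4-bit block code, so rate1 is 12.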

At the decoder, after the corresponding mode information and motion vector information are obtained, skip blocks and direct blocks are reconstructed. For a skip block, the co-located block of the previous reference frame is taken directly as the final reconstructed block. For a direct block, the motion-compensated block given by its motion vector is taken as the final reconstructed block. For the remaining normal-mode blocks, side information and residual information must be generated at the decoder.
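Decoder-side reconstruction of skip and direct blocks reduces to copying a block from the reference frame. A minimal sketch, assuming frames are lists of rows and that a motion vector (dx, dy) points from the current block to its match in the reference frame; reconstruct_block is a hypothetical helper, and normal-mode blocks are deliberately rejected because they need the side-information path.

```python
from typing import List, Tuple

def reconstruct_block(ref: List[List[int]], bx: int, by: int, n: int,
                      mode: str, mv: Tuple[int, int] = (0, 0)) -> List[List[int]]:
    """Rebuild the n x n block at (bx, by) from the reference frame."""
    if mode == "skip":
        dx, dy = 0, 0          # co-located block of the previous reference frame
    elif mode == "direct":
        dx, dy = mv            # motion-compensated block
    else:
        raise ValueError("normal-mode blocks need side information and residuals")
    return [row[bx + dx:bx + dx + n] for row in ref[by + dy:by + dy + n]]
```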

Generating side information from the decoded key frames comprises the following procedure:

Step 1. Obtain the initial motion vector. First, a parallel motion estimation algorithm is used to compute the parallel motion vector. The motion-matching search is expressed as:

$$(v_x, v_y) = \arg\min_{m_x, m_y} D_{(m_x, m_y)}\left(1 + 0.05\sqrt{m_x^2 + m_y^2}\right)$$

$$(v_x, v_y) \leftarrow \pm\left(\frac{v_x}{2}, \frac{v_y}{2}\right)$$

where D_{(m_x, m_y)} is the block matching error computed from x_{(i,j)}, a pixel of the reference block, and y_{(i+m_x, j+m_y)}, a pixel of the motion-search block in the other frame, and ||m||, the 0-norm, is the size of block m. The final motion estimation vector is half of the motion vector obtained from the first formula, negated or kept unchanged according to the original direction of the estimation.

After the motion vectors of the co-located blocks in the preceding and following frames are obtained, they are converted into motion vectors of the same direction by negating one of them. The two vectors are then averaged to obtain the initial motion vector for the bidirectional motion search.

Step 2. For each block, take the average vector from Step 1 as the initial vector, assuming that the block moves linearly at constant velocity over the short interval, i.e. its motion vectors toward the preceding and following frames are equal in magnitude and opposite in direction. A bidirectional motion search is performed within a preset range around the initial vector. First, centered on the initial vector, the search range is set to -3 to 3; if the two motion vectors from Step 1 differ by more than 5 in either direction, the search range in that direction is expanded to -5 to 5. If the number of positions actually searched within this range is below a threshold, a further search is performed centered on the zero vector with a range of -6 to 6, and the minimum of the two searches is taken as the motion search result. The residual of the motion search result is computed as follows:

$$(v_x, v_y) = \arg\min_{m_x, m_y} D_{(m_x, m_y)}$$

where x and y denote the two related frames.

When the computed sum of absolute residuals reaches its minimum, the bidirectional motion estimation result is obtained. From this motion vector, the corresponding side-information reference block sideblock, the residual estimation block residentblock, and the residual information resident are obtained:

$$\mathrm{resident} = \min D_{(m_x, m_y)}$$

$$\mathrm{sideblock}_{(i,j)} = \frac{x_{(i-v_x,\,j-v_y)} + y_{(i+v_x,\,j+v_y)}}{2}$$

$$\mathrm{residentblock}_{(i,j)} = \frac{x_{(i-v_x,\,j-v_y)} - y_{(i+v_x,\,j+v_y)}}{2}$$

Step 3. The motion vector estimate obtained in Step 2 is further processed. When the motion vector magnitude exceeds a certain threshold, bidirectional parallel motion estimation and compensation is performed: the two bidirectional parallel motion estimation vectors computed in Step 1 yield four motion compensation blocks. If all four blocks lie within the image display area, and the distance between the motion vectors of the same direction is below a preset value, the side-information block and residual block of the block are computed as:

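$$\mathrm{sideblock} = \frac{\mathrm{block}_1 + \mathrm{block}_2 + \mathrm{block}_3 + \mathrm{block}_4}{4}$$

$$\mathrm{residentblock} = \frac{\mathrm{block}_1 + \mathrm{block}_2 - \mathrm{block}_3 - \mathrm{block}_4}{4}$$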

where block₁ and block₂ belong to the frame preceding the current frame, and block₃ and block₄ belong to the frame following it.

Step 4. For a block on a motion edge whose residual exceeds the threshold, the following processing is applied. First, take either of the two parallel motion estimation vectors obtained in Step 1. If, in the direction opposite to the initial motion estimate, the position estimated by that vector lies beyond the image boundary, and the estimation residual of the motion vector in this direction is smaller than that of the motion vector in the opposite direction, the corresponding motion compensation block is represented by the parallel motion compensation block obtained in this estimation direction; in this case, unidirectional search compensation is used. If this first condition is not met, the other motion vector is processed in the same way. The resulting motion compensation blocks are combined by weighted averaging.
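Steps 1 to 3 above can be illustrated with the following sketch. It is a hedged illustration, not the patented implementation, and rests on several assumptions. Frames are lists of rows indexed [row][column], with i the column and j the row. Block matching uses the mean absolute difference. The cost callable passed to parallel_motion_vector stands in for D_(mx,my). The averaged initial vector is rounded to the nearest integer. All function names are hypothetical. The edge-block handling of Step 4 is omitted because its weights are not specified.

```python
import math
from typing import Callable, List, Optional, Tuple

Frame = List[List[float]]

def parallel_motion_vector(cost: Callable[[int, int], float], radius: int):
    """Step 1: argmin of D * (1 + 0.05 * |m|), halved; returns (+half, -half)."""
    best, best_score = (0, 0), math.inf
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            score = cost(mx, my) * (1 + 0.05 * math.hypot(mx, my))
            if score < best_score:
                best, best_score = (mx, my), score
    half = (best[0] / 2, best[1] / 2)
    return half, (-half[0], -half[1])

def bidir_cost(x: Frame, y: Frame, bx: int, by: int, n: int,
               v: Tuple[int, int]) -> Optional[float]:
    """Mean |x(p - v) - y(p + v)| over the block; None if it leaves either frame."""
    vx, vy = v
    h, w = len(x), len(x[0])
    for ox, oy in ((bx - vx, by - vy), (bx + vx, by + vy)):
        if ox < 0 or oy < 0 or ox + n > w or oy + n > h:
            return None
    total = sum(abs(x[by - vy + j][bx - vx + i] - y[by + vy + j][bx + vx + i])
                for j in range(n) for i in range(n))
    return total / (n * n)

def search_window(x, y, bx, by, n, center, rx, ry):
    """Search center +/- (rx, ry); returns (best, cost, positions_tried)."""
    best, best_cost, tried = center, math.inf, 0
    for dy in range(-ry, ry + 1):
        for dx in range(-rx, rx + 1):
            v = (center[0] + dx, center[1] + dy)
            c = bidir_cost(x, y, bx, by, n, v)
            if c is None:
                continue
            tried += 1
            if c < best_cost:
                best, best_cost = v, c
    return best, best_cost, tried

def bidirectional_search(x, y, bx, by, n, mv_prev, mv_next, min_positions):
    """Step 2: search around the averaged initial vector, zero-centred fallback."""
    b = (-mv_next[0], -mv_next[1])                 # negate one vector
    init = (round((mv_prev[0] + b[0]) / 2), round((mv_prev[1] + b[1]) / 2))
    rx = 5 if abs(mv_prev[0] - b[0]) > 5 else 3    # widen a direction if the
    ry = 5 if abs(mv_prev[1] - b[1]) > 5 else 3    # two vectors differ by > 5
    best, cost, tried = search_window(x, y, bx, by, n, init, rx, ry)
    if tried < min_positions:
        best2, cost2, _ = search_window(x, y, bx, by, n, (0, 0), 6, 6)
        if cost2 < cost:
            best, cost = best2, cost2
    return best, cost

def side_and_residual(x, y, bx, by, n, v):
    """sideblock and residentblock for the block at (bx, by) and vector v."""
    vx, vy = v
    side = [[(x[j - vy][i - vx] + y[j + vy][i + vx]) / 2 for i in range(bx, bx + n)]
            for j in range(by, by + n)]
    resid = [[(x[j - vy][i - vx] - y[j + vy][i + vx]) / 2 for i in range(bx, bx + n)]
             for j in range(by, by + n)]
    return side, resid

def four_block_average(b1, b2, b3, b4):
    """Step 3: b1, b2 from the preceding frame, b3, b4 from the following frame."""
    rows = list(zip(b1, b2, b3, b4))
    side = [[(p + q + r + s) / 4 for p, q, r, s in zip(*row)] for row in rows]
    resid = [[(p + q - r - s) / 4 for p, q, r, s in zip(*row)] for row in rows]
    return side, resid
```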

In summary, the present invention proposes a data encoding storage method that recovers data using as little of the video storage node cluster's internal network bandwidth and computing capacity as possible, achieving high data availability while improving scalability.

Those skilled in the art will appreciate that each module or step of the invention described above can be implemented with a general-purpose computing system. The modules or steps can be concentrated on a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by a computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.

It should be understood that the above embodiments of the present invention serve only to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent substitution, improvement, etc. made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and bounds of the claims, or within the equivalents of such scope and bounds.

Claims

1. A data encoding storage method for video data storage, characterized by comprising:

encoding file data and then encapsulating it in blocks;

2. The method according to claim 1, characterized in that the video data storage system includes a coordination server, and when a storage node joins, it provides its own resource list to the coordination server.

3. The method according to claim 2, further comprising: computing the hash value of a file to be uploaded and uploading the value to the coordination server; the coordination server coordinates the storage nodes to look up the value, and when the value is found to exist, the coordination server updates the reference count of the file.

4. The method according to claim 3, further comprising: when no identical hash is detected, the embedded terminal receives the file, computes hashes over the blocks of the file, and stores them in a distributed manner on the nodes of the mirror subset.

5. The method according to claim 3, wherein computing the hash value of the file to be uploaded further comprises: dividing the file into blocks and computing the SHA value of each block, and using the hash value of the whole file as the file's feature signature; placing each file's feature signature in memory, together with the file path and other related information, as metadata, while the signatures of its blocks are placed on disk; the block signatures are read into memory only when a node in the system fails, so that after the embedded terminal recovers the lost data they can be verified by comparison.

6. The method according to claim 5, characterized in that the location information of each block, its hash value, and its file block identifier are stored together in a single table.
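The deduplication and block-signature bookkeeping of claims 3 to 6 can be sketched with Python's hashlib. This is a minimal illustration under stated assumptions: the claims say only "SHA", so SHA-256 is an assumed choice. The coordination server is reduced to an in-memory reference counter. Blocks are placed on nodes round-robin. The names Coordinator, block_table, and verify_block are hypothetical.

```python
import hashlib
from typing import Dict, List

def file_signature(data: bytes) -> str:
    """Whole-file hash used as the file's feature signature (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()

class Coordinator:
    """Toy coordination server: tracks reference counts per file signature."""

    def __init__(self) -> None:
        self.refcount: Dict[str, int] = {}

    def register_upload(self, data: bytes) -> bool:
        """Return True if the file is new and must be stored, False if duplicate."""
        sig = file_signature(data)
        if sig in self.refcount:
            self.refcount[sig] += 1        # duplicate: only the reference count changes
            return False
        self.refcount[sig] = 1
        return True

def block_table(file_id: str, data: bytes, block_size: int,
                nodes: List[str]) -> List[Dict[str, str]]:
    """One table holding block id, location and hash for every block (claim 6)."""
    table = []
    for idx, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        table.append({
            "block_id": f"{file_id}#{idx}",
            "node": nodes[idx % len(nodes)],        # round-robin placement (assumption)
            "sha": hashlib.sha256(block).hexdigest(),
        })
    return table

def verify_block(block: bytes, entry: Dict[str, str]) -> bool:
    """Check a recovered block against its stored signature."""
    return hashlib.sha256(block).hexdigest() == entry["sha"]
```

Keeping only the whole-file signatures in memory and the per-block table on disk matches the split described in claim 5; verify_block is the comparison performed after recovery.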