CN107135264B - Data coding method for embedded device

Info

Publication number: CN107135264B
Application number: CN201710331787.3A
Authority: CN
Inventors: 许荣福
Original assignee: Chengdu Ueevee Information Technology Co ltd
Current assignee: Chengdu Ueevee Information Technology Co ltd
Priority date: 2017-05-12
Filing date: 2017-05-12
Publication date: 2020-09-08
Anticipated expiration: 2037-05-12
Also published as: CN107135264A

Abstract

The invention provides a data encoding method for an embedded device. When the embedded terminal software runs, the hash value of the file to be uploaded is calculated and each storage node is queried for that value. When no identical hash is detected, the file is received from the embedded terminal, the hash of each information block of the file is calculated, and the blocks are stored in a distributed manner on the nodes of a mirror subset. The method recovers data while using as little of the internal network bandwidth and computing capacity of the video storage node set as possible, and improves scalability while achieving high data availability.

Description

Data coding method for embedded device

Technical Field

The present invention relates to video processing, and more particularly, to a data encoding method for an embedded device.

Background

With the continuous development of information technology, data has increasingly become an important resource in people's daily lives. The explosive growth of data entails a continuous increase in storage devices. At present, a modern data center may contain anywhere from a few storage nodes to tens of thousands, and in a storage system of such scale, storage node anomalies or failures become common; data inaccessibility or data loss caused by network connection equipment or other components of a storage node also occurs from time to time. For video encoding and storage, low encoding and decoding complexity and the ability to recover lost data from the smallest amount of data are local time characteristics: they depend on factors such as the network bandwidth of the storage center and CPU computing power, and when a video file is stored using a coding redundancy strategy, they affect the time needed to store the file. If the system has high-speed bandwidth and high-performance computing power, storing a file of one video storage unit takes less time. Higher reliability, minimal data redundancy and lower power consumption are global time characteristics, which directly determine the equipment cost, management cost and energy cost of the system. To meet ever-growing data storage requirements, higher demands are placed on the reliability, availability and related characteristics of video data storage, and achieving low-redundancy, high-reliability storage of data has become a major challenge in the industry.

Disclosure of Invention

In order to solve the problems existing in the prior art, the invention provides a data encoding method for an embedded device, which comprises the following steps:

when the embedded terminal software runs, calculating the hash value of the file to be uploaded and querying each storage node for that value; when no identical hash is detected, receiving the file from the embedded terminal, calculating the hash of each information block of the file, and storing the blocks in a distributed manner on the nodes of a mirror subset.

Preferably, the querying the value from each storage node further comprises:

when the value is found to exist, the embedded terminal is informed that the data is stored.

Preferably, the mirror subsets form a unified single file map, with consistent encoded storage views formed between each subset.

Preferably, the storage nodes in each subset store different partitions of the same file, and the system maintains a mapping relationship between the different partitions of the same file and the storage nodes.

Preferably, the mirror subsets are combined into a tree diagram with a hierarchical structure to establish a mapping relationship between the storage file set and the device set.

Preferably, each storage node independently maintains the storage resources of the subset and the metadata of the file, and independently provides a file blocking reading service.

Preferably, whether to start the next mirror subset is determined according to the storage data amount and the utilization rate of the storage system.

Compared with the prior art, the invention has the following advantages:

The invention provides a data encoding method for an embedded device that recovers data while using as little of the internal network bandwidth and computing capacity of the video storage node set as possible, and that improves scalability while achieving high data availability.

Drawings

Fig. 1 is a flowchart of a data encoding method for an embedded device according to an embodiment of the present invention.

Detailed Description

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details.

An aspect of the present invention provides a data encoding method for an embedded device. Fig. 1 is a flowchart of a data encoding method for an embedded device according to an embodiment of the present invention.

The video data storage system of the invention adopts a storage node subset expansion strategy. The system software structure adopts a new data recovery mode that also uses the computing capacity of the embedded terminal, so that lost data can be recovered and reconstructed while using as little of the internal network bandwidth and computing capacity of the storage node set as possible. Part of the recovery function for lost data blocks is migrated to the embedded terminal. The video data storage system encodes and packages the data of a single file in blocks and stores the blocks evenly on different nodes of a mirror subset. The system provides an extensible storage volume; the data in the volume is organized in a hierarchical directory structure and supports concurrent access by multiple machines and multiple processes. The system can add mirror subsets as storage capacity requires, and expands on demand by using the storage capacity of the nodes in each mirror subset.

The mirror subsets form a unified single file map, and a consistent encoded storage view is formed between the subsets. The storage nodes in each subset store different blocks of the same file, and the system maintains the mapping between the different blocks of a file and the storage nodes. All the mirror subsets are combined into a hierarchical tree to establish the mapping between the set of stored files and the set of devices. Meanwhile, each storage node sets aside an independent storage space for special-purpose system data, which avoids inconsistent file directories among the storage nodes of a mirror subset. Each storage node in the system independently maintains the storage resources of its subset and the metadata of the files, and can independently provide a file block reading service. When a disk in a storage node is damaged, the node where the disk is located recovers the data blocks: the file recovery node schedules, from the blocks distributed in the subset, the minimum number of check blocks that satisfies the reconstruction requirement, and reconstructs the lost data blocks. If several storage nodes are abnormal, the file server calculates the amount of computation required to recover all the files and distributes tasks to the recovery nodes according to the principle of balancing the computational load. When a node abnormality occurs in the check information subset, the system schedules the source file, encodes it a second time, and redeploys the result on a newly added node.

The storage nodes adopt a peer-to-peer design. When a storage node joins the video data storage system, it provides its own resource list to the coordination server; any node in the group can act both as a requester and as a provider of file blocks. The video data storage system determines whether to start the next mirror subset according to the amount of stored data and the utilization rate of the storage system. In any mirror subset S_k(A; B) (k = 1, 2, 3, ...), the storage nodes are divided into file block storage nodes and encoding check block storage nodes. The file block storage nodes form the information mirror subset S_k(A), in which each node a_{k,i} ∈ S_k(A) (k, i positive integers) stores blocks of the original file; the encoding check block storage nodes b_{k,i} (k, i positive integers) form the check information mirror subset S_k(B).

To reduce the data redundancy of files inside the whole system, the system architecture of the invention introduces file-level data deduplication, and the deduplication policy is executed at the data source. When the embedded terminal software runs, it calculates the hash value of the file to be uploaded using an SHA hash algorithm and uploads the hash value to the coordination server, which coordinates the storage nodes to query for that value. When the value is found to exist, the coordination server updates the reference count of the file and informs the embedded terminal that the data is already stored. When no identical hash is detected, the file is received from the embedded terminal, the hash of each information block of the file is calculated, and the blocks are stored in a distributed manner on the nodes of a mirror subset.
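The file-level deduplication check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `CoordinationServer` class, its in-memory dictionary and the choice of SHA-256 are assumptions, since the text only says "SHA".

```python
import hashlib


class CoordinationServer:
    """Hypothetical in-memory stand-in for the coordination server."""

    def __init__(self):
        # file hash -> reference count of the stored file
        self.ref_count = {}

    def check_and_register(self, file_hash: str) -> bool:
        """Return True if the file is already stored, bumping its reference count."""
        if file_hash in self.ref_count:
            self.ref_count[file_hash] += 1
            return True
        self.ref_count[file_hash] = 1
        return False


def file_signature(data: bytes) -> str:
    # SHA-256 is an assumption; the source does not name the SHA variant.
    return hashlib.sha256(data).hexdigest()


def upload(server: CoordinationServer, data: bytes) -> str:
    """Hash at the source, then ask the server whether an upload is needed."""
    if server.check_and_register(file_signature(data)):
        return "already stored"
    return "upload required"
```

Because the signature depends only on the content, two files with different names or directories but identical bytes map to the same entry.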

File-level duplicate detection can identify identical files that have different file names as well as identical files in different directories. The system divides the file into blocks, calculates the SHA value of each block, and takes the hash value of the whole file as the characteristic signature of the file. The system keeps the characteristic signature of each file, the file path and other related information as metadata in memory, while the signature of each block is stored on disk and read into memory only when a node abnormality occurs, so that the embedded terminal can verify and compare lost data after recovering it. To speed up block lookup, the system stores the position information of each block, the hash value of the block and the file block identifier together in one table.
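A possible shape for the block table mentioned above is sketched here. The field names, the fixed block size and the round-robin placement over `nodes` are illustrative assumptions; the text only states that position, block hash and block identifier are stored together.

```python
import hashlib


def build_block_table(data: bytes, block_size: int, nodes: list) -> list:
    """Split data into blocks and record (id, node, hash) for each block."""
    table = []
    for block_id, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        table.append({
            "block_id": block_id,
            # Round-robin placement is an assumption made for this sketch.
            "node": nodes[block_id % len(nodes)],
            "sha256": hashlib.sha256(block).hexdigest(),
        })
    return table
```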

When a node abnormality causes data loss and the embedded terminal needs to read the data, the embedded terminal downloads part of the original data blocks and part of the check data blocks and reconstructs the lost data blocks locally. After recovery is finished, while using the file, the embedded terminal verifies the reconstructed data blocks and sends them back to the storage system.

Aiming at the characteristic that the data storage nodes in each mirror image subset have the same directory information, the metafiles in the mirror image subset are uniformly stored into each storage node in a blocking mode, and when the embedded terminal needs the metafile information, each storage node of the mirror image subset inquires the metafile stored by the node. Thus, the query of the metadata is converted into the query of each storage node to the metadata subblock.

When the embedded terminal sends a file storage request, the file server generates metadata according to the operating status of the nodes in subset S_k(A; B) and sends an instruction to an idle node in S_k(A; B); the embedded terminal then exchanges data directly with that storage node. The node divides and encodes the file to generate check data, and packages it according to the data packaging format. The original blocks of the file are sent to subset S_k(A), and the check data blocks are stored in a distributed manner on subset S_k(B). Thus the data blocks and check blocks of each file span all storage nodes in the subset, and the nodes share the same file storage view. When the storage space in mirror subset S_k(A; B) is about to be used up, the system starts the next mirror subset S_{k+1}(A; B). To increase the utilization of the computing resources of the cluster storage system, the nodes in subsets S_1(B), S_2(B), ..., S_{k+1}(B) can also be used by the file server to distribute the encoding computation for files that are subsequently stored.

When a file needs to be read, the embedded terminal requests source data from the file server, and the file server sends the mapping and matching information among the file to be read, the mirror subset and the storage nodes directly to the embedded terminal. After obtaining the response, the embedded terminal first converts the required file name and byte offset into an index into the file, sends a request containing the file name and the index to the storage nodes, and establishes file reading directly with the target set of storage nodes. The embedded terminal sends a request to each storage node in the file's mirror subset for the specified file blocks and the data areas within the blocks. If check blocks stored in mirror subset S_k(B) are downloaded, the original file is obtained by data reconstruction.
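The conversion from a byte offset to block indices and in-block data areas can be illustrated as follows. The fixed `block_size` and the tuple layout are assumptions for the sketch; the source does not specify the index format.

```python
def byte_range_to_blocks(offset: int, length: int, block_size: int) -> list:
    """Map a byte range to (block_index, start_in_block, stop_in_block) requests."""
    requests = []
    end = offset + length
    while offset < end:
        index = offset // block_size
        start = offset % block_size
        # Stop at the block boundary or at the end of the requested range.
        stop = min(block_size, start + (end - offset))
        requests.append((index, start, stop))
        offset += stop - start
    return requests
```

For example, reading 10 bytes at offset 5 with 8-byte blocks touches the tail of block 0 and the first 7 bytes of block 1.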

When the embedded terminal sends a file storage request and the storage system holds no identical data, the coordination server generates metadata according to the operating status of the nodes in subset S_k(B) and sends an instruction to an idle node in S_k(B); the embedded terminal then exchanges the file directly with that storage node. The node divides the file into blocks and encodes it, generates check data, packages the check data according to the data packaging format, calculates the hash value of each file information block at the same time, and sends the hash values to the coordination server. The data blocks of the stored file are sent to subset S_k(A), and the check data blocks are stored in a distributed manner on subset S_k(B). Thus the data blocks and check blocks of each file span all storage nodes in the subset, and the nodes share the same file storage view. When the storage space in mirror subset S_k(A; B) is about to be used up, the system starts the next mirror subset S_{k+1}(A; B).

When a file reading requirement exists, the embedded terminal requests source data from the file server, and the file server directly sends mapping and matching information among the file to be read, the mirror image subset and the storage node to the embedded terminal. After obtaining the response, the user firstly converts the required file name and byte offset into the index of the file, sends a request containing the file name and the index to the storage node, and directly establishes file reading interoperation with the target storage node set. And the coordination server returns the index of the file block according to the query result, wherein the index comprises the cluster subset of the file and the position of the data block. And the embedded terminal sends a request to each storage node in the file image subset to request the specified file blocks and the data areas in the blocks. The embedded terminal will recombine the file in order and obtain the original file.

The coordination server is responsible for distributing cluster storage tasks and managing metadata, records information of each node in the storage cluster, subset division information of the storage nodes, directory information of system storage files, mapping relation information from files to blocks and hash values of each file block, and is also responsible for strategy determination when a lost file block is recovered, recovery task distribution and migration management of the file blocks. The system realizes the monitoring of the system node state by using a combination mode of periodic heartbeat and event broadcasting, and when a new node appears in the system, the node transmits the information of the node to the coordination server and each storage node in a broadcasting mode. Meanwhile, each storage node reports the existing state to the corresponding file management node periodically, and if the corresponding file management node does not receive the heartbeat within a period of time, the storage node is considered to be abnormal.
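A minimal sketch of the heartbeat side of this monitoring follows. The `HeartbeatMonitor` class, the explicit `now` arguments and the timeout value are hypothetical; the source only says that a node is considered abnormal when no heartbeat arrives "within a period of time".

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that went silent."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # node id -> time of last heartbeat or join event

    def register(self, node_id: str, now: float) -> None:
        """Handle the broadcast sent when a new node appears."""
        self.last_seen[node_id] = now

    def heartbeat(self, node_id: str, now: float) -> None:
        """Handle a periodic state report from a storage node."""
        self.last_seen[node_id] = now

    def abnormal_nodes(self, now: float) -> list:
        """Nodes whose last report is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)
```

Passing the clock value in explicitly keeps the sketch deterministic; a real service would read a monotonic clock instead.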

In the recovery process, the abnormal node first checks its connections with the other nodes. When the node recovering the file has network connections of widely differing quality to the other nodes, it evaluates the data path of each link using test data packets, ranks the links, and calculates the minimum number p of file blocks needed for recovery, so that the file blocks are obtained over the best p network connections and the file is recovered. If node k is responsible for recovering the lost file blocks of file f, node k sends file block read requests to the p nodes connected to it. After obtaining the required file blocks, node k recovers the source file using the system's coding method, then re-encodes the source file with the same coding method, according to the marks identifying the lost nodes and file blocks, to obtain the lost redundant file blocks. The redundant file blocks are repackaged according to the file packaging protocol and the size of each lost file block, and the recovered file blocks of f are stored evenly, in order, on the storage nodes. Each node sets aside a special partition to temporarily store reconstructed data blocks; when the abnormal node is replaced, the reconstructed data is placed according to the data block placement path given by the coordination server.
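The link ranking step can be sketched as below, assuming the data path evaluation yields one latency figure per link. The source does not define the evaluation metric, so the use of round-trip time in milliseconds is an assumption.

```python
def pick_best_links(link_rtt_ms: dict, p: int) -> list:
    """Return the p node ids with the lowest measured round-trip time."""
    if p > len(link_rtt_ms):
        raise ValueError("fewer reachable nodes than blocks required")
    # Sort by measured latency; ties are broken by node id for determinism.
    ranked = sorted(link_rtt_ms, key=lambda node: (link_rtt_ms[node], node))
    return ranked[:p]
```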

The invention recovers lost data blocks based on the usage demands of the embedded terminals. File blocks on abnormal nodes are reconstructed by combining reconstruction on the embedded terminals, which draws on the large amount of computing resources scattered across them, with centralized reconstruction inside the cluster.

If the original file has k blocks and n-k check data blocks are generated after encoding, the original file can be reconstructed from any k of the n data blocks. When the cluster storage system runs, a reconstruction critical parameter k (1 < k < n-m) is set. When the number of abnormal storage nodes in the cluster k_f < k, the cluster manager does not organize the internal nodes to recover the data blocks on the abnormal storage nodes; instead, the required data blocks are recovered with the help of the users. When a user reads a file, the terminal downloads the remaining n-k_f original file data blocks and k_f check data blocks in the cluster together with the codeword information, reconstructs the k_f abnormal data blocks, and splices the reconstructed data blocks and the downloaded n-k_f data blocks into the original file. Meanwhile, the embedded terminal repackages the k_f recovered data blocks according to the packaging format of data blocks in the cluster system and uploads them to the server cluster. The file server calculates the hash value of each data block and compares it with the stored hash value of the original data block; it stores the data block if the hash values are the same and rejects the upload request if they differ. When the number of abnormal nodes in the cluster exceeds the set reconstruction critical parameter k, the coordination server determines an intra-cluster recovery strategy according to the data size of the unrecovered file blocks and the operating status of the nodes in the cluster. The coordination server calculates the amount of computation required to reconstruct all the remaining file blocks, allocates tasks to the recovery nodes according to the principle of balancing the computational load, and redeploys the recovered file blocks on the storage nodes in the cluster.
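The threshold decision and the load-balanced task allocation can be illustrated with the sketch below. The function names, the per-block cost figures and the greedy "largest task to least-loaded node" rule are assumptions; the source states only that tasks are allocated according to a balanced-load principle, and it does not say what happens when the number of abnormal nodes equals the threshold (treated here as cluster recovery).

```python
import heapq


def choose_recovery_mode(num_abnormal: int, threshold: int) -> str:
    """Below the critical parameter, terminals rebuild blocks on demand."""
    return "terminal" if num_abnormal < threshold else "cluster"


def assign_recovery_tasks(block_costs: dict, recovery_nodes: list) -> dict:
    """Greedy balancing: give the costliest block to the least-loaded node."""
    heap = [(0, node) for node in recovery_nodes]
    heapq.heapify(heap)
    plan = {node: [] for node in recovery_nodes}
    for block, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        plan[node].append(block)
        heapq.heappush(heap, (load + cost, node))
    return plan
```

With costs 5, 3 and 2 and two recovery nodes, the plan ends with both nodes carrying a load of 5.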

When a data block has been reconstructed, the embedded terminal first calculates the hash value of the reconstructed data block and uploads it to the coordination server. The coordination server compares it with the hash value of the lost block in the hash value database that was saved when the file was first stored. If the hash value of the data block reconstructed by the embedded terminal is the same as that of the lost data block, the embedded terminal is allowed to upload the data block; if the two differ, the reconstructed data block is incorrect or the data has been maliciously tampered with, and the data block is not accepted. For important data, a secondary check is carried out inside the system once the data block has been uploaded: the management node calculates the hash value of the data block again and compares it once more with the hash value of the original data block, to detect whether the data block was attacked or maliciously tampered with during upload.
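The acceptance check for a reconstructed block reduces to a hash comparison, sketched here. SHA-256 and the function name are assumptions; the constant-time comparison is a defensive choice of this sketch, not something the source requires.

```python
import hashlib
import hmac


def accept_reconstructed_block(block: bytes, stored_hash_hex: str) -> bool:
    """Accept the block only if its hash matches the one saved at first storage."""
    actual = hashlib.sha256(block).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, stored_hash_hex)
```

The same function can serve for the secondary check on the management node, since it recomputes the hash from the uploaded bytes.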

When no original file blocks are lost in the system, the embedded terminal reads the file by directly downloading the original file blocks. When network congestion occurs, the embedded terminal can obtain better file reading performance through reconstruction than by directly downloading the original data blocks. Suppose a file has size M, the number of information nodes is k, and the number of check nodes is r. If the data reading rate that an information node can provide at a certain time is m_a and the data download rate that a check node can provide is m_b, then m_b > m_a. If the rate at which the embedded terminal reconstructs an M/k data block is m_d, and if there is

And if so, the source file is acquired in an adaptive mode. The embedded terminal sends a file reading request and checks the network connection to each node. When the connection state of all nodes is the same, the file is read directly from the original data blocks. When the connection to a node storing an original data block is found to be poor, the terminal calculates the time t_r needed to obtain the file by downloading some check data blocks and reconstructing, and at the same time calculates the time t_0 needed to read the data block directly from the poorly connected node. When t_r < t_0, the file is acquired by reconstruction; when t_r > t_0, the ordinary download mode is used.
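The comparison of t_r and t_0 can be sketched as follows. The time model is an assumption made purely for illustration: the source does not give formulas for t_r or t_0, so the sketch takes t_0 as block size over the direct read rate, and t_r as the check-block download time plus the local rebuild time. The tie case t_r = t_0 is not covered by the text and falls back to direct download here.

```python
def choose_read_mode(block_size: float, direct_rate: float,
                     check_rate: float, rebuild_rate: float) -> str:
    """Pick 'reconstruct' when rebuilding is faster than the slow direct read."""
    t0 = block_size / direct_rate  # direct read from the poorly connected node
    t_r = block_size / check_rate + block_size / rebuild_rate  # assumed model
    return "reconstruct" if t_r < t0 else "direct"
```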

To suit the way streaming media files are read, the invention first divides the media file into uniform blocks and then performs check calculation on the blocks to obtain check blocks. It also copies the first t file blocks of the streaming media file and stores the copies on each node of the storage cluster. The system manages the backup blocks independently: each storage node stores the backup copies of data blocks in a space set aside separately on its disk. The number of times x a file is read is counted; if the number of reads per unit time is greater than a set value y, the data blocks of the file are kept in a state where copies and coding redundancy coexist. If the number of reads per unit time is less than a set value z, the system clears all the copy blocks of the file.
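The replica policy driven by the thresholds y and z can be written as a small decision function. The return labels are hypothetical, and the behaviour for read rates between z and y is not stated in the source, so the sketch leaves the current state unchanged in that range.

```python
def replica_action(reads_per_unit_time: float, y: float, z: float) -> str:
    """Decide whether a file keeps its replica blocks alongside coded blocks."""
    if reads_per_unit_time > y:
        return "keep_replicas"   # copies and coding redundancy coexist
    if reads_per_unit_time < z:
        return "clear_replicas"  # only coded redundancy remains
    return "no_change"           # assumption: between z and y nothing changes
```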

Nodes in the system are further divided into active storage nodes and dormant storage nodes. Active nodes are used for storing new files and for serving user reads. Preferably, the storage nodes in subset S_n(A), which stores file information, are set as active nodes with their disks kept in the active state to satisfy data read requests from a large number of users, while the storage nodes in subset S_n(B), which stores check data, are set as dormant nodes, so that requests are directed only to part of the distributed storage nodes. When the system queries hash values for duplicate data, the distributed query is carried out on this part of the nodes. Meanwhile, the file manager counts how often files are read; frequently accessed data is moved to the active nodes and rarely accessed data is moved to the dormant storage nodes.

Logically, if the storage system has n storage nodes in total, it must provide erasure-correction capability such that the system tolerates the failure of any r storage nodes. When the embedded terminal makes a file storage request, the system first divides the file into blocks, the number of blocks being k = n - r, and generates r check blocks using a Reed-Solomon coding matrix G. The original blocks of the file are stored on k nodes, and the remaining r nodes store the check data blocks generated by the operation with G. The specific process comprises the following steps:

Step one: when the system receives a file storage request, it divides the file directly into m × k file blocks; if the file size is not divisible by m × k, "0" is appended to the end of the file. The check data blocks are obtained by operating directly on the vectors of the coding matrix G and the m × k data blocks, following the positions of the "0" and "1" elements in each row vector of the generator matrix.

Step two: let the blocks of the original file be denoted D = (D_1, D_2, ..., D_k)^T, where each D_i is called a macroblock. D_i consists of m micro-blocks, and the m data blocks (d_{i,1}, d_{i,2}, ..., d_{i,m})^T in D_i are called a micro-block group. Let the generated check macroblock group be denoted P = (P_1, P_2, ..., P_r)^T, where each check macroblock P_i contains m check micro-blocks. The set of original file blocks and check blocks is denoted E = (D_1, D_2, ..., D_k | P_1, P_2, ..., P_r)^T. Then: G · D = E.

The m × k data blocks of the entire file can be written as d_{1,1}, d_{1,2}, ..., d_{1,m}, ..., d_{k,1}, d_{k,2}, ..., d_{k,m}. Each check macroblock P_i generated from the original file blocks contains m check micro-blocks, and the check micro-blocks are written as p_{1,1}, p_{1,2}, ..., p_{1,m}, ..., p_{r,1}, p_{r,2}, ..., p_{r,m}.

The Reed-Solomon coding matrix G is written as G = [I, V']^T, where I is a unit matrix of m × m and V' is a matrix of (m × r) × (m × k). The check micro-block p_{i,j} is generated as follows: the m × k data blocks of the file to be stored, d_{1,1}, d_{1,2}, ..., d_{1,m}, ..., d_{k,1}, d_{k,2}, ..., d_{k,m}, are arranged in order and placed in correspondence with the m·k element positions of row (i-1)·m + j of the matrix V'. The distribution of 0s and 1s in row (i-1)·m + j determines the generation rule of the check micro-block p_{i,j}: the file data blocks corresponding to all element positions with value 1 in row (i-1)·m + j are accumulated modulo 2, and the result is the check micro-block determined by that row. Thus the sub-matrix V' of the matrix G generates r·m check micro-blocks p_{1,1}, p_{1,2}, ..., p_{1,m}, ..., p_{r,1}, p_{r,2}, ..., p_{r,m} for the original file in total, that is, r check macroblocks. The data blocks generated by the unit matrix I are the original blocks of the file; spliced together directly in order, they form the original file.
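Steps one and two can be sketched as follows: pad and split the file into m·k micro-blocks, then form each check micro-block as the modulo-2 sum (XOR) of the data micro-blocks selected by one 0/1 row. The function names are hypothetical, padding with zero bytes is one reading of appending "0", and the small row matrix used in the example is illustrative only, not an actual Reed-Solomon derived V'.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise modulo-2 addition of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_microblocks(data: bytes, m: int, k: int) -> list:
    """Zero-pad data to a multiple of m*k bytes and cut it into m*k micro-blocks."""
    total = m * k
    padded = data + b"\x00" * ((-len(data)) % total)
    size = len(padded) // total
    return [padded[i * size:(i + 1) * size] for i in range(total)]


def encode_check_microblocks(v_rows: list, micro_blocks: list) -> list:
    """One check micro-block per row: XOR of the data blocks where the row has a 1."""
    size = len(micro_blocks[0])
    checks = []
    for row in v_rows:
        acc = bytes(size)  # all-zero accumulator
        for bit, block in zip(row, micro_blocks):
            if bit:
                acc = xor_bytes(acc, block)
        checks.append(acc)
    return checks
```

With m = 2 and k = 3, a 10-byte file is padded to 12 bytes and split into six 2-byte micro-blocks; each row of the 0/1 matrix then has six entries.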

Then, encoding optimization is carried out on the binary encoding matrix. The coding matrix is first expressed as:

G = [I_{k×m}, G_{r×m}]^T, where G_{r×m} = [l_{1,i}, l_{2,i}, ..., l_{r×m,i}]^T

The number of "1"s in each of the row vectors l_{1,i}, l_{2,i}, ..., l_{r×m,i} that generate the check bits determines the number of XOR computations required to compute the check bit from that vector. The number of bit positions in which any two vectors l_{a,j} and l_{b,j} differ is also calculated. The check bit calculation optimization method is determined according to these parameters. The optimization process comprises the following steps:

1. Determine the number of XOR operations required to calculate the check bit from each row vector, according to the number of "1"s in each row vector of the coding matrix;

2. Compare the numbers of identical and different elements between any two row vectors in the coding matrix, recorded as (e/d), where e is the number of positions in which the two vectors have the same element and d is the number of positions in which they differ;

3. If the number of XOR operations required by a row vector l_i (1 < i < r·m) is less than or equal to the number of differing positions d in step 2, calculate the check data block corresponding to that row directly from the vector, and record the vector as l_j;

4. Using the vector l_j determined in step 3, determine the next row vector to calculate according to the ratio of identical to differing positions from step 2. When a row vector l_k differs from l_j in fewer positions than it shares with it, and the number of positions in which l_k differs from l_j is the smallest among the remaining vectors, use the check data already calculated from l_j to calculate the check data determined by l_k;

5. If some check bits have not yet been calculated, look for the next vector to calculate according to the rule in step 4, using l_k as the base vector;

6. Determine whether the calculation order of all check bits has been established; if so, store the calculation steps in sequence; if not, calculate according to the original correspondence.
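The idea behind the steps above is that a check block can be derived from an already computed one by XOR-ing only the data blocks in which the two rows differ. The sketch below is a simplified greedy variant written under that assumption, not the exact procedure of the steps: it starts from the lightest row, always moves to the remaining row closest in Hamming distance to the current base, and derives from the base only when that needs fewer XORs than the row's own number of 1s.

```python
def plan_parity_schedule(rows: list) -> list:
    """Return (row_index, base_index_or_None, xor_count) in calculation order."""
    def weight(row):
        return sum(row)

    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))

    remaining = set(range(len(rows)))
    # Start with the row that is cheapest to compute directly.
    start = min(remaining, key=lambda i: (weight(rows[i]), i))
    plan = [(start, None, weight(rows[start]))]
    remaining.discard(start)
    base = start
    while remaining:
        nxt = min(remaining, key=lambda i: (distance(rows[i], rows[base]), i))
        d = distance(rows[nxt], rows[base])
        if d < weight(rows[nxt]):
            plan.append((nxt, base, d))  # derive from the base check block
        else:
            plan.append((nxt, None, weight(rows[nxt])))  # compute directly
        remaining.discard(nxt)
        base = nxt
    return plan
```

For the rows 11110, 11111 and 00001, direct calculation costs 4 + 5 + 1 = 10 XORs by the counting used in step 1, while the planned order costs 1 + 4 + 1 = 6.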

For a detailed description of the method, assume that a block of data D is stored₁，D₂，…D_rWhen the node is abnormal, the embedded terminal obtains the original file in the following specific process:

step 1: root of herbaceous plantAccording to coding matrix G ═ I, V']^TDirectly obtaining check matrix H ═ V' T, I_m·r]^TFor reconstructing the lost data block.

Step 2: randomly selecting k storage nodes to download k data blocks D from the storage nodes which normally work_r+1，D_r+2，…D_k，D_k+l，…D_k+r-1，D_k+r。

Step 3: Denote the lost macroblocks $D_1, D_2, \dots, D_r$ as $X_1, X_2, \dots, X_r$ respectively, and let $\beta = [X_1, X_2, \dots, X_r, D_{r+1}, \dots, D_{k+r-1}, D_{k+r}]$, where $\beta_r = [X_1, X_2, \dots, X_r]$ and $\beta_k = [D_{r+1}, \dots, D_{k+r-1}, D_{k+r}]$, that is, $\beta = [\beta_r, \beta_k]$. The lost data blocks are reconstructed from the relation $\beta \cdot H_{(k+r) \times r} = 0$.

Step 4: Denote the sub-matrix of $H_{(k+r) \times r}$ formed by the vectors corresponding to the lost data blocks as $H'_{r \times r}$, and the sub-matrix formed by the vectors corresponding to the intact data blocks as $H''_{k \times r}$. Then:

$$\beta_{1 \times r} \cdot H'_{r \times r} = \beta_{1 \times k} \cdot H''_{k \times r}$$

where $\beta_{1 \times r}$, the vector of lost data blocks, is unknown. It can be solved as follows:

$$\beta_{1 \times r} = \beta_{1 \times k} \cdot H''_{k \times r} \cdot (H'_{r \times r})^{-1}$$

The data blocks $[X_1, X_2, \dots, X_r]$ obtained in this way are the lost data blocks $[D_1, D_2, \dots, D_r]$.

Step 5: Combine the data blocks $[D_1, D_2, \dots, D_r]$ with the data blocks $D_{r+1}, D_{r+2}, \dots, D_k$ that were not lost in the system, in order, into $[D_1, D_2, \dots, D_k]$; this combination of data blocks is the original file.
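The solve in step 4 may be illustrated with a minimal Python sketch over GF(2), in which addition of blocks is bitwise XOR and each block is modeled as an integer. The toy check matrix (three data blocks, two XOR check blocks) and the names `vec_mat_gf2` and `solve_lost_blocks` are assumptions for the example; instead of forming $(H')^{-1}$ explicitly, the sketch applies Gauss-Jordan elimination, which yields the same result when $H'$ is invertible.

```python
def vec_mat_gf2(blocks, matrix):
    """Row vector of blocks times a 0/1 matrix over GF(2): per column, XOR the selected blocks."""
    out = [0] * len(matrix[0])
    for blk, row in zip(blocks, matrix):
        for c, bit in enumerate(row):
            if bit:
                out[c] ^= blk
    return out


def solve_lost_blocks(h_lost, h_good, good_blocks):
    """Solve X * H' = beta_k * H'' over GF(2) for the lost blocks X."""
    r = len(h_lost)
    rhs = vec_mat_gf2(good_blocks, h_good)
    # Equation c reads: XOR over i of h_lost[i][c] * X[i] == rhs[c].
    eqs = [[h_lost[i][c] for i in range(r)] + [rhs[c]] for c in range(r)]
    for col in range(r):
        pivot = next((j for j in range(col, r) if eqs[j][col]), None)
        if pivot is None:
            raise ValueError("H' is singular; these blocks cannot recover the loss")
        eqs[col], eqs[pivot] = eqs[pivot], eqs[col]
        for j in range(r):
            if j != col and eqs[j][col]:
                eqs[j] = [a ^ b for a, b in zip(eqs[j], eqs[col])]
    return [eqs[i][r] for i in range(r)]


# Toy layout (assumed): blocks [d1, d2, d3, p1, p2] with p1 = d1^d2 and p2 = d2^d3.
d1, d2, d3 = 0x11, 0x22, 0x33
p1, p2 = d1 ^ d2, d2 ^ d3
h_lost = [[1, 0], [1, 1]]            # rows of H for the lost blocks d1, d2
h_good = [[0, 1], [1, 0], [0, 1]]    # rows of H for the surviving blocks d3, p1, p2
recovered = solve_lost_blocks(h_lost, h_good, [d3, p1, p2])
```

Here `recovered` holds the two lost blocks in the order of the rows of `h_lost`.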

In an environment where the network bandwidth of the storage system is limited, lost data must be recovered reliably while bandwidth consumption is kept low. For this purpose, the following optimized reconstruction method for lost data blocks, based on the check matrix, is adopted; that is, a method of selecting from $H_{(k+r)m \times rm}$ the recovery matrix that requires the least reconstruction bandwidth. The specific steps are as follows:

1. First, count the number of '1' elements in each column vector of the check matrix $H_{(k+r)m \times rm}$.

2. From the check matrix $H_{(k+r)m \times rm}$, extract the row vectors corresponding to the lost data blocks to form the matrix $H_{r'm \times rm}$. The remaining row vectors of $H_{(k+r)m \times rm}$ form the matrix $H_{(k+r-r')m \times rm}$, whose lower $r \cdot m$ row vectors form an identity matrix. The upper part is denoted $H_{(k-r')m \times rm}$.

3. Sequentially count the number of '0' elements in each row vector of $H_{(k-r')m \times rm}$. When the number of 0s in a row vector is greater than or equal to $r' \cdot m$, record the column vectors in which those 0 elements lie. Then search, within the recorded column vectors, for a further row vector whose number of 0 elements is greater than or equal to $r' \cdot m$; if none exists, keep the column vectors recorded in the previous step, and if one exists, determine a new set of column vectors. Repeat this loop, recording the column vectors determined in each iteration.

4. After the loop search is finished, select, according to the number of '1's in each group of column vectors, the $r' \cdot m$ column vectors with the fewest '1' elements, and confirm that the sub-matrix $H_{(r' \cdot m) \times (r' \cdot m)}$ corresponding to these column vectors is of full rank, i.e. that its rank is $r' \cdot m$.
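The selection criterion of step 4 (fewest '1' elements, full-rank sub-matrix) may be illustrated with the following Python sketch. It deliberately replaces the iterative zero-counting search of step 3 with an exhaustive search over column combinations, which is practical only for small matrices; the function names and the toy check matrix are assumptions for the example.

```python
from itertools import combinations


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over GF(2); each row is packed into an integer bit mask."""
    rows = [sum(bit << c for c, bit in enumerate(row)) for row in matrix]
    width = len(matrix[0]) if matrix else 0
    rank = 0
    for c in range(width):
        mask = 1 << c
        pivot = next((i for i in range(rank, len(rows)) if rows[i] & mask), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] & mask:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def select_recovery_columns(h, lost_rows):
    """Choose len(lost_rows) columns with the fewest 1s whose lost-row sub-matrix is full rank."""
    need = len(lost_rows)
    ones = [sum(row[c] for row in h) for c in range(len(h[0]))]  # 1s per column
    best = None
    for cols in combinations(range(len(h[0])), need):
        sub = [[h[r][c] for c in cols] for r in lost_rows]
        if gf2_rank(sub) != need:
            continue  # sub-matrix is not invertible, so it cannot recover the loss
        cost = sum(ones[c] for c in cols)
        if best is None or cost < best[0]:
            best = (cost, cols)
    return None if best is None else best[1]


# Toy check matrix (assumed): rows d1, d2, d3, p1, p2, p3; the lower three rows are an identity.
H = [[1, 0, 1], [1, 1, 1], [0, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
cols_for_d3 = select_recovery_columns(H, [2])
cols_for_d1_d2 = select_recovery_columns(H, [0, 1])
```

Fewer '1' elements in the chosen columns mean that fewer surviving blocks take part in the reconstruction, which is why the column weight serves as the bandwidth cost in this sketch.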

In a further aspect of the invention, an address index table (AIT) is introduced into the mirror subset as an extended addressing dimension. The AIT holds the metadata that describes the attributes of the addressing linked list (ACT). The AIT divides the ACT into individually addressable logic components, each of which can be accessed independently, so that the video data storage system with a ternary dynamic structure is capable of parallel read-write access. Because a pointer of the AIT points directly to the target address unit of an ACT logic component, random access is fast and requires no searching or comparison.

The address index table AIT is a set of addressing entries AHT, i.e. $AIT = \{AHT_1, \dots, AHT_m, \dots, AHT_M\}$,

wherein each $AHT_m$ has one input item and one corresponding output item. The input item is a combination of addressing variable values, and the output item is the data index corresponding to that combination.

In the AIT of the mirror subset of the video storage system, the input value of each addressing entry AHT is the addressing variable value of a group of data, i.e. the logical address LA, and the output values are the pointer into the addressing linked list ACT corresponding to that LA value, an offset and a data length. The ACT pointer points to the position, within the addressing linked list ACT, of the group of storage units to be accessed; the offset determines the access start address within the storage unit; the data length specifies the access range, and a data length of 0 or a default value indicates access up to the end of the file. Therefore, for an access to the video data storage system, a position in the file metadata addressing linked list ACT can be uniquely determined from the logical address LA formed by the addressing variable combination of the target data; the storage node is read or written starting from that position, and the length of the accessed data is specified in the AHT.
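The relationship between the logical address LA, the addressing entry AHT and the addressing linked list ACT may be pictured with the following minimal Python sketch. The class name `AHTEntry`, the modeling of the ACT as a list of byte strings, and the use of a string as the logical address are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AHTEntry:
    act_index: int   # "pointer": position of the storage-unit group in the ACT
    offset: int      # access start address inside that storage unit
    length: int = 0  # 0 (the default) means "access up to the end"


def read_by_logical_address(ait: Dict[str, AHTEntry], act: List[bytes], la: str) -> bytes:
    """Resolve a logical address through the AIT and read the addressed range."""
    entry = ait[la]                       # direct lookup, no searching or comparison
    unit = act[entry.act_index]
    end = len(unit) if entry.length == 0 else entry.offset + entry.length
    return unit[entry.offset:end]


# Hypothetical contents, for illustration only.
act = [b"headerFRAME0001", b"xxxxFRAME0002tail"]
ait = {
    "cam1/0": AHTEntry(act_index=0, offset=6),            # default length: read to the end
    "cam1/1": AHTEntry(act_index=1, offset=4, length=9),  # bounded range
}
first = read_by_logical_address(ait, act, "cam1/0")
second = read_by_logical_address(ait, act, "cam1/1")
```

A dictionary is used here because it gives the direct LA-to-entry mapping that the text describes; any structure with constant-time lookup would serve the same purpose.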

Access through the address index table AIT of the mirror subset of the video storage system is realized by the following steps:

1. retrieving a metadata address index table AIT according to an addressing variable value of access target data, thereby obtaining an addressing linked list ACT pointer, an offset and a data length;

2. obtaining the position of the group of memory units to be read and written in the addressing linked list ACT through the addressing linked list pointer, obtaining the read-write initial address in the group of memory units to be read and written through the offset, and obtaining the read-write range through the data length; performing the reading and writing operation of the group of data according to the position, the reading and writing initial address and the reading and writing range;

3. when multiple threads read data, or when multiple threads write data without modifying the ACT pointer, the offset or the data length and without generating a new addressing linked list, each thread executes steps 1 and 2 independently, thereby realizing parallel read-write operations on multiple groups of data;

4. when multiple threads write data and the ACT pointer, the offset or the data length is modified, the access steps are as follows:

(4-1) if the ACT pointer is not modified, the data is written into the storage unit starting from the newly given offset position; when the data length needs to be updated, the new data length is calculated, and the new offset and the new data length are recorded in the AHT output item;

(4-2) if the ACT pointer is modified, the access enters the process of generating the address index table AIT.
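The access steps above may be sketched in Python as follows. The sketch assumes that reads which do not alter an entry run without locking, while writes that update the offset and data length of an entry (case 4-1) are serialized with a lock; the class `MirrorSubsetIndex`, its methods and the use of `ThreadPoolExecutor` are hypothetical choices for the example and not part of the described system. Case (4-2), regeneration of the AIT, is not modeled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MirrorSubsetIndex:
    """Toy stand-in for an AIT over an ACT; storage units are bytearrays."""

    def __init__(self):
        self.act = []                  # storage units, stand-in for the ACT
        self.ait = {}                  # logical address -> [act_index, offset, length]
        self._lock = threading.Lock()

    def add(self, la, data):
        with self._lock:               # creating an entry changes the index itself
            self.act.append(bytearray(data))
            self.ait[la] = [len(self.act) - 1, 0, len(data)]

    def read(self, la):
        # Steps 1 and 2: look up the entry, then read offset..offset+length.
        idx, off, length = self.ait[la]
        return bytes(self.act[idx][off:off + length])

    def write_at(self, la, new_offset, data):
        # Case (4-1): ACT pointer unchanged; record the new offset and data length.
        with self._lock:
            entry = self.ait[la]
            unit = self.act[entry[0]]
            if len(unit) < new_offset:
                unit.extend(b"\0" * (new_offset - len(unit)))
            unit[new_offset:new_offset + len(data)] = data
            entry[1], entry[2] = new_offset, len(data)


def parallel_read(index, addresses, workers=4):
    """Step 3: each worker thread performs the lookup and the read independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(index.read, addresses))


index = MirrorSubsetIndex()
index.add("a", b"hello")
index.add("b", b"world")
reads = parallel_read(index, ["a", "b", "a"])
index.write_at("a", 5, b"!!")
```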

In the video data storage structure of the present invention, fast motion estimation is applied to the video frame data at the encoding end. In brief, in addition to block motion search and limited-range motion search over the whole frame, a large-range motion search and a corresponding small-range motion search are provided. The large-range search is iterative: the position of the previous search result is used as the starting point of the next search. When the search result satisfies the convergence condition, namely when two successive search results are the same, a small-range search is conducted with that result position as the starting point, and the small-range search result is taken as the final result.
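The iterate-then-refine structure of this search may be illustrated with a short Python sketch. The 3 × 3 candidate pattern, the large step of 2, the displacement limit of 7 and the abstract `cost` callable (standing in for the block matching residual) are assumptions for the example; the text does not specify the search pattern.

```python
def two_stage_search(cost, max_iters=4, large_step=2, limit=7):
    """Iterative large-range search followed by one small-range refinement.

    cost: callable mapping a displacement (mx, my) to a matching cost.
    Returns (vector, converged).
    """
    def best_around(center, step):
        cx, cy = center
        candidates = [(cx + dx, cy + dy)
                      for dx in (-step, 0, step)
                      for dy in (-step, 0, step)
                      if abs(cx + dx) <= limit and abs(cy + dy) <= limit]
        return min(candidates, key=cost)

    pos = (0, 0)
    for _ in range(max_iters):
        nxt = best_around(pos, large_step)
        if nxt == pos:
            # Two successive results agree: refine once with the small range.
            return best_around(pos, 1), True
        pos = nxt                      # previous result is the next starting point
    return pos, False


# Hypothetical cost surface with its minimum at displacement (5, -3).
result, converged = two_stage_search(lambda v: (v[0] - 5) ** 2 + (v[1] + 3) ** 2)
```

With this cost surface the large-range search settles on (4, -4) at the third iteration, and the small-range search then reaches the minimum at (5, -3).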

The invention adds a block classification function at the encoding end, which classifies blocks in the frame as skip blocks or direct blocks. For a skip block, the motion vector is 0 and the actual residual is close to 0, so only the mode information is transmitted, and the motion vector and residual information are not transmitted. A skip block is identified by:

$$D_m = \frac{1}{N} \sum_{(i,j) \in \text{block } m} \left| X_{i,j} - Y_{i,j} \right|$$

where $X_{i,j}$ represents the pixel at position (i, j) of block m in the current frame, $Y_{i,j}$ represents the corresponding pixel in the reference frame, and N represents the number of pixels in the block. When $D_m$ is smaller than the preset threshold, the block is set as a skip block, and only the mode information is transmitted to the decoding end.
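The skip-block criterion is a mean absolute difference compared against a threshold, which may be sketched in Python as follows. Blocks are modeled as lists of pixel rows, and the function names are chosen for the example.

```python
def mean_abs_diff(block, ref_block):
    """D_m: mean absolute pixel difference between a block and its reference block."""
    n = sum(len(row) for row in block)
    total = sum(abs(x - y)
                for row, ref_row in zip(block, ref_block)
                for x, y in zip(row, ref_row))
    return total / n


def is_skip_block(block, ref_block, threshold):
    """A block is a skip block when D_m is strictly below the preset threshold."""
    return mean_abs_diff(block, ref_block) < threshold


current = [[10, 12], [11, 13]]
reference = [[10, 11], [12, 13]]
d_m = mean_abs_diff(current, reference)
```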

During residual calculation, a decoded key frame is used as the reference frame, and the side information is likewise generated from the decoded key frames, as described below. Among the remaining blocks in the frame, the blocks belonging to the direct mode are determined next; the residual of such a block is close to 0, and only its mode information and motion vector information are transmitted. To reduce the complexity at the encoding end, a fast motion block search algorithm may be employed.

The maximum number of large-range motion search iterations is set to 4 and the number of small-range searches is set to 1, which corresponds to a maximum horizontal or vertical displacement of (0, 7) or (7, 0); the code rate of the corresponding fixed-length code is 3 bits. If the large-range vector search converges, the resulting motion residual is compared with a threshold, which is the same as the skip-mode threshold. When the threshold is not exceeded, the block is determined to be a direct block, and its mode information and motion vector information are transmitted to the decoding end.

When the information on skip blocks and direct blocks is transmitted, if the two are coded jointly, a zero motion vector is represented by (0, 0). The motion vector information can be transmitted using fixed-length coding or exponential coding, and the code rate is calculated as follows:

Step 1, encode the motion vector information with fixed-length coding and with exponential coding respectively, and take the smaller of the two codeword lengths as rate1;

Step 2, combine the skip mode and the direct mode into one class; the code rate is then:

$$\text{mode1} = \text{mode}(\text{skip mode}) \cup \text{mode}(\text{direct mode})$$

$$\text{mode2} = \text{mode}(\text{common mode})$$

$$\text{rate2} = \text{ent}(\text{mode1}, \text{mode2}) \cdot \text{code\_length} + 2 \cdot \text{num}(\text{mode})$$

Mode information mode () represents a mode corresponding to the type of the block, ent () calculates entropy of corresponding information, and code _ length is the length of a codeword to be encoded.

Step 3, the total code rate is the sum of the two code rates:

$$\text{total\_rate} = \text{rate1} + \text{rate2}$$
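Step 1 of this flow may be illustrated with the Python sketch below. It assumes that "exponential coding" refers to an order-0 signed Exp-Golomb code of the kind used in common video coding standards, and that the fixed-length code spends 3 bits on each vector component as stated above; both points, as well as the function names, are assumptions for the example. rate2 is not modeled.

```python
def exp_golomb_bits(v):
    """Bit length of the order-0 signed Exp-Golomb codeword for the integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v   # map signed value to a non-negative index
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def rate1(motion_vectors, fixed_bits=3):
    """Smaller of the fixed-length total and the Exp-Golomb total, in bits."""
    fixed = 2 * len(motion_vectors) * fixed_bits
    exponential = sum(exp_golomb_bits(vx) + exp_golomb_bits(vy)
                      for vx, vy in motion_vectors)
    return min(fixed, exponential)


small_vectors = [(0, 0), (1, -1), (0, 2)]
large_vectors = [(7, 7)]
```

Small vectors favor the exponential code, because zero costs a single bit, while large vectors favor the fixed-length code, which is why the flow keeps whichever total is smaller.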

At the decoding end, after the corresponding mode information and motion vector information are obtained, the skip blocks and direct blocks are reconstructed. For a skip block, the block at the same position in the previous reference frame is used directly as the final reconstructed block. For a direct block, the corresponding motion-compensated block obtained with the motion vector is taken as the final reconstructed block. For the remaining blocks, which are in the normal mode, side information and residual information need to be generated at the decoding end.

The generating of the side information using the decoded key frame includes the following processes:

Step 1, obtain an initial motion vector. First, a parallel motion estimation algorithm is used to calculate a parallel motion vector. The motion matching search is expressed as:

$$(v_x, v_y) = \arg\min_{mx,\,my} \left( D_{mx,my} \cdot \left( 1 + 0.05\,(mx^2 + my^2)^{1/2} \right) \right)$$

$$(v_x, v_y) = \pm (v_x/2,\ v_y/2)$$

where $x_{i,j}$ represents a pixel of the reference block and $y_{i+mx,j+my}$ represents a pixel of the motion search block in the other frame. The 0-order norm of m represents the size of block m. The resulting motion estimation vector is half of the motion vector calculated by the first equation, and is inverted or left unchanged according to the original direction of motion estimation.

After the motion vectors of the blocks at the corresponding positions in the preceding and following frames are obtained, they are converted into motion vectors in the same direction by inverting one of them. The two motion vectors are then averaged to obtain the initial motion vector for the bidirectional motion search estimation.
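The penalized matching criterion and the averaging of the two halved vectors may be sketched in Python as follows. The matching costs are supplied as a dictionary keyed by candidate displacement, and the function names and the choice of which vector to invert are assumptions for the example.

```python
import math


def penalized_best(sad):
    """argmin over (mx, my) of D * (1 + 0.05 * sqrt(mx^2 + my^2)).

    sad maps each candidate displacement (mx, my) to its matching cost D.
    """
    return min(sad, key=lambda v: sad[v] * (1 + 0.05 * math.hypot(v[0], v[1])))


def initial_bidirectional_vector(mv_prev, mv_next):
    """Halve both vectors, invert the second so both point the same way, then average."""
    ax, ay = mv_prev[0] / 2, mv_prev[1] / 2
    bx, by = -mv_next[0] / 2, -mv_next[1] / 2
    return ((ax + bx) / 2, (ay + by) / 2)


# Hypothetical costs: the distance penalty makes the zero vector win over a slightly better match.
costs = {(0, 0): 100, (3, 4): 90, (6, 8): 70}
chosen = penalized_best(costs)
start = initial_bidirectional_vector((4, 2), (-6, -2))
```

The factor 0.05 biases the search toward short vectors: a distant candidate is chosen only when its matching cost is lower by more than the distance penalty.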

Step 2, for each block, take the average vector from step 1 as the initial vector, and assume that the block moves uniformly in a straight line over a short time, i.e. the motion vectors of the block in the preceding and following frames are equal in magnitude and opposite in direction. A bidirectional motion search is performed within a preset range around the initial vector: the search range is set to -3 to 3 centered on the initial vector, and if the two motion vectors from step 1 differ by more than 5 in either direction, the search range in that direction is expanded to -5 to 5. If the number of positions actually searched within the search range is smaller than the threshold, the search continues with the zero vector as the center and -6 to 6 as the search range, and the minimum of the two results is taken as the motion search result. The residual is then calculated from the motion search result according to:

$$(v_x, v_y) = \arg\min_{mx,\,my} D_{mx,my}$$

where x and y represent the preceding and following reference frames.

When the calculated sum of absolute residuals reaches its minimum, the motion vector result of the bidirectional motion estimation is obtained. From this motion vector, the corresponding side information reference block Sideblock, the residual estimation block Residualblock and the residual information Residue are obtained:

$$\text{Residue} = \min(D_{mx,my})$$

$$\text{Sideblock}_{i,j} = \left( x_{i-v_x,\,j-v_y} + y_{i+v_x,\,j+v_y} \right) / 2$$

$$\text{Residualblock}_{i,j} = \left( x_{i-v_x,\,j-v_y} - y_{i+v_x,\,j+v_y} \right) / 2$$
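The two block formulas may be illustrated with the Python sketch below, which follows the index convention of the formulas (the first index is paired with $v_x$). Frames are modeled as lists of rows, the caller is assumed to keep all indices inside the frame, and the function name and sample frames are hypothetical.

```python
def side_and_residual(prev_frame, next_frame, top, left, size, v):
    """Side-information block and residual-estimate block for one size x size block.

    x is sampled at (i - vx, j - vy) in the preceding frame and y at
    (i + vx, j + vy) in the following frame.
    """
    vx, vy = v
    side, residual = [], []
    for i in range(top, top + size):
        side_row, res_row = [], []
        for j in range(left, left + size):
            x = prev_frame[i - vx][j - vy]
            y = next_frame[i + vx][j + vy]
            side_row.append((x + y) / 2)
            res_row.append((x - y) / 2)
        side.append(side_row)
        residual.append(res_row)
    return side, residual


# Hypothetical 4 x 4 frames; the block at row 1, column 1 has size 2 and vector (1, 0).
prev_frame = [[10 * r + c for c in range(4)] for r in range(4)]
next_frame = [[10 * r + c + 2 for c in range(4)] for r in range(4)]
side, residual = side_and_residual(prev_frame, next_frame, 1, 1, 2, (1, 0))
```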

Step 3, further process the motion vector estimation result obtained in step 2. When the motion vector magnitude is larger than a certain threshold, bidirectional parallel motion estimation compensation is performed. Bidirectional parallel motion estimation vectors are calculated as in step 1, giving four motion compensation blocks. If all four motion compensation blocks lie within the image display range and the distance between the motion vectors in the same direction is smaller than a preset range, the side information block and the residual block corresponding to the block are calculated by the following formulas:

$$\text{Sideblock} = (\text{block}_1 + \text{block}_2 + \text{block}_3 + \text{block}_4) / 4$$

$$\text{Residualblock} = (\text{block}_1 + \text{block}_2 - \text{block}_3 - \text{block}_4) / 4$$

where $\text{block}_1$ and $\text{block}_2$ belong to the frame preceding the current frame, and $\text{block}_3$ and $\text{block}_4$ belong to the subsequent frame.
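The four-block combination is a pixel-wise average and signed average, which may be sketched in Python as follows. The gating conditions (vector magnitude, image bounds, vector distance) are left to the caller, and the function name is chosen for the example.

```python
def four_block_side_residual(b1, b2, b3, b4):
    """Pixel-wise (b1+b2+b3+b4)/4 and (b1+b2-b3-b4)/4 for equally sized blocks.

    b1 and b2 come from the preceding frame, b3 and b4 from the subsequent frame.
    """
    side = [[(p + q + r + s) / 4 for p, q, r, s in zip(*rows)]
            for rows in zip(b1, b2, b3, b4)]
    residual = [[(p + q - r - s) / 4 for p, q, r, s in zip(*rows)]
                for rows in zip(b1, b2, b3, b4)]
    return side, residual


side4, residual4 = four_block_side_residual([[4, 8]], [[8, 8]], [[0, 4]], [[4, 4]])
```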

Step 4, if the residual value of a block at a moving edge is greater than the threshold, the block is processed as follows. First, take either of the two parallel motion estimation vectors obtained in step 1. If the position estimated by this vector falls outside the image boundary in the direction opposite to the initial motion estimation, and the estimation residual of the motion vector in the current direction is smaller than that in the opposite direction, the corresponding motion compensation block is represented by the parallel motion compensation block obtained in the estimation direction; in this case, compensation is performed with a one-way search. If the above condition is not met, the other motion vector is processed. The motion compensation blocks obtained are then combined by weighted averaging.

In summary, the present invention provides a data encoding method for an embedded device, which uses the network bandwidth and the computing power inside a video storage node set as little as possible to achieve data recovery, and improves the scalability while achieving high availability of data.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented in a general purpose computing system, centralized on a single computing system, or distributed across a network of computing systems, and optionally implemented in program code that is executable by the computing system, such that the program code is stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.

It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A data encoding method for an embedded device, comprising:

when the embedded terminal software runs, calculating the hash value of the file to be uploaded, inquiring the value from each storage node, when the same hash is not detected, receiving the file by the embedded terminal, calculating the hash of the information of the file in blocks, and storing the hash to the nodes of the mirror image subset in a distributed manner; peer-to-peer structure design is adopted among the storage nodes;

the querying the value from each storage node further comprises:

when the value is found to exist, the embedded terminal is informed that the data is stored;

the mirror image subsets form a uniform single file mapping, and a consistent coding storage view is formed among all the subsets;

if the storage system has n storage nodes in total, and the erasure correction performance required of the storage system is that any r storage nodes may be abnormal, then when the embedded terminal makes a file storage request, the file is first partitioned, the number of partitioned blocks being $k = n - r$; r check blocks are generated using a Reed-Solomon coding matrix G; k nodes store the original blocks of the file, and the other r nodes store the check data blocks generated by the operation with G; the specific process comprises the following steps:

when the system receives a file storage request, the system directly partitions the file into m × k file blocks, and if the size of the file is not evenly divisible by m × k, zeros are appended to the end of the file; the check data blocks are obtained by directly operating the vectors of the coding matrix G on the m × k data blocks, following the positions of '0' and '1' in each row vector of the generator matrix;

if the partitioning of the original file is denoted $D = (D_1, D_2, \dots, D_k)^T$, each $D_i$ is called a macroblock; $D_i$ consists of m micro-blocks, and the m data blocks $(d_{i,1}, d_{i,2}, \dots, d_{i,m})^T$ in $D_i$ are called a micro-block group; if the generated check macroblock group is denoted $P = (P_1, P_2, \dots, P_r)^T$, wherein each check macroblock $P_i$ comprises m check micro-blocks, and the set of original file blocks and check blocks is denoted $E = (D_1, D_2, \dots, D_k \mid P_1, P_2, \dots, P_r)^T$, then: $G \cdot D = E$;

the m × k data blocks of the entire file may be denoted as $d_{1,1}, d_{1,2}, \dots, d_{1,m}, \dots, d_{k,1}, d_{k,2}, \dots, d_{k,m}$; each check macroblock $P_i$ generated from the original file blocks contains m check micro-blocks, which are respectively denoted as $p_{1,1}, p_{1,2}, \dots, p_{1,m}, \dots, p_{r,1}, p_{r,2}, \dots, p_{r,m}$;

The method further comprises the following steps:

introducing an address index table AIT into the mirror subset as an extended addressing dimension; the AIT divides the addressing linked list ACT into individually addressable logic components that can each be accessed independently, so that the video data storage system with a ternary dynamic structure is capable of parallel read-write access; a pointer of the AIT points directly to the target address unit of an ACT logic component;

wherein each addressing entry $AHT_m$ of the AIT has one input item and one corresponding output item; the input item is a combination of addressing variable values, and the output item is the data index corresponding to that combination.

2. The method of claim 1, wherein in the AIT of the mirror subset of the video storage system, the input value of each addressing entry AHT is the addressing variable value of a group of data, i.e. the logical address LA, and the output values are the pointer into the addressing linked list ACT corresponding to that LA value, an offset and a data length; the ACT pointer points to the position, within the addressing linked list ACT, of the group of storage units to be accessed; the offset determines the access start address within the storage unit; the data length specifies the access range, and a data length of 0 or a default value indicates access up to the end of the file; for an access to the video data storage system, a position in the file metadata addressing linked list ACT is uniquely determined from the logical address LA formed by the addressing variable combination of the target data, and the storage node is read or written starting from that position, wherein the length of the accessed data is specified in the AHT.

3. The method according to claim 1, characterized in that, on the basis of the above video data storage structure, fast motion estimation is applied to the video frame data at the encoding end; specifically, in addition to block motion search and limited-range motion search over the whole frame, a large-range motion search and a corresponding small-range motion search are provided; the large-range search is iterative, the position of the previous search result being used as the starting point of the next search; when the search result satisfies the convergence condition, namely when two successive search results are the same, a small-range search is conducted with that result position as the starting point; and the small-range search result is taken as the final result.

4. The method of claim 3, further comprising:

carrying out a block classification operation at the encoding end, classifying blocks in the frame as skip blocks or direct blocks; for a skip block, not transmitting the motion vector and residual information; wherein a skip block is identified by:

$$D_m = \frac{1}{N} \sum_{(i,j) \in \text{block } m} \left| X_{i,j} - Y_{i,j} \right|$$

where $X_{i,j}$ represents the pixel at position (i, j) of block m in the current frame, $Y_{i,j}$ represents the corresponding pixel in the reference frame, and N represents the number of pixels in the block; when $D_m$ is smaller than the preset threshold, setting the block as a skip block and transmitting only the mode information to the decoding end;

when the information on skip blocks and direct blocks is transmitted, if the two are coded jointly, a zero motion vector is represented by (0, 0); the motion vector information is transmitted using fixed-length coding or exponential coding, and the calculation flow is as follows:

$$\text{mode1} = \text{mode}(\text{skip mode}) \cup \text{mode}(\text{direct mode})$$

$$\text{mode2} = \text{mode}(\text{common mode})$$

$$\text{rate2} = \text{ent}(\text{mode1}, \text{mode2}) \cdot \text{code\_length} + 2 \cdot \text{num}(\text{mode})$$

wherein the mode information mode() represents the mode corresponding to the type of the block, ent() calculates the entropy of the corresponding information, and code_length is the length of the codeword to be coded;

step 3, the total code rate is the sum of the two code rates;

$$\text{total\_rate} = \text{rate1} + \text{rate2}$$

at the decoding end, after the corresponding mode information and motion vector information are obtained, the skip blocks and direct blocks are reconstructed; for a skip block, the block at the same position in the previous reference frame is used directly as the final reconstructed block; for a direct block, the corresponding motion-compensated block obtained with the motion vector is taken as the final reconstructed block; and for the remaining blocks, which are in the normal mode, side information and residual information are generated at the decoding end.

5. The method of claim 1, wherein the storage nodes within each subset store different partitions of the same file, and wherein the system maintains a mapping between the different partitions of the same file and the storage nodes.

6. The method of claim 5, wherein each mirror subset is combined into a tree graph with a hierarchical structure to establish a mapping relationship between the storage file set and the device set.

7. The method of claim 5, wherein each storage node independently maintains metadata for the storage resources and files of the subset and independently provides file blocking read services.

8. The method of claim 1, further comprising: determining whether to start the next mirror subset according to the stored data volume and the utilization rate of the storage system.