CN115361401B - Data encoding and decoding method and system for copy certification - Google Patents


Info

Publication number
CN115361401B
Authority
CN
China
Prior art keywords
layer
node
data
decoding
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210829009.8A
Other languages
Chinese (zh)
Other versions
CN115361401A (en)
Inventor
万胜刚
董子豪
易成龙
朱捷瑞
何绪斌
谢长生
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202210829009.8A
Publication of CN115361401A
Application granted
Publication of CN115361401B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a data encoding and decoding method and system for copy certification, belonging to the technical field of computer storage. After the encoded data corresponding to each node of an upper layer is obtained, the encoded data corresponding to the parent nodes of the lower-layer nodes is written sequentially into external storage, based on the dependency of the lower-layer nodes on the upper-layer nodes in a stacked depth robust graph, and the encoded data corresponding to each upper-layer node is deleted from memory. By writing the encoded data of upper-layer nodes to external storage and studying a suitable layout for the externally stored nodes, the child nodes in the next layer read the upper-layer parent nodes they depend on sequentially, which greatly mitigates the algorithm performance loss caused by the speed gap between internal and external memory and greatly reduces the memory space overhead.

Description

Data encoding and decoding method and system for copy certification
Technical Field
The invention belongs to the technical field of computer storage, and particularly relates to a data encoding and decoding method and system for copy certification.
Background
The distributed storage network achieves horizontal capacity expansion by adding anonymous, idle storage devices on the Internet. To ensure data reliability, distributed storage networks mainly adopt two data redundancy mechanisms: a multi-copy mechanism and an erasure code mechanism. Under the multi-copy mechanism, the data is replicated into multiple copies that are stored scattered across different devices, so that when any copy is lost it can be repaired from any of the remaining copies. However, in a decentralized storage network, the anonymity of the storage devices makes malicious attacks on the replica data, i.e. Sybil attacks, possible.
To resist Sybil attacks, a time-bounded copy-proving method is often adopted. Under the time assumption, the encoding process from the original data to a data copy must satisfy a certain assumption: the encoding takes a long time to compute, and the computation resists parallelization (it is difficult to accelerate across multiple cores). When a challenge arrives, an honest node can therefore pass simply by generating a proof from the data it already stores, whereas a malicious node can only recover the required challenge data copy through a decoding pass followed by an extremely long encoding pass. Malicious nodes can thus be easily distinguished from honest nodes, so studying a data encoding method for copy certification is important for resisting Sybil attacks.
Existing time-bounded copy-proving methods usually encode data based on the SDR algorithm, which introduces a depth robust graph (DRG) and an expander graph into the encoding to resist Sybil attacks and derivative attacks. In this construction, recomputing the data of any deleted node requires recomputing all of that node's parent nodes, causing computation amplification; as a result, two full layers of nodes must be kept in memory during execution, with all computed node data cached there, incurring high memory overhead.
Disclosure of Invention
Aiming at the above defects or improvement needs of the prior art, the present invention provides a data encoding and decoding method and system for copy certification, to solve the technical problem that existing data encoding/decoding methods for copy certification incur high memory resource overhead.
To achieve the above object, in a first aspect, the present invention provides a data encoding method for copy certification, comprising:
performing L-layer encoding serially on data to be encoded:
first layer coding: obtaining a random seed according to the information of the data to be encoded, and generating random data of the same size as the data to be encoded from the random seed; dividing the random data sequentially into M data blocks, where each data block corresponds one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; encoding each data block in sequence based on the dependency relationships among the nodes in the first layer of the stacked depth robust graph, generating M pieces of encoded data in sequence to obtain the encoded data corresponding to each node of the first layer; based on the dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, sequentially writing the encoded data corresponding to the upper-layer parent nodes of each second-layer node into external storage, and deleting the encoded data corresponding to each first-layer node from memory;
layer i coding: based on the dependency of the i-th layer nodes on the (i-1)-th layer nodes in the stacked depth robust graph, reading from external storage the encoded data corresponding to the (i-1)-th layer parent nodes of each i-th layer node; based on the dependency relationships among the nodes within the i-th layer of the stacked depth robust graph, sequentially obtaining the encoded data corresponding to all parent nodes of each i-th layer node, splicing it, and further encoding it to obtain the encoded data corresponding to each i-th layer node; based on the dependency of the (i+1)-th layer nodes on the i-th layer nodes in the stacked depth robust graph, sequentially writing the encoded data corresponding to the i-th layer parent nodes of each (i+1)-th layer node into external storage, and deleting the encoded data corresponding to each i-th layer node from memory; i = 2, 3, …, L-1;
layer L encoding: based on the dependency of the L-th layer nodes on the (L-1)-th layer nodes in the stacked depth robust graph, reading from external storage the encoded data corresponding to the (L-1)-th layer parent nodes of each L-th layer node; based on the dependency relationships among the nodes within the L-th layer of the stacked depth robust graph, sequentially obtaining the encoded data corresponding to all parent nodes of each L-th layer node, splicing it, and further encoding it to obtain the encoded data corresponding to each L-th layer node; and performing a bitwise exclusive OR between the encoded data corresponding to each L-th layer node and the data to be encoded to obtain the final encoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M; and randomly generated dependency relationships exist between nodes within the same layer and between nodes of two adjacent layers.
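The serial L-layer flow above can be sketched as follows. This is an illustrative simplification, not the patented implementation: `intra_parents` and `inter_parents` are hypothetical helpers returning the same-layer and previous-layer parent indices of a node, SHA-256 stands in for the encoding function, and a dictionary stands in for the external memory file.

```python
import hashlib

def encode_layers(rand_blocks, orig_blocks, L, intra_parents, inter_parents):
    # Sketch: encode L layers of M nodes; only the current layer stays in
    # memory, the upper layer is "spilled" (here a dict stands in for the
    # external memory file) and dropped from the in-memory list.
    M = len(rand_blocks)
    layer_data = []
    # First layer: seed-derived random blocks, encoded in node order using
    # only same-layer parents (assumed to have smaller indices).
    for v in range(M):
        parts = [rand_blocks[v]] + [layer_data[p] for p in intra_parents(1, v)]
        layer_data.append(hashlib.sha256(b"".join(parts)).digest())
    for layer in range(2, L + 1):
        spill = dict(enumerate(layer_data))  # write-back of the upper layer
        layer_data = []                      # upper layer leaves memory
        for v in range(M):
            parents = [spill[p] for p in inter_parents(layer, v)]
            parents += [layer_data[p] for p in intra_parents(layer, v)]
            layer_data.append(hashlib.sha256(b"".join(parents)).digest())
    # The last layer is XORed bitwise with the data to be encoded
    # (blocks assumed to be 32 bytes, matching the SHA-256 digest size).
    return [bytes(a ^ b for a, b in zip(enc, blk))
            for enc, blk in zip(layer_data, orig_blocks)]
```

In the real method the spilled data is written to external files in the order in which the next layer's nodes will consume it, so the reads stay sequential.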
Further preferably, after the encoded data corresponding to each node of the j-th layer is obtained, the encoded data corresponding to each j-th layer node is evenly distributed into different calculation windows, and each calculation window is allocated its own external memory file; based on the dependency of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependency relationships among the nodes within the (j+1)-th layer, the encoded data in the different calculation windows corresponding to all parent nodes of each (j+1)-th layer node is written sequentially into the corresponding external memory files, and the encoded data corresponding to each j-th layer node is deleted from memory; where j = 1, 2, …, L-1;
correspondingly, the calculation method of the coded data corresponding to each node of the j+1th layer comprises the following steps:
when the encoded data corresponding to a node of the (j+1)-th layer is calculated, a preset number of pieces of encoded data are first read, without replacement, from the head of each external memory file into the corresponding memory window; one piece of encoded data is then read, without replacement and in order, from the head of each memory window holding data for the parent nodes of that (j+1)-th layer node, yielding the encoded data corresponding to all of its parent nodes, and each piece read is evicted from its memory window; when the encoded data corresponding to the next (j+1)-th layer node is calculated, a preset number of pieces of encoded data are again read, without replacement, from the head of each external memory file and appended in order to the tail of the corresponding memory window; following this process, the encoded data corresponding to all parent nodes of each (j+1)-th layer node is obtained in sequence, spliced, and further encoded to obtain the encoded data corresponding to each (j+1)-th layer node.
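The window mechanism above can be sketched as follows. This is a hedged simplification: a Python list stands in for an external memory file, and `prefetch` is the preset number of items read without replacement from each file head.

```python
from collections import deque

class WindowReader:
    """Sketch of the windowed, no-replacement sequential read: each external
    file is consumed strictly front-to-back through a small in-memory window,
    so at most `prefetch` items per file reside in memory at once."""
    def __init__(self, files, prefetch):
        self.files = files            # file index -> list standing in for an external file
        self.pos = [0] * len(files)   # read cursor per file (no replacement)
        self.windows = [deque() for _ in files]
        self.prefetch = prefetch
        for f in range(len(files)):   # pre-read the preset number of items
            self._refill(f)

    def _refill(self, f):
        # Top the window up to `prefetch` items from the file head.
        while len(self.windows[f]) < self.prefetch and self.pos[f] < len(self.files[f]):
            self.windows[f].append(self.files[f][self.pos[f]])
            self.pos[f] += 1

    def next_parent(self, f):
        # Pop (evict) the next parent's data from window f, then refill
        # the window tail from the file, preserving the sequential order.
        item = self.windows[f].popleft()
        self._refill(f)
        return item
```

Because parent data within one file is laid out in consumption order, every `next_parent` call translates into a sequential external read rather than a random one.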
In a second aspect, the present invention provides a data encoding method for copy certification, comprising: dividing the original data sequentially into a plurality of contiguous data sets, where the encoding of each contiguous data set corresponds to one execution stage, executed in the order of division; each execution stage takes its corresponding contiguous data set as the data to be encoded and executes the data encoding method provided in the first aspect of the invention; and memory space is allocated for the data to be processed in the next execution stage only after the current execution stage has completed and the next stage is entered.
In a third aspect, the present invention provides a data encoding method for copy certification for encoding a plurality of original data simultaneously, comprising: dividing each original data sequentially into a plurality of contiguous data sets, where the encoding of each contiguous data set corresponds to one execution stage; each original data is processed stage by stage, with the stages of the different original data executed in a pipelined manner; each execution stage takes its corresponding contiguous data set as the data to be encoded and executes the above data encoding method; and memory space is allocated for the data to be processed in the next execution stage only after the current execution stage has completed and the next stage is entered.
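The pipelined staging of the second and third aspects can be illustrated with a small schedule sketch (an assumption-level illustration, not the patent's scheduler): stage s of data item d runs at step d + s, so the number of simultaneously active stages — and hence the just-in-time allocated memory — stays bounded and roughly constant.

```python
def pipeline_schedule(num_data, num_stages):
    """Sketch: return, per time step, the list of (data_index, stage_index)
    pairs active at that step under a simple one-stage-per-step pipeline.
    Memory for a stage is allocated only when its step begins."""
    steps = []
    for t in range(num_data + num_stages - 1):
        active = [(d, t - d) for d in range(num_data) if 0 <= t - d < num_stages]
        steps.append(active)
    return steps
```

With 4 original data and 4 stages (as in FIG. 6 of embodiment 3), at most 4 stages are active at any step, so the total memory in use stays flat instead of growing with the number of concurrent encodings.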
In a fourth aspect, the present invention provides a data encoding system for copy certification, comprising: a memory storing a computer program, and a processor executing the computer program to perform the data encoding method provided in the first aspect, the second aspect and/or the third aspect of the present invention.
In a fifth aspect, the present invention provides a data decoding method for copy certification, comprising:
performing L-layer decoding on the data to be decoded serially:
first layer decoding: obtaining a random seed according to the information of the data to be decoded, and generating random data of the same size as the data to be decoded from the random seed; dividing the random data sequentially into M' data blocks, where each data block corresponds one-to-one to a node position in the first layer of the pre-built stacked depth robust graph; decoding each data block in sequence based on the dependency relationships among the nodes in the first layer of the stacked depth robust graph, generating M' pieces of decoded data in sequence to obtain the decoded data corresponding to each node of the first layer; based on the dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, sequentially writing the decoded data corresponding to the upper-layer parent nodes of each second-layer node into external storage, and deleting the decoded data corresponding to each first-layer node from memory;
layer i decoding: based on the dependency of the i-th layer nodes on the (i-1)-th layer nodes in the stacked depth robust graph, reading from external storage the decoded data corresponding to the (i-1)-th layer parent nodes of each i-th layer node; based on the dependency relationships among the nodes within the i-th layer of the stacked depth robust graph, sequentially obtaining the decoded data corresponding to all parent nodes of each i-th layer node, splicing it, and further decoding it to obtain the decoded data corresponding to each i-th layer node; based on the dependency of the (i+1)-th layer nodes on the i-th layer nodes in the stacked depth robust graph, sequentially writing the decoded data corresponding to the i-th layer parent nodes of each (i+1)-th layer node into external storage, and deleting the decoded data corresponding to each i-th layer node from memory; i = 2, 3, …, L-1;
layer L decoding: based on the dependency of the L-th layer nodes on the (L-1)-th layer nodes in the stacked depth robust graph, reading from external storage the decoded data corresponding to the (L-1)-th layer parent nodes of each L-th layer node; based on the dependency relationships among the nodes within the L-th layer of the stacked depth robust graph, sequentially obtaining the decoded data corresponding to all parent nodes of each L-th layer node, splicing it, and further decoding it to obtain the decoded data corresponding to each L-th layer node; and performing a bitwise exclusive OR between the decoded data corresponding to each L-th layer node and the data to be decoded to obtain the final decoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M'; and randomly generated dependency relationships exist between nodes within the same layer and between nodes of two adjacent layers.
Further preferably, after the decoded data corresponding to each node of the j-th layer is obtained, the decoded data corresponding to each j-th layer node is evenly distributed into different calculation windows, and each calculation window is allocated its own external memory file; based on the dependency of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependency relationships among the nodes within the (j+1)-th layer, the decoded data in the different calculation windows corresponding to all parent nodes of each (j+1)-th layer node is written sequentially into the corresponding external memory files, and the decoded data corresponding to each j-th layer node is deleted from memory; where j = 1, 2, …, L-1;
correspondingly, the method for calculating the decoding data corresponding to each node of the j+1th layer comprises the following steps:
when the decoded data corresponding to a node of the (j+1)-th layer is calculated, a preset number of pieces of decoded data are first read, without replacement, from the head of each external memory file into the corresponding memory window; one piece of decoded data is then read, without replacement and in order, from the head of each memory window holding data for the parent nodes of that (j+1)-th layer node, yielding the decoded data corresponding to all of its parent nodes, and each piece read is evicted from its memory window; when the decoded data corresponding to the next (j+1)-th layer node is calculated, a preset number of pieces of decoded data are again read, without replacement, from the head of each external memory file and appended in order to the tail of the corresponding memory window; following this process, the decoded data corresponding to all parent nodes of each (j+1)-th layer node is obtained in sequence, spliced, and further decoded to obtain the decoded data corresponding to each (j+1)-th layer node.
In a sixth aspect, the present invention provides a data decoding method for copy certification, comprising: dividing the original data sequentially into a plurality of contiguous data sets, where the decoding of each contiguous data set corresponds to one execution stage, executed in the order of division; each execution stage takes its corresponding contiguous data set as the data to be decoded and executes the data decoding method provided in the fifth aspect of the invention; and memory space is allocated for the data to be processed in the next execution stage only after the current execution stage has completed and the next stage is entered.
In a seventh aspect, the present invention provides a data decoding method for copy certification for decoding a plurality of original data simultaneously, comprising: dividing each original data sequentially into a plurality of contiguous data sets, where the decoding of each contiguous data set corresponds to one execution stage; each original data is processed stage by stage, with the stages of the different original data executed in a pipelined manner; each execution stage takes its corresponding contiguous data set as the data to be decoded and executes the data decoding method provided in the fifth aspect of the invention; and memory space is allocated for the data to be processed in the next execution stage only after the current execution stage has completed and the next stage is entered.
In an eighth aspect, the present invention provides a data decoding system for copy certification, comprising: a memory storing a computer program, and a processor executing the computer program to perform the data decoding method provided in the fifth, sixth and/or seventh aspect of the present invention.
In a ninth aspect, the present invention further provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where the computer program, when executed by a processor, controls a device in which the storage medium is located to perform the data encoding method provided by the first aspect, the second aspect, the third aspect, and/or the data decoding method provided by the fifth aspect, the sixth aspect, and the seventh aspect of the present invention.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
1. In the data encoding method for copy certification provided by the first aspect of the invention, after the encoded data corresponding to each upper-layer node is obtained, the encoded data corresponding to the parent nodes of the lower-layer nodes is written sequentially into external storage based on the dependency of the lower-layer nodes on the upper-layer nodes in the stacked depth robust graph, and the encoded data corresponding to each upper-layer node is deleted from memory. By writing the encoded data of upper-layer nodes to external storage and studying a suitable layout for the externally stored nodes, the child nodes in the next layer read the upper-layer parent nodes they depend on sequentially, greatly mitigating the algorithm performance loss caused by the performance gap between internal and external memory and greatly reducing the memory space overhead.
2. The data encoding method for copy certification provided by the invention divides the large-scale node set of a single layer into a plurality of small windows, and writes the nodes of each window into the external memory file corresponding to that window in the nodes' relative order, ensuring that the parent nodes within a single file remain relatively contiguous when read; any number of parent nodes can thus be read ahead from the file corresponding to a single window, and subsequent parent nodes can be read from the file once the already-read parent nodes have been consumed. Reading large data blocks at a time then reduces the impact of read operations on the overall execution time of the algorithm. In addition, reads and writes can proceed concurrently with node computation, further reducing the time overhead of the algorithm.
3. Memory resources are pre-allocated during execution of the SDR algorithm, i.e. the memory space is allocated before the SDR algorithm starts computing nodes. However, as the SDR algorithm executes progressively, most of that memory space goes unused for long periods after allocation. The data encoding methods provided by the second and third aspects of the present invention therefore adopt a just-in-time memory allocation strategy for progressive memory allocation: the large-scale data within a layer is first decomposed into small data sets, each called a window, and memory is allocated only just before the current window begins computation, avoiding long-term occupation of a large amount of memory. In addition, multiple instances of the encoding method provided in the first aspect can execute concurrently in a pipelined manner, keeping the total memory space usage of the concurrently executing encodings constant and thereby reducing the average memory space overhead.
Drawings
FIG. 1 is a schematic diagram of a data encoding method for copy certification according to embodiment 1 of the present invention;
FIG. 2 is a stacked depth robust graph provided by embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the relative sequential distribution of upper-layer parent nodes according to embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of reading parent nodes distributed in relative order, provided in embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of a four-stage progressive memory allocation method according to embodiment 2 of the present invention;
FIG. 6 is a schematic diagram of the memory space occupied when encoding 4 original data concurrently according to embodiment 3 of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make the objects, technical solutions, and advantages of the present invention clearer. It should be understood that the specific embodiments described here are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict.
It should be noted that, by comprehensively analyzing the SDR algorithm of Filecoin, currently the largest and most representative distributed storage system in the world, the invention identifies two characteristics that are ubiquitous yet largely overlooked: (1) the node dependency relationships are fixed, so the parent nodes on which each node depends can be determined before computation; (2) nodes are generated progressively, and in the hash computation of the nodes to be generated, the hash operation of each node must wait for the preceding hash operation to complete. More importantly, despite these two characteristics, in order to avoid the computation amplification caused by losing parent nodes during execution, the SDR algorithm still stores all node data of two adjacent layers in memory, up to 64 GB of node data in total, which poses a great obstacle to bringing distributed storage to ordinary users. Considering that such devices generally have large-capacity external storage (hundreds of GB or even several TB), memory usage could be reduced effectively if some of the nodes originally kept in memory were saved in external storage instead. However, there is a large performance gap between external storage and memory in modern computer systems: if node data were stored in external storage with the same layout as in memory, node read latency would increase by 4-6 orders of magnitude (for example, hard-disk access latency is about 10 milliseconds versus about 10 nanoseconds for memory), severely compromising algorithm execution performance.
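The latency gap quoted above can be checked directly. Taking the text's figures of roughly 10 ms for a hard-disk access and roughly 10 ns for a memory access gives the six order-of-magnitude end of the stated 4-6 range (faster external devices such as SSDs sit nearer the four order-of-magnitude end):

```python
import math

hdd_latency_s = 10e-3   # ~10 ms hard-disk access latency, from the text
dram_latency_s = 10e-9  # ~10 ns memory access latency, from the text
gap_orders = math.log10(hdd_latency_s / dram_latency_s)  # 6.0 orders of magnitude
```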
To solve the above problems, the present invention provides an encoding and decoding method that saves the memory required for computation during the conversion of stored data into copies. The details are as follows:
Embodiment 1
A data encoding method for copy certification, the corresponding schematic diagram of which is shown in FIG. 1, comprising: performing L-layer encoding serially on the data to be encoded:
first layer coding: obtaining a random seed according to the information of the data to be encoded, and generating random data of the same size as the data to be encoded from the random seed; dividing the random data sequentially into M data blocks, where each data block corresponds one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; encoding each data block in sequence based on the dependency relationships among the nodes in the first layer of the stacked depth robust graph, generating M pieces of encoded data in sequence to obtain the encoded data corresponding to each node of the first layer; based on the dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, sequentially writing the encoded data corresponding to the upper-layer parent nodes of each second-layer node into external storage, and deleting the encoded data corresponding to each first-layer node from memory;
layer i coding: based on the dependency of the i-th layer nodes on the (i-1)-th layer nodes in the stacked depth robust graph, reading from external storage the encoded data corresponding to the (i-1)-th layer parent nodes of each i-th layer node; based on the dependency relationships among the nodes within the i-th layer of the stacked depth robust graph, sequentially obtaining the encoded data corresponding to all parent nodes of each i-th layer node, splicing it, and further encoding it to obtain the encoded data corresponding to each i-th layer node; based on the dependency of the (i+1)-th layer nodes on the i-th layer nodes in the stacked depth robust graph, sequentially writing the encoded data corresponding to the i-th layer parent nodes of each (i+1)-th layer node into external storage, and deleting the encoded data corresponding to each i-th layer node from memory; i = 2, 3, …, L-1;
layer L encoding: based on the dependency of the L-th layer nodes on the (L-1)-th layer nodes in the stacked depth robust graph, reading from external storage the encoded data corresponding to the (L-1)-th layer parent nodes of each L-th layer node; based on the dependency relationships among the nodes within the L-th layer of the stacked depth robust graph, sequentially obtaining the encoded data corresponding to all parent nodes of each L-th layer node, splicing it, and further encoding it to obtain the encoded data corresponding to each L-th layer node; and performing a bitwise exclusive OR between the encoded data corresponding to each L-th layer node and the data to be encoded to obtain the final encoding result;
it should be noted that the encoding function in the above process may be any encoding method, for example encoding with the SHA-256 hash algorithm.
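As an illustrative example of one such encoding step (assuming SHA-256 as the text suggests; the exact input layout, including the node-index prefix, is an assumption, not the patent's specification), a node's encoded data can be produced by splicing its parents' encoded data and hashing:

```python
import hashlib

def encode_node(node_index, parent_data):
    # Hypothetical layout: an 8-byte node index followed by the spliced
    # (concatenated) encoded data of all parent nodes; SHA-256 yields the
    # node's 32-byte encoded data.
    payload = node_index.to_bytes(8, "big") + b"".join(parent_data)
    return hashlib.sha256(payload).digest()
```

Because hashing is sequential per node and each node's input depends on earlier nodes, this step is inherently hard to parallelize, which is exactly the anti-parallel property the copy proof relies on.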
In summary, to address the excessive memory space overhead of the SDR algorithm, the invention proposes an upper-layer node write-back strategy. The strategy moves upper-layer node data from memory into external storage, reducing memory space overhead, and studies a suitable external-storage node layout so that the child nodes in the next layer read the upper-layer parent nodes they depend on sequentially, greatly mitigating the algorithm performance loss caused by the performance gap between internal and external memory. Experimental results show that for a 32 GB copy executed in the Intel test environment, the memory space overhead is reduced by 50% compared with the original SDR algorithm, while the algorithm's execution time increases by no more than 6.9%.
Further, the stacked depth robust graph is constructed based on a depth robust graph structure and an expansion graph structure and comprises L layers, each layer containing M nodes. Each layer is a depth robust graph, and an expansion graph is formed between any two adjacent layers. Randomly generated parent-child dependency relationships exist between nodes within the same layer and between two adjacent layers; that is, nodes within the same layer form parent-child dependencies based on the depth robust graph, and nodes in two adjacent layers form parent-child dependencies based on the expansion graph.
Specifically, a depth robust graph (DRG, Depth Robust Graph) is a type of directed acyclic graph, generally denoted (N, α, β, d)-DRG, where N is the total number of nodes in the DRG, all nodes have the same in-degree d, and α and β are two robustness coefficients between 0 and 1. It has the following property: when any (1-α)N nodes of an (N, α, β, d)-DRG, together with all edges starting or ending at these nodes, are deleted, the graph formed by the remaining nodes and edges necessarily contains a node v0 whose maximum depth depth(v0) is greater than or equal to βN.
The expansion graph is a directed bipartite graph, generally denoted (N, α, β, d)-expansion graph. A directed bipartite graph consists of two node sets (a source node set and a target node set) and one edge set. The total number of nodes contained in the source node set and the target node set of the expansion graph is N, the out-degree of any node in the source node set is d, the in-degree of any node in the target node set is d, and α and β are two expansion coefficients between 0 and 1 with α smaller than β. It has the following property: any αN target nodes in the target node set of an (N, α, β, d)-expansion graph are connected to at least βN source nodes in its source node set.
Specifically, the stacked depth robust graph in one embodiment, as shown in fig. 2, includes two layers of nodes, each layer containing 6 nodes, for a total of 12 nodes. The 6 nodes within the same layer form a (6, 5/6, 1/2, 2)-DRG, and every node in a layer except the first node of that layer has in-degree 2; that is, except for N1 in the first layer and N7 in the second layer, whose in-degrees are not 2, the remaining nodes all have in-degree 2. Thus, each node depends on 2 parent nodes that are in the same layer and precede it, and the 2 parent nodes of a node are given by the index of the stacked depth robust graph. In addition, a (6, 1/2, 3)-expansion graph is formed between the nodes of the second layer and the nodes of the first layer. The source nodes in the first layer have out-degree 3, and the target nodes in the second layer have in-degree 3. Thus, every node in the lower of the two layers depends on 3 parent nodes in the layer above it, and the 3 parent nodes of a node are given by the expansion graph structure. In summary, in a stacked depth robust graph structure constructed based on DRGs and expansion graphs, the parent-child node dependencies are fixed: a node in the first layer depends on its preceding d_L parent nodes within the first layer; a node in any layer below the first depends on d_L preceding parent nodes in the same layer and d_E parent nodes in the layer above; and the index positions of the parent nodes it depends on are fixed.
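The degree constraints of the two-layer example above can be checked mechanically. The sketch below uses a hypothetical parent table: only the rows for N7 and N12 match edges stated elsewhere in this document, and the remaining rows are assumptions chosen so that the in-degree and out-degree constraints of the expansion graph hold.

```python
# hypothetical parent table for the 2-layer example: target node -> its 3
# source-layer parents (1-indexed; rows for N7 and N12 follow the text,
# the others are assumed for illustration)
EXPANDER_PARENTS = {
    7: (1, 2, 5), 8: (2, 3, 4), 9: (1, 4, 5),
    10: (3, 5, 6), 11: (1, 4, 6), 12: (2, 3, 6),
}

def degrees_ok(parents, n_src=6, d=3):
    """Check the expansion-graph degree constraints: every target node has
    in-degree d and every source node has out-degree d."""
    out_deg = {s: 0 for s in range(1, n_src + 1)}
    for tgt in parents:
        if len(parents[tgt]) != d:      # in-degree of the target node
            return False
        for p in parents[tgt]:
            out_deg[p] += 1             # accumulate source out-degrees
    return all(v == d for v in out_deg.values())
```

Fixing such a table once and for all is what makes the parent-child dependencies "fixed" in the sense used above: the index positions of a node's parents never change between runs.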
It should be noted that the sequential access performance of external memory (especially hard disks) is far higher than its random access performance; for example, a hard disk with a sequential read/write bandwidth of 250MB/s may have a 4KB random read/write bandwidth of only 556KB/s. To address this, and considering that the stacked depth robust graph structure has fixed parent-child node dependencies, writing the upper-layer parent nodes of all nodes to be generated back to the external memory sequentially according to the node generation order guarantees that accesses to the parent nodes are sequential, greatly alleviating the algorithm execution performance loss caused by the performance difference between internal and external memory. Since each upper-layer parent node has d_E child nodes in the lower layer, to maintain the sequential read-write characteristic of the parent node data, each parent node needs to be written d_E times in the external memory. Based on this, the invention provides a write-back strategy for upper-layer data, which sorts the intermediate data in the internal memory according to the relative-order distribution and then writes it to the external memory, sequentially writing and reading d_E times as much data as the copy size in the external memory. This avoids keeping the upper-layer parent node data in the internal memory, thereby reducing the memory overhead by 50%; meanwhile, writes are guaranteed to be sequential, making full use of the external memory bandwidth.
Specifically, fig. 1 depicts the original parent node distribution and the sequential parent node distribution. In the original parent node distribution, N7 reads the nodes located at positions 1, 2 and 5 of the original node distribution, and N8 reads the nodes located at positions 2-4. Therefore, in the original node distribution, reading the parent nodes of a node to be generated is a random read process. Then, by traversing the nodes to be generated, N7 to N12, all parent nodes of N7, namely N1, N2 and N5, are placed at positions 1-3. By analogy, all parent nodes of N12, namely N2, N3 and N6, are placed at positions 16-18, forming the sequential parent node distribution. When parent nodes are read under the sequential node distribution, N7 reads the nodes at positions 1-3 of the sequential distribution, and N8 reads the nodes at positions 4-6. Similarly, N12 reads the nodes at positions 16-18. Therefore, under the sequential parent node distribution, reading the parent nodes of a node to be generated is a sequential read process. After the parent nodes are sorted in the internal memory according to this rule and written to the external memory in the sequential node distribution, all accesses to the parent nodes are sequential.
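The reordering from the original distribution to the sequential distribution can be sketched as follows. The parent table is a partial reconstruction: the rows for N7, N8 and N12 follow the figure as described above, while the rows for N9-N11 are assumptions for illustration.

```python
# parents of each node to be generated, in generation order; the rows for
# N7, N8 and N12 follow the text, the middle rows are assumed
PARENTS = {
    7: [1, 2, 5], 8: [2, 3, 4], 9: [1, 4, 5],
    10: [3, 5, 6], 11: [1, 4, 6], 12: [2, 3, 6],
}

def sequential_distribution(parents_of):
    """Lay the parents out in child-generation order, so that reading the
    parents of node 7, then node 8, ... becomes a purely sequential scan."""
    dist = []
    for child in sorted(parents_of):
        dist.extend(parents_of[child])
    return dist

def read_parents(dist, child_rank, d=3):
    """Sequentially read the d parents of the child at position child_rank."""
    return dist[child_rank * d:(child_rank + 1) * d]
```

Note that parents appear in the distribution once per child that depends on them (N2 appears several times), which is exactly the duplication that trades external-memory space for sequential access.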
Further, the execution process of the data write-back strategy can be divided into the following 4 steps:
1) Reading the encoded data corresponding to the parent nodes: the encoded data corresponding to the parent nodes of the node to be generated is read sequentially from the external memory into the read buffer RBuf, stopping when the read buffer is full.
2) Calculating the encoded data corresponding to a lower-layer node: d_E parent-node encoded data items are read sequentially from the read buffer, and d_L parent-node encoded data items are read randomly from the internal memory. From the d_E + d_L parent-node encoded data items thus read and the auxiliary information related to the copy, the value of the node is calculated through a hash algorithm.
3) Ordering the coded data corresponding to the parent node: after hash calculation of all nodes in the same layer is completed, the nodes in the memory are ordered according to the node distribution mode in the external memory, and the ordering result is written into the write buffer WBuf.
4) Writing coded data corresponding to the parent node into an external memory: when the write buffer is full, the data in the buffer are sequentially written into the external memory.
In the above 4 steps, the parent-node reading step and the node calculation step form a typical producer-consumer problem. In an alternative embodiment, a thread can therefore be allocated to the process of reading the encoded data corresponding to the upper-layer nodes and another to the process of calculating the encoded data corresponding to the lower-layer nodes, denoted the first thread and the second thread respectively; the first thread reads the corresponding encoded data into the buffer for the second thread to use in calculating the encoded data corresponding to the lower-layer nodes. The first thread and the second thread execute concurrently, greatly reducing the execution time of the upper-layer data write-back strategy. Likewise, the sorting step and the step of writing the parent nodes to the external memory form another typical producer-consumer problem, so a thread can be allocated to the sorting of the parent-node encoded data and another to writing the parent-node encoded data to the external memory, denoted the third thread and the fourth thread respectively; the third and fourth threads execute concurrently, further reducing the execution time under the upper-layer data write-back strategy.
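The producer-consumer pairing of the read thread and the compute thread can be sketched with a bounded buffer. This is a minimal illustration, not the invention's implementation: `parent_chunks` stands in for the per-node parent reads from external memory, the queue plays the role of RBuf, and SHA-256 plays the role of the node hash.

```python
import hashlib
import queue
import threading

def run_pipeline(parent_chunks):
    """First thread fills the read buffer; second thread hashes from it."""
    rbuf = queue.Queue(maxsize=4)   # bounded read buffer (RBuf)
    results = []

    def reader():                    # first thread: sequential reads into RBuf
        for chunk in parent_chunks:
            rbuf.put(chunk)          # blocks when the buffer is full
        rbuf.put(None)               # sentinel: no more data

    def hasher():                    # second thread: consume and hash
        while True:
            chunk = rbuf.get()
            if chunk is None:
                break
            results.append(hashlib.sha256(chunk).digest())

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=hasher)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

The bounded queue is what keeps the two stages overlapped without letting the reader run arbitrarily far ahead of the hasher; the sort/write pair described above has the same shape with WBuf in place of RBuf.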
Further, although the data encoding method provided in this embodiment adopts a data write-back scheme, it must still wait for the calculation of a whole layer of node data to complete before writing back, so one layer of data must be kept in main memory: all nodes in the same layer as the node to be generated. When the data copy size is large, e.g. 32GB, the main memory usage is still 32GB. The above data encoding method therefore still requires a large amount of memory space, which devices without abundant idle memory resources cannot satisfy. To further reduce the memory space overhead of the SDR algorithm, part of the nodes must be written back to the external memory as soon as they are calculated (such a set of nodes is called a window); for example, if the total number of nodes in a single layer is N, then every time N/8 nodes have been calculated, these N/8 nodes are written to the external memory according to some node distribution. Only the memory space required by the nodes in the current window is kept in the internal memory, greatly reducing the main memory consumption. However, in the SDR algorithm the parent nodes are random and their span is very large (quite possibly greater than N/2), especially among the parent-child dependencies of adjacent layers. When only a window's worth of nodes is kept in the internal memory, the node distribution generated from these partial nodes cannot be a sequential node distribution, no matter how the window size is adjusted. Thus, when reading these node distributions, random read operations are unavoidable.
To solve the above problem, the present invention further studies node distributions and finds that if only part of a layer's nodes, i.e. a single window, is to be kept in the internal memory, the nodes must be written to the external memory after all node calculations in the window are completed. However, once the nodes of a window are written to the external memory, the node distribution in the external memory cannot in general be sorted into the sequential node distribution or a sequential node sub-distribution (a contiguous part of the sequential node distribution), while the parent nodes of a node to be generated can only be read sequentially under the sequential node distribution. Thus, whatever distribution is used to store all the nodes of a window in the external memory, reading the nodes from the external memory is necessarily a random read process. In addition, the nodes in each window correspond to two node distributions, and a single layer of nodes may be divided into many windows. If only part of a layer's nodes are to be kept in memory, there must be a large number of node distributions, each corresponding to a file in the external memory. Reading the parent nodes is therefore equivalent to randomly reading a large number of files. However, the random read bandwidth of external memory (particularly hard disks) is related to the amount of data per read; for example, on a hard disk with a sequential read bandwidth of 250MB/s, the random read bandwidth is about 4MB/s for 32KB reads, about 75MB/s for 1MB reads, and about 240MB/s for 32MB reads.
If large random reads are used for reading external-memory nodes, especially random reads of 32MB, the random read bandwidth can approach or equal the sequential read bandwidth.
If, when a parent node distribution is generated, the relative reading order of the nodes within it can be guaranteed — colloquially, if in a parent node distribution the current node is necessarily used (as input to the hash operations of its child nodes) after the preceding node and before the following node — then the node distribution is called a relative-order node distribution. With the relative-order node distribution, any number of nodes can be read at once, and after the read nodes are used up, reading can continue from the node distribution. Therefore, by storing the nodes in the external memory according to the relative-order node distribution, large random reads can be used when reading the nodes, alleviating the algorithm execution performance loss caused by random reads. In addition, each parent node has d_E child nodes in the layer below it and d_L child nodes in its own layer, although most of the same-layer parent nodes lie in the same window as their child nodes. Thus, to maintain the relative-order distribution of these parent node data, each parent node needs to be written to the external memory at most d_E + d_L times (the worst case; the real case is closer to d_E), i.e. the external memory incurs a data write amplification of at most d_E + d_L. Under the relative-order node distribution, the nodes are therefore written sequentially and read randomly, amounting to d_E times as much data as the copy size, and only the nodes in the current window need to be kept in the internal memory, greatly reducing the memory space overhead.
Since the SDR algorithm generates nodes in index order during execution, the node distribution can be generated according to the relative order of the parent nodes in each window, and is divided at write time into an upper-layer parent node distribution and a local (same-layer) parent node distribution. For the upper-layer parent node distribution, the indices i = 1…n of the next-layer nodes are traversed in their generation order; for each node i, the function that outputs the indices of all its upper-layer parent nodes is evaluated, and each parent node is then located by its index. For the local parent node distribution, the indices i = k…n of the subsequent nodes of the current layer are traversed in generation order (k denotes the index of the last node in the current window); for each node i, the function that outputs the indices of all its same-layer parent nodes is evaluated, and each parent node is then located by its index. With the upper-layer parent node distribution and the local parent node distribution obtained by these rules, writing the data to the external memory according to the distribution rules guarantees that both writing and reading are sequential, and random reads of any size can be employed in subsequent node computation.
Specifically, in a second alternative embodiment, after the encoded data corresponding to each node of the j-th layer is obtained, it is evenly divided into different calculation windows, and each calculation window is allocated its own external memory file. Based on the relative dependency relationship of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependency relationships among the nodes within the (j+1)-th layer, the encoded data in the different calculation windows corresponding to all parent nodes of each node of the (j+1)-th layer is written sequentially into the corresponding external memory files, and the encoded data corresponding to each node of the j-th layer is deleted from the internal memory; where j = 1, 2, …, L-1. A schematic diagram of the relative-order distribution of the upper-layer parent nodes is shown in fig. 3, in which the single layer contains 6 nodes divided into 2 windows, each window corresponding to a file in the external memory. Taking window 1 as an example, the nodes of window 1 are N1, N2 and N3. The sorting method traverses all target nodes N7, N8, …, N12 and collects the parent nodes of these 6 target nodes that lie within the window. The parent nodes of N7 in window 1 are N1 and N2, so N1 and N2 are placed first in file 1. The parent nodes of N8 in window 1 are N2 and N3, so N2 and N3 are placed after N1 and N2 in file 1. By analogy, the parent nodes of the last node N12 in window 1 are N2 and N3, so N2 and N3 are placed at the end of file 1. After the calculation of the node data in window 1 is completed, the node data is sorted according to this rule and the sorted data is written to the external memory.
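The per-window file layout described above can be sketched as follows. The parent table is a partial reconstruction (the rows for N7, N8 and N12 follow the text; the rest are assumed), and the layout rule is: traverse the target nodes in generation order and append each parent that falls inside the window.

```python
# hypothetical cross-layer parent table (rows for N7, N8, N12 from the text)
PARENTS_E = {7: [1, 2, 5], 8: [2, 3, 4], 9: [1, 4, 5],
             10: [3, 5, 6], 11: [1, 4, 6], 12: [2, 3, 6]}

def window_file(window_nodes, parents_e, children):
    """Relative-order distribution for one window: traverse the target nodes
    in generation order and append each parent lying in this window."""
    window = set(window_nodes)
    file_layout = []
    for child in children:
        file_layout.extend(p for p in parents_e[child] if p in window)
    return file_layout
```

Because a parent is appended once per dependent child, the file is longer than the window itself; this duplication is the write amplification bounded by d_E + d_L discussed above.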
Correspondingly, the encoded data corresponding to each node of the (j+1)-th layer is calculated as follows:
When calculating the encoded data corresponding to a node of the (j+1)-th layer, a preset number of encoded data items is first read from the head of each external memory file into the corresponding memory window (the items are prefetched from the head of each file in non-repeating order; exactly how many depends on the window size, the robust graph parameters and the actual connectivity of the robust graph — in this embodiment, 1000 encoded data items are prefetched from the head of each file, each item being 256 bits, i.e. 32KB in total per file). One encoded data item is then read from the head of each memory window holding the encoded data corresponding to the parent nodes of that (j+1)-th layer node, yielding the encoded data corresponding to all parent nodes of the node, and the read items are evicted from the memory window. When calculating the encoded data corresponding to the next node of the (j+1)-th layer, a preset number of encoded data items is again read, without repetition, from the head of each external memory file and appended in order to the tail of the corresponding memory window. Through this process, the encoded data corresponding to all parent nodes of each node of the (j+1)-th layer is obtained in sequence, spliced, and further encoded to obtain the encoded data corresponding to each node of the (j+1)-th layer.
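The prefetch-evict-refill cycle on each memory window can be sketched with a double-ended queue. This is an illustrative sketch only: `file_data` stands in for one external memory file, and the prefetch count is an arbitrary small value rather than the 1000 items of the embodiment.

```python
from collections import deque

class PrefetchWindow:
    """Sketch of one per-file memory window: items are read from the file
    head in order, consumed from the window head, and refilled at the tail."""
    def __init__(self, file_data, prefetch=4):
        self.file = deque(file_data)    # stand-in for the external memory file
        self.window = deque()
        self.prefetch = prefetch
        self.refill()
    def refill(self):
        # read, without repetition, from the file head until the window is full
        while self.file and len(self.window) < self.prefetch:
            self.window.append(self.file.popleft())
    def pop(self):
        item = self.window.popleft()    # evict the consumed item
        self.refill()                   # top up the window tail from the file
        return item
```

Because the file stores a relative-order distribution, the head of the window is always the next parent needed, so consumption is strictly front-to-back.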
During dependency reading, rearranging the relative-order node distribution yields the sequential node distribution, so the next hash calculation can proceed; reading the dependencies is thus effectively a reordering scheme that brings the node data in the files into a fully ordered state. Fig. 4 is a schematic diagram of reading parent nodes from the relative-order distribution, in which the total number of nodes is 6 and the number of windows is 2, so two files exist in the external memory: one stores the relative-order distribution of N1, N2 and N3, and the other stores the relative-order distribution of N4, N5 and N6. At read time, the index of the depended-on parent node is used to find the corresponding file; for example, parent node 3 is in file 1 and parent node 5 is in file 2. Then, when a depended-on node is acquired, it is taken from the current read position of the corresponding file. Taking N7 as an example, it is determined from the dependencies that nodes N1 and N2 must be read from window 1 and node N5 from window 2; therefore, two nodes, N1 and N2, are read sequentially from file 1 corresponding to window 1, and one node, N5, is read sequentially from file 2 corresponding to window 2, so that all the upper-layer nodes N7 depends on are obtained and the calculation of N7 can begin. Specifically, N7 depends on nodes N1, N2 and N5 in the expansion graph: reading the first position of file 1 yields exactly N1, reading the second position of file 1 yields exactly N2, and reading the first position of file 2 yields exactly N5, so the values of nodes N1, N2 and N5 are obtained correctly.
Similarly, the dependency data of nodes N8, N9, etc. can be obtained.
In summary, the invention adopts a window mode when allocating memory for the nodes: the node data required for each calculation is prefetched into the memory window in advance, and data that has been used is evicted from the memory window and overwritten by other node data. After all nodes in the current window have been calculated, the intermediate nodes in the window are written to the external memory according to the relative-order distributions of the stacked depth robust graph and the expansion graph, and a mapping table from parent nodes to file indices is stored. After the calculation of the upper-layer node data is completed, the M upper-layer nodes required for calculating each lower-layer node are assembled in sequence into a prefetch data set according to the dependency relationship between the two layers, and the N prefetch data sets are written sequentially to the external memory in the order of the nodes to be generated. When calculating a lower-layer node, a memory window of size M is maintained; a prefetch data set is read sequentially from the external memory for each calculation, the data is evicted from the memory window after the calculation completes, and the next prefetch data set overwrites it, saving memory space. Further, in the window mode the window can slide to the subsequent nodes after the node calculations in the current window are completed. Compared with the original SDR algorithm, which must allocate memory for two complete layers of nodes, this mode only needs to allocate memory of one window size.
In the above process, a mapping table from parent nodes to file indices is stored, and the encoded data corresponding to the parent nodes in the files is read randomly through this mapping table, so that the encoded data corresponding to all parent nodes of any child node to be generated can be obtained from multiple relative-order node distributions. The nodes in a relative-order node distribution exhibit spatial locality during access, so while reading such a distribution a large number of nodes can be read ahead and a buffer allocated for them, improving the random read bandwidth.
Furthermore, the nodes of the current layer can be written to the external memory in the window mode, so that only part of the nodes needs to be kept in the internal memory; a large-scale single-layer node set is therefore divided into multiple small windows. The nodes in each window are written to one corresponding file in the external memory, in the relative order of the nodes, which ensures that the parent nodes within a single file remain relatively contiguous at read time: any number of parent nodes can be read from the file corresponding to a single window, and once the read parent nodes have been used, subsequent parent nodes can be read from the same file. Random reads of large data blocks are then used to reduce the impact of read operations on the overall execution time of the algorithm. In addition, reading and writing can proceed concurrently with node calculation, further reducing the time overhead of the algorithm. Experimental results show that, for a copy size of 32GB executed in the Intel test environment, the current-layer node write-back strategy reduces the memory consumption of the algorithm from 64GB to 192MB, i.e. a 99.7% reduction in memory space overhead, while the execution time of the algorithm remains unchanged.
Embodiment 2
This embodiment also provides a data encoding method for proof of replication. It should be noted that existing SDR implementations allocate memory by pre-allocation: when the algorithm starts executing, memory space of twice the copy size is allocated. Because the encoded data corresponding to the nodes is generated progressively during SDR execution, most of the memory space allocated for the SDR algorithm sits idle; for example, while the SDR algorithm calculates the encoded data corresponding to node i, the memory space allocated for node i+1 and subsequent nodes is not yet used. To solve this problem, a progressive memory allocation strategy is proposed: memory is allocated progressively, reducing the total memory space overhead.
Specifically, the data encoding method provided by the present embodiment includes: dividing the original data sequence into several contiguous data sets, where the encoding of each contiguous data set corresponds to one execution stage and the stages are executed in the division order; each execution stage takes its contiguous data set as the data to be encoded and executes the data encoding method provided in embodiment 1 of the invention; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage has completed and the next stage is entered. The execution of each data encoding method is controlled to ensure that its memory allocation for the current window and its processing of the current window begin and end together. For example, when processing original data composed of 1000 nodes, each layer consists of 1000 nodes; nodes 1-500 form window 1 and nodes 501-1000 form window 2, and processing the nodes of each window with the above encoding method is called a stage. Memory space is allocated for the window to be processed in the next stage only after processing in the current stage has completed and the next stage is entered. This memory allocation method is called progressive memory allocation. Fig. 5 shows a four-stage progressive memory allocation; as can be seen from fig. 5, when processing of the nodes in window 1 of stage 1 completes, memory is allocated for the next window and execution jumps to stage 2.
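The effect of progressive allocation on peak memory can be sketched as follows. This is a minimal illustration under assumed names: `process` stands in for the per-window encoding of embodiment 1, and byte lengths stand in for memory footprints.

```python
def progressive_encode(windows, process):
    """Allocate each window's buffer only when its stage begins and release
    it when the stage ends, instead of pre-allocating everything up front."""
    peak = live = 0
    outputs = []
    for win in windows:
        buf = bytearray(len(win))          # allocate at stage entry
        live += len(buf)
        peak = max(peak, live)             # track peak live memory
        outputs.append(process(win, buf))
        live -= len(buf)                   # release at stage exit
    return outputs, peak
```

With two equal windows, the peak live memory is one window's worth, whereas pre-allocation would have reserved both windows from the start.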
The related technical solution is the same as that of embodiment 1, and will not be described here in detail.
Embodiment 3
This embodiment also provides a data encoding method for proof of replication. It should be noted that, during execution, the memory space occupation of an SDR algorithm using progressive memory allocation is not fixed and varies with the window being processed. When the SDR algorithm processes the first window, its memory space occupation is smallest; when it processes the last window, its memory space occupation is largest. If an SDR instance processing the last window and an SDR instance processing the first window start processing their corresponding windows at the same time, the total memory occupation reaches twice the average memory occupation of progressive allocation; of course, the memory occupation continues to change as the corresponding windows are processed. More generally, when K SDR instances (K being the number of windows) execute concurrently and each starts processing a different window at the same time, the memory consumption reaches K times the average memory consumption of progressive allocation. When all window processing ends at the same time, the overall memory overhead does not change, even though the memory space overhead of a single SDR instance varies.
To solve the above problem, the data encoding method provided in this embodiment encodes multiple pieces of original data simultaneously, and includes: dividing each original data sequence into several contiguous data sets, where the encoding of each contiguous data set corresponds to one execution stage; each piece of original data is processed stage by stage, and the different pieces of original data execute their stages in a pipelined fashion; each execution stage takes its contiguous data set as the data to be encoded and executes the data encoding method provided in embodiment 1; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage has completed and the next stage is entered. During concurrent execution, the execution of each data encoding method is controlled so that its memory allocation for the current window and its processing of the current window begin together, and the processing of the current window by all the data encoding methods ends together. Specifically, as shown in fig. 6, taking the concurrent encoding of 4 pieces of original data as an example, a large-scale single-layer node set is divided into 4 windows, and during execution multiple memory windows at different stages are computed concurrently under progressive memory allocation; memory space is allocated for the window to be processed in the next stage after processing in the current stage completes, and the execution of each data encoding method is controlled so that memory allocation and processing of the current window begin together and processing of the current window ends together for all methods. After windows 4-1 are processed in the replica transformations of replicas 1-4 respectively, the next stage is entered. At this point, although the memory space occupied by a single data encoding method changes during execution, the total memory space occupied by the 4 data encoding methods does not change, and the overall memory space occupation is reduced.
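The pipelined staggering of replicas across stages can be sketched as a schedule. This is an illustrative sketch under assumed semantics: replica r works on stage t - r at time step t, so at steady state every stage is occupied by exactly one replica and the set of stages in flight, and hence the total memory footprint, stays constant.

```python
def pipeline_schedule(n_replicas=4, n_stages=4):
    """Return, for each time step, a map replica -> stage being processed.
    Replica r starts r steps after replica 0, giving a classic pipeline."""
    steps = []
    for t in range(n_replicas + n_stages - 1):
        active = {r: t - r for r in range(n_replicas) if 0 <= t - r < n_stages}
        steps.append(active)
    return steps
```

At step 3 of a 4-replica, 4-stage pipeline, replicas 0-3 occupy stages 3, 2, 1 and 0 respectively, matching the "windows 4-1 for replicas 1-4" pattern described above.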
The related technical solution is the same as that of embodiment 1, and will not be described here in detail.
It should be noted that both embodiment 2 and embodiment 3 adopt a just-in-time memory allocation policy. The original SDR algorithm pre-allocates memory resources: memory space is reserved before SDR begins computing nodes. However, because most of this space then sits unused for a long time during the layer-by-layer execution of the SDR algorithm, the present invention proposes a progressive memory allocation policy. First, the large-scale data within a layer is decomposed into small data sets, each called a window, and memory is allocated only just before the current window begins computation, avoiding long-term occupation of a large amount of memory. In addition, several instances of the encoding method of embodiment 1 can be executed in parallel so that their combined memory usage remains constant, reducing the average memory overhead. Experimental results show that, when 4 instances of the encoding method of embodiment 1 concurrently process 32GB replicas under the progressive memory allocation strategy, memory overhead is reduced by 18.75% compared with the original SDR algorithm while execution time remains unchanged.
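The contrast between pre-allocation and progressive allocation can be made concrete with a small model of one layer processed window by window. The window counts and sizes below are illustrative, not the patent's measurements.

```python
def peak_memory(n_windows, window_bytes, progressive):
    """Model the peak memory of processing one layer window by window."""
    peak = live = 0
    for w in range(n_windows):
        if progressive:
            live += window_bytes              # allocate just before computing
        elif w == 0:
            live += n_windows * window_bytes  # SDR-style pre-allocation
        peak = max(peak, live)
        # ... compute window w here ...
        if progressive:
            live -= window_bytes              # release immediately after use
    return peak
```

With 4 windows, the progressive policy peaks at one window's worth of memory instead of four, which is the effect the windowed decomposition above aims for.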
Embodiment 4
A data encoding system for proof of replication, comprising: a memory storing a computer program, and a processor that, when executing the computer program, performs the data encoding method provided in embodiment 1, embodiment 2 and/or embodiment 3 of the present invention.
The related technical solutions are the same as those in embodiments 1-3, and are not described here in detail.
Embodiment 5
A data decoding method for proof of replication, comprising:
performing L-layer decoding on the data to be decoded serially:
First layer decoding: obtaining a random seed according to the information of the data to be decoded, and generating random data of the same size as the data to be decoded from the random seed; dividing the random data in sequence into M' data blocks, wherein each data block corresponds one-to-one to a node position in the first layer of the pre-built stacked depth robust graph; decoding each data block in turn based on the dependency relationships among the nodes in the first layer of the stacked depth robust graph, and generating M' decoded data in turn to obtain the decoded data corresponding to each node of the first layer; and, based on the dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, writing the decoded data corresponding to the parent nodes, in the first layer, of each second-layer node into external memory in turn, and deleting the decoded data corresponding to each first-layer node from internal memory;
Layer i decoding: based on the dependency of the i-th layer nodes on the (i-1)-th layer nodes in the stacked depth robust graph, reading from external memory the decoded data corresponding to the parent nodes, in the (i-1)-th layer, of each i-th layer node; based on the dependency relationships among the nodes within the i-th layer of the stacked depth robust graph, obtaining in turn the decoded data corresponding to all parent nodes of each i-th layer node, concatenating them, and further decoding to obtain the decoded data corresponding to each i-th layer node; and, based on the dependency of the (i+1)-th layer nodes on the i-th layer nodes in the stacked depth robust graph, writing the decoded data corresponding to the parent nodes, in the i-th layer, of each (i+1)-th layer node into external memory in turn, and deleting the decoded data corresponding to each i-th layer node from internal memory; i=2, 3, …, L-1;
Layer L decoding: based on the dependency of the L-th layer nodes on the (L-1)-th layer nodes in the stacked depth robust graph, reading from external memory the decoded data corresponding to the parent nodes, in the (L-1)-th layer, of each L-th layer node; based on the dependency relationships among the nodes within the L-th layer of the stacked depth robust graph, obtaining in turn the decoded data corresponding to all parent nodes of each L-th layer node, concatenating them, and further decoding to obtain the decoded data corresponding to each L-th layer node; and performing a bitwise exclusive OR of the decoded data corresponding to each L-th layer node with the data to be decoded to obtain the final decoding result;
Wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M'; and randomly generated dependency relationships exist both between nodes within the same layer and between nodes of two adjacent layers.
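The layered labelling and the encode/decode symmetry can be sketched in miniature. This is a toy, not the patented construction: a real stacked depth robust graph has many randomly chosen parents per node, whereas here each node depends only on its predecessor in the same layer and the same-index node of the previous layer, and the hash, sizes, and seed handling are illustrative.

```python
import hashlib

L_LAYERS, M_NODES = 3, 8   # toy sizes; real deployments are far larger

def node_hash(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def sdr_labels(seed):
    """Label the toy stacked graph layer by layer; only the previous layer
    is retained, mirroring the write-back/delete discipline above."""
    prev = None
    for layer in range(L_LAYERS):
        cur = []
        for v in range(M_NODES):
            deps = [cur[v - 1]] if v else []      # same-layer parent
            if prev is not None:
                deps.append(prev[v])              # previous-layer parent
            cur.append(node_hash(seed, bytes([layer, v]), *deps))
        prev = cur                                # older layers are discarded
    return prev                                   # final-layer labels ("keys")

def encode(blocks, seed):
    keys = sdr_labels(seed)
    return [bytes(x ^ y for x, y in zip(k, b)) for k, b in zip(keys, blocks)]

decode = encode   # same pass: XORing with the same keys inverts the encoding

blocks = [bytes([i]) * 32 for i in range(M_NODES)]
enc = encode(blocks, b"seed")
dec = decode(enc, b"seed")
```

The `decode = encode` alias reflects the point made later in this document that encoding and decoding share one implementation, differing only in which data is fed in.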
As in embodiment 1, in this embodiment a thread may be allocated to the process of reading the decoded data corresponding to the upper-layer nodes and another to the process of computing the decoded data corresponding to the lower-layer nodes, denoted the fifth thread and the sixth thread respectively; the fifth thread reads the corresponding decoded data into a buffer so that the sixth thread can compute the decoded data corresponding to the lower-layer nodes; the fifth and sixth threads execute concurrently, greatly reducing the execution time under the upper-layer data write-back strategy. In addition, a thread may be allocated to the sorting of the decoded data corresponding to the parent nodes and another to writing that decoded data into external memory, denoted the seventh thread and the eighth thread respectively; the seventh and eighth threads execute concurrently, further reducing the execution time under the upper-layer data write-back strategy. The related analysis is the same as in embodiment 1.
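The fifth/sixth thread overlap can be sketched with a bounded queue acting as the buffer. The store contents, node count, and hash are illustrative stand-ins; the point is only that prefetching and computing proceed concurrently.

```python
import hashlib
import queue
import threading

def reader(store, order, buf):
    """'Fifth thread': prefetch upper-layer labels into the buffer."""
    for v in order:
        buf.put((v, store[v]))
    buf.put(None)                       # sentinel: no more parents

def computer(buf, out):
    """'Sixth thread': derive lower-layer labels as parents arrive."""
    while (item := buf.get()) is not None:
        v, parent = item
        out[v] = hashlib.sha256(parent).digest()

store = {v: bytes([v]) * 32 for v in range(16)}  # stand-in for external memory
out = {}
buf = queue.Queue(maxsize=4)   # bounded buffer overlaps I/O with compute
t_read = threading.Thread(target=reader, args=(store, range(16), buf))
t_comp = threading.Thread(target=computer, args=(buf, out))
t_read.start(); t_comp.start()
t_read.join(); t_comp.join()
```

The bounded `maxsize` keeps the buffer small: the reader runs ahead of the computer only by a few entries, so memory stays flat while I/O and hashing overlap.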
Preferably, after the decoded data corresponding to each node of the j-th layer is obtained, the decoded data corresponding to each j-th layer node is divided into different calculation windows, and an external memory file is allocated to each calculation window; based on the dependency of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependency relationships among the nodes within the (j+1)-th layer, the decoded data in the different calculation windows corresponding to all parent nodes of each (j+1)-th layer node are written in turn into the corresponding external memory files, and the decoded data corresponding to each j-th layer node is deleted from internal memory; wherein j=1, 2, …, L-1;
Correspondingly, the decoded data corresponding to each node of the (j+1)-th layer is calculated as follows:
When the decoded data corresponding to a (j+1)-th layer node is calculated, a preset number of decoded data are first read in advance, sequentially and without replacement, from the head of each external memory file into the corresponding memory window (how many are read ahead depends on the window size, the robust graph parameters, and the actual connectivity of the robust graph; in this embodiment, 1000 decoded data are read ahead, without replacement, from each external memory file, each decoded datum being 256 bits, i.e. 32KB in total per file). One decoded datum is then read, in order and without replacement, from the head of each memory window holding the decoded data of the parent nodes of the (j+1)-th layer node, yielding the decoded data corresponding to all parent nodes of that node, and the read decoded data are evicted from the memory windows. When the decoded data corresponding to the next (j+1)-th layer node is calculated, a preset number of decoded data are again read without replacement from the head of each external memory file and appended in order to the tail of the corresponding memory window. Repeating this process, the decoded data corresponding to all parent nodes of each (j+1)-th layer node are obtained in turn, concatenated, and further decoded to obtain the decoded data corresponding to each (j+1)-th layer node.
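The read-ahead memory window over one external memory file can be sketched as a deque refilled by sequential, non-replacing reads. The 1000-label, 256-bit figures follow this embodiment; the file layout (labels packed back to back) is an assumption for illustration.

```python
import tempfile
from collections import deque

PREFETCH = 1000   # labels read ahead per file, as in this embodiment
LABEL = 32        # one decoded datum is 256 bits = 32 bytes

class WindowReader:
    """Consume one external memory file sequentially through a memory window."""
    def __init__(self, f):
        self.f = f
        self.win = deque()

    def _refill(self):
        chunk = self.f.read(PREFETCH * LABEL)        # one sequential 32KB read
        self.win.extend(chunk[i:i + LABEL]
                        for i in range(0, len(chunk), LABEL))

    def next_label(self):
        if not self.win:
            self._refill()                           # top up at the tail
        return self.win.popleft()                    # evict from the head

# Demo: 2500 labels pass through the window in their original order.
labels = [i.to_bytes(4, "big") * 8 for i in range(2500)]
with tempfile.TemporaryFile() as f:
    f.write(b"".join(labels))
    f.seek(0)
    reader = WindowReader(f)
    got = [reader.next_label() for _ in range(2500)]
```

Each refill is one large sequential read, so external memory bandwidth is used efficiently while only `PREFETCH * LABEL` bytes per file are resident at a time.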
Embodiment 6
A data decoding method for proof of replication, comprising: dividing the original data in sequence into a plurality of continuous data sets, wherein the decoding of each continuous data set corresponds to an execution stage and the stages are executed in the order of division; each execution stage takes its continuous data set as the data to be decoded and executes the data decoding method provided in embodiment 5 of the present invention; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage is completed and the next execution stage is entered.
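The staged execution with just-in-time allocation can be sketched as follows. The XOR-with-0xFF body is a placeholder for the embodiment-5 decode, and the stage count is illustrative.

```python
def run_stages(data, n_stages):
    """Decode one original data as sequential execution stages; the buffer
    for a stage is allocated only when that stage is entered."""
    out = bytearray()
    size = (len(data) + n_stages - 1) // n_stages   # continuous data set size
    for s in range(n_stages):
        buf = bytearray(data[s * size:(s + 1) * size])  # just-in-time allocation
        out.extend(b ^ 0xFF for b in buf)  # placeholder for the stage's decode
        del buf                            # released before the next stage
    return bytes(out)
```

Only one stage's buffer is ever live, so peak memory scales with the continuous data set size rather than with the whole original data.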
The related technical solution is the same as embodiment 5, and will not be described here in detail.
Embodiment 7
A data decoding method for proof of replication, for decoding a plurality of original data simultaneously, comprising: dividing each original data in sequence into a plurality of continuous data sets, wherein the decoding of each continuous data set corresponds to an execution stage; each original data is processed stage by stage, with the stages of the different original data executed in a pipelined manner; each execution stage takes its continuous data set as the data to be decoded and executes the data decoding method provided in embodiment 5 of the present invention; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage is completed and the next execution stage is entered.
The related technical solution is the same as embodiment 5, and will not be described here in detail.
For embodiments 5 to 7 above, it should be noted that the data decoding method and the data encoding method of the present invention share the same concrete implementation: only the data to be encoded in the data encoding method is replaced by the data to be decoded. The related technical schemes and analysis are the same as those of embodiments 1 to 3 and are not repeated here.
In summary, the present invention innovatively exploits two important features of the stacked depth robust graph structure and of the algorithm's execution process: (1) the node dependencies are fixed; and (2) the nodes are generated in order. It proposes a novel technical framework for reducing the memory overhead of the SDR algorithm. For the fixed node dependencies, the intermediate data are stored in external memory in the order in which they will be used, greatly reducing memory overhead, and the intermediate data are accessed sequentially to make maximal use of memory bandwidth, preserving high algorithm performance. For the in-order generation of nodes, a progressive memory allocation mechanism is adopted, reducing the total memory occupied by multiple SDR algorithm instances executed in parallel. The SDR memory overhead can thus be greatly reduced while maintaining high algorithm performance.
Embodiment 8
A data decoding system for proof of replication, comprising: a memory storing a computer program, and a processor that, when executing the computer program, performs the data decoding method provided by embodiment 5, embodiment 6 and/or embodiment 7 of the present invention.
The related technical solutions are the same as those in embodiments 5 to 7, and are not described here in detail.
Embodiment 9
A computer readable storage medium comprising a stored computer program, wherein the computer program, when executed by a processor, controls the device in which the storage medium is located to perform the data encoding method provided in embodiment 1, embodiment 2 or embodiment 3 of the present invention, and/or the data decoding method provided in embodiment 5, embodiment 6 or embodiment 7 of the present invention.
Related technical schemes are the same as those of embodiments 1-3 and embodiments 5-7, and are not described here again.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A data encoding method for proof of replication, comprising: performing L-layer encoding serially on the data to be encoded:
First layer encoding: obtaining a random seed according to the information of the data to be encoded, and generating random data of the same size as the data to be encoded from the random seed; dividing the random data in sequence into M data blocks, wherein each data block corresponds one-to-one to a node position in the first layer of a pre-built stacked depth robust graph; encoding each data block in turn based on the dependency relationships among the nodes in the first layer of the stacked depth robust graph, and generating M encoded data in turn to obtain the encoded data corresponding to each node of the first layer; and, based on the dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, writing the encoded data corresponding to the parent nodes, in the first layer, of each second-layer node into external memory in turn, and deleting the encoded data corresponding to each first-layer node from internal memory;
Layer i encoding: based on the dependency of the i-th layer nodes on the (i-1)-th layer nodes in the stacked depth robust graph, reading from external memory the encoded data corresponding to the parent nodes, in the (i-1)-th layer, of each i-th layer node; based on the dependency relationships among the nodes within the i-th layer of the stacked depth robust graph, obtaining in turn the encoded data corresponding to all parent nodes of each i-th layer node, concatenating them, and further encoding to obtain the encoded data corresponding to each i-th layer node; and, based on the dependency of the (i+1)-th layer nodes on the i-th layer nodes in the stacked depth robust graph, writing the encoded data corresponding to the parent nodes, in the i-th layer, of each (i+1)-th layer node into external memory in turn, and deleting the encoded data corresponding to each i-th layer node from internal memory; i=2, 3, …, L-1;
Layer L encoding: based on the dependency of the L-th layer nodes on the (L-1)-th layer nodes in the stacked depth robust graph, reading from external memory the encoded data corresponding to the parent nodes, in the (L-1)-th layer, of each L-th layer node; based on the dependency relationships among the nodes within the L-th layer of the stacked depth robust graph, obtaining in turn the encoded data corresponding to all parent nodes of each L-th layer node, concatenating them, and further encoding to obtain the encoded data corresponding to each L-th layer node; and performing a bitwise exclusive OR of the encoded data corresponding to each L-th layer node with the data to be encoded to obtain the final encoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M; and randomly generated dependency relationships exist both between nodes within the same layer and between nodes of two adjacent layers.
2. The data encoding method according to claim 1, wherein after the encoded data corresponding to each node of the j-th layer is obtained, the encoded data corresponding to each j-th layer node is evenly divided into different calculation windows, and an external memory file is allocated to each calculation window; based on the dependency of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependency relationships among the nodes within the (j+1)-th layer, the encoded data in the different calculation windows corresponding to all parent nodes of each (j+1)-th layer node are written in turn into the corresponding external memory files, and the encoded data corresponding to each j-th layer node is deleted from internal memory; wherein j=1, 2, …, L-1;
correspondingly, the encoded data corresponding to each node of the (j+1)-th layer is calculated as follows:
when the encoded data corresponding to a (j+1)-th layer node is calculated, a preset number of encoded data are first read in advance, without replacement, from the head of each external memory file into the corresponding memory window; one encoded datum is read, in order and without replacement, from the head of each memory window holding the encoded data corresponding to the parent nodes of the (j+1)-th layer node, so as to obtain the encoded data corresponding to all parent nodes of that node, and the read encoded data are evicted from the memory windows; when the encoded data corresponding to the next (j+1)-th layer node is calculated, a preset number of encoded data are again read without replacement from the head of each external memory file and placed in order at the tail of the corresponding memory window; and, repeating this process, the encoded data corresponding to all parent nodes of each (j+1)-th layer node are obtained in turn, concatenated, and further encoded to obtain the encoded data corresponding to each (j+1)-th layer node.
3. A data encoding method for proof of replication, comprising: dividing the original data in sequence into a plurality of continuous data sets, wherein the encoding of each continuous data set corresponds to an execution stage and the stages are executed in the order of division; each execution stage takes its continuous data set as the data to be encoded and executes the data encoding method according to claim 1 or 2; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage is completed and the next execution stage is entered.
4. A data encoding method for proof of replication, for encoding a plurality of original data simultaneously, comprising: dividing each original data in sequence into a plurality of continuous data sets, wherein the encoding of each continuous data set corresponds to an execution stage; each original data is processed stage by stage, with the stages of the different original data executed in a pipelined manner; each execution stage takes its continuous data set as the data to be encoded and executes the data encoding method according to claim 1 or 2; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage is completed and the next execution stage is entered.
5. A data decoding method for proof of replication, comprising: performing L-layer decoding serially on the data to be decoded:
First layer decoding: obtaining a random seed according to the information of the data to be decoded, and generating random data of the same size as the data to be decoded from the random seed; dividing the random data in sequence into M' data blocks, wherein each data block corresponds one-to-one to a node position in the first layer of the pre-built stacked depth robust graph; decoding each data block in turn based on the dependency relationships among the nodes in the first layer of the stacked depth robust graph, and generating M' decoded data in turn to obtain the decoded data corresponding to each node of the first layer; and, based on the dependency of the second-layer nodes on the first-layer nodes in the stacked depth robust graph, writing the decoded data corresponding to the parent nodes, in the first layer, of each second-layer node into external memory in turn, and deleting the decoded data corresponding to each first-layer node from internal memory;
Layer i decoding: based on the dependency of the i-th layer nodes on the (i-1)-th layer nodes in the stacked depth robust graph, reading from external memory the decoded data corresponding to the parent nodes, in the (i-1)-th layer, of each i-th layer node; based on the dependency relationships among the nodes within the i-th layer of the stacked depth robust graph, obtaining in turn the decoded data corresponding to all parent nodes of each i-th layer node, concatenating them, and further decoding to obtain the decoded data corresponding to each i-th layer node; and, based on the dependency of the (i+1)-th layer nodes on the i-th layer nodes in the stacked depth robust graph, writing the decoded data corresponding to the parent nodes, in the i-th layer, of each (i+1)-th layer node into external memory in turn, and deleting the decoded data corresponding to each i-th layer node from internal memory; i=2, 3, …, L-1;
Layer L decoding: based on the dependency of the L-th layer nodes on the (L-1)-th layer nodes in the stacked depth robust graph, reading from external memory the decoded data corresponding to the parent nodes, in the (L-1)-th layer, of each L-th layer node; based on the dependency relationships among the nodes within the L-th layer of the stacked depth robust graph, obtaining in turn the decoded data corresponding to all parent nodes of each L-th layer node, concatenating them, and further decoding to obtain the decoded data corresponding to each L-th layer node; and performing a bitwise exclusive OR of the decoded data corresponding to each L-th layer node with the data to be decoded to obtain the final decoding result;
wherein the stacked depth robust graph comprises L layers; the number of nodes in each layer is M'; and randomly generated dependency relationships exist both between nodes within the same layer and between nodes of two adjacent layers.
6. The data decoding method according to claim 5, wherein after the decoded data corresponding to each node of the j-th layer is obtained, the decoded data corresponding to each j-th layer node is evenly divided into different calculation windows, and an external memory file is allocated to each calculation window; based on the dependency of the (j+1)-th layer nodes on the j-th layer nodes in the stacked depth robust graph and the dependency relationships among the nodes within the (j+1)-th layer, the decoded data in the different calculation windows corresponding to all parent nodes of each (j+1)-th layer node are written in turn into the corresponding external memory files, and the decoded data corresponding to each j-th layer node is deleted from internal memory; wherein j=1, 2, …, L-1;
correspondingly, the decoded data corresponding to each node of the (j+1)-th layer is calculated as follows:
when the decoded data corresponding to a (j+1)-th layer node is calculated, a preset number of decoded data are first read in advance, without replacement, from the head of each external memory file into the corresponding memory window; one decoded datum is read, in order and without replacement, from the head of each memory window holding the decoded data corresponding to the parent nodes of the (j+1)-th layer node, so as to obtain the decoded data corresponding to all parent nodes of that node, and the read decoded data are evicted from the memory windows; when the decoded data corresponding to the next (j+1)-th layer node is calculated, a preset number of decoded data are again read without replacement from the head of each external memory file and placed in order at the tail of the corresponding memory window; and, repeating this process, the decoded data corresponding to all parent nodes of each (j+1)-th layer node are obtained in turn, concatenated, and further decoded to obtain the decoded data corresponding to each (j+1)-th layer node.
7. A data decoding method for proof of replication, comprising: dividing the original data in sequence into a plurality of continuous data sets, wherein the decoding of each continuous data set corresponds to an execution stage and the stages are executed in the order of division; each execution stage takes its continuous data set as the data to be decoded and executes the data decoding method according to claim 5 or 6; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage is completed and the next execution stage is entered.
8. A data decoding method for proof of replication, for decoding a plurality of original data simultaneously, comprising: dividing each original data in sequence into a plurality of continuous data sets, wherein the decoding of each continuous data set corresponds to an execution stage; each original data is processed stage by stage, with the stages of the different original data executed in a pipelined manner; each execution stage takes its continuous data set as the data to be decoded and executes the data decoding method according to claim 5 or 6; and memory space is allocated for the data to be processed in the next execution stage only after the processing of the current execution stage is completed and the next execution stage is entered.
9. A computer readable storage medium comprising a stored computer program, wherein the computer program, when executed by a processor, controls the device in which the storage medium is located to perform the data encoding method of claim 1 or 2, the data encoding method of claim 3, the data encoding method of claim 4, the data decoding method of claim 5 or 6, the data decoding method of claim 7, and/or the data decoding method of claim 8.
CN202210829009.8A 2022-07-14 2022-07-14 Data encoding and decoding method and system for copy certification Active CN115361401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210829009.8A CN115361401B (en) 2022-07-14 2022-07-14 Data encoding and decoding method and system for copy certification

Publications (2)

Publication Number Publication Date
CN115361401A CN115361401A (en) 2022-11-18
CN115361401B true CN115361401B (en) 2024-04-05

Family

ID=84032494

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188048A (en) * 2013-02-01 2013-07-03 北京邮电大学 Network coding method oriented to peer-to-peer communication in tree topology structure
CN104904202A (en) * 2012-09-28 2015-09-09 三星电子株式会社 Video encoding method and apparatus for parallel processing using reference picture information, and video decoding method and apparatus for parallel processing using reference picture information
CN108540306A (en) * 2018-02-28 2018-09-14 博尔联科(厦门)智能技术有限公司 A kind of network node management method and its communicating control method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070177739A1 (en) * 2006-01-27 2007-08-02 Nec Laboratories America, Inc. Method and Apparatus for Distributed Data Replication

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Early Identification of Critical Blocks: Making Replicated Distributed Storage Systems Reliable Against Node Failures; Juntao Fang; IEEE Transactions on Parallel and Distributed Systems; 2018-05-07 *
Design and Implementation of a Multi-level Fault Tolerance Mechanism in Decentralized Storage Systems; Luo Hongyu; China Masters' Theses Full-text Database, Information Science and Technology; 2022-01-15 *
Research on Data Replication Based on a Coding Mechanism in Grid Environments; Tao Jun, Sha Jichang, Wang Hui; Computer Science; 2008-02-25 (02) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant