WO2022222786A1 - File storage method and apparatus, and device - Google Patents

File storage method and apparatus, and device

Info

Publication number
WO2022222786A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
sets
positioning
data
file data
Prior art date
Application number
PCT/CN2022/086300
Other languages
French (fr)
Chinese (zh)
Inventor
印明亮
余珊
王凯
Original Assignee
支付宝(杭州)信息技术有限公司
蚂蚁区块链科技(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 and 蚂蚁区块链科技(上海)有限公司
Publication of WO2022222786A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/64Protecting data integrity, e.g. using checksums, certificates or signatures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files

Definitions

  • The present application relates to the field of blockchain technology, and in particular to a method, apparatus, and device for file certificate storage.
  • Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • Data blocks are connected sequentially in chronological order to form a chained data structure, and cryptography guarantees that the resulting distributed ledger cannot be tampered with or forged. A blockchain is essentially a decentralized distributed database system in which nodes participate. Because a blockchain is decentralized, tamper-resistant, and autonomous, using it to store file certificate data has received more and more attention and application.
  • The embodiments of the present specification provide a method, apparatus, and device for file certificate storage, so as to solve the problem that existing file certificate storage and verification methods have difficulty locating tampered data in a file.
  • A method for file certificate storage provided by the embodiments of this specification includes: acquiring a target file to be stored; splitting the target file according to a preset splitting method to obtain multiple pieces of file data; dividing the multiple pieces of file data into multiple file sets according to a preset division method, where each file set contains m elements; calculating check codes corresponding to the multiple file sets; determining positioning file data information in each file set, where the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and storing the check codes and the positioning file data information in the blockchain network.
  • A file verification method provided by the embodiments of this specification includes: obtaining, from a blockchain network, the positioning file data information and the check codes corresponding to a file to be verified, where the check codes were calculated from the multiple file sets corresponding to a previously stored target file, the multiple file sets were obtained by splitting the target file into multiple pieces of file data according to a preset splitting method and dividing those pieces according to a preset division method, and the positioning file data information includes the position information, in the target file, of the positioning file data in each file set and the content information of the positioning file data; determining, based on the positioning file data information, multiple file sets corresponding to the file to be verified; calculating the check codes corresponding to those file sets; comparing the calculated check codes with the check codes obtained from the blockchain network to obtain a comparison result; and verifying the file to be verified based on the comparison result.
  • A file certificate storage apparatus provided by the embodiments of this specification includes: a target file acquisition module for acquiring a target file to be stored; a file data determination module for splitting the target file according to a preset splitting method to obtain multiple pieces of file data; a file set division module for dividing the multiple pieces of file data into multiple file sets according to a preset division method, where each file set contains m elements; a check code calculation module for calculating the check codes corresponding to the multiple file sets; a positioning file data information determination module for determining the positioning file data information in each file set, where the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and a data storage module for storing the check codes and the positioning file data information in the blockchain network.
  • A file verification apparatus provided by the embodiments of this specification includes: a data acquisition module for acquiring, from a blockchain network, the positioning file data information and the check codes corresponding to a file to be verified, where the check codes were calculated from the multiple file sets corresponding to a previously stored target file, the multiple file sets were obtained by splitting the target file into multiple pieces of file data according to a preset splitting method and dividing those pieces according to a preset division method, and the positioning file data information includes the position information, in the target file, of the positioning file data in each file set and the content information of the positioning file data;
  • a file set determination module for determining, based on the positioning file data information, multiple file sets corresponding to the file to be verified;
  • a check code calculation module for calculating the check codes corresponding to the multiple file sets;
  • a comparison module for comparing the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result;
  • a verification module for verifying the file to be verified based on the comparison result.
  • A file certificate storage device provided by the embodiments of this specification includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: obtain a target file to be stored; split the target file according to a preset splitting method to obtain multiple pieces of file data; divide the multiple pieces of file data into multiple file sets according to a preset division method, where each file set contains m elements; calculate the check codes corresponding to the multiple file sets; determine the positioning file data information in each file set, where the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and store the check codes and the positioning file data information in the blockchain network.
  • A file verification device provided by the embodiments of this specification includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: obtain, from the blockchain network, the positioning file data information and the check codes corresponding to the file to be verified, where the check codes were calculated from the multiple file sets corresponding to a previously stored target file, the multiple file sets were obtained by splitting the target file into multiple pieces of file data according to a preset splitting method and dividing those pieces according to a preset division method, and the positioning file data information includes the position information, in the target file, of the positioning file data in each file set and the content information of the positioning file data; determine, based on the positioning file data information, multiple file sets corresponding to the file to be verified; calculate the check codes corresponding to the multiple file sets; compare the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain
  • An embodiment of the present specification provides a computer-readable medium on which computer-readable instructions are stored, and the computer-readable instructions can be executed by a processor to implement a method for file certificate storage and verification.
  • An embodiment of the present specification achieves the following beneficial effects: the target file to be stored is obtained; the target file is split according to a preset splitting method to obtain multiple pieces of file data; the multiple pieces of file data are divided into multiple file sets according to a preset division method; the check codes corresponding to the multiple file sets are calculated; the positioning file data information in each file set is determined; and the check codes and the positioning file data information are stored in the blockchain network.
  • FIG. 1 is a schematic diagram of an application scenario of a method for file certificate storage and verification in the embodiment of this specification
  • FIG. 2 is a schematic flowchart of a method for file certificate storage provided in an embodiment of the present specification
  • FIG. 3 is a schematic diagram of the arrangement of file sets provided by the embodiment of the present specification.
  • FIG. 4 is a schematic flowchart of a file verification method provided in an embodiment of the present specification
  • FIG. 5 is a schematic diagram of preliminary positioning based on positioning file data information provided by an embodiment of the present specification
  • FIG. 6 is a schematic diagram of supplementary positioning based on positioning file data information provided by an embodiment of the present specification
  • FIG. 7 is a schematic diagram of verification based on a check code provided by an embodiment of the present specification.
  • FIG. 8 is a schematic structural diagram of a file certificate storage apparatus according to an embodiment of the present specification.
  • FIG. 9 is a schematic structural diagram of a file verification apparatus according to an embodiment of the present specification.
  • FIG. 10 is a schematic diagram of a file certificate storage and verification device provided by an embodiment of the present specification.
  • Blockchain: blockchain technology was originally a special distributed database technology designed for Bitcoin (a digital currency) by a person using the pseudonym "Satoshi Nakamoto".
  • It can be understood as a data chain composed of multiple blocks stored in sequence.
  • The block header of each block contains the timestamp of the block, the hash value of the previous block's information, and the hash value of this block's information, thereby enabling mutual verification between blocks and forming a tamper-resistant blockchain.
  • Each block can be understood as a data block (a unit of data storage).
  • A blockchain is a series of data blocks linked to one another using cryptographic methods.
  • Each data block contains the information of a network transaction, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • A chain formed by connecting blocks end to end is called a blockchain. To modify the data in a block, the content of all blocks after it must be modified, as well as the data backed up by all nodes in the blockchain network. A blockchain is therefore difficult to tamper with or delete from. Once data has been saved to the blockchain, the blockchain serves as a reliable means of maintaining the integrity of the content.
  • Hash algorithm: an algorithm that transforms an input of arbitrary length (also called the pre-image) into a fixed-length output; the output is the hash value.
  • This transformation is a compressing map: the space of hash values is usually much smaller than the space of inputs, and different inputs may hash to the same output, so the unique input value cannot be determined from the hash value.
  • Erasure coding algorithm: a fault-tolerant coding technology. It was first used in the communications industry to solve the problem of partial data loss during transmission. The basic principle is to segment the transmitted signal, add certain check data, and establish certain relationships between the segments, so that even if part of the signal is lost during transmission, the receiver can still compute the complete information through an algorithm. One example is the Reed-Solomon error correction code algorithm (RS algorithm for short).
  • each participant will have at least one piece of ledger data.
  • the document data is generally recorded directly on the blockchain, and the certificate is stored directly through the blockchain.
  • Directly uploading files to the chain causes data bloat.
  • Because blockchain transaction sizes are limited, file sizes are strictly limited, which is inconvenient for certificate storage; there are also potential security risks, which is not conducive to the development of blockchain technology.
  • Hash on-chain: a hash algorithm is used to calculate the hash value (digest value) of the file, and the hash value is then uploaded to the chain for certificate storage.
  • Taking the SHA256 hash algorithm as an example: the SHA256 value of the file to be stored is calculated in advance, and the 32-byte SHA256 value is then uploaded to the chain; in this way, no matter how large the source file is, it is irreversibly converted into a 32-byte digest.
  • During verification, the hash value of the file is calculated again and compared with the hash value on the chain.
  • If the hash values are consistent, the file has not been tampered with.
  • If the hash values are inconsistent, the file has been tampered with.
  • Hash algorithms of sufficient strength, such as SHA256 and SM3, should be used; algorithms with potential security risks, such as CRC and MD5, should be avoided.
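The hash on-chain check described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `sha256_digest` and `is_untampered` are hypothetical, and the variable `on_chain` merely stands in for a value that a real system would read back from the blockchain network.

```python
import hashlib

def sha256_digest(data: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of the file content."""
    return hashlib.sha256(data).digest()

def is_untampered(data: bytes, on_chain_digest: bytes) -> bool:
    """Recompute the digest and compare it with the digest stored on chain."""
    return sha256_digest(data) == on_chain_digest

original = b"contract text, version 1"
on_chain = sha256_digest(original)  # stand-in for the value stored on chain

print(len(on_chain))                            # digest size in bytes
print(is_untampered(original, on_chain))        # unchanged file
print(is_untampered(original + b"!", on_chain)) # modified file
```

A whole-file digest like this only says whether something changed, not where, which is the gap the set-based check codes in the later steps are meant to close.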
  • FIG. 1 is a schematic diagram of an application scenario of a method for file certificate storage and verification in an embodiment of this specification.
  • For a target file X to be stored, the target file X is analyzed to obtain the certificate data X1 corresponding to the target file.
  • The certificate data X1 may include a digest value, check codes, and positioning file data information.
  • The certificate data X1 is stored in the blockchain network 110.
  • The original data X2 corresponding to the target file X can be stored in an external device 120 outside the blockchain network, such as a USB flash drive or another server.
  • FIG. 2 is a schematic flowchart of a method for file certificate storage provided by an embodiment of the present specification. From a program perspective, the execution body of the process may be a program running on an application server or an application client.
  • Step 210 Obtain the target file to be stored.
  • Blockchain certificate storage means that data is stored on the blockchain to achieve tamper resistance, traceability, and a trustworthy data source.
  • On-chain and off-chain components work together, and the file is separated from its hash value: only the hash value of the file is stored on the chain, and the original file is stored off-chain. Calculating the hash value of the file and comparing it with the hash value on the chain is enough to know whether the file has been tampered with.
  • The data in the target file to be stored may be in any file form, such as text, video, audio, or picture. The entire file data corresponding to the target file does not need to be stored in the blockchain network; only data that can represent the target file needs to be stored.
  • Step 220 Split the target file according to a preset splitting method to obtain multiple pieces of file data.
  • Splitting is the process of separating a whole into its component parts.
  • In this step, splitting can be understood as dividing the data that constitutes the target file, in order, into multiple pieces of file data of a fixed byte length. For example, if the target file contains 100 bytes of data, it can be split into 100 pieces of file data by treating each byte as one piece of file data.
  • Alternatively, every ten bytes can form one piece of file data (one element): the data corresponding to the 1st to 10th bytes is marked as the first piece of file data (the first element), the data corresponding to the 11th to 20th bytes is marked as the second piece of file data (the second element), and so on, until the target file is split into multiple pieces of file data (multiple elements).
  • The file data (elements) obtained from the split is used for the subsequent set division.
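As a rough sketch of the fixed-length splitting in Step 220, the following Python function cuts a byte string into consecutive elements. The name `split_file` and the parameter `element_size` are illustrative assumptions, not terms from the specification.

```python
def split_file(data: bytes, element_size: int) -> list[bytes]:
    """Split file content into consecutive elements of element_size bytes.

    The last element may be shorter if the length is not a multiple
    of element_size.
    """
    if element_size < 1:
        raise ValueError("element_size must be at least 1")
    return [data[i:i + element_size] for i in range(0, len(data), element_size)]

target = bytes(range(100))          # a 100-byte stand-in for the target file
elements = split_file(target, 10)   # bytes 1-10, 11-20, ... as in the example
print(len(elements))                # number of elements with 10 bytes each
print(len(split_file(target, 1)))   # one element per byte
```

The split only computes views of the data; nothing here modifies or stores the original file content.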
  • Step 230 Divide the multiple pieces of file data into multiple file sets according to a preset division method; each of the file sets includes m elements, where m ≥ 1.
  • The preset division method may specify the preset number of files included in each file set; for example, the preset division method may be that every three files form one file set.
  • The multiple pieces of file data are divided into multiple file sets according to the preset division method.
  • An element may represent one or more pieces of file data; specifically, an element may be a piece of file data.
  • For example, suppose the target file X is split into X1, X2, X3, and X4.
  • X1 and X2 are divided into set 1,
  • and X3 and X4 are divided into set 2.
  • The elements in set 1 may then be the specific data corresponding to X1 and X2.
  • If the number of elements in each file set is fixed, the last set may not have enough elements.
  • In that case, fixed characters can be used to fill the missing elements in the set. Elements can therefore also be fixed characters, e.g. 00.
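The division in Step 230, including padding of an incomplete last set, might look like the sketch below. The padding value `b"\x00"` is an assumption chosen for illustration (the specification only says that fixed characters are used), and `divide_into_sets` is a hypothetical helper name.

```python
PAD = b"\x00"  # assumed padding element; the source only requires fixed characters

def divide_into_sets(elements: list[bytes], m: int, pad: bytes = PAD) -> list[list[bytes]]:
    """Group elements, in order, into file sets of exactly m elements each.

    If the last set is short, it is filled with the padding element.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    sets = [list(elements[i:i + m]) for i in range(0, len(elements), m)]
    if sets and len(sets[-1]) < m:
        sets[-1].extend([pad] * (m - len(sets[-1])))
    return sets

# X1..X4 with two elements per set, as in the example above
sets = divide_into_sets([b"X1", b"X2", b"X3", b"X4"], 2)
print(sets)

# Three elements with m = 2: the last set is padded
print(divide_into_sets([b"a", b"b", b"c"], 2))
```

Keeping the elements in their original order inside each set is what later allows a mismatch to be mapped back to a byte range in the file.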
  • Step 240 Calculate the check codes corresponding to the multiple file sets.
  • A check code is a piece of check data obtained by applying an algorithm (for example, a hash algorithm or an erasure coding algorithm) to original data of a certain length. With the check data, it can be determined whether new data is consistent with the original data; further, when the new data is a slightly modified version of the original data, the original data can be deduced from it.
  • In this step, a hash algorithm or an erasure coding algorithm can be used to calculate the check code corresponding to each file set.
  • Step 250 Determine the location file data information in each of the file sets.
  • Determining the positioning file data information in each of the file sets may specifically include: determining the element at a specific position in each file set as the positioning file data, to obtain a positioning file data set.
  • The positioning file data information may represent the data information at a specific position in each file set. More specifically, it may include the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data.
  • The positioning file data may also be referred to as a positioning code.
  • The positioning code may be an element in the file set, and the element may be one or more pieces of file data. After the target file has been divided into multiple file sets, the positioning code can be used to determine the approximate location, in new data, of a previously divided set. For example, the first element in each set can be used as the positioning code. Suppose each set has 9 elements; then, based on the positioning code, it can be roughly determined that the nine consecutive elements starting from the data corresponding to the positioning code may belong to one file set.
  • The positioning file data may be the original data in the target file, or a value calculated from the original data; for example, it may be a hash value or check value calculated from the data used for positioning. In this way, when the positioning file data is stored in the blockchain network, only the calculated hash value or check value needs to be stored.
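One possible shape for the positioning file data information is sketched below, assuming that the first element of each set serves as the positioning code and that its SHA-256 hash is stored instead of the raw bytes. The `PositioningInfo` record and its field names are hypothetical; the specification does not prescribe a concrete layout.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class PositioningInfo:
    offset: int     # byte offset of the positioning element in the target file
    content: bytes  # hash of the positioning element (assumed: SHA-256)

def positioning_info(sets: list[list[bytes]]) -> list[PositioningInfo]:
    """Record position and content information for the first element of each set."""
    infos = []
    offset = 0
    for file_set in sets:
        first = file_set[0]
        infos.append(PositioningInfo(offset, hashlib.sha256(first).digest()))
        offset += sum(len(element) for element in file_set)
    return infos

sets = [[b"AAAA", b"BBBB"], [b"CCCC", b"DDDD"]]
infos = positioning_info(sets)
print([info.offset for info in infos])  # start offsets of the two sets
```

Storing a hash rather than the element itself keeps the on-chain record at a fixed size regardless of the element length.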
  • Step 260 Store the check code and the positioning file data information in the blockchain network.
  • When the target file is stored, the full data of the target file is not stored in the blockchain network; instead, the original data is stored on a device outside the blockchain network, for example a portable hard disk.
  • The data that needs to be stored in the blockchain network is the check codes and the positioning file data information obtained by analyzing and processing the target file. Later, the tampered data in the target file can be located through the check codes and the positioning file data information.
  • The method in FIG. 2 obtains the target file to be stored; splits the target file according to the preset splitting method to obtain multiple pieces of file data; divides the multiple pieces of file data into multiple file sets according to the preset division method; calculates the check codes corresponding to the multiple file sets; determines the positioning file data information in each file set; and stores the check codes and the positioning file data information in the blockchain network.
  • the method may further include: using a hash algorithm to calculate a digest value of the target file; and storing the digest value in the blockchain network.
  • A digest value corresponding to the target file can also be stored in the blockchain network; the digest value is obtained by applying a digest algorithm (also called a hash algorithm) to the target file.
  • The digest algorithm is used to detect tampering. For example, suppose the content of the target file is processed with MD5 and the resulting digest value is A. During verification, if the digest value of the obtained file is B, which differs from the digest of the original, it can be determined that the target file has been tampered with.
  • The digest function is a one-way function: the digest function f() calculates a fixed-length digest value for data of any length, but it is difficult to deduce the original data from the digest value. A slight change to the original data results in a completely different digest. Therefore, if the digest value has changed, it can be determined that the original data has been tampered with.
  • The digest value corresponding to the target file is calculated and stored in the blockchain network. During verification, the digest value of the file to be verified is compared with the stored digest value. If they are consistent, it can be determined that the target file has not been tampered with, and the subsequent steps of locating the tampered data in the file are unnecessary. If the digest values are inconsistent, the solutions in the embodiments of this specification can be used to locate the tampered data in the target file.
  • Storing the check codes and the positioning file data information in the blockchain network may specifically include: splicing the digest value, the check codes, and the positioning file data information according to a preset splicing method to obtain the certificate data corresponding to the target file; and storing the certificate data in the blockchain network.
  • The data stored in the blockchain network can be called certificate data, which can include the digest value, the check codes, and the positioning file data information.
  • The certificate data may be obtained by splicing the digest value, the check codes, and the positioning file data information.
  • The splicing method may be sequential splicing, or splicing according to another preset method.
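A sequential splice of the digest value, check codes, and positioning information could be sketched as follows. The 4-byte big-endian length prefix in front of each part is purely an assumption made here so that the spliced bytes can be taken apart again; the specification does not define a wire format, and both function names are hypothetical.

```python
def splice_certificate_data(digest: bytes,
                            check_codes: list[bytes],
                            positioning: list[bytes]) -> bytes:
    """Sequentially splice digest, check codes, and positioning data.

    Each part is preceded by an assumed 4-byte big-endian length prefix.
    """
    out = bytearray()
    for part in [digest, *check_codes, *positioning]:
        out += len(part).to_bytes(4, "big") + part
    return bytes(out)

def unsplice_certificate_data(blob: bytes) -> list[bytes]:
    """Recover the spliced parts in their original order."""
    parts, i = [], 0
    while i < len(blob):
        n = int.from_bytes(blob[i:i + 4], "big")
        parts.append(blob[i + 4:i + 4 + n])
        i += 4 + n
    return parts

blob = splice_certificate_data(b"digest", [b"rs1", b"rs2"], [b"pos1"])
print(unsplice_certificate_data(blob))
```

Any other preset splicing method would work equally well as long as the storing side and the verifying side agree on it.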
  • The certificate data corresponding to the target file is stored in the blockchain system. The digest value in the certificate data can be used to preliminarily determine whether the target file has been tampered with; the positioning file data information can be used to determine the file sets into which the target file was divided when it was stored; and the check codes can be used to determine the tampered data in the target file.
  • Storing the certificate data in the blockchain network may specifically include: generating, based on the certificate data, a storage certificate that includes authentication information; and sending the storage certificate, which includes the certificate data, to the blockchain network for storage.
  • The storage certificate can correspond to a network address and/or a picture, which can be used to view the storage certificate corresponding to the certificate data. The storage certificate can be viewed through the web page at the network address or through the picture, and it can be validated in the blockchain network to ensure the authenticity of the certificate data.
  • Dividing the multiple pieces of file data into multiple file sets according to a preset division method may specifically include: dividing the multiple pieces of file data into N file sets according to a preset number of files per file set, where the elements in each file set are ordered.
  • That is, the multiple pieces of file data can be divided into multiple file sets according to the preset number of files in each file set, for example 5 files per file set, and the elements in each set can have an order.
  • Splitting the target file can be understood as determining the data corresponding to each byte in the target file.
  • The split does not affect the original data in the target file, and the split data is not stored; the splitting step only serves the subsequent step of dividing into sets. The multiple pieces of file data can therefore be divided into multiple file sets in sequence. For example, the file data obtained after splitting the target file is A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14.
  • Dividing the file data into file sets in sequence helps locate the tampered data more quickly when the target file is later checked for tampering.
  • The method may further include: arranging the N file sets in the form of a matrix to obtain a matrix with N rows and M columns.
  • Calculating the check codes corresponding to the multiple file sets may specifically include: using an erasure coding algorithm to generate a corresponding check code for each row of file sets in the matrix; and using the erasure coding algorithm to generate a corresponding check code for each column of file sets in the matrix.
  • The multiple file sets can be arranged in the form of a matrix.
  • FIG. 3 is a schematic diagram of arrangement of file sets provided in an embodiment of the present specification.
  • As shown in FIG. 3, the file sets can be labeled in sequence, for example A_11, A_12, ..., A_1n, ..., A_21, A_31, ..., A_n1, and spliced into an N*N matrix, where the value of N can be adjusted as needed. If file sets remain, labeling continues and they are spliced into the next N*N matrix.
  • A check code algorithm can be selected, and the check code of each row and of each column of the matrix in FIG. 3 can be calculated separately.
  • The check code corresponding to the first row is RS_1, the check code corresponding to the second row is RS_2, the check code corresponding to the third row is RS_3, ..., and the check code corresponding to the Nth row is RS_n.
  • The check code corresponding to the first column is RS_(n+1), the check code corresponding to the second column is RS_(n+2), the check code corresponding to the third column is RS_(n+3), ..., and the check code corresponding to the Nth column is RS_(n+n).
  • The algorithm is not limited to the erasure coding algorithm; hash algorithms can also be used.
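The row and column check codes can be illustrated with the sketch below. It deliberately swaps in SHA-256 for the check code (the text allows hash algorithms as an alternative to erasure coding), so it can locate a modified file set but cannot repair it as a Reed-Solomon code could. Each matrix cell is one file set represented as bytes; all function names are hypothetical.

```python
import hashlib

def check_code(cells: list[bytes]) -> bytes:
    """Hash-based check code over one row or column of file sets."""
    h = hashlib.sha256()
    for cell in cells:
        h.update(len(cell).to_bytes(4, "big"))  # length prefix keeps cell boundaries unambiguous
        h.update(cell)
    return h.digest()

def matrix_check_codes(matrix: list[list[bytes]]) -> tuple[list[bytes], list[bytes]]:
    """Return (row check codes, column check codes) for the matrix."""
    rows = [check_code(row) for row in matrix]
    cols = [check_code(list(col)) for col in zip(*matrix)]
    return rows, cols

def locate_changed_cells(stored, current) -> list[tuple[int, int]]:
    """Candidate (row, column) positions where row and column codes both differ."""
    (s_rows, s_cols), (c_rows, c_cols) = stored, current
    bad_rows = [i for i, (a, b) in enumerate(zip(s_rows, c_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(s_cols, c_cols)) if a != b]
    return [(i, j) for i in bad_rows for j in bad_cols]

matrix = [[b"A11", b"A12", b"A13"],
          [b"A21", b"A22", b"A23"],
          [b"A31", b"A32", b"A33"]]
tampered = [row[:] for row in matrix]
tampered[1][2] = b"XXX"  # modify the file set in row 1, column 2 (0-based)

print(locate_changed_cells(matrix_check_codes(matrix), matrix_check_codes(tampered)))
```

With a single modified cell the intersection is exact; with several modified cells the result is a superset of candidates, since every failing row is paired with every failing column.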
  • In this scheme, the file is verified and stored in blocks, and check codes and positioning codes are introduced so that errors can be corrected during file verification.
  • By writing the file digest, the check codes, and the positioning file data information to the blockchain at a controllable storage cost, it can be determined which file sets in a large file have been modified and which are original content, and proof of non-tampering can still be provided for the parts that are correct.
  • FIG. 4 is a schematic flowchart of a file verification method provided by an embodiment of the present specification.
  • In the verification phase, it can be detected which shards of a large file have been modified, and proof of non-modification can still be provided for the unmodified shards; further, if the number of modified shards is small, the verification phase can also indicate how those shards were modified.
  • Step 410 Obtain the location file data information and check code corresponding to the file to be verified from the blockchain network.
  • The check codes can be calculated from the multiple file sets corresponding to the previously stored target file; the multiple file sets can be obtained by splitting the target file according to a preset splitting method to obtain multiple pieces of file data and dividing those pieces according to a preset division method; the positioning file data information may include the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data.
  • The stored certificate data corresponding to the file to be verified is obtained from the blockchain network.
  • The certificate data may include the digest value of the file to be verified, the check codes, and the positioning file data information.
  • The digest value may be calculated from the previously stored target file.
  • Step 420 Determine multiple file sets corresponding to the to-be-verified file based on the location file data information.
  • the multiple file sets corresponding to the files to be verified can be determined according to the preset division method used in the certification stage.
  • The position information in the positioning file data information, together with the content information of the positioning file data, can roughly locate the elements of the file sets that corresponded to the file to be verified in the certificate storage stage. For example, if the obtained positioning file data information indicates that the first element of each file set is used as the positioning code, that first element can be found in the file to be verified and taken as the first element of the first set; then, according to the fixed number of elements in each set (for example, 9), 9 elements are taken in sequence starting from that first element, in the data corresponding to the file to be verified, to form the first file set, and so on, to find the file sets that may correspond to the file to be verified.
  • Finally, multiple file sets corresponding to the file to be verified are determined.
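  • The positioning step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the embodiments: the function name `locate_file_sets`, the representation of a positioning code as a byte string, and the use of a fixed set size in bytes are all hypothetical choices made for illustration.

```python
from typing import List, Optional


def locate_file_sets(data: bytes, codes: List[bytes], set_size: int) -> List[Optional[bytes]]:
    """Cut candidate file sets out of `data` using positioning codes.

    Assumption: each positioning code is the first element of its file set,
    and every file set occupies `set_size` bytes starting at that code.
    """
    sets: List[Optional[bytes]] = []
    cursor = 0  # search resumes after the previously located set
    for code in codes:
        pos = data.find(code, cursor)
        if pos == -1:
            # The code could not be found: the set is missing or was altered.
            sets.append(None)
            continue
        sets.append(data[pos:pos + set_size])
        cursor = pos + set_size
    return sets
```

  • A set whose positioning code cannot be found is reported as `None`, so the remaining sets can still be located and checked independently.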
  • Step 430: Calculate the check codes corresponding to the multiple file sets.
  • Step 440: Compare the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result.
  • Step 450: Verify the to-be-verified file based on the comparison result.
  • The calculated check code can be compared with the check code obtained from the blockchain network, so as to locate the tampered file set.
  • The method in FIG. 4 obtains the positioning file data information and check code corresponding to the file to be verified from the blockchain network; determines, based on the positioning file data information, multiple file sets corresponding to the to-be-verified file; calculates the check codes corresponding to the multiple file sets; compares the calculated check codes with the check codes obtained from the blockchain network to obtain a comparison result; and verifies the to-be-verified file based on the comparison result.
  • The tampered data can be located by determining, from the certificate data stored in the blockchain network, the specific tampered file sets in the file to be verified.
  • Obtaining the positioning file data information and the check code corresponding to the file to be verified from the blockchain network may specifically include: obtaining identification information corresponding to the file to be verified; and, based on the identification information, obtaining the positioning file data information and the check code corresponding to the to-be-verified file from the blockchain network.
  • The identification information can be any information that uniquely identifies the file to be verified.
  • The certificate data corresponding to the file to be verified can be obtained from the blockchain network. Further, in order to safeguard the certificate data, before the certificate data corresponding to the file to be verified is obtained based on the identification information, the requester requesting the certificate data can be required to provide an authorization statement. The authorization statement can be issued by the holder of the file to be verified, and the authorization statement issued by the holder can carry the holder's digital signature.
  • When the blockchain network receives the request for obtaining the certificate data, it can review the authorization statement carried in the request.
  • The certificate data corresponding to the file to be verified can be provided to the requester based on the identification information in the request.
  • The method may further include: calculating the digest value of the to-be-verified file by using the hash algorithm that was used when storing the certificate; comparing the calculated digest value of the file to be verified with the digest value in the certificate data; when the calculated digest value is consistent with the digest value in the certificate data, determining that the file to be verified has not been tampered with; and when the calculated digest value is inconsistent with the digest value in the certificate data, determining, based on the positioning file data information, the multiple file sets corresponding to the file to be verified.
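  • The digest-first check can be illustrated with a short sketch. SHA-256 is an assumption here (the specification only says that the hash algorithm used when storing the certificate is reused), and the function name and return labels are hypothetical.

```python
import hashlib


def verify_digest_first(file_bytes: bytes, stored_digest_hex: str) -> str:
    """Compare the whole-file digest before doing any set-level work.

    Assumption: the certificate was stored with SHA-256 and the digest is
    kept as a lowercase hex string.
    """
    calculated = hashlib.sha256(file_bytes).hexdigest()
    if calculated == stored_digest_hex:
        # Digests agree: the file as a whole has not been tampered with.
        return "untampered"
    # Digests differ: fall back to locating file sets and comparing check codes.
    return "check_file_sets"
```

  • Only when the digests differ is the more expensive set-level comparison needed.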
  • Determining, based on the positioning file data information, multiple file sets corresponding to the to-be-verified file may specifically include: determining a first positioning file data set based on the location information; determining, based on a dynamic programming algorithm, a second positioning file data set whose coverage satisfies a preset condition from the first positioning file data set; comparing the second positioning file data set with the pre-stored positioning file data information to determine inconsistent positioning file data information; determining, according to the inconsistent positioning file data information, a third positioning file data set, so that the matching rate between the third positioning file data set and the pre-stored positioning file data is maximized; and determining, based on the third positioning file data set, multiple file sets corresponding to the to-be-verified file.
  • The first positioning file data set may represent all possible positioning code sets determined according to the fixed intervals, and the second positioning file data set is the positioning code set, determined based on the dynamic programming algorithm, whose matrix coverage meets the preset condition.
  • The third positioning file data set is the final positioning code set obtained after performing supplementary positioning on the second positioning file data set.
  • The goal is to obtain the positioning codes from the certificate data and to make the matrices determined by the positioning codes in the verified file consistent with the matrices in the original file as far as possible.
  • The corresponding stage can be divided into preliminary positioning and supplementary positioning. The preliminary positioning can, based on the fixed interval between the file sets when the certificate was stored, find an arrangement of positioning codes such that the matrices determined by the positioning codes cover as much of the content of the verified file as possible. It should be noted that, in practical applications, when there are multiple arrangements with the same coverage ratio, the overall matching ratio of the matrices determined by these arrangements can be further compared, and the positioning code arrangement with the highest matching ratio can be selected.
  • FIG. 5 is a schematic diagram of preliminary positioning based on the positioning file data information provided by an embodiment of this specification.
  • The intervals between A11 and B11, between E11, F11 and G11, and between H11 and I11 are all the fixed interval used when the certificate was stored; between B11 and E11 there are undetermined positioning codes C11 and D11, and there is data not covered by any matrix determined by a positioning code; at the same time, the positioning code H11 is located inside the matrix determined by the positioning code G11.
  • Supplementary positioning can be a supplement to the preliminary positioning.
  • The supplementary positioning is responsible for inserting the undetermined positioning codes between two determined positioning codes of the verified file, so that the matching rate of each matrix is made as high as possible. This is described with reference to FIG. 6: FIG. 6 is a schematic diagram of supplementary positioning based on positioning file data information provided by an embodiment of the present specification.
  • The matrices determined by the two positioning codes C11 and D11 are inserted through supplementary positioning. For example: record the positioning codes as a_1, a_2, ..., a_n; the size of the matrix determined by a positioning code when the certificate is stored is L bytes; and the byte stream of the verified file is d_1, d_2, ..., d_m.
  • Valid positioning interval: (d_i, d_{i+1}, ..., d_{i+(n-1)*L+3})
  • All valid intervals on the verified file are arranged as S_1, S_2, ..., S_n in ascending order of the byte stream position of the terminating positioning code (d_{i+(n-1)*L+3}), and the lengths of the intervals are X_1, X_2, ..., X_n.
  • Here, length = number of positioning codes - 1. Also record the initial positioning code serial numbers a_1, a_2, ..., a_n of each interval and the end positions d_1, d_2, ..., d_n in the verified file. The problem is then converted to selecting an interval sequence [S_R1, S_R2, ..., S_Rm] from S that satisfies a_i + X_i ≤ a_{i+1} and maximizes X_R1 + X_R2 + ... + X_Rm.
  • When the constraint a_i + X_i ≤ a_{i+1} is ignored, this problem can be solved by dynamic programming with a state transition equation.
  • When the constraint a_i + X_i ≤ a_{i+1} is added, two-dimensional dynamic programming can be used to solve it.
  • In this way, an interval sequence [S_R1, S_R2, ..., S_Rm] in S can be obtained that satisfies a_i + X_i ≤ a_{i+1} and maximizes X_R1 + X_R2 + ... + X_Rm. It is an arrangement of the positioning codes, according to the fixed interval of the positioning codes when the certificate was stored, that makes the matrices determined by the positioning codes cover as much of the content of the verified file as possible.
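  • The interval selection can be sketched with a simple quadratic-time dynamic program. This is not the state transition equation of the embodiments, which is not reproduced in the text; it is an illustrative stand-in that reads the ordering constraint as a_i + X_i ≤ a_{i+1} (an assumption, since the relation symbol is garbled in the source). The `Interval` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    first_code: int  # serial number a_i of the interval's initial positioning code
    length: int      # X_i = number of positioning codes in the interval - 1
    end_pos: int     # byte position of the terminating positioning code


def select_intervals(intervals: List[Interval]) -> List[Interval]:
    """Pick a chain of intervals maximizing the total length.

    Assumed constraint: consecutive picks i, j must satisfy
    a_i + X_i <= a_j, with intervals ordered by end position.
    """
    s = sorted(intervals, key=lambda iv: iv.end_pos)
    n = len(s)
    if n == 0:
        return []
    best = [iv.length for iv in s]  # best total length of a chain ending at i
    prev = [-1] * n                 # predecessor index in that chain
    for i in range(n):
        for j in range(i):
            compatible = s[j].first_code + s[j].length <= s[i].first_code
            if compatible and best[j] + s[i].length > best[i]:
                best[i] = best[j] + s[i].length
                prev[i] = j
    # Walk back from the chain with the largest total length.
    k = max(range(n), key=best.__getitem__)
    chain: List[Interval] = []
    while k != -1:
        chain.append(s[k])
        k = prev[k]
    return chain[::-1]
```

  • Because the assumed constraint is transitive for non-negative lengths, checking it only between consecutive picks is enough to keep the whole chain valid.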
  • Record the undetermined positioning codes in the interval as a_1, a_2, ..., a_n, where the size of the matrix determined by a positioning code when the certificate was stored is L bytes.
  • In the first step, look for the first positioning code a_1 in the interval (if it is not found, look for a_2; if that is not found either, skip the supplementary positioning). In the second step, look for a_2 to the left and right of a_1 + L (if it is not found, look for a_3), searching leftward as far as a_1 and rightward as far as the end of the interval. In the third step, repeat the second step until the interval ends or there is no next undetermined positioning code.
  • The positions of the positioning codes determined in the above three steps are the supplementarily determined positioning codes.
  • Calculating the check codes corresponding to the multiple file sets may specifically include: arranging the N file sets in the form of a matrix to obtain a matrix with N rows and M columns; using an erasure correction algorithm, generating a corresponding row check code for the multiple file sets corresponding to each row of the matrix; and generating a corresponding column check code for the multiple file sets corresponding to each column of the matrix.
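  • The row and column check codes can be illustrated as follows. The specification names an erasure correction algorithm without fixing one; this sketch swaps in plain XOR parity, the simplest single-erasure code, purely for illustration. The matrix layout (a list of rows of equal-length byte strings) and the function names are assumptions.

```python
from functools import reduce
from typing import List

Matrix = List[List[bytes]]  # N rows, M columns, every cell the same length


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def row_checks(matrix: Matrix) -> List[bytes]:
    """One XOR parity value per row (stand-in for the row check code)."""
    return [reduce(xor_bytes, row) for row in matrix]


def col_checks(matrix: Matrix) -> List[bytes]:
    """One XOR parity value per column (stand-in for the column check code)."""
    return [reduce(xor_bytes, col) for col in zip(*matrix)]
```

  • A production scheme would more likely use a stronger code such as Reed-Solomon so that more than one damaged cell per row or column can be repaired.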
  • each positioning code (e.g., A11, B11, C11)
  • the file data division method of each set (for example, 4 bytes)
  • FIG. 7 is a schematic diagram of verification based on a check code according to an embodiment of the present specification.
  • Comparing the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result may specifically include: for the multiple file sets in the i-th row of the matrix, obtaining the row check code corresponding to the file sets of the i-th row from the blockchain network; comparing the calculated row check code corresponding to the file sets of the i-th row with the row check code of the i-th row obtained from the blockchain, to obtain a row check code comparison result; for the multiple file sets in the j-th column of the matrix, obtaining the column check code corresponding to the file sets of the j-th column from the blockchain network; and comparing the calculated column check code corresponding to the file sets of the j-th column with the column check code of the j-th column obtained from the blockchain, to obtain a column check code comparison result.
  • Based on the row and column check code comparison results, the specific tampered data can be determined. For example, if comparing the row check codes shows that the check codes of the third row of the matrix are inconsistent, and comparing the column check codes shows that the check codes of the second column are inconsistent, it can be determined that the data at the third row and second column of the matrix has been tampered with.
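  • Locating the tampered cell from the two comparison results can be sketched as below. The function name and the use of 0-based indices are assumptions for illustration; check codes are treated as opaque byte strings.

```python
from typing import List, Tuple


def locate_tampered(
    stored_rows: List[bytes],
    stored_cols: List[bytes],
    calc_rows: List[bytes],
    calc_cols: List[bytes],
) -> List[Tuple[int, int]]:
    """Return candidate (row, column) positions of tampered cells.

    A cell is a candidate when both its row check code and its column
    check code disagree with the values stored on the chain.
    """
    bad_rows = [i for i, (s, c) in enumerate(zip(stored_rows, calc_rows)) if s != c]
    bad_cols = [j for j, (s, c) in enumerate(zip(stored_cols, calc_cols)) if s != c]
    return [(i, j) for i in bad_rows for j in bad_cols]
```

  • With a single mismatching row and a single mismatching column the result is exact; with several mismatches the intersections are only candidates and may include cells that are in fact intact.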
  • Because the algorithm used in the calculation of the check code is an erasure correction algorithm, part of the modified data can be recovered through the check code.
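  • Recovery through the check code can be illustrated with XOR parity, which is used here only as a stand-in for whatever erasure correction algorithm an implementation chooses. Under that assumption, a single damaged cell in a row is the XOR of the stored row check code with all intact cells of the row. The function names are hypothetical.

```python
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def recover_cell(row: List[bytes], bad_index: int, stored_row_check: bytes) -> bytes:
    """Rebuild one damaged cell of a row from the stored XOR row parity.

    Assumption: exactly one cell in the row (at bad_index) is damaged and
    the stored row check code is the XOR of all original cells.
    """
    acc = stored_row_check
    for j, cell in enumerate(row):
        if j != bad_index:
            acc = xor_bytes(acc, cell)  # cancel out every intact cell
    return acc
```

  • This only repairs one cell per row; it reflects the statement that recovery is possible when the tampered or lost data is a small part of the whole.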
  • The technical effects that can be achieved by the above two embodiments are as follows: 1) In a scheme that stores only the hash of a large file on the chain as the certificate, the verifier cannot know which part of the file has been tampered with, and can only obtain the final result of whether the entire file is consistent with the original one.
  • The technical solutions of the embodiments of this specification can independently verify each file set of a single large file, and use a controllable storage cost to determine which file sets in the large file have been modified and which are original content. In this way, during verification, the contents of the modified file sets can be found directly, and the other contents can still be proved not to have been tampered with.
  • 2) Because hash-based on-chain certificate storage does not store the original file, the source file is hosted and stored in an external system. There is a very small probability that the external system may cause partial damage to the file, for example, when using disk storage, reversed magnetic poles causing data errors, or bad sectors causing data loss.
  • The solutions in the embodiments of this specification can still provide proof of non-tampering for the correct or non-lost part; further, when the tampered or lost data accounts for a small part of the stored data, the lost or tampered data can be restored through the certificate information.
  • FIG. 8 is a schematic structural diagram of a file certificate storage device according to an embodiment of the present specification.
  • The device may include: a target file acquisition module 810 for acquiring a target file to be stored; a file data determination module 820 for splitting the target file according to a preset splitting method to obtain multiple split file data; a file set dividing module 830 for dividing the multiple file data into multiple file sets according to a preset division method, each of the file sets containing m elements; a check code calculation module 840 for calculating the check codes corresponding to the multiple file sets; a positioning file data information determination module 850 for determining the positioning file data information in each of the file sets, the positioning file data information including the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and a data storage module 860 for storing the check code and the positioning file data information.
  • The apparatus may further include: a digest value calculation module for calculating the digest value of the target file by using a hash algorithm; and a digest value storage module for storing the digest value in the blockchain network.
  • The data storage module 860 may specifically include: a certificate data determination unit, configured to splice the digest value, the check code and the positioning file data information according to a preset splicing method to obtain the certificate data corresponding to the target file; and a certificate data storage unit, configured to store the certificate data in the blockchain network.
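  • The splicing of the certificate data can be sketched as follows. The "preset splicing method" is not specified in the text; canonical JSON is assumed here only for illustration, and the field names (`digest`, `row_checks`, `col_checks`, `positioning`) and function names are hypothetical.

```python
import json
from typing import Dict, List


def splice_certificate(
    digest_hex: str,
    row_checks: List[str],
    col_checks: List[str],
    positioning: List[Dict],
) -> bytes:
    """Combine digest, check codes and positioning info into one record.

    Assumption: all binary values are already hex-encoded strings.
    """
    record = {
        "digest": digest_hex,
        "row_checks": row_checks,
        "col_checks": col_checks,
        "positioning": positioning,
    }
    # Sorted keys and compact separators give a deterministic byte string.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def parse_certificate(blob: bytes) -> dict:
    """Inverse of splice_certificate, used in the verification stage."""
    return json.loads(blob.decode("utf-8"))
```

  • A deterministic encoding matters because the same record must yield the same bytes whenever it is hashed or signed before being written to the chain.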
  • The certificate data storage unit may be specifically configured to: generate a storage certificate including authentication information based on the certificate data; and send the certificate data including the storage certificate to the blockchain network for storage.
  • The file set dividing module 830 may specifically include: a dividing unit, configured to divide the multiple file data into N file sets according to the preset number of files in a file set; the elements in each set are ordered.
  • The apparatus may further include: a file set arrangement module, configured to arrange the N file sets in the form of a matrix to obtain a matrix with N rows and M columns. The check code calculation module specifically includes: a check code calculation unit, configured to use an erasure correction algorithm to generate a corresponding check code for the multiple file sets corresponding to each row of the matrix, and to generate a corresponding check code for the multiple file sets corresponding to each column of the matrix.
  • the positioning file data information determining module 850 may specifically include: a positioning file data information determining unit, configured to determine an element at a specific position in each group of the file sets as positioning file data to obtain a positioning file. data collection.
  • FIG. 9 is a schematic structural diagram of a file verification apparatus according to an embodiment of the present specification.
  • The device may include: a data acquisition module 910 for acquiring the positioning file data information and check code corresponding to the file to be verified from the blockchain network, where the check code is obtained by calculation over multiple file sets corresponding to the pre-stored target file, the multiple file sets are obtained by splitting the target file according to a preset splitting method to obtain multiple file data and dividing the multiple file data according to a preset division method, and the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; a file set determining module 920 for determining multiple file sets corresponding to the file to be verified based on the positioning file data information; a check code calculation module 930 for calculating the check codes corresponding to the multiple file sets; a comparison module 940 for comparing the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result; and a verification module 950 for verifying the to-be-verified file based on the comparison result.
  • The data acquisition module 910 may specifically include: an identification information acquisition unit for acquiring identification information corresponding to the to-be-verified file; and a data acquisition unit for obtaining, based on the identification information, the positioning file data information and the check code corresponding to the file to be verified from the blockchain network.
  • The device may further include: a digest value calculation module for calculating the digest value of the to-be-verified file by using the hash algorithm used when storing the certificate; a digest value comparison module for comparing the calculated digest value of the file to be verified with the digest value in the certificate data; and a tampering situation determination unit for determining that the file to be verified has not been tampered with when the calculated digest value of the file to be verified is consistent with the digest value in the certificate data, and for determining, based on the positioning file data information, the multiple file sets corresponding to the file to be verified when the calculated digest value is inconsistent with the digest value in the certificate data.
  • The file set determining module 920 may specifically include: a first positioning file data set determining unit, configured to determine a first positioning file data set based on the location information; a second positioning file data set determining unit, configured to determine, based on a dynamic programming algorithm, a second positioning file data set whose coverage meets a preset condition from the first positioning file data set; a positioning file data information comparison unit, configured to compare the second positioning file data set with the pre-stored positioning file data information to determine inconsistent positioning file data information; a third positioning file data set determining unit, configured to determine a third positioning file data set according to the inconsistent positioning file data information, so that the matching rate between the third positioning file data set and the pre-stored positioning file data is maximized; and a file set determining unit, configured to determine, based on the third positioning file data set, multiple file sets corresponding to the to-be-verified file.
  • The check code calculation module 930 may specifically include: a file set arrangement unit for arranging the N file sets in the form of a matrix to obtain a matrix with N rows and M columns; and a check code generation unit, configured to use an erasure correction algorithm to generate a corresponding row check code for the multiple file sets corresponding to each row of the matrix, and to generate a corresponding column check code for the multiple file sets corresponding to each column of the matrix.
  • The comparison module 940 may be specifically configured to: for the multiple file sets in the i-th row of the matrix, obtain the row check code corresponding to the file sets of the i-th row from the blockchain network; compare the calculated row check code corresponding to the file sets of the i-th row with the row check code of the i-th row obtained from the blockchain, to obtain a row check code comparison result; for the multiple file sets in the j-th column of the matrix, obtain the column check code corresponding to the file sets of the j-th column from the blockchain network; and compare the calculated column check code corresponding to the file sets of the j-th column with the column check code of the j-th column obtained from the blockchain, to obtain a column check code comparison result.
  • The embodiments of this specification also provide a device corresponding to the above method.
  • FIG. 10 is a schematic diagram of a document storage and verification device provided by an embodiment of the present specification.
  • The device 1000 may include: at least one processor 1010; and a memory 1030 communicatively connected to the at least one processor; wherein the memory 1030 stores instructions 1020 executable by the at least one processor 1010. Corresponding to Embodiment 1, the instructions 1020 are executed by the at least one processor 1010, so that the at least one processor 1010 can: obtain the target file to be stored; split the target file according to a preset splitting method to obtain multiple split file data; divide the multiple file data into multiple file sets according to a preset division method, each of the file sets containing m elements; calculate the check codes corresponding to the multiple file sets; determine the positioning file data information in each of the file sets, the positioning file data information including the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and store the check code and the positioning file data information in the blockchain network.
  • The instructions 1020 may also be executed by the at least one processor 1010, so that the at least one processor 1010 can: obtain the positioning file data information and check code corresponding to the to-be-verified file from the blockchain network, where the check code is obtained by calculation over multiple file sets corresponding to the pre-stored target file, the multiple file sets are obtained by splitting the target file according to the preset splitting method to obtain a plurality of file data and dividing the plurality of file data according to a preset division method, and the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; determine, based on the positioning file data information, multiple file sets corresponding to the to-be-verified file; calculate the check codes corresponding to the multiple file sets; compare the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result; and verify the to-be-verified file based on the comparison result.
  • The embodiments of the present specification also provide a computer-readable medium corresponding to the above method.
  • Computer-readable instructions are stored on the computer-readable medium. Corresponding to Embodiment 1, the computer-readable instructions can be executed by a processor to implement the following method: acquiring the target file to be stored; splitting the target file to obtain multiple split file data; dividing the multiple file data into multiple file sets according to a preset division method, each of the file sets containing m elements; calculating the check codes corresponding to the multiple file sets; determining the positioning file data information in each of the file sets, the positioning file data information including the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and storing the check code and the positioning file data information in the blockchain network.
  • The computer-readable instructions can also be executed by the processor to implement the following method: obtaining the positioning file data information and check code corresponding to the file to be verified from the blockchain network, where the check code is obtained by calculation over multiple file sets corresponding to the pre-stored target file, the multiple file sets are obtained by splitting the target file according to the preset splitting method to obtain multiple file data and dividing the multiple file data according to the preset division method, and the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; determining, based on the positioning file data information, multiple file sets corresponding to the to-be-verified file; calculating the check codes corresponding to the multiple file sets; comparing the calculated check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result; and verifying the to-be-verified file based on the comparison result.
  • a Programmable Logic Device (such as a Field Programmable Gate Array (FPGA)) is an integrated circuit whose logic function is determined by user programming of the device.
  • Hardware description languages (HDLs) include ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog.
  • The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of controllers include but are not limited to the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory.
  • In addition to implementing the controller in the form of pure computer-readable program code, the method steps can be logically programmed so that the controller realizes the same function in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, this kind of controller can be regarded as a hardware component, and the devices included therein for realizing various functions can also be regarded as structures within the hardware component. The means for implementing various functions can even be regarded as both software modules implementing a method and structures within a hardware component.
  • A typical implementation device is a computer.
  • The computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • Embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • A computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

A file storage method and apparatus, and a device. The method comprises: acquiring a target file to be stored (210); according to a pre-set splitting mode, splitting the target file to obtain a plurality of pieces of split file data (220); according to a pre-set division mode, dividing the plurality of pieces of file data into a plurality of file sets (230); calculating check codes corresponding to the plurality of file sets (240); determining positioning file data information in each file set (250); and storing the check codes and the positioning file data information in a blockchain network (260).

Description

File storage method and apparatus, and device
Technical Field
The present application relates to the field of blockchain technology, and in particular to a file evidence storage method, apparatus, and device.
Background
Blockchain is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and cryptographic algorithms. A blockchain system links data blocks in chronological order into a chained data structure and uses cryptography to ensure that the resulting distributed ledger cannot be tampered with or forged. In essence, it is a decentralized distributed database system maintained by participating nodes. Because a blockchain is decentralized, tamper-resistant, and autonomous, using a blockchain to store file data as evidence has received increasing attention and adoption.
Summary
The embodiments of this specification provide a file evidence storage method, apparatus, and device to solve the problem that, with existing file evidence storage and verification methods, tampered data within a file is difficult to locate.
To solve the above technical problem, the embodiments of this specification are implemented as follows. A file evidence storage method provided by an embodiment of this specification includes: acquiring a target file to be stored as evidence; splitting the target file according to a preset splitting method to obtain a plurality of pieces of file data; dividing the plurality of pieces of file data into a plurality of file sets according to a preset division method, each file set containing m elements; calculating check codes corresponding to the plurality of file sets; determining positioning file data information in each file set, the positioning file data information including position information, in the target file, of the positioning file data in the file set and content information of the positioning file data; and storing the check codes and the positioning file data information in a blockchain network.
A file verification method provided by an embodiment of this specification includes: acquiring, from a blockchain network, positioning file data information and check codes corresponding to a file to be verified, where the check codes were calculated from a plurality of file sets corresponding to a target file previously stored as evidence, the plurality of file sets were obtained by splitting the target file according to a preset splitting method into a plurality of pieces of file data and dividing those pieces according to a preset division method, and the positioning file data information includes position information, in the target file, of the positioning file data in each file set and content information of the positioning file data; determining, based on the positioning file data information, a plurality of file sets corresponding to the file to be verified; calculating check codes corresponding to those file sets; comparing the calculated check codes with the check codes acquired from the blockchain network to obtain a comparison result; and verifying the file to be verified based on the comparison result.
A file evidence storage apparatus provided by an embodiment of this specification includes: a target file acquisition module, configured to acquire a target file to be stored as evidence; a file data determination module, configured to split the target file according to a preset splitting method to obtain a plurality of pieces of file data; a file set division module, configured to divide the plurality of pieces of file data into a plurality of file sets according to a preset division method, each file set containing m elements; a check code calculation module, configured to calculate check codes corresponding to the plurality of file sets; a positioning file data information determination module, configured to determine positioning file data information in each file set, the positioning file data information including position information, in the target file, of the positioning file data in the file set and content information of the positioning file data; and a data storage module, configured to store the check codes and the positioning file data information in a blockchain network.
A file verification apparatus provided by an embodiment of this specification includes: a data acquisition module, configured to acquire, from a blockchain network, positioning file data information and check codes corresponding to a file to be verified, where the check codes were calculated from a plurality of file sets corresponding to a target file previously stored as evidence, the plurality of file sets were obtained by splitting the target file according to a preset splitting method into a plurality of pieces of file data and dividing those pieces according to a preset division method, and the positioning file data information includes position information, in the target file, of the positioning file data in each file set and content information of the positioning file data; a file set determination module, configured to determine, based on the positioning file data information, a plurality of file sets corresponding to the file to be verified; a check code calculation module, configured to calculate check codes corresponding to those file sets; a comparison module, configured to compare the calculated check codes with the check codes acquired from the blockchain network to obtain a comparison result; and a verification module, configured to verify the file to be verified based on the comparison result.
A file evidence storage device provided by an embodiment of this specification includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: acquire a target file to be stored as evidence; split the target file according to a preset splitting method to obtain a plurality of pieces of file data; divide the plurality of pieces of file data into a plurality of file sets according to a preset division method, each file set containing m elements; calculate check codes corresponding to the plurality of file sets; determine positioning file data information in each file set, the positioning file data information including position information, in the target file, of the positioning file data in the file set and content information of the positioning file data; and store the check codes and the positioning file data information in a blockchain network.
A file verification device provided by an embodiment of this specification includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to: acquire, from a blockchain network, positioning file data information and check codes corresponding to a file to be verified, where the check codes were calculated from a plurality of file sets corresponding to a target file previously stored as evidence, the plurality of file sets were obtained by splitting the target file according to a preset splitting method into a plurality of pieces of file data and dividing those pieces according to a preset division method, and the positioning file data information includes position information, in the target file, of the positioning file data in each file set and content information of the positioning file data; determine, based on the positioning file data information, a plurality of file sets corresponding to the file to be verified; calculate check codes corresponding to those file sets; compare the calculated check codes with the check codes acquired from the blockchain network to obtain a comparison result; and verify the file to be verified based on the comparison result.
An embodiment of this specification provides a computer-readable medium storing computer-readable instructions that are executable by a processor to implement the file evidence storage and verification methods.
An embodiment of this specification achieves the following beneficial effects. A target file to be stored as evidence is acquired; the target file is split according to a preset splitting method to obtain a plurality of pieces of file data; the pieces of file data are divided into a plurality of file sets according to a preset division method; check codes corresponding to the file sets are calculated; positioning file data information in each file set is determined; and the check codes and the positioning file data information are stored in a blockchain network. With this chunked on-chain evidence storage method for large files, the full data of the target file does not have to be stored in the blockchain network, which avoids data bloat. Because check codes and positioning file data information are introduced, the tampered data can be located when a target file stored as evidence is tampered with.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of a file evidence storage and verification method in an embodiment of this specification;
FIG. 2 is a schematic flowchart of a file evidence storage method provided by an embodiment of this specification;
FIG. 3 is a schematic diagram of the arrangement of file sets provided by an embodiment of this specification;
FIG. 4 is a schematic flowchart of a file evidence storage method provided by an embodiment of this specification;
FIG. 5 is a schematic diagram of preliminary positioning based on positioning file data information provided by an embodiment of this specification;
FIG. 6 is a schematic diagram of supplementary positioning based on positioning file data information provided by an embodiment of this specification;
FIG. 7 is a schematic diagram of verification based on check codes provided by an embodiment of this specification;
FIG. 8 is a schematic structural diagram of a file evidence storage apparatus provided by an embodiment of this specification;
FIG. 9 is a schematic structural diagram of a file verification apparatus provided by an embodiment of this specification;
FIG. 10 is a schematic diagram of a file evidence storage and verification device provided by an embodiment of this specification.
Detailed Description
To make the objectives, technical solutions, and advantages of one or more embodiments of this specification clearer, the technical solutions of one or more embodiments are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this specification without creative effort fall within the scope of protection of one or more embodiments of this specification.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the accompanying drawings.
The terms used in the embodiments of this specification are explained as follows. Blockchain: blockchain technology was originally a special distributed database technology designed for Bitcoin (a digital currency) by a person using the pseudonym "Satoshi Nakamoto". It can be understood as a data chain formed by storing multiple blocks in sequence. The header of each block contains the block's timestamp, the hash of the previous block's information, and the hash of the block's own information, which allows blocks to verify one another and forms a tamper-resistant blockchain. Each block can be understood as a data block (a unit for storing data). As a decentralized database, a blockchain is a series of data blocks linked to one another by cryptographic methods; each data block contains the information of a network transaction and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The chain formed by connecting blocks end to end is the blockchain. Modifying data within a block requires modifying the contents of all subsequent blocks and the data backed up by all nodes in the blockchain network. A blockchain is therefore difficult to tamper with or delete from, and once data has been saved to a blockchain, it is a reliable way to preserve the integrity of the content.
Hash algorithm: a hash algorithm transforms an input of arbitrary length (also called a pre-image) into a fixed-length output, which is the hash value. This transformation is a compressing map: the space of hash values is usually much smaller than the space of inputs, and different inputs may hash to the same output, so the input cannot be uniquely determined from the hash value. A hash algorithm can be regarded as a function that compresses a message of arbitrary length into a fixed-length message digest. Common examples are SHA256 and SM3.
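The fixed-length property described above can be illustrated with a minimal Python sketch. It assumes SHA-256 from the standard `hashlib` module; the helper name `sha256_digest` is illustrative and does not come from the specification.

```python
import hashlib


def sha256_digest(data: bytes) -> bytes:
    """Return the SHA-256 digest of data (always 32 bytes)."""
    return hashlib.sha256(data).digest()


# Inputs of very different lengths map to digests of the same length.
short_digest = sha256_digest(b"a")
long_digest = sha256_digest(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))
```

Because the digest is much smaller than the input space, the original input cannot be reconstructed from the digest alone.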
Erasure coding algorithm: a fault-tolerant coding technique, first used in the communications industry to address the loss of part of the data during transmission. The basic principle is to divide the transmitted signal into segments, add a certain amount of check data, and establish relationships among the segments, so that even if part of the signal is lost in transmission, the receiver can still compute the complete information. An example is the Reed-Solomon error-correcting code (RS algorithm for short).
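The principle can be sketched with the simplest possible erasure code, a single XOR parity segment. This is a deliberately simplified stand-in and not the Reed-Solomon algorithm named above: it can recover exactly one lost segment and assumes all segments have equal length. The function names are illustrative.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def xor_segments(segments: List[bytes]) -> bytes:
    """XOR equal-length segments together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*segments))


def make_parity(segments: List[bytes]) -> bytes:
    """Compute one parity segment over all data segments."""
    return xor_segments(segments)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing segment (marked None) from the rest."""
    survivors = [s for s in received if s is not None]
    if len(survivors) != len(received) - 1:
        raise ValueError("exactly one segment must be missing")
    return xor_segments(survivors + [parity])


segments = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(segments)
print(recover_missing([b"abcd", None, b"ijkl"], parity))
```

Reed-Solomon codes generalize this idea so that several lost or corrupted segments can be recovered, at the cost of more check data.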
Because of the distributed-ledger nature of blockchain, each participant holds at least one copy of the ledger data. In existing blockchain-based file evidence storage schemes, the file data is generally recorded directly on the blockchain, so that the evidence is stored directly through the blockchain. Putting files directly on the chain causes data bloat, and because blockchain transactions are limited in size, the file size is strictly limited. Although storing evidence this way is convenient, it also poses security risks and hinders the development of blockchain technology.
On the other hand, some files contain sensitive information and are not suitable for storage on the chain in plaintext. For large or sensitive files, hash-on-chain is commonly used: before going on chain, a hash algorithm is used to calculate the file's hash value (digest), and the hash value is then stored on the chain as evidence. For example, with the SHA256 hash algorithm, the SHA256 value of the file to be stored as evidence is calculated in advance and the 32-byte SHA256 value is put on the chain; in this way, no matter how large the source file is, it is irreversibly converted into a 32-byte digest.
However, to verify the authenticity of a file, the file to be verified is obtained from another system, its hash value is calculated again, and the result is compared with the value on the chain. If the hash values match, the file has not been tampered with; if they differ, the file has been tampered with. In schemes that put the hash of a large file on chain, a sufficiently strong hash algorithm such as SHA256 or SM3 should generally be chosen, and algorithms with security weaknesses, such as CRC and MD5, should be avoided.
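The hash-on-chain check described above can be sketched as follows. This is a minimal illustration assuming SHA-256; the on-chain lookup is not modeled, and `on_chain_digest` simply stands for a value that would have been read back from the blockchain. The function names are illustrative.

```python
import hashlib
import hmac


def file_digest(data: bytes) -> bytes:
    """Digest that would be put on chain when the file is stored as evidence."""
    return hashlib.sha256(data).digest()


def is_untampered(candidate: bytes, on_chain_digest: bytes) -> bool:
    """Recompute the digest of the candidate file and compare with the chain value."""
    return hmac.compare_digest(file_digest(candidate), on_chain_digest)


original = b"contract text, version 1"
on_chain_digest = file_digest(original)  # assumed to be stored on chain earlier

print(is_untampered(original, on_chain_digest))
print(is_untampered(b"contract text, version 2", on_chain_digest))
```

Note that a whole-file digest only yields a yes or no answer: a mismatch shows that the file changed but gives no indication of which part changed.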
FIG. 1 is a schematic diagram of an application scenario of a file evidence storage and verification method in an embodiment of this specification. As shown in FIG. 1, a target file X to be stored as evidence is analyzed to obtain evidence data X1 corresponding to the target file; the evidence data X1 may include a digest value, check codes, and positioning file data information. The evidence data X1 is stored in the blockchain network 110. The original data X2 corresponding to the target file X can be stored in an external device 120 outside the blockchain network, for example a USB flash drive or another server.
Next, a file evidence storage method provided by the embodiments of this specification is described in detail with reference to the accompanying drawings.
Embodiment 1
FIG. 2 is a schematic flowchart of a file evidence storage method provided by an embodiment of this specification. From a program perspective, the flow may be executed by a program running on an application server or by an application client.
As shown in FIG. 2, the flow may include the following steps. Step 210: acquire a target file to be stored as evidence.
Blockchain evidence storage means storing data on a blockchain so that the data is tamper-resistant, traceable, and of trustworthy origin. To keep transactions fast, on-chain and off-chain components generally work together and the file is kept separate from its hash value: only the hash value of the file is stored on the chain, and the original file is stored off the chain. Whether the file has been tampered with can be determined simply by calculating the file's hash value and comparing it with the hash value on the chain. In the embodiments of this specification, the data in the target file to be stored as evidence may take any form, such as text, video, audio, or images. The complete data of the target file does not all need to be stored in the blockchain network; only the data that can represent the target file needs to be stored.
Step 220: split the target file according to a preset splitting method to obtain a plurality of pieces of file data.
It should be noted that splitting, in the conventional sense, is the process of separating the parts that make up a whole. In this step, splitting can be understood as dividing the data that makes up the target file, in order, into multiple pieces of file data of a fixed byte length. For example, if the target file contains 100 bytes of data, it can be split into 100 pieces of file data of 1 byte each. Alternatively, every ten bytes can form one piece of file data, or element: the data in bytes 1 to 10 is marked as the first piece of file data (the first element), the data in bytes 11 to 20 as the second piece of file data (the second element), and so on, so that the target file is split into multiple pieces of file data or elements. The pieces of file data or elements obtained by splitting are used in the subsequent division into sets.
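The fixed-length splitting in this step can be sketched as follows. The function name `split_file` and the parameter `piece_len` are illustrative and not taken from the specification; the sketch also assumes that a final, shorter piece is kept as-is.

```python
from typing import List


def split_file(data: bytes, piece_len: int) -> List[bytes]:
    """Split data, in order, into pieces of piece_len bytes (the last may be shorter)."""
    if piece_len < 1:
        raise ValueError("piece_len must be at least 1")
    return [data[i:i + piece_len] for i in range(0, len(data), piece_len)]


target = bytes(range(100))          # a 100-byte target file
print(len(split_file(target, 1)))   # one byte per piece
print(len(split_file(target, 10)))  # ten bytes per piece
```

With `piece_len=10`, the first piece holds bytes 1 to 10 of the file and the second holds bytes 11 to 20, matching the example in the text.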
Step 230: divide the plurality of pieces of file data into a plurality of file sets according to a preset division method, each file set containing m elements, where m ≥ 1.
The preset division method may specify a preset number of pieces of file data in each file set; for example, every three pieces of file data form one file set.
After the target file has been split into multiple pieces of file data or elements, those pieces are divided into multiple file sets according to the preset division method.
It should be noted that an element may represent one or more pieces of file data. Specifically, an element may be a piece of file data. For example, a target file X is split into X1, X2, X3, and X4; during division, X1 and X2 are assigned to set 1, and X3 and X4 to set 2. Taking set 1 as an example, its elements may be the specific data corresponding to X1 and X2.
Of course, when the number of elements in each file set is fixed, the last set may contain too few elements. In this case, a fixed character can be used to fill in the missing elements. An element may therefore also be a fixed character, for example \x00.
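The division and padding described above can be sketched as follows. This assumes that consecutive pieces are grouped in order and that each missing element in the last set is filled with a single `\x00` byte; `divide_into_sets` is an illustrative name.

```python
from typing import List

PAD = b"\x00"  # fixed filler element, as in the example in the text


def divide_into_sets(pieces: List[bytes], m: int, pad: bytes = PAD) -> List[List[bytes]]:
    """Group consecutive pieces into sets of m elements, padding the last set."""
    if m < 1:
        raise ValueError("m must be at least 1")
    sets = [pieces[i:i + m] for i in range(0, len(pieces), m)]
    if sets and len(sets[-1]) < m:
        sets[-1] = sets[-1] + [pad] * (m - len(sets[-1]))
    return sets


print(divide_into_sets([b"X1", b"X2", b"X3", b"X4"], 2))
print(divide_into_sets([b"X1", b"X2", b"X3"], 2))
```

The first call reproduces the X1 to X4 example (set 1 holds X1 and X2, set 2 holds X3 and X4); the second shows the last set being completed with the filler element.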
Step 240: calculate check codes corresponding to the plurality of file sets.
A check code is a piece of check data obtained by applying an algorithm (for example, a hash algorithm or an erasure coding algorithm) to original data of a certain length. The check data makes it possible to determine whether new data is consistent with the original data and, further, to derive the original data from new data that differs from it by a small number of modifications.
Therefore, for each file set, a hash algorithm or an erasure coding algorithm can be used to calculate the check code corresponding to that file set.
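A minimal sketch of the hash variant of this step follows. It assumes SHA-256 and assumes each element is length-prefixed before hashing so that element boundaries are unambiguous; neither choice is specified by the text, and the erasure-coding variant is not shown.

```python
import hashlib
from typing import List


def check_code(file_set: List[bytes]) -> str:
    """SHA-256 over the elements of one file set, each prefixed with its length."""
    h = hashlib.sha256()
    for element in file_set:
        h.update(len(element).to_bytes(4, "big"))
        h.update(element)
    return h.hexdigest()


def check_codes(file_sets: List[List[bytes]]) -> List[str]:
    """One check code per file set, in set order."""
    return [check_code(s) for s in file_sets]


original = [[b"ab", b"cd"], [b"ef", b"gh"]]
tampered = [[b"ab", b"cd"], [b"ef", b"gX"]]
changed = [i for i, (a, b) in enumerate(zip(check_codes(original), check_codes(tampered))) if a != b]
print(changed)  # indices of the sets whose check codes differ
```

Because each set has its own check code, a mismatch points to the affected set rather than to the file as a whole.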
Step 250: determine positioning file data information in each file set.
Optionally, determining the positioning file data information in each file set may specifically include: determining the element at a specific position in each file set as positioning file data, to obtain a set of positioning file data.
Positioning file data information is the data information at a specific position in each file set. More specifically, the positioning file data information may include position information, in the target file, of the positioning file data in the file set and content information of the positioning file data. Positioning file data may also be called a positioning code. A positioning code may be one element of a file set, and that element may be one or more pieces of file data. After the target file has been divided into multiple file sets, the positioning code can be used to determine roughly where each originally divided set lies in the new data. For example, the first element of each set can serve as the positioning code: assuming each set has 9 elements, the positioning code indicates that the 9 consecutive elements starting from the data corresponding to the positioning code probably belong to one file set.
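The example of using the first element of each set as the positioning code can be sketched as follows. The record layout (a byte `offset` plus hex-encoded `content`) and the assumption of fixed-length pieces laid out contiguously in the target file are illustrative choices, not details given in the specification.

```python
from typing import Dict, List, Union


def positioning_info(file_sets: List[List[bytes]], piece_len: int) -> List[Dict[str, Union[int, str]]]:
    """For each set, record where its first element starts in the file and its content."""
    info = []
    offset = 0
    for file_set in file_sets:
        info.append({"offset": offset, "content": file_set[0].hex()})
        offset += piece_len * len(file_set)  # sets are assumed contiguous
    return info


sets = [[b"ab", b"cd"], [b"ef", b"gh"]]
print(positioning_info(sets, piece_len=2))
```

During verification, finding the recorded content in the file under test gives an approximate starting point for each set, from which the following m elements can be taken as that set.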
It should be noted that, in practice, the positioning file data may be original data from the target file, or a value computed from that original data, for example a hash value or check value computed over the data used for positioning. In that case, when the positioning file data is stored in the blockchain network, only the computed hash value or check value needs to be stored.
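The idea of recording a position plus a hashed positioning code can be sketched as follows. This is a minimal illustration only: the function name `positioning_info`, the choice of SHA-256, and the fixed `element_size` are assumptions for the example and are not specified by the text.

```python
import hashlib


def positioning_info(file_sets, element_size):
    """Return (offset_in_target_file, sha256_of_first_element) per file set.

    Assumption: the first element of each set is the positioning code and
    every element occupies element_size bytes in the target file.
    """
    info = []
    offset = 0
    for file_set in file_sets:
        first_element = file_set[0]
        info.append((offset, hashlib.sha256(first_element).hexdigest()))
        offset += len(file_set) * element_size
    return info


file_sets = [[b"A1", b"A2"], [b"A3", b"A4"]]
info = positioning_info(file_sets, element_size=2)
```

Storing only the hash keeps the on-chain record small while still allowing the verifier to recognize the positioning code in the file being checked.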
Step 260: Store the check code and the positioning file data information in the blockchain network.
When a target file is deposited as evidence, the data of the entire target file is not stored in the blockchain network; the original data is stored on a device outside the blockchain network, for example on a portable hard disk. What is recorded in the blockchain network is the check code and the positioning file data information obtained by analyzing and computing over the target file. The check code and the positioning file data information later make it possible to locate tampered data in the target file.
It should be understood that the order of some steps of the methods described in one or more embodiments of this specification may be interchanged as actually needed, and some steps may be omitted or deleted.
In the method of FIG. 2, a target file to be deposited is obtained; the target file is split in a preset splitting manner to obtain multiple pieces of file data; the pieces of file data are divided into multiple file sets in a preset division manner; check codes corresponding to the file sets are computed, the positioning file data information in each file set is determined, and the check codes and the positioning file data information are stored in the blockchain network. With this chunked on-chain deposit method for large files, the full data of the target file need not be stored in the blockchain network, which avoids data bloat. Because check codes and positioning file data information are introduced, when a deposited target file is tampered with, the tampered data in the target file can be located.
Based on the method of FIG. 2, the embodiments of this specification also provide some specific implementations of the method, described below.
Optionally, the method may further include: computing a digest value of the target file with a hash algorithm; and storing the digest value in the blockchain network.
It should be noted that the blockchain network may also store a digest value of the target file, obtained by applying a digest algorithm (also called a hash algorithm) to the target file. Digest algorithms are used for tamper detection. For example, suppose the content of the target file yields digest value A1 under MD5. During verification, if the digest value of the obtained file is B, which differs from the digest of the original, it can be determined that the target file has been tampered with.
A digest function is a one-way function: the digest function f() computes a fixed-length digest value from data of arbitrary length, but inferring the original data from the digest value is difficult. Even a slight change to the original data yields a completely different digest. Therefore, if the digest value has changed, it can be determined that the original data has been tampered with.
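The tamper-detection property of a digest can be illustrated with a short sketch. The text mentions MD5 as an example; SHA-256 from Python's standard `hashlib` is swapped in here, and the function name `file_digest` and the sample contents are made up for the illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Fixed-length digest of arbitrary-length data (SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()


original = b"contract: pay 100 units"
tampered = b"contract: pay 900 units"  # a single character differs

# A changed digest indicates that the content is no longer the original.
is_tampered = file_digest(original) != file_digest(tampered)
```

The digest alone only answers whether the file changed; it does not say where, which is why the check codes and positioning codes are stored alongside it.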
In the above method, the digest value of the target file is computed and stored in the blockchain network. In subsequent verification, the digest value of the file to be verified can be computed first and compared with the digest value stored in the blockchain network. If they match, the target file has not been tampered with, and the subsequent steps for locating tampered data are unnecessary. If they do not match, the solutions in the embodiments of this specification can be used to locate the tampered data in the target file.
Optionally, storing the check code and the positioning file data information in the blockchain network may specifically include: splicing the digest value, the check code, and the positioning file data information in a preset splicing manner to obtain deposit data corresponding to the target file; and storing the deposit data in the blockchain network.
It should be noted that, when a target file is deposited, the data stored in the blockchain network may be called deposit data, and the deposit data may include the digest value, the check codes, and the positioning file data information. More specifically, the deposit data may be obtained by splicing the digest value, the check codes, and the positioning file data information. The splicing may be sequential, or may follow another preset manner.
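One possible sequential splicing scheme is sketched below. The text does not specify a layout, so the length-prefixed format, the header of three counts, and the names `splice` and `unsplice` are all assumptions chosen so that the parts can be separated again at verification time.

```python
import struct


def splice(digest: bytes, check_codes: list, positioning: list) -> bytes:
    """Concatenate digest, check codes, and positioning data in order.

    Hypothetical layout: a header with three big-endian counts, followed by
    each part prefixed with its 4-byte length.
    """
    parts = [digest] + list(check_codes) + list(positioning)
    header = struct.pack(">III", 1, len(check_codes), len(positioning))
    body = b"".join(struct.pack(">I", len(p)) + p for p in parts)
    return header + body


def unsplice(blob: bytes):
    """Inverse of splice: recover (digest, check_codes, positioning)."""
    n_digest, n_check, n_pos = struct.unpack_from(">III", blob, 0)
    offset = 12
    parts = []
    for _ in range(n_digest + n_check + n_pos):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        parts.append(blob[offset:offset + length])
        offset += length
    return parts[0], parts[1:1 + n_check], parts[1 + n_check:]


blob = splice(b"D" * 32, [b"r1", b"r2"], [b"p1"])
```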
In the above method, the deposit data of the target file is stored in the blockchain system. The digest value in the deposit data gives a preliminary determination of whether the target file has been tampered with; the positioning file data information identifies the file sets into which the target file was divided at deposit time; and the check codes identify the tampered data in the target file.
Optionally, storing the deposit data in the blockchain network may specifically include: generating, based on the deposit data, a deposit certificate containing authentication information; and sending the deposit data containing the deposit certificate to the blockchain network for storage.
It should be noted that a deposit certificate may have an associated network address and/or image through which the certificate for the deposit data can be viewed. The certificate can be viewed via the web page at that address or via the image, and verified in the blockchain network to ensure the authenticity of the deposit data.
Optionally, dividing the multiple pieces of file data into multiple file sets in a preset division manner may specifically include: dividing the pieces of file data into N file sets according to a preset number of files per file set, where the elements within each file set are ordered.
After the target file is split into multiple pieces of file data, the pieces can be divided into multiple file sets according to a preset number of files per set (for example, 5 files per set).
The elements in each set may be ordered. Splitting the target file can be understood as identifying the data corresponding to each byte of the target file; splitting does not affect the original data in the target file, and the split data is not stored. The splitting step serves only the subsequent step of division into sets. The pieces of file data can therefore be divided into file sets in order. For example, suppose splitting the target file yields the file data A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14. With 5 elements per file set, dividing in order gives: set 1 = {A1, A2, A3, A4, A5}, set 2 = {A6, A7, A8, A9, A10}, set 3 = {A11, A12, A13, A14, \x00}.
Dividing the file data into file sets in order, as above, makes it faster to locate tampered data when the target file is later checked for tampering.
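The ordered division with padding can be sketched as follows, reproducing the example of 14 elements with 5 elements per set. The function name `partition` and the representation of elements as byte strings are assumptions for the illustration.

```python
def partition(elements, per_set, pad=b"\x00"):
    """Divide elements, in order, into sets of per_set items.

    The last set is padded with a fixed filler element if it is short.
    """
    sets = [elements[i:i + per_set] for i in range(0, len(elements), per_set)]
    if sets and len(sets[-1]) < per_set:
        sets[-1] = sets[-1] + [pad] * (per_set - len(sets[-1]))
    return sets


# A1 .. A14, as in the example above.
elements = [f"A{i}".encode() for i in range(1, 15)]
sets = partition(elements, per_set=5)
```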
Optionally, before computing the check codes corresponding to the file sets, the method may further include: arranging the N file sets in matrix form to obtain a matrix of N rows and M columns. Computing the check codes corresponding to the file sets may specifically include: using an erasure-coding algorithm to generate a check code for each row of the matrix of file sets, and a check code for each column of the matrix of file sets.
In practice, after the target file is divided into multiple file sets, the file sets can be arranged in matrix form.
This can be explained with reference to FIG. 3, a schematic diagram of the arrangement of file sets provided in an embodiment of this specification. As shown in FIG. 3, the file sets can be labeled in sequence, for example A_11, A_12, ..., A_1n, ..., A_21, A_31, ..., A_n1, and so on, and assembled into an N*N matrix; the value of N can be adjusted as circumstances require. If file sets remain, labeling continues and they are assembled into the next N*N matrix. When the file sets are not sufficient to fill the last N*N matrix, the matrix is padded with file sets composed of a fixed character, for example \x00. As the shaded part of FIG. 3 shows, part of A_33 through A_nn are padded in this way. Another fixed character may be chosen instead.
When computing check codes, one type of check code can be chosen, and the check code of each row and of each column of the matrix in FIG. 3 computed separately. For example, the check code of the first row of the matrix in FIG. 3 is RS_1, that of the second row is RS_2, that of the third row is RS_3, ..., and that of the Nth row is RS_n; the check code of the first column is RS_{n+1}, that of the second column is RS_{n+2}, that of the third column is RS_{n+3}, ..., and that of the Nth column is RS_{n+n}.
It should be noted that the algorithm used to compute check codes is not limited to erasure-coding algorithms; hash algorithms of other types may also be used.
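The row and column check codes can be sketched with a hash in place of an erasure code, which the text allows. This is an illustration under assumptions: each file set is modeled as a byte string, SHA-256 is the chosen hash, the number of sets does not exceed n*n, and the names `build_matrix` and `check_codes` are hypothetical. A hash detects a mismatch but, unlike an erasure code, cannot repair it.

```python
import hashlib


def build_matrix(file_sets, n, pad):
    """Arrange file sets row by row into an n*n matrix, padding the tail."""
    cells = list(file_sets) + [pad] * (n * n - len(file_sets))
    return [cells[r * n:(r + 1) * n] for r in range(n)]


def check_codes(matrix):
    """Return (row_codes, col_codes): one SHA-256 hex digest per row/column."""
    n = len(matrix)
    rows = [hashlib.sha256(b"".join(matrix[r])).hexdigest() for r in range(n)]
    cols = [
        hashlib.sha256(b"".join(matrix[r][c] for r in range(n))).hexdigest()
        for c in range(n)
    ]
    return rows, cols


matrix = build_matrix([b"aaaa", b"bbbb", b"cccc"], n=2, pad=b"\x00" * 4)
rows, cols = check_codes(matrix)
```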
When an erasure-coding algorithm is used, for example the RS algorithm with N (Data Shards) = 50 and Parity Shards = 2, erasure and check information amounting to (2/50)*2 = 8% of the original file is produced, and the original deposited data can still be recovered after any 4 file sets in the matrix have been tampered with (at most 2 file sets per row or column).
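The 8% figure can be checked with a small calculation. The sketch assumes, as the row-plus-column scheme suggests, that every row and every column of data shards receives the same number of parity shards; the function name `parity_overhead` is hypothetical.

```python
from fractions import Fraction


def parity_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Parity size relative to the original data for row + column coding."""
    per_direction = Fraction(parity_shards, data_shards)  # rows only
    return 2 * per_direction  # rows and columns together


overhead = parity_overhead(data_shards=50, parity_shards=2)
```

With 50 data shards and 2 parity shards, `overhead` equals 2/25, that is, 8% of the original file size.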
With the above method, a file is deposited by chunk-wise check-code deposit, and check codes and positioning codes are introduced so that errors can be corrected at verification time. The file is first divided into chunks, which are arranged in order into n*n two-dimensional matrices; a check code is computed for each row and each column of each matrix, and the first element of each matrix serves as the positioning code. By writing the file digest, the check codes, and the positioning file data information to the blockchain, it is possible, at a controllable storage cost, to determine which file sets of a large file have been modified and which are original content, while still providing proof of non-tampering for the parts that are correct or not lost.
Embodiment 2
FIG. 4 is a schematic flowchart of a file deposit method provided in an embodiment of this specification.
In the verification stage, it is possible to detect which fragments of a large file have been modified, and proof of non-modification can still be provided for the unmodified fragments. Further, if only a small number of fragments were modified, the verification stage can indicate how those fragments were modified.
As shown in FIG. 4, the process may include the following steps: Step 410: Obtain, from the blockchain network, the positioning file data information and the check code corresponding to the file to be verified.
The check code may be computed over multiple file sets corresponding to a previously deposited target file. The file sets may be obtained by splitting the target file in a preset splitting manner into multiple pieces of file data and dividing those pieces in a preset division manner. The positioning file data information may include the position, within the target file, of the positioning file data in the file set, and the content of the positioning file data.
For the details involved in this step, see the explanation in Embodiment 1; they are not repeated here.
It should be noted that, in the verification stage, the stored deposit data corresponding to the file to be verified is obtained from the blockchain network according to the file identifier of the file to be verified. The deposit data may include the digest value, the check codes, and the positioning file data information. The digest value may be computed over the previously deposited target file.
Step 420: Determine, based on the positioning file data information, the multiple file sets corresponding to the file to be verified.
In the verification stage, the positioning file data information stored in the blockchain network can be used, together with the preset division manner used at deposit time, to determine the file sets corresponding to the file to be verified.
More specifically, the position information in the positioning file data information and the content of the positioning file data can approximately locate the elements of the file sets that the file to be verified had at deposit time. For example, suppose the obtained positioning file data information shows that the first element of each file set was used as the positioning code at deposit time. That first element can then be found in the file to be verified and taken as the first element of the first set. Then, given the fixed number of elements per set (for example, 9), 9 elements are taken in order starting from that first element to form the first file set, and so on, to find the file sets to which the file to be verified may correspond. The file sets corresponding to the file to be verified are thereby determined.
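The lookup described above can be sketched as a search for the positioning code followed by a fixed-length slice. The function name `locate_set`, the one-byte elements, and the use of the raw element as the positioning code are assumptions for the example.

```python
def locate_set(data: bytes, code: bytes, element_size: int, per_set: int):
    """Find the positioning code and return (start, bytes of that file set).

    Returns None when the code does not occur in the data.
    """
    start = data.find(code)
    if start < 0:
        return None
    end = start + element_size * per_set
    return start, data[start:end]


# 9 one-byte elements per set; the set begins at the positioning code b"A".
data = b"xx" + b"ABCDEFGHI" + b"yy"
found = locate_set(data, code=b"A", element_size=1, per_set=9)
```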
Step 430: Compute the check codes corresponding to the multiple file sets.
When computing the check codes of the file sets corresponding to the file to be verified, the file sets can first be arranged in matrix form, and the check code of each row of file sets and of each column of file sets in the matrix then computed.
Step 440: Compare the computed check codes corresponding to the multiple file sets with the check codes obtained from the blockchain network to obtain a comparison result.
Step 450: Verify the file to be verified based on the comparison result.
Comparing check codes reveals whether the file sets in a particular row and/or column of the matrix have been tampered with. The computed check codes can therefore be compared with those obtained from the blockchain network to locate the tampered file sets.
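Locating a tampered file set from mismatched row and column check codes can be sketched as an intersection. The function name `locate_tampered` and the placeholder code strings are assumptions; with several tampered cells, the intersection yields candidates that may include cells that were not actually changed.

```python
def locate_tampered(stored_rows, stored_cols, new_rows, new_cols):
    """Return candidate (row, col) cells whose row and column codes both differ."""
    bad_rows = [r for r, (a, b) in enumerate(zip(stored_rows, new_rows)) if a != b]
    bad_cols = [c for c, (a, b) in enumerate(zip(stored_cols, new_cols)) if a != b]
    return [(r, c) for r in bad_rows for c in bad_cols]


# Row 1 and column 2 no longer match the stored codes.
candidates = locate_tampered(
    ["r0", "r1", "r2"], ["c0", "c1", "c2"],
    ["r0", "XX", "r2"], ["c0", "c1", "YY"],
)
```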
In the method of FIG. 4, the positioning file data information and the check codes corresponding to the file to be verified are obtained from the blockchain network; the file sets corresponding to the file to be verified are determined based on the positioning file data information; check codes for those file sets are computed; the computed check codes are compared with the check codes obtained from the blockchain network to obtain a comparison result; and the file to be verified is verified based on the comparison result. The deposit data recorded in the blockchain network thus identifies which file sets of the file to be verified were tampered with, and so locates the tampered data.
Based on the method of FIG. 4, the embodiments of this specification also provide some specific implementations of the method, described below.
Optionally, obtaining the positioning file data information and the check code corresponding to the file to be verified from the blockchain network may specifically include: obtaining identification information corresponding to the file to be verified; and obtaining, based on the identification information, the positioning file data information and the check code corresponding to the file to be verified from the blockchain network.
It should be noted that the identification information may be information that uniquely identifies the file to be verified; based on it, the deposit data corresponding to the file to be verified can be obtained from the blockchain network. Further, to protect the deposit data, before the deposit data is obtained based on the identification information, the requester may be required to provide an authorization statement, which may be issued by the holder of the file to be verified. Specifically, the authorization statement may carry the holder's digital signature. On receiving a deposit data acquisition request, the blockchain network can review the authorization statement carried in the request and, once the review passes, provide the requester with the deposit data corresponding to the file to be verified based on the identification information in the request.
Optionally, before determining the file sets corresponding to the file to be verified based on the positioning file data information, the method may further include: computing the digest value of the file to be verified with the hash algorithm used at deposit time; comparing the computed digest value with the digest value in the deposit data; when the two match, determining that the file to be verified has not been tampered with; and when they do not match, determining the file sets corresponding to the file to be verified based on the positioning file data information.
In practice, when verifying whether a file has been tampered with and which data was tampered with, it can first be determined whether the file was tampered with at all by comparing digest values: the digest value of the file to be verified is computed and compared with the digest value of the target file obtained from the blockchain network. If they match, the file to be verified is the target file as deposited and has not been tampered with, so the subsequent steps for locating tampered data are unnecessary. Conversely, if they do not match, the file to be verified has been tampered with, and the specific tampered data in it can be located further.
Optionally, determining the file sets corresponding to the file to be verified based on the positioning file data information may specifically include: determining a first positioning file data set based on the position information; determining, from the first positioning file data set and based on a dynamic programming algorithm, a second positioning file data set whose coverage satisfies a preset condition; comparing the second positioning file data set with the pre-stored positioning file data information to determine the inconsistent positioning file data information; determining, according to the inconsistent positioning file data information, a third positioning file data set such that the matching rate between the third positioning file data set and the pre-stored positioning file data is maximized; and determining, based on the third positioning file data set, the file sets corresponding to the file to be verified.
It should be noted that the first positioning file data set may represent all candidate positioning code sets determined at the fixed interval; the second positioning file data set is the positioning code set, determined by the dynamic programming algorithm, whose matrix coverage satisfies the preset condition; and the third positioning file data set is the final positioning code set obtained after supplementary positioning is applied to the second positioning file data set.
When determining the file sets based on the positioning file data information, the goal is to obtain the positioning codes from the deposit data and make the matrices that the positioning codes determine in the verified file agree as closely as possible with the matrices in the original file. The process can be divided into preliminary positioning and supplementary positioning. Preliminary positioning searches the file for sequences that match the positioning codes at the fixed interval between file sets used at deposit time, so that the matrices determined by positioning codes cover as much of the verified file as possible. It should be noted that, in practice, when several arrangements give the same coverage, the overall matching rates of the matrices determined by these arrangements can be compared further, and the arrangement of positioning codes with the highest matching rate selected.
Specifically, the matching rate of each matrix = (number of unmodified file sets in the matrix) / (total number of file sets in the matrix). This is explained with reference to FIG. 5, a schematic diagram of preliminary positioning based on positioning file data information provided in an embodiment of this specification.
As shown in FIG. 5, in the verified file, the intervals between A11 and B11, among E11, F11, and G11, and between H11 and I11 all equal the fixed interval used at deposit time. Between B11 and E11 there are unassociated positioning codes C11 and D11, as well as data not covered by any matrix determined by a positioning code. In addition, positioning code H11 lies inside the matrix determined by positioning code G11.
Supplementary positioning complements preliminary positioning. When, between two determined positioning codes in the verified file, there are positioning codes in the sequence that have not been determined, supplementary positioning inserts those codes between the two determined positioning codes so that the matching rate of each matrix is as high as possible. This is explained with reference to FIG. 6, a schematic diagram of supplementary positioning based on positioning file data information provided in an embodiment of this specification.
As shown in FIG. 6, between positioning codes B11 and E11, the matrices determined by the two positioning codes C11 and D11 are inserted by supplementary positioning. For example, denote the positioning codes a_1, a_2, ..., a_n, let the size of the matrix determined by a positioning code at deposit time be L bytes, and let the byte stream of the verified file be d_1, d_2, ..., d_m. Traverse the file byte by byte. If [d_i, d_{i+1}, d_{i+2}, d_{i+3}] == a_j (assuming a file set is 4 bytes), [d_{i+L}, d_{i+L+1}, d_{i+L+2}, d_{i+L+3}] == a_{j+1}, and [d_{i+n*L}, d_{i+n*L+1}, d_{i+n*L+2}, d_{i+n*L+3}] != a_{j+n}, then a_j, a_{j+1}, ..., a_{j+n-1} is a valid positioning interval on [d_i, d_{i+1}, ..., d_{i+(n-1)*L+3}]. Sort all valid intervals on the verified file in ascending order of the byte-stream position of their terminating positioning code (d_{i+(n-1)*L+3}) as S_1, S_2, ..., S_n, with lengths X_1, X_2, ..., X_n, where length = number of positioning codes - 1. Denote the starting positioning code index of each interval as a_1, a_2, ..., a_n and its end position in the verified file as d_1, d_2, ..., d_n. The problem then becomes selecting from S an interval sequence [S_R1, S_R2, ..., S_Rm] that satisfies a_i + X_i ≤ a_{i+1} and maximizes X_R1 + X_R2 + ... + X_Rm. When the constraint a_i + X_i ≤ a_{i+1} is ignored, this problem can be solved with the following dynamic programming.
Specifically, this makes it possible to detect cases where the order of data in the verified file has been changed. Let dp[i] be the maximum coverage length up to the i-th byte of the verified file, and let S_k be the set of all valid intervals that end at position i; the state transition equation is then given by the first formula below. When the condition a_i + X_i ≤ a_{i+1} is added, the problem can be solved with the two-dimensional dynamic programming given by the second formula below.
Figure PCTCN2022086300-appb-000001
Figure PCTCN2022086300-appb-000001
In this way, an interval sequence $[S_{R_1}, S_{R_2}, \ldots, S_{R_m}]$ in $S$ can be obtained that satisfies $a_i + X_i \le a_{i+1}$ and maximizes $X_{R_1} + X_{R_2} + \ldots + X_{R_m}$. In other words, it is an arrangement of the positioning codes in the original file, consistent with the fixed spacing of positioning codes at evidence-storage time, under which the matrices determined by the positioning codes cover as much of the content of the file being verified as possible.
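The selection problem can be illustrated with a small chain-style dynamic program. This is a sketch under stated assumptions, not the patent's state-transition equation: it works over intervals rather than byte positions, it additionally assumes that selected intervals must not overlap in the byte stream, and the tuple layout and the name `select_intervals` are hypothetical.

```python
from typing import List, Tuple

# (first_code, start_byte, end_byte, length); layout is an assumption of this sketch.
Interval = Tuple[int, int, int, int]


def select_intervals(intervals: List[Interval]) -> List[Interval]:
    """Pick a chain of intervals maximizing total length.

    Consecutive picks (t, k) must satisfy:
      * end_t < start_k          (assumed: no overlap in the byte stream)
      * a_t + X_t <= a_k         (the code-order condition from the text)
    """
    ordered = sorted(intervals, key=lambda s: s[2])  # ascending end position
    if not ordered:
        return []
    best = [s[3] for s in ordered]   # best total length of a chain ending at k
    prev = [-1] * len(ordered)       # predecessor of k in that chain
    for k, (a_k, start_k, _, x_k) in enumerate(ordered):
        for t in range(k):
            a_t, _, end_t, x_t = ordered[t]
            if end_t < start_k and a_t + x_t <= a_k and best[t] + x_k > best[k]:
                best[k] = best[t] + x_k
                prev[k] = t
    # Walk back from the best chain end to recover the selected sequence.
    k = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]
```

Because each constraint only relates a selected interval to its immediate predecessor, tracking the best chain ending at each interval is sufficient; the cost is quadratic in the number of valid intervals.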
Denote the positioning codes in the interval that have not yet been determined as $a_1, a_2, \ldots, a_n$, and let the size of the matrix determined by one positioning code at evidence-storage time be $L$ bytes. Step 1: search the interval for the first positioning code $a_1$ (if it is not found, search for $a_2$; if none is found, skip supplementary positioning). Step 2: search to the left and right of position $a_1 + L$ for $a_2$ (if it is not found, search for $a_3$; the search extends left as far as the position of $a_1$ and right as far as the end of the interval). Step 3: repeat step 2 until the interval ends or no undetermined positioning code remains. The positions found in these three steps are the supplementarily determined positioning codes.
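The three steps of supplementary positioning can be sketched as below. The function names are hypothetical, and two details are assumptions where the text is silent: when a code is skipped, the expected position of the next one advances by one further $L$, and among several candidates the occurrence closest to the expected position is chosen.

```python
from typing import Dict, List


def _nearest(data: bytes, code: bytes, lo: int, hi: int, expected: int) -> int:
    """Offset of the occurrence of code in data[lo:hi] closest to expected, or -1."""
    best = -1
    pos = data.find(code, lo, hi)
    while pos != -1:
        if best == -1 or abs(pos - expected) < abs(best - expected):
            best = pos
        pos = data.find(code, pos + 1, hi)
    return best


def supplement_positioning(data: bytes, codes: List[bytes],
                           lo: int, hi: int, L: int) -> Dict[int, int]:
    """Map index of each undetermined code found in data[lo:hi] to its byte offset."""
    found: Dict[int, int] = {}
    prev_pos = prev_idx = None
    for idx, code in enumerate(codes):
        if prev_pos is None:
            # Step 1: look for the first code that can be found in the interval.
            pos = data.find(code, lo, hi)
        else:
            # Step 2: look around the expected position, bounded on the left by
            # the previous code and on the right by the end of the interval.
            expected = prev_pos + (idx - prev_idx) * L
            pos = _nearest(data, code, prev_pos + len(code), hi, expected)
        if pos != -1:
            found[idx] = pos
            prev_pos, prev_idx = pos, idx
    # Step 3 is the loop itself: it ends when no undetermined code remains.
    return found
```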
Optionally, calculating the check codes corresponding to the plurality of file sets may specifically include: arranging the N file sets in the form of a matrix to obtain a matrix of N rows and M columns; using an erasure-coding algorithm to generate a corresponding row check code for the file sets in each row of the matrix; and generating a corresponding column check code for the file sets in each column of the matrix.
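As a rough illustration of per-row and per-column check codes, the sketch below uses a truncated SHA-256 hash in place of an erasure code. That substitution is an assumption made for brevity: a hash only detects changes, while an erasure code such as Reed-Solomon would also allow recovery. The names `check_code` and `row_col_check_codes` are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple


def check_code(file_sets: Sequence[bytes]) -> bytes:
    """Check code over one row or column (stand-in: first 8 bytes of SHA-256)."""
    h = hashlib.sha256()
    for file_set in file_sets:
        h.update(file_set)
    return h.digest()[:8]


def row_col_check_codes(matrix: List[List[bytes]]) -> Tuple[List[bytes], List[bytes]]:
    """Return (row check codes, column check codes) for a matrix of file sets."""
    rows = [check_code(row) for row in matrix]
    cols = [check_code(col) for col in zip(*matrix)]  # zip(*m) yields the columns
    return rows, cols
```

Changing a single file set alters exactly one row code and one column code, which is what makes the later cross-comparison possible.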
When calculating the check codes corresponding to the plurality of file sets, the matrices corresponding to the file sets can first be determined, and then the check code of each row and each column of each matrix can be calculated. Specifically, starting from each positioning code (for example A11, B11, C11), the data following the positioning code is divided according to the file-set division used at evidence-storage time (for example, 4 bytes per file set) until the next positioning code is reached. If the data before the next positioning code cannot fill a whole file set, the remainder of the last file set is filled with the fixed byte used at evidence-storage time, such as \x00.
The N*N matrix used at evidence-storage time is restored from each positioning code and its corresponding file sets. If, up to the next positioning code, the number of file sets exceeds the number of file sets in the N*N evidence matrix, the extra file sets are marked as inserted data. If there are too few file sets, the subsequent file sets are filled with the fixed byte determined at evidence-storage time, such as \x00, to complete the N*N matrix, and the absent file sets are marked as missing data.
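The splitting, padding and matrix restoration just described might look like the following sketch. The function name `rebuild_matrix`, its return shape, and the defaults (4-byte file sets, \x00 padding) are illustrative assumptions taken from the examples in the text.

```python
from typing import List, Tuple


def rebuild_matrix(segment: bytes, n: int, set_size: int = 4,
                   pad: bytes = b"\x00") -> Tuple[List[List[bytes]], List[bytes], int]:
    """Rebuild one n*n matrix from the bytes between two positioning codes.

    Returns (matrix, inserted_file_sets, missing_count).
    """
    # Split the segment into fixed-size file sets, padding the last one if short.
    file_sets = [segment[i:i + set_size] for i in range(0, len(segment), set_size)]
    if file_sets and len(file_sets[-1]) < set_size:
        file_sets[-1] = file_sets[-1].ljust(set_size, pad)

    cells = n * n
    inserted = file_sets[cells:]              # surplus sets are marked as inserted
    missing = max(0, cells - len(file_sets))  # absent sets are marked as missing
    file_sets = file_sets[:cells] + [pad * set_size] * missing

    matrix = [file_sets[r * n:(r + 1) * n] for r in range(n)]
    return matrix, inserted, missing
```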
Then, when calculating the check codes of the plurality of file sets corresponding to the file to be verified, the check-code algorithm or hash function used at evidence-storage time is used to calculate the check code of each row and each column of each matrix, and the results are compared with the check codes saved at evidence-storage time to locate the modified file sets. This can be described with reference to FIG. 6. FIG. 7 is a schematic diagram of check-code-based verification provided by an embodiment of this specification.
As shown in FIG. 7, for example, when the check codes of the matrix are calculated and compared with the check codes obtained from the blockchain network, and the values of $RS_1$, $RS_3$, $RS_{n+1}$ and $RS_{n+n}$ are found to differ from the check codes at evidence-storage time, this may indicate that the contents of the file sets $A_{11}$, $A_{1n}$, $A_{31}$ and $A_{3n}$ in the matrix differ from those at evidence-storage time.
Further, comparing the calculated check codes corresponding to the plurality of file sets with the check codes obtained from the blockchain network to obtain a comparison result may specifically include: for the file sets in the i-th row of the matrix, obtaining from the blockchain network the row check code corresponding to the file sets of the i-th row; comparing the calculated row check code of the i-th row with the row check code of the i-th row obtained from the blockchain to obtain a row check code comparison result; for the file sets in the j-th column of the matrix, obtaining from the blockchain network the column check code corresponding to the file sets of the j-th column; and comparing the calculated column check code of the j-th column with the column check code of the j-th column obtained from the blockchain to obtain a column check code comparison result.
It should be noted that, after the check codes of each row of the matrix and the check codes of each column of the matrix have been compared, the specific tampered data can be determined. For example, if the row comparison shows that the check code of the 3rd row is inconsistent and the column comparison shows that the check code of the 2nd column is inconsistent, it can be determined that the data at row 3, column 2 of the matrix has been tampered with.
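The row and column cross-check can be expressed in a few lines. The function name `locate_modified` is hypothetical and indices are 0-based; note that when several rows and several columns mismatch at once, the intersections are candidate positions, as in the FIG. 7 example where two rows and two columns yield four file sets.

```python
from typing import List, Sequence, Tuple


def locate_modified(stored_rows: Sequence, stored_cols: Sequence,
                    rows: Sequence, cols: Sequence) -> List[Tuple[int, int]]:
    """Return (row, column) positions where both the row and column codes differ."""
    bad_rows = [i for i, (a, b) in enumerate(zip(stored_rows, rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(stored_cols, cols)) if a != b]
    return [(i, j) for i in bad_rows for j in bad_cols]
```

With a mismatch in the third row code and the second column code, the sketch reports the single 0-based position (2, 1), that is, row 3, column 2.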
Optionally, if the algorithm used to calculate the check codes is an erasure-coding algorithm, part of the modified data can be recovered from the check codes. For example, when the RS algorithm is used with N (Data Shards) = 50 and Parity Shards = 2, then whenever the number of modified file sets in a given row or column is less than or equal to 2, the modified file sets in that row or column can be recovered by the RS algorithm from the other values of the row or column and its check code.
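To show the recovery idea in miniature, the sketch below uses a single XOR parity block instead of Reed-Solomon. This is a deliberately simplified stand-in and an assumption of this example: one XOR parity can rebuild only one file set per row at a known position, whereas the RS configuration in the text (2 parity shards) tolerates up to 2. The function names are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import List


def xor_parity(file_sets: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized file sets (a one-shard erasure code)."""
    return bytes(reduce(xor, column) for column in zip(*file_sets))


def recover_file_set(row: List[bytes], parity: bytes, bad_index: int) -> bytes:
    """Rebuild the file set at bad_index from the other sets and the stored parity."""
    others = [fs for k, fs in enumerate(row) if k != bad_index]
    return xor_parity(others + [parity])
```

The position of the damaged file set is supplied by the row and column comparison, which is why an erasure code, which needs known error positions, is sufficient here.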
The two embodiments above can achieve the following technical effects. 1) In a scheme that stores only the hash of a large file on chain as evidence, the verifier cannot learn which part of the file has been tampered with and obtains only the final result of whether the file as a whole matches the original. The technical solutions of the embodiments of this specification can verify each file set of a single large file independently and, at a controllable storage cost, determine which file sets in the large file have been modified and which are original content. During verification, the modified file sets can therefore be found directly, while the remaining content can still be proven not to have been tampered with.
2) Because on-chain hash evidence does not store the original file, the source file is hosted in an external system. With a very small probability, the external system may partially damage the file; for example, with disk storage, a flipped magnetic pole can cause data errors, and a bad sector can cause data loss. The solutions in the embodiments of this specification can still provide a proof of non-tampering for the parts that are correct or not lost. Furthermore, when the tampered or lost data makes up only a small part of the overall data, the lost or tampered data can be restored from the stored evidence information.
Based on the same idea, the embodiments of this specification further provide an apparatus corresponding to the above method. FIG. 8 is a schematic structural diagram of a file evidence-storage apparatus provided by an embodiment of this specification. As shown in FIG. 8, the apparatus may include: a target file acquisition module 810, configured to acquire a target file to be stored as evidence; a file data determination module 820, configured to split the target file according to a preset splitting manner to obtain a plurality of pieces of file data; a file set division module 830, configured to divide the plurality of pieces of file data into a plurality of file sets according to a preset division manner, each file set containing m elements; a check code calculation module 840, configured to calculate the check codes corresponding to the plurality of file sets; a positioning file data information determination module 850, configured to determine the positioning file data information in each file set, the positioning file data information including the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and a data storage module 860, configured to store the check codes and the positioning file data information in a blockchain network.
Based on the apparatus of FIG. 8, the embodiments of this specification further provide some specific implementations of the method, which are described below.
Optionally, the apparatus may further include: a digest value calculation module, configured to calculate a digest value of the target file using a hash algorithm; and a digest value storage module, configured to store the digest value in the blockchain network.
Optionally, the data storage module 860 may specifically include: an evidence data determination unit, configured to splice the digest value, the check codes and the positioning file data information according to a preset splicing manner to obtain the evidence data corresponding to the target file; and an evidence data storage unit, configured to store the evidence data in the blockchain network.
Optionally, the evidence data storage unit may be specifically configured to: generate, based on the evidence data, an evidence certificate containing authentication information; and send the evidence data containing the evidence certificate to the blockchain network for storage.
Optionally, the file set division module 830 may specifically include: a division unit, configured to divide the plurality of pieces of file data into N file sets according to a preset number of files per file set, the elements in each file set being ordered.
Optionally, the apparatus may further include: a file set arrangement module, configured to arrange the N file sets in the form of a matrix to obtain a matrix of N rows and M columns. The check code calculation module specifically includes: a check code calculation unit, configured to use an erasure-coding algorithm to generate a corresponding check code for each row of the matrix corresponding to the plurality of file sets, and to generate a corresponding check code for the file sets in each column of that matrix.
Optionally, the positioning file data information determination module 850 may specifically include: a positioning file data information determination unit, configured to determine the element at a specific position in each group of file sets as positioning file data, to obtain a positioning file data set.
Based on the same idea, the embodiments of this specification further provide an apparatus corresponding to the above method. FIG. 9 is a schematic structural diagram of a file verification apparatus provided by an embodiment of this specification. As shown in FIG. 9, the apparatus may include: a data acquisition module 910, configured to obtain from a blockchain network the positioning file data information and the check codes corresponding to a file to be verified, where the check codes are calculated over the plurality of file sets corresponding to a target file stored as evidence in advance, the plurality of file sets are obtained by splitting the target file according to a preset splitting manner into a plurality of pieces of file data and dividing those pieces according to a preset division manner, and the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; a file set determination module 920, configured to determine, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified; a check code calculation module 930, configured to calculate the check codes corresponding to the plurality of file sets; a comparison module 940, configured to compare the calculated check codes with the check codes obtained from the blockchain network to obtain a comparison result; and a verification module 950, configured to verify the file to be verified based on the comparison result.
Based on the apparatus of FIG. 9, the embodiments of this specification further provide some specific implementations of the method, which are described below.
Optionally, the data acquisition module 910 may specifically include: an identification information acquisition unit, configured to acquire identification information corresponding to the file to be verified; and a data acquisition unit, configured to obtain from the blockchain network, based on the identification information, the positioning file data information and the check codes corresponding to the file to be verified.
Optionally, the apparatus may further include: a digest value calculation module, configured to calculate the digest value of the file to be verified using the hash algorithm used at evidence-storage time; a digest value comparison module, configured to compare the calculated digest value of the file to be verified with the digest value in the evidence data; and a tampering determination unit, configured to determine that the file to be verified has not been tampered with when the calculated digest value is consistent with the digest value in the evidence data, and, when the two are inconsistent, to determine the plurality of file sets corresponding to the file to be verified based on the positioning file data information.
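The digest-first flow of these modules can be sketched as follows. SHA-256 is an assumption (the text only requires the same hash algorithm as at evidence-storage time), and the function name and the returned labels are hypothetical.

```python
import hashlib


def verify_digest_first(data: bytes, stored_digest_hex: str) -> str:
    """Cheap whole-file check that decides whether per-file-set checks are needed."""
    if hashlib.sha256(data).hexdigest() == stored_digest_hex:
        return "untampered"
    # Digests differ: fall through to positioning-code based file-set verification.
    return "needs-file-set-verification"
```

Running the whole-file digest first keeps the common case, an unmodified file, at the cost of a single hash; the row and column check codes are consulted only on a mismatch.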
Optionally, the file set determination module 920 may specifically include: a first positioning file data set determination unit, configured to determine a first positioning file data set based on the position information; a second positioning file data set determination unit, configured to determine, based on a dynamic programming algorithm, a second positioning file data set from the first positioning file data set whose coverage satisfies a preset condition; a positioning file data information comparison unit, configured to compare the second positioning file data set with the pre-stored positioning file data information to determine the inconsistent positioning file data information; a third positioning file data set determination unit, configured to determine a third positioning file data set according to the inconsistent positioning file data information, such that the matching rate between the third positioning file data set and the pre-stored positioning file data is maximized; and a file set determination unit, configured to determine, based on the third positioning file data set, the plurality of file sets corresponding to the file to be verified.
Optionally, the check code calculation module 930 may specifically include: a file set arrangement unit, configured to arrange the N file sets in the form of a matrix to obtain a matrix of N rows and M columns; and a check code generation unit, configured to use an erasure-coding algorithm to generate a corresponding row check code for the file sets in each row of the matrix and a corresponding column check code for the file sets in each column of the matrix.
Optionally, the comparison module 940 may be specifically configured to: for the file sets in the i-th row of the matrix, obtain from the blockchain network the row check code corresponding to the file sets of the i-th row; compare the calculated row check code of the i-th row with the row check code of the i-th row obtained from the blockchain to obtain a row check code comparison result; for the file sets in the j-th column of the matrix, obtain from the blockchain network the column check code corresponding to the file sets of the j-th column; and compare the calculated column check code of the j-th column with the column check code of the j-th column obtained from the blockchain to obtain a column check code comparison result.
Based on the same idea, the embodiments of this specification further provide a device corresponding to the above method.
FIG. 10 is a schematic diagram of a file evidence-storage and verification device provided by an embodiment of this specification. As shown in FIG. 10, the device 1000 may include: at least one processor 1010; and a memory 1030 communicatively connected to the at least one processor, where the memory 1030 stores instructions 1020 executable by the at least one processor 1010. Corresponding to Embodiment 1, the instructions 1020 are executed by the at least one processor 1010 to enable the at least one processor 1010 to: acquire a target file to be stored as evidence; split the target file according to a preset splitting manner to obtain a plurality of pieces of file data; divide the plurality of pieces of file data into a plurality of file sets according to a preset division manner, each file set containing m elements; calculate the check codes corresponding to the plurality of file sets; determine the positioning file data information in each file set, the positioning file data information including the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and store the check codes and the positioning file data information in a blockchain network.
Corresponding to Embodiment 2, the instructions 1020 are executed by the at least one processor 1010 to enable the at least one processor 1010 to: obtain from a blockchain network the positioning file data information and the check codes corresponding to a file to be verified, where the check codes are calculated over the plurality of file sets corresponding to a target file stored as evidence in advance, the plurality of file sets are obtained by splitting the target file according to a preset splitting manner into a plurality of pieces of file data and dividing those pieces according to a preset division manner, and the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; determine, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified; calculate the check codes corresponding to the plurality of file sets; compare the calculated check codes with the check codes obtained from the blockchain network to obtain a comparison result; and verify the file to be verified based on the comparison result.
Based on the same idea, the embodiments of this specification further provide a computer-readable medium corresponding to the above method. The computer-readable medium stores computer-readable instructions. Corresponding to Embodiment 1, the computer-readable instructions can be executed by a processor to implement the following method: acquiring a target file to be stored as evidence; splitting the target file according to a preset splitting manner to obtain a plurality of pieces of file data; dividing the plurality of pieces of file data into a plurality of file sets according to a preset division manner, each file set containing m elements; calculating the check codes corresponding to the plurality of file sets; determining the positioning file data information in each file set, the positioning file data information including the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; and storing the check codes and the positioning file data information in a blockchain network.
Corresponding to Embodiment 2, the computer-readable instructions can be executed by a processor to implement the following method: obtaining from a blockchain network the positioning file data information and the check codes corresponding to a file to be verified, where the check codes are calculated over the plurality of file sets corresponding to a target file stored as evidence in advance, the plurality of file sets are obtained by splitting the target file according to a preset splitting manner into a plurality of pieces of file data and dividing those pieces according to a preset division manner, and the positioning file data information includes the position information, in the target file, of the positioning file data in the file set and the content information of the positioning file data; determining, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified; calculating the check codes corresponding to the plurality of file sets; comparing the calculated check codes with the check codes obtained from the blockchain network to obtain a comparison result; and verifying the file to be verified based on the comparison result.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiments.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development, and the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of units divided by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The above descriptions are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (24)

  1. A file evidence storage method, comprising:
    obtaining a target file to be stored as evidence;
    splitting the target file according to a preset splitting manner to obtain a plurality of pieces of file data;
    dividing the plurality of pieces of file data into a plurality of file sets according to a preset division manner, each of the file sets containing m elements;
    calculating check codes corresponding to the plurality of file sets;
    determining positioning file data information in each of the file sets, the positioning file data information comprising position information, in the target file, of the positioning file data in the file set and content information of the positioning file data; and
    storing the check codes and the positioning file data information in a blockchain network.
  2. The method according to claim 1, further comprising:
    calculating a digest value of the target file using a hash algorithm; and
    storing the digest value in the blockchain network.
  3. The method according to claim 2, wherein storing the check codes and the positioning file data information in the blockchain network specifically comprises:
    concatenating the digest value, the check codes, and the positioning file data information according to a preset concatenation manner to obtain evidence data corresponding to the target file; and
    storing the evidence data in the blockchain network.
  4. The method according to claim 3, wherein storing the evidence data in the blockchain network specifically comprises:
    generating, based on the evidence data, an evidence certificate containing authentication information; and
    sending the evidence data containing the evidence certificate to the blockchain network for storage.
  5. The method according to claim 1, wherein dividing the plurality of pieces of file data into a plurality of file sets according to a preset division manner specifically comprises:
    dividing the plurality of pieces of file data into N file sets according to a preset number of files per file set, the elements in each of the file sets being ordered.
  6. The method according to claim 5, further comprising, before calculating the check codes corresponding to the plurality of file sets:
    arranging the N file sets in the form of a matrix to obtain a matrix of N rows and M columns;
    wherein calculating the check codes corresponding to the plurality of file sets specifically comprises:
    using an erasure coding algorithm, generating a corresponding check code for each row of the matrix corresponding to the plurality of file sets, and generating a corresponding check code for each column of the matrix corresponding to the plurality of file sets.
  7. The method according to claim 1, wherein determining the positioning file data information in each of the file sets specifically comprises:
    determining the element at a specific position in each of the file sets as positioning file data, to obtain a positioning file data set.
  8. A file verification method, comprising:
    obtaining, from a blockchain network, positioning file data information and check codes corresponding to a file to be verified, wherein the check codes are obtained by calculation over a plurality of file sets corresponding to a target file previously stored as evidence; the plurality of file sets are obtained by splitting the target file according to a preset splitting manner to obtain a plurality of pieces of file data and dividing the plurality of pieces of file data according to a preset division manner; and the positioning file data information comprises position information, in the target file, of the positioning file data in the file sets and content information of the positioning file data;
    determining, based on the positioning file data information, a plurality of file sets corresponding to the file to be verified;
    calculating check codes corresponding to the plurality of file sets;
    comparing the calculated check codes corresponding to the plurality of file sets with the check codes obtained from the blockchain network to obtain a comparison result; and
    verifying the file to be verified based on the comparison result.
  9. The method according to claim 8, wherein obtaining, from the blockchain network, the positioning file data information and check codes corresponding to the file to be verified specifically comprises:
    obtaining identification information corresponding to the file to be verified; and
    obtaining, from the blockchain network based on the identification information, the positioning file data information and check codes corresponding to the file to be verified.
  10. The method according to claim 8, further comprising, before determining, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified:
    calculating a digest value of the file to be verified using the hash algorithm that was used at evidence storage time;
    comparing the calculated digest value of the file to be verified with the digest value in the evidence data;
    when the calculated digest value of the file to be verified is consistent with the digest value in the evidence data, determining that the file to be verified has not been tampered with; and
    when the calculated digest value of the file to be verified is inconsistent with the digest value in the evidence data, determining, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified.
  11. The method according to claim 8, wherein determining, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified specifically comprises:
    determining a first positioning file data set based on the position information;
    determining, based on a dynamic programming algorithm, from the first positioning file data set, a second positioning file data set whose coverage satisfies a preset condition;
    comparing the second positioning file data set with the pre-stored positioning file data information to determine inconsistent positioning file data information;
    determining a third positioning file data set according to the inconsistent positioning file data information, such that the matching rate between the third positioning file data set and the pre-stored positioning file data is maximized; and
    determining, based on the third positioning file data set, the plurality of file sets corresponding to the file to be verified.
  12. The method according to claim 11, wherein calculating the check codes corresponding to the plurality of file sets specifically comprises:
    arranging the N file sets in the form of a matrix to obtain a matrix of N rows and M columns; and
    using an erasure coding algorithm, generating a corresponding row check code for the plurality of file sets corresponding to each row of the matrix, and generating a corresponding column check code for the plurality of file sets corresponding to each column of the matrix.
  13. The method according to claim 12, wherein comparing the calculated check codes corresponding to the plurality of file sets with the check codes obtained from the blockchain network to obtain a comparison result specifically comprises:
    for the plurality of file sets in the i-th row of the matrix, obtaining, from the blockchain network, the row check code corresponding to the plurality of file sets in the i-th row;
    comparing the calculated row check code corresponding to the plurality of file sets in the i-th row with the row check code, obtained from the blockchain, corresponding to the plurality of file sets in the i-th row, to obtain a row check code comparison result;
    for the plurality of file sets in the j-th column of the matrix, obtaining, from the blockchain network, the column check code corresponding to the plurality of file sets in the j-th column; and
    comparing the calculated column check code corresponding to the plurality of file sets in the j-th column with the column check code, obtained from the blockchain, corresponding to the plurality of file sets in the j-th column, to obtain a column check code comparison result.
  14. A file evidence storage apparatus, comprising:
    a target file obtaining module, configured to obtain a target file to be stored as evidence;
    a file data determining module, configured to split the target file according to a preset splitting manner to obtain a plurality of pieces of file data;
    a file set dividing module, configured to divide the plurality of pieces of file data into a plurality of file sets according to a preset division manner, each of the file sets containing m elements;
    a check code calculation module, configured to calculate check codes corresponding to the plurality of file sets;
    a positioning file data information determining module, configured to determine positioning file data information in each of the file sets, the positioning file data information comprising position information, in the target file, of the positioning file data in the file set and content information of the positioning file data; and
    a data storage module, configured to store the check codes and the positioning file data information in a blockchain network.
  15. The apparatus according to claim 14, wherein the file set dividing module specifically comprises:
    a dividing unit, configured to divide the plurality of pieces of file data into N file sets according to a preset number of files per file set, the elements in each of the file sets being ordered.
  16. The apparatus according to claim 15, further comprising:
    a file set arrangement module, configured to arrange the N file sets in the form of a matrix to obtain a matrix of N rows and M columns;
    wherein the check code calculation module specifically comprises:
    a check code calculation unit, configured to use an erasure coding algorithm to generate a corresponding check code for each row of the matrix corresponding to the plurality of file sets, and to generate a corresponding check code for each column of the matrix corresponding to the plurality of file sets.
  17. A file verification apparatus, comprising:
    a data obtaining module, configured to obtain, from a blockchain network, positioning file data information and check codes corresponding to a file to be verified, wherein the check codes are obtained by calculation over a plurality of file sets corresponding to a target file previously stored as evidence; the plurality of file sets are obtained by splitting the target file according to a preset splitting manner to obtain a plurality of pieces of file data and dividing the plurality of pieces of file data according to a preset division manner; and the positioning file data information comprises position information, in the target file, of the positioning file data in the file sets and content information of the positioning file data;
    a file set determining module, configured to determine, based on the positioning file data information, a plurality of file sets corresponding to the file to be verified;
    a check code calculation module, configured to calculate check codes corresponding to the plurality of file sets;
    a comparison module, configured to compare the calculated check codes corresponding to the plurality of file sets with the check codes obtained from the blockchain network to obtain a comparison result; and
    a verification module, configured to verify the file to be verified based on the comparison result.
  18. The apparatus according to claim 17, further comprising:
    a digest value calculation module, configured to calculate a digest value of the file to be verified using the hash algorithm that was used at evidence storage time;
    a digest value comparison module, configured to compare the calculated digest value of the file to be verified with the digest value in the evidence data; and
    a tampering determination unit, configured to determine that the file to be verified has not been tampered with when the calculated digest value of the file to be verified is consistent with the digest value in the evidence data, and to determine, based on the positioning file data information, the plurality of file sets corresponding to the file to be verified when the calculated digest value of the file to be verified is inconsistent with the digest value in the evidence data.
  19. The apparatus according to claim 17, wherein the file set determining module specifically comprises:
    a first positioning file data set determining unit, configured to determine a first positioning file data set based on the position information;
    a second positioning file data set determining unit, configured to determine, based on a dynamic programming algorithm, from the first positioning file data set, a second positioning file data set whose coverage satisfies a preset condition;
    a positioning file data information comparison unit, configured to compare the second positioning file data set with the pre-stored positioning file data information to determine inconsistent positioning file data information;
    a third positioning file data set determining unit, configured to determine a third positioning file data set according to the inconsistent positioning file data information, such that the matching rate between the third positioning file data set and the pre-stored positioning file data is maximized; and
    a file set determining unit, configured to determine, based on the third positioning file data set, the plurality of file sets corresponding to the file to be verified.
  20. The apparatus according to claim 19, wherein the check code calculation module specifically comprises:
    a file set arrangement unit, configured to arrange the N file sets in the form of a matrix to obtain a matrix of N rows and M columns; and
    a check code generation unit, configured to use an erasure coding algorithm to generate a corresponding row check code for the plurality of file sets corresponding to each row of the matrix, and to generate a corresponding column check code for the plurality of file sets corresponding to each column of the matrix.
  21. The apparatus according to claim 20, wherein the comparison module is specifically configured to:
    for the plurality of file sets in the i-th row of the matrix, obtain, from the blockchain network, the row check code corresponding to the plurality of file sets in the i-th row;
    compare the calculated row check code corresponding to the plurality of file sets in the i-th row with the row check code, obtained from the blockchain, corresponding to the plurality of file sets in the i-th row, to obtain a row check code comparison result;
    for the plurality of file sets in the j-th column of the matrix, obtain, from the blockchain network, the column check code corresponding to the plurality of file sets in the j-th column; and
    compare the calculated column check code corresponding to the plurality of file sets in the j-th column with the column check code, obtained from the blockchain, corresponding to the plurality of file sets in the j-th column, to obtain a column check code comparison result.
  22. A file evidence storage device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    obtain a target file to be stored as evidence; split the target file according to a preset splitting manner to obtain a plurality of pieces of file data;
    divide the plurality of pieces of file data into a plurality of file sets according to a preset division manner, each of the file sets containing m elements;
    calculate check codes corresponding to the plurality of file sets;
    determine positioning file data information in each of the file sets, the positioning file data information comprising position information, in the target file, of the positioning file data in the file set and content information of the positioning file data; and
    store the check codes and the positioning file data information in a blockchain network.
  23. A file verification device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    obtain, from a blockchain network, positioning file data information and check codes corresponding to a file to be verified, wherein the check codes are obtained by calculation over a plurality of file sets corresponding to a target file previously stored as evidence; the plurality of file sets are obtained by splitting the target file according to a preset splitting manner to obtain a plurality of pieces of file data and dividing the plurality of pieces of file data according to a preset division manner; and the positioning file data information comprises position information, in the target file, of the positioning file data in the file sets and content information of the positioning file data;
    determine, based on the positioning file data information, a plurality of file sets corresponding to the file to be verified;
    calculate check codes corresponding to the plurality of file sets;
    compare the calculated check codes corresponding to the plurality of file sets with the check codes obtained from the blockchain network to obtain a comparison result; and
    verify the file to be verified based on the comparison result.
  24. A computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method according to any one of claims 1 to 13.
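
As an illustration only, the flow of claims 1, 2, 6, 7, 10, and 13 can be sketched in Python under several loud assumptions. The claims do not name a specific erasure code, so plain XOR parity stands in for it here. The chunk size `CHUNK`, the set size `M`, all function names, and the record layout are hypothetical. Writing to a blockchain network is replaced by returning a dictionary. The anchor-based realignment of claim 11 (dynamic programming over positioning file data) is not implemented, so this sketch only handles tampering that leaves the file length unchanged.

```python
import hashlib
from functools import reduce

CHUNK = 4  # assumed preset split size in bytes
M = 3      # assumed number of elements per file set (columns)

def xor_bytes(blocks):
    """XOR equal-length blocks together; a stand-in for the erasure code."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def build_matrix(data, chunk=CHUNK, m=M):
    """Split data into chunks and group them into ordered file sets (rows)."""
    chunks = [data[i:i + chunk].ljust(chunk, b"\0")
              for i in range(0, len(data), chunk)]
    while len(chunks) % m:            # pad the last file set to m elements
        chunks.append(b"\0" * chunk)
    return [chunks[i:i + m] for i in range(0, len(chunks), m)]

def check_codes(matrix):
    """Return one check code per row and one per column of the matrix."""
    rows = [xor_bytes(r).hex() for r in matrix]
    cols = [xor_bytes(c).hex() for c in zip(*matrix)]
    return rows, cols

def make_record(data, chunk=CHUNK, m=M):
    """Build the record that would be stored on chain (here: a dict)."""
    matrix = build_matrix(data, chunk, m)
    rows, cols = check_codes(matrix)
    # Positioning file data: the first element of each file set,
    # with its byte offset in the target file and its content.
    anchors = [{"offset": i * m * chunk, "content": row[0].hex()}
               for i, row in enumerate(matrix)]
    return {"digest": hashlib.sha256(data).hexdigest(),
            "row_codes": rows, "col_codes": cols, "anchors": anchors}

def locate_tampering(candidate, record, chunk=CHUNK, m=M):
    """Return (row, column) cells whose row and column codes both mismatch."""
    if hashlib.sha256(candidate).hexdigest() == record["digest"]:
        return []                     # digest matches: file not tampered with
    rows, cols = check_codes(build_matrix(candidate, chunk, m))
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, record["row_codes"])) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, record["col_codes"])) if a != b]
    return [(i, j) for i in bad_rows for j in bad_cols]

if __name__ == "__main__":
    original = b"blockchain evidence data"
    record = make_record(original)
    tampered = bytearray(original)
    tampered[17] = ord("X")           # modify one byte in chunk 4
    print(locate_tampering(bytes(tampered), record))
```

In this sketch a single modified chunk shows up as exactly one mismatching row code and one mismatching column code, and their intersection identifies the affected chunk. With several modified chunks the intersection can contain cells that were not changed, which is one reason a real implementation would likely use a stronger code than XOR parity.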
PCT/CN2022/086300 2021-04-20 2022-04-12 File storage method and apparatus, and device WO2022222786A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110426516.2 2021-04-20
CN202110426516.2A CN113065169B (en) 2021-04-20 2021-04-20 File storage method, device and equipment

Publications (1)

Publication Number Publication Date
WO2022222786A1 (en) 2022-10-27

Family

ID=76567120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086300 WO2022222786A1 (en) 2021-04-20 2022-04-12 File storage method and apparatus, and device

Country Status (2)

Country Link
CN (2) CN113065169B (en)
WO (1) WO2022222786A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117979118A (en) * 2024-03-29 2024-05-03 杭州海康威视数字技术股份有限公司 Data stream recording method, device, recorder and recording system

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065169B (en) * 2021-04-20 2023-05-09 支付宝(杭州)信息技术有限公司 File storage method, device and equipment
CN115664854B (en) * 2022-12-22 2023-03-10 广州市悦智计算机有限公司 Method for chaining and confirming data of data acquisition equipment of Internet of things
CN117097559B (en) * 2023-10-17 2023-12-19 天津德科智控股份有限公司 EPS steering angle message transmission verification method
CN117273974B (en) * 2023-11-21 2024-02-06 中国人寿保险股份有限公司上海数据中心 Large enterprise expense reimbursement data generation and verification method based on block chain consensus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647523A (en) * 2018-04-28 2018-10-12 华南理工大学 A kind of electronic identification system based on block chain and deposit card, file access pattern method
CN109409135A (en) * 2018-10-19 2019-03-01 北京金山云网络技术有限公司 A kind of characteristic information preparation method, device, equipment and the storage medium of data
US20200159696A1 (en) * 2018-11-16 2020-05-21 Advanced Messaging Technologies, Inc. Systems and Methods for Distributed Data Storage and Delivery Using Blockchain
CN111222176A (en) * 2020-01-08 2020-06-02 中国人民解放军国防科技大学 Block chain-based cloud storage possession proving method, system and medium
CN111611622A (en) * 2020-05-29 2020-09-01 宁波富万信息科技有限公司 Block chain-based file storage method and electronic equipment
CN113065169A (en) * 2021-04-20 2021-07-02 支付宝(杭州)信息技术有限公司 File storage method, device and equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795269B (en) * 2018-08-03 2023-05-26 阿里巴巴集团控股有限公司 Data recovery verification method, device and equipment
CN109491968B (en) * 2018-11-13 2021-01-22 恒生电子股份有限公司 File processing method, device, equipment and computer readable storage medium
EP3566392B1 (en) * 2018-12-13 2021-08-25 Advanced New Technologies Co., Ltd. Achieving consensus among network nodes in a distributed system
CN111444042B (en) * 2020-03-24 2023-10-27 哈尔滨工程大学 Block chain data storage method based on erasure codes
CN111353180A (en) * 2020-03-30 2020-06-30 北京海益同展信息科技有限公司 Block chain evidence storing method, evidence obtaining method and system
CN111541753B (en) * 2020-04-16 2024-02-27 深圳市迅雷网络技术有限公司 Distributed storage system, method, computer device and medium for block chain data

Also Published As

Publication number Publication date
CN113065169A (en) 2021-07-02
CN116579025A (en) 2023-08-11
CN113065169B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2022222786A1 (en) File storage method and apparatus, and device
US11438167B2 (en) Method and server for providing notary service for file and verifying file recorded by notary service
CN109902071B (en) Service log storage method, system, device and equipment
US10372942B1 (en) Method and server for providing notary service for file and verifying file recorded by notary service
WO2020042586A1 (en) Method and apparatus for generating address of smart contract, computer device, and readable storage medium
RU2332703C2 (en) Protection of data stream header object
EP3776250B1 (en) Performing map iterations in blockchain-based system
CN111444196B (en) Method, device and equipment for generating Hash of global state in block chain type account book
US20210049715A1 (en) Blockchain-based data procesing method, apparatus, and electronic device
CN107015882A (en) A kind of block data method of calibration and device
US8601358B2 (en) Buffer transfer check on variable length data
WO2020037400A1 (en) System, method, and computer program for secure authentication of live video
WO2019113495A1 (en) Systems and methods for cryptographic provision of synchronized clocks in distributed systems
CN110689349A (en) Transaction hash value storage and search method and device in block chain
US10969982B1 (en) Data deduplication with collision resistant hash digest processes
CN110061843B (en) Block height creating method, device and equipment in chain type account book
CN111444192B (en) Method, device and equipment for generating Hash of global state in block chain type account book
US11411743B2 (en) Birthday attack prevention system based on multiple hash digests to avoid collisions
CN110457873A (en) A kind of watermark embedding and detection method and device
WO2021093461A1 (en) Method and apparatus for aggregation calculation in blockchain-type ledger, and device
KR20220012353A (en) Validation of Data Fields in Blockchain Transactions
US20200382284A1 (en) Tracking, storage and authentication of documented intellectual property
WO2021057127A1 (en) Method, device, and equipment for data storage based on multiple service attributes
CN114281893A (en) Processing method, device and equipment for block chain transaction
CN113362068B (en) Method for verifying block chain state transfer by light node

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22790895

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22790895

Country of ref document: EP

Kind code of ref document: A1