WO2021022875A1 - Distributed data storage method and system - Google Patents


Info

Publication number
WO2021022875A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
target
data block
provider
Application number
PCT/CN2020/092810
Other languages
French (fr)
Chinese (zh)
Inventor
郑映锋 (Zheng Yingfeng)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021022875A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Definitions

  • This application belongs to the field of data processing technology and, in particular, relates to a method and system for distributed data storage.
  • The inventor found that, in distributed storage, after a storage requester sends data to a storage provider for storage, the storage provider may delete part of the data because of software or hardware faults in its equipment or deliberate human action. As a result, the storage provider fails to store the data completely and securely as agreed.
  • The embodiments of this application provide a distributed data storage method and system to solve the high storage failure rate and poor monitorability of distributed storage in the prior art.
  • A first aspect of the embodiments of this application provides a distributed data storage method, including: a requester sends a broadcast request packet containing request description information;
  • after receiving the broadcast request packet, a storage provider extracts the request description information and determines, based on it, whether to provide storage services for the requester; if it determines to do so, it returns a reply message to the requester. The requester selects, as the target provider, one of the storage providers that returned reply messages to it within a preset time period. The requester divides the target data into multiple data blocks, calculates the digest of each data block, generates and stores the correspondence between each data block's identifier and its digest, and sends all the data blocks to the target provider. The target provider receives and stores each data block sent by the requester. At a preset time interval, the requester verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  • A second aspect of the embodiments of this application provides a distributed data storage system, including a requester and a storage provider;
  • the requester is configured to send a broadcast request packet containing request description information; the storage provider is configured to extract the request description information after receiving the broadcast request packet, determine based on it whether to provide storage services for the requester, and, if it determines to do so, return a reply message to the requester; the requester is further configured to select, as the target provider, one of the storage providers that returned reply messages to it within a preset time period; the requester is further configured to divide the target data into multiple data blocks, calculate the digest of each data block, generate and store the correspondence between each data block's identifier and its digest, and send all the data blocks to the target provider; the target provider is further configured to receive and store each data block sent by the requester; and the requester is further configured to verify, at each preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  • A third aspect of the embodiments of this application provides a computer-readable storage medium storing a computer program.
  • When executed by a processor, the computer program implements any of the above distributed data storage methods; for example, it implements the following steps:
  • sending a broadcast request packet containing request description information; selecting, as the target provider, one of the storage providers that returned reply messages within a preset time period; dividing the target data into multiple data blocks, calculating the digest of each data block, generating and storing the correspondence between each data block's identifier and its digest, and sending all the data blocks to the target provider; and, at each preset time interval, verifying the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal; and/or
  • receiving the broadcast request packet sent by the requester, extracting the request description information, and determining based on it whether to provide storage services for the requester; if so, returning a reply message to the requester; and, on receiving the data blocks sent by the requester, storing them, the data blocks being all the blocks obtained by the requester dividing the target data.
  • A fourth aspect of the embodiments of this application provides an electronic device, including a processor and a memory, the memory storing a computer program executable on the processor, wherein the processor is configured to execute the computer program to perform any of the foregoing distributed data storage methods.
  • For example, by executing the computer program, the processor is configured to perform the following steps:
  • sending a broadcast request packet containing request description information; selecting, as the target provider, one of the storage providers that returned reply messages within a preset time period; dividing the target data into multiple data blocks, calculating the digest of each data block, generating and storing the correspondence between each data block's identifier and its digest, and sending all the data blocks to the target provider; and, at each preset time interval, verifying the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal; and/or
  • receiving the broadcast request packet sent by the requester, extracting the request description information, and determining based on it whether to provide storage services for the requester; if so, returning a reply message to the requester; and, on receiving the data blocks sent by the requester, storing them, the data blocks being all the blocks obtained by the requester dividing the target data.
  • FIG. 1 is an implementation flowchart of a distributed data storage method provided by an embodiment of this application;
  • FIG. 2 is a flowchart of a specific implementation of step S103 of the distributed data storage method provided by an embodiment of this application;
  • FIG. 3 is a flowchart of a specific implementation of step S104 of the distributed data storage method provided by an embodiment of this application;
  • FIG. 4 is a system interaction diagram of a distributed data storage system provided by an embodiment of this application.
  • The technical solution of this application can be applied to blockchain or big data technology.
  • The technical solution of this application can be implemented with blockchain distributed storage.
  • FIG. 1 shows the implementation process of a distributed data storage method provided by an embodiment of this application.
  • The method includes steps S101 to S106.
  • The specific implementation principle of each step is as follows.
  • S101: The requester sends a broadcast request packet containing request description information.
  • A distributed storage system contains multiple servers, and any server can act as a requester or a storage provider. When a server needs to store part of its local data (the target data) on other servers, the servers in the distributed storage system naturally divide into a requester and storage providers.
  • After the requester determines the target data it wants to store offsite, it sends a broadcast request packet to the local area network in which the distributed storage system is located, so that every other server in the system (that is, every storage provider) can receive it.
  • The broadcast request packet contains request description information.
  • The request description information mainly describes two kinds of information. The first is information about the requester, including its location, level, and server type; the second is information about the target data to be stored offsite, including its size, importance, type, and generation time.
  • By broadcasting the request description information, the requester allows each storage provider to determine whether it is suitable for storing the target data that the requester wants to store offsite.
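The two kinds of request description information can be sketched as one record that is serialized into the broadcast payload and flattened into fixed positions, as Step 1 of S102 later requires. This is a minimal sketch, not the patent's format: the field names, the integer encodings, and the use of JSON as the packet payload are all assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RequestDescription:
    # Information about the requester (hypothetical integer encodings)
    requester_location: int  # e.g. a region code
    requester_level: int
    server_type: int
    # Information about the target data to be stored offsite
    data_size: int           # size in bytes
    data_importance: int     # e.g. 1 (low) to 5 (high)
    data_type: int           # e.g. an enumerated type code
    generated_at: int        # Unix timestamp of generation

    def to_packet(self) -> bytes:
        """Serialize as a broadcast request payload (JSON is an assumption)."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_packet(payload: bytes) -> "RequestDescription":
        """Rebuild the description on the storage provider's side."""
        return RequestDescription(**json.loads(payload.decode("utf-8")))

    def to_vector(self) -> list[float]:
        """Store each field's value at a fixed position (declaration order)."""
        return [float(v) for v in asdict(self).values()]
```

A storage provider would call `from_packet` on the received payload and `to_vector` to obtain the values it places into the description matrix.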
  • S102: After receiving the broadcast request packet, a storage provider extracts the request description information and determines, based on it, whether to provide storage services for the requester; if it determines to do so, it returns a reply message to the requester.
  • To improve the security and integrity of data storage, each storage provider first judges, based on the request description information in the received broadcast request packet, whether it is suitable to provide storage services for the requester.
  • The embodiments of this application can analyze the request description information with a preset support vector machine (SVM) algorithm, in the following two steps:
  • Step 1: Convert the request description information into a description matrix, and reduce the dimensionality of the description matrix with a principal component analysis (PCA) algorithm to generate a target feature matrix.
  • The data value of each data type in the request description information is stored at the corresponding matrix position, which generates the description matrix.
  • The description matrix characterizes the request description information in a form that can be substituted into the subsequent calculations.
  • The embodiments of this application use the existing PCA algorithm to reduce the dimensionality of the description matrix. Since PCA is prior art, it is not described in detail here.
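Because the text relies on standard PCA without detail, a dependency-free sketch may help. The text does not say how PCA is applied to a single request's description matrix, so this sketch assumes that each input row is one observation (for example, one flattened description vector) and finds the top-k principal components by power iteration with deflation on the sample covariance matrix. The function name `pca_reduce` and its parameters are hypothetical; in practice a library implementation such as scikit-learn's `PCA` would normally be used.

```python
import math
import random


def pca_reduce(rows, k, iters=500, seed=0):
    """Project `rows` (equal-length numeric vectors) onto their top-k principal components.

    Returns (projected_rows, components, mean). Components are found by power
    iteration on the sample covariance matrix, deflating after each one.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); requires at least two rows.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random start vector, normalized to unit length.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for this component.
        eig = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate so the next power iteration finds the next component.
        cov = [[cov[a][b] - eig * v[a] * v[b] for b in range(d)] for a in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return projected, components, mean
```

Eigenvectors are only defined up to sign, so a new description vector should be projected with the same returned `mean` and `components` used for training rather than with a fresh decomposition.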
  • Step 2: Classify the target feature matrix with a pre-trained SVM model, and determine, based on the category of the target feature matrix, whether to provide storage services for the requester.
  • The storage provider needs to train the classification hyperplane of the SVM model on training data in advance.
  • Specifically, the storage provider collects a locally stored positive set and negative set, each of which contains multiple training matrices.
  • Each training matrix in the positive set represents request description information for data that the storage provider can handle, and each training matrix in the negative set represents request description information for data that the storage provider cannot store.
  • A corresponding training parameter is generated for each training matrix, where $P_i$ is the training parameter of training matrix $i$, $X_i$ is training matrix $i$, $X'$ is the mean matrix of all training matrices in the positive and negative sets (each element of $X'$ is the average of the elements at the corresponding position of all training matrices), and $\Sigma_i$ is the covariance matrix of $X_i$ and $X'$.
  • Using an existing SVM algorithm, the classification hyperplane of the SVM model is determined from the training parameters of the training matrices in the positive set and those in the negative set.
  • Through the above method, the storage provider determines whether to provide storage services for the requester. Because each storage provider judges from the request description information whether it is suitable for storing the target data, a storage provider that is unsuitable for storing the target data does not return a reply message to the requester, which reduces, to a certain extent, the chance that the target data will later be lost or corrupted.
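Step 2 only requires some SVM training procedure, so the sketch below swaps in a linear SVM trained with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularized hinge loss), labelling positive-set vectors +1 and negative-set vectors -1. It is not the patent's method: the training-parameter formula above is not reproduced in this text, so the sketch trains on feature vectors directly, and the names `train_linear_svm` and `predict`, the regularization strength `lam`, and learning the bias as a constant extra feature are assumptions.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    samples: feature vectors; labels: +1 (positive set, can store) or -1
    (negative set, cannot store). A constant 1.0 feature is appended so the
    bias is learned as an ordinary (regularized) weight.
    """
    rng = random.Random(seed)
    xs = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(xs)):
            t += 1
            i = rng.randrange(len(xs))
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            # Shrink w (sub-gradient of the regularizer) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and, if the sample violates the margin, step toward it.
            if margin < 1.0:
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w, x):
    """+1: provide storage service; -1: decline. `x` excludes the bias feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A storage provider would call `predict` on the reduced feature vector of an incoming request and send a reply message only when the result is +1.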
  • S103: The requester selects, as the target provider, one of the storage providers that returned reply messages to it within a preset time period.
  • Each storage provider in the distributed storage system (every server other than the requester) makes a judgment on the received broadcast request packet. Based on the judgment in S102, some storage providers return a reply message to the requester and others do not.
  • Since the requester sends a given piece of target data to only one storage provider for storage, it must screen the storage providers that returned reply messages and select the one with the highest reliability as the target provider.
  • The reliability of a storage provider is judged from its remaining storage capacity and the time it took to return its reply message.
  • Step S103 includes:
  • S1031: Extract the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establish the correspondence between each storage provider and its remaining storage capacity.
  • Each storage provider includes its own remaining storage capacity when generating a reply message, so the requester knows the remaining storage capacity of every storage provider that replied to it.
  • S1032: Calculate the time difference between the time the requester receives the reply message returned by each storage provider and the time the requester sent the broadcast request packet, and establish the correspondence between each storage provider and its time difference.
  • The requester records the sending time when it sends the broadcast request packet in step S101 and records a receive time for each reply message, so it can calculate, for each storage provider, the difference between the time its reply was received and the time the broadcast request packet was sent.
  • S1033: Calculate the storage coefficient of each storage provider with a coefficient formula.
  • In the coefficient formula, $P_i$ is the storage coefficient of storage provider $i$, $Cap_i$ is the remaining storage capacity of storage provider $i$, and $Time_i$ is the time difference corresponding to storage provider $i$.
  • The storage coefficient is designed to be proportional to reliability. The more remaining storage capacity a storage provider has, the more space it has for the target data, the higher its storage reliability, and hence the higher its storage coefficient. Conversely, a larger time difference between sending the broadcast request packet and receiving a storage provider's reply generally means that the provider is heavily loaded or is far from the requester in the routing table; long-distance transmission can increase the chance of data loss or theft, so storage reliability, and with it the storage coefficient, decreases.
  • In this way, the embodiments of this application screen, once again and from the requester's side, the server that will store the target data, further improving the reliability of future data storage.
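Steps S1031 to S1033 reduce to choosing, among replies received inside the preset window, the provider with the largest storage coefficient. Because the coefficient formula itself is not reproduced in this text, the sketch uses $P_i = Cap_i / Time_i$ as a purely hypothetical stand-in that only follows the stated trend (more capacity raises the coefficient, a longer delay lowers it); the `Reply` record and the function names are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    provider_id: str
    remaining_capacity: float  # carried in the reply message (S1031)
    received_at: float         # requester's receive time in seconds (S1032)


def storage_coefficient(cap: float, time_diff: float) -> float:
    # HYPOTHETICAL stand-in for the patent's formula: grows with the remaining
    # capacity Cap_i and shrinks as the reply delay Time_i grows.
    return cap / time_diff


def select_target_provider(replies, sent_at: float, window: float) -> Optional[str]:
    """S1031-S1033: return the provider with the largest coefficient, or None."""
    best_id, best_p = None, float("-inf")
    for r in replies:
        time_diff = r.received_at - sent_at  # S1032
        if not 0 < time_diff <= window:      # outside the preset time period
            continue
        p = storage_coefficient(r.remaining_capacity, time_diff)  # S1033
        if p > best_p:
            best_id, best_p = r.provider_id, p
    return best_id
```

With this stand-in, a provider that replies after 0.2 s with 500 units free scores 2500 and beats one that replies after 0.8 s with 800 units free (score 1000); a different weighting would need the patent's actual formula.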
  • S104: The requester divides the target data into multiple data blocks, calculates the digest of each data block, generates and stores the correspondence between each data block's identifier and its digest, and sends all the data blocks to the target provider.
  • The embodiments of this application include a subsequent scheduled verification step.
  • If the target data were verified as a whole, this would impose a heavy computational burden.
  • Moreover, when the requester stores multiple pieces of target data in different places, it is difficult for it to verify all of them in full. Therefore, the embodiments of this application divide each piece of target data into multiple data blocks and spot-check some of them during subsequent verification.
  • In this step, the embodiments of this application calculate the digest of each data block and generate and store the correspondence between each data block's identifier and its digest, so that each data block can later be verified through its digest.
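The requester-side bookkeeping of S104 can be sketched as a table from block identifier to digest. The identifier format `blk-NNNNNN` and the fixed block size are assumptions, and SHA-256 from Python's `hashlib` is substituted for the patent's own digest procedure (S1041 to S1045) only to keep the example short.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> dict[str, bytes]:
    """Divide the target data into fixed-size blocks keyed by a hypothetical identifier."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return {
        f"blk-{i:06d}": data[offset:offset + block_size]
        for i, offset in enumerate(range(0, len(data), block_size))
    }


def build_digest_table(blocks: dict[str, bytes]) -> dict[str, str]:
    """Requester-side correspondence between block identifier and digest."""
    # SHA-256 stands in for the digest procedure of S1041-S1045.
    return {block_id: hashlib.sha256(block).hexdigest() for block_id, block in blocks.items()}
```

The requester keeps only the table; the blocks themselves, with their identifiers, go to the target provider.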
  • Step S104 includes:
  • S1041: Divide the data block into M feature groups, expand the M feature groups into N feature groups according to a preset expansion rule, and number the N feature groups.
  • M is an integer greater than 1;
  • N is an integer greater than M.
  • Each feature group is composed of binary strings.
  • The expansion rule is only an example; other rules that expand the original feature groups into additional feature groups may also be used.
  • S1042: Assign an initial digest to the data block, divide the initial digest in order into L initial digest groups, and number the initial digest groups.
  • L is an integer greater than 1.
  • For example, a preset initial digest is divided into 5 initial digest groups, which are numbered as the first, second, third, fourth, and fifth initial digest groups. The number of initial digest groups can be adjusted according to actual conditions.
  • S1043: Set L cache groups and number them.
  • For example, five cache groups are set and numbered as the first, second, third, fourth, and fifth cache groups.
  • S1044: Store each initial digest group in the cache group with the same number, and perform N rounds of shift assignment calculation. In each round, after the data of each cache group is shifted into the next cache group, the data of the first cache group before the update is added to the data of the target feature group, and the sum becomes the new data of the first cache group; the number of the target feature group equals the round number of the current shift assignment calculation.
  • For example, the shift assignment calculation is performed for 8 rounds, and the input data of the first round is the initial digest groups described above.
  • For example, the initial digest is divided into 5 initial digest groups: first, 1001010; second, 1000111; third, 1010101; fourth, 1000001; fifth, 1000000. Each initial digest group is stored in the cache group with the same number, so at the start of the first round of shift assignment calculation the first cache group holds 1001010, the second 1000111, the third 1010101, the fourth 1000001, and the fifth 1000000.
  • After the first round, the second cache group holds 1001010, the third 1000111, the fourth 1010101, and the fifth 1000001. The updated first cache group (that is, the current first cache group) is the data of the first cache group before the update plus the data of the target feature group; because the number of the target feature group equals the current round number, the target feature group in the first round is the first feature group.
  • S1045: Combine the data in the cache groups after the N rounds of shift assignment calculation as the digest of the data block.
  • Each data block has a corresponding identifier, which the server uses to locate the data block.
  • Through the above steps, the correspondence between each data block's identifier and its digest is established, and each data block is then sent together with its identifier to the target provider.
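The shift assignment digest of S1041 to S1045 can be sketched by treating each feature group and cache group as a fixed-width binary value (7 bits, as in the example above). Several details are not given in this text and are assumptions here: the block's bit string is cut into `width`-bit feature groups with zero padding, the expansion rule is a hypothetical XOR recurrence, "plus" is taken as addition modulo 2^width, and the data shifted out of the last cache group is discarded. All function names are illustrative.

```python
def split_feature_groups(block: bytes, width: int) -> list[int]:
    """S1041, first half: cut the block's bit string into M groups of `width` bits."""
    bits = "".join(f"{byte:08b}" for byte in block)
    bits += "0" * (-len(bits) % width)  # zero-pad the last group (assumption)
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]


def expand_groups(groups: list[int], n: int) -> list[int]:
    """S1041, second half: HYPOTHETICAL rule F[j] = F[j-1] XOR F[j-M] for j >= M."""
    m = len(groups)
    if m < 2 or n <= m:
        raise ValueError("need 1 < M < N")
    out = list(groups)
    for j in range(m, n):
        out.append(out[j - 1] ^ out[j - m])
    return out  # out[k] is feature group number k + 1


def shift_assign_round(caches: list[int], feature: int, width: int) -> list[int]:
    """One round: shift each cache group into the next one (the last group's old
    data drops out) and set the first group to its old value plus `feature`."""
    mask = (1 << width) - 1
    return [(caches[0] + feature) & mask] + caches[:-1]


def block_digest(block: bytes, initial: list[int], n: int, width: int = 7) -> str:
    """S1042-S1045 with L = len(initial) cache groups and N = n rounds."""
    features = expand_groups(split_feature_groups(block, width), n)
    caches = list(initial)  # S1044: cache group k starts with initial group k
    for rnd in range(1, n + 1):
        # The target feature group's number equals the round number.
        caches = shift_assign_round(caches, features[rnd - 1], width)
    return "".join(f"{c:0{width}b}" for c in caches)  # S1045: combine the groups
```

Running one round on the example's five initial digest groups leaves 1001010, 1000111, 1010101, and 1000001 in the second to fifth cache groups, as in the example; the first group's new value depends on the first feature group, which the example does not state.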
  • S105: The target provider receives and stores each data block sent by the requester.
  • S106: At each preset time interval, the requester verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  • Verifying the digests of one or more data blocks stored by the target provider includes:
  • at each preset time interval, the requester generates a challenge message containing the identifiers of one or more data blocks, indicating the data blocks to be spot-checked;
  • after the target provider returns the digests of those data blocks according to the challenge message, the requester verifies the returned digests against the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
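The spot check of S106 is a challenge-response exchange. In this sketch the requester samples block identifiers at random, the target provider answers with digests recomputed from the blocks it actually holds, and the requester flags every challenged block whose answer is missing or differs from its stored table. The class and function names are hypothetical, and SHA-256 again stands in for the patent's digest procedure; both sides must use the same one.

```python
import hashlib
import random


def digest(block: bytes) -> str:
    # Stand-in digest; requester and provider must use the same procedure.
    return hashlib.sha256(block).hexdigest()


class TargetProvider:
    """Hypothetical target provider holding blocks by identifier."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def put(self, block_id: str, block: bytes) -> None:
        self.store[block_id] = block

    def answer(self, challenge: list[str]) -> dict[str, str]:
        """Recompute the digest of every challenged block that is still held."""
        return {bid: digest(self.store[bid]) for bid in challenge if bid in self.store}


def make_challenge(digest_table: dict[str, str], sample_size: int,
                   rng: random.Random) -> list[str]:
    """Randomly choose block identifiers to spot-check."""
    return rng.sample(sorted(digest_table), min(sample_size, len(digest_table)))


def verify(digest_table: dict[str, str], challenge: list[str],
           answers: dict[str, str]) -> list[str]:
    """Identifiers whose returned digest is missing or wrong; empty means no anomaly."""
    return [bid for bid in challenge if answers.get(bid) != digest_table[bid]]
```

At each preset interval the requester would call `make_challenge` with a sample smaller than the number of blocks, which is what keeps each check cheap compared with verifying the whole target data.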
  • In summary, the requester sends a broadcast packet to notify all storage providers in the local area network that it needs to store target data offsite; after receiving the broadcast request, each storage provider judges from the request description information it contains whether it is suitable for storing the target data and returns a reply message selectively, reducing future data loss;
  • the requester selects, as the target provider, one of the storage providers that returned reply messages within a preset time period, so that the most reliable storage provider is chosen from the requester's perspective, reducing the probability of future data anomalies;
  • the requester divides the target data into multiple data blocks, calculates the digest of each data block, and generates and stores the correspondence between data block identifiers and digests, providing data support for the subsequent spot-check verification;
  • the requester sends all data blocks to the target provider, which receives and stores each of them;
  • and the requester verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  • FIG. 4 shows a system interaction diagram of the distributed data storage system provided by an embodiment of this application. For ease of description, only the parts related to the embodiments of this application are shown.
  • The system includes a requester 401 and a storage provider 402;
  • the requester is configured to send a broadcast request packet containing request description information;
  • the storage provider is configured to extract the request description information after receiving the broadcast request packet, determine based on it whether to provide storage services for the requester, and, if so, return a reply message to the requester;
  • the requester is further configured to select, as the target provider, one of the storage providers that returned reply messages to it within a preset time period;
  • the requester is further configured to divide the target data into multiple data blocks, calculate the digest of each data block, generate and store the correspondence between each data block's identifier and its digest, and send all the data blocks to the target provider;
  • the target provider is further configured to receive and store each data block sent by the requester;
  • the requester is further configured to verify, at each preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  • Determining, based on the request description information, whether to provide storage services for the requester includes: converting the request description information into a description matrix and reducing its dimensionality with a PCA algorithm to generate a target feature matrix; and classifying the target feature matrix with a pre-trained SVM model and determining, based on the category of the target feature matrix, whether to provide storage services for the requester.
  • The requester selecting, as the target provider, one of the storage providers that returned reply messages within a preset time period includes: extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period and establishing the correspondence between each storage provider and its remaining storage capacity; calculating the time difference between the time the requester receives each storage provider's reply message and the time the requester sent the broadcast request packet, and establishing the correspondence between each storage provider and its time difference; calculating the storage coefficient of each storage provider with the coefficient formula, where $P_i$ is the storage coefficient of storage provider $i$, $Cap_i$ is the remaining storage capacity of storage provider $i$, and $Time_i$ is the time difference corresponding to storage provider $i$; and taking the storage provider with the largest storage coefficient as the target provider.
  • Calculating the digest of each data block includes:
  • performing N rounds of shift assignment calculation, in each of which the data of each cache group is shifted into the next cache group and the data of the first cache group before the update is added to the data of the target feature group to form the new data of the first cache group, the number of the target feature group being the same as the current round number; and combining the data in the cache groups after the N rounds as the digest of the data block.
  • The requester verifying, at each preset time interval, the digests of one or more data blocks stored by the target provider according to a pre-stored correspondence between data block identifiers and digests includes: at each preset time interval, the requester generates a challenge message containing the identifiers of one or more data blocks, indicating the data blocks to be spot-checked; and after the target provider returns the digests of those data blocks according to the challenge message, the requester verifies the returned digests against the pre-stored correspondence, to determine whether the target provider's storage of the target data is abnormal.
  • In this system, the requester sends a broadcast packet to notify all storage providers in the local area network that it needs to store the target data offsite; on receiving the broadcast request, each storage provider judges from the request description information it contains whether it is suitable for storing the target data and returns a reply message selectively, avoiding future data loss; the requester selects, as the target provider, one of the storage providers that returned reply messages within a preset time period, choosing the most reliable storage provider from its own perspective to reduce the probability of future data anomalies; the requester divides the target data into multiple data blocks, calculates the digest of each data block, and generates and stores the correspondence between data block identifiers and digests, providing data support for the subsequent spot-check verification; the requester sends all the data blocks to the target provider, which receives and stores each of them; and the requester verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  • the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the embodiments of this application also provide a distributed data storage apparatus, which may include modules/units for executing the steps performed by the requesting end and/or the storage provider in the foregoing method.
  • if the integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments may also be implemented by a computer program instructing the relevant hardware, and the computer program can be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile storage medium or a volatile storage medium.
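The challenge-response spot check described in the first bullet above can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the patented implementation: SHA-256 stands in for the block digest, the `Provider` class and the helpers `make_challenge` and `verify` are hypothetical names, and both the network transport and the timer that fires "every preset time interval" are omitted.

```python
import hashlib
import random


def digest(block):
    # Stand-in digest (SHA-256); the patent derives its own digest in S1041-S1044.
    return hashlib.sha256(block).hexdigest()


class Provider:
    """Hypothetical target provider that holds block_id -> block bytes."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def answer(self, challenge):
        # Return the digest of every challenged block, or None if the block is gone.
        return {
            bid: digest(self.blocks[bid]) if bid in self.blocks else None
            for bid in challenge
        }


def make_challenge(digest_table, k, rng):
    # Pick k block identifiers at random to spot-check.
    return rng.sample(sorted(digest_table), k)


def verify(digest_table, replies):
    # Identifiers whose returned digest differs from the pre-stored one.
    return [bid for bid, d in replies.items() if digest_table.get(bid) != d]
```

A non-empty list from `verify` (including blocks the provider can no longer return) is the signal that the target provider's storage of the target data is abnormal.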

Abstract

Disclosed are a distributed data storage method and system, applicable to the technical field of data processing. The method comprises: a requesting end sending a broadcast request packet; a storage provider extracting request description information from the broadcast request packet, determining on the basis of that information whether to provide a storage service for the requesting end, and, if so, returning a reply message to the requesting end; the requesting end selecting, as a target provider, one of the storage providers that returned reply messages within a preset time period; the requesting end dividing target data into a plurality of data blocks, calculating the digest of each data block, generating a correspondence between each data block's identifier and its digest, and sending all the data blocks to the target provider; the target provider storing the data blocks sent by the requesting end; and the requesting end verifying, at a preset time interval, the digests of one or more data blocks stored by the target provider, so as to determine whether the target provider's storage of the target data is abnormal.

Description

Distributed data storage method and system
This application claims priority to Chinese patent application No. 201910727287.0, filed with the Chinese Patent Office on August 7, 2019 and entitled "Distributed Data Storage Method and System", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of data processing technology, and in particular relates to a method and system for distributed data storage.
Background
The inventor has found that the following sometimes happens in distributed storage: after a storage requester sends data to a storage provider for storage, the storage provider may delete part of the data, either because of hardware or software faults or through deliberate human action, so that it fails to store the data completely and securely as agreed in the contract.
Moreover, when a storage anomaly occurs at the storage provider, the storage requester often cannot learn of it in time, and users on the requesting side may suffer heavy losses.
Summary of the invention
In view of this, the embodiments of this application provide a distributed data storage method and system, to solve the problems of high storage failure rates and poor monitorability in prior-art distributed storage.
A first aspect of the embodiments of this application provides a distributed data storage method, including: a requesting end sends a broadcast request packet, the broadcast request packet containing request description information;
after receiving the broadcast request packet, a storage provider extracts the request description information and determines, based on it, whether to provide a storage service for the requesting end; if it decides to provide the storage service, it returns a reply message to the requesting end; the requesting end selects, as the target provider, one of the storage providers that returned reply messages to it within a preset time period; the requesting end divides the target data into multiple data blocks, calculates a digest of each data block, generates and stores the correspondence between the identifier of each data block and its digest, and sends all the data blocks to the target provider; the target provider receives and stores each data block sent by the requesting end; and at every preset time interval, the requesting end verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
A second aspect of the embodiments of this application provides a distributed data storage system, including a requesting end and a storage provider;
the requesting end is configured to send a broadcast request packet containing request description information; the storage provider is configured to, after receiving the broadcast request packet, extract the request description information, determine based on it whether to provide a storage service for the requesting end, and, if so, return a reply message to the requesting end; the requesting end is further configured to select, as the target provider, one of the storage providers that returned reply messages to it within a preset time period; the requesting end is further configured to divide the target data into multiple data blocks, calculate a digest of each data block, generate and store the correspondence between the identifier of each data block and its digest, and send all the data blocks to the target provider; the target provider is further configured to receive and store each data block sent by the requesting end; and the requesting end is further configured to verify, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
A third aspect of the embodiments of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the distributed data storage methods described above. For example, when executed by a processor, the computer program implements the following steps:
sending a broadcast request packet containing request description information; selecting, as the target provider, one of the storage providers that returned reply messages within a preset time period; dividing the target data into multiple data blocks, calculating a digest of each data block, generating and storing the correspondence between the identifier of each data block and its digest, and sending all the data blocks to the target provider; and, at every preset time interval, verifying the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal; and/or,
receiving the broadcast request packet sent by the requesting end, extracting the request description information, and determining based on it whether to provide a storage service for the requesting end; if so, returning a reply message to the requesting end; and, upon receiving data blocks sent by the requesting end, storing them, the data blocks being all of the data blocks obtained by the requesting end by dividing the target data.
A fourth aspect of the embodiments of this application provides an electronic device, including a processor and a memory, the memory storing a computer program executable on the processor, where the processor is configured to perform any of the distributed data storage methods described above by executing the computer program. For example, the processor is configured to perform the following steps by executing the computer program:
sending a broadcast request packet containing request description information; selecting, as the target provider, one of the storage providers that returned reply messages within a preset time period; dividing the target data into multiple data blocks, calculating a digest of each data block, generating and storing the correspondence between the identifier of each data block and its digest, and sending all the data blocks to the target provider; and, at every preset time interval, verifying the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal; and/or,
receiving the broadcast request packet sent by the requesting end, extracting the request description information, and determining based on it whether to provide a storage service for the requesting end; if so, returning a reply message to the requesting end; and, upon receiving data blocks sent by the requesting end, storing them, the data blocks being all of the data blocks obtained by the requesting end by dividing the target data.
The embodiments of this application can promptly determine whether the target provider's storage of the target data is abnormal, thereby improving the stability of data in distributed storage.
Brief description of the drawings
FIG. 1 is a flowchart of an implementation of the distributed data storage method provided by an embodiment of this application;
FIG. 2 is a flowchart of a specific implementation of step S103 of the distributed data storage method provided by an embodiment of this application;
FIG. 3 is a flowchart of a specific implementation of step S104 of the distributed data storage method provided by an embodiment of this application;
FIG. 4 is a system interaction diagram of the distributed data storage system provided by an embodiment of this application.
Detailed description
In the following description, specific details such as particular system structures and techniques are set forth for purposes of illustration rather than limitation, to help the reader understand the embodiments of this application.
The technical solution of this application can be applied in the fields of blockchain and big data; for example, it can be implemented with blockchain-based distributed storage.
FIG. 1 shows the implementation flow of the distributed data storage method provided by an embodiment of this application. The method includes steps S101 to S106, whose implementation principles are as follows.
In S101, the requesting end sends a broadcast request packet containing request description information.
In the embodiments of this application, a distributed storage system contains multiple servers, any of which can act as a requesting end or a storage provider. When one server needs to store part of its locally stored data (the target data) on other servers, the servers in the distributed storage system naturally divide into the requesting end and the storage providers.
After determining the target data it wants stored offsite, the requesting end sends a broadcast request packet into the local area network in which the distributed storage system resides, so every other server in the system (that is, every storage provider) can receive it.
Note that the broadcast request packet contains request description information, which mainly describes two things: first, the requesting end itself, including its location, level, and server type; second, the target data to be stored offsite, including its size, importance, category, and generation time. By broadcasting this information, the requesting end lets each storage provider judge whether it is suitable for storing the target data.
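One way to picture the request description information is as a small record that is serialized into the broadcast payload. The sketch below is illustrative only: the patent does not specify field names, units, or a wire format, so the `RequestDescription` fields, the 1 to 5 importance scale, and the JSON encoding are all assumptions. Actually sending the packet (for example over a UDP socket with broadcasting enabled) is not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RequestDescription:
    # Information about the requesting end (field names are hypothetical).
    location: str
    level: int
    server_type: str
    # Information about the target data to be stored offsite.
    data_size: int       # bytes
    importance: int      # assumed scale: 1 (low) to 5 (high)
    data_category: str
    created_at: float    # generation time as a Unix timestamp


def encode_packet(desc):
    # Requesting end: serialize the description into the broadcast payload.
    return json.dumps({"type": "store_request", "desc": asdict(desc)}).encode("utf-8")


def decode_packet(payload):
    # Storage provider: extract the request description from the payload.
    msg = json.loads(payload.decode("utf-8"))
    return RequestDescription(**msg["desc"])
```

Keeping both kinds of information in one record lets every storage provider apply its own suitability test without a further round trip to the requesting end.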
In S102, after receiving the broadcast request packet, a storage provider extracts the request description information and determines, based on it, whether to provide a storage service for the requesting end; if it decides to do so, it returns a reply message to the requesting end.
In practice, different servers in a distributed storage system differ in their capacity to store different types of data and in the degree of security they can guarantee. In the embodiments of this application, to improve the security and integrity of data storage, each storage provider first judges, from the request description information in the received broadcast request packet, whether it is suitable to provide a storage service for the requesting end.
Optionally, because a storage provider's decision on whether to serve a requesting end is a binary classification problem, the embodiments of this application may analyze the request description information with a preset support vector machine (SVM) algorithm, in the following two steps:
Step 1: Convert the request description information into a description matrix, and reduce its dimensionality with principal component analysis (PCA) to generate a target feature matrix.
In the embodiments of this application, the value of each data type in the request description information is written into the matrix position assigned to that data type by a preset correspondence between data types and matrix positions, producing the description matrix. The description matrix thus represents the request description information in a form that can be used in subsequent calculations. Then, to reduce the amount of data for the subsequent classification, the embodiments apply the existing PCA algorithm to reduce the dimensionality of the description matrix; since PCA is prior art, it is not described in detail here.
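A minimal sketch of Step 1 in plain Python, assuming a hypothetical `FIELD_POSITIONS` table as the preset correspondence between data types and matrix positions, and flattening the description matrix to a vector for simplicity. PCA is reduced here to a single principal component found by power iteration on the sample covariance matrix; a production system would more likely use a linear-algebra library and keep several components.

```python
import math

# Hypothetical preset correspondence between data types and positions.
FIELD_POSITIONS = {"level": 0, "data_size": 1, "importance": 2, "created_at": 3}


def to_vector(desc):
    # Place each field's value at its preset position (a flattened description matrix).
    vec = [0.0] * len(FIELD_POSITIONS)
    for field, pos in FIELD_POSITIONS.items():
        vec[pos] = float(desc[field])
    return vec


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows):
    # Sample covariance matrix of the rows (requires at least two rows).
    mu = mean(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    d, n = len(mu), len(rows)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=200):
    # Power iteration: converges to the leading eigenvector of a symmetric matrix
    # unless the start vector happens to be orthogonal to it.
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def project(vec, mu, component):
    # Reduce a description vector to one coordinate along the principal component.
    return sum((x - m) * c for x, m, c in zip(vec, mu, component))
```

Because raw fields such as `data_size` and `created_at` differ by orders of magnitude, the columns would normally be standardized before the covariance is computed; the sketch skips that to stay short.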
Step 2: Classify the target feature matrix with a pre-trained SVM model, and determine, based on the class of the target feature matrix, whether to provide a storage service for the requesting end.
Specifically, the storage provider trains the classification hyperplane of the SVM model on training data in advance, as follows. The storage provider collects a locally stored positive set and negative set, each containing multiple training matrices. Training matrices in the positive set represent the request description information of data that this storage provider can handle, and those in the negative set represent the request description information of data that it cannot store. Each training matrix is mapped into a high-dimensional feature space through the Gaussian kernel function:
[Equation image PCTCN2020092810-appb-000001: Gaussian kernel function, not reproduced in the text]
which yields a training parameter for each training matrix, where P_i is the training parameter of training matrix i, X_i is training matrix i, X' is the average matrix of all training matrices in the positive and negative sets (each element of the average matrix is the mean of the corresponding elements of all training matrices), and δ_i is the covariance matrix of X_i and X'. An existing SVM algorithm then determines the classification hyperplane of the SVM model from the training parameters of the training matrices in the positive and negative sets.
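Since the kernel formula survives only as an image, the following sketch assumes the standard Gaussian (RBF) form and replaces the covariance term δ_i with a scalar bandwidth `sigma`; both choices are assumptions, not the patent's exact formula. It computes the average matrix X' element by element, as the text describes, and one training parameter P_i per training matrix.

```python
import math


def mean_matrix(matrices):
    # X': element-wise mean of all training matrices in the positive and negative sets.
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(cols)]
            for r in range(rows)]


def training_parameter(x, x_mean, sigma):
    # ASSUMED Gaussian form: P_i = exp(-||X_i - X'||_F^2 / (2 * sigma^2)),
    # with a scalar bandwidth sigma standing in for the patent's delta_i term.
    sq = sum((a - b) ** 2 for ra, rb in zip(x, x_mean) for a, b in zip(ra, rb))
    return math.exp(-sq / (2.0 * sigma ** 2))
```

Under this assumed form, P_i equals 1 when X_i coincides with X' and decays toward 0 as X_i moves away from it.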
By applying the pre-trained SVM model, the storage provider can determine which side of the classification hyperplane the target feature matrix lies on, and hence its class. Because the correspondence between classes and the decision to provide storage is determined in advance, the storage provider can decide in this way whether to serve the requesting end. In other words, each storage provider judges from the request description information whether it is suitable for storing the target data and does not reply to requesting ends whose target data it is unsuitable to store, which to some extent reduces the risk of the target data being lost or corrupted later.
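The text leaves hyperplane fitting to an existing SVM algorithm. As a stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method and classifies a new feature vector by the side of the hyperplane it falls on. It works on plain feature vectors (for example, PCA-reduced description features) rather than on kernel values, and the function names and the hyperparameters `lam` and `epochs` are hypothetical.

```python
import random


def _augment(x):
    # Append a constant 1 so the bias is learned as the last weight.
    return list(x) + [1.0]


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos sub-gradient training of a linear SVM.

    labels: +1 for the positive set (can store), -1 for the negative set.
    Returns the weight vector, with the bias as its last element.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in samples]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization shrink, then a hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def provide_storage(w, x):
    # The side of the hyperplane w . [x, 1] = 0 decides the class, i.e. whether to reply.
    return sum(wj * xj for wj, xj in zip(w, _augment(x))) >= 0.0
```

A `True` result corresponds to the class for which the storage provider returns a reply message; `False` means it stays silent.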
In S103, the requesting end selects, as the target provider, one of the storage providers that returned reply messages to it within a preset time period.
After the previous step, every storage provider in the distributed storage system (every server other than the requesting end) has evaluated the received broadcast request packet; based on the judgment in S102, some return a reply message to the requesting end and others do not. In the embodiments of this application, the requesting end sends each piece of target data to only one storage provider, so it must screen the storage providers that replied and select the most reliable one as the target provider.
In the embodiments of this application, the reliability of a storage provider is judged from its remaining storage capacity and the time it took to return the reply message. As an embodiment of this application, as shown in FIG. 2, S103 includes:
S1031: Extract the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and record the correspondence between each storage provider and its remaining storage capacity.
In the embodiments of this application, each storage provider includes its remaining storage capacity when generating a reply message, so the requesting end learns the remaining storage capacity of every storage provider that replied.
S1032: Calculate the time difference between the time at which the requesting end receives the reply message from each storage provider and the time at which the requesting end sent the broadcast request packet, and record the correspondence between each storage provider and its time difference.
The requesting end records the sending time when it sends the broadcast request packet in step S101 and records a receiving time for each reply message it receives, from which it computes the time difference for each storage provider.
S1033: Calculate the storage coefficient of each storage provider using a coefficient formula.
Optionally, the coefficient formula is:
[Equation image PCTCN2020092810-appb-000002: storage coefficient formula, not reproduced in the text]
where P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity of storage provider i, and Time_i is the time difference of storage provider i.
In the embodiments of this application, the storage coefficient is defined to be proportional to reliability. A larger remaining storage capacity means the storage provider has more space for the target data and therefore higher storage reliability, so its storage coefficient is higher. Conversely, a larger time difference between sending the broadcast request packet and receiving a provider's reply generally indicates that the provider is under heavier load, or that it is farther from the requesting end in the routing table; long-distance transmission increases the risk of data loss or theft, so storage reliability, and with it the storage coefficient, is lower.
S1034: Take the storage provider with the largest storage coefficient as the target provider.
In this way, the embodiments of this application screen the candidate servers for storing the target data a second time, now from the requesting end's side, further improving the reliability of future data storage.
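Steps S1031 to S1034 can be sketched as a single selection pass over the replies. Because the coefficient formula survives only as an image, the sketch assumes P_i = Cap_i / Time_i, which matches the stated behaviour (higher with more remaining capacity, lower with a longer time difference) but is not confirmed by the source. The `Reply` record and the `window` argument (the preset time period) are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    provider_id: str
    remaining_capacity: float  # extracted from the reply message (S1031)
    received_at: float         # local receive time in seconds


def storage_coefficient(cap, time_diff):
    # ASSUMED form P_i = Cap_i / Time_i: grows with capacity, shrinks with delay.
    return cap / time_diff


def select_target(replies, sent_at, window):
    best_id, best_p = None, float("-inf")
    for r in replies:
        dt = r.received_at - sent_at            # S1032: time difference
        if dt <= 0 or dt > window:              # outside the preset time period
            continue
        p = storage_coefficient(r.remaining_capacity, dt)  # S1033
        if p > best_p:
            best_id, best_p = r.provider_id, p
    return best_id                              # S1034: largest coefficient wins
```

Replies arriving after the preset time period, or with a non-positive time difference, are ignored, which also avoids dividing by zero.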
In S104, the requesting end divides the target data into multiple data blocks, calculates a digest of each data block, generates and stores the correspondence between the identifier of each data block and its digest, and sends all the data blocks to the target provider.
Note that, to detect data anomalies during storage promptly, the embodiments of this application include a later periodic verification step. Verifying the target data as a whole would impose a heavy computational load, and when the requesting end has many pieces of target data stored offsite, verifying all of them in full is clearly impractical. The embodiments therefore divide each piece of target data into multiple data blocks and spot-check some of them during subsequent verification. To further reduce the amount of data involved in verification, this step also calculates a digest of each data block and generates and stores the correspondence between data block identifiers and digests, so that later verification can be performed on the digests.
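The splitting and the identifier-to-digest table from S104 might look like the sketch below. Fixed-size blocks, the `blk-<n>` identifier format, and SHA-256 are assumptions made for illustration; the patent's own per-block digest is the one built in S1041 to S1044.

```python
import hashlib


def split_blocks(data, block_size):
    # Cut the target data into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_digest_table(blocks):
    # Identifier -> digest table kept by the requesting end for later spot checks.
    # SHA-256 is a stand-in for the digest computed in S1041-S1044.
    return {f"blk-{i}": hashlib.sha256(b).hexdigest() for i, b in enumerate(blocks)}
```

Only the table stays on the requesting end; the blocks themselves go to the target provider, and later spot checks compare against the table.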
As an embodiment of this application, as shown in FIG. 3, S104 includes:
S1041: Divide the data block into M feature groups, expand the M feature groups into N feature groups according to a preset expansion rule, and number the N feature groups.
In the embodiments of this application, M is an integer greater than 1, and N is an integer greater than M.
Optionally, each feature group is made up of binary strings. Optionally, the expansion rule for expanding the M feature groups into N feature groups may be as follows: let t be the number of a feature group, let X_t denote a feature group before expansion and Y_t a feature group after expansion; when t ≤ M, Y_t = X_t; when t > M,
[Equation images PCTCN2020092810-appb-000003 and PCTCN2020092810-appb-000004: expansion formula for Y_t when t > M, not reproduced in the text]
where the symbol shown in image PCTCN2020092810-appb-000005 (conventionally ⊕) is the exclusive-OR (XOR) operator.
Note that this expansion rule is only an example; other expansion rules based on the original feature groups can also be used to expand the feature groups.
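Feature-group expansion can be sketched by treating each feature group as an integer bit pattern. The patent's expansion formula for t > M is only available as an image, so the rule below, Y_t = Y_(t-1) XOR Y_(t-M), is a hypothetical XOR-based stand-in chosen because it is valid for any M > 1; the text explicitly permits other expansion rules.

```python
def expand_groups(x, n):
    """Expand feature groups X_1..X_M into Y_1..Y_N, where N > M > 1.

    Y_t = X_t for t <= M. For t > M this uses a HYPOTHETICAL rule,
    Y_t = Y_(t-1) XOR Y_(t-M), standing in for the formula shown only as an image.
    """
    m = len(x)
    if not 1 < m < n:
        raise ValueError("expected 1 < M < N")
    y = list(x)  # y[t - 1] holds Y_t (the patent numbers groups from 1)
    for t in range(m + 1, n + 1):
        y.append(y[t - 2] ^ y[t - 1 - m])
    return y
```

Every expanded group depends on earlier groups, so changing a single bit of the data block propagates into many of the N groups used in the later rounds.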
S1042: Assign an initial digest to the data block, divide the initial digest in order into L initial digest groups, and number the initial digest groups.
In the embodiments of this application, L is an integer greater than 1.
For example, if L is 5, a preset initial digest is divided into 5 initial digest groups, numbered to give the first, second, third, fourth, and fifth initial digest groups. The number of initial digest groups can be adjusted according to actual needs.
S1043: Set up L cache groups and number them.
In the embodiments of this application, because the subsequent steps perform assignment and shift calculations, the number of cache groups must equal the number of initial digest groups.
For example, if there are 5 initial digest groups, 5 cache groups are set up and numbered to give the first, second, third, fourth, and fifth cache groups.
S1044: Store the data of each initial digest group in the cache group with the same number, and then perform N rounds of the shift-assignment calculation. Each round works as follows:
- The data of each cache group is shifted into the next cache group, and the previous data of the last cache group is discarded.
- The data of the current first cache group plus the data of the target feature group becomes the new data of the first cache group.

The number of the target feature group equals the number of the current round.
For example, if N is 8, the shift-assignment calculation is repeated 8 times, and the input to the first round is the initial digest groups described above. Suppose the initial digest is divided into 5 initial digest groups: first, 1001010; second, 1000111; third, 1010101; fourth, 1000001; fifth, 1000000. Each initial digest group is stored in the cache group with the same number. At the start of the first round, the cache groups therefore hold: first, 1001010; second, 1000111; third, 1010101; fourth, 1000001; fifth, 1000000.
After the shift in the first round, the cache groups hold: second, 1001010; third, 1000111; fourth, 1010101; fifth, 1000001. The updated first cache group holds its data before the update plus the data of the target feature group, whose number equals the current round. The current round is 1, so the updated first cache group holds its previous data plus the data of the first feature group. If the first feature group is 1110, the updated first cache group is 1001010 + 1110 = 1011000. Correspondingly, in round 5 the updated first cache group would be its previous data plus the data of the fifth feature group.
The data in each cache group at the end of one round is used as the input data of the corresponding cache group at the start of the next round.
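The following Python sketch performs one shift-assignment round on the five cache groups of the worked example. It makes two assumptions:
- "plus" means ordinary binary addition truncated to the 7-bit group width, since the text does not define overflow handling. Under this assumption, 1001010 + 1110 = 1011000.
- The shift discards the previous data of the last cache group, while 1000111 moves unchanged from the second group into the third.

The function name is hypothetical.

```python
def shift_assign_round(buffers: list[int], feature: int, width: int) -> list[int]:
    """One shift-assignment round over the L cache groups.

    Each cache group's data moves into the next group (the last group's old
    data is dropped); the first group becomes its old data plus the target
    feature group, truncated to `width` bits (assumed overflow handling).
    """
    mask = (1 << width) - 1
    new_first = (buffers[0] + feature) & mask
    return [new_first] + buffers[:-1]


start = [0b1001010, 0b1000111, 0b1010101, 0b1000001, 0b1000000]
after = shift_assign_round(start, 0b1110, width=7)  # round 1 uses feature group 1
print([format(v, "07b") for v in after])
# ['1011000', '1001010', '1000111', '1010101', '1000001']
```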
S1045: Concatenate the data in the cache groups after the N rounds of shift-assignment calculation to form the digest of the data block.
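Steps S1041 to S1045 can be combined into a minimal per-block digest, sketched below in Python. The sketch rests on the following assumptions:
- It reuses the hypothetical XOR expansion rule Y_t = Y_(t-M) ⊕ Y_(t-M+1) in place of the formula in the figures.
- It splits the block into M equal-width feature groups.
- It treats the initial digest as an integer of L × width bits, split most-significant group first.
- It adds feature groups modulo 2^width.

These choices and the parameter names (num_groups stands for L) are illustrative only.

```python
def block_digest(block: bytes, m: int, n: int, initial: int,
                 num_groups: int, width: int) -> int:
    """Digest of one data block following S1041-S1045 (illustrative sketch)."""
    if m <= 1 or n <= m or num_groups <= 1 or len(block) % m != 0:
        raise ValueError("need m > 1, n > m, num_groups > 1 and len(block) % m == 0")
    mask = (1 << width) - 1

    # S1041: M equal-width feature groups, expanded to N with the hypothetical XOR rule.
    size = len(block) // m
    y = [int.from_bytes(block[i * size:(i + 1) * size], "big") for i in range(m)]
    for t in range(m + 1, n + 1):
        y.append(y[t - m - 1] ^ y[t - m])  # Y_t = Y_(t-M) XOR Y_(t-M+1)

    # S1042/S1043: split the initial digest into L groups and load cache groups 1..L.
    buffers = [(initial >> (width * (num_groups - 1 - k))) & mask
               for k in range(num_groups)]

    # S1044: N rounds; round r adds feature group Y_r to the first cache group.
    for r in range(1, n + 1):
        buffers = [(buffers[0] + y[r - 1]) & mask] + buffers[:-1]

    # S1045: concatenate the cache groups into the digest.
    digest = 0
    for value in buffers:
        digest = (digest << width) | value
    return digest


print(hex(block_digest(b"\x01\x02", m=2, n=3, initial=0x1020, num_groups=2, width=8)))
# 0x1613
```

Changing a single bit of the block changes the result. For example, b"\x01\x03" gives 0x1614 with the same parameters.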
Continuing the example above, with N = 8 the digest of the data block is obtained after 8 rounds of shift-assignment calculation. In this embodiment of the present application, each data block has a corresponding identifier that the server uses to locate the data block. The steps above therefore establish a correspondence between the identifier of each data block and its digest. Each data block is then sent, together with its identifier, to the target provider.
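The identifier-to-digest correspondence can be kept as a simple mapping on the requester. In the sketch below, the block identifier is the block index, and SHA-256 from Python's hashlib stands in for the shift-assignment digest. Both are assumptions made to keep the example short. Any digest function with the same signature could be passed in instead.

```python
import hashlib
from typing import Callable


def split_blocks(data: bytes, block_size: int) -> dict[int, bytes]:
    """Split target data into blocks keyed by identifier (here: the block index)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return {i: data[off:off + block_size]
            for i, off in enumerate(range(0, len(data), block_size))}


def sha256_hex(block: bytes) -> str:
    # Stand-in for the shift-assignment digest described above.
    return hashlib.sha256(block).hexdigest()


def build_digest_table(blocks: dict[int, bytes],
                       digest: Callable[[bytes], str] = sha256_hex) -> dict[int, str]:
    """Correspondence between block identifiers and digests, kept by the requester."""
    return {block_id: digest(block) for block_id, block in blocks.items()}


blocks = split_blocks(b"abcdefghij", 4)  # {0: b'abcd', 1: b'efgh', 2: b'ij'}
table = build_digest_table(blocks)       # kept locally; blocks and ids go to the provider
```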
In S105, the target provider receives and stores each data block sent by the requester.
In S106, at every preset time interval, the requester verifies the digests of one or more data blocks stored by the target provider. It does so against the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
Specifically, verifying the digests of one or more data blocks stored by the target provider includes the following.
At every preset time interval, the requester generates a challenge message. The challenge message contains the identifiers of one or more data blocks, which designate the data blocks to be spot-checked. The target provider returns the digests of those data blocks in response to the challenge message. After receiving them, the requester verifies them against the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
Spot-checking the digests of selected data blocks in this way makes it possible to detect storage anomalies in the target data promptly while using relatively few computing resources.
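A minimal Python sketch of one spot-check round follows. It assumes the following:
- Block identifiers are integers.
- The target provider computes digests with the same function as the requester. SHA-256 is used here as a stand-in.
- A missing block is reported as None.

The function names and message shapes are hypothetical. Scheduling the check at the preset interval is left to the caller.

```python
import hashlib
import random
from typing import Dict, List, Optional


def digest(block: bytes) -> str:
    # Stand-in digest (assumption); both sides must use the same function.
    return hashlib.sha256(block).hexdigest()


def make_challenge(table: Dict[int, str], k: int, rng: random.Random) -> List[int]:
    """Requester: choose k distinct block identifiers to spot-check."""
    return rng.sample(sorted(table), k)


def respond(stored: Dict[int, bytes], challenge: List[int]) -> Dict[int, Optional[str]]:
    """Target provider: digests of the challenged blocks (None if a block is missing)."""
    return {bid: digest(stored[bid]) if bid in stored else None for bid in challenge}


def verify(table: Dict[int, str], challenge: List[int],
           response: Dict[int, Optional[str]]) -> List[int]:
    """Requester: identifiers whose returned digest is missing or does not match."""
    return [bid for bid in challenge if response.get(bid) != table[bid]]


table = {i: digest(b) for i, b in {0: b"a", 1: b"b", 2: b"c"}.items()}
stored = {0: b"a", 1: b"tampered"}  # block 1 altered, block 2 lost
challenge = make_challenge(table, 3, random.Random(0))
print(sorted(verify(table, challenge, respond(stored, challenge))))  # [1, 2]
```

A non-empty result indicates that the target provider's storage of the target data is abnormal.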
In this embodiment of the present application, the method proceeds as follows:
- The requester sends a broadcast packet to notify all storage providers in the local area network that it needs to store target data off-site.
- On receiving the broadcast request, each storage provider judges from the request description information whether it is suitable for storing the target data. It selectively returns a reply message, which helps avoid future data loss.
- The requester selects, as the target provider, one of the storage providers that replied within a preset time period. This screens out, from the requester's perspective, the most reliable storage provider and reduces the probability of future data anomalies.
- The requester divides the target data into multiple data blocks and computes the digest of each data block. It generates and stores the correspondence between data block identifiers and digests, which supports the subsequent spot-check verification.
- The requester sends all data blocks to the target provider, which receives and stores them.
- At every preset time interval, the requester verifies the digests of one or more data blocks stored by the target provider against the pre-stored correspondence. Any abnormality in the target provider's storage of the target data is thus detected in time.

Together, these steps improve the stability of data in distributed storage.
Corresponding to the distributed data storage method described in the above embodiments, FIG. 4 shows a system interaction diagram of the distributed data storage system provided in an embodiment of the present application. For ease of description, only the parts related to this embodiment are shown.
Referring to FIG. 4, the system includes a requester 401 and a storage provider 402.
The requester is configured to send a broadcast request packet containing request description information.
The storage provider is configured to perform the following after receiving the broadcast request packet. It extracts the request description information and determines, based on it, whether to provide a storage service for the requester. If it determines to provide the storage service, it returns a reply message to the requester.
The requester is further configured to select, as the target provider, one of the storage providers that returned reply messages to it within a preset time period.
The requester is further configured to divide the target data into multiple data blocks and compute the digest of each data block. It is also configured to generate and store the correspondence between the identifier of each data block and its digest, and to send all data blocks to the target provider.
The target provider is further configured to receive and store each data block sent by the requester.
The requester is further configured to verify, at every preset time interval, the digests of one or more data blocks stored by the target provider. The verification uses the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
Optionally, determining, based on the request description information, whether to provide a storage service for the requester includes the following. The request description information is converted into a description matrix. A principal component analysis (PCA) algorithm reduces the dimensionality of the description matrix to generate a target feature matrix. A pre-trained support vector machine (SVM) model classifies the target feature matrix. Whether to provide a storage service for the requester is then determined based on the category of the target feature matrix.
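The text does not specify how request description information is encoded, how many principal components are kept, or which SVM kernel is used. The scikit-learn sketch below makes the following assumptions:
- Each request is described by a hypothetical three-feature row: requested size in GB, retention in days, and reads per day.
- The features are standardized, and two principal components are kept.
- A linear SVM is trained on a small made-up data set, where label 1 means "provide the service".

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up training data: one row per past request,
# [requested size (GB), retention (days), reads per day]; label 1 = provide service.
X_train = np.array([
    [1, 30, 2], [2, 60, 1], [1, 90, 3], [3, 30, 2],
    [80, 365, 40], [95, 720, 60], [70, 400, 50], [90, 500, 45],
], dtype=float)
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Standardize, reduce to 2 principal components (the PCA step), then a linear SVM.
model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="linear"))
model.fit(X_train, y_train)


def should_serve(description: list) -> bool:
    """Classify one request description (as a 1 x 3 matrix) into serve / decline."""
    features = np.asarray([description], dtype=float)
    return bool(model.predict(features)[0] == 1)
```

In practice, the training data and the choice of features would come from the storage provider's own history. The pipeline applies the same scaling and projection at training and prediction time.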
Optionally, the requester selects, as the target provider, one of the storage providers that returned reply messages within a preset time period. This selection includes the following steps:
- Extract the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establish the correspondence between each storage provider and its remaining storage capacity.
- Calculate the time difference between when the requester received each storage provider's reply message and when the requester sent the broadcast request packet. Establish the correspondence between each storage provider and its time difference.
- Calculate the storage coefficient of each storage provider by the formula in Figure PCTCN2020092810-appb-000006. Here P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity of storage provider i, and Time_i is the time difference of storage provider i.
- Take the storage provider with the largest storage coefficient as the target provider.
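The storage-coefficient formula is given only as an image. The sketch below therefore uses a hypothetical coefficient P_i = Cap_i / Time_i, which favors providers with more remaining capacity and faster replies. It is an assumption, not the formula in Figure PCTCN2020092810-appb-000006. Replies received after the preset time period are ignored. The Reply type and its field names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reply:
    provider: str              # illustrative provider identifier
    remaining_capacity: float  # Cap_i, taken from the reply message
    received_at: float         # when the requester received the reply (seconds)


def select_target(replies: Iterable[Reply], sent_at: float, window: float) -> Optional[str]:
    """Return the provider with the largest storage coefficient, or None."""
    best, best_score = None, float("-inf")
    for reply in replies:
        time_diff = reply.received_at - sent_at  # Time_i
        if not 0 < time_diff <= window:
            continue  # outside the preset time period (or clock skew)
        score = reply.remaining_capacity / time_diff  # HYPOTHETICAL P_i = Cap_i / Time_i
        if score > best_score:
            best, best_score = reply.provider, score
    return best


replies = [Reply("A", 100.0, 10.5), Reply("B", 300.0, 12.0), Reply("C", 1000.0, 16.0)]
print(select_target(replies, sent_at=10.0, window=5.0))  # A (C replied too late)
```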
Optionally, computing the digest of each data block includes the following steps:
- Divide the data block into M feature groups, where M is an integer greater than 1. Expand the M feature groups into N feature groups according to a preset expansion rule, where N is an integer greater than M, and number the N feature groups.
- Assign an initial digest to the data block. Divide the initial digest, in order, into L initial digest groups, where L is an integer greater than 1, and number the initial digest groups.
- Set up L cache groups and number them.
- Store the data of each initial digest group in the cache group with the same number, and perform N rounds of shift-assignment calculation. In each round, the data of each cache group is first shifted into the next cache group. The data of the current first cache group plus the data of the target feature group then becomes the new data of the first cache group. The number of the target feature group equals the number of the current round.
- Concatenate the data in the cache groups after the N rounds to form the digest of the data block.

Optionally, the requester verifies, at every preset time interval, the digests of one or more data blocks stored by the target provider against the pre-stored correspondence between data block identifiers and digests. This verification includes the following. At every preset time interval, the requester generates a challenge message containing the identifiers of one or more data blocks, which designate the data blocks to be spot-checked. The target provider returns the digests of those data blocks in response to the challenge message. After receiving them, the requester verifies the returned digests against the pre-stored correspondence, to determine whether the target provider's storage of the target data is abnormal.
It can be understood that, in this embodiment of the present application, the system operates as follows:
- The requester sends a broadcast packet to notify all storage providers in the local area network that it needs to store target data off-site.
- On receiving the broadcast request, each storage provider judges from the request description information whether it is suitable for storing the target data. It selectively returns a reply message, which helps avoid future data loss.
- The requester selects, as the target provider, one of the storage providers that replied within a preset time period. This screens out, from the requester's perspective, the most reliable storage provider and reduces the probability of future data anomalies.
- The requester divides the target data into multiple data blocks and computes the digest of each data block. It generates and stores the correspondence between data block identifiers and digests, which supports the subsequent spot-check verification.
- The requester sends all data blocks to the target provider, which receives and stores them.
- At every preset time interval, the requester verifies the digests of one or more data blocks stored by the target provider against the pre-stored correspondence. Any abnormality in the target provider's storage of the target data is thus detected in time.

Together, these steps improve the stability of data in distributed storage.
In the above embodiments, each embodiment is described with its own emphasis. For parts not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Units described as separate components may or may not be physically separate. Components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
An embodiment of the present application further provides a distributed data storage apparatus. The apparatus may include modules/units for executing the steps performed by the requester and/or the storage provider in the method described above.
If the integrated modules/units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium. Optionally, the computer-readable storage medium may be non-volatile or volatile.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. The present application has been described in detail with reference to the foregoing embodiments. Nevertheless, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of their technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall fall within the protection scope of the present application.

Claims (20)

  1. A distributed data storage method, comprising:
    sending, by a requester, a broadcast request packet containing request description information;
    after receiving the broadcast request packet, extracting, by a storage provider, the request description information, determining, based on the request description information, whether to provide a storage service for the requester, and, if it is determined to provide the storage service, returning a reply message to the requester;
    selecting, by the requester, one of the storage providers that returned reply messages to it within a preset time period as a target provider;
    dividing, by the requester, target data into multiple data blocks, computing a digest of each data block, generating and storing a correspondence between the identifier of each data block and its digest, and sending all the data blocks to the target provider;
    receiving and storing, by the target provider, each data block sent by the requester; and
    verifying, by the requester at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  2. The distributed data storage method according to claim 1, wherein determining, based on the request description information, whether to provide the storage service for the requester comprises:
    converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix with a principal component analysis algorithm to generate a target feature matrix; and
    classifying the target feature matrix with a pre-trained support vector machine model, and determining, based on the category of the target feature matrix, whether to provide the storage service for the requester.
  3. The distributed data storage method according to claim 1, wherein selecting, by the requester, one of the storage providers that returned reply messages to it within the preset time period as the target provider comprises:
    extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;
    calculating the time difference between the time at which the requester receives the reply message returned by each storage provider and the time at which the requester sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;
    calculating the storage coefficient of each storage provider by the formula in Figure PCTCN2020092810-appb-100001, where P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity of storage provider i, and Time_i is the time difference of storage provider i; and
    taking the storage provider with the largest storage coefficient as the target provider.
  4. The distributed data storage method according to claim 1, wherein computing the digest of each data block comprises:
    dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;
    assigning an initial digest to the data block, dividing the initial digest, in order, into L initial digest groups, and numbering the initial digest groups, where L is an integer greater than 1;
    setting up L cache groups and numbering the cache groups;
    storing the data of each initial digest group in the cache group with the same number, and performing N rounds of shift-assignment calculation, each round comprising: after shifting the data of each cache group into the next cache group, taking the data of the current first cache group plus the data of a target feature group as the new data of the first cache group, so as to update the first cache group, the number of the target feature group being equal to the number of the current round; and
    concatenating the data in the cache groups after the N rounds of shift-assignment calculation as the digest of the data block.
  5. The distributed data storage method according to claim 1, wherein verifying, by the requester at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests comprises:
    generating, by the requester at every preset time interval, a challenge message containing the identifiers of one or more data blocks, which designate the data blocks to be spot-checked; and
    after receiving the digests of the data blocks returned by the target provider in response to the challenge message, verifying, by the requester, the returned digests according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  6. A distributed data storage system, comprising a requester and a storage provider, wherein:
    the requester is configured to send a broadcast request packet containing request description information;
    the storage provider is configured to, after receiving the broadcast request packet, extract the request description information, determine, based on the request description information, whether to provide a storage service for the requester, and, if it is determined to provide the storage service, return a reply message to the requester;
    the requester is further configured to select one of the storage providers that returned reply messages to it within a preset time period as a target provider;
    the requester is further configured to divide target data into multiple data blocks, compute a digest of each data block, generate and store a correspondence between the identifier of each data block and its digest, and send all the data blocks to the target provider;
    the target provider is further configured to receive and store each data block sent by the requester; and
    the requester is further configured to verify, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.
  7. The distributed data storage system according to claim 6, wherein determining, based on the request description information, whether to provide the storage service for the requester comprises:
    converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix with a principal component analysis algorithm to generate a target feature matrix; and
    classifying the target feature matrix with a pre-trained support vector machine model, and determining, based on the category of the target feature matrix, whether to provide the storage service for the requester.
  8. The distributed data storage system according to claim 6, wherein the requesting end selecting, as the target provider, one of the storage providers that return a reply message to it within a preset time period comprises:
    extracting the remaining storage capacity from each reply message returned by the storage providers within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;
    calculating, for each storage provider, the time difference between the time at which the requesting end receives that provider's reply message and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;
    calculating the storage coefficient of each storage provider by the formula:
    [Formula image PCTCN2020092810-appb-100002, not reproduced in the text]
    where P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity corresponding to storage provider i, and Time_i is the time difference corresponding to storage provider i; and
    taking the storage provider with the largest storage coefficient as the target provider.
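The selection steps of claim 8 can be sketched as below. Because the actual formula for P_i is only available as an image, the sketch takes the coefficient as a function parameter; `placeholder_coefficient` (Cap_i divided by Time_i) is a purely hypothetical stand-in chosen for illustration, not the patent's formula. The `Reply` record, its field names and the timestamps are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Reply:
    provider_id: str
    remaining_capacity: float  # Cap_i, read from the reply message
    received_at: float         # local receive time, in seconds


def select_target_provider(replies: Iterable[Reply], broadcast_sent_at: float, window: float,
                           coefficient: Callable[[float, float], float]) -> Optional[str]:
    cap: Dict[str, float] = {}
    delay: Dict[str, float] = {}
    for r in replies:
        time_diff = r.received_at - broadcast_sent_at  # Time_i
        if 0.0 <= time_diff <= window:                  # only replies inside the preset period
            cap[r.provider_id] = r.remaining_capacity
            delay[r.provider_id] = time_diff
    if not cap:
        return None
    # The provider with the largest storage coefficient P_i becomes the target provider.
    return max(cap, key=lambda p: coefficient(cap[p], delay[p]))


def placeholder_coefficient(cap_i: float, time_i: float) -> float:
    """HYPOTHETICAL stand-in for the patent's formula (which is not reproduced in the text)."""
    return cap_i / max(time_i, 1e-6)


replies = [Reply("A", 100.0, 10.5), Reply("B", 80.0, 10.1), Reply("C", 500.0, 15.0)]
target = select_target_provider(replies, broadcast_sent_at=10.0, window=2.0,
                                coefficient=placeholder_coefficient)
```

Here provider C is discarded because its reply arrives after the 2-second window; swapping in the real formula only requires passing a different `coefficient` function.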
  9. The distributed data storage system according to claim 6, wherein calculating the digest of each data block comprises:
    dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;
    assigning an initial digest to the data block, dividing the initial digest in order into L initial digest groups, and numbering the initial digest groups, where L is an integer greater than 1;
    setting up L cache groups and numbering the cache groups;
    storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of a shift-assignment calculation, each round comprising: shifting the data of each cache group into the next cache group, then adding the data of the target feature group to the current data of the first cache group and using the sum as the new data of the first cache group, thereby updating the first cache group, the number of the target feature group being the same as the number of the current round; and
    combining the data in the cache groups after the N rounds of shift-assignment calculation to form the digest of the data block.
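The digest procedure of claim 9 resembles the message-schedule-plus-rounds structure of SHA-family hashes. The sketch below is one possible reading, under several assumptions the claim leaves open: 32-bit words with addition modulo 2^32, M = 16 and N = 32, feature groups formed by zero-padding the block and XOR-folding each group to one word, a hypothetical expansion rule, a circular shift of the cache groups, and the SHA-1 initial values borrowed purely as arbitrary constants for the L = 5 initial digest groups. It is illustrative only and is not a secure hash.

```python
from typing import List

MASK = 0xFFFFFFFF
M, N = 16, 32  # hypothetical sizes: M feature groups expanded to N
# L = 5 initial digest groups; SHA-1's initial values used as arbitrary constants.
INITIAL_DIGEST = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]


def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def fold32(chunk: bytes) -> int:
    """XOR-fold a byte string into one 32-bit word."""
    word = 0
    for i in range(0, len(chunk), 4):
        word ^= int.from_bytes(chunk[i:i + 4].ljust(4, b"\0"), "big")
    return word


def feature_groups(block: bytes) -> List[int]:
    size = max(1, -(-len(block) // M))  # ceil(len / M) bytes per feature group
    padded = block.ljust(size * M, b"\0")
    groups = [fold32(padded[i * size:(i + 1) * size]) for i in range(M)]
    # Hypothetical "preset expansion rule" extending M groups to N groups.
    for t in range(M, N):
        groups.append(rotl32(groups[t - 3] ^ groups[t - 8] ^ groups[t - M], 1))
    return groups


def block_digest(block: bytes) -> str:
    features = feature_groups(block)
    cache = list(INITIAL_DIGEST)  # cache group j <- initial digest group j
    for t in range(N):  # round t uses the feature group with the same number
        cache = [cache[-1]] + cache[:-1]  # shift each group into the next (circularly)
        cache[0] = (cache[0] + features[t]) & MASK  # update the first cache group
    return "".join(f"{word:08x}" for word in cache)
```

Because the rounds only rotate and add, this reading is linear in the feature words; a production system would more likely use a standard hash such as SHA-256 for the per-block digest.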
  10. The distributed data storage system according to claim 6, wherein the requesting end verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the data block identifiers and the digests comprises:
    the requesting end generating a challenge message at every preset time interval, the challenge message containing the identifiers of one or more data blocks and indicating the data blocks to be spot-checked; and
    after receiving the digests of the data blocks returned by the target provider in response to the challenge message, the requesting end verifying those digests according to the pre-stored correspondence between the data block identifiers and the digests, so as to determine whether the target provider's storage of the target data is abnormal.
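The challenge-response audit of claim 10, together with the upload step that builds the identifier-to-digest table, can be sketched as follows. Assumptions: SHA-256 from Python's `hashlib` stands in for the claim 9 digest, blocks are identified by their index, the classes `Requester` and `TargetProvider` are hypothetical, and the periodic timer is left to the caller, since only one audit round is shown.

```python
import hashlib
import random
from typing import Dict, List


def digest(block: bytes) -> str:
    # SHA-256 stands in for the digest algorithm of claim 9.
    return hashlib.sha256(block).hexdigest()


class TargetProvider:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def store(self, block_id: int, block: bytes) -> None:
        self.blocks[block_id] = block

    def answer(self, challenge: List[int]) -> Dict[int, str]:
        # Recompute digests for the challenged blocks it still holds.
        return {bid: digest(self.blocks[bid]) for bid in challenge if bid in self.blocks}


class Requester:
    def __init__(self, seed: int = 0) -> None:
        self.expected: Dict[int, str] = {}  # block identifier -> digest
        self.rng = random.Random(seed)

    def upload(self, data: bytes, block_size: int, provider: TargetProvider) -> None:
        for block_id, start in enumerate(range(0, len(data), block_size)):
            block = data[start:start + block_size]
            self.expected[block_id] = digest(block)
            provider.store(block_id, block)

    def make_challenge(self, k: int) -> List[int]:
        return sorted(self.rng.sample(sorted(self.expected), k))

    def audit(self, provider: TargetProvider, k: int) -> bool:
        """One audit round; a scheduler would call this every preset interval."""
        challenge = self.make_challenge(k)
        reply = provider.answer(challenge)
        return all(reply.get(bid) == self.expected[bid] for bid in challenge)


requester, provider = Requester(seed=42), TargetProvider()
requester.upload(bytes(i % 251 for i in range(1000)), block_size=100, provider=provider)
honest = requester.audit(provider, k=10)     # challenges all 10 blocks
provider.blocks[3] = b"corrupted"
tampered = requester.audit(provider, k=10)
```

With k smaller than the block count, each round checks only a random sample, trading detection probability for bandwidth. Since the challenge carries only block identifiers, as in the claim, a provider that cached the digests could pass without keeping the data; a per-challenge nonce mixed into the digest would close that gap.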
  11. A distributed data storage method, comprising:
    sending, by a requesting end, a broadcast request packet containing request description information;
    selecting, by the requesting end, one of the storage providers that return a reply message to it within a preset time period as a target provider;
    dividing, by the requesting end, target data into a plurality of data blocks, calculating a digest of each data block, generating and storing a correspondence between the data block identifiers and the digests, and sending all the data blocks to the target provider; and
    verifying, by the requesting end at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the data block identifiers and the digests, so as to determine whether the target provider's storage of the target data is abnormal.
  12. A distributed data storage method, comprising:
    receiving, by a storage provider, a broadcast request packet sent by a requesting end, extracting the request description information from it, and determining, based on the request description information, whether to provide a storage service for the requesting end; if it is determined to provide the storage service, returning a reply message to the requesting end; and
    if data blocks sent by the requesting end are received, storing the data blocks by the storage provider, the data blocks being all the data blocks obtained by the requesting end dividing the target data.
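The storage-provider side of claim 12 can be sketched as a small message handler. The message layout (a dict with a `"description"` key), the `StorageProvider` class and its `accept` callback are hypothetical; `accept` is where a check such as the PCA + SVM classifier of claim 7 would plug in. Including the remaining capacity in the reply follows claim 8, which extracts it from the reply message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class StorageProvider:
    provider_id: str
    remaining_capacity: int                  # bytes still free
    accept: Callable[[dict], bool]           # decision on the request description
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def on_broadcast(self, request: dict) -> Optional[dict]:
        """Reply only if the request description is judged acceptable; else stay silent."""
        if not self.accept(request["description"]):
            return None
        return {"provider_id": self.provider_id,
                "remaining_capacity": self.remaining_capacity}

    def on_blocks(self, blocks: Dict[int, bytes]) -> None:
        """Store every data block of the requester's target data."""
        for block_id, block in blocks.items():
            self.blocks[block_id] = block
            self.remaining_capacity -= len(block)


p = StorageProvider("P1", 1000, accept=lambda d: d["size"] <= 500)
reply = p.on_broadcast({"description": {"size": 200}})
```

Staying silent on rejection, rather than sending a negative reply, matches the claim: the requester simply considers whichever providers answer within its preset time period.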
  13. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    sending a broadcast request packet containing request description information; selecting one of the storage providers that return a reply message within a preset time period as a target provider; dividing target data into a plurality of data blocks, calculating a digest of each data block, generating and storing a correspondence between the data block identifiers and the digests, and sending all the data blocks to the target provider; and verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the data block identifiers and the digests, so as to determine whether the target provider's storage of the target data is abnormal; and/or
    receiving a broadcast request packet sent by a requesting end, extracting the request description information, and determining, based on the request description information, whether to provide a storage service for the requesting end; if it is determined to provide the storage service, returning a reply message to the requesting end; and, if data blocks sent by the requesting end are received, storing the data blocks, the data blocks being all the data blocks obtained by the requesting end dividing the target data.
  14. The electronic device according to claim 13, wherein, when executing the step of determining, based on the request description information, whether to provide a storage service for the requesting end, the processor specifically implements the following steps:
    converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix by means of a principal component analysis (PCA) algorithm to generate a target feature matrix; and
    classifying the target feature matrix with a pre-trained support vector machine (SVM) model, and determining, based on the class of the target feature matrix, whether to provide the storage service for the requesting end.
  15. The electronic device according to claim 13, wherein, when executing the step in which the requesting end selects, as the target provider, one of the storage providers that return a reply message to it within a preset time period, the processor specifically implements the following steps:
    extracting the remaining storage capacity from each reply message returned by the storage providers within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;
    calculating, for each storage provider, the time difference between the time at which the requesting end receives that provider's reply message and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;
    calculating the storage coefficient of each storage provider by the formula:
    [Formula image PCTCN2020092810-appb-100003, not reproduced in the text]
    where P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity corresponding to storage provider i, and Time_i is the time difference corresponding to storage provider i; and
    taking the storage provider with the largest storage coefficient as the target provider.
  16. The electronic device according to claim 13, wherein, when executing the step of calculating the digest of each data block, the processor specifically implements the following steps:
    dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;
    assigning an initial digest to the data block, dividing the initial digest in order into L initial digest groups, and numbering the initial digest groups, where L is an integer greater than 1;
    setting up L cache groups and numbering the cache groups;
    storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of a shift-assignment calculation, each round comprising: shifting the data of each cache group into the next cache group, then adding the data of the target feature group to the current data of the first cache group and using the sum as the new data of the first cache group, thereby updating the first cache group, the number of the target feature group being the same as the number of the current round; and
    combining the data in the cache groups after the N rounds of shift-assignment calculation to form the digest of the data block.
  17. The electronic device according to claim 13, wherein, when executing the step in which the requesting end verifies, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the data block identifiers and the digests, the processor specifically implements the following steps:
    the requesting end generating a challenge message at every preset time interval, the challenge message containing the identifiers of one or more data blocks and indicating the data blocks to be spot-checked; and
    after receiving the digests of the data blocks returned by the target provider in response to the challenge message, the requesting end verifying those digests according to the pre-stored correspondence between the data block identifiers and the digests, so as to determine whether the target provider's storage of the target data is abnormal.
  18. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    sending a broadcast request packet containing request description information; selecting one of the storage providers that return a reply message within a preset time period as a target provider; dividing target data into a plurality of data blocks, calculating a digest of each data block, generating and storing a correspondence between the data block identifiers and the digests, and sending all the data blocks to the target provider; and verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the data block identifiers and the digests, so as to determine whether the target provider's storage of the target data is abnormal; and/or
    receiving a broadcast request packet sent by a requesting end, extracting the request description information, and determining, based on the request description information, whether to provide a storage service for the requesting end; if it is determined to provide the storage service, returning a reply message to the requesting end; and, if data blocks sent by the requesting end are received, storing the data blocks, the data blocks being all the data blocks obtained by the requesting end dividing the target data.
  19. The computer-readable storage medium according to claim 18, wherein, when the requesting end selects, as the target provider, one of the storage providers that return a reply message to it within a preset time period, the computer program is executed by the processor to specifically implement the following steps:
    extracting the remaining storage capacity from each reply message returned by the storage providers within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;
    calculating, for each storage provider, the time difference between the time at which the requesting end receives that provider's reply message and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;
    calculating the storage coefficient of each storage provider by the formula:
    [Formula image PCTCN2020092810-appb-100004, not reproduced in the text]
    where P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity corresponding to storage provider i, and Time_i is the time difference corresponding to storage provider i; and
    taking the storage provider with the largest storage coefficient as the target provider.
  20. The computer-readable storage medium according to claim 18, wherein, when the digest of each data block is calculated, the computer program is executed by the processor to specifically implement the following steps:
    dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;
    assigning an initial digest to the data block, dividing the initial digest in order into L initial digest groups, and numbering the initial digest groups, where L is an integer greater than 1;
    setting up L cache groups and numbering the cache groups;
    storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of a shift-assignment calculation, each round comprising: shifting the data of each cache group into the next cache group, then adding the data of the target feature group to the current data of the first cache group and using the sum as the new data of the first cache group, thereby updating the first cache group, the number of the target feature group being the same as the number of the current round; and
    combining the data in the cache groups after the N rounds of shift-assignment calculation to form the digest of the data block.
PCT/CN2020/092810 2019-08-07 2020-05-28 Distributed data storage method and system WO2021022875A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910727287.0 2019-08-07
CN201910727287.0A CN110619019B (en) 2019-08-07 2019-08-07 Distributed storage method and system for data

Publications (1)

Publication Number Publication Date
WO2021022875A1 true WO2021022875A1 (en) 2021-02-11

Family

ID=68921578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092810 WO2021022875A1 (en) 2019-08-07 2020-05-28 Distributed data storage method and system

Country Status (2)

Country Link
CN (1) CN110619019B (en)
WO (1) WO2021022875A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113014654A (en) * 2021-03-04 2021-06-22 阳光电源股份有限公司 Data storage method, charging pile and computer readable storage medium

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN110619019B (en) * 2019-08-07 2024-03-15 平安科技(深圳)有限公司 Distributed storage method and system for data
CN111208953A (en) * 2020-04-16 2020-05-29 广东睿江云计算股份有限公司 Distributed storage method and device
CN113783907A (en) * 2020-06-10 2021-12-10 神讯电脑(昆山)有限公司 Data acquisition device and file backup method thereof
CN116700632B (en) * 2023-08-07 2023-10-24 湖南中盈梦想商业保理有限公司 High-reliability financial information data storage method

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101686262A (en) * 2009-05-14 2010-03-31 南京大学 Multi-node collaboration based storage method for sensor network
US20110126060A1 (en) * 2009-11-25 2011-05-26 Cleversafe, Inc. Large scale subscription based dispersed storage network
US20140282763A1 (en) * 2009-10-29 2014-09-18 Cleversafe, Inc. Distribution of unique copies of broadcast data utilizing fault-tolerant retrieval from dispersed storage
CN107219997A (en) * 2016-03-21 2017-09-29 阿里巴巴集团控股有限公司 A kind of method and device for being used to verify data consistency
CN109302495A (en) * 2018-11-20 2019-02-01 北京邮电大学 A kind of date storage method and device
CN110619019A (en) * 2019-08-07 2019-12-27 平安科技(深圳)有限公司 Distributed storage method and system of data

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN109375872B (en) * 2018-09-27 2020-07-24 腾讯科技(深圳)有限公司 Data access request processing method, device and equipment and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113014654A (en) * 2021-03-04 2021-06-22 阳光电源股份有限公司 Data storage method, charging pile and computer readable storage medium
CN113014654B (en) * 2021-03-04 2023-09-29 阳光电源股份有限公司 Data storage method, charging pile and computer readable storage medium

Also Published As

Publication number Publication date
CN110619019A (en) 2019-12-27
CN110619019B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
WO2021022875A1 (en) Distributed data storage method and system
CN101409706B (en) Method, system and relevant equipment for distributing data of edge network
CN110505228B (en) Edge cloud architecture-based big data processing method, system, medium and device
CN112860951B (en) Method and system for identifying target account
CN111629051A (en) Performance optimization method and device for industrial internet identification analysis system
CN112199412B (en) Payment bill processing method based on block chain and block chain bill processing system
CN111224831B (en) Method and system for generating call ticket
CN106713220A (en) DDOS-attack-based prevention method and device
CN116405929B (en) Secure access processing method and system suitable for cluster communication
CN105069074A (en) Strategy configuration file processing method, device and system
CN112995579A (en) Video stream distribution method and device, management server and video monitoring system
CN115567597A (en) Message request forwarding method and device of payment settlement system
CN115310137A (en) Secrecy method and related device of intelligent settlement system
CN111294553B (en) Method, device, equipment and storage medium for processing video monitoring service signaling
CN107707383B (en) Put-through processing method and device, first network element and second network element
CN113873001A (en) Load balancing optimization method based on HTTP request classification
KR102654479B1 (en) A system that detects and monitors the risk of tampering with request parameters by generating and executing verification queries through analysis of large amounts of user behavior data
CN109657447B (en) Equipment fingerprint generation method and device
CN113542439B (en) Distributed data storage access method and device
CN112118289B (en) Self-adaptive synchronization method and system for intelligent contract
CN116305220B (en) Big data-based resource data processing method and system
CN116708708B (en) Method and system for constructing paperless conference based on distribution
CN112506955B (en) Query processing method, computer equipment and storage medium
CN115061891A (en) System load capacity prediction method and device based on block chain
CN117499063A (en) Vulnerability detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20850213; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: PCT application non-entry in European phase
Ref document number: 20850213; Country of ref document: EP; Kind code of ref document: A1