WO2021022875A1 - Distributed data storage method and system

Info

Publication number: WO2021022875A1
Application number: PCT/CN2020/092810
Authority: WO
Inventors: 郑映锋
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-08-07
Filing date: 2020-05-28
Publication date: 2021-02-11
Also published as: CN110619019A; CN110619019B

Abstract

Disclosed are a distributed data storage method and system, which are applicable to the technical field of data processing. The method comprises: a request end sending a broadcast request packet; a storage provision end extracting request description information in the broadcast request packet, determining, on the basis of the request description information, whether to provide a storage service for the request end, and returning a reply message to the request end if providing the storage service for the request end is determined; the request end selecting one of the storage provision ends returning reply messages thereto within a preset time period as a target provision end; the request end dividing target data into a plurality of data blocks, calculating the digest of each data block, generating a correlation between the identifier of the data block and the digest thereof, and sending all the data blocks to the target provision end; the target provision end storing the data blocks sent by the request end; and the request end verifying, at a preset time interval, the digests of one or more data blocks stored in the target provision end, so as to determine whether the storage of the target data by the target provision end is abnormal.

Description

Data distributed storage method and system

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 7, 2019, with application number 201910727287.0 and the invention title "Data Distributed Storage Method and System", the entire content of which is incorporated into this application by reference.

Technical field

This application belongs to the field of data processing technology, and in particular relates to a method and system for distributed storage of data.

Background art

At present, the inventor has found that the following sometimes occurs in distributed storage: when a storage requester sends data to a storage provider for storage, the storage provider may delete part of the data because of software or hardware faults or deliberate malice. As a result, the storage provider fails to store the data completely and securely as agreed.

On the other hand, when a storage exception occurs at the storage provider, the storage requester often cannot learn of the abnormality in time, causing its users to suffer heavy losses.

Summary of the invention

In view of this, the embodiments of the present application provide a data distributed storage method and system to solve the problems of high storage failure rate and poor monitorability in distributed storage in the prior art.

The first aspect of the embodiments of the present application provides a distributed storage method of data, including: a requesting end sends a broadcast request packet, and the broadcast request packet includes request description information;

After receiving the broadcast request packet, the storage provider extracts the request description information and determines, based on the request description information, whether to provide a storage service for the requesting end; if it determines to provide the storage service for the requesting end, it returns a reply message to the requesting end. The requesting end selects, as the target provider, one of the storage providers that returned a reply message to it within a preset time period. The requesting end divides the target data into multiple data blocks, calculates the digest of each data block, generates and stores the correspondence between the identifier of each data block and its digest, and sends all the data blocks to the target provider. The target provider receives and stores each data block sent by the requesting end. At every preset time interval, the requesting end verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.

A second aspect of the embodiments of the present application provides a distributed storage system for data, including: a requesting end and a storage providing end;

The requesting end is configured to send a broadcast request packet, and the broadcast request packet contains request description information. The storage provider is configured to extract the request description information after receiving the broadcast request packet and to determine, based on the request description information, whether to provide a storage service for the requesting end; if it determines to provide the storage service for the requesting end, it returns a reply message to the requesting end. The requesting end is also configured to select, as the target provider, one of the storage providers that returned a reply message to it within a preset time period. The requesting end is also configured to divide the target data into multiple data blocks, calculate the digest of each data block, generate and store the correspondence between the identifier of each data block and its digest, and send all the data blocks to the target provider. The target provider is also configured to receive and store each data block sent by the requesting end. The requesting end is also configured to verify, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.

The third aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, any of the data distributed storage methods described above is implemented. For example, when the computer program is executed by the processor, the following steps are implemented:

Send a broadcast request packet, where the broadcast request packet contains request description information; select, as the target provider, one of the storage providers that returned a reply message within a preset time period; divide the target data into multiple data blocks, calculate the digest of each data block, generate and store the correspondence between the identifier of each data block and its digest, and send all the data blocks to the target provider; at every preset time interval, verify the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal; and/or,

Receive the broadcast request packet sent by the requesting end, extract the request description information, and determine, based on the request description information, whether to provide a storage service for the requesting end; if it is determined to provide the storage service for the requesting end, return a reply message to the requesting end; if data blocks sent by the requesting end are received, store the data blocks, where the data blocks are all the data blocks obtained by the requesting end dividing the target data.

A fourth aspect of the embodiments of the present application provides an electronic device, including a processor and a memory, where the memory stores a computer program that can run on the processor, and the processor is configured to perform any of the data distributed storage methods described above by executing the computer program. For example, the processor is configured to perform the following steps by executing the computer program:

In the embodiments of the present application, whether the target provider's storage of the target data is abnormal can be determined in time, which ultimately improves the stability of data in distributed storage.

Description of the drawings

FIG. 1 is an implementation flowchart of a method for distributed storage of data provided by an embodiment of the present application;

FIG. 2 is a specific implementation flowchart of step S103 of the data distributed storage method provided by an embodiment of the present application;

FIG. 3 is a specific implementation flowchart of step S104 of the data distributed storage method provided by an embodiment of the present application;

FIG. 4 is a system interaction diagram of a distributed storage system for data provided by an embodiment of the present application.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed to facilitate understanding of the embodiments of the present application.

The technical solution of the present application can be applied to the field of blockchain or big data technology. For example, the technical solution of the present application can be implemented by blockchain distributed storage.

Fig. 1 shows an implementation process of a method for distributed storage of data provided by an embodiment of the present application. The process of the method includes steps S101 to S106. The specific implementation principle of each step is as follows.

In S101, the requesting end sends a broadcast request packet, and the broadcast request packet contains request description information.

In the embodiments of this application, a distributed storage system contains multiple servers, and any server can act as a requesting end or a storage provider. When a server needs to store part of its local data (the target data) on another server, the servers in the distributed storage system are naturally divided into a requesting end and storage providers.

After the requesting end determines the target data that it wants to store offsite, it sends a broadcast request packet to the local area network where the distributed storage system is located. Any other server (i.e., storage provider) in the distributed storage system can receive the broadcast request packet.

Note that the broadcast request packet contains request description information, which describes two kinds of information. The first is information about the requesting end, including its location, level, and server type. The second is information about the target data to be stored offsite, including its size, importance, type, and generation time. The request description information broadcast by the requesting end lets each storage provider determine whether it is suitable for storing the target data that the requesting end wants to store offsite.
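As an illustration of the request description information described above, the following is a minimal sketch of a broadcast request packet as a serializable record. The class names, field names, sample values, and the choice of JSON are all assumptions made for illustration; the application does not specify a wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RequesterInfo:
    """Information about the requesting end (field names are hypothetical)."""
    location: str
    level: int
    server_type: str


@dataclass
class TargetDataInfo:
    """Information about the target data to be stored offsite."""
    size_bytes: int
    importance: int
    data_type: str
    generated_at: str  # generation time as an ISO 8601 string


@dataclass
class BroadcastRequestPacket:
    requester: RequesterInfo
    target_data: TargetDataInfo

    def to_bytes(self) -> bytes:
        # Serialize the nested record to JSON (an assumed encoding).
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(raw: bytes) -> "BroadcastRequestPacket":
        # What a storage provider would do to extract the description.
        obj = json.loads(raw.decode("utf-8"))
        return BroadcastRequestPacket(
            requester=RequesterInfo(**obj["requester"]),
            target_data=TargetDataInfo(**obj["target_data"]),
        )


# Made-up sample values, for illustration only.
packet = BroadcastRequestPacket(
    RequesterInfo("site-1", 2, "database"),
    TargetDataInfo(10_485_760, 3, "log", "2019-08-07T00:00:00"),
)
decoded = BroadcastRequestPacket.from_bytes(packet.to_bytes())
```

A storage provider that receives the bytes can rebuild the same record with `from_bytes` and read the fields it needs for its suitability check.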

In S102, after receiving the broadcast request packet, the storage provider extracts the request description information and determines, based on the request description information, whether to provide a storage service for the requesting end; if it determines to provide the storage service for the requesting end, it returns a reply message to the requesting end.

Understandably, in actual distributed storage, different servers differ in storage capacity and in the security they offer for different types of data. In the embodiments of the present application, to improve the security and integrity of data storage, each storage provider first judges, based on the request description information in the received broadcast request packet, whether it is suitable to provide a storage service for the requesting end.

Optionally, since a storage provider's judgment on whether to provide a storage service for a requesting end is in essence a binary classification problem, the embodiments of the present application can analyze the request description information with a preset support vector machine algorithm, which includes the following two steps:

Step 1: Convert the request description information into a description matrix, and perform dimensionality reduction processing on the description matrix through a principal component analysis algorithm to generate a target feature matrix.

In the embodiments of the present application, according to a preset correspondence between data types and matrix positions, the data value corresponding to each data type in the request description information is stored in the corresponding matrix position, thereby generating the description matrix. The description matrix characterizes the request description information and is convenient to substitute into the subsequent calculations. Then, to reduce the amount of data in the subsequent classification calculation, the embodiments of the present application use the existing principal component analysis algorithm to perform dimensionality reduction processing on the description matrix. Since the principal component analysis algorithm is an existing technology, it is not described in detail here.

Step 2: Classify the target feature matrix according to the pre-trained support vector machine model, and determine whether to provide storage services for the requester based on the category of the target feature matrix.

Specifically, the storage provider needs to train the classification hyperplane of the support vector machine model in advance on training data. The specific method is as follows: the storage provider collects the locally stored positive set and negative set, each of which contains multiple training matrices. A training matrix in the positive set represents the request description information of to-be-stored data that the storage provider can handle, and a training matrix in the negative set represents the request description information of data that the storage provider cannot store.

Each training matrix is mapped into a high-dimensional feature space through a Gaussian kernel function, generating the training parameter corresponding to that training matrix, where P_i is the training parameter of training matrix i, X_i is training matrix i, X' is the average matrix of all training matrices in the positive and negative sets (each element of the average matrix is the mean of the elements at the corresponding position in all training matrices), and δ_i is the covariance matrix of X_i and X'. An existing support vector machine algorithm then determines the classification hyperplane of the support vector machine model from the training parameters of the training matrices in the positive set and the training parameters of the training matrices in the negative set.

Understandably, by calling the pre-trained support vector machine model, it can be determined on which side of the classification hyperplane the target feature matrix lies, and thus the category of the target feature matrix. Since the correspondence between categories and whether to provide a storage service has been determined in advance, the storage provider can finally determine whether to provide a storage service for the requesting end. Because each storage provider judges from the request description information whether it is suitable to store the target data, a storage provider that is not suitable for storing the target data does not return a reply message to the requesting end, which reduces, to a certain extent, the chance that the target data is later lost or damaged.
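The two steps above can be sketched as follows, under simplifying assumptions. The position mapping, the weights, and the bias are hypothetical values; the principal component analysis step is omitted; and a plain linear hyperplane stands in for the kernel-based support vector machine described in the text. The sketch only shows how a description is placed into fixed positions and how "which side of the hyperplane" becomes a yes/no decision.

```python
# Hypothetical correspondence between data type and position in the
# (flattened) description matrix.
POSITIONS = {"requester_level": 0, "data_size_gb": 1, "data_importance": 2}


def to_description_vector(info):
    """Store each data value at the position assigned to its data type."""
    vec = [0.0] * len(POSITIONS)
    for key, pos in POSITIONS.items():
        vec[pos] = float(info[key])
    return vec


def should_provide_storage(features, weights, bias):
    """Return True if the features lie on the positive side of the hyperplane."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0


# Assumed, already-trained hyperplane parameters (not from the source).
WEIGHTS = [0.5, -1.0, 0.25]
BIAS = 1.0

small = to_description_vector(
    {"requester_level": 2, "data_size_gb": 1, "data_importance": 4}
)
large = to_description_vector(
    {"requester_level": 2, "data_size_gb": 50, "data_importance": 4}
)
accept_small = should_provide_storage(small, WEIGHTS, BIAS)
accept_large = should_provide_storage(large, WEIGHTS, BIAS)
```

With these assumed parameters the provider would reply to the small request and stay silent for the large one; a real deployment would learn the parameters from the positive and negative sets.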

In S103, the requesting end selects, as the target provider, one of the storage providers that returned a reply message to it within a preset time period.

Understandably, through the previous step, each storage provider (each server other than the requesting end) in the distributed storage system makes a judgment on the received broadcast request packet. Based on the judgment in S102, some storage providers return a reply message to the requesting end, and the others do not. In the embodiments of this application, since the requesting end sends a given piece of target data to only one storage provider for storage, it needs to screen the storage providers that returned reply messages to it and select the one with the highest reliability as the target provider.

In the embodiment of the present application, the reliability of the storage provider is judged based on the remaining storage capacity of the storage provider and the time for returning the reply message. As an embodiment of the present application, as shown in FIG. 2, the foregoing S103 includes:

S1031: Extract the remaining storage capacity from the reply message returned by each storage provider within a preset time period, and establish a corresponding relationship between the storage provider and the remaining storage capacity.

In the embodiment of the present application, each storage provider will add its own remaining storage capacity when generating a reply message, so the requesting end can know the remaining storage capacity of each storage provider that returns reply messages to it.

S1032: Calculate the time difference between the time when the requesting end receives the reply message returned by each storage provider and the time when the requesting end sends the broadcast request packet, and establish a correspondence between the storage provider and the time difference.

The requesting end records the sending time when it sends the broadcast request packet in step S101, and records a receiving time each time it receives a reply message, so that it can calculate, for each storage provider, the time difference between the time the reply message was received and the time the broadcast request packet was sent.

S1033: Calculate storage coefficients of each of the storage providers by using coefficient formulas.

Optionally, the coefficient formula is:

where P_i is the storage coefficient of storage provider i, Cap_i is the remaining storage capacity corresponding to storage provider i, and Time_i is the time difference corresponding to storage provider i.

In the embodiments of this application, the storage coefficient is set to increase with reliability. The larger the remaining storage capacity, the more space the storage provider has for storing the target data, so the storage reliability is higher and the storage coefficient is higher. On the other hand, a larger time difference between the time a storage provider's reply message is received and the time the broadcast request packet is sent generally means that the storage provider is under a heavier operating load, or that the routing distance between the storage provider and the requesting end is large, which may increase the possibility of data loss or theft during long-distance transmission. The storage reliability is therefore lower, and the storage coefficient decreases accordingly.

S1034. Use the storage provider with the largest storage coefficient as the target provider.

Understandably, the embodiments of the present application screen the servers that will store the target data a second time, this time from the requesting end's side, further improving the reliability of future data storage.
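Steps S1031 to S1034 can be sketched as below. The coefficient formula itself is not reproduced in this text, so the sketch assumes P_i = Cap_i / Time_i purely for illustration; this choice only matches the stated behavior (the coefficient rises with remaining capacity and falls with the time difference). The `Reply` record, its field names, and the sample values are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    provider_id: str
    remaining_capacity: float  # Cap_i, carried in the reply message
    received_at: float         # time the requesting end received the reply (s)


def storage_coefficient(cap, time_diff):
    # ASSUMED formula, not taken from the source: P_i = Cap_i / Time_i.
    return cap / time_diff


def select_target_provider(replies, sent_at, window):
    """Pick the provider with the largest coefficient among timely replies."""
    best_id, best_coeff = None, float("-inf")
    for reply in replies:
        time_diff = reply.received_at - sent_at  # Time_i
        if time_diff <= 0 or time_diff > window:
            continue  # ignore replies outside the preset time period
        coeff = storage_coefficient(reply.remaining_capacity, time_diff)
        if coeff > best_coeff:
            best_id, best_coeff = reply.provider_id, coeff
    return best_id


replies = [
    Reply("A", 500.0, 100.2),  # fast reply, moderate capacity
    Reply("B", 800.0, 100.5),  # slower reply, more capacity
    Reply("C", 900.0, 103.0),  # arrives after the preset time period
]
target = select_target_provider(replies, sent_at=100.0, window=2.0)
```

Under the assumed formula, provider A wins because its short reply time outweighs B's larger capacity, and C is ignored because it answered too late.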

In S104, the requesting end divides the target data into multiple data blocks, calculates the digest of each data block, generates and stores the correspondence between the identifier of each data block and its digest, and sends all the data blocks to the target provider.

Note that, to ensure that data abnormalities occurring during storage are detected in time, the embodiments of this application include a periodic verification step later on. However, verifying the target data as a whole would impose a heavy computational burden, and when the requesting end has multiple pieces of target data stored offsite, it is difficult for it to verify all of them in full. Therefore, in the embodiments of the present application, each piece of target data is divided into multiple data blocks, and some of the data blocks are spot-checked in the subsequent verification process. In addition, to further reduce the amount of data handled during verification, the digest of each data block is calculated in this step, and the correspondence between the identifier of each data block and its digest is generated and stored, so that each data block can later be verified through its digest.
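The block division and the identifier-to-digest (summary) correspondence described above can be sketched as follows. The block identifier scheme and the block size are hypothetical, and SHA-256 from the Python standard library is used only as a stand-in for the application's own digest calculation.

```python
import hashlib


def split_into_blocks(data, block_size):
    """Divide the target data into fixed-size blocks keyed by an identifier."""
    return {
        f"block-{offset // block_size}": data[offset:offset + block_size]
        for offset in range(0, len(data), block_size)
    }


def build_digest_table(blocks):
    """Map each block identifier to the digest of that block.

    SHA-256 is a stand-in here, not the digest scheme of the application.
    """
    return {bid: hashlib.sha256(block).hexdigest() for bid, block in blocks.items()}


target_data = b"0123456789" * 10  # 100 bytes of sample data
blocks = split_into_blocks(target_data, block_size=32)
digest_table = build_digest_table(blocks)  # kept by the requesting end
```

The requesting end keeps only `digest_table` locally; the blocks themselves are what gets sent to the target provider.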

As an embodiment of the present application, as shown in FIG. 3, the foregoing S104 includes:

S1041: Divide the data block into M feature groups, expand the M feature groups into N feature groups according to a preset expansion rule, and respectively number the N feature groups.

In the embodiment of the present application, the M is an integer greater than 1, and the N is an integer greater than M.

Optionally, each feature group is composed of several binary character strings. Optionally, the expansion rule for expanding the M feature groups into N feature groups may be as follows: let t be the number of a feature group, let X_t denote each feature group before expansion, and let Y_t denote each feature group after expansion. When t ≤ M, Y_t = X_t; when t > M, Y_t is computed by a formula that uses the "exclusive OR" operator.

Note that the aforementioned expansion rule is only an example; other expansion rules based on the original feature groups can also be used to expand the feature groups.
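Since the text allows other expansion rules based on the original feature groups, the following sketch uses one such hypothetical rule: for t > M, Y_t = Y_{t-1} XOR Y_{t-M}. This rule is an assumption chosen for illustration and is not the formula of the application; only the case t ≤ M (Y_t = X_t) is taken from the text.

```python
def expand_feature_groups(x, n):
    """Expand M feature groups (integers) into N feature groups."""
    m = len(x)
    if m < 2 or n <= m:
        raise ValueError("need M > 1 and N > M")
    y = list(x)  # Y_t = X_t for t <= M
    while len(y) < n:
        # ASSUMED rule for t > M: Y_t = Y_{t-1} XOR Y_{t-M}
        y.append(y[-1] ^ y[-m])
    return y


# Three 4-bit feature groups expanded to eight groups.
expanded = expand_feature_groups([0b1110, 0b0101, 0b0011], n=8)
```

The first M entries are unchanged, and every later entry depends only on earlier entries, so the expansion is deterministic for a given data block.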

S1042: Assign an initial digest to the data block, divide the initial digest into L initial digest groups in order, and number each initial digest group.

In the embodiment of the present application, L is an integer greater than 1.

For example, assuming that L is 5, a preset initial digest is divided into 5 initial digest groups, and each initial digest group is numbered, yielding a first, second, third, fourth, and fifth initial digest group. Understandably, the number of initial digest groups can be adjusted according to actual conditions.

S1043: Set up L cache groups, and number each cache group.

In the embodiment of the present application, since assignment and shift calculations are to be performed later, it is necessary to set the same number of cache groups as the initial digest groups.

For example, if the number of initial digest groups is 5, then 5 cache groups are set up, and each cache group is numbered, yielding a first, second, third, fourth, and fifth cache group.

S1044: Store the data of each initial digest group into the cache group with the same number, and perform N rounds of shift assignment calculation cyclically. The shift assignment calculation includes: after shifting the data of each cache group into the next cache group, adding the data of the current first cache group to the data of the target feature group and using the result as the new data of the first cache group to update the first cache group, where the number of the target feature group is the same as the current round of the shift assignment calculation.

For example, assuming that N is 8, the shift assignment calculation is repeated 8 times, and the input data of the first round is the aforementioned initial digest groups. Suppose the initial digest is divided into 5 initial digest groups: the first initial digest group is 1001010, the second is 1000111, the third is 1010101, the fourth is 1000001, and the fifth is 1000000. The data of each initial digest group is stored in the cache group with the same number, so at the start of the first round the first cache group holds 1001010, the second 1000111, the third 1010101, the fourth 1000001, and the fifth 1000000. After the shift, at the end of the first round the second cache group holds 1001010, the third 1000111, the fourth 1010101, and the fifth 1000001. The updated first cache group is the data of the first cache group before the update plus the data of the target feature group, whose number equals the current round of the shift assignment calculation. Since the current round is 1, the updated data of the first cache group is the data of the first cache group before the update plus the data of the first feature group. Assuming that the data of the first feature group is 1110, the updated data of the first cache group is 1001010 + 1110 = 1011000. (Correspondingly, if the current round is 5, the updated data of the first cache group is the data of the first cache group before the update plus the data of the fifth feature group.)

Obviously, the data of each cache group after the first round of shift assignment calculation is completed is used as the data of the corresponding cache group input at the beginning of the next round of shift assignment calculation.
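One round of the shift assignment calculation can be sketched as below, using the cache group values from the example. Two points are assumptions rather than statements of the application: the data of the last cache group is simply dropped when everything shifts down, and the addition wraps around at the 7-bit group width. With ordinary unsigned binary addition, 1001010 + 1110 evaluates to 1011000.

```python
WIDTH = 7  # bits per cache group in the worked example


def shift_assign_round(cache, feature, width=WIDTH):
    """Shift every cache group into the next one and update the first group."""
    mask = (1 << width) - 1
    # New first group = old first group + target feature group.
    # ASSUMPTION: overflow beyond `width` bits is discarded.
    new_first = (cache[0] + feature) & mask
    # ASSUMPTION: the old last group is dropped by the shift.
    return [new_first] + cache[:-1]


initial = [0b1001010, 0b1000111, 0b1010101, 0b1000001, 0b1000000]
after_round_1 = shift_assign_round(initial, 0b1110)  # first feature group
as_bits = [format(group, "07b") for group in after_round_1]
```

Feeding `after_round_1` and the second feature group back into `shift_assign_round` gives the state at the end of round 2, and so on for N rounds.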

S1045: Combine the data in each cache group after the N rounds of shift assignment calculation, and use the result as the digest of the data block.

As in the above example, if N is 8, the digest of the data block is obtained after 8 rounds of shift assignment calculation. In the embodiments of the present application, each data block has a corresponding identifier, which enables the server to look up and identify the data block. Through the above steps, the correspondence between the identifier of each data block and its digest can therefore be established. Each data block and its corresponding identifier are then sent to the target provider.
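Putting the rounds and the final combination together, a digest (summary) per data block and the identifier-to-digest correspondence could be computed roughly as follows. The sketch assumes the N expanded feature groups are already available as integers, that the last cache group is dropped on each shift, that addition wraps at the group width, and that "combining" means concatenating the cache groups in order; the block identifiers and feature values are made up.

```python
def block_digest(initial_groups, feature_groups, width=7):
    """Run one shift assignment round per feature group, then combine."""
    mask = (1 << width) - 1
    cache = list(initial_groups)
    for feature in feature_groups:  # round k uses feature group k
        # ASSUMPTIONS: wrap-around addition; last group dropped on shift.
        cache = [(cache[0] + feature) & mask] + cache[:-1]
    # ASSUMPTION: the digest is the in-order concatenation of the groups.
    return "".join(format(group, f"0{width}b") for group in cache)


INITIAL = [0b1001010, 0b1000111, 0b1010101, 0b1000001, 0b1000000]

# Hypothetical expanded feature groups (N = 3) for two data blocks.
features_by_block = {
    "block-0": [0b1110, 0b0101, 0b0011],
    "block-1": [0b0001, 0b0001, 0b0001],
}
digest_table = {
    block_id: block_digest(INITIAL, features)
    for block_id, features in features_by_block.items()
}
```

The resulting `digest_table` is the correspondence the requesting end stores locally before sending each data block and its identifier to the target provider.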

In S105, the target provider receives and stores each data block sent by the requester.

In S106, at every preset time interval, the requesting end verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.

Specifically, verifying the digests of one or more data blocks stored by the target provider includes:

The requesting end generates a challenge message at every preset time interval, where the challenge message includes one or more data block identifiers indicating the data blocks to be spot-checked. After the target provider returns the digests of those data blocks in response to the challenge message, the requesting end verifies the returned digests against the pre-stored correspondence between data block identifiers and digests, to determine whether the target provider's storage of the target data is abnormal.

By spot-checking the digests corresponding to data block identifiers in this way, it is possible to check in time whether the target data has a storage abnormality while using few computing resources.
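The challenge and verification exchange described above can be sketched as follows. The `Provider` and `Requester` classes and their method names are hypothetical, the exchange is modeled as direct method calls rather than network messages, and SHA-256 stands in for the application's own digest calculation.

```python
import hashlib
import random


def digest(block):
    # Stand-in digest; the application defines its own digest scheme.
    return hashlib.sha256(block).hexdigest()


class Provider:
    """Target provider: stores blocks and answers challenges."""

    def __init__(self):
        self.blocks = {}

    def store(self, block_id, block):
        self.blocks[block_id] = block

    def answer_challenge(self, block_ids):
        # Recompute the digest of each requested block; None if it is missing.
        return {
            bid: (digest(self.blocks[bid]) if bid in self.blocks else None)
            for bid in block_ids
        }


class Requester:
    """Requesting end: keeps only the identifier-to-digest correspondence."""

    def __init__(self, blocks):
        self.digest_table = {bid: digest(b) for bid, b in blocks.items()}

    def make_challenge(self, count, rng):
        # Randomly choose which data blocks to spot-check.
        return rng.sample(sorted(self.digest_table), count)

    def verify(self, response):
        # Return the identifiers whose returned digest does not match.
        return sorted(
            bid for bid, d in response.items() if d != self.digest_table[bid]
        )


blocks = {"block-0": b"aaaa", "block-1": b"bbbb", "block-2": b"cccc"}
requester, provider = Requester(blocks), Provider()
for block_id, block in blocks.items():
    provider.store(block_id, block)

challenge = requester.make_challenge(3, random.Random(0))
healthy = requester.verify(provider.answer_challenge(challenge))

provider.blocks["block-1"] = b"bbbX"  # simulate corruption
del provider.blocks["block-2"]        # simulate deletion
damaged = requester.verify(provider.answer_challenge(challenge))
```

An empty result means the spot-checked blocks are intact; any identifier in the result signals a storage abnormality at the target provider. In practice the challenge would be repeated at the preset time interval with a fresh random selection.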

In the embodiments of this application, the requesting end sends a broadcast request packet to notify all storage providers in the local area network that it needs to store target data offsite. After receiving the broadcast request packet, each storage provider judges, based on the request description information it contains, whether it is suitable for storing the target data, and selectively returns a reply message, which helps avoid future data loss. The requesting end selects, as the target provider, one of the storage providers that returned a reply message to it within a preset time period, so that, from the requesting end's perspective, the storage provider with the highest reliability is selected, reducing the probability of future data abnormalities. The requesting end divides the target data into multiple data blocks, calculates the digest of each data block, and generates and stores the correspondence between data block identifiers and digests, providing data support for the subsequent spot-check verification. The requesting end sends all data blocks to the target provider, and the target provider receives and stores each data block. At every preset time interval, the requesting end verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence, so as to determine in time whether the target provider's storage of the target data is abnormal, ultimately improving the stability of data in distributed storage.

Corresponding to the data distributed storage method described in the above embodiments, FIG. 4 shows a system interaction diagram of the distributed storage system for data provided in an embodiment of this application. For ease of description, only the parts related to the embodiments of this application are shown.

Referring to FIG. 4, the system includes: a requesting end 401 and a storage provider 402;

The requesting end is configured to send a broadcast request packet, and the broadcast request packet contains request description information;

The storage provider is configured to extract the request description information after receiving the broadcast request packet and to determine, based on the request description information, whether to provide a storage service for the requesting end; if it determines to provide the storage service for the requesting end, it returns a reply message to the requesting end;

The requesting end is also configured to select, as the target provider, one of the storage providers that returned a reply message to it within a preset time period;

The requesting end is also configured to divide the target data into multiple data blocks, calculate the digest of each data block, generate and store the correspondence between the identifier of each data block and its digest, and send all the data blocks to the target provider;

The target provider is also used to receive and store each data block sent by the requester;

The requesting end is further configured to verify the digests of more than one data block stored by the target provider according to the pre-stored correspondence between the identifier of the data block and the digest every preset time interval, To determine whether the storage of the target data by the target provider is abnormal.

Optionally, determining whether to provide a storage service for the requesting end based on the request description information includes: converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix through a principal component analysis algorithm to generate a target feature matrix; and classifying the target feature matrix with a pre-trained support vector machine model, and determining, based on the category of the target feature matrix, whether to provide the storage service for the requesting end.
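As a rough sketch of this decision step, the following assumes that the principal component analysis has already been fitted offline and that the support vector machine is linear, so that the storage provider only applies a stored projection and a stored decision function. The layout of the description matrix, the classes `PcaProjection` and `LinearSvm`, and every numeric parameter below are hypothetical; this application does not specify them.

```python
from dataclasses import dataclass


@dataclass
class PcaProjection:
    mean: list[float]              # per-column mean from offline fitting (assumed)
    components: list[list[float]]  # k principal axes, each of length d (assumed)

    def transform(self, rows: list[list[float]]) -> list[list[float]]:
        """Center each row and project it onto the stored principal axes."""
        return [
            [sum((x - m) * c for x, m, c in zip(row, self.mean, axis))
             for axis in self.components]
            for row in rows
        ]


@dataclass
class LinearSvm:
    weights: list[float]  # pre-trained weights (hypothetical values below)
    bias: float

    def decision(self, features: list[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias


def should_serve(description_matrix, pca: PcaProjection, svm: LinearSvm) -> bool:
    """Reduce the description matrix, then classify the flattened result."""
    reduced = pca.transform(description_matrix)
    flat = [value for row in reduced for value in row]
    return svm.decision(flat) > 0  # positive class: provide the storage service


# Hypothetical 2 x 3 description matrix and model parameters.
description_matrix = [[4.0, 2.0, 1.0], [1.0, 0.5, 3.0]]
pca = PcaProjection(mean=[2.0, 1.0, 2.0],
                    components=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
svm = LinearSvm(weights=[0.5, -0.25, 0.1, 0.2], bias=-0.5)
decision = should_serve(description_matrix, pca, svm)
```

A kernel support vector machine or a different flattening of the target feature matrix would fit the same structure; only `LinearSvm.decision` would change.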

Optionally, the requesting end selecting, as the target provider, one of the storage providers that return reply messages to it within a preset time period includes: extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity; calculating the time difference between the time at which the requesting end receives the reply message returned by each storage provider and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference; by the formula:

the storage coefficient of each storage provider is calculated, where P_i is the storage coefficient of the i-th storage provider, Cap_i is the remaining storage capacity of the i-th storage provider, and Time_i is the time difference corresponding to the i-th storage provider; the storage provider with the largest storage coefficient is used as the target provider.
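The selection step can be sketched as follows. The formula for the storage coefficient is not reproduced in this text, so the scoring function is passed in as a parameter; the default `ratio_coefficient` (capacity divided by time difference) is purely a hypothetical placeholder and should not be read as the formula of this application. The `Reply` record and its field names are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Reply:
    provider_id: str
    remaining_capacity: float  # Cap_i, extracted from the reply message
    received_at: float         # time the requesting end received the reply


def ratio_coefficient(cap: float, time_diff: float) -> float:
    # HYPOTHETICAL placeholder: more capacity and a faster reply score higher.
    return cap / time_diff


def select_target_provider(
    replies: list[Reply],
    sent_at: float,
    window: float,
    coefficient: Callable[[float, float], float] = ratio_coefficient,
) -> Optional[str]:
    """Return the provider with the largest storage coefficient, or None."""
    best_id, best_p = None, float("-inf")
    for reply in replies:
        time_diff = reply.received_at - sent_at  # Time_i
        if time_diff <= 0 or time_diff > window:
            continue  # outside the preset time period
        p = coefficient(reply.remaining_capacity, time_diff)  # P_i
        if p > best_p:
            best_id, best_p = reply.provider_id, p
    return best_id


replies = [
    Reply("A", remaining_capacity=500.0, received_at=0.2),
    Reply("B", remaining_capacity=800.0, received_at=0.5),
    Reply("C", remaining_capacity=2000.0, received_at=3.5),  # too late
]
target = select_target_provider(replies, sent_at=0.0, window=2.0)
```

With the placeholder coefficient, provider A is chosen because its fast reply outweighs its smaller capacity, and provider C is ignored because its reply arrives after the preset time period.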

Optionally, calculating the digest of each data block includes:

Dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M; assigning an initial digest to the data block, dividing the initial digest into L initial digest groups in order of arrangement, and numbering the initial digest groups, where L is an integer greater than 1; setting up L cache groups and numbering the cache groups; storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of shift assignment calculation, where each round of shift assignment calculation includes: after shifting the data of each cache group into the next cache group, adding the data of the current first cache group to the data of a target feature group and using the result as the new data of the first cache group to update the first cache group, the number of the target feature group being the same as the number of the current round of shift assignment calculation; and combining the data in the cache groups after the N rounds of shift assignment calculation as the digest of the data block.
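The shift assignment calculation can be sketched as below. Several details are not fixed by the text and are assumed here only for illustration: feature groups are 32-bit words, the expansion rule adds two earlier groups, N equals 2M, L equals 4 with placeholder initial values, addition is modulo 2^32, and the last cache group wraps around into the first when shifting. The result is a toy digest that shows the data flow and is not suitable as a secure hash.

```python
MASK32 = 0xFFFFFFFF

# ASSUMPTION: placeholder initial digest split into L = 4 groups.
INITIAL_DIGEST = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]


def feature_groups(block: bytes) -> list[int]:
    """Split a block into M 32-bit feature groups (zero-padded)."""
    padded = block + b"\x00" * (-len(block) % 4)
    return [int.from_bytes(padded[i:i + 4], "big")
            for i in range(0, len(padded), 4)]


def expand(groups: list[int], n: int) -> list[int]:
    """ASSUMED expansion rule: each extra group sums two earlier groups."""
    m = len(groups)
    if m < 2 or n <= m:
        raise ValueError("need M > 1 and N > M")
    out = list(groups)
    while len(out) < n:
        out.append((out[-m] + out[-1]) & MASK32)
    return out


def shift_assign(cache: list[int], features: list[int]) -> list[int]:
    """Run one shift assignment round per feature group."""
    cache = list(cache)
    for feature in features:  # round k uses feature group k
        # Shift each group into the next; ASSUMPTION: the last wraps to first.
        cache = [cache[-1]] + cache[:-1]
        # New first group = current first group + target feature group.
        cache[0] = (cache[0] + feature) & MASK32
    return cache


def sketch_digest(block: bytes) -> bytes:
    groups = feature_groups(block)
    features = expand(groups, 2 * len(groups))  # ASSUMPTION: N = 2M
    cache = shift_assign(INITIAL_DIGEST, features)
    # Combine the cache groups in order as the digest of the block.
    return b"".join(group.to_bytes(4, "big") for group in cache)
```

For example, `shift_assign([1, 2, 3], [10, 20])` first shifts to `[3, 1, 2]` and updates the first group to 13, then shifts to `[2, 13, 1]` and updates the first group to 22. Blocks shorter than five bytes yield fewer than two feature groups and are rejected by `expand`, in line with the requirement that M be greater than 1.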

Optionally, the requesting end verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests includes: the requesting end generating a challenge message at every preset time interval, the challenge message containing the identifiers of one or more data blocks and indicating the data blocks to be spot-checked; and, after receiving the digests of the data blocks returned by the target provider in response to the challenge message, the requesting end verifying the returned digests according to the pre-stored correspondence between the identifiers of the data blocks and the digests, so as to determine whether the target provider's storage of the target data is abnormal.
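A single round of this spot check might look like the following in-memory sketch. The class `TargetProvider`, the functions `make_challenge` and `failed_blocks`, and the message shapes are hypothetical, SHA-256 again stands in for the digest of this application, and the timer that would trigger a round at every preset time interval is omitted.

```python
import hashlib
import random
from typing import Optional


def digest(block: bytes) -> str:
    # ASSUMPTION: SHA-256 stand-in, not the digest of this application.
    return hashlib.sha256(block).hexdigest()


class TargetProvider:
    """In-memory stand-in for the target provider."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def store(self, block_id: int, block: bytes) -> None:
        self.blocks[block_id] = block

    def answer(self, challenge: list[int]) -> dict[int, Optional[str]]:
        # Recompute digests from what is actually stored; None marks a loss.
        return {
            block_id: digest(self.blocks[block_id])
            if block_id in self.blocks else None
            for block_id in challenge
        }


def make_challenge(index: dict[int, str], sample_size: int,
                   rng: random.Random) -> list[int]:
    """Pick block identifiers at random for the spot check."""
    return rng.sample(sorted(index), k=min(sample_size, len(index)))


def failed_blocks(index: dict[int, str], challenge: list[int],
                  response: dict[int, Optional[str]]) -> list[int]:
    """Return challenged identifiers whose digest is missing or different."""
    return [b for b in challenge if response.get(b) != index[b]]


blocks = {0: b"alpha", 1: b"bravo", 2: b"charlie", 3: b"delta"}
index = {block_id: digest(block) for block_id, block in blocks.items()}
provider = TargetProvider()
for block_id, block in blocks.items():
    provider.store(block_id, block)

rng = random.Random(0)
challenge = make_challenge(index, sample_size=4, rng=rng)
healthy = failed_blocks(index, challenge, provider.answer(challenge))

provider.blocks[2] = b"corrupted"  # simulate silent corruption
challenge = make_challenge(index, sample_size=4, rng=rng)
damaged = failed_blocks(index, challenge, provider.answer(challenge))
```

Here every block is challenged so the outcome is deterministic; in practice a smaller `sample_size` trades detection probability per round against verification cost.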

Understandably, in this embodiment of the application, the requester sends a broadcast packet to notify all storage providers in the local area network that it needs to store the target data off-site. After receiving the broadcast request, each storage provider judges, according to the request description information contained in it, whether it is suited to storing the target data, and returns a reply message only if it is, which helps avoid later data loss. The requester selects one of the storage providers that returned a reply message to it within a preset time period as the target provider, so that, from the requester's perspective, the most reliable storage provider is selected and the probability of later data anomalies is reduced. The requester divides the target data into multiple data blocks, calculates the digest of each data block, and generates and stores the correspondence between the identifier of each data block and its digest, which provides the data needed for the subsequent spot-check verification. The requester sends all the data blocks to the target provider, and the target provider receives and stores each data block sent by the requester. At every preset time interval, the requester verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, so as to determine in a timely manner whether the target provider's storage of the target data is abnormal, which ultimately improves the stability of data in distributed storage.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The embodiment of the present application also provides a distributed storage device for data, which may include a module/unit for executing the steps performed by the requesting end and/or the storage provider in the foregoing method.

If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware, and the computer program may be stored in a computer-readable storage medium. Optionally, the computer-readable storage medium may be a non-volatile storage medium or a volatile storage medium.

The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall fall within the scope of protection of this application.

Claims

A distributed storage method for data, which includes:

The requesting end sends a broadcast request packet, and the broadcast request packet contains request description information;

After receiving the broadcast request packet, a storage provider extracts the request description information and determines, based on the request description information, whether to provide a storage service for the requesting end, and, if it determines to provide the storage service for the requesting end, returns a reply message to the requesting end;

The requesting end selects one of the storage providers that return reply messages to it within a preset time period as the target provider;

The requesting end divides the target data into multiple data blocks, calculates the digest of each data block, generates and stores the correspondence between the identifier of each data block and its digest, and sends all the data blocks to the target provider;

The target provider receives and stores each data block sent by the requesting end;

At every preset time interval, the requesting end verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal.
The method for distributed storage of data according to claim 1, wherein determining whether to provide a storage service for the requesting end based on the request description information comprises:

Converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix through a principal component analysis algorithm to generate a target feature matrix;

Classifying the target feature matrix with a pre-trained support vector machine model, and determining, based on the category of the target feature matrix, whether to provide the storage service for the requesting end.
The method for distributed storage of data according to claim 1, wherein the requesting end selecting, as the target provider, one of the storage providers that return reply messages to it within a preset time period comprises:

Extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;

Calculating the time difference between the time at which the requesting end receives the reply message returned by each storage provider and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;

By the formula:
the storage coefficient of each storage provider is calculated, where P_i is the storage coefficient of the i-th storage provider, Cap_i is the remaining storage capacity of the i-th storage provider, and Time_i is the time difference corresponding to the i-th storage provider;

The storage provider with the largest storage coefficient is used as the target provider.
The method for distributed storage of data according to claim 1, wherein calculating the digest of each data block comprises:

Dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;

Assigning an initial digest to the data block, dividing the initial digest into L initial digest groups in order of arrangement, and numbering the initial digest groups, where L is an integer greater than 1;

Setting up L cache groups and numbering the cache groups;

Storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of shift assignment calculation, where each round of shift assignment calculation includes: after shifting the data of each cache group into the next cache group, adding the data of the current first cache group to the data of a target feature group and using the result as the new data of the first cache group to update the first cache group, the number of the target feature group being the same as the number of the current round of shift assignment calculation;

Combining the data in the cache groups after the N rounds of shift assignment calculation as the digest of the data block.
The method for distributed storage of data according to claim 1, wherein the requesting end verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests comprises:

The requesting end generates a challenge message at every preset time interval, the challenge message containing the identifiers of one or more data blocks and indicating the data blocks to be spot-checked;

After receiving the digests of the data blocks returned by the target provider in response to the challenge message, the requesting end verifies the returned digests according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal.
A distributed storage system for data, wherein the system includes: a requesting end and a storage provider;

The requesting end is configured to send a broadcast request packet, and the broadcast request packet contains request description information;

The storage provider is configured to extract the request description information after receiving the broadcast request packet, determine based on the request description information whether to provide a storage service for the requesting end, and, if it determines to provide the storage service for the requesting end, return a reply message to the requesting end;

The requesting end is further configured to select, as the target provider, one of the storage providers that return reply messages to it within a preset time period;

The requesting end is further configured to divide the target data into multiple data blocks, calculate the digest of each data block, generate and store the correspondence between the identifier of each data block and its digest, and send all the data blocks to the target provider;

The target provider is further configured to receive and store each data block sent by the requesting end;

The requesting end is further configured to verify, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal.
The distributed storage system for data according to claim 6, wherein determining whether to provide a storage service for the requesting end based on the request description information comprises:

Converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix through a principal component analysis algorithm to generate a target feature matrix;

Classifying the target feature matrix with a pre-trained support vector machine model, and determining, based on the category of the target feature matrix, whether to provide the storage service for the requesting end.
The distributed storage system for data according to claim 6, wherein the requesting end selecting, as the target provider, one of the storage providers that return reply messages to it within a preset time period comprises:

Extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;

Calculating the time difference between the time at which the requesting end receives the reply message returned by each storage provider and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;

By the formula:
the storage coefficient of each storage provider is calculated, where P_i is the storage coefficient of the i-th storage provider, Cap_i is the remaining storage capacity of the i-th storage provider, and Time_i is the time difference corresponding to the i-th storage provider;

The storage provider with the largest storage coefficient is used as the target provider.
The distributed storage system for data according to claim 6, wherein calculating the digest of each data block comprises:

Dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;

Assigning an initial digest to the data block, dividing the initial digest into L initial digest groups in order of arrangement, and numbering the initial digest groups, where L is an integer greater than 1;

Setting up L cache groups and numbering the cache groups;

Storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of shift assignment calculation, where each round of shift assignment calculation includes: after shifting the data of each cache group into the next cache group, adding the data of the current first cache group to the data of a target feature group and using the result as the new data of the first cache group to update the first cache group, the number of the target feature group being the same as the number of the current round of shift assignment calculation;

Combining the data in the cache groups after the N rounds of shift assignment calculation as the digest of the data block.
The distributed storage system for data according to claim 6, wherein the requesting end verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests comprises:

The requesting end generates a challenge message at every preset time interval, the challenge message containing the identifiers of one or more data blocks and indicating the data blocks to be spot-checked;

After receiving the digests of the data blocks returned by the target provider in response to the challenge message, the requesting end verifies the returned digests according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal.
A distributed storage method for data, which includes:

The requesting end sends a broadcast request packet, and the broadcast request packet contains request description information;

The requesting end selects one of the storage providers that return reply messages to it within a preset time period as the target provider;

The requesting end divides the target data into multiple data blocks, calculates the digest of each data block, generates and stores the correspondence between the identifier of each data block and its digest, and sends all the data blocks to the target provider;

At every preset time interval, the requesting end verifies the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal.
A distributed storage method for data, which includes:

The storage provider receives the broadcast request packet sent by the requesting end, extracts the request description information, and determines, based on the request description information, whether to provide a storage service for the requesting end, and, if it determines to provide the storage service for the requesting end, returns a reply message to the requesting end;

If data blocks sent by the requesting end are received, the storage provider stores the data blocks, the data blocks being all the data blocks obtained by the requesting end dividing the target data.
An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the following steps when executing the computer program:

Sending a broadcast request packet, the broadcast request packet containing request description information; selecting, as the target provider, one of the storage providers that return reply messages within a preset time period; dividing the target data into multiple data blocks, calculating the digest of each data block, generating and storing the correspondence between the identifier of each data block and its digest, and sending all the data blocks to the target provider; and verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal; and/or,

Receiving the broadcast request packet sent by the requesting end, extracting the request description information, and determining, based on the request description information, whether to provide a storage service for the requesting end, and, if it is determined to provide the storage service for the requesting end, returning a reply message to the requesting end; and, if data blocks sent by the requesting end are received, storing the data blocks, the data blocks being all the data blocks obtained by the requesting end dividing the target data.
The electronic device according to claim 13, wherein, when the processor executes the determining of whether to provide a storage service for the requesting end based on the request description information, the following steps are specifically implemented:

Converting the request description information into a description matrix, and performing dimensionality reduction on the description matrix through a principal component analysis algorithm to generate a target feature matrix;

Classifying the target feature matrix with a pre-trained support vector machine model, and determining, based on the category of the target feature matrix, whether to provide the storage service for the requesting end.
The electronic device according to claim 13, wherein, when the processor executes the selecting, as the target provider, of one of the storage providers that return reply messages within a preset time period, the following steps are specifically implemented:

Extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;

Calculating the time difference between the time at which the requesting end receives the reply message returned by each storage provider and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;

By the formula:
the storage coefficient of each storage provider is calculated, where P_i is the storage coefficient of the i-th storage provider, Cap_i is the remaining storage capacity of the i-th storage provider, and Time_i is the time difference corresponding to the i-th storage provider;

The storage provider with the largest storage coefficient is used as the target provider.
The electronic device according to claim 13, wherein, when the processor executes the calculating of the digest of each data block, the following steps are specifically implemented:

Dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;

Assigning an initial digest to the data block, dividing the initial digest into L initial digest groups in order of arrangement, and numbering the initial digest groups, where L is an integer greater than 1;

Setting up L cache groups and numbering the cache groups;

Storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of shift assignment calculation, where each round of shift assignment calculation includes: after shifting the data of each cache group into the next cache group, adding the data of the current first cache group to the data of a target feature group and using the result as the new data of the first cache group to update the first cache group, the number of the target feature group being the same as the number of the current round of shift assignment calculation;

Combining the data in the cache groups after the N rounds of shift assignment calculation as the digest of the data block.
The electronic device according to claim 13, wherein, when the processor executes the verifying, at every preset time interval, of the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, the following steps are specifically implemented:

The requesting end generates a challenge message at every preset time interval, the challenge message containing the identifiers of one or more data blocks and indicating the data blocks to be spot-checked;

After receiving the digests of the data blocks returned by the target provider in response to the challenge message, the requesting end verifies the returned digests according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:

Sending a broadcast request packet, the broadcast request packet containing request description information; selecting, as the target provider, one of the storage providers that return reply messages within a preset time period; dividing the target data into multiple data blocks, calculating the digest of each data block, generating and storing the correspondence between the identifier of each data block and its digest, and sending all the data blocks to the target provider; and verifying, at every preset time interval, the digests of one or more data blocks stored by the target provider according to the pre-stored correspondence between the identifiers of the data blocks and the digests, to determine whether the target provider's storage of the target data is abnormal; and/or,

Receiving the broadcast request packet sent by the requesting end, extracting the request description information, and determining, based on the request description information, whether to provide a storage service for the requesting end, and, if it is determined to provide the storage service for the requesting end, returning a reply message to the requesting end; and, if data blocks sent by the requesting end are received, storing the data blocks, the data blocks being all the data blocks obtained by the requesting end dividing the target data.
The computer-readable storage medium according to claim 18, wherein, when the computer program is executed by the processor to perform the selecting, as the target provider, of one of the storage providers that return reply messages within a preset time period, the following steps are specifically implemented:

Extracting the remaining storage capacity from the reply message returned by each storage provider within the preset time period, and establishing a correspondence between each storage provider and its remaining storage capacity;

Calculating the time difference between the time at which the requesting end receives the reply message returned by each storage provider and the time at which the requesting end sent the broadcast request packet, and establishing a correspondence between each storage provider and its time difference;

By the formula:
the storage coefficient of each storage provider is calculated, where P_i is the storage coefficient of the i-th storage provider, Cap_i is the remaining storage capacity of the i-th storage provider, and Time_i is the time difference corresponding to the i-th storage provider;

The storage provider with the largest storage coefficient is used as the target provider.
The computer-readable storage medium according to claim 18, wherein, when the computer program is executed by the processor to perform the calculating of the digest of each data block, the following steps are specifically implemented:

Dividing the data block into M feature groups, expanding the M feature groups into N feature groups according to a preset expansion rule, and numbering the N feature groups, where M is an integer greater than 1 and N is an integer greater than M;

Assigning an initial digest to the data block, dividing the initial digest into L initial digest groups in order of arrangement, and numbering the initial digest groups, where L is an integer greater than 1;

Setting up L cache groups and numbering the cache groups;

Storing the data of each initial digest group in the cache group with the same number, and cyclically performing N rounds of shift assignment calculation, where each round of shift assignment calculation includes: after shifting the data of each cache group into the next cache group, adding the data of the current first cache group to the data of a target feature group and using the result as the new data of the first cache group to update the first cache group, the number of the target feature group being the same as the number of the current round of shift assignment calculation;

Combining the data in the cache groups after the N rounds of shift assignment calculation as the digest of the data block.