CN117648057A

CN117648057A - Data security management method and system based on distributed storage

Info

Publication number: CN117648057A
Application number: CN202410119178.1A
Authority: CN
Inventors: 李璇珑
Original assignee: Ruida Credible Security Technology Guangzhou Co ltd
Current assignee: Ruida Credible Security Technology Guangzhou Co ltd
Priority date: 2024-01-29
Filing date: 2024-01-29
Publication date: 2024-03-05

Abstract

The invention provides a data security management method based on distributed storage, which relates to the field of data processing and involves: a storage scheduling module comprising at least one storage scheduling node; a data processing module comprising at least one data processing node; and a data storage module comprising at least one data storage node. The storage scheduling node is used for determining a first target data processing node and a plurality of first target data storage nodes corresponding to a data storage request; the first target data processing node is used for dividing the data to be stored corresponding to the data storage request into a plurality of data segments to be stored and encrypting the data segments; and the plurality of first target data storage nodes are used for storing the plurality of encrypted data segments to be stored. The data processing module is also used for scheduling the number of data processing nodes, and the data storage module is also used for scheduling the number of data storage nodes. The invention has the advantages of improving the flexibility of distributed storage scheduling and ensuring the load balance of the distributed storage nodes.

Description

Data security management method and system based on distributed storage

Technical Field

The invention relates to the field of data storage, in particular to a data security management method and system based on distributed storage.

Background

With the rapid development of computer and internet information technology, computer network technology has penetrated into various industries, and information resources on the network show explosive growth. Internet information technology brings great convenience to daily life, but the information on the internet is voluminous and disorganized, which causes users considerable trouble. Data storage requirements in the big data era are increasingly diverse, and traditional data storage systems have difficulty meeting them. A traditional network storage system uses a centralized storage server to store all data. The storage server becomes the bottleneck of system performance and the focal point of reliability and security risk, and it cannot meet the requirements of large-scale storage applications. Capacity expansion requires shutting the system down, expanding the storage, and then redistributing the data. If the data grows suddenly in a certain period, so that the data growth rate exceeds the hardware upgrade rate of the database, capacity expansion cannot seamlessly absorb the sudden increase; insufficient database capacity easily occurs, which affects service quality and customers' timeliness requirements.

A distributed storage system disperses and stores data on a plurality of independent devices. In distributed storage, reasonable scheduling of data is a major challenge, and the system must support high concurrency and high availability. In the prior art, an arriving request can only be served by the single node that holds the requested object, which easily leads to unbalanced node load in the distributed storage system.

Therefore, it is necessary to provide a data security management method and system based on distributed storage, which improves flexibility of distributed storage scheduling, ensures load balance of distributed storage nodes, and improves stability of distributed storage.

Disclosure of Invention

The invention provides a data security management system based on distributed storage, which comprises: a storage scheduling module comprising at least one storage scheduling node; a data processing module comprising at least one data processing node; and a data storage module comprising at least one data storage node. The storage scheduling node is configured to receive a data storage request initiated by a user terminal; determine, based on the data storage request, a first target data processing node corresponding to the data storage request from among the at least one data processing node according to a historical data access operation of the user terminal and status information of each data processing node; and determine, based on the data storage request, a plurality of first target data storage nodes corresponding to the data storage request from among the at least one data storage node according to the historical data access operation of the user terminal and status information of each data storage node, wherein the first target data processing node is configured to segment the to-be-stored data corresponding to the data storage request, generate a plurality of to-be-stored data segments corresponding to the to-be-stored data, and encrypt the plurality of to-be-stored data segments, and the plurality of first target data storage nodes are configured to store the plurality of encrypted to-be-stored data segments. The storage scheduling node is further configured to receive a data reading request initiated by a user terminal; determine, based on the data reading request, a second target data processing node corresponding to the data reading request from the at least one data processing node according to the historical data access operation of the user terminal and the status information of each data processing node; and determine, according to storage path information of the data to be read corresponding to the data reading request, a plurality of second target data storage nodes corresponding to the data reading request from the at least one data storage node, wherein the plurality of second target data storage nodes are configured to read a plurality of encrypted data segments corresponding to the data reading request and send the encrypted data segments to the second target data processing node, and the second target data processing node is configured to decrypt and splice the plurality of encrypted data segments corresponding to the data reading request, generate the data to be read corresponding to the data reading request, and feed the data back to the storage scheduling node. The data processing module is also used for scheduling the number of the data processing nodes based on historical data access operations of a plurality of user terminals; the data storage module is also used for scheduling the number of the data storage nodes based on historical data access operations of the plurality of user terminals.

Further, the data processing module schedules the number of data processing nodes based on historical data access operations of a plurality of user terminals, including: clustering the plurality of user terminals based on historical data access operation of the plurality of user terminals, and determining a plurality of user terminal clustering clusters; for each user terminal cluster, determining the data storage requirements and the data reading requirements of the user terminal cluster in a plurality of scheduling time periods of a current scheduling period based on historical data access operations of a plurality of user terminals included in the user terminal cluster; and scheduling the number of data processing nodes in a plurality of scheduling time periods of the current scheduling period based on the data storage requirements and the data reading requirements of each user terminal cluster in the plurality of scheduling time periods of the current scheduling period.

Further, the data processing module clusters the plurality of user terminals based on historical data access operations of the plurality of user terminals, and determines a plurality of user terminal clustering clusters, including: for each user terminal, determining, based on the historical data access operation of the user terminal, the data storage requirement and the data reading requirement of the user terminal in each scheduling time period of one scheduling period, and the data storage requirement fluctuation parameter and the data reading requirement fluctuation parameter of the user terminal in each scheduling time period of one scheduling period; for any two user terminals, calculating the terminal similarity of the two user terminals based on the data storage requirements and the data reading requirements of the two user terminals in each scheduling time period of one scheduling period, and the data storage requirement fluctuation parameters and the data reading requirement fluctuation parameters of the two user terminals in each scheduling time period of one scheduling period; and clustering the plurality of user terminals based on the terminal similarity of any two user terminals, and determining a plurality of user terminal clustering clusters.

Further, the data storage module schedules the number of data storage nodes based on historical data access operations of a plurality of user terminals, including: and scheduling the number of the data storage nodes based on the data storage requirements and the data reading requirements of each user terminal cluster in a plurality of scheduling time periods of the current scheduling period.

Further, the storage scheduling node determines, based on the data storage request, a first target data processing node corresponding to the data storage request from the at least one data processing node according to the historical data access operation of the user terminal and the status information of each data processing node, which includes: acquiring computing power resource use information of each data processing node; acquiring operation parameter information of each data processing node; acquiring state information of a physical machine where each data processing node is located; for each data processing node, calculating a ranking score of the data processing node based on the computing power resource use information, the operation parameter information and the state information of the physical machine where the data processing node is located; and determining a first target data processing node corresponding to the data storage request from the at least one data processing node based on the ranking score of each data processing node.

Further, the storage scheduling node determines, based on the data storage request, a plurality of first target data storage nodes corresponding to the data storage request from the at least one data storage node according to the historical data access operation of the user terminal and the state information of each data storage node, and includes: acquiring storage resource use information of each data storage node; acquiring operation parameter information of each data storage node; acquiring state information of a physical machine where each data storage node is located; for each data storage node, calculating a sorting score of the data storage node based on storage resource use information, operation parameter information and state information of a physical machine in which the data storage node is located; and determining a plurality of first target data storage nodes corresponding to the data storage requests from the at least one data storage node based on the ordering scores of each data storage node.
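The selection of several first target data storage nodes by score can be illustrated with a minimal sketch. It assumes that a score has already been computed for every data storage node and that a higher score is better; the function name `select_storage_nodes` and the dictionary layout are hypothetical and are not taken from the specification.

```python
import heapq
from typing import Dict, List


def select_storage_nodes(scores: Dict[str, float], count: int) -> List[str]:
    """Return the IDs of the `count` highest-scoring storage nodes, best first.

    `scores` maps a storage node ID to its score (assumed: higher is better).
    """
    if count <= 0 or count > len(scores):
        raise ValueError("count must be between 1 and the number of nodes")
    # Iterating a dict yields its keys; the key function looks up each score.
    return heapq.nlargest(count, scores, key=scores.get)
```

With scores of 0.2, 0.9, 0.5 and 0.7 for four nodes, requesting three nodes yields the nodes scored 0.9, 0.7 and 0.5, in that order.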

Further, the first target data processing node segments the data to be stored corresponding to the data storage request, and generates a plurality of data segments to be stored corresponding to the data to be stored, including: for each user terminal cluster, determining the lengths of data segments of the user terminal cluster in a plurality of scheduling time segments of a current scheduling period based on the data storage requirements and the data reading requirements of the user terminal cluster in a plurality of scheduling time segments of the current scheduling period; determining a target user terminal cluster corresponding to the data to be stored based on the data storage request; determining the target data segment length corresponding to the data to be stored based on the data segment lengths of the target user terminal cluster in a plurality of scheduling time segments of the current scheduling period; and based on the target data segment length, segmenting the data to be stored corresponding to the data storage request, and generating a plurality of data segments to be stored corresponding to the data to be stored.
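The final segmentation step can be sketched as follows, assuming the data to be stored is a byte string and the target data segment length is a byte count that has already been determined for the target user terminal cluster. The helper names are hypothetical.

```python
from typing import List


def split_into_segments(data: bytes, target_segment_length: int) -> List[bytes]:
    """Split data into consecutive segments of at most target_segment_length bytes."""
    if target_segment_length <= 0:
        raise ValueError("target_segment_length must be positive")
    return [
        data[start:start + target_segment_length]
        for start in range(0, len(data), target_segment_length)
    ]


def splice_segments(segments: List[bytes]) -> bytes:
    """Reassemble the original data by concatenating the segments in order."""
    return b"".join(segments)
```

A longer target length produces fewer segments, so fewer storage nodes are involved when the data is read back; a shorter one spreads the data over more nodes.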

Further, the encrypting, by the first target data processing node, the plurality of data segments to be stored includes: encrypting the plurality of data segments to be stored based on the related information of the first target data processing node and the related information of the plurality of first target data storage nodes.

Further, the encrypting, by the first target data processing node, the plurality of data segments to be stored based on the related information of the first target data processing node and the related information of the plurality of first target data storage nodes includes: for each data segment to be stored, encrypting the data segment to be stored based on the segmentation identification of the data segment to be stored, the identification of a first target data storage node for storing the data segment to be stored and the identification of an associated first target data storage node; and encrypting the segmentation identification of the data segment to be stored on the basis of the identification of the first target data processing node for each data segment to be stored, wherein the first target data storage node for storing the data segment to be stored is used for storing the encrypted data segment to be stored and the corresponding encrypted segmentation identification.
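The two encryption layers can be illustrated with a toy sketch. The specification does not name a cipher, so the code below assumes, purely for illustration, an XOR keystream derived with SHAKE-256 from the identifiers; this is not a production-grade cipher, and all function names are hypothetical.

```python
import hashlib
from typing import Tuple


def _keystream_xor(key_material: bytes, payload: bytes) -> bytes:
    """XOR payload with a SHAKE-256 keystream derived from key_material.

    Toy cipher for illustration only. Applying it twice with the same
    key material restores the original payload.
    """
    keystream = hashlib.shake_256(key_material).digest(len(payload))
    return bytes(p ^ k for p, k in zip(payload, keystream))


def encrypt_segment(segment: bytes, segment_id: str, storage_node_id: str,
                    associated_node_id: str, processing_node_id: str) -> Tuple[bytes, bytes]:
    """Return (encrypted_segment, encrypted_segment_id)."""
    # Layer 1: the segment key mixes the segment ID with both storage node IDs.
    segment_key = "|".join([segment_id, storage_node_id, associated_node_id]).encode()
    encrypted_segment = _keystream_xor(segment_key, segment)
    # Layer 2: the segment ID itself is hidden under the processing node ID.
    encrypted_id = _keystream_xor(processing_node_id.encode(), segment_id.encode())
    return encrypted_segment, encrypted_id


def decrypt_segment(encrypted_segment: bytes, encrypted_id: bytes, storage_node_id: str,
                    associated_node_id: str, processing_node_id: str) -> bytes:
    """Recover the segment ID first, then use it to rebuild the segment key."""
    segment_id = _keystream_xor(processing_node_id.encode(), encrypted_id).decode()
    segment_key = "|".join([segment_id, storage_node_id, associated_node_id]).encode()
    return _keystream_xor(segment_key, encrypted_segment)
```

In this sketch, a storage node holding only the encrypted segment and the encrypted segmentation identification cannot rebuild the segment key without the processing node identification, which is the point of the second layer.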

The invention provides a data security management method based on distributed storage, which comprises the following steps: scheduling the number of data processing nodes based on historical data access operations of a plurality of user terminals; scheduling the number of data storage nodes based on historical data access operations of the plurality of user terminals; receiving a data storage request initiated by a user terminal, determining, based on the data storage request, a first target data processing node corresponding to the data storage request from at least one data processing node according to a historical data access operation of the user terminal and state information of each data processing node, and determining, based on the data storage request, a plurality of first target data storage nodes corresponding to the data storage request from at least one data storage node according to the historical data access operation of the user terminal and state information of each data storage node, wherein the first target data processing node is used for segmenting the data to be stored corresponding to the data storage request, generating a plurality of data segments to be stored corresponding to the data to be stored, and encrypting the plurality of data segments to be stored, and the plurality of first target data storage nodes are used for storing the plurality of encrypted data segments to be stored; and receiving a data reading request initiated by a user terminal, determining, based on the data reading request, a second target data processing node corresponding to the data reading request from the at least one data processing node according to the historical data access operation of the user terminal and the state information of each data processing node, and determining a plurality of second target data storage nodes corresponding to the data reading request from the at least one data storage node based on storage path information of the data to be read corresponding to the data reading request, wherein the plurality of second target data storage nodes are used for reading a plurality of encrypted data segments corresponding to the data reading request and sending the encrypted data segments to the second target data processing node, and the second target data processing node is used for decrypting and splicing the plurality of encrypted data segments corresponding to the data reading request, generating the data to be read corresponding to the data reading request, and feeding the data back to the storage scheduling node.

Compared with the prior art, the data security management method and system based on distributed storage provided by the invention have the following beneficial effects:

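1. The system comprises at least one storage scheduling node, at least one data processing node and at least one data storage node, and the numbers of data processing nodes and data storage nodes are scheduled based on the historical data access operations of a plurality of user terminals, which improves the flexibility of distributed storage scheduling and ensures the load balance of the distributed storage nodes.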

2. Based on historical data access operation of the user terminal, determining data storage requirement and data reading requirement of the user terminal in each scheduling time period of one scheduling period and data storage requirement fluctuation parameter and data reading requirement fluctuation parameter of the user terminal in each scheduling time period of one scheduling period, realizing more accurate clustering of a plurality of user terminals, and determining a plurality of user terminal clustering clusters, so that the data storage requirement and data reading requirement of the user terminal clustering clusters determined later in a plurality of scheduling time periods of the current scheduling period are more accurate.

3. For a user terminal cluster with higher data storage requirements and data reading requirements in the plurality of scheduling time periods of the current scheduling period, the probability that data stored in the current scheduling period will be read is higher. The target data segment length corresponding to the data to be stored is therefore set longer, so that the data to be stored is divided into fewer data segments to be stored; fewer data storage nodes need to be contacted when the data is read later, the second target data processing node needs less processing time for decryption and data splicing, and the reading efficiency of the data is improved. For a user terminal cluster with lower data storage requirements and data reading requirements in the plurality of scheduling time periods of the current scheduling period, the probability of data reading is lower. The data to be stored of that user terminal cluster is therefore segmented into a larger number of data segments to be stored, which prevents the data from occupying too much of the storage resources of a single data storage node and improves the load balancing of the data storage nodes.

4. Based on the segmentation identification of the data segment to be stored, the identification of the first target data storage node for storing the data segment to be stored and the identification of the associated first target data storage node, the data segment to be stored is encrypted, and based on the identification of the first target data processing node, the segmentation identification of the data segment to be stored is encrypted, so that double encryption is realized, and the security of data storage is further improved.

Drawings

The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:

FIG. 1 is a block diagram of a distributed storage based data security management system according to some embodiments of the present disclosure;

FIG. 2 is a schematic flow diagram illustrating a determination of a first target data processing node to which a data storage request corresponds, according to some embodiments of the present disclosure;

FIG. 3 is a flow diagram illustrating a method for determining a plurality of first target data storage nodes corresponding to a data storage request according to some embodiments of the present disclosure;

FIG. 4 is a flow chart of generating a plurality of data segments to be stored corresponding to data to be stored according to some embodiments of the present disclosure;

FIG. 5 is a flow diagram illustrating a method for data security management based on distributed storage according to some embodiments of the present disclosure.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.

FIG. 1 is a block diagram of a distributed storage based data security management system according to some embodiments of the present disclosure. As shown in FIG. 1, a distributed storage based data security management system may include a storage scheduling module, a data processing module, and a data storage module. The respective modules are described in detail in order below.

The storage scheduling module may include at least one storage scheduling node. The data processing module may comprise at least one data processing node. The data storage module may comprise at least one data storage node.

The data processing module may also be configured to schedule the number of data processing nodes based on historical data access operations of the plurality of user terminals.

In some embodiments, it may specifically include:

clustering the plurality of user terminals based on historical data access operations of the plurality of user terminals, and determining a plurality of user terminal clustering clusters;

for each user terminal cluster, determining the data storage requirements and the data reading requirements of the user terminal cluster in a plurality of scheduling time periods of a current scheduling period based on historical data access operations of a plurality of user terminals included in the user terminal cluster;

and scheduling the number of data processing nodes in a plurality of scheduling time periods of the current scheduling period based on the data storage requirements and the data reading requirements of each user terminal cluster in the plurality of scheduling time periods of the current scheduling period.

Specifically, the historical data access operation of the user terminal may include the number of data storage requests and the number of data reading requests of the user terminal in a plurality of historical time periods. Through a demand prediction model, the data processing module can determine the data storage requirements and data reading requirements of the user terminal cluster in a plurality of scheduling time periods of the current scheduling period based on the historical data access operations of the plurality of user terminals included in the user terminal cluster. The length of the current scheduling period may be one day, one week, half a month, one month, or the like.

Through a computing power determination model, the data processing module can determine the computing power demand of each of the plurality of scheduling time periods of the current scheduling period based on the data storage requirements and the data reading requirements of each user terminal cluster in those scheduling time periods, determine from it the number of data processing nodes required in each scheduling time period, and schedule the number of data processing nodes in each scheduling time period accordingly.

The computing power determination model and the demand prediction model may be Long Short-Term Memory (LSTM) network models.
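As a rough illustration of how per-cluster demands might translate into node counts, the sketch below replaces the learned models with a simple capacity rule. It assumes that demand is expressed as a request count and that each data processing node can handle a fixed number of requests per scheduling time period; `requests_per_node`, `min_nodes` and the function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def schedule_processing_nodes(
    cluster_demands: Dict[str, List[Tuple[int, int]]],
    requests_per_node: int,
    min_nodes: int = 1,
) -> List[int]:
    """Return the number of processing nodes for each scheduling time period.

    cluster_demands[cluster][j] is (storage_demand, read_demand) of that
    cluster in scheduling time period j. All clusters are assumed to cover
    the same number of time periods.
    """
    if requests_per_node <= 0:
        raise ValueError("requests_per_node must be positive")
    if not cluster_demands:
        return []
    num_periods = len(next(iter(cluster_demands.values())))
    node_counts = []
    for j in range(num_periods):
        # Total demand in period j is the sum over all clusters.
        total = sum(store + read
                    for store, read in (d[j] for d in cluster_demands.values()))
        node_counts.append(max(min_nodes, math.ceil(total / requests_per_node)))
    return node_counts
```

A fixed per-node capacity is the simplest possible stand-in; a learned model could capture that storage requests and reading requests consume computing power differently.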

In some embodiments, the data processing module clusters the plurality of user terminals based on historical data access operations of the plurality of user terminals, determines a plurality of user terminal cluster clusters, including:

for each user terminal, determining the data storage requirement and the data reading requirement of the user terminal in each scheduling time period of one scheduling period and the data storage requirement fluctuation parameter and the data reading requirement fluctuation parameter of the user terminal in each scheduling time period of one scheduling period based on the historical data access operation of the user terminal;

for any two user terminals, calculating the terminal similarity of the two user terminals based on the data storage requirements and the data reading requirements of the two user terminals in each scheduling time period of one scheduling period and the data storage requirement fluctuation parameters and the data reading requirement fluctuation parameters of the two user terminals in each scheduling time period of one scheduling period;

based on the terminal similarity of any two user terminals, clustering the plurality of user terminals to determine a plurality of user terminal clustering clusters.

Specifically, for each user terminal, the data processing module may determine, based on the historical data access operation of the user terminal, the number of data storage requests and the number of data reading requests of the user terminal in each scheduling time period of a plurality of historical scheduling periods. For each scheduling time period, the data processing module may take the mean of the numbers of data storage requests of the user terminal in that scheduling time period across the plurality of historical scheduling periods as the data storage requirement of the user terminal in that scheduling time period, and take the corresponding mean of the numbers of data reading requests as the data reading requirement of the user terminal in that scheduling time period. The data processing module may further calculate the data storage demand fluctuation parameter of the user terminal in that scheduling time period based on the numbers of data storage requests of the user terminal in that scheduling time period across the plurality of historical scheduling periods, and calculate the data reading demand fluctuation parameter based on the corresponding numbers of data reading requests.

For example, the data processing module may calculate the data storage demand fluctuation parameter according to the following formula:

wherein the terms of the formula are, in order: the data storage demand fluctuation parameter of the i-th user terminal in the j-th scheduling time period; the number of data storage requests of the i-th user terminal in the j-th scheduling time period of the m-th historical scheduling period; and M, the total number of historical scheduling periods.

The data processing module may calculate the data read demand fluctuation parameter according to the following formula:

wherein the terms of the formula are, in order: the data reading demand fluctuation parameter of the i-th user terminal in the j-th scheduling time period; and the number of data reading requests of the i-th user terminal in the j-th scheduling time period of the m-th historical scheduling period.
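The formulas themselves are not reproduced in this text, so the following sketch is only one plausible reading. It assumes the requirement is the mean request count over the M historical scheduling periods and the fluctuation parameter is the population standard deviation of those counts. The same function applies to data storage requests and to data reading requests; its name is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def demand_and_fluctuation(counts: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (requirement, fluctuation) for each scheduling time period.

    counts[m][j] is the number of requests of one user terminal in
    scheduling time period j of historical scheduling period m.
    Assumption: requirement = mean, fluctuation = population std. deviation.
    """
    if not counts:
        raise ValueError("at least one historical scheduling period is required")
    num_time_periods = len(counts[0])
    result = []
    for j in range(num_time_periods):
        # Collect the counts of time period j across all historical periods.
        samples = [period[j] for period in counts]
        result.append((fmean(samples), pstdev(samples)))
    return result
```

A terminal whose request count in a time period is identical in every historical scheduling period gets a fluctuation of zero under this assumption.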

The data processing module may calculate terminal similarities of the two user terminals based on the data storage requirement and the data reading requirement of the two user terminals in each scheduling period of one scheduling period and the data storage requirement fluctuation parameter and the data reading requirement fluctuation parameter of the user terminal in each scheduling period of one scheduling period according to the following formula:

wherein the terms of the formula are, in order: the terminal similarity of the e-th user terminal and the f-th user terminal; the data storage requirement of the e-th user terminal in the j-th scheduling time period; the data storage requirement of the f-th user terminal in the j-th scheduling time period; the data reading requirement of the e-th user terminal in the j-th scheduling time period; the data reading requirement of the f-th user terminal in the j-th scheduling time period; the data storage demand fluctuation parameter of the e-th user terminal in the j-th scheduling time period; the data storage demand fluctuation parameter of the f-th user terminal in the j-th scheduling time period; the data reading demand fluctuation parameter of the e-th user terminal in the j-th scheduling time period; the data reading demand fluctuation parameter of the f-th user terminal in the j-th scheduling time period; four preset weights; and four preset parameters.

The plurality of user terminals can be clustered by a k-means clustering algorithm based on the terminal similarity of any two user terminals, so as to determine a plurality of user terminal clustering clusters.
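A minimal k-means sketch is given below for orientation. It makes two assumptions that are not in the specification: each user terminal is represented by a feature vector of its requirements and fluctuation parameters, and plain Euclidean distance stands in for the terminal similarity, whose formula is not reproduced here. Centroids are initialised from the first k points to keep the example deterministic.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 50) -> List[int]:
    """Return one cluster index per point using Lloyd's k-means iteration."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

If a pairwise similarity matrix is the only available input, a similarity-based method such as k-medoids may be a closer fit than k-means, which needs coordinates to compute centroids.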

The data storage module may also be configured to schedule the number of data storage nodes based on historical data access operations of the plurality of user terminals.

In some embodiments, it may specifically include: and scheduling the number of the data storage nodes based on the data storage requirements and the data reading requirements of each user terminal cluster in a plurality of scheduling time periods of the current scheduling period.

Specifically, the data storage module may determine the number of spare data storage nodes based on the data storage requirements and the data reading requirements in a plurality of scheduling time periods of the current scheduling period, so as to implement scheduling of the number of data storage nodes. For example, when the data storage requirement in the plurality of scheduling time periods of the current scheduling period is greater than a preset data storage requirement threshold and/or the data reading requirement is greater than a preset data reading requirement threshold, the number of spare data storage nodes is increased.
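The threshold rule in the example above can be written as a short sketch. The increment `step` and the function name are hypothetical; the specification only states that the number of spare nodes is increased.

```python
def adjust_spare_storage_nodes(
    current_spares: int,
    storage_demand: float,
    read_demand: float,
    storage_threshold: float,
    read_threshold: float,
    step: int = 1,
) -> int:
    """Return the new number of spare data storage nodes.

    The count grows by `step` when either demand exceeds its preset
    threshold (the "and/or" condition); otherwise it is left unchanged.
    """
    if storage_demand > storage_threshold or read_demand > read_threshold:
        return current_spares + step
    return current_spares
```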

The storage scheduling node is used for receiving a data storage request initiated by a user terminal; determining, based on the data storage request, a first target data processing node corresponding to the data storage request from at least one data processing node according to the historical data access operation of the user terminal and state information of each data processing node; and determining, based on the data storage request, a plurality of first target data storage nodes corresponding to the data storage request from at least one data storage node according to the historical data access operation of the user terminal and state information of each data storage node. The first target data processing node is used for segmenting the data to be stored corresponding to the data storage request, generating a plurality of data segments to be stored corresponding to the data to be stored, and encrypting the plurality of data segments to be stored; the plurality of first target data storage nodes are used for storing the plurality of encrypted data segments to be stored.

FIG. 2 is a flow chart of determining a first target data processing node corresponding to a data storage request according to some embodiments of the present disclosure. As shown in FIG. 2, in some embodiments, the storage scheduling node determining, based on the data storage request, the first target data processing node corresponding to the data storage request from at least one data processing node according to a historical data access operation of a user terminal and status information of each data processing node includes:

acquiring computing power resource use information of each data processing node, such as CPU occupation information, memory occupation information, network bandwidth occupation information, disk IO occupation information and the like;

acquiring operation parameter information of each data processing node, such as service response time, network bandwidth and the like;

acquiring state information, such as temperature information, environment humidity information, vibration information and the like, of a physical machine where each data processing node is located;

for each data processing node, calculating a ranking score of the data processing node based on its computing power resource use information, its operation parameter information and the state information of the physical machine where it is located;

a first target data processing node corresponding to the data storage request is determined from the at least one data processing node based on the ranking score of each data processing node.

Specifically, the storage scheduling node may calculate, through a first score prediction model, the ranking score of each data processing node based on its computing power resource use information, its operation parameter information and the state information of the physical machine where it is located, and take the data processing node with the highest ranking score as the first target data processing node corresponding to the data storage request. The first score prediction model may be an artificial neural network (ANN) model.
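The selection described above can be sketched as follows. The specification uses an ANN as the first score prediction model; the sketch replaces it with a hand-written weighted sum purely as a stand-in, so the weights, the normalisation constants and the names `ProcessingNodeStatus`, `ranking_score` and `select_processing_node` are all assumptions and not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ProcessingNodeStatus:
    node_id: str
    cpu_usage: float         # fraction in [0, 1]
    memory_usage: float      # fraction in [0, 1]
    bandwidth_usage: float   # fraction in [0, 1]
    disk_io_usage: float     # fraction in [0, 1]
    response_time_ms: float  # service response time
    temperature_c: float     # temperature of the physical machine


def ranking_score(s: ProcessingNodeStatus) -> float:
    """Stand-in for the first score prediction model; higher is better.

    Assumed formula: a weighted penalty on average resource usage,
    response time (capped at 1 s) and temperature above 40 degrees C.
    """
    load = (s.cpu_usage + s.memory_usage + s.bandwidth_usage + s.disk_io_usage) / 4
    latency_penalty = min(s.response_time_ms / 1000.0, 1.0)
    heat_penalty = min(max(s.temperature_c - 40.0, 0.0) / 40.0, 1.0)
    return 1.0 - (0.6 * load + 0.3 * latency_penalty + 0.1 * heat_penalty)


def select_processing_node(nodes: list[ProcessingNodeStatus]) -> ProcessingNodeStatus:
    """Pick the node with the highest ranking score as the first target node."""
    if not nodes:
        raise ValueError("no data processing node available")
    return max(nodes, key=ranking_score)
```

In a real deployment the `ranking_score` function would be replaced by inference on the trained model; only the final step, taking the node with the highest score, follows the description directly.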

FIG. 3 is a flow chart of determining a plurality of first target data storage nodes corresponding to a data storage request according to some embodiments of the present disclosure. As shown in FIG. 3, in some embodiments, the storage scheduling node determining, based on the data storage request, the plurality of first target data storage nodes corresponding to the data storage request from the at least one data storage node according to the historical data access operations of the user terminal and the state information of each data storage node includes:

acquiring storage resource use information of each data storage node, such as memory occupation, disk occupation, CPU occupation and the like;

acquiring operation parameter information of each data storage node, such as data reading time, data writing time, data reading success rate, data writing success rate and the like;

acquiring state information, such as temperature information, environment humidity information, vibration information and the like, of a physical machine where each data storage node is located;

for each data storage node, calculating a ranking score of the data storage node based on storage resource usage information, operating parameter information and state information of the physical machine in which the data storage node is located;

a plurality of first target data storage nodes corresponding to the data storage request are determined from the at least one data storage node based on the ranking score of each data storage node.

Specifically, the storage scheduling node may calculate, through the first score prediction model, the ranking score of each data storage node based on its storage resource use information, its operation parameter information and the state information of the physical machine where the data storage node is located. After the data to be stored corresponding to the data storage request is segmented, the storage scheduling node determines the number of generated data segments to be stored and, from it, the number of first target data storage nodes required. It then sorts the at least one data storage node by ranking score and selects the first of the first target data storage nodes; predicts the ranking score of that node after it has stored one data segment to be stored; sorts the at least one data storage node again and selects the second of the first target data storage nodes; and repeats this until every data segment to be stored has a corresponding first target data storage node. The first score prediction model may be an artificial neural network (ANN) model.
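The iterative selection described above (score, pick, predict the score after storing one segment, re-sort, pick again) may be sketched as a greedy loop. The score below is simply the fraction of free capacity, used as an assumed stand-in for the score prediction model; `StorageNode`, `score` and `assign_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    node_id: str
    capacity: int  # total bytes, assumed to be greater than zero
    used: int      # bytes currently in use


def score(node: StorageNode) -> float:
    """Assumed stand-in for the score prediction model: share of free space."""
    return 1.0 - node.used / node.capacity


def assign_segments(nodes: list[StorageNode], segment_sizes: list[int]) -> list[str]:
    """Return, for each segment in order, the id of the node chosen to store it."""
    # Work on copies so that the predicted usage does not alter the caller's data.
    predicted = {n.node_id: StorageNode(n.node_id, n.capacity, n.used) for n in nodes}
    assignment: list[str] = []
    for size in segment_sizes:
        candidates = [n for n in predicted.values() if n.capacity - n.used >= size]
        if not candidates:
            raise RuntimeError("no data storage node can hold the segment")
        best = max(candidates, key=score)  # re-rank on every iteration
        assignment.append(best.node_id)
        best.used += size  # predicted state after storing this segment
    return assignment
```

For two nodes of capacity 100 with 10 and 30 bytes in use and three 30-byte segments, the sketch assigns the segments to the first, the second and then the first node again, because the first node's predicted score drops after each placement.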

FIG. 4 is a flow chart of generating a plurality of data segments to be stored corresponding to the data to be stored according to some embodiments of the present disclosure. As shown in FIG. 4, in some embodiments, the first target data processing node segmenting the data to be stored corresponding to the data storage request and generating a plurality of data segments to be stored corresponding to the data to be stored includes:

for each user terminal cluster, determining the data segment lengths of the user terminal cluster in the plurality of scheduling time periods of the current scheduling period based on the data storage requirements and the data reading requirements of the user terminal cluster in those scheduling time periods;

determining a target user terminal cluster corresponding to data to be stored based on the data storage request;

determining the length of a target data segment corresponding to data to be stored based on the lengths of the data segments of the target user terminal cluster in a plurality of scheduling time periods of a current scheduling period;

and based on the target data segment length, segmenting the data to be stored corresponding to the data storage request, and generating a plurality of data segments to be stored corresponding to the data to be stored.

Specifically, the higher the data storage requirement and the data reading requirement of the target user terminal cluster in a plurality of scheduling time periods of the current scheduling period, the longer the target data segment length corresponding to the data to be stored.

It can be understood that a user terminal cluster with higher data storage requirements and data reading requirements in the plurality of scheduling time periods of the current scheduling period is more likely to read the data stored in the current scheduling period. The target data segment length for its data to be stored is therefore set longer, so that the data to be stored is divided into fewer data segments to be stored, fewer data storage nodes need to be contacted when the data is later read, the second target data processing node needs less processing time for decryption and data splicing, and the reading efficiency of the data is improved. A user terminal cluster with lower data storage requirements and data reading requirements in the plurality of scheduling time periods of the current scheduling period is less likely to read its data, so its data to be stored is divided into more data segments to be stored. This prevents the data to be stored of that user terminal cluster from occupying too much of the storage resources of a single data storage node and, because that data is seldom read, the larger number of data segments to be stored can be spread across the data storage nodes to improve the accuracy of load balancing.
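The relationship between requirements and segment length can be sketched as follows. The specification only states that higher requirements lead to a longer target data segment length; the linear mapping, the bounds and the names `target_segment_length`, `split_data` and `demand_ceiling` are assumptions, and the very small default lengths are chosen only to keep the example readable.

```python
def target_segment_length(
    storage_demand: float,
    read_demand: float,
    min_length: int = 4,
    max_length: int = 16,
    demand_ceiling: float = 100.0,
) -> int:
    """Map the cluster's requirements to a segment length in bytes.

    Assumption: the mean of the two requirements is scaled linearly
    between min_length and max_length and clamped at demand_ceiling.
    """
    combined = (storage_demand + read_demand) / (2 * demand_ceiling)
    combined = max(0.0, min(combined, 1.0))
    return min_length + round(combined * (max_length - min_length))


def split_data(data: bytes, segment_length: int) -> list[bytes]:
    """Cut the data to be stored into consecutive segments of the target length."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [data[i:i + segment_length] for i in range(0, len(data), segment_length)]
```

Under these assumptions a 32-byte payload from a high-requirement cluster is cut into two 16-byte segments, whereas the same payload from a low-requirement cluster is cut into eight 4-byte segments, which matches the trade-off between reading efficiency and load balancing described above.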

In some embodiments, the first target data processing node encrypting the plurality of data segments to be stored includes: encrypting the plurality of data segments to be stored based on the related information of the first target data processing node and the related information of the plurality of first target data storage nodes.

In some embodiments, the first target data processing node encrypts the plurality of data segments to be stored based on the related information of the first target data processing node and the related information of the plurality of first target data storage nodes, including:

for each data segment to be stored, encrypting the data segment to be stored based on the segmentation identification of the data segment to be stored, the identification of the first target data storage node for storing the data segment to be stored and the identification of the associated first target data storage node;

and encrypting the segmentation identification of the data segment to be stored based on the identification of the first target data processing node for each data segment to be stored, wherein the first target data storage node for storing the data segment to be stored stores the encrypted data segment to be stored and the corresponding segmentation identification after encryption.

Specifically, for each data segment to be stored, the first target data processing node may use an algorithm such as MD5 (Message-Digest Algorithm 5) or a secure hash algorithm (Secure Hash Algorithm, SHA) to encrypt the data segment to be stored based on its segmentation identification, the identification of the first target data storage node that stores it and the identification of the associated first target data storage node, and to encrypt the segmentation identification of the data segment to be stored based on the identification of the first target data processing node.
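The specification names MD5 and SHA, which are hash functions, without stating how they are combined with the identifications. One possible reading, shown here only as an illustrative sketch, is that key material is derived by hashing the identifications and then applied to the segment. The XOR keystream below is a toy construction built on SHA-256 from the Python standard library and is not a secure cipher; a production system would use a vetted authenticated cipher. All function names, and the interpretation of the "associated" storage node as simply another node identification, are assumptions.

```python
import hashlib


def derive_key(*identifiers: str) -> bytes:
    """Hash the identifications into key material (length-prefixed to avoid ambiguity)."""
    h = hashlib.sha256()
    for ident in identifiers:
        raw = ident.encode("utf-8")
        h.update(len(raw).to_bytes(4, "big") + raw)
    return h.digest()


def _xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """Toy stream transform: XOR data with SHA-256(key || counter) blocks."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_segment(segment: bytes, split_id: str, storage_node_id: str,
                    associated_node_id: str, processing_node_id: str) -> tuple[bytes, bytes]:
    """Return (encrypted segment, encrypted segmentation identification)."""
    data_key = derive_key(split_id, storage_node_id, associated_node_id)
    id_key = derive_key(processing_node_id)
    return (_xor_with_keystream(segment, data_key),
            _xor_with_keystream(split_id.encode("utf-8"), id_key))


def decrypt_segment(enc_segment: bytes, enc_split_id: bytes, storage_node_id: str,
                    associated_node_id: str, processing_node_id: str) -> tuple[bytes, str]:
    """Recover the segmentation identification first, then the segment itself."""
    split_id = _xor_with_keystream(enc_split_id, derive_key(processing_node_id)).decode("utf-8")
    data_key = derive_key(split_id, storage_node_id, associated_node_id)
    return _xor_with_keystream(enc_segment, data_key), split_id
```

Because the segment key depends on the identifications of both storage nodes, a segment copied to an unrelated node cannot be recovered with that node's identification alone; the sketch also assumes that the identification of the first target data processing node is available at read time.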

The storage scheduling node is also used for receiving a data reading request initiated by the user terminal and, based on the data reading request, determining a second target data processing node corresponding to the data reading request from the at least one data processing node according to the historical data access operations of the user terminal and the state information of each data processing node, and determining a plurality of second target data storage nodes corresponding to the data reading request from the at least one data storage node based on storage path information of the data to be read corresponding to the data reading request. The plurality of second target data storage nodes are used for reading the plurality of encrypted data segments corresponding to the data reading request and sending them to the second target data processing node. The second target data processing node is used for decrypting and splicing the plurality of encrypted data segments to generate the data to be read corresponding to the data reading request and feeding the data back to the storage scheduling node.
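The splicing step performed by the second target data processing node can be sketched as below, operating on segments that have already been decrypted. The sketch assumes that each segmentation identification carries a zero-based position index and that the storage path information yields the expected number of segments; neither detail is stated in the specification, and `splice_segments` is a hypothetical name.

```python
def splice_segments(segments: dict[int, bytes], expected_count: int) -> bytes:
    """Reassemble decrypted segments, keyed by position index, into the data to be read."""
    missing = [i for i in range(expected_count) if i not in segments]
    if missing:
        # A storage node failed to return its segment; splicing would corrupt the data.
        raise ValueError(f"missing segments: {missing}")
    return b"".join(segments[i] for i in range(expected_count))
```

Since the segments arrive from several data storage nodes in arbitrary order, keying them by position rather than by arrival order keeps the reassembled data identical to the data originally stored.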

FIG. 5 is a flow chart of a data security management method based on distributed storage according to some embodiments of the present disclosure. As shown in FIG. 5, the data security management method based on distributed storage may include the following steps:

Step 510, scheduling the number of data processing nodes based on historical data access operations of a plurality of user terminals;

Step 520, scheduling the number of data storage nodes based on historical data access operations of the plurality of user terminals;

Step 530, receiving a data storage request initiated by a user terminal and, based on the data storage request, determining a first target data processing node corresponding to the data storage request from the at least one data processing node according to historical data access operations of the user terminal and state information of each data processing node, and determining a plurality of first target data storage nodes corresponding to the data storage request from the at least one data storage node according to the historical data access operations of the user terminal and state information of each data storage node, wherein the first target data processing node is used for segmenting the data to be stored corresponding to the data storage request, generating a plurality of data segments to be stored corresponding to the data to be stored, and encrypting the plurality of data segments to be stored, and the plurality of first target data storage nodes are used for storing the plurality of encrypted data segments to be stored;

Step 540, receiving a data reading request initiated by the user terminal and, based on the data reading request, determining a second target data processing node corresponding to the data reading request from the at least one data processing node according to the historical data access operations of the user terminal and the state information of each data processing node, and determining a plurality of second target data storage nodes corresponding to the data reading request from the at least one data storage node based on storage path information of the data to be read corresponding to the data reading request, wherein the plurality of second target data storage nodes are used for reading the plurality of encrypted data segments corresponding to the data reading request and sending them to the second target data processing node, and the second target data processing node is used for decrypting and splicing the plurality of encrypted data segments to generate the data to be read corresponding to the data reading request and feeding the data back to the storage scheduling node.

The data security management method based on distributed storage may be executed by the data security management system based on distributed storage. For further description of the method, reference may be made to the related description of the system, which is not repeated here.

Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims

1. A data security management system based on distributed storage, comprising:

a storage scheduling module comprising at least one storage scheduling node;

a data processing module comprising at least one data processing node;

a data storage module comprising at least one data storage node;

the storage scheduling node is configured to receive a data storage request initiated by a user terminal, determine, based on the data storage request, a first target data processing node corresponding to the data storage request from among the at least one data processing node according to a historical data access operation of the user terminal and status information of each data processing node, determine, based on the data storage request, a plurality of first target data storage nodes corresponding to the data storage request from among the at least one data storage node according to a historical data access operation of the user terminal and status information of each data storage node, wherein the first target data processing node is configured to segment to-be-stored data corresponding to the data storage request, generate a plurality of to-be-stored data segments corresponding to the to-be-stored data, encrypt the plurality of to-be-stored data segments, and store the plurality of encrypted to-be-stored data segments;

the storage scheduling node is further configured to receive a data reading request initiated by a user terminal, determine, based on the data reading request, a second target data processing node corresponding to the data reading request from the at least one data processing node according to a historical data access operation of the user terminal and status information of each data processing node, and determine, according to storage path information of data to be read corresponding to the data reading request, a plurality of second target data storage nodes corresponding to the data reading request from the at least one data storage node, wherein the plurality of second target data storage nodes are configured to read a plurality of encrypted data segments corresponding to the data reading request and send the encrypted data segments to the second target data processing node, and the second target data processing node is configured to decrypt and splice the plurality of encrypted data segments corresponding to the data reading request, generate the data to be read corresponding to the data reading request, and feed the data back to the storage scheduling node;

the data processing module is also used for scheduling the number of the data processing nodes based on historical data access operations of a plurality of user terminals;

the data storage module is also used for scheduling the number of the data storage nodes based on historical data access operations of a plurality of user terminals.

2. The distributed storage based data security management system of claim 1, wherein the data processing module schedules the number of data processing nodes based on historical data access operations of a plurality of user terminals, comprising:

clustering the plurality of user terminals based on historical data access operations of the plurality of user terminals, and determining a plurality of user terminal clusters;

3. The distributed storage based data security management system of claim 2, wherein the data processing module clusters the plurality of user terminals based on historical data access operations of the plurality of user terminals, and determines a plurality of user terminal clusters, comprising:

for each user terminal, determining, based on historical data access operations of the user terminal, a data storage requirement and a data reading requirement of the user terminal in each scheduling time period of one scheduling period, and a data storage requirement fluctuation parameter and a data reading requirement fluctuation parameter of the user terminal in each scheduling time period of one scheduling period;

and clustering the plurality of user terminals based on the terminal similarity of any two user terminals, and determining a plurality of user terminal clusters.

4. A data security management system based on distributed storage as claimed in claim 3, wherein said data storage module schedules the number of data storage nodes based on historical data access operations of a plurality of user terminals, comprising:

scheduling the number of the data storage nodes based on the data storage requirements and the data reading requirements of each user terminal cluster in a plurality of scheduling time periods of the current scheduling period.

5. The distributed storage-based data security management system according to any one of claims 1 to 4, wherein the storage scheduling node determines, based on the data storage request, a first target data processing node corresponding to the data storage request from the at least one data processing node according to a historical data access operation of the user terminal and status information of each of the data processing nodes, including:

acquiring computing power resource use information of each data processing node;

acquiring operation parameter information of each data processing node;

acquiring state information of a physical machine where each data processing node is located;

for each data processing node, calculating a ranking score of the data processing node based on the computing power resource use information, the operation parameter information and the state information of the physical machine where the data processing node is located;

and determining a first target data processing node corresponding to the data storage request from the at least one data processing node based on the ranking score of each data processing node.

6. The distributed storage-based data security management system according to any one of claims 1 to 4, wherein the storage scheduling node determines, based on the data storage request, a plurality of first target data storage nodes corresponding to the data storage request from the at least one data storage node according to historical data access operations of the user terminal and status information of each of the data storage nodes, including:

acquiring storage resource use information of each data storage node;

acquiring operation parameter information of each data storage node;

acquiring state information of a physical machine where each data storage node is located;

for each data storage node, calculating a ranking score of the data storage node based on storage resource use information, operation parameter information and state information of a physical machine in which the data storage node is located;

and determining a plurality of first target data storage nodes corresponding to the data storage request from the at least one data storage node based on the ranking score of each data storage node.

7. The distributed storage-based data security management system according to any one of claims 2 to 4, wherein the first target data processing node segments data to be stored corresponding to the data storage request, and generates a plurality of data segments to be stored corresponding to the data to be stored, including:

for each user terminal cluster, determining the data segment lengths of the user terminal cluster in a plurality of scheduling time periods of a current scheduling period based on the data storage requirements and the data reading requirements of the user terminal cluster in the plurality of scheduling time periods of the current scheduling period;

determining a target user terminal cluster corresponding to the data to be stored based on the data storage request;

determining the target data segment length corresponding to the data to be stored based on the data segment lengths of the target user terminal cluster in a plurality of scheduling time segments of the current scheduling period;

8. A distributed storage based data security management system according to any of claims 1-4, wherein the first target data processing node encrypts the plurality of data segments to be stored, comprising:

encrypting the plurality of data segments to be stored based on the related information of the first target data processing node and the related information of the plurality of first target data storage nodes.

9. The distributed storage-based data security management system of claim 8, wherein the first target data processing node encrypts the plurality of data segments to be stored based on the information related to the first target data processing node and the information related to the plurality of first target data storage nodes, comprising:

for each data segment to be stored, encrypting the data segment to be stored based on the segmentation identification of the data segment to be stored, the identification of a first target data storage node for storing the data segment to be stored and the identification of an associated first target data storage node;

and encrypting the segmentation identification of the data segment to be stored on the basis of the identification of the first target data processing node for each data segment to be stored, wherein the first target data storage node for storing the data segment to be stored is used for storing the encrypted data segment to be stored and the corresponding encrypted segmentation identification.

10. A data security management method based on distributed storage, which is applied to the data security management system based on distributed storage as claimed in any one of claims 1 to 9, and comprises the following steps:

scheduling the number of the data processing nodes based on historical data access operations of a plurality of user terminals;

scheduling the number of data storage nodes based on historical data access operations of a plurality of user terminals;

receiving a data storage request initiated by a user terminal, determining a first target data processing node corresponding to the data storage request from at least one data processing node according to a historical data access operation of the user terminal and state information of each data processing node based on the data storage request, determining a plurality of first target data storage nodes corresponding to the data storage request from the at least one data storage node according to the historical data access operation of the user terminal and state information of each data storage node based on the data storage request, wherein the first target data processing node is used for segmenting data to be stored corresponding to the data storage request, generating a plurality of data segments to be stored corresponding to the data to be stored, encrypting the plurality of data segments to be stored, and storing a plurality of encrypted data segments to be stored;

receiving a data reading request initiated by a user terminal, determining a second target data processing node corresponding to the data reading request from the at least one data processing node according to historical data access operation of the user terminal and state information of each data processing node based on the data reading request, determining a plurality of second target data storage nodes corresponding to the data reading request from the at least one data storage node based on storage path information of data to be read corresponding to the data reading request, wherein the plurality of second target data storage nodes are used for reading a plurality of encrypted data segments corresponding to the data reading request and sending the encrypted data segments to the second target data processing node, and the second target data processing node is used for decrypting and splicing the plurality of encrypted data segments corresponding to the data reading request, generating data to be read corresponding to the data reading request and feeding the data back to the storage scheduling node.