CN108900626B - Data storage method, device and system in cloud environment - Google Patents

Info

Publication number
CN108900626B
Authority
CN
China
Prior art keywords
data
data node
ith
load rate
node
Prior art date
Legal status
Active
Application number
CN201810792783.XA
Other languages
Chinese (zh)
Other versions
CN108900626A (en)
Inventor
卢莹
毋涛
贾智宇
王智明
刘畅
Current Assignee
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN201810792783.XA
Publication of CN108900626A
Application granted
Publication of CN108900626B
Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a data storage method in a cloud environment, comprising the following steps: receiving a request, sent by a client, to store data; according to that request, querying the remaining storage space and the first I/O load rate of the i-th data node from the stored remaining storage space and I/O load rate data of all data nodes, where the i-th data node is the node ranked i-th when all data nodes are sorted by remaining storage space in descending order; returning the remaining storage space of the i-th data node to the client, so that the client sends the data block to the i-th data node accordingly; comparing the first I/O load rate with a set threshold; and, if the first I/O load rate is smaller than the set threshold, sending a storage instruction to the i-th data node so that it stores the data block sent by the client. Because both the remaining storage space and the I/O load rate of the data nodes are taken into account when the client stores a file, the performance of the storage system is improved.

Description

Data storage method, device and system in cloud environment
Technical Field
The invention belongs to the field of cloud storage, and particularly relates to a data storage method, device and system in a cloud environment.
Background
In recent years, as the performance of basic hardware such as processors and networks has improved rapidly, the volume of data to be processed has grown quickly, placing new demands on the performance of mass storage systems. In today's storage field, driven especially by the trend toward virtualization, storage technology has undergone what can be called a disruptive change, the most important element of which is the introduction of distributed storage technology.
HDFS (Hadoop Distributed File System) is a distributed storage technology designed for mass data storage. It provides fault tolerance and load balancing through a multi-replica strategy: stored data is divided into blocks, and each block is replicated and stored randomly on several nodes of the cluster. HDFS offers high fault tolerance, high scalability and low cost.
Because HDFS selects storage nodes randomly, high-performance storage devices may handle too few tasks while low-performance devices become overloaded, which degrades the performance of the overall storage system. In addition, data blocks are distributed unevenly across the cluster, and during load balancing HDFS selects data blocks on the data nodes at random, so frequently accessed data blocks can become concentrated on a few data nodes. This can overload those nodes and greatly reduce the performance of the storage system.
Disclosure of Invention
The application provides a data storage method in a cloud environment, which aims to solve the problem of low storage-system performance in existing cloud environments. The application further provides a data storage device in a cloud environment and a data storage system in a cloud environment.
The application provides a data storage method in a cloud environment, which comprises the following steps:
receiving a request, sent by a client, to store data;
querying, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes, where the i-th data node is the data node whose remaining storage space ranks i-th among all data nodes;
returning the remaining storage space of the i-th data node to the client, so that the client sends a data block to the i-th data node according to that remaining storage space;
comparing the first I/O load rate with a set threshold;
and if the first I/O load rate is smaller than the set threshold, sending a storage instruction to the i-th data node so that the i-th data node stores the data block sent by the client.
Optionally, the first I/O load rate $P_i$ of the i-th data node is calculated by the following formula:
$P_i = \omega_1 O_i + \omega_2 R_i + \omega_3 t_i$
where $O_i$ is the normalized actual read-write speed of the i-th data node, $R_i$ is its normalized actual throughput, and $t_i$ is its normalized actual latency; $\omega_1$, $\omega_2$, $\omega_3$ are the weights of read-write speed, throughput and latency respectively, with $\sum_{k=1}^{3}\omega_k = 1$.
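As a minimal sketch of the weighted sum above, assuming the three normalized inputs are already available, the first I/O load rate could be computed in Python as follows. The function name `io_load_rate` and the example weights are hypothetical; the patent does not fix concrete weight values.

```python
def io_load_rate(o: float, r: float, t: float,
                 weights: tuple[float, float, float]) -> float:
    """Weighted I/O load rate P_i = w1*O_i + w2*R_i + w3*t_i.

    o, r, t are the already-normalized read-write speed, throughput and
    latency of one data node; the three weights must sum to 1.
    """
    w1, w2, w3 = weights
    # Enforce the constraint sum(w_k) = 1 with a small floating-point tolerance.
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * o + w2 * r + w3 * t


# Hypothetical inputs and weights, for illustration only.
p = io_load_rate(0.4, 0.2, 0.6, (0.5, 0.3, 0.2))  # about 0.38
```

A larger weight makes the corresponding factor dominate the load rate, which matches the role the weights play in the text.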
Optionally, the normalized actual read-write speed $O_i$ of the i-th data node is calculated by the following formula:
[Formula image BDA0001735301430000021]
where
[Formula image BDA0001735301430000022]
and $I_i$ is the actual read-write speed of the i-th data node;
the normalized actual throughput $R_i$ of the i-th data node is calculated by the following formula:
[Formula image BDA0001735301430000023]
where
[Formula image BDA0001735301430000024]
and $H_i$ is the actual throughput of the i-th data node;
the normalized actual latency $t_i$ of the i-th data node is calculated by the following formula:
[Formula image BDA0001735301430000031]
where
[Formula image BDA0001735301430000032]
and $T_i$ is the actual latency of the i-th data node.
Optionally, the set threshold Q is calculated by the following formula:
$Q = \frac{1}{n}\sum_{j=1}^{n} P_j$
Optionally, after the step of comparing the first I/O load rate with the set threshold, the method further includes:
if the first I/O load rate is greater than or equal to the set threshold, adding 1 to i and returning to the step of querying, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes.
Optionally, after the step of sending a storage instruction to the i-th data node when the first I/O load rate is smaller than the set threshold, the method further includes:
sending a request for a second I/O load rate to the i-th data node;
receiving the second I/O load rate, calculated by the i-th data node after it stores the data block;
comparing the second I/O load rate with the set threshold;
if the second I/O load rate is smaller than the set threshold, sending a request to the i-th data node for its remaining storage space and I/O load rate data;
receiving the remaining storage space and I/O load rate data returned by the i-th data node;
and updating the remaining storage space and I/O load rate data of the i-th data node, then returning to the step of receiving a request, sent by a client, to store data.
Optionally, after the step of comparing the second I/O load rate with the set threshold, the method further includes:
if the second I/O load rate is greater than or equal to the set threshold, sending a call-out instruction to the i-th data node so that the i-th data node moves the data block out;
and adding 1 to i and returning to the step of querying, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes.
The present application further provides a data storage device in a cloud environment, including:
the receiving module is used for receiving a request for applying for storing data sent by a client;
the query module is configured to query, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes, where the i-th data node is the data node whose remaining storage space ranks i-th among all data nodes;
a sending module, configured to return the remaining storage space of the ith data node to the client, so that the client sends a data block to the ith data node according to the remaining storage space of the ith data node;
the first comparison module is used for comparing whether the first I/O load rate is smaller than a set threshold value or not;
and the storage module is used for sending a storage instruction to the ith data node for the ith data node to store the data block sent by the client if the first I/O load rate is smaller than the set threshold.
Optionally, the query module is further configured to, if the first I/O load rate is greater than or equal to the set threshold, add 1 to i and query again, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes.
Optionally, the apparatus further includes: the second comparison module and the updating module;
the second comparing module is used for comparing whether the second I/O load rate is smaller than the set threshold value;
the updating module is used for updating the data of the residual storage space and the I/O load rate of the ith data node and returning to the step of receiving the request for applying for storing the data sent by the client;
the sending module is further configured to send a request for obtaining a second I/O load rate to the ith data node; if the second I/O load rate is smaller than the set threshold, sending a request for acquiring the data of the residual storage space and the I/O load rate of the ith data node to the ith data node;
the receiving module is further configured to receive the second I/O load rate after the data block is stored by the ith data node, where the second I/O load rate is calculated by the ith data node; and receiving the data of the residual storage space and the I/O load rate of the ith data node returned by the ith data node.
Optionally, the apparatus further includes a call-out module;
the call-out module is configured to send a call-out instruction to the i-th data node if the second I/O load rate is greater than or equal to the set threshold, so that the i-th data node moves the data block out;
and the query module is further configured to add 1 to i and query again, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes.
The present application further provides a data storage system in a cloud environment, including: the metadata server of claims 8-11, a client, and data nodes;
the client is used for sending a request for applying for storing data to the metadata server; receiving the residual storage space of the ith data node sent by the metadata server; sending a data block to the ith data node according to the residual storage space of the ith data node;
the data node is used for receiving the data block sent by the client; and receiving a storage instruction sent by the metadata server to store the data block sent by the client.
The embodiments of the application improve existing distributed storage technology in the cloud environment so that data nodes are selected in a targeted way and tasks are distributed evenly across storage devices. This avoids high-performance storage devices handling too few tasks while low-performance devices are overloaded; data nodes with more remaining storage space are selected for storage, which improves the performance of the whole storage system. At the same time, the load rate of each data node is controlled so that no data node's load rate exceeds the set threshold. This prevents excessively high I/O load rates on individual data nodes, reduces file access time, and improves the operating efficiency of the storage system.
Drawings
Fig. 1 is a flowchart of a data storage method in a cloud environment according to a first embodiment of the present application;
Fig. 2 is a flowchart of a data storage method in a cloud environment according to a second embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data storage device in a cloud environment according to a third embodiment of the present application;
Fig. 4 is a schematic structural diagram of a data storage device in a cloud environment according to a fourth embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data storage system in a cloud environment according to a fifth embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The data storage method, device and system in the cloud environment are implemented through the cooperation of a client, a metadata server and data nodes. They are described in detail below with reference to the drawings of the respective embodiments.
A data storage method in a cloud environment provided in a first embodiment of the present application is as follows:
The method of this embodiment is executed by a metadata server. Fig. 1 shows a flowchart of the data storage method in a cloud environment provided by this embodiment, which includes the following steps.
Step S101, receiving a request for applying for storing data sent by a client.
Step S102, querying, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes, where the i-th data node is the data node whose remaining storage space ranks i-th among all data nodes.
Step S103, returning the residual storage space of the ith data node to the client, so that the client can send a data block to the ith data node according to the residual storage space of the ith data node.
Step S104, comparing whether the first I/O load rate is smaller than a set threshold value, if so, executing step S105; if not, go to step S106.
Step S105, if the first I/O load rate is smaller than the set threshold, sending a storage instruction to the ith data node for the ith data node to store the data block sent by the client, and ending the process.
In step S106, 1 is added to i, and the process proceeds to step S102.
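The loop in steps S101 to S106 can be sketched in Python roughly as follows, assuming the metadata server already holds each node's first I/O load rate, listed in descending order of remaining storage space. The function name `select_data_node` is hypothetical, step S103 (returning the remaining space to the client) is omitted, and returning `None` when no node qualifies is an assumption, because the patent does not specify that case.

```python
from typing import Optional, Sequence


def select_data_node(load_rates: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based rank i of the data node chosen for the block.

    load_rates[k] is the first I/O load rate of the node ranked k + 1 when
    all nodes are sorted by remaining storage space, largest first.
    """
    i = 1                                   # start with the node that has the most free space
    while i <= len(load_rates):             # S102: query node i
        if load_rates[i - 1] < threshold:   # S104: is P_i < Q?
            return i                        # S105: send the storage instruction to node i
        i += 1                              # S106: i = i + 1, then back to S102
    return None                             # assumption: every node is at or above Q


# Hypothetical load rates for three nodes, threshold 0.5.
chosen = select_data_node([0.9, 0.7, 0.3], 0.5)  # node ranked 3rd
```

Scanning in order of remaining space means free space is the primary criterion and the load rate acts as a filter, which is the priority the steps above describe.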
In today's storage field, driven especially by the trend toward virtualization, storage technology has undergone what can be called a disruptive change, the most important element of which is the introduction of distributed storage technology. Cloud storage is a concept that extends and develops the concept of cloud computing. It is an emerging network storage technology: a system that uses application software, through functions such as cluster applications, network technology or distributed file systems, to make large numbers of storage devices of various types in a network work together and to provide data storage and service access externally. When the core of a cloud computing system's operation and processing is the storage and management of large amounts of data, the system must be equipped with many storage devices, and it then becomes a cloud storage system; cloud storage is therefore a cloud computing system whose core is data storage and management. In short, cloud storage is an emerging solution that places storage resources in the cloud for people to access. Users can conveniently access their data anytime and anywhere from any Internet-connected device.
A data storage method in a cloud environment provided in a second embodiment of the present application is as follows:
The method of this embodiment is executed by a metadata server. Fig. 2 shows a flowchart of the data storage method in a cloud environment provided by this embodiment, which includes the following steps.
Step S201, receiving a request for applying for storing data sent by a client.
Referring to Fig. 5, the data storage system (distributed storage architecture) in a cloud environment consists of three parts: a client 2, a metadata server 1 and data nodes 3 (data servers). The client 2 sends read-write requests and caches file metadata and file data. The metadata server 1 collects and stores load information, regulates the overall load, and processes client requests; it is the core component of the whole system. The cloud storage system has only one server of this kind, the metadata server, and can store data of any type. The data node 3 stores file data and automatically manages and maintains its own load information: it mainly monitors its own remaining storage space and I/O load rate and feeds this information back to the metadata server in real time.
In this step, the metadata server 1 receives the client's request to store data and then queries the corresponding data according to that request.
Step S202, querying, according to the request to store data, the remaining storage space and the first I/O load rate of the i-th data node from the remaining storage space and I/O load rate data of all data nodes, where the i-th data node is the data node whose remaining storage space ranks i-th among all data nodes.
The metadata server 1 comprises a storage space collection module, an I/O information collection module and a load information management module. The data node 3 comprises a storage space monitoring module and an I/O load monitoring module. Storage space information, recording the remaining storage space and storage space utilization rate of all data nodes, is pre-stored in the storage space collection module of the metadata server 1, with the data nodes sorted by remaining storage space from largest to smallest. When the storage space monitoring module in a data node 3 detects that the node's remaining storage space has changed, the storage space collection module obtains the node's latest storage space information and updates its remaining storage space and storage space utilization rate.
According to the request to store data, the remaining storage space of the i-th data node is queried from the storage space information in the storage space collection module. Because the data nodes in the storage space information are sorted by remaining storage space from largest to smallest, the i-th data node is the one whose remaining storage space ranks i-th among all data nodes; for example, i = 1 denotes the data node with the largest remaining storage space. Correspondingly, the (i+1)-th data node is the one whose remaining storage space ranks (i+1)-th; for example, with i = 1, i+1 = 2 denotes the data node with the second-largest remaining storage space.
I/O load information, i.e. the I/O load rates of all data nodes, is pre-stored in the I/O information collection module of the metadata server 1. The I/O information collection module periodically sends I/O information requests to the data nodes 3; the I/O load monitoring module in each data node 3 monitors that node's current I/O load rate and feeds the latest value back to the metadata server 1, which then updates the node's I/O load information.
According to the request to store data, the first I/O load rate of the i-th data node is queried from the I/O load information in the I/O information collection module. The first I/O load rate is the I/O load rate before the i-th data node receives the data block; correspondingly, the second I/O load rate is the I/O load rate after the i-th data node receives the data block.
This step queries the remaining storage space and first I/O load rate of the i-th data node from the storage space information and I/O load information pre-stored in the metadata server, according to the request to store data, for use in the subsequent decision on where to store the data.
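The metadata server's bookkeeping described above can be sketched as a small in-memory table, assuming each data node is keyed by a hypothetical string identifier. The class and method names (`NodeInfo`, `MetadataTable`, `update_space`, `update_io`, `ith_node`) are illustrative only, and the storage space utilization rate is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    remaining_space: int   # free bytes, as reported by the storage space monitoring module
    io_load_rate: float    # latest I/O load rate, as reported by the I/O load monitoring module


@dataclass
class MetadataTable:
    nodes: dict[str, NodeInfo] = field(default_factory=dict)

    def update_space(self, node_id: str, remaining_space: int) -> None:
        # Storage space collection module: refresh after a node reports a change.
        self.nodes[node_id].remaining_space = remaining_space

    def update_io(self, node_id: str, io_load_rate: float) -> None:
        # I/O information collection module: refresh after a periodic poll.
        self.nodes[node_id].io_load_rate = io_load_rate

    def ith_node(self, i: int) -> tuple[str, NodeInfo]:
        # Rank nodes by remaining space, largest first; i is 1-based.
        ranked = sorted(self.nodes.items(),
                        key=lambda item: item[1].remaining_space, reverse=True)
        return ranked[i - 1]


table = MetadataTable({"dn1": NodeInfo(100, 0.2), "dn2": NodeInfo(300, 0.6)})
node_id, info = table.ith_node(1)  # "dn2" has the most free space
```

This sketch recomputes the ranking on every lookup, which keeps updates trivial; a real metadata server might instead maintain the sorted order incrementally.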
Preferably, the first I/O load rate $P_i$ of the i-th data node is calculated by the following formula:
$P_i = \omega_1 O_i + \omega_2 R_i + \omega_3 t_i$
where $O_i$ is the normalized actual read-write speed of the i-th data node, $R_i$ is its normalized actual throughput, and $t_i$ is its normalized actual latency; $\omega_1$, $\omega_2$, $\omega_3$ are the weights of read-write speed, throughput and latency respectively, with $\sum_{k=1}^{3}\omega_k = 1$.
Preferably, the normalized actual read-write speed $O_i$ of the i-th data node is calculated by the following formula:
[Formula image BDA0001735301430000101]
where
$\bar{I} = \frac{1}{n}\sum_{j=1}^{n} I_j$ is the arithmetic mean of the actual read-write speeds of the n data nodes;
[Formula image BDA0001735301430000103]
s is the variance of the read-write speed of the i-th data node, and $I_i$ is the actual read-write speed of the i-th data node;
the normalized actual throughput $R_i$ of the i-th data node is calculated by the following formula:
[Formula image BDA0001735301430000104]
where
$\bar{H} = \frac{1}{n}\sum_{j=1}^{n} H_j$ is the arithmetic mean of the actual throughputs of the n data nodes;
[Formula image BDA0001735301430000106]
s' is the variance of the throughput of the i-th data node, and $H_i$ is the actual throughput of the i-th data node;
the normalized actual latency $t_i$ of the i-th data node is calculated by the following formula:
[Formula image BDA0001735301430000107]
where
$\bar{T} = \frac{1}{n}\sum_{j=1}^{n} T_j$ is the arithmetic mean of the actual latencies of the n data nodes;
[Formula image BDA0001735301430000109]
s'' is the variance of the latency of the i-th data node, and $T_i$ is the actual latency of the i-th data node.
The main factors influencing a data node's I/O load rate are the read-write speed per second, the throughput, and the latency of data read-write operations. Each factor is standardized so that all data nodes are measured on a common scale, giving the normalized read-write speed $O_i$, throughput $R_i$ and latency $t_i$ of the i-th data node, whose values are all less than 1, which simplifies calculation. $\omega_1$, $\omega_2$, $\omega_3$ are the weights of read-write speed, throughput and latency respectively; the larger a weight, the greater the influence of that factor on the I/O load rate of the i-th data node, and $\sum_{k=1}^{3}\omega_k = 1$.
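The exact standardization formulas survive only as images, so the following is a sketch under an explicit assumption: each metric is standardized as (value minus mean) divided by s, with s taken as the population standard deviation across the n data nodes, i.e. the usual z-score form. Z-scores can exceed 1 in magnitude, so this assumed form does not by itself guarantee the "less than 1" property stated above; the patent's actual formula may differ. The function name `standardize` is hypothetical.

```python
import statistics


def standardize(values: list[float]) -> list[float]:
    """Standardize one metric across the n data nodes.

    ASSUMPTION: (x - mean) / s, with s the population standard deviation.
    The patent gives its exact formula only as an image.
    """
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)
    if s == 0:                      # all nodes identical: nothing to spread out
        return [0.0 for _ in values]
    return [(x - mean) / s for x in values]


# Hypothetical raw read-write speeds I_i of three nodes.
o_values = standardize([1.0, 2.0, 3.0])
```

Applied separately to the raw read-write speeds $I_i$, throughputs $H_i$ and latencies $T_i$, this yields candidate values for $O_i$, $R_i$ and $t_i$ that could be fed into the weighted sum for $P_i$.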
Preferably, the set threshold Q is calculated by the following formula:
$Q = \frac{1}{n}\sum_{j=1}^{n} P_j$
A threshold Q of the I/O load rate is set in advance, equal to the arithmetic mean of the I/O load rates of the n data nodes. Comparing $P_i$ with Q determines whether the data block is stored on the i-th data node.
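A minimal sketch of the threshold rule, assuming the load rates $P_j$ of all n data nodes are already known. The function name `load_threshold` is hypothetical. Note the strict comparison: a node whose load rate equals Q does not qualify, matching the "smaller than the set threshold" condition in the text.

```python
def load_threshold(load_rates: list[float]) -> float:
    """Set threshold Q: arithmetic mean of the I/O load rates of all n data nodes."""
    if not load_rates:
        raise ValueError("need at least one data node")
    return sum(load_rates) / len(load_rates)


rates = [0.25, 0.5, 0.75]            # hypothetical P_j values
q = load_threshold(rates)            # 0.5
accepts = [p < q for p in rates]     # [True, False, False]
```

Because Q is the mean, at least one node is always at or below it, but with strict comparison every node can fail when all load rates are equal.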
Step S203, returning the remaining storage space of the ith data node to the client, so that the client sends the data block to the ith data node according to the remaining storage space of the ith data node.
After querying the remaining storage space of the i-th data node, the metadata server sends it to the client, and the client sends the data block to the i-th data node accordingly. For example, the client sends the data block to the 1st data node, i.e. the data node with the largest remaining storage space.
Step S204, comparing whether the first I/O load rate is smaller than a set threshold value, if so, executing step S205; if not, step S213 is executed.
Step S205, sending a storage instruction to the ith data node, so that the ith data node stores the data block sent by the client.
While the client sends the data block to the i-th data node, the metadata server compares the queried first I/O load rate with the set threshold. If $P_i < Q$, the I/O load rate of the i-th data node has not reached the set threshold and is relatively low, so the node can store a new data block; a storage instruction is sent to the i-th data node so that it stores the data block sent by the client.
If $P_i \ge Q$, the data block cannot be placed on this data node and is instead stored on the next data node, (i+1), in the remaining storage space information, i.e. the data node whose remaining storage space ranks immediately after that of the i-th data node, continuing until some data node satisfies $P_i < Q$. Concretely, i is increased by 1, and the remaining storage space and first I/O load rate of the (i+1)-th data node are queried from the remaining storage space and I/O load rate data of all data nodes according to the request to store data. The subsequent steps are then executed for the (i+1)-th data node in the same way as for the i-th data node, until a data node satisfying $P_i < Q$ is found.
Step S206, sending a request for obtaining a second I/O load rate to the ith data node.
Step S207, receiving the second I/O load rate, calculated by the i-th data node after it stores the data block.
After the i-th data node stores a new data block, its I/O load rate changes. To balance the I/O load across data nodes, the I/O load rate of the data node after it stores the data block must be measured; this load rate of the i-th data node is denoted P'. The I/O information collection module of the metadata server 1 sends a request for the second I/O load rate to the i-th data node; the I/O load monitoring module of the i-th data node monitors its current I/O load rate and feeds the latest second I/O load rate back to the metadata server 1.
Step S208, comparing whether the second I/O load rate is smaller than the set threshold value, and if so, executing step S209; if not, go to step S212.
Step S209, sending a request for the remaining storage space and I/O load rate data of the ith data node to the ith data node;
Step S210, receiving the remaining storage space and I/O load rate data returned by the ith data node;
Step S211, updating the remaining storage space and I/O load rate data of the ith data node, continuing to execute step S102, and ending the process.
If P_i < Q, the data block is placed on the ith data node, and the I/O load rate of the ith data node at that moment is measured and denoted P'.
If P' < Q, the data block may remain stored on the ith data node. The I/O information collection module of the metadata server 1 sends a request for the I/O load rate to the ith data node; the I/O load monitoring module of the ith data node measures the node's current I/O load rate and returns the latest value to the metadata server 1.
The storage space collection module of the metadata server 1 likewise sends a request for the remaining storage space to the ith data node; when the storage space monitoring module of the ith data node detects that the node's remaining storage space has changed, it returns the latest remaining storage space to the metadata server 1. The metadata server 1 then updates the remaining storage space and I/O load rate recorded for the ith data node, ends the process, and returns to the step of receiving a request to store data from the client, so that the next data block can be stored.
Step S212, sending a call-out instruction to the ith data node so that the ith data node calls the data block out.
In step S213, 1 is added to i, and the process proceeds to step S202.
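Steps S202 to S213 can be summarized in a short Python sketch. It is a hypothetical, single-process model, not the patent's implementation: `Block`, `DataNode`, `place_block`, and the additive load model (storing a block raises the node's I/O load rate by the block's `load`) are assumptions made for illustration. The node objects double as the metadata server's table, so the updates of steps S209 to S211 happen implicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Block:
    block_id: str
    size: int
    load: float  # hypothetical: I/O load the block adds to whichever node holds it


@dataclass
class DataNode:
    node_id: str
    remaining_space: int
    io_load: float  # current I/O load rate as last reported to the metadata server
    blocks: Dict[str, Block] = field(default_factory=dict)

    def store(self, block: Block) -> float:
        """Store the block and report the second I/O load rate P' (S205 to S207)."""
        self.blocks[block.block_id] = block
        self.remaining_space -= block.size
        self.io_load += block.load  # toy additive load model, an assumption
        return self.io_load

    def call_out(self, block: Block) -> None:
        """Call the block back out after an overload (S212)."""
        del self.blocks[block.block_id]
        self.remaining_space += block.size
        self.io_load -= block.load


def place_block(nodes: List[DataNode], block: Block, threshold: float) -> Optional[DataNode]:
    """Return the node that keeps the block, or None if no node qualifies."""
    # The ith node is the one ranked ith by remaining storage space.
    ranked = sorted(nodes, key=lambda n: n.remaining_space, reverse=True)
    for node in ranked:                   # each new iteration is S213's i += 1
        if node.remaining_space < block.size:
            break                         # later nodes have even less space
        if node.io_load >= threshold:     # first check fails: P_i >= Q
            continue
        second_load = node.store(block)   # S205 to S207
        if second_load < threshold:       # S208: P' < Q, keep it (S209 to S211)
            return node
        node.call_out(block)              # S212: P' >= Q, undo and try i + 1
    return None
```

With a threshold of 0.8 and nodes `DataNode("dn1", 1000, 0.5)` and `DataNode("dn2", 800, 0.2)`, a block with `load=0.4` passes the first check on `dn1` (0.5 < 0.8), but `dn1`'s load then rises to 0.9, so the block is called out and `dn2` (0.2 rising to about 0.6) keeps it.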
As traffic grows and access volume and data flow increase rapidly, the processing capacity and computing intensity demanded of the core parts of an existing network grow accordingly, until a single server device can no longer bear the load. Discarding existing equipment for a large-scale hardware upgrade wastes existing resources, each later increase in service volume would again require a costly upgrade, and even a high-performance device may not keep up with the growth in service volume.
Load balancing (Load Balance) is a cheap, effective, and transparent approach developed for this situation. It extends the bandwidth of existing network devices and servers, increases throughput, strengthens network data processing capability, and improves the flexibility and availability of the network.
Load balancing, also known as load sharing, dynamically adjusts the load distribution in a system to eliminate, or reduce as far as possible, load imbalance among its nodes. In practice, tasks on overloaded nodes are transferred to lightly loaded nodes so that the load on every node is balanced as far as possible, which improves system throughput. Load sharing also facilitates the overall management of resources in a distributed system and makes it easier to expand the system's processing capacity through shared information and the associated service mechanisms.
In this embodiment, if P_i < Q, the data block is placed on the ith data node, and the I/O load rate of the ith data node at that moment is measured and denoted P'.
If P' ≥ Q, the I/O load of the ith data node is too heavy after the data block is added, which would degrade overall performance. A call-out instruction is therefore sent to the ith data node so that it calls the data block out, and the block is stored on the next data node (i+1) in the remaining storage space information, that is, the node ranked immediately after the ith data node by remaining storage space, and so on until some data node satisfies P' < Q. This achieves load balancing.
Specifically, i is incremented by 1, and the remaining storage space and first I/O load rate of the (i+1)th data node are queried, according to the request to store data, from the remaining storage space and I/O load rate data of all data nodes. The subsequent steps are executed for the (i+1)th data node exactly as for the ith data node, until a data node satisfying P' < Q is found. This ensures that every data node storing a data block has an I/O load rate below the set threshold, prevents any data node's I/O load rate from becoming excessive, and improves the load balance of the system.
After the client data has been stored, the I/O load rate of each data node changes as the frequency of client data access changes. Dynamic load balancing can then be performed with the same method to balance the I/O access volume across data nodes. If the I/O load rate of a data node is excessively high over some period of time, one or more data blocks with high access volume are called out of that node to start the load balancing process. Each called-out data block is placed on the (i+1)th data node in the remaining storage space information table in the metadata memory, and both the node's first I/O load rate P_i and the second I/O load rate P' after the block is placed are compared with the threshold Q. If both are smaller than Q, the data block may be placed on that data node; this continues until the I/O load rate is below Q, at which point the load balancing process ends.
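The dynamic rebalancing described above can be sketched as follows. This is a hedged illustration, not the patent's code: it assumes per-block I/O loads are known and additive (`Block.load` and `DataNode.base_load` are hypothetical fields), and it predicts P' as the target's current load plus the block's load instead of measuring it after the move, as the text does.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    block_id: str
    size: int
    load: float  # hypothetical: I/O load caused by accesses to this block


@dataclass
class DataNode:
    node_id: str
    remaining_space: int
    blocks: List[Block] = field(default_factory=list)
    base_load: float = 0.0  # hypothetical load not tied to any movable block

    @property
    def io_load(self) -> float:
        return self.base_load + sum(b.load for b in self.blocks)


def rebalance(nodes: List[DataNode], threshold: float) -> List[Tuple[str, str, str]]:
    """Move hot blocks off overloaded nodes; return (block, from, to) moves."""
    moves = []
    for src in nodes:
        # Hottest blocks first, matching "data blocks with high access volume".
        for block in sorted(src.blocks, key=lambda b: b.load, reverse=True):
            if src.io_load < threshold:
                break                                    # source is now balanced
            targets = sorted((n for n in nodes if n is not src),
                             key=lambda n: n.remaining_space, reverse=True)
            for dst in targets:
                if dst.remaining_space < block.size:
                    break                                # no later target fits either
                first_ok = dst.io_load < threshold                # P_i < Q
                second_ok = dst.io_load + block.load < threshold  # predicted P' < Q
                if first_ok and second_ok:
                    src.blocks.remove(block)
                    src.remaining_space += block.size
                    dst.blocks.append(block)
                    dst.remaining_space -= block.size
                    moves.append((block.block_id, src.node_id, dst.node_id))
                    break
    return moves
```

Keeping `io_load` as a derived property means a node's load is always recomputed from the blocks it currently holds, so moves cannot leave a stale load figure behind.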
This embodiment improves on existing distributed storage technology in cloud environments by performing load balancing at the initial stage of client data storage and by considering both the storage space and the I/O load of the data nodes. Data nodes with more remaining storage space are tried first; the method then checks whether the current data node's I/O load rate is smaller than the set threshold, and if so, the user's data block is stored on that node. After the client has finished storing data, access rates to the stored files change and some data nodes may develop excessively high I/O load rates; some of the data blocks on those overloaded nodes are then dynamically scheduled to lightly loaded data nodes, which preserves the overall performance of the storage system.
Data node selection is therefore targeted: preferring data nodes with more remaining space balances task allocation across storage devices and improves the performance of the whole storage system. Data block selection is targeted as well: when a data node's I/O load rate is too high, some of its high-load data blocks are dynamically scheduled to data nodes with low I/O load rates, which improves system load balance, reduces file access time, and raises the operating efficiency of the system.
It should be noted that the order of the steps in the embodiments of the present application may be changed, and some steps may be omitted, as needed; all such variations fall within the scope of the present application. The second embodiment is only an example and has a narrower protection scope than the first embodiment.
A data storage device in a cloud environment provided by a third embodiment of the present application is as follows:
in the foregoing first embodiment, a data storage method in a cloud environment is provided, and correspondingly, the present application further provides a data storage device in a cloud environment.
Since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, refer to the corresponding description of the data storage method in the cloud environment provided above. The device embodiments described below are merely illustrative and are described with reference to the accompanying drawings.
Fig. 3 is a schematic structural diagram of a data storage device in a cloud environment according to an embodiment of the present application, and includes the following modules.
A receiving module 11, configured to receive a request for applying for storing data sent by a client;
the query module 12 is configured to query, according to the request to store data, the remaining storage space and the first I/O load rate of the ith data node from the remaining storage space and I/O load rate data of all data nodes, where the ith data node is the data node ranked ith among all data nodes by remaining storage space;
a sending module 13, configured to return the remaining storage space of the ith data node to the client, so that the client sends a data block to the ith data node according to the remaining storage space of the ith data node;
a first comparing module 14, configured to compare whether the first I/O load rate is smaller than a set threshold;
and the storage module 15 is configured to send a storage instruction to the ith data node to enable the ith data node to store the data block sent by the client if the first I/O load rate is smaller than the set threshold.
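The five modules of the third embodiment can be pictured as methods of one class. This is a hypothetical sketch: the class and method names, the in-memory table, and the `send_store_instruction` callback standing in for the network are assumptions. The retry loop in `handle` corresponds to the fourth embodiment's behavior when the first I/O load rate is not below the threshold.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MetadataServerDevice:
    """Modules 11 to 15 of the device as methods; all names are hypothetical."""

    def __init__(self, table: Dict[str, Tuple[int, float]], threshold: float,
                 send_store_instruction: Callable[[str], None]) -> None:
        self.table = table                    # node_id -> (remaining space, first I/O load rate)
        self.threshold = threshold            # set threshold Q
        self.send_store_instruction = send_store_instruction  # transport stub
        self.replies: List[Tuple[str, int]] = []  # what was returned to the client

    def receive(self, request: str) -> str:                        # receiving module 11
        return request

    def query(self, i: int) -> Optional[Tuple[str, int, float]]:   # query module 12
        ranked = sorted(self.table, key=lambda n: self.table[n][0], reverse=True)
        if i >= len(ranked):
            return None
        space, load = self.table[ranked[i]]
        return ranked[i], space, load

    def send(self, node_id: str, space: int) -> None:              # sending module 13
        self.replies.append((node_id, space))

    def compare(self, load: float) -> bool:                        # first comparing module 14
        return load < self.threshold

    def store(self, node_id: str) -> None:                         # storage module 15
        self.send_store_instruction(node_id)

    def handle(self, request: str) -> Optional[str]:
        self.receive(request)
        i = 0  # zero-based here; the text counts from 1
        while (row := self.query(i)) is not None:
            node_id, space, load = row
            self.send(node_id, space)
            if self.compare(load):
                self.store(node_id)
                return node_id
            i += 1  # retry with the next node, as in the fourth embodiment
        return None
```

For example, with the table `{"dn1": (500, 0.9), "dn2": (300, 0.3)}` and Q = 0.8, `dn1` is offered first but fails the comparison, so the storage instruction goes to `dn2`.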
A data storage device in a cloud environment provided by a fourth embodiment of the present application is as follows:
optionally, Fig. 4 is a schematic structural diagram of the data storage device in the cloud environment provided in this embodiment of the present application, which builds on the third embodiment.
The query module 12 is further configured to, if the first I/O load rate is greater than or equal to the set threshold, add 1 to i and run again to query, according to the request to store data, the remaining storage space and the first I/O load rate of the ith data node from the remaining storage spaces and I/O load rates of all data nodes.
Optionally, as shown in fig. 4, the data storage device in the cloud environment further includes: a second comparison module 16 and an update module 17;
the second comparing module 16 is configured to compare whether the second I/O load rate is smaller than the set threshold;
the updating module 17 is configured to update the data of the remaining storage space and the I/O load rate of the ith data node, and return to the step of receiving the request for applying for storing data sent by the client;
the sending module 13 is further configured to send a request for obtaining a second I/O load rate to the ith data node; if the second I/O load rate is smaller than the set threshold, sending a request for acquiring the data of the residual storage space and the I/O load rate of the ith data node to the ith data node;
the receiving module 11 is further configured to receive the second I/O load rate calculated by the ith data node after the data block is stored in the ith data node; and receiving the data of the residual storage space and the I/O load rate of the ith data node returned by the ith data node.
Optionally, as shown in fig. 4, the data storage device in the cloud environment further includes: a call-out module 18;
the call-out module 18 is configured to send a call-out instruction to the ith data node, so that the ith data node calls the data block out, if the second I/O load rate is greater than or equal to the set threshold;
the query module 12 is further configured to add 1 to i and run again to query, according to the request to store data, the remaining storage space and the first I/O load rate of the ith data node from the remaining storage spaces and I/O load rates of all data nodes.
It should be noted that the modules in the embodiments of the present application may be reordered or omitted as needed; all such variations fall within the scope of the present application. The fourth embodiment is only an example and has a narrower protection scope than the third embodiment.
A data storage system in a cloud environment provided in a fifth embodiment of the present application is as follows:
in the foregoing embodiment, a data storage method and apparatus in a cloud environment are provided, and correspondingly, the present application also provides a data storage system in a cloud environment, which is described below with reference to the accompanying drawings.
Fig. 5 is a schematic structural diagram of a data storage system in a cloud environment provided in an embodiment of the present application. The system includes a metadata server 1, a client 2, and a data node 3.
The metadata server 1 may adopt the data storage apparatus in the cloud environment provided in the third embodiment or the fourth embodiment, and a description thereof will not be repeated here.
The client 2 is configured to send a request to store data to the metadata server; receive the remaining storage space of the ith data node sent by the metadata server; and send a data block to the ith data node according to the remaining storage space of the ith data node;
the data node 3 is configured to receive a data block sent by the client; and receiving a storage instruction sent by the metadata server to store the data block sent by the client.
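The interaction among the three parts of the system can be modeled as plain objects calling each other. In this hypothetical sketch the network is replaced by direct method calls and, for brevity, the metadata server applies the first-stage threshold check before replying to the client rather than concurrently with the upload; the second-stage check and call-out are omitted. All class and method names are illustrative.

```python
from typing import Dict, Optional, Tuple


class DataNode:
    """Data node 3: holds blocks from the client until told to store them."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.pending: Dict[str, bytes] = {}
        self.stored: Dict[str, bytes] = {}

    def receive_block(self, block_id: str, data: bytes) -> None:
        self.pending[block_id] = data

    def on_store_instruction(self, block_id: str) -> None:
        self.stored[block_id] = self.pending.pop(block_id)


class MetadataServer:
    """Metadata server 1, reduced to the first-stage check (simplified)."""

    def __init__(self, nodes: Dict[str, DataNode],
                 table: Dict[str, Tuple[int, float]], threshold: float) -> None:
        self.nodes = nodes
        self.table = table          # node_id -> (remaining space, I/O load rate)
        self.threshold = threshold  # set threshold Q

    def apply_for_storage(self) -> Optional[str]:
        # Largest remaining space first; skip nodes whose load rate is not below Q.
        ranked = sorted(self.table, key=lambda n: self.table[n][0], reverse=True)
        return next((n for n in ranked if self.table[n][1] < self.threshold), None)

    def send_store_instruction(self, node_id: str, block_id: str) -> None:
        self.nodes[node_id].on_store_instruction(block_id)


class Client:
    """Client 2: asks the metadata server where to write, then sends the block."""

    def __init__(self, nodes: Dict[str, DataNode]) -> None:
        self.nodes = nodes

    def upload(self, server: MetadataServer, block_id: str, data: bytes) -> Optional[str]:
        node_id = server.apply_for_storage()
        if node_id is None:
            return None
        self.nodes[node_id].receive_block(block_id, data)
        server.send_store_instruction(node_id, block_id)
        return node_id
```

Separating `pending` from `stored` on the data node mirrors the system description: the node receives the block from the client but stores it only on the metadata server's instruction.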
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (8)

1. A data storage method in a cloud environment is applied to a metadata server and comprises the following steps:
receiving a request for applying for storing data sent by a client;
querying, according to the request for applying for storing data, the residual storage space and the first I/O load rate of the ith data node from the data of the residual storage spaces and I/O load rates of all the data nodes, wherein the ith data node is the data node ranked ith among all the data nodes by residual storage space;
returning the residual storage space of the ith data node to the client, so that the client can send a data block to the ith data node according to the residual storage space of the ith data node;
comparing whether the first I/O load rate is smaller than a set threshold value;
if the first I/O load rate is smaller than the set threshold, sending a storage instruction to the ith data node for the ith data node to store the data block sent by the client;
if the first I/O load rate is smaller than the set threshold, after the step of sending a storage instruction to the ith data node for the ith data node to store the data block sent by the client, the method further includes:
sending a request for acquiring a second I/O load rate to the ith data node;
receiving the second I/O load rate after the data block is stored by the ith data node, wherein the second I/O load rate is calculated by the ith data node;
comparing whether the second I/O load rate is smaller than the set threshold value;
if the second I/O load rate is smaller than the set threshold, sending a request for acquiring the data of the residual storage space and the I/O load rate of the ith data node to the ith data node;
receiving data of the residual storage space and the I/O load rate of the ith data node returned by the ith data node;
updating the data of the residual storage space and the I/O load rate of the ith data node, and returning to the step of receiving the request for applying for storing the data sent by the client;
after the step of comparing whether the second I/O load rate is smaller than the set threshold, the method further includes:
if the second I/O load rate is larger than or equal to the set threshold, sending a call-out instruction to the ith data node to allow the ith data node to call out the data block;
and adding 1 to i, and continuing to execute the step of querying the residual storage space and the first I/O load rate of the ith data node from the residual storage spaces and I/O load rates of all the data nodes according to the request for applying for storing data.
2. The method for storing data in the cloud environment according to claim 1, wherein the first I/O load rate $P_i$ of the ith data node is calculated by the following formula:
$$P_i = \omega_1 O_i + \omega_2 R_i + \omega_3 t_i$$
wherein $O_i$ represents the normalized actual read-write speed of the ith data node; $R_i$ represents the normalized actual throughput of the ith data node; $t_i$ represents the normalized actual waiting time of the ith data node; and $\omega_1$, $\omega_2$, $\omega_3$ are the weights of read-write speed, throughput, and waiting time, respectively, with $\omega_1 + \omega_2 + \omega_3 = 1$.
3. The method for storing data in the cloud environment according to claim 2, wherein the normalized actual read-write speed $O_i$ of the ith data node is calculated by the following formula:
[formula image not reproduced]
wherein
[formula image not reproduced]
$I_i$ is the actual read-write speed of the ith data node;
the normalized actual throughput $R_i$ of the ith data node is calculated by the following formula:
[formula image not reproduced]
wherein
[formula image not reproduced]
$H_i$ is the actual throughput of the ith data node;
the normalized actual waiting time $t_i$ of the ith data node is calculated by the following formula:
[formula image not reproduced]
wherein
[formula image not reproduced]
$T_i$ is the actual waiting time of the ith data node.
4. The method for storing data in the cloud environment according to claim 2, wherein the set threshold Q is calculated by the following formula:
[formula image not reproduced]
5. The method for storing data in a cloud environment according to claim 1, wherein after the step of comparing whether the first I/O load rate is smaller than a set threshold, the method further comprises:
if the first I/O load rate is greater than or equal to the set threshold, adding 1 to i, and continuing to execute the step of querying the residual storage space and the first I/O load rate of the ith data node from the data of the residual storage spaces and I/O load rates of all the data nodes according to the request for applying for storing data.
6. A data storage device in a cloud environment, comprising:
the receiving module is used for receiving a request for applying for storing data sent by a client;
the query module is used for querying, according to the request for applying for storing data, the residual storage space and the first I/O load rate of the ith data node from the data of the residual storage spaces and I/O load rates of all the data nodes, wherein the ith data node is the data node ranked ith among all the data nodes by residual storage space;
a sending module, configured to return the remaining storage space of the ith data node to the client, so that the client sends a data block to the ith data node according to the remaining storage space of the ith data node;
the first comparison module is used for comparing whether the first I/O load rate is smaller than a set threshold value or not;
a storage module, configured to send a storage instruction to the ith data node if the first I/O load rate is smaller than the set threshold, so that the ith data node stores the data block sent by the client;
the sending module is further configured to send a request for obtaining a second I/O load rate to the ith data node; if the second I/O load rate is smaller than the set threshold, sending a request for acquiring the data of the residual storage space and the I/O load rate of the ith data node to the ith data node;
the receiving module is further configured to receive the second I/O load rate after the data block is stored by the ith data node, where the second I/O load rate is calculated by the ith data node; receiving data of the residual storage space and the I/O load rate of the ith data node returned by the ith data node;
the second comparison module is used for comparing whether the second I/O load rate is smaller than the set threshold value or not;
an updating module, configured to update data of the remaining storage space and the I/O load rate of the ith data node, and return to the step of receiving a request for applying for storing data sent by the client;
a call-out module, configured to send a call-out instruction to the ith data node to call out the data block for the ith data node if the second I/O load rate is greater than or equal to the set threshold;
and the query module is further used for adding 1 to i and running again to query the residual storage space and the first I/O load rate of the ith data node from the residual storage spaces and I/O load rates of all the data nodes according to the request for applying for storing data.
7. The data storage device in the cloud environment of claim 6, wherein the query module is further configured to, if the first I/O load rate is greater than or equal to the set threshold, add 1 to i and run again to query the residual storage space and the first I/O load rate of the ith data node from the residual storage spaces and I/O load rates of all data nodes according to the request for storing data.
8. A data storage system in a cloud environment, comprising: the data storage device, the client and the data node in the cloud environment of any one of claims 6 to 7;
the client is used for sending a request for applying for storing data to the data storage device in the cloud environment; receiving the residual storage space of the ith data node sent by the data storage device in the cloud environment; sending a data block to the ith data node according to the residual storage space of the ith data node;
the data node is used for receiving the data block sent by the client; and receiving a storage instruction sent by a data storage device under the cloud environment to store the data block sent by the client.
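Claim 2's weighted sum is straightforward to compute once the three metrics are normalized. The normalization formulas of claim 3 and the threshold formula of claim 4 are not reproduced in this text, so the sketch below takes O_i, R_i, and t_i as already-normalized inputs; the function name and the default weights are placeholders, not values from the patent.

```python
from typing import Sequence


def io_load_rate(o_i: float, r_i: float, t_i: float,
                 weights: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """P_i = w1*O_i + w2*R_i + w3*t_i on already-normalized metrics.

    The default weights are illustrative placeholders only.
    """
    if len(weights) != 3 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("expected three weights summing to 1")
    w1, w2, w3 = weights
    return w1 * o_i + w2 * r_i + w3 * t_i
```

Because the weights must sum to 1, P_i stays within the range of the normalized inputs; for example, `io_load_rate(1.0, 0.0, 0.0, (0.5, 0.25, 0.25))` evaluates to 0.5.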
CN201810792783.XA 2018-07-18 2018-07-18 Data storage method, device and system in cloud environment Active CN108900626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810792783.XA CN108900626B (en) 2018-07-18 2018-07-18 Data storage method, device and system in cloud environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810792783.XA CN108900626B (en) 2018-07-18 2018-07-18 Data storage method, device and system in cloud environment

Publications (2)

Publication Number Publication Date
CN108900626A CN108900626A (en) 2018-11-27
CN108900626B true CN108900626B (en) 2021-11-19

Family

ID=64350820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810792783.XA Active CN108900626B (en) 2018-07-18 2018-07-18 Data storage method, device and system in cloud environment

Country Status (1)

Country Link
CN (1) CN108900626B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110071964B (en) * 2019-03-26 2022-03-15 罗克佳华科技集团股份有限公司 File synchronization method, device, file sharing network, file sharing system and storage medium
US10908960B2 (en) * 2019-04-16 2021-02-02 Alibaba Group Holding Limited Resource allocation based on comprehensive I/O monitoring in a distributed storage system
CN110158430B (en) * 2019-05-08 2021-03-02 中铁北京工程局集团有限公司 Automatic napping laminating machine for bridge concrete surface
CN111124316B (en) * 2019-12-30 2023-12-19 青岛海尔科技有限公司 Storage space sharing method and device and computer readable storage medium
CN114745563B (en) * 2022-04-11 2024-01-30 中国联合网络通信集团有限公司 Method, device and system for processing live broadcast task by selecting edge computing node
CN114827180B (en) * 2022-06-22 2022-09-27 蒲惠智造科技股份有限公司 Distribution method of cloud data distributed storage
CN116339622B (en) * 2023-02-20 2023-11-14 深圳市数存科技有限公司 Data compression system and method based on block level

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760227A (en) * 2016-02-04 2016-07-13 中国联合网络通信集团有限公司 Method and system for resource scheduling in cloud environment
CN107948293A (en) * 2017-11-29 2018-04-20 重庆邮电大学 One kind is based on MongoDB load balance optimization system and methods
CN108196791A (en) * 2017-12-29 2018-06-22 北京奇虎科技有限公司 Data access method and device based on multiple storage devices
CN108200151A (en) * 2017-12-29 2018-06-22 创新科存储技术(深圳)有限公司 ISCSI Target load-balancing methods and device in a kind of distributed memory system
CN108287666A (en) * 2018-01-16 2018-07-17 中国人民公安大学 Date storage method and device for cloud storage environment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080225714A1 (en) * 2007-03-12 2008-09-18 Telefonaktiebolaget Lm Ericsson (Publ) Dynamic load balancing
CN101710339B (en) * 2009-11-20 2012-02-01 中国科学院计算技术研究所 Method and system for controlling data storage in cluster file system and method for creating file
US10146584B2 (en) * 2016-01-28 2018-12-04 Ca, Inc. Weight adjusted dynamic task propagation
CN107436813A (en) * 2017-08-03 2017-12-05 郑州云海信息技术有限公司 A kind of method and system of meta data server dynamic load leveling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于对象存储的负载均衡存储策略;熊安萍 等;《计算机工程与设计》;20120731;第33卷(第7期);第0-3节 *

Also Published As

Publication number Publication date
CN108900626A (en) 2018-11-27

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant