CN110636122A - Distributed storage method, server, system, electronic device, and storage medium - Google Patents


Info

Publication number
CN110636122A
Authority
CN
China
Prior art keywords
cluster
storage
file
weight coefficient
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910857800.8A
Other languages
Chinese (zh)
Inventor
郭杨勇
王建
周英能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongchang (hangzhou) Information Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Zhongchang (hangzhou) Information Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongchang (hangzhou) Information Technology Co Ltd, China Mobile Communications Group Co Ltd filed Critical Zhongchang (hangzhou) Information Technology Co Ltd
Priority to CN201910857800.8A priority Critical patent/CN110636122A/en
Publication of CN110636122A publication Critical patent/CN110636122A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1044Group management mechanisms 
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention relate to the field of computer technology and disclose a distributed storage method, a server, a system, an electronic device, and a storage medium. In the invention, a write request for a file is received; a weight coefficient is acquired for each cluster in a storage cluster array, the weight coefficient being generated according to the current storage resources of the cluster; a target cluster for writing the file is determined by a hash algorithm according to the weight coefficient of each cluster; and the file is written into the target cluster. As a result, when storage capacity needs to be expanded, the degradation of storage cluster read/write performance caused by data redistribution is avoided without affecting file upload and download efficiency, which improves the user's experience when reading and writing cloud files. Because the architecture as a whole expands capacity in units of clusters, it can also improve the efficiency with which operation and maintenance personnel later maintain the cloud storage system.

Description

Distributed storage method, server, system, electronic device, and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a distributed storage method, a server, a system, electronic equipment and a storage medium.
Background
Cloud storage is a concept that extends and develops from cloud computing. It refers to a system in which application software, using cluster applications, network technology, or distributed file systems, makes a large number of storage devices of different types in a network work together to provide data storage and service access, thereby ensuring data security and saving storage space. Common distributed storage systems currently include the Google File System (GFS), the Lustre parallel distributed file system, the Ceph distributed file system, and the GlusterFS network file system. Open-source Ceph is a reliable, scalable, unified distributed storage solution; driven in particular by the OpenStack open-source cloud computing management platform, it has been sought after by internet companies since it entered the industry.
The Ceph-based resource scheduling methods provided by the prior art are mainly based on a single Ceph distributed cluster and schedule resources by trading space for time, for example by merging writes to hard disk drives (HDDs) or adding solid state drive (SSD) caches. For example, the read/write performance of a single Ceph cluster is optimized by adjusting its strong-consistency scheme for reads and writes, the number of replicas, and the selection algorithm for distributed nodes.
However, in the course of implementing the present invention, the inventors found that the prior art addresses only the user's read/write performance experience on a single Ceph distributed cluster, and does not consider how to expand resources, while preserving that experience, when the storage resources of a single Ceph distributed cluster run low.
Disclosure of Invention
The embodiments of the present invention aim to provide a distributed storage method, a server, a system, an electronic device, and a storage medium such that, when storage capacity needs to be expanded, the degradation of storage cluster read/write performance caused by expanding cluster resources is avoided without affecting file upload and download efficiency, improving the user's experience when reading and writing cloud files. In addition, because the architecture as a whole expands capacity in units of clusters, the efficiency with which operation and maintenance personnel later maintain the cloud storage system can also be improved.
To solve the above technical problem, an embodiment of the present invention provides a distributed storage method, including: receiving a write request for a file; acquiring a weight coefficient of each cluster in a storage cluster array, the weight coefficient being generated according to the current storage resources of the cluster; determining a target cluster for writing the file according to the weight coefficient of each cluster; and writing the file into the target cluster.
An embodiment of the present invention further provides a server, including: a request receiving module, configured to receive a read or write request for a file; a computing module, configured to acquire a weight coefficient of each cluster in the storage cluster array and to determine, by a hash algorithm and according to the weight coefficients, a target cluster for writing the file; and a writing module, configured to write the file into the target cluster.
Compared with the prior art, the embodiments of the invention form a storage cluster array from a plurality of distributed storage clusters and set a weight coefficient for each cluster according to its current storage resources, so that files are distributed among the clusters at random with probabilities given by the weight coefficients. Storage resources can therefore be expanded or scheduled without affecting the user's read/write performance experience, and the efficiency with which operation and maintenance personnel later maintain the whole distributed storage system can be improved.
In addition, the weight coefficient includes a default weight coefficient. The default weight coefficient is generated in real time by a database management system according to the current remaining storage capacity of the cluster or the network speed of the machine room where the cluster is located, and it is positively correlated with the remaining storage capacity or the network speed. In this way, the weight coefficient can be generated from the current remaining storage capacity of the cluster, and the file write strategy can also take into account the effect of machine room network speed on the read/write speed of cloud storage files, giving the user a better experience.
In addition, the weight coefficient further includes a user weight coefficient, which is set by a user according to actual conditions and stored in the database management system. When the weight coefficient of each cluster in the storage cluster array is acquired, the user weight coefficient takes precedence. In this way, operation and maintenance personnel can customize how the files currently being stored are distributed, so that storage space is used more flexibly and more scenarios can be handled.
In addition, the storage cluster array includes a main cluster array and a standby cluster array. Each cluster in the main cluster array corresponds to a cluster in the standby cluster array, and the target cluster is a cluster in the main cluster array. When the target cluster fails, the file is written into the cluster in the standby cluster array that corresponds to the target cluster. This enhances the reliability of the whole storage system, so that it continues to operate stably when accidents such as a machine room power failure occur.
In addition, index information consisting of the identifier of the file and the identifier of the target cluster is generated and stored in an index cluster, where the index cluster is built from high-speed storage media. Building the index cluster from high-speed media greatly improves the server's response speed when a user reads a cloud storage file, which noticeably improves the user experience.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
FIG. 1 is a flowchart of a distributed storage method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a distributed storage method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a distributed storage method according to a third embodiment of the present invention;
FIG. 4 is a block diagram of a storage cluster array according to a third embodiment of the present invention;
FIG. 5 is a block diagram of a server according to a fourth embodiment of the present invention;
FIG. 6 is a structural diagram of a distributed storage system according to a fifth embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to give the reader a better understanding of the present application; the technical solution claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments. The embodiments are divided for convenience of description only; the division does not limit the specific implementation of the present invention, and the embodiments may be combined with and refer to one another where they do not contradict.
The first embodiment of the invention relates to a distributed storage method applied to a server. In this embodiment, a write request for a file is received; a weight coefficient of each cluster in a storage cluster array is acquired, the weight coefficient being generated according to the current storage resources of the cluster; a target cluster for writing the file is determined according to the weight coefficient of each cluster; and the file is written into the target cluster. A plurality of distributed storage clusters form a storage cluster array, and a weight coefficient is set for each cluster according to its current storage resources, so that files are distributed among the clusters at random with probabilities given by the weight coefficients. Storage resources can therefore be expanded or scheduled without affecting the user's read/write performance experience, and the efficiency with which operation and maintenance personnel later maintain the whole distributed storage system can be improved.
The following describes implementation details of the distributed storage method of the present embodiment in detail, and the following is provided only for easy understanding and is not necessary for implementing the present embodiment.
As shown in fig. 1, the distributed storage method in this embodiment specifically includes:
Step 101, receiving a write request of a file.
Specifically, the user first sends a file upload request through a client to the cluster's proxy server, which is based on the Hypertext Transfer Protocol (HTTP), and establishes a communication connection with the proxy server. The request received by the proxy server includes the ID of the file. The client is a native Ceph client in any of various programming languages.
Step 102, obtaining a weight coefficient of each cluster in the storage cluster array.
In this embodiment, the cluster is a cluster built based on a Ceph distributed file system, and the storage cluster array is a cluster array composed of a plurality of Ceph distributed file system clusters. When the storage space of the cluster array needs to be expanded, the expansion is performed in units of one cluster.
Specifically, for the purpose of reasonably writing files into the distributed storage clusters and fully and uniformly utilizing cluster storage resources, each cluster has a weight coefficient generated according to the specific situation of the current storage resource. The storage resources include: the current remaining storage space of the cluster and the current network speed of the server room in which the cluster is located.
In a specific application, the current storage resource information of each cluster is stored in MySQL, a small relational database management system. The system updates this data dynamically in real time and generates the current weight coefficient of each cluster from it. After receiving the file request, the proxy server immediately requests the current weight coefficient of each cluster from MySQL and caches the coefficients on the server.
An example is illustrated:
Cluster ID    Current remaining storage capacity (β)    Weight coefficient (α)
1             20%                                       0.4
2             20%                                       0.4
3             10%                                       0.2
As shown in the table above, the storage cluster array includes three Ceph clusters with equal total storage space but different current remaining storage capacities. If the operation and maintenance personnel want subsequently written files to even out the space utilization of the clusters, a simple formula can be set for MySQL to generate the current weight coefficient of each cluster. The formula image (BDA0002198769660000041) is not reproduced in this text; the table values are consistent with normalizing the remaining capacities:
$$\alpha_i = \frac{\beta_i}{\sum_{j} \beta_j}$$
Continuing the example, when the storage space of the cluster array needs to be expanded, a new cluster, "cluster 4", is added. Its remaining storage capacity is 100%, so its weight coefficient is given by:
[Formula image BDA0002198769660000042 is not reproduced in this text.]
In another example, when the clusters in a storage cluster array do not have equal total storage space and the operation and maintenance personnel still want written files to be distributed evenly across the clusters, the weight coefficient can be calculated from each cluster's current absolute remaining capacity rather than from the ratio of remaining to total storage capacity used in the example above.
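The two ways of deriving weight coefficients described above can be sketched as follows. This is a minimal illustration, not the patent's formula: it assumes the weights are obtained by normalizing each cluster's remaining capacity so that they sum to 1, which matches the table values. The function name `weights_from_remaining` and the capacity figures in the second call are hypothetical.

```python
def weights_from_remaining(remaining):
    """Normalize per-cluster remaining capacity into weight coefficients.

    `remaining` maps cluster ID -> remaining capacity, given either as a
    fraction of total capacity or as an absolute size (for example in TB).
    Normalizing to a sum of 1 is an assumption made for this sketch.
    """
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no cluster has remaining capacity")
    return {cid: value / total for cid, value in remaining.items()}


# Equal-sized clusters: use the remaining fraction, as in the table above.
by_ratio = weights_from_remaining({1: 0.20, 2: 0.20, 3: 0.10})
print(by_ratio)

# Unequal-sized clusters: use the absolute remaining capacity instead.
# The TB figures are made up for illustration.
by_capacity = weights_from_remaining({1: 40.0, 2: 10.0, 3: 50.0})
print(by_capacity)
```

With unequal cluster sizes, the second call weights a cluster by how much it can still hold rather than by how empty it is in percentage terms.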
In another example, the Ceph clusters in this embodiment are deployed in different machine rooms, which enhances the reliability of the whole storage system and avoids sudden drops in file read/write performance due to physical factors. For example, if network congestion occurs in the machine room where a cluster is located, so that the data read/write performance of that cluster drops, MySQL can lower the weight coefficient of the cluster according to other rules or formulas set by the user. The user thus keeps a good file read/write experience, and the maintenance burden on operation and maintenance personnel is reduced.
Step 103, determining a target cluster for writing the file by using a random distribution algorithm according to the weight coefficient of each cluster.
Step 104, writing the file into the target cluster.
In this embodiment, the random distribution algorithm is a hash algorithm. First, a hash value table is preset. The table holds two kinds of data, hash values and nodes, and each hash value corresponds to one node. Continuing the foregoing example, each cluster corresponds to a number of nodes proportional to its weight coefficient. Assuming a scaling factor of 100, cluster 1 corresponds to 40 nodes, cluster 2 to 40 nodes, and cluster 3 to 20 nodes. If a new cluster is added, 100 nodes corresponding to the new cluster are added directly. The algorithm takes a file ID as its input parameter, computes a hash value, looks up the cluster number in the table, and returns the cluster number as its output. Combining the random distribution algorithm with the weighting in this way ensures that, overall, the distribution of written files across the storage space matches the weight coefficients of the clusters.
Specifically, the file ID in the file write request is input into the hash algorithm, which returns an address, namely the number of the cluster to which the file is to be written. A data channel is then established between the client and that Ceph cluster, and the file is transmitted and written into it. Because the hash algorithm distributes files pseudo-randomly, MySQL must monitor the storage capacity of each cluster in real time and dynamically generate each cluster's weight coefficient, both to maintain performance when files are written to the storage medium and to ensure that each cluster has enough capacity to accommodate them.
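The table lookup described in steps 103 and 104 might look like the sketch below. The patent does not name a hash function or say how hash values map onto nodes; SHA-256 and the modulo mapping used here are assumptions, and `build_node_table` and `pick_cluster` are hypothetical names.

```python
import hashlib


def build_node_table(weights, scale=100):
    """Return a list of cluster IDs in which each cluster appears
    round(weight * scale) times, one entry per node."""
    table = []
    for cluster_id, weight in sorted(weights.items()):
        table.extend([cluster_id] * round(weight * scale))
    if not table:
        raise ValueError("all weight coefficients are zero")
    return table


def pick_cluster(file_id, table):
    """Hash the file ID and map the hash value onto one node of the table.

    SHA-256 and the modulo step are assumptions made for this sketch.
    """
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % len(table)
    return table[slot]


# Weights from the example table: 40, 40, and 20 nodes at a scale of 100.
table = build_node_table({1: 0.4, 2: 0.4, 3: 0.2})
counts = {1: 0, 2: 0, 3: 0}
for n in range(10_000):
    counts[pick_cluster(f"file-{n}", table)] += 1
print(len(table), counts)
```

Because the same file ID always hashes to the same slot, the choice is repeatable for a given table, while across many file IDs the share of files per cluster approaches the weight coefficients.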
The above examples in the present embodiment are for convenience of understanding, and do not limit the technical aspects of the present invention.
Compared with the prior art, this embodiment integrates a plurality of Ceph distributed storage clusters into a Ceph distributed storage cluster array, which solves the problem that storage resources cannot be expanded when a single Ceph distributed storage cluster runs short. Because no single Ceph distributed storage cluster has to be expanded, no data rebalancing is needed inside any cluster, so storage expansion is imperceptible to the user and the user experience is improved. Moreover, because a weight is set for each Ceph distributed storage cluster in advance, writing a file only requires a hash calculation over the file ID and the cluster weights to obtain the file's write address, which further improves the response speed of the distributed storage system and the balance of file distribution across the clusters.
A second embodiment of the present invention relates to a distributed storage method, and a flow is shown in fig. 2, where the method includes:
Step 201, receiving a write request of a file. This step is similar to step 101 in the first embodiment of the present invention, and is not described herein again.
Step 202, reading the size of the file.
Step 203, obtaining a user weight coefficient of each cluster in the storage cluster array.
In this embodiment, the operation and maintenance personnel can set the weight coefficient of each cluster according to actual needs and store it in MySQL. When both the weight coefficient generated by MySQL and the user weight coefficient set by the user exist in MySQL, the proxy server uses the user weight coefficient as the weight coefficient for the algorithm.
In practical applications, the sizes of files that users upload to the cloud differ greatly; a large file may be hundreds of times the size of a small one. Because each cluster is built from storage media of limited capacity, the remaining storage capacity of a cluster may be insufficient to store a large file, so a suitable strategy is needed to ensure that storage space is used reasonably. To that end, the user can set several weight coefficient schemes to suit files of different sizes. For example, when the remaining capacity of cluster 1 among clusters 1, 2, and 3 is low and a write request for a large file is received, the weight coefficient scheme for large files is adopted.
Specifically, in the weight coefficient scheme for large files, the weight coefficient of every cluster that cannot store the large file is set to 0, and weight coefficients are then assigned by the usual rule to the clusters that have enough capacity to store the file. In the weight coefficient scheme for small files, weight coefficients are assigned to all clusters as usual, according to the storage resources of each cluster.
In practical applications, files can be divided into N classes by size, where N is a natural number greater than 2, to make the use of storage capacity more reasonable. When the proxy server receives a file write request, it first obtains the size of the file and then selects the appropriate weight coefficient scheme for that size.
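A size-aware weight scheme of this kind could be sketched as below. The text leaves the "conventional rule" open; this sketch assumes weights proportional to remaining capacity among the clusters that can hold the file. The function name `weights_for_file` and all capacity figures are hypothetical.

```python
GIB = 1024 ** 3


def weights_for_file(remaining_bytes, file_size):
    """Give weight 0 to clusters that cannot hold the file, then normalize
    remaining capacity over the clusters that can.

    Proportional-to-remaining-capacity is an assumed stand-in for the
    "conventional rule" mentioned in the text.
    """
    eligible = {cid: free for cid, free in remaining_bytes.items()
                if free >= file_size}
    total = sum(eligible.values())
    if total == 0:
        raise ValueError("no cluster can hold a file of this size")
    return {cid: eligible.get(cid, 0) / total for cid in remaining_bytes}


# Cluster 1 is nearly full; the figures are made up for illustration.
remaining = {1: 2 * GIB, 2: 60 * GIB, 3: 40 * GIB}

small_scheme = weights_for_file(remaining, 10 * 1024 ** 2)  # 10 MiB file
large_scheme = weights_for_file(remaining, 5 * GIB)         # 5 GiB file
print(small_scheme)
print(large_scheme)
```

For the small file every cluster keeps a nonzero weight, while for the large file cluster 1 drops out and its share is split between clusters 2 and 3.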
In another example, the read/write rate of a storage medium falls as its space utilization rises. The operation and maintenance personnel can therefore set a space utilization threshold for each storage cluster according to actual conditions; when the space utilization of a cluster reaches its threshold, the weight coefficient of that cluster is automatically set to 0, which keeps the read/write rate of every cluster at or above a level acceptable to users.
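The threshold rule can be illustrated with a short sketch. Renormalizing the remaining weights so that they sum to 1 is an assumption added here; the text only says that the weight of a cluster at or above its threshold becomes 0. The name `apply_utilization_threshold` and the numbers are hypothetical.

```python
def apply_utilization_threshold(weights, utilization, thresholds):
    """Zero the weight of every cluster whose space utilization has reached
    its threshold, then renormalize the rest (renormalizing is an assumption)."""
    adjusted = {
        cid: 0.0 if utilization[cid] >= thresholds[cid] else weight
        for cid, weight in weights.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("every cluster has reached its utilization threshold")
    return {cid: weight / total for cid, weight in adjusted.items()}


weights = {1: 0.4, 2: 0.4, 3: 0.2}
utilization = {1: 0.80, 2: 0.80, 3: 0.90}   # fraction of space in use
thresholds = {1: 0.85, 2: 0.85, 3: 0.85}    # set by operations staff

adjusted = apply_utilization_threshold(weights, utilization, thresholds)
print(adjusted)
```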
In another example, when a new cluster is added to the storage cluster array for capacity expansion, the operation and maintenance personnel can set the weight coefficients of all old clusters to 0 and divide the weight equally among the new clusters. This quickly balances the data stored across the clusters and makes the use of storage space in the storage cluster array more reasonable.
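As a small illustration of this expansion policy, the sketch below sets every old cluster's weight to 0 and splits the weight evenly over the new clusters. The function name `expansion_weights` and the cluster IDs are hypothetical.

```python
def expansion_weights(old_cluster_ids, new_cluster_ids):
    """Weight 0 for every old cluster, equal shares for the new clusters."""
    if not new_cluster_ids:
        raise ValueError("at least one new cluster is required")
    share = 1.0 / len(new_cluster_ids)
    weights = {cid: 0.0 for cid in old_cluster_ids}
    weights.update({cid: share for cid in new_cluster_ids})
    return weights


after_expansion = expansion_weights([1, 2, 3], [4, 5])
print(after_expansion)
```

Until the operator restores capacity-based weights, all new writes land on the newly added clusters.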
Step 204, determining a target cluster for writing the file by using a random distribution algorithm according to the weight coefficient of each cluster. This step is similar to step 103 in the first embodiment of the present invention, and is not described herein again.
Step 205, writing the file into the target cluster. This step is similar to step 104 in the first embodiment of the present invention, and is not described herein again.
The steps of the above methods are divided for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is preserved, such variations fall within the protection scope of this patent. Adding insignificant modifications to the algorithms or processes, or introducing insignificant design changes, without changing the core design of the algorithms or processes, also falls within the protection scope of this patent.
Compared with the prior art, the operation and maintenance personnel can set the weight coefficients of different schemes according to actual conditions and perform custom setting on the distribution of the files needing to be stored currently, so that the storage space is more flexibly utilized, more scenes can be dealt with, and the user performance experience is improved.
A third embodiment of the present invention relates to a distributed storage method, and a flow is shown in fig. 3, where the method includes:
Step 301, receiving a write request of a file. This step is similar to step 101 in the first embodiment of the present invention, and is not described herein again.
Step 302, obtaining a weight coefficient of each cluster in the storage cluster array. This step is similar to step 102 in the first embodiment of the present invention, and is not described herein again.
Step 303, determining a target cluster for writing in the file by using a random distribution algorithm according to the weight coefficient of each cluster. This step is similar to step 103 in the first embodiment of the present invention, and is not described herein again.
Step 304, sending a write request to the target cluster, and judging whether the target cluster is normal. If yes, go to step 305; if not, go to step 306.
Step 305, writing the file into the target cluster.
Step 306, writing the file into the standby cluster corresponding to the target cluster in the standby cluster array.
Specifically, as shown in FIG. 4, the entire storage cluster array includes a main cluster array and a standby cluster array, and each cluster in the main cluster array has a corresponding standby cluster in the standby cluster array. After the proxy server uses the algorithm to determine the cluster identifier of the target cluster from the ID of the file to be written, it first sends a write request to the cluster in the main cluster array that corresponds to the cluster identifier and then waits for the cluster state information returned by the cluster array management server. If the cluster is currently operating normally, the file uploaded by the user is written into it. If the cluster is currently in an abnormal state, a file write request is sent to the corresponding standby cluster, and the file uploaded by the user is written into that standby cluster.
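The decision in steps 304 to 306 can be sketched as follows. Clusters are modeled as in-memory dictionaries purely for illustration, and `is_normal` is a hypothetical callback standing in for the status reported by the cluster array management server; neither is a Ceph API.

```python
def write_with_failover(file_id, data, target_id,
                        main_array, standby_array, is_normal):
    """Write to the target cluster in the main array, or to its standby.

    `is_normal` stands in for the status reported by the cluster array
    management server (a hypothetical callback, not a Ceph API).
    Returns the name of the array that received the file.
    """
    if is_normal(target_id):
        main_array[target_id][file_id] = data
        return "main"
    standby_array[target_id][file_id] = data
    return "standby"


# Each main cluster has a standby cluster with the same ID.
main_array = {1: {}, 2: {}, 3: {}}
standby_array = {1: {}, 2: {}, 3: {}}


def cluster_2_is_down(cluster_id):
    return cluster_id != 2


first = write_with_failover("file-a", b"hello", 1,
                            main_array, standby_array, cluster_2_is_down)
second = write_with_failover("file-b", b"world", 2,
                             main_array, standby_array, cluster_2_is_down)
print(first, second)
```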
In practical applications, the standby cluster array can serve both as an emergency fallback when a file is written and as a mirror of the main cluster array when a user needs to read a cloud file.
Specifically, after the file is written into the target cluster, when the clusters are idle, that is, when their data throughput is low, the cluster array management server makes a mirror copy of the file and stores it in the standby cluster corresponding to the target cluster. This improves the reliability of cloud files without affecting the user's read/write experience. When a cluster fails and cannot be read, the user can read the file mirror stored in the standby cluster.
Step 307, generating index information composed of the file ID and the cluster identifier, and storing the index information in the index cluster.
Specifically, the storage system includes an index cluster whose storage media are all high-speed storage media, such as solid state drives (SSDs) and dynamic random access memory (DRAM). When the proxy server receives a user's request to read a cloud file, it first queries the index cluster for the cluster identifier corresponding to the file ID and then transmits the file data to the user's client. Because the storage media in the index cluster are all high-speed, the query completes in a very short time, giving the user a better read experience. The index also lets operation and maintenance personnel find the storage address of each file more quickly when maintaining the distributed storage system, which improves operation and maintenance efficiency.
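The index lookup on the read path can be illustrated with a minimal sketch. The `IndexCluster` class is a hypothetical in-memory stand-in for the index cluster, and the clusters are again modeled as dictionaries; a real deployment would use a fast key-value store on SSD or DRAM.

```python
class IndexCluster:
    """In-memory stand-in for the index cluster (file ID -> cluster ID)."""

    def __init__(self):
        self._entries = {}

    def record(self, file_id, cluster_id):
        """Store the index information generated after a write (step 307)."""
        self._entries[file_id] = cluster_id

    def locate(self, file_id):
        """Return the cluster ID for a file, or None if it is not indexed."""
        return self._entries.get(file_id)


def read_file(file_id, index, clusters):
    """Look up the cluster in the index first, then fetch the file data."""
    cluster_id = index.locate(file_id)
    if cluster_id is None:
        raise KeyError(f"no index entry for {file_id!r}")
    return clusters[cluster_id][file_id]


clusters = {1: {}, 2: {}}
index = IndexCluster()

# Write path: store the file, then record where it went.
clusters[2]["file-a"] = b"payload"
index.record("file-a", 2)

# Read path: a single index query identifies the cluster to read from.
print(read_file("file-a", index, clusters))
```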
Compared with the prior art, this embodiment adds a standby cluster array and an index cluster to the distributed storage system, which improves the reliability of long-term file storage, the user's performance experience when reading files, and the efficiency of operation and maintenance personnel.
A fourth embodiment of the present invention relates to a server, which is configured as shown in fig. 5, and includes:
a request receiving module 501, configured to receive a read or write request of a file;
a calculating module 502, configured to obtain a weight coefficient of each cluster in the storage cluster array, cache the weight coefficient, and determine, according to the weight coefficient and a hash algorithm, a target cluster for writing a file;
In one embodiment, after determining the target cluster, the calculating module first sends a write request to the target cluster, and then determines whether the target cluster is operating normally according to feedback from a management server of the storage cluster array. When the target cluster is abnormal, the calculating module sends the write request to the standby cluster corresponding to the target cluster in the standby cluster array.
a writing module 503, configured to write the file uploaded by the user into the target cluster determined by the algorithm.
In one embodiment, after writing the user file into the target cluster, the writing module generates index information of the file from the file ID and the cluster identifier of the target cluster, and stores the index information in the index cluster.
Compared with the prior art, the server in this embodiment determines the specific storage address of a file by combining a hash algorithm with the weight coefficients, so that the overall distribution of files follows the distribution of the weight coefficients. The space capacity of the distributed storage system is thereby fully and reasonably utilized, the drop in read-write performance that data redistribution would cause in a storage cluster is avoided, and the user's experience of reading and writing cloud files is improved.
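One way to combine a hash with weight coefficients so that files are distributed in proportion to the weights is sketched below. This passage does not spell out the exact algorithm, so the approach shown (hash the file ID to a point in [0, 1) and locate it in the cumulative weight ranges) is an assumption, as are the function name `pick_target_cluster` and the choice of SHA-256.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate
from typing import Dict


def pick_target_cluster(file_id: str, weights: Dict[str, float]) -> str:
    """Map a file ID to a cluster; clusters with larger weights receive more files."""
    # Use a stable ordering and ignore clusters whose weight is not positive.
    names = sorted(name for name, w in weights.items() if w > 0)
    if not names:
        raise ValueError("no cluster has a positive weight")
    cumulative = list(accumulate(weights[name] for name in names))
    total = cumulative[-1]

    # Hash the file ID to a deterministic point in [0, 1).
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64

    # Find the first cumulative range that contains the scaled point.
    idx = bisect_right(cumulative, point * total)
    return names[min(idx, len(names) - 1)]  # guard against float rounding at the top edge
```

With this scheme the same file ID always maps to the same cluster for a given set of weights, but the mapping changes when the weights change. A reader therefore cannot recompute a file's location from the current weights alone, which is consistent with recording the chosen cluster in an index at write time.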
It should be noted that each module referred to in this embodiment is a logical module. In practical applications, one logical unit may be one physical unit, may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not closely related to solving the technical problems addressed by the present invention are not introduced in this embodiment, but this does not mean that no other elements are present in this embodiment.
A fifth embodiment of the present invention relates to a distributed storage system, the structure of which is shown in fig. 6, including:
the server 601 according to the fourth embodiment of the present invention;
a storage cluster array 602 composed of N storage clusters and used for storing files uploaded by users, where N is a natural number greater than 1.
In a specific example, the storage cluster array may include a main cluster array and a standby cluster array, where the two arrays contain the same number of clusters and the clusters correspond one to one. After the server uses the algorithm to determine the cluster identifier of the target cluster from the ID of the file to be written, it first sends a write request to the cluster in the main cluster array corresponding to that identifier, and then waits for the cluster state information returned by the cluster array management server. If the cluster is currently operating normally, the file uploaded by the user is written into the cluster. If the cluster is currently in an abnormal state, the server sends the file write request to the standby cluster corresponding to the cluster and writes the file uploaded by the user into that standby cluster.
In a specific example, the standby cluster array may be used as an emergency backup when a file is written, and may also be used as a mirror of the main cluster array when a user needs to read a cloud file. After the file is written into the target cluster, when each cluster is in an idle state, that is, when the data throughput of each cluster is at a low level, the cluster array management server creates a mirror copy of the file and stores it in the standby cluster corresponding to the target cluster. The reliability of the cloud file is thereby improved without affecting the user's reading and writing experience. When a cluster fails and cannot be read, the user can read the mirror copy of the file stored in the corresponding standby cluster.
an index cluster 603, used for storing index information of the files.
Specifically, the storage media of the index cluster are all high-speed storage media, including solid state drives (SSD) and dynamic random access memory (DRAM). When the proxy server receives a user's request to read a cloud file, it first queries the index cluster for the cluster identifier corresponding to the file ID, then reads the file data from that cluster and transmits it to the user's client. Since the storage media in the index cluster are all high-speed storage media, the query completes in a very short time, which gives the user a better reading experience. In addition, operation and maintenance personnel can obtain the specific storage address of each file more quickly when maintaining the distributed storage system, which improves operation and maintenance efficiency.
a database management system 604, used for storing and managing the capacity utilization information of each cluster in the storage cluster array, and for dynamically calculating and storing default weight coefficients according to the capacity utilization information.
In one embodiment, the database management system is MySQL. Operation and maintenance personnel can set the weight coefficient of each cluster as the actual situation requires and store it in MySQL. When MySQL holds both a default weight coefficient that it generated itself and a user weight coefficient set by a user, the server preferentially obtains the user weight coefficient as the weight coefficient used by the algorithm.
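The precedence rule above, in which a user weight coefficient overrides the default one, can be sketched as follows. The text states only that default coefficients are derived from capacity information and that user coefficients take priority; the linear formula in `default_weight`, the per-cluster override in `resolve_weights`, and both function names are assumptions made for illustration.

```python
from typing import Dict, Optional


def default_weight(remaining_tb: float) -> float:
    """Assumed rule: the default weight grows linearly with remaining capacity."""
    return max(float(remaining_tb), 0.0)


def resolve_weights(remaining_tb: Dict[str, float],
                    user_weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the weight to use per cluster, preferring a user-set value when one exists."""
    user_weights = user_weights or {}
    resolved = {}
    for name, capacity in remaining_tb.items():
        if name in user_weights:
            # A user weight coefficient takes priority over the default one.
            resolved[name] = float(user_weights[name])
        else:
            resolved[name] = default_weight(capacity)
    return resolved
```

For example, `resolve_weights({"a": 10, "b": 5}, {"a": 1})` keeps the default weight for cluster `b` and uses the user-set value for cluster `a`.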
Compared with the prior art, this embodiment integrates a plurality of Ceph distributed storage clusters into a Ceph distributed storage cluster array, which solves the problem that storage resources cannot be expanded when a single Ceph distributed storage cluster runs short of storage resources. Because no single Ceph distributed storage cluster has to be expanded, no data rebalancing is needed inside any single cluster, so storage resource expansion is imperceptible to users. In addition, when the current storage space usage of the clusters is unbalanced, different weight coefficients are set for the clusters, so that file data become evenly distributed across the clusters as subsequent files are written.
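The main/standby behaviour described for the storage cluster array in this embodiment (write to the standby cluster when the main cluster is abnormal, and mirror files to the standby cluster while the system is idle) can be sketched as follows. Dictionaries stand in for clusters, and the names `write_file`, `mirror_when_idle`, the `healthy` map and the `is_idle` flag are hypothetical; how health and idleness are actually detected is not specified in this passage.

```python
from typing import Dict

Cluster = Dict[str, bytes]  # file ID -> file data


def write_file(file_id: str, data: bytes, cluster_id: str,
               main: Dict[str, Cluster], standby: Dict[str, Cluster],
               healthy: Dict[str, bool]) -> str:
    """Write to the main cluster if it is healthy, otherwise to its standby cluster."""
    if healthy.get(cluster_id, False):
        main[cluster_id][file_id] = data
        return "main"
    standby[cluster_id][file_id] = data
    return "standby"


def mirror_when_idle(main: Dict[str, Cluster], standby: Dict[str, Cluster],
                     is_idle: bool) -> int:
    """While idle, copy files that the standby cluster does not hold yet; return the count."""
    if not is_idle:
        return 0
    copied = 0
    for cluster_id, files in main.items():
        for file_id, data in files.items():
            if file_id not in standby[cluster_id]:
                standby[cluster_id][file_id] = data
                copied += 1
    return copied
```

Deferring the mirror copy to idle periods keeps the extra write traffic off the user-facing write path, which matches the stated goal of improving reliability without affecting the read-write experience.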
It should be noted that this embodiment is a system example corresponding to the first, second, and third embodiments, and may be implemented in cooperation with them. The related technical details mentioned in the first, second, and third embodiments remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in this embodiment can also be applied to the first, second, and third embodiments.
It should be noted that each module referred to in this embodiment is a logical module. In practical applications, one logical unit may be one physical unit, may be a part of one physical unit, or may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not closely related to solving the technical problems addressed by the present invention are not introduced in this embodiment, but this does not mean that no other elements are present in this embodiment.
A sixth embodiment of the present invention relates to an electronic device, as shown in fig. 7, including at least one processor 701; and, a memory 702 communicatively coupled to the at least one processor 701; the memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 to enable the at least one processor 701 to execute the distributed storage method according to the first, second, or third embodiment.
The memory 702 and the processor 701 are coupled by a bus, which may comprise any number of interconnected buses and bridges that link one or more circuits of the processor 701 and the memory 702. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 701 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 701. The processor 701 is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 702 may be used for storing data used by the processor 701 in performing operations.
A seventh embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing the related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that in practice various changes in form and detail may be made to them without departing from the spirit and scope of the invention.

Claims (10)

1. A distributed storage method, applied to a server, comprising:
receiving a write request of a file;
acquiring a weight coefficient of each cluster in a storage cluster array; the weight coefficient is generated according to the current storage resource of the cluster;
determining a target cluster for writing the file according to the weight coefficient of each cluster;
and writing the file into the target cluster.
2. The distributed storage method of claim 1,
the weight coefficients comprise default weight coefficients;
the default weight coefficient is generated by a database management system in real time according to the current residual storage capacity of the cluster and the network speed of a machine room where the cluster is located;
the default weight factor is positively correlated with the remaining storage capacity or the network speed.
3. The distributed storage method of claim 2,
the weight coefficients further comprise user weight coefficients; the user weight coefficient is set by a user according to actual conditions and is stored in the database management system;
when the weight coefficient of each cluster in the storage cluster array is obtained, judging whether the user weight coefficient exists or not;
and if so, acquiring the user weight coefficient.
4. The distributed storage method of claim 1,
the storage cluster array comprises a main cluster array and a standby cluster array;
the clusters in the main cluster array correspond one to one to the clusters in the standby cluster array;
the target cluster is a cluster in the main cluster array;
and when the target cluster fails, writing the file into a cluster in the standby cluster array corresponding to the target cluster.
5. The distributed storage method according to any one of claims 1 to 4, comprising, after said writing the file in the target cluster:
generating index information consisting of the identification of the file and the identification of the target cluster;
storing the index information into an index cluster; wherein the index cluster is built by a high-speed storage medium.
6. A server, comprising:
the request receiving module is used for receiving a writing request of a file;
the computing module is used for acquiring a weight coefficient of each cluster in the storage cluster array and determining the target cluster for writing the file according to the weight coefficient;
and the writing module is used for writing the file into the target cluster.
7. A distributed storage system, comprising:
the storage cluster array comprises N clusters, wherein N is a natural number greater than 1;
the server of claim 6, configured to store the user's uploaded file in a cluster of the storage cluster array.
8. The distributed storage system according to claim 7, further comprising:
the index cluster is used for storing index information of the files in the storage cluster array;
and the database management system is used for storing and managing the current storage resource information of each cluster in the storage cluster array and dynamically calculating and storing the default weight coefficient according to the storage resource information.
9. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to perform the distributed storage method of any of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the distributed storage method of any one of claims 1 to 5.
CN201910857800.8A 2019-09-11 2019-09-11 Distributed storage method, server, system, electronic device, and storage medium Pending CN110636122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910857800.8A CN110636122A (en) 2019-09-11 2019-09-11 Distributed storage method, server, system, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN110636122A true CN110636122A (en) 2019-12-31

Family

ID=68971036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910857800.8A Pending CN110636122A (en) 2019-09-11 2019-09-11 Distributed storage method, server, system, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN110636122A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111131457A (en) * 2019-12-25 2020-05-08 上海交通大学 Capacity and bandwidth compromise method and system for heterogeneous distributed storage
CN111562884A (en) * 2020-04-28 2020-08-21 北京奇艺世纪科技有限公司 Data storage method and device and electronic equipment
CN111736762A (en) * 2020-05-21 2020-10-02 平安国际智慧城市科技股份有限公司 Synchronous updating method, device, equipment and storage medium of data storage network
CN111767250A (en) * 2020-06-10 2020-10-13 钛星投资(深圳)有限公司 Decentralized storage method, downloading method and storage system
CN112637327A (en) * 2020-12-21 2021-04-09 北京奇艺世纪科技有限公司 Data processing method, device and system
CN113110796A (en) * 2020-01-13 2021-07-13 顺丰科技有限公司 Data management method, device, server and storage medium
CN113721855A (en) * 2021-09-01 2021-11-30 中国建设银行股份有限公司 Storage method and device of storage resources, electronic equipment and computer storage medium
CN114089917A (en) * 2021-11-19 2022-02-25 中国电信集团系统集成有限责任公司 Distributed object storage cluster, capacity expansion method and device thereof, and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100023621A1 (en) * 2008-07-24 2010-01-28 Netapp, Inc. Load-derived probability-based domain name service in a network storage cluster
CN101997884A (en) * 2009-08-18 2011-03-30 升东网络科技发展(上海)有限公司 Distributed storage system and method
CN106527981A (en) * 2016-10-31 2017-03-22 华中科技大学 Configuration-based data fragmentation method for adaptive distributed storage system
CN108011929A (en) * 2017-11-14 2018-05-08 平安科技(深圳)有限公司 Data request processing method, apparatus, computer equipment and storage medium
CN108600316A (en) * 2018-03-23 2018-09-28 深圳市网心科技有限公司 Data managing method, system and the equipment of cloud storage service
CN108614837A (en) * 2016-12-13 2018-10-02 杭州海康威视数字技术股份有限公司 File stores and the method and device of retrieval
CN108763436A (en) * 2018-05-25 2018-11-06 福州大学 A kind of distributed data-storage system based on ElasticSearch and HBase
CN108875035A (en) * 2018-06-25 2018-11-23 郑州云海信息技术有限公司 The date storage method and relevant device of distributed file system
CN109343801A (en) * 2018-10-23 2019-02-15 深圳前海微众银行股份有限公司 Date storage method, equipment and computer readable storage medium
CN109597567A (en) * 2017-09-30 2019-04-09 网宿科技股份有限公司 A kind of data processing method and device
CN110109886A (en) * 2018-02-01 2019-08-09 中兴通讯股份有限公司 The file memory method and distributed file system of distributed file system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111131457A (en) * 2019-12-25 2020-05-08 上海交通大学 Capacity and bandwidth compromise method and system for heterogeneous distributed storage
CN113110796A (en) * 2020-01-13 2021-07-13 顺丰科技有限公司 Data management method, device, server and storage medium
CN111562884A (en) * 2020-04-28 2020-08-21 北京奇艺世纪科技有限公司 Data storage method and device and electronic equipment
CN111562884B (en) * 2020-04-28 2023-10-27 北京奇艺世纪科技有限公司 Data storage method and device and electronic equipment
CN111736762A (en) * 2020-05-21 2020-10-02 平安国际智慧城市科技股份有限公司 Synchronous updating method, device, equipment and storage medium of data storage network
CN111736762B (en) * 2020-05-21 2023-04-07 平安国际智慧城市科技股份有限公司 Synchronous updating method, device, equipment and storage medium of data storage network
CN111767250A (en) * 2020-06-10 2020-10-13 钛星投资(深圳)有限公司 Decentralized storage method, downloading method and storage system
CN112637327A (en) * 2020-12-21 2021-04-09 北京奇艺世纪科技有限公司 Data processing method, device and system
CN113721855A (en) * 2021-09-01 2021-11-30 中国建设银行股份有限公司 Storage method and device of storage resources, electronic equipment and computer storage medium
CN114089917A (en) * 2021-11-19 2022-02-25 中国电信集团系统集成有限责任公司 Distributed object storage cluster, capacity expansion method and device thereof, and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191231