CN112162707A - Storage method, electronic device and storage medium for distributed storage system - Google Patents

Info

Publication number
CN112162707A
Authority
CN
China
Prior art keywords
storage
stored
cluster
url
realbucket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011101715.8A
Other languages
Chinese (zh)
Inventor
张致江
夏静霆
张明
王芝斌
刘年超
舒银东
殷奎
黄开元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202011101715.8A
Publication of CN112162707A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626 Reducing size or complexity of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects

Abstract

An embodiment of the invention provides a storage method, an electronic device and a storage medium for a distributed storage system, where the distributed storage system comprises a metadata center and a plurality of storage clusters. The method comprises: obtaining a storage request for an object to be stored; looking up in the metadata center, according to a first URL, the RealBucket in which the object is to be stored; reorganizing the first URL into a second URL using the cluster type of the first storage cluster to which the RealBucket belongs; and storing the object in the first storage cluster based on the second URL. Because the RealBucket is looked up in the metadata center according to the first URL, the RealBucket can be determined quickly, and no per-object data index needs to be introduced during storage. This greatly reduces the storage space needed for metadata and reduces storage scale and complexity.

Description

Storage method, electronic device and storage medium for distributed storage system
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a storage method, an electronic device, and a storage medium for a distributed storage system.
Background
With the rapid development of big data and artificial intelligence (AI) technology, storage systems need greater capacity, faster response and greater bandwidth.
Existing large-scale object storage systems generally fall into two kinds: cloud object storage systems and distributed storage systems. A cloud object storage system forms a large-scale cluster by combining a centralized data center with a plurality of clusters. Data are distributed from the centralized data center to the corresponding clusters for storage, and a user can find the corresponding cluster directly from a data index and access the data through a standard S3 interface. In prior-art cloud object storage systems, each piece of data corresponds to a unique data index and a unique cluster, which increases the storage space needed for metadata, hinders capacity expansion, and sharply degrades system performance. Prior-art distributed storage systems are large in scale, high in complexity, and costly to build and maintain.
Disclosure of Invention
The embodiment of the invention provides a storage method, electronic equipment and a storage medium for a distributed storage system, which are used for overcoming the defects in the prior art.
An embodiment of the invention provides a storage method for a distributed storage system, where the distributed storage system comprises a metadata center and a plurality of storage clusters, and the storage method comprises the following steps:
acquiring a storage request for an object to be stored, wherein the storage request carries a first URL of the object to be stored;
looking up, in the metadata center and according to the first URL, the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster;
reorganizing the first URL into a second URL by using the cluster type of the first storage cluster, and storing the object to be stored in the first storage cluster based on the second URL.
According to a storage method for a distributed storage system in an embodiment of the present invention, looking up the RealBucket in which the object to be stored is to be stored, in the metadata center and according to the first URL, includes:
determining a virtual storage location corresponding to the object to be stored according to the user bucket information in the first URL;
determining, based on the virtual storage location, the RealBucket corresponding to the virtual storage location from the preset mapping relationship stored in the metadata center, and taking that RealBucket as the RealBucket in which the object to be stored is to be stored.
According to an embodiment of the present invention, reorganizing the first URL into a second URL using the cluster type of the first storage cluster includes:
reorganizing the first URL into the second URL, in combination with the RealBucket in which the object to be stored is to be stored and the first storage cluster, based either on the cluster type of the first storage cluster alone, or on the cluster type of the first storage cluster together with authorization information obtained by authenticating the user.
The storage method for the distributed storage system according to one embodiment of the present invention further includes:
increasing the number of RealBuckets in each storage cluster based on the running state information of each storage cluster, and allocating a plurality of virtual storage locations to each RealBucket.
According to the storage method for the distributed storage system, each storage cluster has a priority attribute;
the method further comprises the following steps:
and migrating the objects in the storage cluster with higher priority to the storage cluster with lower priority based on the storage time information and the attribute information of the objects.
According to a storage method for a distributed storage system according to an embodiment of the present invention, migrating an object in a storage cluster with a higher priority to a storage cluster with a lower priority based on storage time information and attribute information of the object includes:
and merging and removing the objects in the storage cluster with higher priority based on the storage time information and the attribute information of the objects, and migrating the processed objects to the storage cluster with lower priority.
According to the storage method for the distributed storage system, the preset mapping relationship comprises a first mapping relationship between the virtual storage position and the RealBucket and a second mapping relationship between the RealBucket and the storage cluster.
The storage method for the distributed storage system according to one embodiment of the present invention further includes:
acquiring an access request for an object to be accessed, wherein the access request carries a third URL of the object to be accessed;
looking up, in the metadata center and according to the third URL, the RealBucket in which the object to be accessed is stored, wherein the RealBucket belongs to a second storage cluster;
reorganizing the third URL into a fourth URL by using the cluster type of the second storage cluster, and reading the object to be accessed from the second storage cluster based on the fourth URL.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of any one of the storage methods for the distributed storage system.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the storage method for a distributed storage system as described in any one of the above.
Embodiments of the invention thus provide a storage method, an electronic device and a storage medium for a distributed storage system, where the distributed storage system comprises a metadata center and a plurality of storage clusters, and the method comprises: acquiring a storage request for an object to be stored, wherein the storage request carries a first URL of the object to be stored; looking up, in the metadata center and according to the first URL, the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster; and reorganizing the first URL into a second URL by using the cluster type of the first storage cluster, and storing the object to be stored in the first storage cluster based on the second URL. Because the RealBucket is looked up in the metadata center according to the first URL, the RealBucket can be determined quickly, and no data index for the object to be stored needs to be introduced during storage. This greatly reduces the storage space needed for metadata, reduces storage scale and complexity, and lowers construction and maintenance costs. When the system needs to be expanded, it can be expanded without affecting system performance. The advantages of the storage method are particularly significant for object storage at the trillion-object level and above.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a storage method for a distributed storage system according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a distributed storage system at the data flow level according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An existing cloud object storage system forms a large-scale cluster by combining a centralized data center with a plurality of clusters. The stored objects are data, which are distributed from the centralized data center to the corresponding clusters for storage. When a cluster stores data, a data index is usually created for each piece of data, so that a user can use a standard S3 interface to find the corresponding cluster directly from the data index and access the data. Although this correspondence is simple, it increases the storage space needed for metadata, hinders system capacity expansion, slows object queries, and sharply degrades system performance. Existing distributed storage systems, such as the Pangu distributed storage system, are large in scale, high in complexity, and costly to build and maintain. Therefore, an embodiment of the invention provides a storage method for a distributed storage system to solve these problems in the prior art.
Fig. 1 is a schematic flowchart of a storage method for a distributed storage system according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S1: acquiring a storage request for an object to be stored, wherein the storage request carries a first URL of the object to be stored;
S2: looking up, in the metadata center and according to the first URL, the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster;
S3: reorganizing the first URL into a second URL by using the cluster type of the first storage cluster, and storing the object to be stored in the first storage cluster based on the second URL.
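Steps S1 to S3 can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the dictionaries standing in for the metadata center, the names `rb-0007` and `ceph-hot-1`, the endpoint, and the second-URL layout are all hypothetical, and the lookup is simplified to map a user bucket directly to a RealBucket.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class RealBucket:
    name: str        # actual bucket inside a storage cluster
    cluster_id: str  # storage cluster the RealBucket belongs to


# Hypothetical stand-ins for the metadata center's contents.
USER_BUCKET_TO_REAL_BUCKET = {"photos": RealBucket("rb-0007", "ceph-hot-1")}
CLUSTER_TYPES = {"ceph-hot-1": "ceph"}
CLUSTER_ENDPOINTS = {"ceph-hot-1": "http://ceph-hot-1.internal"}


def handle_store_request(first_url: str) -> str:
    """Return the second URL that the object would be stored under."""
    # S1: the storage request carries the first URL.
    path = urlparse(first_url).path.lstrip("/")
    user_bucket, _, object_key = path.partition("/")

    # S2: look up the RealBucket (and hence the first storage cluster).
    real_bucket = USER_BUCKET_TO_REAL_BUCKET[user_bucket]

    # S3: reorganize the first URL according to the cluster type.
    cluster_type = CLUSTER_TYPES[real_bucket.cluster_id]
    endpoint = CLUSTER_ENDPOINTS[real_bucket.cluster_id]
    if cluster_type == "ceph":
        # Assumed path-style layout: endpoint/bucket/key.
        return f"{endpoint}/{real_bucket.name}/{object_key}"
    raise ValueError(f"unsupported cluster type: {cluster_type}")
```

Note that no per-object entry is consulted: the lookup depends only on information carried in the first URL.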
The storage method for the distributed storage system provided in this embodiment is implemented by a distributed storage system, which may include a metadata center, a gateway module, and a plurality of storage clusters. The metadata center provides the metadata service and metadata storage for the distributed storage system. Metadata is data that describes objects and supports functions such as indicating storage locations, recording historical data, searching for resources, and recording files. Metadata can serve as an electronic catalog: to catalog data, its content or characteristics must be described and collected, which in turn assists data retrieval. The gateway module may be the entry point through which users store and access objects. It may be connected to a user interface through a network interface, where the network interface may include an S3 API and a Swift API, and may at the same time provide a high-performance Remote Procedure Call (RPC) interface. The user interface is the user-facing interface and can provide both a public network entrance and an intranet entrance.
Each storage cluster may be configured to store the content of the object, and each storage cluster may carry an identifier, such as a serial number, for performing an identifying function. The storage clusters can be classified according to the attribute information of objects stored inside, and the attribute information of the objects can include priority, heat and the like, so that the storage clusters can be classified according to the priority, namely, high-priority storage clusters, medium-priority storage clusters and low-priority storage clusters, and can also be classified according to the heat, namely, hot storage clusters, warm storage clusters and cold storage clusters. The warm storage cluster may be referred to as a normal storage cluster, and the cold storage cluster may be referred to as an archive storage cluster. There may be multiple storage clusters belonging to the same class, for example, there may be multiple hot storage clusters, multiple warm storage clusters, and multiple cold storage clusters. The plurality of storage clusters belonging to the same class may carry a uniform identifier for indicating the class and an individual identifier for distinguishing other storage clusters in the same class. Due to the classification of the storage clusters according to the categories, the distributed storage system can naturally support the hierarchical storage of the data.
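The two-part identifier described above (a uniform identifier for the class plus an individual identifier within the class) can be modelled as below. This is only a sketch; the tier names and the `tier-serial` identifier format are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCluster:
    tier: str    # uniform identifier shared by the class, e.g. "hot", "warm", "cold"
    serial: int  # individual identifier distinguishing clusters within the class

    @property
    def cluster_id(self) -> str:
        return f"{self.tier}-{self.serial}"


def group_by_tier(clusters):
    """Group cluster ids by class, which is the basis for hierarchical storage."""
    tiers = defaultdict(list)
    for cluster in clusters:
        tiers[cluster.tier].append(cluster.cluster_id)
    return dict(tiers)
```

For example, two hot clusters and one cold cluster group into `{"hot": ["hot-1", "hot-2"], "cold": ["cold-1"]}`.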
Step S1 is executed first. The object to be stored is the object that needs to be written to the distributed storage system; its content may be data, a file, or the like. The user may send the storage request for the object to the gateway module through the user interface, and the storage request may carry a first Uniform Resource Locator (URL) of the object to be stored, which is a link address based on the HTTP protocol. The first URL may carry information related to the object to be stored. After receiving the storage request, the gateway module classifies it to determine whether it is a Swift request or an S3 request, that is, through which type of user interface the storage request was received. The storage request is then processed according to its type to identify the first URL it contains and the related information of the object to be stored.
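One possible way for a gateway to classify a request is by the shape of the URL path. The sketch below assumes Swift-style paths of the form `/v1/{account}/{container}/{object}` and S3 path-style paths of the form `/{bucket}/{key}`; the source does not say how classification is done, and a real gateway would likely also inspect headers, so this is a heuristic only.

```python
from urllib.parse import urlparse


def classify_request(first_url: str) -> dict:
    """Classify a storage request as Swift or S3 and extract its fields."""
    path = urlparse(first_url).path
    parts = path.strip("/").split("/", 3)
    # Assumed Swift layout: /v1/{account}/{container}/{object}
    if len(parts) == 4 and parts[0] == "v1":
        return {
            "api": "swift",
            "account": parts[1],
            "user_bucket": parts[2],
            "object_key": parts[3],
        }
    # Otherwise treat the path as S3 path-style: /{bucket}/{key}
    bucket, _, key = path.lstrip("/").partition("/")
    return {"api": "s3", "user_bucket": bucket, "object_key": key}
```

An S3 bucket that happens to be named `v1` would be misclassified by this heuristic, which is one reason to treat it as illustrative.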
Then step S2 is executed: the RealBucket in which the object to be stored is to be stored is looked up in the metadata center according to the first URL, where the RealBucket belongs to the first storage cluster. Since the metadata center stores the metadata information of objects, the metadata information matching the information carried in the first URL can be looked up there to determine the RealBucket, and the storage cluster to which that RealBucket belongs is the first storage cluster. Here, a RealBucket is an actual storage location for objects within a storage cluster of the distributed storage system; each storage cluster may contain a plurality of RealBuckets.
Finally, step S3 is executed: the first URL is reorganized into a second URL using the cluster type of the first storage cluster. The cluster types of storage clusters may include the Ceph type, the Swift type, and other types. Because storage clusters of different types require different URLs for storing or retrieving objects, the first URL needs to be reorganized according to the cluster type of the first storage cluster, so that the object to be stored can be stored in the first storage cluster through the resulting second URL.
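The type-dependent reorganization in step S3 might look like the sketch below. The URL layouts are assumptions: an S3 path-style layout for Ceph clusters and a Swift-style `/v1/{account}/{container}/{object}` layout for Swift clusters. The function name, the parameters and the `AUTH_demo` account are hypothetical.

```python
def reorganize_url(cluster_type: str, endpoint: str, real_bucket: str,
                   object_key: str, account: str = "AUTH_demo") -> str:
    """Build the second URL in the form expected by the target cluster type."""
    if cluster_type == "ceph":
        # Assumed S3 path-style layout: endpoint/bucket/key
        return f"{endpoint}/{real_bucket}/{object_key}"
    if cluster_type == "swift":
        # Assumed Swift layout: endpoint/v1/account/container/object
        return f"{endpoint}/v1/{account}/{real_bucket}/{object_key}"
    raise ValueError(f"unsupported cluster type: {cluster_type}")
```

The same RealBucket and object key therefore yield different second URLs depending on the cluster type, which is why the gateway cannot simply forward the first URL unchanged.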
In the storage method provided by this embodiment, the distributed storage system comprises a metadata center and a plurality of storage clusters, and the RealBucket in which the object to be stored is to be stored is looked up in the metadata center according to the first URL. The RealBucket can therefore be determined quickly, and because no data index for the object to be stored needs to be introduced during storage, the storage space needed for metadata is greatly reduced, storage scale and complexity are reduced, and construction and maintenance costs are lowered. When the system needs to be expanded, it can be expanded without affecting system performance. The advantages of the storage method are particularly significant for object storage at the trillion-object level and above.
On the basis of the foregoing embodiment, in the storage method for a distributed storage system provided in this embodiment, looking up the RealBucket in which the object to be stored is to be stored, in the metadata center and according to the first URL, includes:
determining a virtual storage location corresponding to the object to be stored according to the user bucket information in the first URL;
determining, based on the virtual storage location, the RealBucket corresponding to the virtual storage location from the preset mapping relationship stored in the metadata center, and taking that RealBucket as the RealBucket in which the object to be stored is to be stored.
The first URL of the object to be stored, carried by the storage request, includes user bucket information. The user bucket information may be information that characterizes the user bucket (UserBucket/UserContainer, UB/UC) corresponding to the object to be stored. A user bucket is a virtual bucket: a virtual concept at the storage layer, allocated to a user, that lets the user treat all storage clusters in the distributed storage system as a single whole without distinguishing between them. In this embodiment, isolation within the distributed storage system is based on the user account (Account): each user has one user account in the distributed storage system, which can be obtained by registration, and user accounts correspond one-to-one with user buckets. The user bucket of every object under a user account is the user bucket corresponding to that account. Thus, the user account may also be included in the first URL.
In the distributed storage system, each user bucket may correspond to a plurality of virtual storage sets (HashBuckets, HBs), and each virtual storage set may include one or more virtual storage locations (HashBucket, HB). Each virtual storage location represents a logical storage location of an object in the user bucket. The virtual storage location corresponding to the object to be stored can be determined from the user bucket information corresponding to that object.
In this embodiment, identification information can be allocated to each virtual storage location. To determine the virtual storage location of the object to be stored, a corresponding algorithm is applied to the first URL, according to the user bucket information of the object, to obtain a numerical value; the virtual storage location whose identification information corresponds to that value is the virtual storage location of the object to be stored.
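The source does not name the algorithm that turns the first URL into a numerical value. A common choice for this kind of placement is a stable hash taken modulo the number of virtual storage locations; the sketch below assumes SHA-256 and a hypothetical identifier format `{user_bucket}/hb-NNNN`.

```python
import hashlib


def virtual_storage_location(user_bucket: str, object_key: str,
                             num_locations: int) -> str:
    """Map an object to one of the user bucket's virtual storage locations."""
    # Hash the bucket and key; SHA-256 is an assumption, not the patent's algorithm.
    digest = hashlib.sha256(f"{user_bucket}/{object_key}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # the "numerical value"
    index = value % num_locations              # identification info of the location
    return f"{user_bucket}/hb-{index:04d}"
```

Because the result depends only on the URL contents, the same object always resolves to the same virtual storage location without any stored per-object index.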
In this embodiment, the metadata information stored by the metadata center is a preset mapping relationship, which represents the mapping between virtual storage locations and actual storage locations, where an actual storage location comprises a RealBucket and a storage cluster. The preset mapping relationship may therefore include a first mapping relationship between virtual storage locations and RealBuckets and a second mapping relationship between RealBuckets and storage clusters. Using the preset mapping relationship stored in the metadata center, the RealBucket corresponding to a virtual storage location, and the storage cluster where that RealBucket is located, can be determined. In this embodiment, RealBuckets on different storage clusters can correspond to the same user bucket, so the data of one user can be distributed across different storage clusters without the user being aware of it.
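The two-level preset mapping can be pictured as two small lookup tables. The sketch below uses plain dictionaries with hypothetical identifiers; the point is that the tables grow with the number of virtual storage locations and RealBuckets, not with the number of objects.

```python
# First mapping: virtual storage location -> RealBucket (hypothetical ids).
HB_TO_REAL_BUCKET = {
    "photos/hb-0000": "rb-0001",
    "photos/hb-0001": "rb-0001",
    "photos/hb-0002": "rb-0002",
}

# Second mapping: RealBucket -> storage cluster (hypothetical ids).
REAL_BUCKET_TO_CLUSTER = {
    "rb-0001": "hot-1",
    "rb-0002": "warm-1",
}


def resolve(virtual_location: str) -> tuple:
    """Return (RealBucket, storage cluster) for a virtual storage location."""
    real_bucket = HB_TO_REAL_BUCKET[virtual_location]
    return real_bucket, REAL_BUCKET_TO_CLUSTER[real_bucket]
```

Here the user bucket `photos` is spread over `hot-1` and `warm-1`, which illustrates how one user's data can reside on different storage clusters behind a single user bucket.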
Finally, the RealBucket corresponding to the virtual storage location is taken as the RealBucket in which the object to be stored is to be stored, and the storage cluster where that RealBucket is located is the first storage cluster.
In the storage method provided by this embodiment, the RealBucket corresponding to the virtual storage location, and the first storage cluster where it is located, are determined through the preset mapping relationship stored in the metadata center. The RealBucket and the first storage cluster can therefore be determined quickly, and no data index for the object to be stored needs to be introduced during storage. This greatly reduces the storage space needed for metadata, reduces storage scale and complexity, and lowers construction and maintenance costs. When the system needs to be expanded, it can be expanded without affecting system performance. The advantages of the storage method are particularly significant for object storage at the trillion-object level and above. In addition, compared with centralized metadata storage in the prior art, the preset mapping relationship stored in the metadata center avoids a single-point performance bottleneck, so the metadata service can be scaled horizontally.
On the basis of the foregoing embodiment, in the storage method for a distributed storage system provided in this embodiment, reorganizing the first URL into a second URL using the cluster type of the first storage cluster includes:
reorganizing the first URL into the second URL, in combination with the RealBucket in which the object to be stored is to be stored and the first storage cluster, based either on the cluster type of the first storage cluster alone, or on the cluster type of the first storage cluster together with authorization information obtained by authenticating the user.
After the RealBucket in which the object to be stored is to be stored has been found in the metadata center, the RealBucket and the first storage cluster where it is located are sent to the gateway module of the distributed storage system. On receiving them, the gateway module may reorganize the first URL according to the cluster type of the first storage cluster, in combination with the RealBucket and the first storage cluster. The reorganized second URL thus includes the RealBucket and the first storage cluster where it is located, and is adapted to the type of the first storage cluster. The gateway module then forwards the object to be stored to the first storage cluster according to the second URL, so that the object is stored successfully.
Alternatively, on receiving the RealBucket and the first storage cluster where it is located, the gateway module may reorganize the first URL according to both the type of the first storage cluster and authorization information obtained by authenticating the user, again in combination with the RealBucket and the first storage cluster. In this case the first URL is reorganized only if the user passes authentication. If authentication fails, no authorization information is obtained, so the first URL cannot be reorganized and there is no second URL with which to store the object. The reorganized second URL includes the RealBucket, the first storage cluster where the RealBucket is located, and the authorization information, and is adapted to the type of the first storage cluster. The gateway module then forwards the object to be stored to the first storage cluster according to the second URL, so that the object is stored successfully.
It should be noted that, in the embodiment of the present invention, the identity of the user may be authenticated through the unified authentication center in the distributed storage system, and the unified authentication center returns the authorization information after the authentication is passed. Because the bottom storage space of the distributed storage system is constructed based on heterogeneous multi-clusters, one identity authentication system is executed for each cluster, the identity authentication system is redundant, and a large number of operation and maintenance problems are caused, so that a unified authentication center is adopted, identity authentication related operations are managed on a User Interface (UI) layer, and a non-high-authority manager cannot bypass the UI layer to directly access bottom cluster data.
In the embodiment of the invention, a scheme for realizing the storage of the object to be stored by forwarding through the gateway module is provided, and the type of the first storage cluster is considered, so that the object to be stored can be smoothly stored. Moreover, the identity authentication of the user can be introduced, the first URL can be reorganized only after the authorization information is obtained, the gateway module stores the object to be stored to the first storage cluster according to the second URL, the storage main body of the object to be stored is limited, the safety of the object to be stored is improved, and the storage isolation of the object to be stored based on the user information is realized.
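The URL reorganization described above can be illustrated with a minimal sketch. It assumes the conventional S3 path-style layout (`/{bucket}/{key}`) and the conventional Swift layout (`/v1/{account}/{container}/{object}`). The endpoint names, the `auth` query parameter and the `AUTH_demo` account are hypothetical and are not taken from the patent text.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def reorganize_url(cluster_endpoint: str, cluster_type: str, real_bucket: str,
                   object_key: str, auth_info: Optional[str],
                   account: str = "AUTH_demo") -> str:
    """Build a second URL adapted to the type of the target storage cluster."""
    # No authorization information means authentication failed: refuse to reorganize.
    if auth_info is None:
        raise PermissionError("user is not authenticated; URL is not reorganized")
    key = quote(object_key, safe="/")
    if cluster_type == "s3":
        # S3 path-style addressing: /{bucket}/{key}
        path = f"/{real_bucket}/{key}"
    elif cluster_type == "swift":
        # Swift addressing: /v1/{account}/{container}/{object}
        path = f"/v1/{account}/{real_bucket}/{key}"
    else:
        raise ValueError(f"unsupported cluster type: {cluster_type}")
    # Hypothetical: carry the authorization information as a query parameter.
    return f"{cluster_endpoint.rstrip('/')}{path}?{urlencode({'auth': auth_info})}"
```

The same RealBucket and object key yield different second URLs depending on whether the first storage cluster exposes an S3 or a Swift interface, which is the adaptation step the gateway module is described as performing.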
On the basis of the foregoing embodiment, the storage method for a distributed storage system provided in the embodiment of the present invention further includes:
and increasing the number of RealBuckets in each storage cluster based on the running state information of each storage cluster, and allocating a plurality of virtual storage positions for each RealBucket.
It can be understood that the distributed storage system further includes a unified operation and maintenance center, which is used to obtain the running state information of each storage cluster in the distributed storage system. The running state information may include, but is not limited to, the logs of each storage cluster, system performance data, user access logs, and the like. When the distributed storage system needs to be expanded by adding RealBuckets to the storage clusters, the unified operation and maintenance center can monitor the load of the storage clusters according to the acquired running state information, automatically create new RealBuckets, and allocate a plurality of virtual storage positions to the RealBuckets in the storage clusters so as to balance the load among the storage clusters. One RealBucket can be allocated one or more virtual storage positions; at most, all the virtual storage positions in the virtual storage set corresponding to one UB/UC can be allocated to the same RealBucket.
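As a rough illustration of this expansion step, the sketch below creates a new RealBucket in the least-loaded cluster and reassigns half of the virtual storage positions of the most heavily populated RealBucket to it. The load metric, the "move half" policy and all names are assumptions made for illustration; the sketch also ignores the data movement that remapping existing positions would require in practice.

```python
from collections import Counter
from typing import Dict


def expand(vpos_to_bucket: Dict[int, str], bucket_to_cluster: Dict[str, str],
           cluster_load: Dict[str, float], new_bucket: str) -> str:
    """Add new_bucket to the least-loaded cluster and give it virtual positions."""
    # Choose the cluster whose reported load is lowest (hypothetical metric).
    target_cluster = min(cluster_load, key=cluster_load.get)
    bucket_to_cluster[new_bucket] = target_cluster
    # Find the RealBucket that currently holds the most virtual positions.
    counts = Counter(vpos_to_bucket.values())
    busiest = max(counts, key=counts.get)
    to_move = sorted(v for v, b in vpos_to_bucket.items() if b == busiest)
    # Reassign half of its positions to the new RealBucket.
    for vpos in to_move[: len(to_move) // 2]:
        vpos_to_bucket[vpos] = new_bucket
    return target_cluster
```

Because objects are addressed through virtual storage positions rather than directly through RealBuckets, only the mapping table changes here; the hash that selects a virtual position for an object stays the same.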
In addition, the unified operation and maintenance center can also be used for monitoring the number of virtual storage positions corresponding to the user buckets and the like.
In the embodiment of the invention, the number of RealBuckets in each storage cluster can be increased according to the running state information of each storage cluster, and a plurality of virtual storage positions are allocated to each RealBucket, which provides basic support for system capacity expansion.
On the basis of the above embodiments, when performing system capacity expansion, any storage cluster that supports the Swift interface or the S3 interface may be used as a storage cluster of the distributed storage system.
On the basis of the foregoing embodiment, in the storage method for a distributed storage system provided in the embodiment of the present invention, the storage cluster has an attribute of priority; the method further comprises the following steps:
and migrating the objects in the storage cluster with higher priority to the storage cluster with lower priority based on the storage time information and the attribute information of the objects.
It will be appreciated that the storage clusters may have a priority attribute, i.e. the storage clusters may be divided by priority into high-priority storage clusters for storing high-priority objects, medium-priority storage clusters for storing medium-priority objects, and low-priority storage clusters for storing low-priority objects.
The storage time information of an object refers to information such as the order in which the object was stored and how long it has been stored. When the distributed storage system is expanded, objects in a storage cluster with higher priority can be migrated to a storage cluster with lower priority according to the storage time information and the attribute information of the objects in each storage cluster. That is, an object that has been stored for a long time can, in combination with its attribute information, be considered to have a lower priority or to be less frequently accessed, and can therefore be migrated from its original higher-priority storage cluster to a lower-priority storage cluster.
In the embodiment of the invention, the objects in the storage cluster with higher priority are migrated to the storage cluster with lower priority, so that sufficient storage space can be provided for the subsequent objects with higher priority.
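A minimal sketch of such a migration decision is given below. The three tier names follow the high, medium and low priorities described above; the age threshold, the `pinned` attribute and the rule of moving an object down by exactly one tier are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

TIERS = ["high", "medium", "low"]  # storage cluster priorities, highest first


@dataclass
class StoredObject:
    key: str
    tier: str              # priority of the cluster currently holding the object
    age_days: int          # storage time information (hypothetical unit)
    pinned: bool = False   # attribute information (hypothetical flag)


def plan_migration(objects: List[StoredObject],
                   max_age_days: int = 30) -> List[Tuple[str, str, str]]:
    """Return (key, source tier, destination tier) for objects to move down."""
    plan = []
    for obj in objects:
        idx = TIERS.index(obj.tier)
        is_lowest = idx == len(TIERS) - 1
        # Old, unpinned objects are treated as lower priority and move one tier down.
        if not is_lowest and obj.age_days > max_age_days and not obj.pinned:
            plan.append((obj.key, obj.tier, TIERS[idx + 1]))
    return plan
```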
On the basis of the foregoing embodiment, the storage method for a distributed storage system provided in the embodiment of the present invention, where migrating an object in a storage cluster with a higher priority to a storage cluster with a lower priority based on storage time information and attribute information of the object, includes:
and merging and deduplicating the objects in the storage cluster with higher priority based on the storage time information and the attribute information of the objects, and migrating the processed objects to the storage cluster with lower priority.
It can be understood that, in the process of migrating objects, the objects to be migrated, that is, the objects in the storage cluster with higher priority, may first be merged and deduplicated, and the processed objects may then be migrated. This ensures that the objects occupy fewer storage resources after being migrated to the storage cluster with lower priority.
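One possible way to merge and deduplicate objects before migration is sketched below: identical contents are detected by their SHA-256 digest and stored once, and all objects are packed into a single blob with an index of offsets. The patent does not specify the merging or deduplication technique, so content hashing and the pack-with-index layout are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Tuple


def dedupe_and_pack(objects: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Pack objects into one blob, storing identical contents only once.

    Returns the packed blob and an index mapping each key to (offset, length).
    """
    blob = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    seen: Dict[str, Tuple[int, int]] = {}  # content digest -> (offset, length)
    for key in sorted(objects):
        data = objects[key]
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            # First time this content appears: append it to the blob.
            seen[digest] = (len(blob), len(data))
            blob.extend(data)
        # Duplicate contents reuse the offset of the first copy.
        index[key] = seen[digest]
    return bytes(blob), index
```

An object is read back from the packed blob with `blob[offset:offset + length]` using the entry stored for its key.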
On the basis of the above embodiment, the preset mapping relationship may include not only a first mapping relationship between the virtual storage location and the RealBucket, a second mapping relationship between the RealBucket and the storage cluster, but also a third mapping relationship between the user account corresponding to the object and the user bucket. Due to the existence of the third mapping relation, when the storage request or the access request does not contain the user account, the user account can be determined according to the contained user bucket so as to perform user identity authentication.
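The three mapping relationships can be pictured as three lookup tables, as in the sketch below. The class name, field names and sample values are hypothetical; only the three relationships themselves come from the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PresetMapping:
    vpos_to_real_bucket: Dict[int, str]     # first mapping relationship
    real_bucket_to_cluster: Dict[str, str]  # second mapping relationship
    user_bucket_to_account: Dict[str, str]  # third mapping relationship

    def resolve_account(self, user_bucket: str, account: Optional[str] = None) -> str:
        """Use the account in the request if present, else derive it from the user bucket."""
        if account is not None:
            return account
        return self.user_bucket_to_account[user_bucket]

    def locate(self, vpos: int) -> Tuple[str, str]:
        """Return (RealBucket, storage cluster) for a virtual storage position."""
        real_bucket = self.vpos_to_real_bucket[vpos]
        return real_bucket, self.real_bucket_to_cluster[real_bucket]
```

When a request carries only a user bucket, `resolve_account` supplies the account that is then passed to identity authentication.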
Prior-art cloud object storage systems usually adopt TiDB as a centralized metadata storage unit, so the performance of TiDB determines the performance of the entire storage system. As a KV database, TiDB offers good query performance and scalability and performs well when storing billions of objects. However, as the storage scale expands further and the data volume reaches the trillion level, any query function of the centralized metadata storage system, especially SQL queries, may become a performance bottleneck, because it is bounded by the system's storage, read and scan performance. Splitting the metadata store to speed it up may greatly increase the cost of the metadata storage system.
Therefore, on the basis of the foregoing embodiment, in the storage method for a distributed storage system provided in the embodiment of the present invention, the method further includes:
acquiring an access request of an object to be accessed, wherein the access request carries a third URL of the object to be accessed;
searching the metadata center, according to the third URL, for the RealBucket in which the object to be accessed is stored, wherein the RealBucket belongs to a second storage cluster;
reorganizing the third URL by using the cluster type of the second storage cluster to form a fourth URL, and reading the object to be accessed in the second storage cluster based on the fourth URL.
It can be understood that the embodiment of the present invention provides an access process for an object to be accessed, that is, how to determine the RealBucket of the object to be accessed in the distributed storage system and read the object from it smoothly. The embodiment of the invention does not limit the execution order of the access process and the storage process.
Firstly, an access request of an object to be accessed is obtained, wherein the access request carries a third URL of the object to be accessed. The object to be accessed is the object that the user requests to read, and its content may be data, a file, or the like. The access request may be transmitted by the user to the gateway module through the user interface, and the third URL it carries may be a link address generated based on the HTTP protocol. The access request may carry the user bucket information corresponding to the object to be accessed, and may also carry the user account corresponding to the object to be accessed.
Optionally, after receiving the access request, the gateway module classifies it to determine whether it is a Swift request or an S3 request, that is, through which type of user interface the access request was received. The access request is then processed according to its type so as to identify the user bucket information and the user account, corresponding to the object to be accessed, that it contains.
Secondly, the virtual storage position of the object to be accessed, namely the logical storage position of the object within the corresponding user bucket, can be determined according to the user bucket information corresponding to the object to be accessed. A hash algorithm is applied to the access request, using the user bucket information corresponding to the object to be accessed, to obtain a hash value; the virtual storage position corresponding to that hash value is the designated virtual storage position.
Finally, the RealBucket corresponding to the virtual storage position of the object to be accessed, and the storage cluster where that RealBucket is located, are determined according to the preset mapping relationship. The object to be accessed is then read, through the gateway module, from the corresponding RealBucket.
In the embodiment of the invention, the query and the access of the object to be accessed can be facilitated by presetting the mapping relation, the query and the access efficiency of the object to be accessed are improved, and the time consumed by the query and the access of the object is shortened.
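The lookup chain described in this access process (user bucket information, then virtual storage position, then RealBucket, then storage cluster) can be sketched as follows. The patent does not name the hash algorithm or state exactly which fields are hashed, so the use of SHA-256 over the user bucket and object key, the modulo step and the table layout are all assumptions.

```python
import hashlib
from typing import Dict, List, Tuple


def virtual_position(user_bucket: str, object_key: str, num_positions: int) -> int:
    """Hash the request fields to a virtual storage position in [0, num_positions)."""
    digest = hashlib.sha256(f"{user_bucket}/{object_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_positions


def locate(user_bucket: str, object_key: str,
           vpos_to_real_bucket: List[str],
           real_bucket_to_cluster: Dict[str, str]) -> Tuple[int, str, str]:
    """Resolve an object to (virtual position, RealBucket, storage cluster)."""
    vpos = virtual_position(user_bucket, object_key, len(vpos_to_real_bucket))
    real_bucket = vpos_to_real_bucket[vpos]                        # first mapping
    return vpos, real_bucket, real_bucket_to_cluster[real_bucket]  # second mapping
```

Because the position is computed from the request and the rest is two table lookups, locating an object does not require a per-object query against the metadata store, which is consistent with the efficiency benefit described above.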
Fig. 2 is a schematic structural diagram of a distributed storage system at the data flow level according to an embodiment of the present invention. As shown in Fig. 2, the distributed storage system may include a user interface and a network interface, where the user interface may be a public network portal or an intranet portal, and the network interface may be an S3 interface or a Swift interface. The network interface is connected to the gateway module, and the gateway module interacts with each cluster in the distributed storage system by forwarding storage requests or access requests. The clusters included in the distributed storage system include a hot storage cluster, a warm storage cluster, an archive cluster and a metadata cluster. The hot storage cluster may include a plurality of high-performance clusters, the warm storage cluster may include a plurality of ordinary storage clusters, and the archive cluster may include a plurality of low-power, high-capacity clusters. In the process of expanding the capacity of the distributed storage system, the files in the hot storage cluster can be deduplicated and small files can be merged, and the processed result is then stored in the warm storage cluster. In the metadata cluster, the metadata service handles the metadata processing operations for each object and stores the metadata of all objects in a unified manner. The distributed storage system also includes a unified operation and maintenance center, which collects the running state information of each storage cluster and updates the preset mapping relationship stored in the metadata cluster according to that information.
When forwarding a storage request or an access request, the gateway module can use the authorization information obtained when the unified authentication center authenticates the identity of the user, so as to improve the confidentiality of the object.
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke the logic instructions in the memory 330 to perform a storage method for a distributed storage system comprising a metadata center and a plurality of storage clusters, the method comprising: acquiring a storage request of an object to be stored, wherein the storage request carries a first URL of the object to be stored; searching the metadata center, according to the first URL, for the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster; reorganizing the first URL by using the cluster type of the first storage cluster to form a second URL, and storing the object to be stored to the first storage cluster based on the second URL.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the storage method for a distributed storage system provided by the above-mentioned method embodiments, where the distributed storage system includes a metadata center and a plurality of storage clusters, and the method includes: acquiring a storage request of an object to be stored, wherein the storage request carries a first URL of the object to be stored; searching the metadata center, according to the first URL, for the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster; reorganizing the first URL by using the cluster type of the first storage cluster to form a second URL, and storing the object to be stored to the first storage cluster based on the second URL.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the storage method for a distributed storage system provided in the foregoing embodiments, where the distributed storage system includes a metadata center and a plurality of storage clusters, and the method includes: acquiring a storage request of an object to be stored, wherein the storage request carries a first URL of the object to be stored; searching the metadata center, according to the first URL, for the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster; reorganizing the first URL by using the cluster type of the first storage cluster to form a second URL, and storing the object to be stored to the first storage cluster based on the second URL.
The above-described embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A storage method for a distributed storage system, the distributed storage system comprising a metadata center and a plurality of storage clusters, comprising:
acquiring a storage request of an object to be stored, wherein the storage request carries a first URL of the object to be stored;
searching the metadata center, according to the first URL, for the RealBucket in which the object to be stored is to be stored, wherein the RealBucket belongs to a first storage cluster;
reorganizing the first URL by using the cluster type of the first storage cluster to form a second URL, and storing the object to be stored to the first storage cluster based on the second URL.
2. The storage method for the distributed storage system according to claim 1, wherein the searching for the RealBucket where the object to be stored is stored from the metadata center according to the first URL includes:
determining a virtual storage position corresponding to the object to be stored according to the user bucket information in the first URL;
and determining, based on the virtual storage position, the RealBucket corresponding to the virtual storage position from a preset mapping relationship stored in the metadata center, and taking that RealBucket as the RealBucket in which the object to be stored is to be stored.
3. The storage method for the distributed storage system according to claim 2, wherein said reorganizing the first URL to form a second URL using the cluster type of the first storage cluster comprises:
and reorganizing the first URL to form the second URL, in combination with the RealBucket in which the object to be stored is to be stored and the first storage cluster, either by using the cluster type of the first storage cluster, or based on the cluster type of the first storage cluster and authorization information obtained by authenticating the identity of a user.
4. The storage method for the distributed storage system according to claim 1, further comprising:
and increasing the number of RealBuckets in each storage cluster based on the running state information of each storage cluster, and allocating a plurality of virtual storage positions for each RealBucket.
5. The storage method for the distributed storage system according to claim 2, wherein the preset mapping relationship comprises a first mapping relationship between a logical storage location and a RealBucket and a second mapping relationship between the RealBucket and a storage cluster.
6. The storage method for the distributed storage system according to any one of claims 1 to 5, wherein each of the storage clusters has an attribute of priority;
the method further comprises the following steps:
and migrating the objects in the storage cluster with higher priority to the storage cluster with lower priority based on the storage time information and the attribute information of the objects.
7. The storage method for the distributed storage system according to claim 6, wherein migrating the object in the storage cluster with higher priority to the storage cluster with lower priority based on the storage time information and the attribute information of the object comprises:
and merging and deduplicating the objects in the storage cluster with higher priority based on the storage time information and the attribute information of the objects, and migrating the processed objects to the storage cluster with lower priority.
8. The storage method for the distributed storage system according to any one of claims 1 to 5, further comprising:
acquiring an access request of an object to be accessed, wherein the access request carries a third URL of the object to be accessed;
searching the metadata center, according to the third URL, for the RealBucket in which the object to be accessed is stored, wherein the RealBucket belongs to a second storage cluster;
reorganizing the third URL by using the cluster type of the second storage cluster to form a fourth URL, and reading the object to be accessed in the second storage cluster based on the fourth URL.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the storage method for a distributed storage system according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the storage method for a distributed storage system according to any one of claims 1 to 8.
CN202011101715.8A 2020-10-15 2020-10-15 Storage method, electronic device and storage medium for distributed storage system Pending CN112162707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011101715.8A CN112162707A (en) 2020-10-15 2020-10-15 Storage method, electronic device and storage medium for distributed storage system

Publications (1)

Publication Number Publication Date
CN112162707A true CN112162707A (en) 2021-01-01

Family

ID=73867109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011101715.8A Pending CN112162707A (en) 2020-10-15 2020-10-15 Storage method, electronic device and storage medium for distributed storage system

Country Status (1)

Country Link
CN (1) CN112162707A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537076A (en) * 2014-12-31 2015-04-22 北京奇艺世纪科技有限公司 File reading and writing method and device
CN109314721A (en) * 2016-11-16 2019-02-05 华为技术有限公司 The management of multiple clusters of distributed file system
CN110198225A (en) * 2018-02-27 2019-09-03 中移(苏州)软件技术有限公司 A kind of management method and management server of more clusters
CN110908590A (en) * 2018-09-17 2020-03-24 中国电力科学研究院有限公司 Distributed storage method and system for transformer substation data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860186A (en) * 2021-02-05 2021-05-28 中国科学技术大学 Capacity expansion method for billion-level object storage bucket
CN113986116A (en) * 2021-09-07 2022-01-28 广东珠江智联信息科技股份有限公司 Distributed storage system and data management method based on distributed storage system
CN114089917A (en) * 2021-11-19 2022-02-25 中国电信集团系统集成有限责任公司 Distributed object storage cluster, capacity expansion method and device thereof, and electronic equipment
CN117081931A (en) * 2023-10-17 2023-11-17 之江实验室 Online capacity expansion method and device for heterogeneous distributed storage system
CN117081931B (en) * 2023-10-17 2024-01-09 之江实验室 Online capacity expansion method and device for heterogeneous distributed storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination