CN113886331A

CN113886331A - Distributed object storage method and device, electronic equipment and readable storage medium

Info

Publication number: CN113886331A
Application number: CN202111460680.1A
Authority: CN
Inventors: 臧林劼
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2022-01-04
Anticipated expiration: 2041-12-03
Also published as: CN113886331B

Abstract

The application discloses a distributed object storage method and device, electronic equipment, and a readable storage medium. The method comprises pre-building a distributed object hierarchical storage architecture that includes a primary bucket and a plurality of secondary buckets, where the primary bucket and each secondary bucket are mapped to each other through a consistent hash algorithm; each secondary bucket stores user data, the object metadata of each secondary bucket is mapped to different shards of that bucket, and each shard stores metadata index information for its secondary bucket. When a user read-write request is received, the request is processed based on the distributed object hierarchical storage architecture. Without increasing the cost of the distributed object storage system, the method and device improve the efficiency of traversing object buckets that hold very large numbers of objects, and thereby improve the performance of the distributed object storage system.

Description

Distributed object storage method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of storage technologies, and in particular, to a distributed object storage method and apparatus, an electronic device, and a readable storage medium.

Background

With the rapid development of big data and cloud technology, data volumes keep growing. Traditional storage technology is mature, performs well, and is highly available, but in the face of massive data it still suffers from poor scalability, high cost, single points of failure, and performance bottlenecks. Distributed storage technology is applied to meet the storage requirements of massive data. It stores files and data across a plurality of independent storage devices and can be divided into file storage, object storage, and block storage. A distributed storage system typically includes a metadata server, a client or application, and a data storage server. The client sends data read-write requests and caches file metadata and file data. The metadata server manages metadata and handles client requests. The data storage server stores file data and ensures its availability and integrity. The interaction between the client and the metadata server is "signaling interaction", and the interaction between the client and the data storage server is "media interaction". The metadata server obtains the basic configuration and state information of each storage server from the data storage servers. Object storage overcomes the poor shareability of block storage and the poor read-write performance of file storage, so distributed object storage combines fast read-write with easy sharing and is widely used in many storage scenarios.

In distributed object storage, RGW (RADOS Gateway, where RADOS is a Reliable Autonomic Distributed Object Store) is the gateway that provides object storage, i.e., the object storage gateway. The object storage gateway is the entry point of object storage. It is essentially an HTTP (HyperText Transfer Protocol) server, no different in kind from Nginx or Apache. Through this entry point, users access the distributed object store in a RESTful (Representational State Transfer) manner over HTTP. Internally, the object storage gateway calls the librados API (Application Programming Interface) to store and read data. The gateway provides object storage access interfaces compatible with AWS (Amazon Web Services) S3 and OpenStack Swift. Distributed object storage is typically used in internet scenarios, where one object store is usually shared by multiple users or tenants, and one user may create multiple buckets. Objects, i.e., the stored data such as pictures or videos, are stored in buckets. By analogy with conventional file storage, a bucket resembles a folder and an object resembles a file; the difference is that an object can only be stored in a bucket, and buckets cannot be nested. The read-write IO (Input/Output) performance of a single bucket is crucial in distributed object storage. When the number of objects in a bucket reaches millions or tens of millions, as some services require, listing (traversing the directory of) the bucket greatly degrades storage performance. To improve traversal efficiency for such large buckets, the related art usually replaces the hardware storage medium, for example replacing the original hard disk drive with an SSD (Solid State Disk). However, solid state disks are expensive, which greatly increases the cost of the whole distributed storage system.

In view of this, how to improve the traversal efficiency of object buckets holding very large numbers of objects, and thereby the performance of the distributed object storage system, without increasing its cost is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application provides a distributed object storage method, a distributed object storage device, an electronic device, and a readable storage medium, which improve the traversal efficiency of object buckets holding very large numbers of objects, and thereby the performance of the distributed object storage system, without increasing its cost.

In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:

An embodiment of the present invention provides a distributed object storage method, including:

pre-building a distributed object hierarchical storage architecture comprising a primary bucket and a plurality of secondary buckets;

when a user read-write request is received, processing the user read-write request based on the distributed object hierarchical storage architecture;

wherein the primary bucket and each secondary bucket are mapped to each other through a consistent hash algorithm; each secondary bucket stores user data, and the object metadata of each secondary bucket is mapped to different shards of that bucket; each shard stores metadata index information for its secondary bucket.

Optionally, after pre-building the distributed object hierarchical storage architecture including a primary bucket and a plurality of secondary buckets, the method further includes:

when a bucket creation request is detected, acquiring secondary bucket configuration information of a target primary bucket;

determining the total number of secondary buckets to create according to the secondary bucket configuration information;

acquiring the number of shards of each target secondary bucket;

and creating a plurality of target secondary buckets based on the total number of secondary buckets to create and the number of shards of each target secondary bucket, and establishing the mapping relation between each target secondary bucket and the target primary bucket according to the consistent hash algorithm.

Optionally, determining the total number of secondary buckets to create according to the secondary bucket configuration information includes:

obtaining a configured number and a configured threshold range by parsing the secondary bucket configuration information;

if the configured number is within the configured threshold range, calling a first relational expression to calculate the total number of secondary buckets to create;

and if the configured number is not within the configured threshold range, setting the total number of secondary buckets to create equal to the configured number.

Optionally, calling the first relational expression to calculate the total number of secondary buckets to create is:

determining the total number of secondary buckets to create according to $N = \text{bucket\_name} \times k$ and $N > N_0$;

where $N$ is the total number of secondary buckets to create and is the smallest integer of this form greater than $N_0$, $k$ is a positive integer, $N_0$ is the maximum value of the configured threshold range, and $\text{bucket\_name}$ is the configured number.

Optionally, after creating a plurality of target secondary buckets based on the total number of secondary buckets to create and the number of shards of each target secondary bucket, the method further includes:

receiving a bucket number adjustment request, and obtaining the current bucket number by parsing the bucket number adjustment request;

if the current bucket number is greater than the total number of created secondary buckets, calculating the number of buckets to add, creating the corresponding new secondary buckets based on that number and the number of shards of each new secondary bucket, and establishing the mapping relation between each new secondary bucket and the target primary bucket according to the consistent hash algorithm;

if the current bucket number is less than the total number of secondary buckets, calculating the number of buckets to delete and counting the candidate secondary buckets that store no data; if the number of candidate secondary buckets is greater than or equal to the number of buckets to delete, selecting that many secondary buckets from the candidates and deleting them; if the number of candidate secondary buckets is less than the number of buckets to delete, deleting all candidate secondary buckets, selecting from the target secondary buckets a number of secondary buckets that store data, migrating their data, and then deleting them; the number of such data-holding secondary buckets is the difference between the number of buckets to delete and the number of candidate secondary buckets.

Optionally, when a user read-write request is received, processing the user read-write request based on the distributed object hierarchical storage architecture includes:

when an object data access request is received, obtaining the name of the object to be accessed by parsing the object data access request;

determining a hash ring position through the consistent hash algorithm according to the name of the object to be accessed;

taking the hash ring position as the starting point, determining the secondary bucket to be read as the first secondary bucket encountered clockwise on the hash ring;

and, based on the name of the object to be accessed, determining the corresponding metadata information by traversing the shards of the secondary bucket to be read, and determining the storage location of the object in that secondary bucket according to the metadata information;

when an object data write request is received, obtaining identification information of the secondary bucket to be written and the data to be written by parsing the object data write request;

determining the hash ring position of the secondary bucket to be written through the consistent hash algorithm according to its identification information, so as to determine the mapping relation between the secondary bucket to be written and the corresponding primary bucket;

and writing the data to be written into the secondary bucket to be written according to that bucket's object placement rule, while establishing a mapping relation between the written data and a target shard of the secondary bucket to be written.

Optionally, writing the data to be written into the secondary bucket to be written based on the object placement rule of that bucket includes:

constructing a storage pool in advance, the storage pool comprising a data pool and an index pool, where the data pool contains a plurality of secondary buckets and stores user data, and the index pool stores the object index information of each secondary bucket;

and writing the data to be written into the secondary bucket to be written in the data pool, and updating the index pool.

Optionally, the storage pool further includes a transit data pool, and writing the data to be written into the secondary bucket to be written based on the object placement rule of that bucket includes:

determining whether the data to be written is larger than a preset data threshold;

if the data to be written is larger than the preset data threshold, dividing it into a plurality of data blocks and placing each data block into the transit data pool;

and writing each data block read from the transit data pool into the secondary bucket to be written.
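The write path with a data pool, an index pool, and a transit data pool can be sketched in Python as an in-memory model. This is a minimal illustration under assumptions, not the described system: the class name `StoragePool`, the dictionaries standing in for the pools, and the 4 MiB default for the preset data threshold are all hypothetical, since the text states neither a threshold value nor a pool API.

```python
from dataclasses import dataclass, field

# Hypothetical value for the "preset data threshold"; the text gives none.
CHUNK_THRESHOLD = 4 * 1024 * 1024  # 4 MiB


@dataclass
class StoragePool:
    """In-memory stand-in for the data pool, index pool, and transit data pool."""
    data_pool: dict = field(default_factory=dict)     # (bucket, name) -> list of blocks
    index_pool: dict = field(default_factory=dict)    # bucket -> {name: metadata}
    transit_pool: list = field(default_factory=list)  # staged data blocks

    def write_object(self, bucket: str, name: str, data: bytes,
                     threshold: int = CHUNK_THRESHOLD) -> int:
        """Write one object into a secondary bucket; return the number of blocks."""
        blocks = []
        if len(data) > threshold:
            # Large payload: split into blocks and stage them in the transit pool ...
            self.transit_pool.extend(
                data[i:i + threshold] for i in range(0, len(data), threshold))
            # ... then read each block back out of the transit pool into the bucket.
            while self.transit_pool:
                blocks.append(self.transit_pool.pop(0))
        else:
            blocks.append(data)
        self.data_pool[(bucket, name)] = blocks
        # Update the index pool so that listings of the bucket see the object.
        self.index_pool.setdefault(bucket, {})[name] = {
            "size": len(data), "blocks": len(blocks)}
        return len(blocks)

    def read_object(self, bucket: str, name: str) -> bytes:
        """Reassemble an object from its stored blocks."""
        return b"".join(self.data_pool[(bucket, name)])
```

A real gateway would persist blocks and index entries durably and concurrently; the sketch only shows the order of the steps: stage, drain into the secondary bucket, update the index.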

Another aspect of an embodiment of the present invention provides a distributed object storage apparatus, including:

the storage architecture building module, configured to pre-build a distributed object hierarchical storage architecture comprising a primary bucket and a plurality of secondary buckets, where the primary bucket and each secondary bucket are mapped to each other through a consistent hash algorithm; each secondary bucket stores user data, the object metadata of each secondary bucket is mapped to different shards of that bucket, and each shard stores metadata index information for its secondary bucket;

and the request processing module, configured to process a user read-write request based on the distributed object hierarchical storage architecture when the request is received.

An embodiment of the present invention further provides an electronic device including a memory and a processor, the processor being configured to implement the steps of the distributed object storage method according to any one of the foregoing when executing a computer program stored in the memory.

Finally, an embodiment of the present invention provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the distributed object storage method according to any one of the preceding claims.

In the technical solution provided by the application, buckets are layered, and the layering is completely transparent to users accessing bucket data. The primary bucket is mapped to the secondary buckets through a consistent hash algorithm, which keeps the number of objects in each secondary bucket relatively balanced and makes the access path to a bucket unique through hash calculation.

As the amount of object data in a bucket grows, traversing the bucket on the same storage medium takes much less service response time thanks to the hierarchical structure and the per-bucket shard information, which improves storage IO performance. Therefore, without increasing the cost of the distributed object storage system, the traversal efficiency of object buckets holding very large numbers of objects, and the performance of the distributed object storage system, are improved.

In addition, embodiments of the invention provide a corresponding device, electronic equipment, and readable storage medium for implementing the distributed object storage method, which makes the method more practical; the device, electronic equipment, and readable storage medium share its advantages.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic flowchart of a distributed object storage method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a distributed object hierarchical storage architecture according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the mapping relationship between a primary bucket and each secondary bucket according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the mapping between the object placement rules of a distributed object bucket and the storage pools according to an embodiment of the present invention;

Fig. 5 is a block diagram of a distributed object storage apparatus according to an embodiment of the present invention;

Fig. 6 is a block diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.

The method aims to solve the high traversal latency, slow or even unavailable service, and performance bottleneck that occur when buckets holding massive data are traversed in a distributed object storage cluster. This embodiment combines a consistent hash algorithm with bucket layering. On the same storage medium, it has a clear performance advantage when traversing buckets holding large numbers of objects, breaks through the performance bottleneck of traversing bucket objects, and effectively alleviates the slowdown of the storage system during such traversal. Various non-limiting embodiments of the present application are described in detail below.

Referring to fig. 1, fig. 1 is a schematic flow chart of a distributed object storage method according to an embodiment of the present invention, where the embodiment of the present invention includes the following:

S101: A distributed object hierarchical storage architecture comprising a primary bucket and a plurality of secondary buckets is pre-built.

The distributed object hierarchical storage architecture of the present application means that the distributed object storage system has a hierarchical structure: a conventional bucket for storing user data is divided into a primary bucket and secondary buckets, as shown in Fig. 2. The architecture is completely transparent to user service logic; when object data is stored into the primary bucket, the process is invisible to user services, and a unique mapping to a secondary bucket is established through a consistent hash algorithm. Single buckets and multiple buckets can both be layered in this way, and the consistent hash algorithm computes a pseudo-random storage path for each bucket object. The storage path of a given object does not change with business operations over time, which addresses the performance problem of traversing buckets that hold large numbers of objects. The primary bucket is a virtual, logical bucket; a secondary bucket is a physical bucket created for the user and is the bucket that actually stores user data. One primary bucket can correspond to a plurality of secondary buckets, each of which has a mapping relation with it. To keep the number of objects in each secondary bucket relatively balanced, this mapping is established through a consistent hash algorithm. A secondary bucket holds the user's object data and is identified by basic information and extended attribute information. The basic information, also called the metadata information of the object bucket, includes but is not limited to a quota attribute that can limit the bucket's capacity, the object placement rule, the used capacity of the bucket, and the number of objects in the bucket. The object placement rule attribute records the storage location of the object data. The extended attribute information is metadata generated by customizing the object bucket.

In this embodiment, the mapping between the primary bucket and each secondary bucket is established through a consistent hash algorithm. The consistent hash algorithm organizes the whole hash value space into a virtual ring. For example, assume the value space of a hash function $H$ is $0$ to $2^{32}-1$, i.e., the hash value is a 32-bit unsigned integer. The whole space is laid out clockwise on a circle: the point at the top of the circle represents 0, the first point to its right represents 1, and so on through 2, 3, 4, 5, 6, … up to $2^{32}-1$, so the first point to the left of 0 represents $2^{32}-1$ and the two ends of the range meet at the top of the circle. This circle of $2^{32}$ points is called a hash ring, as shown in Fig. 3. The identification information of each secondary bucket, such as its id, is hashed, and the hash value taken modulo $2^{32}$ gives the secondary bucket's position on the hash ring. The name of each object stored in a secondary bucket serves as a key and is hashed in the same way; its hash value determines the object's position on the ring, and the first secondary bucket encountered clockwise from that position is the secondary bucket in which the object is stored.
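A minimal Python sketch of the hash ring described above, assuming MD5 as the hash function $H$ (the text does not name one) and no virtual nodes. `ring_hash`, `HashRing`, and bucket ids such as `photos.0` are hypothetical names for illustration. Secondary bucket ids are hashed onto the $2^{32}$-point ring, and an object name is routed to the first secondary bucket at or after its position, wrapping past $2^{32}-1$ back to 0.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # hash space 0 .. 2^32 - 1


def ring_hash(key: str) -> int:
    """Map a string onto the ring (MD5 is an assumed choice of hash function)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % RING_SIZE


class HashRing:
    """Places secondary buckets on the ring and routes object names to them."""

    def __init__(self, bucket_ids):
        self._points = []  # sorted list of (ring position, bucket id)
        for bucket_id in bucket_ids:
            self.add_bucket(bucket_id)

    def add_bucket(self, bucket_id: str) -> None:
        bisect.insort(self._points, (ring_hash(bucket_id), bucket_id))

    def remove_bucket(self, bucket_id: str) -> None:
        self._points.remove((ring_hash(bucket_id), bucket_id))

    def locate(self, object_name: str) -> str:
        """Return the first secondary bucket clockwise from the object's position."""
        if not self._points:
            raise LookupError("no secondary buckets on the ring")
        pos = ring_hash(object_name)
        # First point whose position is >= pos; past the last point, wrap to index 0.
        idx = bisect.bisect_left(self._points, (pos, ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing([f"photos.{i}" for i in range(4)])
print(ring.locate("cat.jpg"))  # one of photos.0 .. photos.3, stable across calls
```

Keeping the points in a sorted list makes each lookup a binary search. Production systems commonly give each bucket several virtual nodes to even out arc sizes; this sketch omits that.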

Further, to improve the efficiency of listing large buckets, secondary buckets can be sharded. The object metadata of each secondary bucket is mapped onto different shards of that bucket, and each shard stores metadata index information for its secondary bucket; that is, shards have a mapping relation with objects, and the object metadata in a bucket is spread over the bucket's shards. Rather than relying on an asynchronous processing mechanism to handle large buckets after the fact, which has its own limitations, the sharding rule is set when a bucket is created: increasing the number of shards of a new bucket created for a user improves the list traversal performance of that bucket. The shard count cannot be chosen arbitrarily, because setting it too large or too small both affect access to the objects in the bucket. Those skilled in the art can choose a reasonable shard count at bucket creation according to actual conditions: if the shard count is too large while the bucket holds few objects, list traversal requests waste time on empty traversal; if the shard count is too small, each shard holds too many objects and traversal performance degrades.
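The shard idea can be sketched in Python as follows. The text does not specify how object metadata is assigned to shards, so the `shard_of` rule (hash of the object name modulo the shard count) is an assumption, as are the names `ShardedIndex` and `list_objects`. Each shard keeps its own index; an ordered listing sorts each shard and merges the results, so a listing always visits every shard, which is why too many shards waste effort on a nearly empty bucket.

```python
import hashlib
import heapq


def shard_of(object_name: str, num_shards: int) -> int:
    """Assumed rule: hash the object name and take it modulo the shard count."""
    digest = hashlib.md5(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


class ShardedIndex:
    """Metadata index of one secondary bucket, split across num_shards shards."""

    def __init__(self, num_shards: int):
        if num_shards < 1:
            raise ValueError("num_shards must be >= 1")
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]  # name -> metadata

    def put(self, object_name: str, metadata: dict) -> None:
        self.shards[shard_of(object_name, self.num_shards)][object_name] = metadata

    def get(self, object_name: str):
        """Point lookup: the shard rule identifies the single shard to search."""
        return self.shards[shard_of(object_name, self.num_shards)].get(object_name)

    def list_objects(self, prefix: str = "", limit: int = 1000):
        """Ordered listing: sort each shard independently, then k-way merge."""
        per_shard = (
            sorted(n for n in shard if n.startswith(prefix)) for shard in self.shards
        )
        merged = heapq.merge(*per_shard)
        return [name for _, name in zip(range(limit), merged)]
```

The source describes locating metadata by traversing the shards; with a deterministic shard rule such as the one assumed here, a point lookup can go straight to one shard, while a listing still has to visit them all.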

S102: When a user read-write request is received, the user read-write request is processed based on the distributed object hierarchical storage architecture.

In the application, before each service user stores data in the distributed object storage, the user and the user's public and private key information must be created in the distributed object storage system, along with the secondary buckets for the object data; each secondary bucket can store massive object data. User read-write requests include write requests, in which users ask to write data into the distributed object storage system, and read requests, in which users ask to access data stored in it.

In the technical solution provided by the embodiment of the invention, buckets are layered, and the layering is completely transparent to users accessing bucket data. The primary bucket is mapped to the secondary buckets through a consistent hash algorithm, which keeps the number of objects in each secondary bucket relatively balanced and makes the access path to a bucket unique through hash calculation.

It should be noted that the steps in the present application have no strict execution order; as long as the logical order is respected, they may be executed simultaneously or in some preset order. Fig. 1 is only an example and does not imply that its order is the only possible one.

It can be understood that after registering with the distributed object storage system, a user must create a physical bucket for storing business or other data before storing data in the system; that is, the distributed object storage system creates corresponding secondary buckets for the user. The following describes an implementation of creating secondary buckets for the user, which may include:

determining the total number of secondary buckets to create according to the secondary bucket configuration information;

acquiring the number of shards of each target secondary bucket;

and creating a plurality of target secondary buckets based on the total number of secondary buckets to create and the number of shards of each target secondary bucket, and establishing a mapping relation between each target secondary bucket and the target primary bucket according to the consistent hash algorithm.

In this embodiment, to avoid ambiguity, the secondary buckets created for the user corresponding to the bucket creation request are called target secondary buckets, and the primary bucket corresponding to them is called the target primary bucket. After the distributed object storage system starts the RGW object gateway service, it can first read the secondary bucket configuration information of the primary bucket, such as the configured number of secondary buckets. A configured threshold range can be preset for this number; for example, the configured number may be an integer from 1 to 16. The configured number and the configured threshold range are obtained by parsing the secondary bucket configuration information. If the configured number is within the configured threshold range, a first relational expression is called to calculate the total number of secondary buckets to create; if it is not, the total number of secondary buckets to create equals the configured number. The first relational expression calculates the total as follows:

the total number of secondary buckets to create is determined according to $N = \text{bucket\_name} \times k$ and $N > N_0$;

where $N$ is the total number of secondary buckets to create and is the smallest integer of this form greater than $N_0$, $k$ is a positive integer, $N_0$ is the maximum value of the configured threshold range, and $\text{bucket\_name}$ is the configured number.

For example, if the configured threshold range is $[1, 16]$, its maximum value is $N_0 = 16$. When $1 < \text{bucket\_name} < 16$, compute $N = \text{bucket\_name} \times k$ so that $N$ is the smallest such integer greater than 16, with $k$ a positive integer; when $\text{bucket\_name} \geq 16$ or $\text{bucket\_name} \leq 1$, $N = \text{bucket\_name}$.
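The rule above can be checked with a short Python function. It is a sketch under the stated example: `secondary_bucket_total` is a hypothetical name, the bounds default to the example range $[1, 16]$, and "within the range" is taken to mean strictly between the bounds, as in the example. The smallest positive $k$ with $\text{bucket\_name} \times k > N_0$ is $\lfloor N_0 / \text{bucket\_name} \rfloor + 1$.

```python
def secondary_bucket_total(bucket_name: int, lo: int = 1, hi: int = 16) -> int:
    """Total number N of secondary buckets to create for one primary bucket.

    bucket_name is the configured number (the identifier used in the text);
    lo and hi bound the configured threshold range, and N0 = hi.
    """
    if lo < bucket_name < hi:
        # Smallest positive k such that bucket_name * k > N0.
        k = hi // bucket_name + 1
        return bucket_name * k
    # Outside the open range: use the configured number unchanged.
    return bucket_name
```

For instance, a configured number of 5 gives $k = 4$ and $N = 20$, while a configured number of 8 gives $N = 24$, because $8 \times 2 = 16$ is not greater than 16.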

It can be understood that the consistent hash algorithm offers fault tolerance and scalability. If the number of secondary buckets decreases or increases, only a small amount of data has to be migrated, whereas in a conventional hash table adding or deleting a slot requires remapping almost all entries. The consistent hash algorithm therefore reduces the migration overhead when the number of slots changes. On this basis, this embodiment also provides a way to adjust the secondary buckets dynamically, which improves the storage performance of the whole distributed object storage system and the user experience, and may include the following:
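The migration claim can be illustrated with a small Python experiment that adds a 17th secondary bucket to 16 and counts how many objects change bucket under modulo placement (a conventional hash table) versus a hash ring. The function names, bucket ids, object names, and MD5 hash are assumptions for illustration, and no virtual nodes are used. On the ring, the only objects that move are those that now land on the new bucket, whereas modulo placement reassigns most objects.

```python
import hashlib
from bisect import bisect_left


def h32(key: str) -> int:
    """32-bit hash of a key (MD5 is an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def modulo_owner(name: str, buckets: list) -> str:
    """Conventional hash table: slot = hash mod number of buckets."""
    return buckets[h32(name) % len(buckets)]


def ring_owner(name: str, buckets: list) -> str:
    """Consistent hashing: first bucket clockwise from the object's position."""
    points = sorted((h32(b), b) for b in buckets)
    idx = bisect_left(points, (h32(name), ""))
    return points[idx % len(points)][1]


def moved_fraction(owner, names, before, after) -> float:
    """Fraction of objects whose bucket differs between the two layouts."""
    moved = sum(owner(n, before) != owner(n, after) for n in names)
    return moved / len(names)


names = [f"obj-{i}" for i in range(5000)]
before = [f"bucket.{i}" for i in range(16)]
after = before + ["bucket.16"]
print("modulo:", moved_fraction(modulo_owner, names, before, after))
print("ring:  ", moved_fraction(ring_owner, names, before, after))
```

The exact fractions depend on where the hashes fall, so the script prints them rather than claiming specific values.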

receiving a bucket number adjustment request, and obtaining the current bucket number by parsing the bucket number adjustment request;

if the current bucket number is greater than the total number of created secondary buckets, calculating the number of buckets to add, creating the corresponding new secondary buckets based on that number and the number of shards of each new secondary bucket, and establishing the mapping relation between each new secondary bucket and the target primary bucket according to the consistent hash algorithm;

if the current bucket number is less than the total number of secondary buckets, calculating the number of buckets to delete and identifying candidate secondary buckets that store no data; if the total number of candidate secondary buckets is greater than or equal to the number of buckets to delete, selecting from the candidates as many secondary buckets as need to be deleted and deleting them; if the total number of candidate secondary buckets is less than the number of buckets to delete, deleting all of the candidate secondary buckets, selecting from the target secondary buckets a number of data-holding secondary buckets to delete, migrating their data, and then deleting those buckets, where the number of data-holding secondary buckets to delete equals the number of buckets to delete minus the total number of candidate secondary buckets.
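The adjustment steps above can be expressed as a planning function. This is a sketch under stated assumptions: the names `AdjustPlan` and `plan_adjustment` are hypothetical, secondary buckets are given as a mapping from bucket id to stored object count, and when data-holding buckets must be removed it picks those with the fewest objects (an assumed policy in the spirit of preferring buckets with few objects). Data migration itself is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AdjustPlan:
    create: int = 0                                                # new secondary buckets
    delete_empty: list[str] = field(default_factory=list)         # candidates with no data
    migrate_then_delete: list[str] = field(default_factory=list)  # data-holding buckets


def plan_adjustment(current: int, object_counts: dict[str, int]) -> AdjustPlan:
    """Plan a change from len(object_counts) secondary buckets to `current`."""
    total = len(object_counts)
    if current > total:
        return AdjustPlan(create=current - total)
    to_delete = total - current
    empty = sorted(b for b, n in object_counts.items() if n == 0)
    if len(empty) >= to_delete:
        return AdjustPlan(delete_empty=empty[:to_delete])
    # Not enough empty buckets: delete all of them, then pick the data-holding
    # buckets with the fewest objects (assumed policy) to migrate and delete.
    holding = sorted((b for b, n in object_counts.items() if n > 0),
                     key=lambda b: (object_counts[b], b))
    return AdjustPlan(delete_empty=empty,
                      migrate_then_delete=holding[:to_delete - len(empty)])


counts = {"sub-0": 0, "sub-1": 5, "sub-2": 0, "sub-3": 2}
print(plan_adjustment(1, counts))
```

Shrinking from four buckets to one here deletes both empty buckets and migrates and deletes one data-holding bucket, because 3 buckets to delete minus 2 empty candidates leaves 1.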

In this embodiment, the number of secondary buckets can be adjusted in real time according to user demand: secondary buckets can be added on top of the existing ones, or some existing secondary buckets can be deleted. A new secondary bucket only needs to be created following the secondary bucket creation process. For deletion, to improve overall efficiency and convenience and to reduce the volume of migrated data, secondary buckets that store no objects or only a few objects, or secondary buckets that store only data due for deletion, are deleted first.

The foregoing embodiment does not limit how data is accessed. This embodiment provides an optional implementation of data access, which may include the following steps:

determining a hash ring position through the consistent hash algorithm according to the name of the object to be accessed;

starting from that hash ring position, moving clockwise along the hash ring to the nearest secondary bucket, which is taken as the secondary bucket to be read;

based on the name of the object to be accessed, determining the corresponding metadata information by traversing each shard of the secondary bucket to be read, and determining from that metadata information the storage position of the object within the secondary bucket to be read.

For ease of distinction, this embodiment refers to the secondary bucket in which the object targeted by an object access request is stored as the secondary bucket to be read. When object data is accessed, the object name (key) in the request is parsed, and the hash ring position is obtained from the key through the consistent hash algorithm. As shown in fig. 3, the nearest secondary bucket is then found by moving clockwise along the hash ring; this secondary bucket is the one storing the object. Once the secondary bucket is determined, the corresponding metadata information is located through the shard information, and the storage position of the object is determined according to the object storage rule in that metadata. Data can then be read from that position and returned to the client.
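The read path can be sketched as a clockwise ring lookup followed by a shard scan. This is a minimal illustration, not the patent's code: `HashRing`, `find_object`, the MD5 ring hash, and the representation of each shard as a dict from object name to metadata are all assumptions. `bisect_left` is used so that an object landing exactly on a bucket's position maps to that bucket.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # assumed ring hash: first 8 bytes of MD5 as an unsigned integer
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, secondary_buckets):
        self._ring = sorted((ring_hash(b), b) for b in secondary_buckets)
        self._positions = [p for p, _ in self._ring]

    def locate(self, object_name: str) -> str:
        # walk clockwise from the object's position to the nearest bucket,
        # wrapping past the largest position back to the start of the ring
        i = bisect.bisect_left(self._positions, ring_hash(object_name))
        return self._ring[i % len(self._ring)][1]


def find_object(ring, shards_by_bucket, object_name):
    """Return (secondary bucket, metadata), or (bucket, None) if absent."""
    bucket = ring.locate(object_name)
    for shard in shards_by_bucket[bucket]:      # traverse the bucket's shards
        metadata = shard.get(object_name)
        if metadata is not None:
            return bucket, metadata             # metadata gives the storage position
    return bucket, None


buckets = [f"sub-{i}" for i in range(8)]
ring = HashRing(buckets)
shards = {b: [{}, {}] for b in buckets}         # two metadata shards per bucket
target = ring.locate("photos/cat.jpg")
shards[target][1]["photos/cat.jpg"] = {"pool_offset": 4096, "size": 1024}
print(find_object(ring, shards, "photos/cat.jpg"))
```

Because the ring is built from sorted positions, the lookup does not depend on the order in which secondary buckets were registered.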

The foregoing embodiment does not limit how data is written in the distributed object storage system. This embodiment provides an implementation of data writing, which may include:

determining, through the consistent hash algorithm and according to the identification information of the secondary bucket to be written, the position of that secondary bucket on the hash ring, so as to determine the mapping relation between the secondary bucket to be written and its corresponding primary bucket;

writing the data to be written into the secondary bucket to be written based on that bucket's object placement rule, and establishing a mapping relation between the written data and the target shard of the secondary bucket to be written.

For ease of distinction, the secondary bucket into which the data of an object data write request is to be written is referred to as the secondary bucket to be written. The identification information of the bucket to be written in this embodiment may be, for example, its bucket id. The object placement rule is stored in the metadata information when the secondary bucket is created, and this metadata records where object data is stored. The write operation is then performed according to this predefined object placement rule.
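A write into a secondary bucket under an object placement rule might look like the sketch below. The patent does not specify the rule, so the CRC32-modulo rule, the class name `SecondaryBucket`, and its fields are hypothetical; the sketch only shows that a write stores the data and records its metadata in the target shard chosen by the rule.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class SecondaryBucket:
    bucket_id: str
    num_shards: int
    data: dict = field(default_factory=dict)    # object name -> object bytes
    shards: list = field(init=False)            # per-shard metadata index

    def __post_init__(self):
        self.shards = [{} for _ in range(self.num_shards)]

    def placement_rule(self, object_name: str) -> int:
        # assumed rule: stable CRC32 of the object name modulo the shard count
        return zlib.crc32(object_name.encode()) % self.num_shards

    def write(self, object_name: str, payload: bytes) -> int:
        shard_id = self.placement_rule(object_name)
        self.data[object_name] = payload
        # map the written object to its target shard by recording its metadata
        self.shards[shard_id][object_name] = {"size": len(payload)}
        return shard_id


bucket = SecondaryBucket("sub-3", num_shards=4)
print(bucket.write("photo.jpg", b"abc"))
```

A stable hash is used so that a later read recomputes the same shard; Python's built-in `hash()` on strings is randomized per process and would not be stable across restarts.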

As an optional implementation of this embodiment, the distributed object storage system may store objects of user service data in storage pools. This embodiment may pre-construct a storage pool that includes a data pool and an index pool. The data pool contains the secondary buckets (that is, each secondary bucket is created in the storage pool) and is used to store user data; the index pool is used to store the object index information of each secondary bucket. The mapping of the placement rules of the distributed object buckets to the storage pools is shown in FIG. 4. When an object is written into the storage pool, the underlying layer establishes the mapping relation between the primary bucket and the secondary bucket through the algorithm, writes the data into the secondary bucket to be written in the data pool, maps the data to the target shard of that secondary bucket, and updates the corresponding metadata index information in the index pool. Existing distributed object storage shards the object metadata index pool, but this alone does not solve the list-traversal performance problem triggered by services when a single bucket stores a very large number of objects. Adding the secondary-bucket layer together with the consistent hash algorithm improves the efficiency of traversing a bucket's object data.
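The split between a data pool and an index pool, and why it helps listing, can be illustrated with a toy model. This is a sketch under assumptions: `StoragePool` is hypothetical, plain dicts stand in for the pools, the CRC32 placement rule is assumed, and listing a primary bucket is modeled as lazily merging the sorted key streams of every shard of every secondary bucket with `heapq.merge`.

```python
import heapq
import itertools
import zlib


class StoragePool:
    """Toy pool: a data pool for object bytes and an index pool for shard metadata."""

    def __init__(self, secondary_buckets, shards_per_bucket):
        self.data_pool = {}                                        # (bucket, name) -> bytes
        self.index_pool = {b: [{} for _ in range(shards_per_bucket)]
                           for b in secondary_buckets}             # bucket -> shard indexes

    def write(self, bucket, name, payload):
        shards = self.index_pool[bucket]
        shard = shards[zlib.crc32(name.encode()) % len(shards)]    # assumed placement rule
        self.data_pool[(bucket, name)] = payload                   # user data -> data pool
        shard[name] = {"size": len(payload)}                       # update the index pool

    def list_objects(self, prefix="", limit=None):
        # each shard is small, so sorting it is cheap; heapq.merge then yields
        # one globally ordered listing across all shards of all secondary buckets
        streams = [sorted(n for n in shard if n.startswith(prefix))
                   for shards in self.index_pool.values() for shard in shards]
        return list(itertools.islice(heapq.merge(*streams), limit))


pool = StoragePool(["sub-0", "sub-1", "sub-2"], shards_per_bucket=2)
for i, name in enumerate(["d", "a", "c", "b", "ab"]):
    pool.write(f"sub-{i % 3}", name, b"x" * i)
print(pool.list_objects(limit=3))
```

A real index would keep each shard's keys ordered instead of sorting on every call; the point of the sketch is that a paginated listing only needs to read the head of many small shard streams rather than one huge index.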

To further improve data storage performance, the storage pool may further include a transit data pool. Accordingly, writing the data to be written into the secondary bucket to be written based on the object placement rule of that bucket may include:

if the data to be written is larger than a preset data threshold, dividing it into a plurality of data blocks and placing each data block into the transit data pool;

writing each data block read from the transit data pool into the secondary bucket to be written.

In this embodiment, the preset data threshold may be chosen flexibly according to the actual application scenario. Large files or large data are split into blocks before being written, and to further improve write efficiency, multiple threads may be invoked to write the blocks concurrently; the transit data pool stores the data blocks. After splitting, each data block is assigned a number. When a user reads such a large file, the corresponding data blocks are read first, reassembled in the order given by their numbers, and the assembled data is returned to the client, which effectively improves the read and write efficiency for large data.
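The block-splitting write and numbered reassembly can be sketched as follows. This is an illustration under assumptions: the function names, the threshold and block size values, the use of dicts for the transit data pool and the secondary bucket, and the `ThreadPoolExecutor` for concurrent writes are all hypothetical choices, not details given in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data: bytes, block_size: int):
    # number each block so it can be reassembled in order on read
    return [(i, data[off:off + block_size])
            for i, off in enumerate(range(0, len(data), block_size))]


def write_object(name, data, threshold, block_size, transit_pool, bucket, workers=4):
    """Store `data` in `bucket`; return the number of stored pieces (1 if small)."""
    if len(data) <= threshold:
        bucket[name] = data
        return 1
    blocks = split_into_blocks(data, block_size)
    for idx, block in blocks:
        transit_pool[(name, idx)] = block           # stage blocks in the transit pool

    def move(idx):
        # read a block from the transit pool and write it to the secondary bucket
        bucket[(name, idx)] = transit_pool.pop((name, idx))

    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(move, (idx for idx, _ in blocks)))  # concurrent writes
    return len(blocks)


def read_blocked_object(name, n_blocks, bucket):
    # read blocks by number and concatenate them back into one payload
    return b"".join(bucket[(name, i)] for i in range(n_blocks))


transit, bucket = {}, {}
payload = bytes(range(256)) * 10                    # 2,560 bytes
n = write_object("big.bin", payload, threshold=1024, block_size=1000,
                 transit_pool=transit, bucket=bucket)
print(n, read_blocked_object("big.bin", n, bucket) == payload)
```

Wrapping `executor.map` in `list` forces every write to finish and re-raises any exception from a worker thread, so a failed block write is not silently ignored.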

The embodiments of the present invention also provide an apparatus corresponding to the distributed object storage method, which makes the method more practical. The apparatus can be described from the perspective of functional modules and from the perspective of hardware. The distributed object storage apparatus provided by the embodiments of the present invention is introduced below; the apparatus described below and the method described above may be cross-referenced.

From the perspective of functional modules, refer to fig. 5, which is a structural diagram of a distributed object storage apparatus according to an embodiment of the present invention. In a specific implementation, the apparatus may include:

a storage architecture building module 501, configured to pre-build a distributed object hierarchical storage architecture including a primary bucket and a plurality of secondary buckets, where a mapping relation is established between the primary bucket and each secondary bucket through a consistent hash algorithm; each secondary bucket is used to store user data, and the object metadata of each secondary bucket is mapped to different shards of that bucket; each shard stores the metadata index information of its secondary bucket;

a request processing module 502, configured to process a user read-write request based on the distributed object hierarchical storage architecture when the request is received.

Optionally, in some implementations of this embodiment, the apparatus may include a bucket creation module, configured to: when a bucket creation request is detected, obtain the secondary bucket configuration information of a target primary bucket; determine the total number of secondary buckets to create according to that configuration information; obtain the number of shards of each target secondary bucket; and create a plurality of target secondary buckets based on the total number and the number of shards of each target secondary bucket, establishing the mapping relation between each target secondary bucket and the target primary bucket according to the consistent hash algorithm.

As an optional implementation of this embodiment, the bucket creation module may be further configured to: obtain the configured number and the configured threshold range by parsing the secondary bucket configuration information; if the configured number is within the configured threshold range, calculate the total number of secondary buckets using the first relational expression; and if the configured number is outside the configured threshold range, set the total number of secondary buckets equal to the configured number.

As an optional implementation of the foregoing embodiment, the bucket creation module may be further configured to determine the total number of secondary buckets to create according to N = bucket_name × k and N > N₀, where N is the total number of secondary buckets to be created, taken as the smallest such integer greater than N₀; k is a positive integer; N₀ is the maximum value of the configured threshold range; and bucket_name is the configured number.

As another optional implementation of this embodiment, the bucket creation module may further include a dynamic adjustment unit, configured to: receive a bucket number adjustment request and obtain the current bucket number by parsing it; if the current bucket number is greater than the total number of created secondary buckets, calculate the number of buckets to add, create the corresponding new secondary buckets based on that number and the number of shards of each new secondary bucket, and establish the mapping relation between each new secondary bucket and the target primary bucket according to the consistent hash algorithm; if the current bucket number is less than the total number of secondary buckets, calculate the number of buckets to delete and identify candidate secondary buckets that store no data; if the total number of candidates is greater than or equal to the number of buckets to delete, select that many secondary buckets from the candidates and delete them; if the total number of candidates is less than the number of buckets to delete, delete all candidates, select data-holding secondary buckets from the target secondary buckets, migrate their data, and then delete them, where the number of data-holding secondary buckets to delete equals the number of buckets to delete minus the total number of candidate secondary buckets.

Optionally, in other implementations of this embodiment, the request processing module 502 includes a data reading unit, configured to: when an object data access request is received, obtain the name of the object to be accessed by parsing the request; determine a hash ring position through the consistent hash algorithm according to that name; starting from that position, move clockwise along the hash ring to the nearest secondary bucket, which is the secondary bucket to be read; and, based on the object name, determine the corresponding metadata information by traversing each shard of the secondary bucket to be read, and determine from that metadata the storage position of the object within the secondary bucket to be read.

Optionally, in some other implementations of this embodiment, the request processing module 502 may further include a data writing unit, configured to: when an object data write request is received, obtain the identification information of the bucket to be written and the data to be written by parsing the request; determine, through the consistent hash algorithm and according to that identification information, the position of the secondary bucket to be written on the hash ring, so as to determine the mapping relation between the secondary bucket to be written and its corresponding primary bucket; and write the data into the secondary bucket to be written based on that bucket's object placement rule, establishing a mapping relation between the written data and the target shard of the secondary bucket to be written.

As an optional implementation of the foregoing embodiment, the data writing unit may be further configured to: construct a storage pool in advance, where the storage pool includes a data pool and an index pool, the data pool contains the secondary buckets and stores user data, and the index pool stores the object index information of each secondary bucket; and write the data to be written into the secondary bucket to be written in the data pool and update the index pool.

As an optional implementation of the foregoing embodiment, where the storage pool further includes a transit data pool, the data writing unit may be further configured to: determine whether the data to be written is larger than a preset data threshold; if so, divide the data into a plurality of data blocks and place each data block into the transit data pool; and write each data block read from the transit data pool into the secondary bucket to be written.

The functions of the functional modules of the distributed object storage apparatus according to the embodiment of the present invention may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.

Therefore, the embodiments of the present invention can improve the efficiency of traversing object data in buckets holding very large numbers of objects, and improve the performance of the distributed object storage system, without increasing its cost.

The above-mentioned distributed object storage apparatus is described from the perspective of functional modules, and further, the present application also provides an electronic device described from the perspective of hardware. Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device includes a memory 60 for storing a computer program; a processor 61, configured to implement the steps of the distributed object storage method as mentioned in any of the above embodiments when executing the computer program.

The processor 61 may include one or more processing cores, such as a 4-core or 8-core processor, and may also be a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 61 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 61 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 61 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 61 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 60 may include one or more computer-readable storage media, which may be non-transitory. The memory 60 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, the memory 60 may be an internal storage unit of the electronic device, for example a hard disk of a server. In other embodiments, the memory 60 may be an external storage device of the electronic device, such as a plug-in hard disk provided on a server, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card. Further, the memory 60 may include both an internal storage unit and an external storage device of the electronic device. The memory 60 may be used to store application software installed in the electronic device and various data, such as the code of the program that executes the distributed object storage method, and may also be used to temporarily store data that has been or will be output. In this embodiment, the memory 60 is at least used to store a computer program 601, which, after being loaded and executed by the processor 61, implements the relevant steps of the distributed object storage method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 60 may also include an operating system 602, data 603, and the like, stored either transiently or permanently. The operating system 602 may include Windows, Unix, Linux, and the like. The data 603 may include, but is not limited to, data corresponding to distributed object storage results.

In some embodiments, the electronic device may further include a display 62, an input/output interface 63, a communication interface 64 (also called a network interface), a power supply 65, and a communication bus 66. The display 62 and the input/output interface 63, such as a keyboard, belong to the user interface, which may optionally also include a standard wired interface, a wireless interface, and the like. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the electronic device and to present a visual user interface. The communication interface 64 may optionally include a wired interface and/or a wireless interface, such as a Wi-Fi interface or a Bluetooth interface, and is typically used to establish a communication link between the electronic device and other electronic devices. The communication bus 66 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 6, but this does not mean that there is only one bus or one type of bus.

Those skilled in the art will appreciate that the configuration shown in fig. 6 is not intended to be limiting of the electronic device and may include more or fewer components than those shown, such as a sensor 67 that performs various functions.

The functions of the functional modules of the electronic device according to the embodiments of the present invention may be specifically implemented according to the method in the above method embodiments, and the specific implementation process may refer to the description related to the above method embodiments, which is not described herein again.

It can be understood that, if the distributed object storage method in the above embodiments is implemented in the form of software functional units and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and executes all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrically erasable programmable ROM, a register, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a removable magnetic disk, a CD-ROM, and a magnetic or optical disk.

Based on this, an embodiment of the present invention further provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the distributed object storage method according to any one of the above embodiments.

The embodiments are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts can be cross-referenced. The apparatus and electronic device disclosed in the embodiments correspond to the disclosed method, so their description is relatively brief; for relevant details, refer to the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

A distributed object storage method, an apparatus, an electronic device, and a readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present application.

Claims

1. A distributed object storage method, comprising:

2. The distributed object storage method according to claim 1, after the pre-building of the distributed object hierarchical storage architecture including the primary bucket and the plurality of secondary buckets, further comprising:

acquiring the number of shards of each target secondary bucket;

3. The distributed object storage method of claim 2, wherein said determining a secondary bucket creation total based on said secondary bucket configuration information comprises:

4. The distributed object storage method of claim 3, wherein said invoking the first relationship to calculate the secondary bucket creation total is:

5. The distributed object storage method of claim 2, wherein after creating a plurality of target secondary buckets based on the secondary bucket creation total and the number of shards for each target secondary bucket, further comprising:

6. The distributed object storage method according to any one of claims 1 to 5, wherein when a user read-write request is received, processing the user read-write request based on the distributed object hierarchical storage architecture includes:

7. The distributed object storage method according to any one of claims 1 to 5, wherein when a user read-write request is received, processing the user read-write request based on the distributed object hierarchical storage architecture includes:

8. The distributed object storage method according to claim 7, wherein the writing the data to be written to the secondary bucket to be written based on the object placement rule of the secondary bucket to be written comprises:

9. The distributed object storage method of claim 8, wherein the storage pool further comprises a transit data pool, and wherein writing the data to be written to the secondary bucket to be written based on the object placement rule of the secondary bucket to be written comprises:

10. A distributed object storage apparatus, comprising:

11. An electronic device comprising a processor and a memory, the processor being configured to implement the steps of the distributed object storage method of any one of claims 1 to 9 when executing a computer program stored in the memory.

12. A readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the distributed object storage method according to any one of claims 1 to 9.