CN113010526A - Storage method and device based on object storage service

Info

Publication number: CN113010526A
Application number: CN202110421022.5A
Authority: CN
Inventors: 卢行; 杨瑞峰; 王豪迈; 胥昕; 翟静
Original assignee: Xsky Beijing Data Technology Corp ltd
Current assignee: Xsky Beijing Data Technology Corp ltd
Priority date: 2021-04-19
Filing date: 2021-04-19
Publication date: 2021-06-22

Abstract

The application discloses a storage method and device based on an object storage service. The method comprises: initializing a preset number of index fragments in a bucket of the object storage service, wherein the bucket is a container for storing target objects in the object storage service, the index fragments are used for storing the target objects, and the value corresponding to each index fragment falls uniformly within the value range of the data volume that the preset number of index fragments can store; and storing the target object to be stored into an index fragment according to a preset algorithm. The method and device solve the technical problem that, in current scenarios where an object storage service is used to store massive data, a single bucket cannot support dynamic expansion of that data.

Description

Storage method and device based on object storage service

Technical Field

The application relates to the field of computer information storage and calculation, in particular to a storage method and device based on object storage service.

Background

With the rapid development of internet applications, an increasingly large amount of unstructured data needs to be stored. Object storage services can provide a solution for mass storage, and supporting billions of objects in a single bucket is an important product specification issue they currently face. The bucket object index not only records object metadata but also establishes the index relationship between buckets and objects. In users' practical application scenarios, the service must support dynamic expansion of mass data while delivering efficient write and query speeds.

No effective solution has yet been proposed for the problem that, in current scenarios where an object storage service is used to store massive data, a single bucket cannot support dynamic expansion of that data.

Disclosure of Invention

The embodiment of the application provides a storage method and a storage device based on an object storage service, which at least solve the technical problem that a single storage bucket cannot support dynamic expansion of mass data in the current application scene of storing the mass data by using the object storage service.

According to an aspect of an embodiment of the present application, there is provided a storage method based on an object storage service, including: initializing a preset number of index fragments in a storage bucket of the object storage service, wherein the storage bucket is a container for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls into a value range of data volume which can be stored by the preset number of index fragments; and storing the target object to be stored to the index fragment according to a preset algorithm.

Optionally, storing the target object to be stored to the index fragment according to a preset algorithm, including: determining a hash value of the name of a target object to be stored; carrying out remainder operation on the value range of the preset number of index fragments by utilizing the hash value to obtain an operation result; and determining the index fragment for storing the target object to be stored according to the operation result.

Optionally, storing the target object to be stored to the index fragment according to a preset algorithm, further comprising: if the number of the target objects stored in the first index fragment exceeds a preset threshold value, splitting the first index fragment into target index fragments, wherein the target index fragments comprise a first index fragment and a second index fragment, and the first index fragment is any one of the index fragments with the preset number; and storing the target object to be stored to the target index fragment.

Optionally, splitting the first index fragment into target index fragments includes: determining an average value of a numerical value corresponding to the first index fragment and a numerical value corresponding to a third index fragment adjacent to the first index fragment; and taking the average value as the numerical value corresponding to the second index fragment.

Optionally, storing the target object to be stored to the target index shard includes: respectively writing target objects to be stored into a first index fragment, a second index fragment and a log; and deleting the repeated target objects stored on the first index fragment and the second index fragment.

Optionally, after writing the target object to be stored into the first index fragment, the second index fragment, and the log, respectively, the method further includes: acquiring a read request; reading a target object corresponding to the reading request from the first index fragment; and if the target object corresponding to the read request is not read from the first index fragment, reading the target object corresponding to the read request from the second index fragment.

Optionally, after storing the target object to be stored in the index fragment, the method further includes: caching the index fragments in the bucket in memory; and, when a target object stored in any one of the index fragments in the bucket changes, updating the target objects stored in the other index fragments in the bucket.

Optionally, after caching the index fragments in the buckets into the memory, the method further includes: inquiring a target object corresponding to an inquiry instruction from the index fragment, wherein the inquiry instruction is used for searching the target object from the storage bucket; and outputting the target object corresponding to the query instruction.

Optionally, querying a target object corresponding to the query instruction from the index shard includes: respectively determining the weight value of each index fragment; respectively reading a target object list from each index fragment according to the weight value; and sorting the target object list.

According to another aspect of the embodiments of the present application, there is also provided a storage apparatus based on an object storage service, including a setting module and a storage module, wherein the setting module is configured to initialize a preset number of index fragments in a bucket of an object storage service, the bucket is a container for storing target objects in the object storage service, the index fragments are used for storing the target objects, and the value corresponding to each index fragment falls uniformly within the value range of the data volume that the preset number of index fragments can store; and the storage module is configured to store the target object to be stored into an index fragment according to a preset algorithm.

According to still another aspect of the embodiments of the present application, there is also provided a non-volatile storage medium, where the non-volatile storage medium includes a stored program, and when the program runs, a device in which the non-volatile storage medium is located is controlled to execute the above storage method based on the object storage service.

According to still another aspect of the embodiments of the present application, there is also provided a processor configured to execute a program stored in a memory, where the program, when running, executes the above storage method based on the object storage service.

In the embodiments of the application, a preset number of index fragments are initialized in a bucket of an object storage service, wherein the bucket is a container for storing target objects in the object storage service, the index fragments are used for storing the target objects, and the value corresponding to each index fragment falls uniformly within the value range of the data volume that the preset number of index fragments can store; the target object to be stored is then stored into an index fragment according to a preset algorithm. A distributed hash algorithm thus handles mass data expansion during file reads and writes, achieving the technical effect that a single bucket in the object storage service supports mass data expansion, and thereby solving the technical problem that, in current scenarios where an object storage service is used to store massive data, a single bucket cannot support dynamic expansion of that data.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic diagram of writing data to a bucket using a general hash algorithm according to the related art;

FIG. 2 is a flow chart of a storage method based on an object storage service according to an embodiment of the present application;

FIG. 3 is a schematic diagram of initializing index shards in a bucket according to an embodiment of the present application;

FIG. 4 is a schematic diagram of storing data to an index shard according to an embodiment of the present application;

FIG. 5 is a schematic diagram of the index fragment splitting process during data writing according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a bucket metadata update process according to an embodiment of the present application;

FIG. 7 is a schematic diagram of querying data from buckets according to an embodiment of the present application;

FIG. 8 is a block diagram of a storage apparatus based on an object storage service according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In accordance with an embodiment of the present application, there is provided an embodiment of a storage method based on an object storage service. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.

In view of the technical problems mentioned in the Background section, existing solutions implement metadata management and file index queries with an ordinary hash algorithm. Specifically, a fixed number N of index files is initialized when a bucket is created. As shown in FIG. 1, when a file is uploaded, the hash value of the file name (s3.api) is calculated as 123123123122, taking this value modulo N gives 2, and the entry is written to fragment 2. Likewise, a query applies the same hash algorithm to the object name, finds fragment 2, and reads the corresponding metadata. This scheme allows files to be read quickly in distributed storage applications, but it has limitations in mass data scenarios and when data hot spots occur.
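As a non-limiting illustration, the fixed-modulo placement described above can be sketched as follows. The hash value 123123123122 and the file name s3.api come from the text; N = 10 is an assumed value, chosen only because it reproduces the stated result 2, and `fragment_for_hash` is a hypothetical helper name.

```python
def fragment_for_hash(hash_value: int, fragment_count: int) -> int:
    """Return the fragment number for a hash value using plain hash-mod-N."""
    if fragment_count <= 0:
        raise ValueError("fragment_count must be positive")
    return hash_value % fragment_count


# Hash value quoted in the text for the file name "s3.api".
HASH_OF_S3_API = 123123123122

# N is not stated in the source; 10 is an assumption that yields fragment 2.
N = 10

print(fragment_for_hash(HASH_OF_S3_API, N))  # prints 2
```

Because the fragment number depends directly on N, changing N generally changes the fragment of existing entries, which is the source of the redistribution cost discussed in the following paragraphs.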

In practice, one solution is to initialize a fixed number of fragments and, once a certain amount of data has been written to a single fragment (a single fragment holding too many files becomes complex to manage; the limit is generally 1,000,000), expand dynamically by doubling the number of fragments. However, the data must be write-protected while it is redistributed, which is a limitation. Another solution lets a single bucket hold more data by initializing more fragments and restricting fragment splitting, but it has two problems: 1) a single bucket still cannot support truly massive data; 2) it adds metadata management complexity when a single bucket stores little data.

The existing solutions do not solve the problem fundamentally and can hardly meet users' growing demand for a single bucket that supports massive data. Moreover, the number of objects a user will store in a single bucket is not known in advance, and a user's data volume typically grows gradually and then explosively while the storage service is in use.

The present application therefore introduces a dynamic indexing method for the storage service: a new distributed hash algorithm handles mass data expansion during file reads and writes, and index optimization speeds up traversal of the files in a single bucket, improving the user experience. The method is described in detail below:

FIG. 2 is a flowchart of a storage method based on an object storage service according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:

step S202, initializing a preset number of index fragments in a storage bucket of the object storage service, wherein the storage bucket is a container for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls into a value range of data quantity which can be stored by the preset number of index fragments;

It should be noted that the target object is the mass data to be stored.

FIG. 3 is a schematic diagram of initializing index fragments in a bucket according to an embodiment of the present application. As shown in FIG. 3, a certain number of index fragments are initialized by the bucket-creation service. When creating a bucket, a user may initialize a fixed number of fragments according to the bucket's attributes, at one of three levels: small (10 fragments), standard (100 fragments), and large (1000 fragments). The value range shared by all fragments is 0 to 2 to the power of 23 (the value range of the data amount stored by all index fragments), and the value corresponding to each index fragment falls uniformly between the minimum and maximum of that range. The level actually used is determined by the parameters passed in when the bucket is created.

For example, 10 index shards are initialized, and the value range of the 10 index shards is 0 to 10, so that the values corresponding to the 10 index shards are 0, 1, 2 …, and 9, respectively.
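The initialization described above can be sketched as follows, by way of example only. The three levels (10, 100, 1000) and the range 0 to 2^23 come from the text; the exact spacing rule is not stated in the source, so the equal integer step used here is an assumption, and `init_fragment_values` and `LEVELS` are hypothetical names.

```python
# Fragment counts for the three levels named in the text.
LEVELS = {"small": 10, "standard": 100, "large": 1000}

# Upper bound of the value range stated in the text (2 to the power of 23).
VALUE_RANGE = 2 ** 23


def init_fragment_values(fragment_count: int, value_range: int = VALUE_RANGE) -> list[int]:
    """Spread fragment values evenly over [0, value_range).

    Assumption: values are placed at equal integer steps starting from 0.
    """
    if fragment_count <= 0:
        raise ValueError("fragment_count must be positive")
    if value_range < fragment_count:
        raise ValueError("value_range is too small for distinct fragment values")
    step = value_range // fragment_count
    return [i * step for i in range(fragment_count)]


# Reproduces the worked example in the text: 10 fragments over the range 0 to 10.
print(init_fragment_values(10, 10))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A "standard" bucket over the full range.
standard = init_fragment_values(LEVELS["standard"])
```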

Step S204, storing the target object to be stored to the index fragment according to a preset algorithm.

Through the steps, the problem of mass data expansion in the process of reading and writing files is solved through the distributed hash algorithm, and therefore the technical effect that a single storage bucket supports mass data expansion in the object storage service is achieved.

According to an alternative embodiment of the present application, step S204 is implemented by: determining a hash value of the name of a target object to be stored; carrying out remainder operation on the value range of the preset number of index fragments by utilizing the hash value to obtain an operation result; and determining the index fragment for storing the target object to be stored according to the operation result.

FIG. 4 is a schematic diagram of storing data into index fragments according to an embodiment of the present application. As shown in FIG. 4, both writes and queries locate the fragment through a consistent hash algorithm. When an object is written, a hash value is calculated from the object name; for example, the hash value calculated for the object s3.api is 123123123122. Taking the remainder of this hash value modulo 2 to the power of 23 (i.e., the value range in step S202) yields a value a. The fragment into which a falls may be chosen in either a clockwise or a counterclockwise manner; this embodiment uses the clockwise manner. With the clockwise manner specified, the bucket metadata is queried and the object is determined to fall into fragment n2. The same procedure is used for both reading and writing data.

For example, if the value a obtained from the remainder operation is greater than the value corresponding to index fragment n1 and less than the value corresponding to index fragment n2, then, because fragments are selected clockwise, the object s3.api falls into index fragment n2.
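The clockwise selection described above can be sketched as follows, as a non-limiting example. The remainder modulo 2^23 and the rule "greater than n1 and less than n2 falls into n2" come from the text; treating a value equal to a fragment's value as belonging to that fragment, and wrapping past the largest value to the first fragment, are assumptions, and `locate_fragment` is a hypothetical name.

```python
import bisect

# Value range stated in the text (2 to the power of 23).
RING_SIZE = 2 ** 23


def locate_fragment(hash_value: int, fragment_values: list[int],
                    ring_size: int = RING_SIZE) -> int:
    """Return the value of the fragment that owns hash_value (clockwise manner).

    fragment_values must be sorted in ascending order.
    """
    if not fragment_values:
        raise ValueError("at least one fragment is required")
    a = hash_value % ring_size
    # First fragment whose value is >= a; equality is an assumed tie-break.
    i = bisect.bisect_left(fragment_values, a)
    # Assumption: past the largest value, wrap around to the first fragment.
    return fragment_values[i % len(fragment_values)]


# Small illustrative ring: fragments at 0, 100 and 200 over a range of 300.
ring = [0, 100, 200]
owner = locate_fragment(150, ring, ring_size=300)  # 150 lies between 100 and 200
```

Only the ordered list of fragment values is needed for the lookup, which is why the text has the bucket metadata queried before the fragment is chosen.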

According to another optional embodiment of the present application, when step S204 is executed, if the number of target objects stored in the first index fragment exceeds a preset threshold, the first index fragment is split into target index fragments, where the target index fragments include a first index fragment and a second index fragment, and the first index fragment is any one of the preset number of index fragments; and storing the target object to be stored to the target index fragment.

Optionally, splitting the first index fragment into the target index fragments is specifically implemented by the following method: determining an average value of a numerical value corresponding to the first index fragment and a numerical value corresponding to a third index fragment adjacent to the first index fragment; and taking the average value as a corresponding numerical value of the second index fragment.

FIG. 5 is a schematic diagram of the index fragment splitting process during data writing according to an embodiment of the present application. As shown in FIG. 5, an index fragment is split when the number of objects written to it exceeds a threshold. For example, the hash calculation for the written object object1 places it in fragment n3; when the number of objects in n3 reaches the threshold M (1,000,000), a split of n3 is triggered. The split creates a new fragment n31 whose value is the average of the values of n3 and n4.
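The split-point calculation described above can be sketched as follows, by way of example only. The threshold of 1,000,000 and the averaging of the two adjacent fragment values come from the text; integer division, the strict "exceeds" comparison, and the names `needs_split` and `insert_split` are assumptions. The sketch reproduces only the averaging and does not decide which objects move to the new fragment.

```python
import bisect

# Threshold M from the text.
SPLIT_THRESHOLD = 1_000_000


def needs_split(object_count: int, threshold: int = SPLIT_THRESHOLD) -> bool:
    """Assumption: a split is triggered once the count exceeds the threshold."""
    return object_count > threshold


def insert_split(fragment_values: list[int], first_value: int,
                 adjacent_value: int) -> tuple[int, list[int]]:
    """Compute the new fragment value and return it with the updated sorted list."""
    # Average of the two adjacent fragment values, rounded down (assumption).
    new_value = (first_value + adjacent_value) // 2
    if new_value in (first_value, adjacent_value):
        raise ValueError("no free value between the two fragments")
    updated = list(fragment_values)
    bisect.insort(updated, new_value)
    return new_value, updated


# n3 = 20 and its neighbour n4 = 30 give the new fragment n31 = 25.
n31, values = insert_split([0, 10, 20, 30], first_value=20, adjacent_value=30)
```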

In other optional embodiments of the present application, storing the target object to be stored to the target index shard includes: respectively writing target objects to be stored into a first index fragment, a second index fragment and a log; and deleting the repeated target objects stored on the first index fragment and the second index fragment.

According to an optional embodiment of the present application, after writing a target object to be stored into a first index fragment, a second index fragment, and a log, respectively, a read request needs to be obtained; reading a target object corresponding to the reading request from the first index fragment; and if the target object corresponding to the read request is not read from the first index fragment, reading the target object corresponding to the read request from the second index fragment.

While n3 is being split, writes go to both n3 and n31 (double write), and deduplication of n3 and n31 proceeds at the same time; the bucket's metadata is updated after the split completes. During the split, every metadata write or update destined for n3 is applied to both n31 and n3 and is also recorded in a log, which guarantees the reliability of the data. Because the bucket's metadata has not yet been updated at this point, read requests access the original fragment n3 first and access n31 only when the object is not found in n3. n3 and n31 perform deduplication by iterating over their contents and deleting objects that do not belong to them. After the iterations over n3 and n31 complete, some objects may have been missed by the iterators, so deduplication is completed by reading the log.

With this method, while an index fragment is being split, metadata is written to both the original index fragment and the new index fragment generated by the split, and is read from the original fragment first and then from the new one. This ensures the reliability of the data and avoids metadata loss during the split.
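The double-write, read-fallback and deduplication behaviour described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the disclosed implementation: the fragments and the log are modelled as plain Python containers, `SplittingPair` is a hypothetical class, and `owner_of` is a hypothetical callback that says which fragment an object name finally belongs to.

```python
from typing import Callable, Optional


class SplittingPair:
    """In-memory stand-in for an original fragment (n3) and its new fragment (n31)."""

    def __init__(self, owner_of: Callable[[str], str]) -> None:
        self.original: dict[str, dict] = {}   # models n3
        self.new: dict[str, dict] = {}        # models n31
        self.log: list[str] = []              # names written during the split
        self.owner_of = owner_of              # returns "original" or "new"

    def write(self, name: str, metadata: dict) -> None:
        # Double write: both fragments and the log receive the entry.
        self.original[name] = metadata
        self.new[name] = metadata
        self.log.append(name)

    def read(self, name: str) -> Optional[dict]:
        # Read the original fragment first; fall back to the new one.
        if name in self.original:
            return self.original[name]
        return self.new.get(name)

    def deduplicate(self) -> None:
        # Each fragment iterates over its entries and drops those it does not own.
        for name in list(self.original):
            if self.owner_of(name) != "original":
                del self.original[name]
        for name in list(self.new):
            if self.owner_of(name) != "new":
                del self.new[name]

    def replay_log(self) -> None:
        # Catch entries the iterators may have missed, then clear the log.
        for name in self.log:
            if self.owner_of(name) == "original":
                self.new.pop(name, None)
            else:
                self.original.pop(name, None)
        self.log.clear()


# Hypothetical ownership rule for the example: names starting with "b" move to n31.
pair = SplittingPair(lambda name: "new" if name.startswith("b") else "original")
pair.write("a.txt", {"size": 1})
pair.write("b.txt", {"size": 2})
pair.deduplicate()
pair.replay_log()
```

After deduplication, a read for an object that now lives only in the new fragment still succeeds through the fallback path, which matches the read order given in the text.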

By splitting individual fragments, this method solves the write-protection problem of the existing scheme shown in FIG. 1 and its inability to let a single bucket store mass data.

According to an alternative embodiment of the present application, after step S204 is completed, the index fragments in the bucket are cached in memory; when a target object stored in any one of the index fragments in the bucket changes, the target objects stored in the other index fragments in the bucket are updated.

In this step, after the data (object) is written into the index fragment, the metadata of the bucket is updated, and the metadata of the bucket is cached in the memory in a subscription cache manner.

FIG. 6 is a schematic diagram of a bucket metadata update process according to an embodiment of the present application. As shown in FIG. 6, because both reads and writes of an object need the fragment information, the bucket's metadata is cached in memory to improve performance; after the bucket metadata changes, all nodes are notified to update their cached metadata, which avoids data inconsistency.
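The subscription-style cache described above can be sketched as follows, as a non-limiting example. The source does not name the notification mechanism, so the in-process callback registry used here is an assumption, and `BucketMetadataService` and `NodeCache` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable


class BucketMetadataService:
    """Hypothetical authoritative holder of each bucket's fragment values."""

    def __init__(self) -> None:
        self._values: dict[str, list[int]] = {}
        self._subscribers: dict[str, list[Callable[[list[int]], None]]] = defaultdict(list)

    def subscribe(self, bucket: str, callback: Callable[[list[int]], None]) -> None:
        self._subscribers[bucket].append(callback)
        # A late subscriber immediately receives the current metadata, if any.
        if bucket in self._values:
            callback(list(self._values[bucket]))

    def update(self, bucket: str, fragment_values: list[int]) -> None:
        # Store the new metadata, then notify every subscribed node.
        self._values[bucket] = list(fragment_values)
        for callback in self._subscribers[bucket]:
            callback(list(fragment_values))


class NodeCache:
    """Per-node in-memory copy of one bucket's fragment values."""

    def __init__(self, service: BucketMetadataService, bucket: str) -> None:
        self.fragment_values: list[int] = []
        service.subscribe(bucket, self._on_update)

    def _on_update(self, fragment_values: list[int]) -> None:
        self.fragment_values = fragment_values


service = BucketMetadataService()
node_a = NodeCache(service, "bucket-1")
node_b = NodeCache(service, "bucket-1")
# A split adds the value 15; both node caches are refreshed by the notification.
service.update("bucket-1", [0, 10, 15, 20])
```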

In some optional embodiments of the present application, after caching the index fragments in the bucket to the memory, querying a target object corresponding to a query instruction from the index fragments, where the query instruction is used to search for the target object from the bucket; and outputting the target object corresponding to the query instruction.

In other optional embodiments of the present application, querying a target object corresponding to a query instruction from an index fragment includes: respectively determining the weight value of each index fragment; respectively reading a target object list from each index fragment according to the weight value; and sorting the target object list.

FIG. 7 is a schematic diagram of querying data from a bucket according to an embodiment of the present application. As shown in FIG. 7, when the objects in a bucket are listed, a weight value is calculated for each fragment, a proportional share of the object list is read from each fragment, and the partial lists are sorted by multiple threads before the result is returned to the user, which increases the query speed. The original logic used multiple threads to list 1000 objects from every fragment and then sorted them to return the object list; this is very slow when there are many fragments. The modified logic reads the fragment metadata and rounds up the number of objects to fetch from each fragment, e.g., n1 returns 1, n2 returns 500, …, n10 returns 6.
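The weighted listing described above can be sketched as follows, by way of example only. The source does not give the weight formula, so taking each fragment's weight as its share of the bucket's total object count, rounded up, is an assumption; `fragment_quotas` and `list_objects` are hypothetical names, each fragment is modelled as an already sorted list of names, and the sketch runs single-threaded rather than with the multiple threads mentioned in the text.

```python
import heapq


def fragment_quotas(object_counts: list[int], limit: int = 1000) -> list[int]:
    """Number of entries to read from each fragment for a listing of `limit` objects.

    Assumption: quota = ceil(limit * count / total), capped at the fragment's count.
    """
    total = sum(object_counts)
    if total == 0:
        return [0] * len(object_counts)
    # Integer ceiling division avoids floating-point rounding.
    return [min(c, (limit * c + total - 1) // total) for c in object_counts]


def list_objects(fragments: list[list[str]], limit: int = 1000) -> list[str]:
    """Read a proportional share from each sorted fragment and merge the shares."""
    quotas = fragment_quotas([len(f) for f in fragments], limit)
    partial = [f[:q] for f, q in zip(fragments, quotas)]
    # heapq.merge combines the already-sorted partial lists in order.
    return list(heapq.merge(*partial))[:limit]


listing = list_objects([["a", "c"], ["b", "d", "e"]], limit=3)
```

Compared with reading `limit` entries from every fragment, the total number of entries read stays close to `limit` regardless of how many fragments the bucket has, which is the speed-up the text attributes to the modified logic.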

According to the method provided by the embodiments of the application, the distributed hash algorithm handles mass data expansion during file reads and writes, addresses data hot spots, and allows a single bucket to scale to mass data. When the objects in a bucket are traversed, index optimization considerably speeds up traversal of the files in a single bucket, improving the user experience.

Fig. 8 is a block diagram of a storage apparatus based on an object storage service according to an embodiment of the present application, where, as shown in fig. 8, the apparatus includes:

a setting module 80, configured to initialize a preset number of index fragments in a bucket of an object storage service, where the bucket is a container used for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls within a value range of a data amount that can be stored by the preset number of index fragments;

and the storage module 82 is used for storing the target object to be stored to the index fragment according to a preset algorithm.

It should be noted that, for preferred implementations of the embodiment shown in FIG. 8, reference may be made to the description of the embodiment shown in FIG. 2; details are not repeated here.

The embodiment of the application also provides a nonvolatile storage medium, which comprises a stored program, wherein the device where the nonvolatile storage medium is located is controlled to execute the storage method based on the object storage service when the program runs.

The nonvolatile storage medium stores a program for executing the following functions: initializing a preset number of index fragments in a storage bucket of the object storage service, wherein the storage bucket is a container for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls into a value range of data volume which can be stored by the preset number of index fragments; and storing the target object to be stored to the index fragment according to a preset algorithm.

The embodiment of the present application further provides a processor, where the processor is configured to run a program stored in a memory, where the program executes the above storage method based on the object storage service when running.

The processor is configured to process a program that performs the following functions: initializing a preset number of index fragments in a storage bucket of the object storage service, wherein the storage bucket is a container for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls into a value range of data volume which can be stored by the preset number of index fragments; and storing the target object to be stored to the index fragment according to a preset algorithm.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.

The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims

1. A storage method based on an object storage service is characterized by comprising the following steps:

initializing a preset number of index fragments in a bucket of an object storage service, wherein the bucket is a container for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls into a value range of a data volume which can be stored by the preset number of index fragments;

and storing the target object to be stored to the index fragment according to a preset algorithm.

2. The method of claim 1, wherein storing the target object to be stored to the index fragment according to a preset algorithm comprises:

determining a hash value of the name of the target object to be stored;

performing a remainder operation on the hash value with respect to the value range of the preset number of index fragments to obtain an operation result;

and determining the index fragment for storing the target object to be stored according to the operation result.
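As an illustration of the steps of claim 2 only, the following Python sketch hashes the object name, takes the remainder over the value range, and looks up the index fragment from the result. The claim does not fix a hash function, a range size, or a lookup rule, so `VALUE_RANGE`, the use of MD5, and the rule "pick the fragment with the largest value not exceeding the result" are all assumptions made for this example, as are the function names.

```python
import bisect
import hashlib

# Hypothetical size of the value range shared by all index fragments.
VALUE_RANGE = 1 << 16


def init_fragment_values(count, value_range=VALUE_RANGE):
    """Spread `count` fragment values evenly over [0, value_range)."""
    return [i * value_range // count for i in range(count)]


def object_slot(name, value_range=VALUE_RANGE):
    """Hash the object name and take the remainder over the value range.

    MD5 is only an assumed choice; any stable hash would serve.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % value_range


def locate_fragment(fragment_values, slot):
    """Return the position of the fragment responsible for `slot`.

    Assumed rule: the fragment with the largest value <= slot.
    `fragment_values` must be sorted and start at 0.
    """
    return bisect.bisect_right(fragment_values, slot) - 1


values = init_fragment_values(4)
target = locate_fragment(values, object_slot("photos/cat.jpg"))
```

Here `target` is the position, within `values`, of the index fragment that would receive the object under the assumed lookup rule.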

3. The method according to claim 1 or 2, wherein storing the target object to be stored to the index fragment according to a preset algorithm, further comprises:

if the number of target objects stored in a first index fragment exceeds a preset threshold value, splitting the first index fragment into target index fragments, wherein the target index fragments comprise the first index fragment and a second index fragment, and the first index fragment is any one of the index fragments with the preset number;

and storing the target object to be stored to the target index fragment.

4. The method of claim 3, wherein splitting the first index fragment into target index fragments comprises:

determining an average value of a numerical value corresponding to the first index fragment and a numerical value corresponding to a third index fragment adjacent to the first index fragment;

and taking the average value as a numerical value corresponding to the second index fragment.
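The averaging step of claim 4 can be sketched as follows. This is a minimal example, not the claimed implementation: it assumes fragment values are kept in a sorted list, that the adjacent third fragment is the next one in that list, and that the end of the value range stands in for the neighbour when the last fragment is split. The names `split_fragment` and `value_range` are hypothetical.

```python
def split_fragment(fragment_values, index, value_range):
    """Return a new sorted list with one extra fragment value.

    The new (second) fragment takes the average of the value of the
    fragment being split (first) and the value of its neighbour (third).
    Assumption: the neighbour is the next fragment in the list, or the
    end of the value range when the last fragment is split.
    """
    first = fragment_values[index]
    if index + 1 < len(fragment_values):
        third = fragment_values[index + 1]
    else:
        third = value_range
    second = (first + third) // 2
    if second == first:
        raise ValueError("fragment covers a single value and cannot be split")
    return fragment_values[: index + 1] + [second] + fragment_values[index + 1 :]


before = [0, 16384, 32768, 49152]
after = split_fragment(before, 1, 65536)
```

With these inputs the fragment valued 16384 is split and a new fragment valued 24576 is placed between it and the fragment valued 32768.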

5. The method of claim 4, wherein storing the target object to be stored to the target index fragment comprises:

writing the target object to be stored into the first index fragment, the second index fragment and a log respectively;

and deleting the repeated target objects stored on the first index fragment and the second index fragment.

6. The method of claim 5, wherein after writing the target object to be stored to the first index fragment, the second index fragment, and the log, respectively, the method further comprises:

acquiring a read request;

reading a target object corresponding to the read request from the first index fragment;

and if the target object corresponding to the read request is not read from the first index fragment, reading the target object corresponding to the read request from the second index fragment.
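The write and read paths of claims 5 and 6 can be sketched together. The example below is a simplified in-memory model under stated assumptions: each index fragment is a dictionary keyed by object name, the log is a list, and the caller supplies a hypothetical predicate `belongs_to_second` that decides which copy survives when duplicates are deleted. None of these names or structures come from the application itself.

```python
class SplittingFragmentPair:
    """Toy model of a first and second index fragment during a split."""

    def __init__(self):
        self.first = {}   # first index fragment: name -> metadata
        self.second = {}  # second index fragment: name -> metadata
        self.log = []     # append-only record of writes

    def write(self, name, metadata):
        # Claim 5: write to the first fragment, the second fragment and the log.
        self.first[name] = metadata
        self.second[name] = metadata
        self.log.append((name, metadata))

    def read(self, name):
        # Claim 6: try the first fragment, then fall back to the second.
        if name in self.first:
            return self.first[name]
        return self.second.get(name)

    def deduplicate(self, belongs_to_second):
        # Claim 5: delete the repeated copy, keeping one per object.
        # `belongs_to_second` is an assumed placement rule.
        for name in list(self.first):
            if name in self.second:
                if belongs_to_second(name):
                    del self.first[name]
                else:
                    del self.second[name]


pair = SplittingFragmentPair()
pair.write("a.txt", {"size": 1})
pair.write("b.txt", {"size": 2})
pair.deduplicate(lambda name: name == "b.txt")
```

After deduplication, a read of `b.txt` misses the first fragment and is served from the second, which is the fallback path the claim describes.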

7. The method of claim 1, wherein after storing the target object to be stored to the index fragment, the method further comprises:

caching the index fragments in the bucket to a memory;

and updating the target objects stored in other index fragments in the bucket when the target object stored in any index fragment in the bucket is changed.

8. The method of claim 7, wherein after caching the index fragments in the bucket to the memory, the method further comprises:

querying a target object corresponding to a query instruction from the index fragment, wherein the query instruction is used for searching the target object from the bucket;

and outputting the target object corresponding to the query instruction.

9. The method of claim 8, wherein querying the index fragment for a target object corresponding to a query instruction comprises:

respectively determining the weight value of each index fragment;

respectively reading a target object list from each index fragment according to the weight value;

and sorting the target object list.
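Claim 9 does not define how the weight value is computed or how it governs reading, so the following sketch rests entirely on assumptions: the weight of a fragment is taken to be its share of the objects in the bucket, each fragment contributes a number of entries proportional to that weight, and the combined list is sorted by object name. The function name `list_objects` and the parameter `limit` are hypothetical.

```python
def list_objects(fragments, limit):
    """Return up to `limit` object names gathered from all fragments.

    `fragments` is a list of lists, each already sorted by object name.
    Assumed weighting: a fragment's weight is its share of all objects,
    and it contributes ceil(weight * limit) entries to the merged list.
    """
    total = sum(len(fragment) for fragment in fragments)
    if total == 0:
        return []
    merged = []
    for fragment in fragments:
        # Integer ceiling of len(fragment) / total * limit.
        quota = (len(fragment) * limit + total - 1) // total
        merged.extend(fragment[:quota])
    merged.sort()
    return merged[:limit]


listing = list_objects([["a", "d"], ["b"], ["c", "e", "f"]], 3)
```

A proportional quota keeps each per-fragment read small, but it is only a heuristic: when names are unevenly distributed across fragments, a strict ordered listing would need a full merge of the fragment lists instead.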

10. A storage device based on an object storage service, comprising:

a setting module, configured to initialize a preset number of index fragments in a bucket of an object storage service, wherein the bucket is a container for storing a target object in the object storage service, the index fragments are used for storing the target object, and a value corresponding to each index fragment uniformly falls into a value range of a data volume which can be stored by the preset number of index fragments;

and a storage module, configured to store the target object to be stored to the index fragment according to a preset algorithm.

11. A non-volatile storage medium, comprising a stored program, wherein when the program runs, a device in which the non-volatile storage medium is located is controlled to execute the storage method based on the object storage service according to any one of claims 1 to 9.