CN115438016A - Dynamic fragmentation method, system, medium and device in distributed object storage - Google Patents
- Publication number
- CN115438016A, CN202211275567.0A
- Authority
- CN
- China
- Prior art keywords
- bucket
- client
- distributed object
- file
- available
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/183—Provision of network file services by network file servers, e.g. by using NFS, CIFS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a dynamic fragmentation method, system, medium and device in distributed object storage, relating to the technical field of distributed storage. The method comprises the following steps: setting the available storage capacity of each bucket to a preset value, and starting the next available bucket only when the used storage capacity of the current bucket reaches the preset value; acquiring the currently available bucket corresponding to a user based on a received bucket application request; and sending the currently available bucket address to a client based on an RGW file gateway, so that the client completes the file operation corresponding to the bucket application request based on the currently available bucket address. The dynamic fragmentation method, system, medium and device in distributed object storage solve the problems caused by an excessive number of files in a bucket, so that the object storage service is more stable.
Description
Technical Field
The present invention relates to the field of distributed storage technologies, and in particular, to a method, a system, a medium, and a device for dynamic fragmentation in distributed object storage.
Background
Ceph is a unified, distributed storage system with excellent performance, reliability and scalability. Ceph supports object storage (RADOSGW), block storage (RBD), and file storage (CephFS). The object storage gateway (RADOS Gateway, RGW) is an open-source distributed object storage service built on Ceph RADOS; it provides a general S3-protocol object storage service and can support PB-level storage. RGW lets a user access a file by a specified key, pull a list, and so on, and distinguishes different users through buckets (storage spaces). In the prior art, as the number of files under a bucket grows, performance drops markedly both when accessing a file and when pulling a list. Analysis shows that obtaining the information of a given file under the bucket incurs a large performance cost.
RGW is an extensible file gateway, as shown in fig. 1. osd represents a disk, and group 1, group 2, and group n each represent a set of primary and secondary copies of a disk. Bucket operations such as get, put and list first pass through the RGW, and the RGW routes them to an OSD group. In step 1, one RGW can be selected at random; in step 2, the selected RGW calculates the specific shard from the number of shards of the bucket and the key, and the OSD group for that shard is calculated with the CRUSH algorithm to locate the specific OSD.
As shown in fig. 2, a bucket may correspond to multiple shards, the number of which is specified when the bucket is created, and a key is mapped to its shard by hash(key) % shard_count. As the algorithm shows, once the number of shards, shard_count, is determined it cannot easily be changed; otherwise the positions of the keys change. Thus, this routing rule does not facilitate increasing the number of shards. The prior art offers consistent hashing as a solution, but with a large data volume many keys are still migrated, which also affects the online service. In addition, solving the problem by setting a large number of shards in advance wastes resources. Each shard corresponds to a RADOS object in Ceph, so the number of shard tasks that must be polled during synchronization between two machine rooms also becomes very large, and performance degrades for some requests; for example, when a list is pulled, all shards must be aggregated to produce the correct list order.
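The effect of changing the shard count under modulo routing can be illustrated with a small sketch. The hash function here (MD5 truncated to 64 bits) and the names `shard_for` and `moved_fraction` are assumptions made for illustration; the source does not state which hash RGW uses.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index as hash(key) % shard_count.

    MD5 is only a stand-in hash for this sketch.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard index changes when shard_count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)

keys = [f"object-{i}" for i in range(10_000)]
# Going from 16 to 17 shards relocates the large majority of keys.
print(moved_fraction(keys, 16, 17))
```

With a uniform hash, a key keeps its shard index only when its hash leaves the same remainder for both counts, which is rare, so most keys would need to move.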
Additionally, resharding is disabled under RGW dual-active synchronization. That is, when two machine rooms are synchronized, the number of shards of a bucket cannot be changed once determined; otherwise problems occur during synchronization. To ensure data security, a dual-machine-room synchronization policy must be adopted. Under this policy, as the number of files under a bucket keeps growing, a bucket that cannot be resharded runs into performance problems once the data volume becomes too large.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a method, a system, a medium, and a device for dynamic fragmentation in distributed object storage, which solve the problems caused by an excessive number of files in a bucket and make the object storage service more stable.
To achieve the above and other related objects, the present invention provides a dynamic fragmentation method in distributed object storage, including the following steps: the available storage capacity of each bucket in the distributed object storage system is a preset value, and the next available bucket is started only when the used storage capacity of the current bucket reaches the preset value; acquiring a current available bucket based on a bucket application request sent by a client; and sending the current available bucket address to the client based on the RGW file gateway, so that the client completes the file operation corresponding to the bucket application request based on the current available bucket address.
The invention provides a dynamic fragmentation system in distributed object storage, which comprises:
the setting module is used for enabling the available storage capacity of each bucket in the distributed object storage system to be a preset value, and starting the next available bucket only when the used storage capacity of the current bucket reaches the preset value;
the acquisition module is used for acquiring the current available bucket based on the bucket application request sent by the client;
and the processing module is used for sending the current available bucket address to the client based on the RGW file gateway so as to enable the client to complete the file operation corresponding to the bucket application request based on the current available bucket address.
The invention provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the dynamic fragmentation method in the distributed object storage when executing the computer program.
The present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method for dynamic fragmentation in distributed object storage.
As described above, the method, system, medium, and device for dynamic fragmentation in distributed object storage according to the present invention have the following advantages:
(1) When a bucket reaches a certain storage capacity, the next bucket is automatically switched to, and all new files are stored in the next bucket, so that the old bucket does not support the uploading operation of the new files any more, the effect of dynamic fragmentation is achieved by adding the buckets, and data migration is avoided;
(2) The problem caused by excessive bucket files is solved, and the object storage service is more stable;
(3) Each bucket can be guaranteed to fully utilize the storage capacity of the bucket, and the problems of insufficient storage capacity or excessive storage capacity caused by fragment storage based on a fixed number in the prior art are solved; and effective synchronization can be realized under the application scene of the two machine rooms.
Drawings
FIG. 1 is a schematic diagram illustrating a dynamic storage process for storing distributed objects in an embodiment of the prior art;
FIG. 2 is a diagram illustrating an operation state of a bucket in an embodiment of the prior art;
FIG. 3 is a flowchart of a dynamic fragmentation method in distributed object storage according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating an embodiment of a dynamic sharding system in distributed object storage according to the present invention;
FIG. 5 is a schematic structural diagram of a computer apparatus according to an embodiment of the invention.
Description of the element reference
41. Setting module
42. Acquisition module
43. Processing module
51. Processor/processing unit
52. Memory device
521. Random access memory
522. Cache memory
523. Storage system
524. Program/utility tool
5241. Program module
53. Bus line
54. Input/output interface
55. Network adapter
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
It should be noted that the drawings provided in the present embodiment are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
In the dynamic fragmentation method, the system, the medium and the equipment in the distributed object storage, when a bucket reaches a certain storage capacity, the next bucket is automatically switched to, and all new files are stored in the next bucket, so that the problem caused by excessive bucket files is solved, the object storage service is more stable, and the method, the system, the medium and the equipment have high practicability.
The Internet Protocol allows IP fragmentation: when a packet is larger than the maximum transmission unit of a link, it can be broken into fragments small enough to be transmitted over that link. Sharding is a type of database partitioning that divides a large database into smaller, faster, and more manageable parts, called data shards. In the present invention, a dynamic fragmentation server performs the dynamic fragmentation in distributed object storage.
As shown in fig. 3, in an embodiment, the dynamic fragmentation method in distributed object storage of the present invention includes the following steps:
S1, enabling the available storage capacity of each bucket in the distributed object storage system to be a preset value, and starting the next available bucket only when the used storage capacity of the current bucket reaches the preset value.
Specifically, a bucket is a storage space in the MOS and is a container for storing objects. Object storage is a flat storage mode: objects stored in a bucket are all at the same logical level, whereas a file system uses a multi-level file structure. In MOS, bucket names are globally unique. Each bucket generates a default bucket ACL (Access Control List) when it is created, and each entry of the bucket ACL contains the permissions of an authorized user, such as read permission (READ), write permission (WRITE), full control permission (FULL_CONTROL), and the like. A user can operate on a bucket, for example to create, delete or display it or to set the bucket ACL, only if the user has the corresponding permission on the bucket. A user can create up to 100 buckets.
In the distributed object storage system, the original number of fragments is kept unchanged for each bucket. However, a preset value of the available storage capacity is set for each bucket, that is, the available storage capacity of the bucket does not exceed the preset value. When the used storage capacity of the bucket reaches the preset value, the next bucket can be automatically started, so that each bucket can fully utilize the available storage capacity of the bucket, and the problem that the storage capacity is insufficient or excessive due to fragment storage based on a fixed number in the prior art is solved.
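The rollover rule of step S1 can be sketched as follows. This is a minimal in-memory model, not the patented implementation: the class name `BucketChain`, the `base_n` naming, and measuring capacity as an object count are assumptions for illustration. The sketch also starts the next bucket lazily, on the first upload after the current bucket is full.

```python
class BucketChain:
    """A user's buckets base_1, base_2, ..., each capped at `preset` objects."""

    def __init__(self, base: str, preset: int):
        self.base = base
        self.preset = preset   # preset available capacity of every bucket
        self.used = [0]        # used capacity of each bucket started so far

    @property
    def current(self) -> str:
        """Name of the currently available bucket."""
        return f"{self.base}_{len(self.used)}"

    def record_upload(self) -> str:
        """Account for one uploaded object; return the bucket that received it."""
        if self.used[-1] >= self.preset:   # current bucket has reached the preset
            self.used.append(0)            # start the next bucket
        self.used[-1] += 1
        return self.current

chain = BucketChain("test", preset=2)
placements = [chain.record_upload() for _ in range(5)]
```

No existing object is moved when a new bucket is started, which is the property the text relies on to avoid data migration.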
S2, acquiring the currently available bucket based on the bucket application request sent by the client.
Specifically, the user sends a bucket application request to the dynamic fragmentation server through the client. The bucket application request contains the bucket information associated with the user, and the dynamic fragmentation server acquires the currently available bucket corresponding to the user according to that information. For example, if the user-associated bucket information is the bucket test, the user's buckets are test_1, test_2, …, test_n, which respectively represent the first, second, …, and nth buckets of the user, and test_n is the currently available bucket corresponding to the user.
In an embodiment of the present invention, acquiring a currently available bucket based on a bucket application request sent by a client includes the following steps:
11) Check the used storage capacity of each bucket.
Specifically, an available file storage capacity is set for each bucket, such as 100 kilobits. When the used storage capacity of a bucket is smaller than the preset value, files can still be stored in the bucket; when the used storage capacity equals the preset value, no further files can be stored in the bucket and the next bucket needs to be started.
12) Select the bucket whose used storage capacity is smaller than the preset value as the currently available bucket.
Specifically, the available storage capacity of each bucket is set to a preset value, and the next bucket is started only when the used storage capacity of the currently available bucket reaches the preset value. Therefore, only the currently available bucket has a used storage capacity below the preset value, and the used storage capacity of every other bucket equals the preset value. The bucket whose used storage capacity is smaller than the preset value is thus selected as the currently available bucket.
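Steps 11 and 12 amount to a scan over the used capacity of each bucket. A minimal sketch follows, assuming the usage figures are already available as a mapping from bucket name to used capacity; the function name `current_available_bucket` and the mapping are hypothetical.

```python
from typing import Dict, Optional

def current_available_bucket(usage: Dict[str, int], preset: int) -> Optional[str]:
    """Return the bucket whose used capacity is below the preset value.

    `usage` maps bucket name to used capacity (step 11); the first bucket
    found below the preset is selected (step 12).
    """
    for name, used in usage.items():
        if used < preset:
            return name
    return None  # every bucket is full, so a new bucket has to be started

usage = {"test_1": 100, "test_2": 100, "test_3": 40}
selected = current_available_bucket(usage, preset=100)
```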
In actual use, detecting the currently available bucket in real time for every file request is time-consuming. Preferably, a timer is set so that the currently available bucket is updated at regular intervals and can be located quickly and accurately.
In an embodiment of the present invention, obtaining a currently available bucket based on a bucket application request sent by a client further includes the following steps:
14) A timer is preset, and a currently-available-bucket query request is initiated at the preset time interval set by the timer to obtain the currently available bucket; when a bucket application request sent by the client is received, the currently available bucket is returned.
Specifically, a timer triggers a timeout event once a specified interval has elapsed from a specified moment, and the user can customize the period and frequency of the timer. Preferably, the timer of the present invention is a software timer. A software timer is a system interface provided by the operating system and built on top of a hardware timer; it lets the system offer a timer service that is not limited by hardware timer resources, and it is functionally similar to a hardware timer. The invention can select a suitable timer precision according to the system overhead.
Preferably, the timer may initiate an HTTP request every half hour, 1 hour or 2 hours to query the currently available bucket. If the timer interval is too large, the recorded currently available bucket may be inconsistent with the actual currently available bucket, which fails to meet the requirements of practical application. If the timer interval is too small, the system load increases. Therefore, the preset time interval may be set according to the actual application scenario and is not limited to the above examples.
With the timer, the file data of all buckets does not need to be checked on every request; only the currently available bucket is updated and recorded at regular intervals. When the next file operation request is received, the currently available bucket can be given directly from the record, which is fast and accurate and reduces system overhead.
15) Based on the currently-available-bucket query request, judge whether the used storage capacity of the currently available bucket has reached the preset value; if so, start the next bucket and record the next bucket as the currently available bucket; if not, the currently available bucket remains unchanged.
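Steps 14 and 15 can be sketched as a small tracker that answers requests from a stored record and refreshes that record on a timer. This is an illustrative sketch only: the class `CurrentBucketTracker`, the `used_of` lookup callback, and the use of `threading.Timer` are assumptions, and the source instead describes the periodic query as an HTTP request.

```python
import threading
from typing import Callable

class CurrentBucketTracker:
    """Keeps a record of the currently available bucket, refreshed periodically."""

    def __init__(self, base: str, preset: int, used_of: Callable[[str], int]):
        self.base = base
        self.preset = preset
        self.used_of = used_of   # hypothetical lookup: bucket name -> used capacity
        self.index = 1           # buckets are named base_1, base_2, ...
        self._lock = threading.Lock()

    def current(self) -> str:
        """Answer a bucket application request from the record, without a scan."""
        with self._lock:
            return f"{self.base}_{self.index}"

    def refresh(self) -> None:
        """Step 15: move on to the next bucket if the recorded one is full."""
        with self._lock:
            name = f"{self.base}_{self.index}"
            if self.used_of(name) >= self.preset:
                self.index += 1

    def start(self, interval_s: float) -> None:
        """Step 14: call refresh() every interval_s seconds on a daemon timer."""
        def tick() -> None:
            self.refresh()
            self.start(interval_s)   # re-arm the timer for the next period
        timer = threading.Timer(interval_s, tick)
        timer.daemon = True
        timer.start()

usage = {"test_1": 100, "test_2": 10}
tracker = CurrentBucketTracker("test", 100, lambda name: usage.get(name, 0))
```

In a deployment following the text, `tracker.start(1800)` would refresh the record every half hour; a shorter interval keeps the record fresher at the cost of more load.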
S3, sending the currently available bucket address to the client based on the RGW file gateway, so that the client completes the file operation corresponding to the bucket application request based on the currently available bucket address.
Specifically, the dynamic fragmentation server sends the currently available bucket address to the client, so that the client performs file operations such as upload (put), download (get) and pull list (list) based on that address. The client sends a file operation request to the RGW file gateway, and the RGW routes the request to an OSD group. Preferably, the RGW file gateway comprises at least one file sub-gateway (rgw1, rgw2, …, rgwn), and one file sub-gateway rgwi can be randomly selected to route the file operation request. The selected sub-gateway rgwi calculates the specific shard from the number of shards of the currently available bucket and the key, and the OSD group for that shard is calculated with the CRUSH algorithm to locate the specific OSD, thereby carrying out the file operation.
Preferably, the RGW file gateway may include a plurality of parallel RGW file sub-gateways to process file operation requests of the clients in parallel.
Preferably, the currently available bucket address is a URL address. More preferably, the dynamic fragmentation server signs the currently available bucket address before sending it to the client, so as to ensure the accuracy of the information and facilitate subsequent tracing.
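The source does not say how the bucket address is signed. One possible approach, shown here purely as an assumption, is an HMAC-SHA256 signature carried in the URL query string; the secret, the host name, and the parameter names `expires` and `signature` are all hypothetical, and this is not the S3 presigned-URL scheme.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical shared key, not taken from the source

def _signature(bucket: str, expires: str) -> str:
    """HMAC-SHA256 over the bucket name and expiry, hex encoded."""
    message = f"{bucket}:{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def sign_bucket_url(base_url: str, bucket: str, expires: int) -> str:
    """Build a bucket URL that carries its expiry time and signature."""
    query = urlencode(
        {"expires": expires, "signature": _signature(bucket, str(expires))}
    )
    return f"{base_url}/{bucket}?{query}"

def verify_bucket_url(url: str) -> bool:
    """Recompute the signature from the URL and compare in constant time."""
    parts = urlsplit(url)
    bucket = parts.path.rsplit("/", 1)[-1]
    params = parse_qs(parts.query)
    expected = _signature(bucket, params["expires"][0])
    return hmac.compare_digest(expected, params["signature"][0])

url = sign_bucket_url("https://rgw.example.com", "test_2", 1700000000)
```

A client or gateway holding the same key can detect a tampered bucket name, since the recomputed signature no longer matches.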
In an embodiment of the present invention, when a file upload operation is performed, the method further includes receiving the key returned by the client and searching the previous buckets for an already uploaded file with that key; if one exists, the file in the previous bucket is deleted. For example, test_2 is the currently available bucket; after the file doc is uploaded to test_2, the previous buckets test_0 and test_1 are searched for the file doc, and if it is found, the file doc is deleted from the corresponding bucket.
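The upload-then-clean-up behaviour can be sketched with buckets modelled as in-memory dictionaries ordered from oldest to current. The function name `upload_and_dedupe` and the dictionary model are assumptions for illustration, not the storage API.

```python
from typing import Dict, List

def upload_and_dedupe(
    buckets: List[Dict[str, bytes]], key: str, data: bytes
) -> List[int]:
    """Store `key` in the current (last) bucket, then delete stale copies.

    Returns the indices of the earlier buckets from which a copy was removed.
    """
    buckets[-1][key] = data
    removed = []
    for i, bucket in enumerate(buckets[:-1]):   # every bucket before the current one
        if key in bucket:
            del bucket[key]
            removed.append(i)
    return removed

buckets = [{"doc": b"v1"}, {"other": b"x"}, {}]
removed = upload_and_dedupe(buckets, "doc", b"v2")
```

After the call, each key exists in at most one bucket, so an overwrite does not leave an older version behind in a full bucket.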
In an embodiment of the present invention, the dynamic fragmentation method in distributed object storage further includes: when the client needs to download a file from a bucket, the dynamic fragmentation server queries the buckets in reverse order, starting from the currently available bucket. When the key corresponding to the required file is found, the download address of the file is returned to the client.
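The reverse-order lookup can be sketched as follows, again modelling buckets as dictionaries ordered from oldest to current; `find_bucket_for_key` is a hypothetical name, and a real server would return a download address rather than an index.

```python
from typing import Dict, List, Optional

def find_bucket_for_key(buckets: List[Dict[str, bytes]], key: str) -> Optional[int]:
    """Search from the current (last) bucket back to the first one.

    Returns the index of the first bucket found to hold `key`, or None.
    """
    for i in range(len(buckets) - 1, -1, -1):
        if key in buckets[i]:
            return i
    return None

stored = [{"a": b"1"}, {"b": b"2"}, {"c": b"3"}]
```

Starting from the newest bucket means recently uploaded files, which are often the ones requested, are found after the fewest probes.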
In an embodiment of the present invention, the dynamic fragmentation method in distributed object storage further includes: when a request to pull a list from the buckets is received from the client, querying the current total number of buckets and pulling the list of the buckets based on that total. Specifically, when the client needs to pull a list, the dynamic fragmentation server first queries the current total number of buckets and then pulls the list of each bucket based on the total. To further reduce the cost of bucket queries, the index data of the buckets is stored on SSD to remove the bottleneck caused by random lookups.
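Pulling a list across all of a user's buckets can be sketched as a merge of per-bucket listings. The sketch assumes that each bucket returns its keys already sorted and that buckets are named `base_1` to `base_total`; the function `pull_list` and the `list_bucket` callback are hypothetical.

```python
import heapq
from typing import Callable, List

def pull_list(base: str, total: int, list_bucket: Callable[[str], List[str]]) -> List[str]:
    """Merge the sorted key listings of buckets base_1 .. base_total.

    `list_bucket` is a hypothetical callback that returns the sorted keys
    of one bucket.
    """
    names = [f"{base}_{i}" for i in range(1, total + 1)]
    return list(heapq.merge(*(list_bucket(name) for name in names)))

listings = {"test_1": ["a", "d"], "test_2": ["b"], "test_3": ["c", "e"]}
merged = pull_list("test", 3, listings.__getitem__)
```

The total number of buckets bounds the work: one listing call per bucket, followed by a merge that keeps the combined result in key order.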
In order to improve the performance of the dynamic fragmentation server of the present invention, in an embodiment of the present invention, the dynamic fragmentation server includes a plurality of dynamic fragmentation sub-servers arranged in parallel, so as to process the file operation requests of clients in parallel. Specifically, when a plurality of bucket application requests are received, the currently available bucket addresses are sent to the clients in parallel. Preferably, the configuration information and state information of the dynamic fragmentation sub-servers may be stored in a MySQL database. MySQL is a relational database management system that keeps data in separate tables instead of putting all data in one large repository, which increases speed and flexibility. In the invention, the MySQL database is configured with master-slave synchronization, which effectively avoids information loss.
It should be noted that the steps of the above method are divided only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is contained, they fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant design changes, without changing the core design of the algorithm or process, is also within the protection scope of this patent.
As shown in fig. 4, in an embodiment, the dynamic sharding system in distributed object storage of the present invention includes:
the setting module 41 is configured to enable the available storage capacity of each bucket in the distributed object storage system to be a preset value, and only start the next available bucket when the used storage capacity of the current bucket reaches the preset value.
Specifically, a bucket is a storage space in the MOS and is a container for storing objects. Object storage is a flat storage mode: objects stored in a bucket are all at the same logical level, whereas a file system uses a multi-level file structure. In MOS, bucket names are globally unique. Each bucket generates a default bucket ACL (Access Control List) when it is created, and each entry of the bucket ACL contains the permissions of an authorized user, such as read permission (READ), write permission (WRITE), full control permission (FULL_CONTROL), and the like. A user can operate on a bucket, for example to create, delete or display it or to set the bucket ACL, only if the user has the corresponding permission on the bucket. A user can create up to 100 buckets.
In the distributed object storage system, the original number of fragments is kept unchanged for each bucket. However, a preset value of the available storage capacity is set for each bucket, that is, the available storage capacity of the bucket does not exceed the preset value. When the used storage capacity of the bucket reaches the preset value, the next bucket can be automatically started, so that each bucket can fully utilize the available storage capacity of the bucket, and the problem that the storage capacity is insufficient or excessive due to fragment storage based on a fixed number in the prior art is solved.
And the obtaining module 42 is connected to the setting module 41, and is configured to obtain a currently available bucket based on a bucket application request sent by the client.
Specifically, the user sends a bucket application request to the dynamic fragmentation server through the client. The bucket application request contains the bucket information associated with the user, and the dynamic fragmentation server acquires the currently available bucket corresponding to the user according to that information. For example, if the user-associated bucket information is the bucket test, the user's buckets are test_1, test_2, …, test_n, which respectively represent the first, second, …, and nth buckets of the user, and test_n is the currently available bucket corresponding to the user.
In an embodiment of the present invention, acquiring a currently available bucket based on a bucket application request sent by a client includes the following steps:
11) Check the used storage capacity of each bucket.
Specifically, an available file storage capacity is set for each bucket, such as 100 kilobits. When the used storage capacity of a bucket is smaller than the preset value, files can still be stored in the bucket; when the used storage capacity equals the preset value, no further files can be stored in the bucket and the next bucket needs to be started.
12) Select the bucket whose used storage capacity is smaller than the preset value as the currently available bucket.
Specifically, the available storage capacity of each bucket is set to a preset value, and the next bucket is started only when the used storage capacity of the currently available bucket reaches the preset value. Therefore, only the currently available bucket has a used storage capacity below the preset value, and the used storage capacity of every other bucket equals the preset value. The bucket whose used storage capacity is smaller than the preset value is thus selected as the currently available bucket.
In actual use, detecting the currently available bucket in real time for every file request is time-consuming. Preferably, a timer is set so that the currently available bucket is updated at regular intervals and can be located quickly and accurately.
In an embodiment of the present invention, obtaining a currently available bucket based on a bucket application request sent by a client further includes the following steps:
14) A timer is preset, and a currently-available-bucket query request is initiated at the preset time interval set by the timer to obtain the currently available bucket; when a bucket application request sent by the client is received, the currently available bucket is returned.
Specifically, a timer triggers a timeout event once a specified interval has elapsed from a specified moment, and the user can customize the period and frequency of the timer. Preferably, the timer of the present invention is a software timer. A software timer is a system interface provided by the operating system and built on top of a hardware timer; it lets the system offer a timer service that is not limited by hardware timer resources, and it is functionally similar to a hardware timer. The invention can select a suitable timer precision according to the system overhead.
Preferably, the timer may initiate an HTTP request every half hour, 1 hour or 2 hours to query the currently available bucket. If the timer interval is too large, the recorded currently available bucket may be inconsistent with the actual currently available bucket, which fails to meet the requirements of practical application. If the timer interval is too small, the system load increases. Therefore, the preset time interval may be set according to the actual application scenario and is not limited to the above examples.
With the timer, the file data of all buckets does not need to be checked on every request; only the currently available bucket is updated and recorded at regular intervals. When the next file operation request is received, the currently available bucket can be given directly from the record, which is fast and accurate and reduces system overhead.
15) Based on the currently-available-bucket query request, judge whether the used storage capacity of the currently available bucket has reached the preset value; if so, start the next bucket and record the next bucket as the currently available bucket; if not, the currently available bucket remains unchanged.
And the processing module 43 is connected to the obtaining module 42, and is configured to send the currently available bucket address to the client based on the RGW file gateway, so that the client completes the file operation corresponding to the bucket application request based on the currently available bucket address.
Specifically, the dynamic fragmentation server sends the currently available bucket address to the client, so that the client performs file operations such as upload (put), download (get) and pull list (list) based on that address. The client sends a file operation request to the RGW file gateway, and the RGW routes the request to an OSD group. Preferably, the RGW file gateway comprises at least one file sub-gateway (rgw1, rgw2, …, rgwn), and one file sub-gateway rgwi can be randomly selected to route the file operation request. The selected sub-gateway rgwi calculates the specific shard from the number of shards of the currently available bucket and the key, and the OSD group for that shard is calculated with the CRUSH algorithm to locate the specific OSD, thereby carrying out the file operation.
Preferably, the RGW file gateway may include a plurality of parallel RGW file sub-gateways to process file operation requests of clients in parallel.
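The routing steps above (random choice of a sub-gateway, then key-to-shard mapping) can be sketched as below. The text does not state which hash function the gateway applies to the key, so CRC32 is used here purely as a stand-in, and the function names are hypothetical. The subsequent placement of the shard onto an OSD is not modeled.

```python
import random
import zlib


def pick_gateway(gateways, rng=random):
    """Pick one file sub-gateway at random to route the request."""
    return rng.choice(gateways)


def shard_for_key(key: str, num_shards: int) -> int:
    """Map an object key to a shard index (hash of the key modulo shard count)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # CRC32 is an assumed stand-in for whatever hash the gateway really uses.
    return zlib.crc32(key.encode("utf-8")) % num_shards
```

Because the shard index depends only on the key and the shard count, any sub-gateway that handles the request arrives at the same shard, which is what makes random gateway selection safe.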
Preferably, the currently available bucket address is a URL address. More preferably, the dynamic fragmentation server signs the current available bucket address and then sends the signed current available bucket address to the client, so as to ensure the accuracy of information and facilitate subsequent tracing.
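As an illustration of signing the bucket address before returning it, the sketch below appends an HMAC-SHA256 signature to the URL. The signing scheme is not specified in the text, so the algorithm, the `signature` query parameter, and both function names are assumptions made for this example only.

```python
import hashlib
import hmac


def sign_bucket_url(url: str, secret: bytes) -> str:
    """Append an HMAC-SHA256 signature of the URL as a query parameter."""
    sig = hmac.new(secret, url.encode("utf-8"), hashlib.sha256).hexdigest()
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}signature={sig}"


def verify_bucket_url(signed_url: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    url, _, sig = signed_url.rpartition("signature=")
    if not url:
        return False  # no signature parameter present
    url = url[:-1]  # drop the trailing '?' or '&' added by sign_bucket_url
    expected = hmac.new(secret, url.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

A client or gateway holding the same secret can then reject an address that was altered in transit, and the signature gives a stable value to log for later tracing.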
In an embodiment of the present invention, for file upload operations, the dynamic fragmentation system in distributed object storage of the present invention further includes a search module, configured to receive a key returned by the client and to search, based on the key, whether the uploaded file already exists in a previous bucket; if so, the file in the previous bucket is deleted. For example, if test_2 is the currently available bucket, then after the file doc is uploaded to test_2, the previous buckets test_0 and test_1 are searched for the file doc; if it is found, the file doc in the corresponding bucket is deleted.
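The upload-then-clean-up behavior of the search module can be sketched as follows. Buckets are modeled as plain dictionaries keyed by bucket name, and the function name and `test_N` naming are assumptions for illustration; a real system would issue delete requests to the object store instead.

```python
def upload_and_dedupe(buckets, current_index, key, data, prefix="test"):
    """Store `key` in the current bucket, then delete stale copies in earlier ones.

    `buckets` maps bucket names such as "test_0" to dicts of key -> data.
    Returns the names of the earlier buckets from which the key was removed.
    """
    buckets[f"{prefix}_{current_index}"][key] = data
    removed = []
    for i in range(current_index):
        name = f"{prefix}_{i}"
        if key in buckets.get(name, {}):
            del buckets[name][key]
            removed.append(name)
    return removed
```

With the example from the text, uploading doc to test_2 while an older copy sits in test_0 leaves exactly one copy, in test_2.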
In an embodiment of the present invention, the dynamic fragmentation system in distributed object storage further includes a downloading module. When a client needs to download a file from a bucket, the dynamic fragmentation server queries the buckets in reverse order, starting from the currently available bucket. When the key corresponding to the required file is found, the address of the file to be downloaded is returned to the client.
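The reverse-order lookup can be sketched as below, again with buckets modeled as dictionaries and with hypothetical names. Searching from the newest bucket backwards means the most recently uploaded copy of a key is found first.

```python
def find_bucket_for_key(buckets, current_index, key, prefix="test"):
    """Search from the current bucket back to the first; return the bucket name or None."""
    for i in range(current_index, -1, -1):
        name = f"{prefix}_{i}"
        if key in buckets.get(name, {}):
            return name
    return None
```

In the worst case every bucket is visited once, so the cost of a download lookup grows with the number of buckets rather than with the number of files.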
In an embodiment of the present invention, the dynamic fragmentation system in distributed object storage further includes a list module, configured to query the current total number of buckets when a request from the client to pull a list from the buckets is received, and to pull the list of the buckets based on that total. Specifically, when the client needs to pull a list from the buckets, the dynamic fragmentation server first queries the current total number of buckets and then pulls the list from each bucket based on that total. To further reduce the power consumption of bucket queries, the index data of the buckets is preferentially stored on SSDs, which relieves the bottleneck caused by random lookups.
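Pulling a list across all buckets can be sketched as follows: the total number of buckets is read first, and each bucket is then listed in turn. The dictionary model and the names are assumptions for illustration only.

```python
def list_all_keys(buckets, total, prefix="test"):
    """Return (bucket name, key) pairs for buckets 0..total-1, oldest bucket first."""
    listing = []
    for i in range(total):
        name = f"{prefix}_{i}"
        # Keys are sorted within each bucket so the output order is stable.
        for key in sorted(buckets.get(name, {})):
            listing.append((name, key))
    return listing
```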
In order to improve the performance of the dynamic fragmentation server of the present invention, in an embodiment of the present invention, the dynamic fragmentation server includes a plurality of dynamic fragmentation sub-servers arranged in parallel, so as to process file operation requests of clients in parallel. Specifically, when a plurality of bucket application requests are received, the currently available bucket addresses are sent to the clients in parallel. Preferably, the configuration information and the state information of the dynamic fragmentation sub-servers may be stored in a MySQL database. MySQL is a relational database management system that keeps data in separate tables instead of putting all data in one large repository, which increases speed and flexibility. In the invention, the MySQL database is configured with master-slave synchronization, so that the problem of information loss is effectively avoided.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or can be implemented in the form of hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the x module may be a processing element that is set up separately, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and the function of the x module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The computer readable storage medium of the present invention stores thereon a computer program that, when executed by a processor, performs the steps of the above-described dynamic fragmentation method in distributed object storage. Preferably, the storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
Any combination of one or more storage media may be employed. The storage medium may be a computer-readable signal medium or a computer-readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer program instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
In an embodiment, the computer device of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above dynamic fragmentation method in distributed object storage when executing the computer program.
The memory includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As shown in FIG. 5, the computer apparatus of the present invention is embodied in the form of a general purpose computing device. Components of the computer device may include, but are not limited to: one or more processors or processing units 51, a memory 52, and a bus 53 that couples the various system components (including the memory 52 and the processing unit 51).
The computer device typically includes a variety of computer system readable media. Such media may be any available media that can be accessed by the computing device and includes both volatile and nonvolatile media, removable and non-removable media.
The memory 52 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 521 and/or cache memory 522. The computer device may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 523 may be used to read from and write to non-removable, nonvolatile magnetic media (commonly referred to as "hard disk drives"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 53 by one or more data media interfaces. Memory 52 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 524 having a set (at least one) of program modules 5241, such program modules 5241 including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored in, for example, the memory 52, each of which examples or some combination thereof may include an implementation of a network environment. The program modules 5241 generally perform the functions and/or methods of the described embodiments of the invention.
The computer device may also communicate with one or more external devices (e.g., keyboard, pointing device, display, etc.), with one or more devices that enable a user to interact with the computer device, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 54. Also, the computer device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 55. As shown in FIG. 5, the network adapter 55 communicates with the other modules of the computer device via the bus 53. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In summary, the dynamic fragmentation method, system, medium, and device in distributed object storage according to the present invention automatically switch to the next bucket when a bucket reaches a certain storage capacity, and all new files are stored in the next bucket, so that the old bucket does not support new file uploading operation any more, and thus a dynamic fragmentation effect is achieved by adding a bucket, and data migration is avoided; the problem caused by excessive bucket files is solved, and the object storage service is more stable; each bucket can be guaranteed to fully utilize the storage capacity of the bucket, and the problems of insufficient storage capacity or excessive storage capacity caused by fragment storage based on a fixed number in the prior art are solved; and effective synchronization can be realized under the application scene of the two machine rooms. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.
Claims (10)
1. A dynamic fragmentation method in distributed object storage is characterized by comprising the following steps:
the available storage capacity of each bucket in the distributed object storage system is a preset value, and the next available bucket is started only when the used storage capacity of the current bucket reaches the preset value;
acquiring a current available bucket based on a bucket application request sent by a client;
and sending the current available bucket address to the client based on the RGW file gateway so that the client completes the file operation corresponding to the bucket application request based on the current available bucket address.
2. The method for dynamically fragmenting in a distributed object storage according to claim 1, wherein said obtaining a currently available bucket based on a bucket application request sent by a client comprises the steps of:
checking the used storage capacity of each bucket;
and selecting the bucket with the used storage capacity smaller than the preset value as the current available bucket.
3. The dynamic fragmentation method in distributed object storage according to claim 2, wherein the obtaining of a currently available bucket based on a bucket application request sent by a client further includes the following steps:
presetting a timer, and initiating a current available bucket query request based on a preset time interval set by the timer to obtain a current available bucket;
and when a bucket application request sent by the client is received, returning the current available bucket.
4. The method for dynamic fragmentation in distributed object storage according to claim 3, wherein obtaining a currently available bucket based on a bucket application request sent by a client further comprises the following steps:
judging whether the used storage capacity of the current available bucket reaches the preset value or not based on the current available bucket query request; if so, starting the next bucket, and recording the currently available bucket as the next bucket; if not, the currently available bucket remains unchanged.
5. The method for dynamic fragmentation in a distributed object store according to claim 1, further comprising:
when the file uploading operation is carried out, receiving a keyword returned by the client, and searching whether the uploaded file exists in a previous bucket or not based on the keyword; and if so, deleting the file in the previous bucket.
6. The method for dynamic fragmentation in a distributed object store according to claim 1, further comprising: and when a plurality of bucket application requests are received, sending the currently available bucket addresses to the client side in a parallel mode.
7. The method for dynamic fragmentation in a distributed object store according to claim 1, further comprising: when the client receives a request for pulling the list from the bucket, the current total number of the buckets is inquired, and the list of the buckets is pulled based on the total number of the buckets.
8. A system for dynamic fragmentation in distributed object storage, comprising:
the setting module is used for enabling the available storage capacity of each bucket in the distributed object storage system to be a preset value, and starting the next available bucket only when the used storage capacity of the current bucket reaches the preset value;
the acquisition module is used for acquiring the current available bucket based on the bucket application request sent by the client;
and the processing module is used for sending the current available bucket address to the client based on the RGW file gateway so as to enable the client to complete the file operation corresponding to the bucket application request based on the current available bucket address.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for dynamic fragmentation in a distributed object store according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing a method for dynamic fragmentation in a distributed object store according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211275567.0A CN115438016A (en) | 2022-10-18 | 2022-10-18 | Dynamic fragmentation method, system, medium and device in distributed object storage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211275567.0A CN115438016A (en) | 2022-10-18 | 2022-10-18 | Dynamic fragmentation method, system, medium and device in distributed object storage |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115438016A (en) | 2022-12-06 |
Family
ID=84250303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211275567.0A Pending CN115438016A (en) | 2022-10-18 | 2022-10-18 | Dynamic fragmentation method, system, medium and device in distributed object storage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115438016A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220100878A1 (en) * | 2020-09-25 | 2022-03-31 | EMC IP Holding Company LLC | Facilitating an object protocol based access of data within a multiprotocol environment |
CN116150807A (en) * | 2023-04-14 | 2023-05-23 | 深圳高灯计算机科技有限公司 | Object storage method, system, computer device and storage medium |
WO2024212599A1 (en) * | 2023-12-13 | 2024-10-17 | 天翼云科技有限公司 | Method for implementing shard seamless expansion on basis of replication |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220100878A1 (en) * | 2020-09-25 | 2022-03-31 | EMC IP Holding Company LLC | Facilitating an object protocol based access of data within a multiprotocol environment |
US11928228B2 (en) * | 2020-09-25 | 2024-03-12 | EMC IP Holding Company LLC | Facilitating an object protocol based access of data within a multiprotocol environment |
CN116150807A (en) * | 2023-04-14 | 2023-05-23 | 深圳高灯计算机科技有限公司 | Object storage method, system, computer device and storage medium |
CN116150807B (en) * | 2023-04-14 | 2023-07-04 | 深圳高灯计算机科技有限公司 | Object storage method, system, computer device and storage medium |
WO2024212599A1 (en) * | 2023-12-13 | 2024-10-17 | 天翼云科技有限公司 | Method for implementing shard seamless expansion on basis of replication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||