CN111831752A - Distributed database space arrangement method, device, equipment and storage medium - Google Patents

Distributed database space arrangement method, device, equipment and storage medium

Info

Publication number
CN111831752A
Authority
CN
China
Prior art keywords
target
data
storage unit
unit
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010696933.4A
Other languages
Chinese (zh)
Inventor
卢祚
李理
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010696933.4A
Publication of CN111831752A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a space arrangement method, a space arrangement device, space arrangement equipment and a storage medium of a distributed database, and relates to the field of distributed databases and big data. The specific implementation scheme is as follows: for each data table in each database fragment in the distributed database, counting data writing requests and data deleting requests aiming at the data table, and determining the data volume of data to be written and the data volume of data to be deleted; determining a target space arrangement unit value; determining storage units meeting preset sorting conditions in the storage units of the data table as target storage units according to the pre-acquired metadata of the storage units of the data table; based on the target space arrangement unit value and the metadata of the target storage unit, generating migration task information; and outputting the migration task information for carrying out space arrangement on each database fragment. The implementation mode realizes the spatial arrangement of the whole distributed database by arranging the space of each storage unit of each data table.

Description

Distributed database space arrangement method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, in particular to the field of distributed databases and big data, and more specifically to a method, an apparatus, a device, and a storage medium for spatial arrangement of a distributed database.
Background
In internet services, as the amount of service data increases, a large amount of storage space is required to store relevant data in internet services. The storage space of the distributed storage system can be infinitely expanded, so that the distributed storage system is widely used for storing the service data of the internet. In internet services, because related operations such as insertion, update, deletion and the like are often required to be performed on data, during the distributed storage process of the internet services, related operations such as insertion, deletion, split, migration and the like are correspondingly performed on stored data, and these operations easily cause fragmentation of the storage space of the distributed storage system.
As the amount of stored data increases, a large amount of redundant, fragmented storage space accumulates. The accumulated fragments can be significant in size and consume a substantial share of the total storage space. In addition, when data is stored permanently, the volume of stored data grows over time, which makes the problem more prominent.
Disclosure of Invention
A method, a device, equipment and a storage medium for spatial arrangement of a distributed database are provided.
According to a first aspect, there is provided a method for spatially organizing a distributed database, comprising: for each data table in each database fragment in the distributed database, counting data writing requests and data deleting requests aiming at the data table, and determining the data volume of data to be written and the data volume of data to be deleted of the data table; determining a target space arrangement unit value according to the data volume of the data to be written and the data volume of the data to be deleted; determining storage units meeting preset sorting conditions in the storage units of the data table as target storage units according to the pre-acquired metadata of the storage units of the data table; based on the target space arrangement unit value and the metadata of the target storage unit, generating migration task information; and outputting the migration task information for carrying out space arrangement on each database fragment.
According to a second aspect, there is provided a spatial arrangement apparatus for a distributed database, comprising: the statistical unit is configured to count a data writing request and a data deleting request aiming at each data table in each database fragment in the distributed database, and determine the data volume of data to be written and the data volume of data to be deleted of the data table; a determining unit configured to determine a target spatial arrangement unit value according to a data amount of data to be written and a data amount of data to be deleted; the screening unit is configured to determine storage units meeting preset sorting conditions in the storage units of the data table as target storage units according to the metadata of the storage units of the data table acquired in advance; a generation unit configured to generate migration task information based on the target spatial arrangement unit value and the metadata of the target storage unit; and the output unit is configured to output the migration task information for performing space arrangement on each database fragment.
According to a third aspect, there is provided a spatial arrangement electronic device for a distributed database, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the first aspect.
According to the technology of the application, the problem that the existing distributed storage system cannot effectively arrange the storage space is solved, and the space arrangement of the whole distributed database is realized by arranging the space of each storage unit of each data table.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for spatial arrangement of a distributed database according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a spatial arrangement method of a distributed database according to the present application;
FIG. 4 is a flow diagram of another embodiment of a method for spatial arrangement of a distributed database according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a spatial arrangement apparatus for a distributed database according to the present application;
FIG. 6 is a block diagram of an electronic device for implementing a method for spatial arrangement of a distributed database according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method for spatially organizing a distributed database or the apparatus for spatially organizing a distributed database of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a distributed database 101, a server 102, and terminal devices 103, 104. The distributed database 101 and the server 102 and the terminal devices 103 and 104 may be connected via a network. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Distributed database 101 may include multiple database shards 1011, 1012, 1013. Each database shard may include a plurality of data tables, and each data table may include a plurality of storage units for storing data. The distributed database 101 may be a distributed key value storage system (distributed KV storage system).
The server 102 may be a management server that manages each of the database shards 1011, 1012, 1013 in the distributed database 101. The server 102 may generate migration task information according to the received write requests and delete requests, so that each database shard performs space arrangement on the storage units in each of its data tables, thereby reducing storage fragmentation and improving the utilization of the storage space.
The server 102 may be hardware or software. When the server 102 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server 102 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.
A user may use terminal devices 103, 104 to interact with server 102 via a network to request access to each database shard 1011, 1012, 1013 in distributed database 101 to implement writing, reading, and deleting of data. Various communication client applications, such as a database access application, may be installed on the terminal devices 103 and 104.
The terminal devices 103 and 104 may be hardware or software. When the terminal devices 103, 104 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, in-vehicle computers, laptop computers, desktop computers, and the like. When the terminal devices 103 and 104 are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
It should be noted that the spatial arrangement method of the distributed database provided in the embodiment of the present application is generally performed by the server 102. Accordingly, the spatial arrangement apparatus of the distributed database is typically provided in the server 102.
It should be understood that the numbers of database shards in the distributed database, servers, and terminal devices in fig. 1 are merely illustrative. There may be any number of database shards, servers, and terminal devices, as required by the implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for spatial consolidation of a distributed database according to the present application is shown. The spatial arrangement method of the distributed database of the embodiment comprises the following steps:
Step 201, for each data table in each database fragment in the distributed database, counting data writing requests and data deleting requests for the data table, and determining the data amount of data to be written and the data amount of data to be deleted of the data table.
In this embodiment, an executing entity (for example, the server 102 shown in fig. 1) of the spatial arrangement method for the distributed database may perform statistics on requests corresponding to each data table in each database fragment (for example, each database fragment 1011, 1012, 1013 shown in fig. 1) in the distributed database (for example, the distributed database 101 shown in fig. 1). Specifically, statistics is performed on data writing requests and data deleting requests of each data table, and the data volume of data to be written and the data volume of data to be deleted of each data table are determined. It can be understood that each data write request includes an identifier of a targeted data table and data to be written, and each data delete request includes an identifier of a targeted data table and a storage location of data to be deleted. The execution body can determine the data amount of the data to be written and the data amount of the data to be deleted for each data table by using the information.
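The counting in step 201 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the `Request` record, its field names, and the `tally_volumes` helper are hypothetical, and the sketch assumes that the size of the data to be deleted has already been resolved from the storage location carried in each delete request.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; field names are assumptions.
    kind: str      # "write" or "delete"
    table_id: str  # identifier of the targeted data table
    size_gb: int   # size of the data to be written or deleted


def tally_volumes(requests):
    """Return {table_id: (write_gb, delete_gb)} for a batch of requests."""
    totals = defaultdict(lambda: [0, 0])
    for req in requests:
        if req.kind == "write":
            totals[req.table_id][0] += req.size_gb
        elif req.kind == "delete":
            totals[req.table_id][1] += req.size_gb
    return {table: (w, d) for table, (w, d) in totals.items()}


requests = [
    Request("write", "table-1", 4),
    Request("delete", "table-1", 8),
    Request("write", "table-2", 2),
]
volumes = tally_volumes(requests)
```

Each data table ends up with its own pair of volumes, which is what allows the unit value to differ from table to table.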
Step 202, determining a target space arrangement unit value according to the data volume of the data to be written and the data volume of the data to be deleted.
After determining the data amount of the data to be written and the data amount of the data to be deleted, the execution subject may determine the target spatial arrangement unit value. Specifically, the execution subject may calculate the ratio of the data amount of the data to be written to the data amount of the data to be deleted, and then determine the target spatial arrangement unit value according to the ratio. For example, if the ratio is 1:2, the target spatial arrangement unit value may be 2G. Alternatively, the execution subject may calculate a common divisor of the data amount of the data to be written and the data amount of the data to be deleted, and use the common divisor as the target spatial arrangement unit value. For example, if the data amount of the data to be written is 4G and the data amount of the data to be deleted is 8G, then 4G may be used as the target spatial arrangement unit value.
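The common-divisor variant can be sketched as below. The text only says "a common divisor"; this sketch assumes the greatest common divisor, which agrees with the 4G and 8G example. The function name and the use of whole gigabytes are assumptions made for illustration.

```python
from math import gcd


def unit_value_from_common_divisor(write_gb: int, delete_gb: int) -> int:
    """Return the greatest common divisor of the two volumes, in whole GB.

    Assumption: the greatest common divisor is used; the text itself
    only requires some common divisor.
    """
    return gcd(write_gb, delete_gb)


unit_gb = unit_value_from_common_divisor(4, 8)  # 4, matching the example above
```

Note that `math.gcd` returns the other argument when one volume is zero, so a table with no pending deletes would need separate handling in a real system.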
In this embodiment, the target spatial arrangement unit value is the minimum amount by which space changes when a storage unit is spatially arranged. For example, if the target space arrangement unit value is 2G and the available space of the storage unit is 32G, the minimum reduction when the storage unit is spatially arranged is 2G, that is, the maximum available space of the storage unit after reduction is 30G. Here, the available space refers to the space of the storage unit that can be used to store data. For example, if a storage unit has a capacity of 32G and stores 20G of data, the available space is 12G.
It is understood that the target spatial consolidation unit values may be different for different data tables.
Step 203, according to the metadata of each storage unit of the data table acquired in advance, determining a storage unit meeting a preset sorting condition in each storage unit of the data table as a target storage unit.
In this embodiment, the execution subject may also obtain metadata of each storage unit in the data table in advance. The metadata may include various information about the storage unit, such as its identification, capacity, address, and the like. According to the metadata, the execution subject can determine the storage units of each data table that meet the preset sorting condition and take them as target storage units. The preset sorting condition may include, but is not limited to: the storage unit operates normally, the available space is greater than a preset threshold, and the like. In this embodiment, the preset sorting condition is used to screen out, from the storage units, those that can be spatially arranged. If a storage unit is operating abnormally, it does not need to be spatially arranged. If the available space of a storage unit is small, it does not need to be spatially arranged either.
Step 204, generating migration task information based on the target space arrangement unit value and the metadata of the target storage unit.
After the execution subject obtains the target space arrangement unit value through calculation, migration task information can be generated by combining the metadata of the target storage unit. Specifically, the execution subject may directly use the target space arrangement unit value and the metadata of the target storage unit as the migration task information. Alternatively, the execution subject may determine the target space of the target storage unit according to the target space arrangement unit value and the capacity in the metadata, and take the target space and the metadata of the target storage unit as the migration task information.
Step 205, outputting the migration task information for performing space arrangement on each database fragment.
The execution subject may output the resulting migration task information. Specifically, the execution main body may output the migration task information to each database fragment, so that each database fragment performs spatial sorting on the storage unit in each data table. In this embodiment, the space arrangement refers to migrating data in the target storage unit to a new storage unit, where the capacity of the new storage unit is smaller than that of the target storage unit, so that the space of the target storage unit can be vacated for storing more data.
With continued reference to fig. 3, a schematic diagram of one application scenario of the spatial arrangement method of a distributed database according to the present application is shown. In the application scenario of fig. 3, the server 301 counts data write requests and data delete requests of each data table in each database shard 302, 303, 304 in the distributed database. It determines that the target spatial arrangement unit value of data table 3021 in database shard 302 is 2G and that of data table 3022 is 8G, and so on; that the target spatial arrangement unit value of data table 3031 in database shard 303 is 4G and that of data table 3032 is 8G, and so on; and that the target spatial arrangement unit value of data table 3041 in database shard 304 is 6G and that of data table 3042 is 16G. The server can determine the target storage units according to the metadata of each storage unit in each data table and the target space arrangement unit value. The target storage units of the data table 3021 are the storage units 30211 and 30212, the target storage unit of the data table 3022 is the storage unit 30221, the target storage unit of the data table 3031 is the storage unit 30311, the target storage units of the data table 3032 are the storage units 30321 and 30322, and no target storage unit exists in the data table 3041. The server 301 may generate migration task information according to the target space arrangement unit value of each data table and the metadata of each target storage unit.
The generated migration task information is sent to the corresponding database shards, so that database shard 302 performs spatial arrangement on the storage units 30211 and 30212 in the data table 3021 and the storage unit 30221 in the data table 3022, and database shard 303 performs spatial arrangement on the storage unit 30311 in the data table 3031 and the storage units 30321 and 30322 in the data table 3032, thereby implementing spatial arrangement of the distributed database.
According to the space arrangement method for the distributed database provided by the embodiment of the application, the space arrangement of the whole distributed database is realized by arranging the space of each storage unit of each data table.
With continued reference to FIG. 4, a flow 400 of another embodiment of a method for spatial collation of a distributed database in accordance with the present application is shown. As shown in fig. 4, the method of this embodiment may include the following steps:
Step 401, for each data table in each database fragment in the distributed database, counting data writing requests and data deleting requests for the data table, and determining the data amount of data to be written and the data amount of data to be deleted of the data table.
Step 402, determining the ratio of the data volume of the data to be written to the data volume of the data to be deleted as a target ratio.
In this embodiment, the execution main body may calculate the ratio of the data amount of the data to be written to the data amount of the data to be deleted, and use the ratio as the target ratio.
Step 403, determining a target space arrangement unit value according to the target ratio and a preset correspondence between ratios and spatial arrangement unit values.
The execution main body can also acquire the corresponding relation between the preset ratio and the spatial arrangement unit value in advance. The above correspondence may be statistically derived by a skilled person based on a large number of practical experiences. The execution main body may search the target ratio in the correspondence, and use the spatial arrangement unit value corresponding to the target ratio as the target spatial arrangement unit value.
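A lookup of this kind might be sketched as follows. The application does not publish the correspondence, so the ratio bands and unit values below are purely illustrative placeholders; only the pairing of a 1:2 ratio with 2G comes from the earlier example. The names `RATIO_UPPER_BOUNDS`, `UNIT_VALUES_GB`, and `lookup_unit_value` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical correspondence: ratios up to 0.5 map to 2G, up to 1.0 to 4G,
# up to 2.0 to 8G, and anything larger to 16G. These bands are assumptions.
RATIO_UPPER_BOUNDS = [0.5, 1.0, 2.0]
UNIT_VALUES_GB = [2, 4, 8, 16]


def lookup_unit_value(write_gb: float, delete_gb: float) -> int:
    """Map the write/delete ratio to a unit value via the preset table."""
    if delete_gb <= 0:
        raise ValueError("ratio is undefined when nothing is to be deleted")
    target_ratio = write_gb / delete_gb
    band = bisect_left(RATIO_UPPER_BOUNDS, target_ratio)
    return UNIT_VALUES_GB[band]


unit_gb = lookup_unit_value(4, 8)  # ratio 1:2 falls in the first band
```

Keeping the correspondence as data rather than code makes it easy to replace the placeholder bands with values derived from operational experience.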
Step 404, writing the data to be written into each storage unit and deleting the data to be deleted from each storage unit, respectively, and determining the available space of each storage unit after processing.
The execution main body can also write the data to be written into the storage units, delete the data to be deleted from the storage units, and then determine the available space of each storage unit. Specifically, the execution main body may determine, according to the information in the data write request, the storage unit into which the data is to be written, and may determine, according to the information in the data delete request, the storage unit in which the data to be deleted is located. After the writing and deleting are complete, the execution main body may determine the available space of each storage unit.
Step 405, determining the storage units that operate normally among the storage units of the data table as candidate storage units.
After determining the available space of each storage unit, the execution body may further use, as a candidate storage unit, a storage unit that operates normally in each storage unit in the data table.
Step 406, taking the candidate storage units that satisfy at least one of the following conditions as target storage units: the available space is larger than the target space arrangement unit value; the time length between the creation time and the current time is larger than the preset time length.
Then, the execution subject may select, as the target storage units, the candidate storage units that satisfy at least one of the following conditions: the available space is larger than the target space arrangement unit value; the time length between the creation time and the current time is larger than the preset time length. In this embodiment, if the available space of a storage unit is smaller than the target space arrangement unit value, the space of the storage unit is already well utilized, and the storage unit does not need to be spatially arranged. If the time length between the creation time of a storage unit and the current time is less than the preset time length, the storage unit was created only recently, and it does not need to be spatially arranged.
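Steps 405 and 406 can be sketched together as a two-stage filter. The `Unit` record, its fields, and the `pick_targets` function are hypothetical names introduced for illustration; the creation time is assumed to be a timestamp in seconds.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical view of a storage unit's metadata; field names are assumptions.
    unit_id: str
    healthy: bool        # whether the unit operates normally
    available_gb: float  # available space after writes and deletes are applied
    created_at: float    # creation time, seconds since the epoch


def pick_targets(units, unit_value_gb, min_age_s, now):
    """Keep healthy units (step 405), then those meeting either condition (step 406)."""
    candidates = [u for u in units if u.healthy]
    return [
        u for u in candidates
        if u.available_gb > unit_value_gb or (now - u.created_at) > min_age_s
    ]


now = 1_000_000.0
units = [
    Unit("a", True, 12, now - 10),       # enough free space
    Unit("b", True, 1, now - 10_000),    # old enough
    Unit("c", False, 30, now - 10_000),  # not operating normally
    Unit("d", True, 1, now - 10),        # neither condition holds
]
target_ids = [u.unit_id for u in pick_targets(units, 4, 3600, now)]
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the filter deterministic and easy to test.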
Step 407, determining the target space of the target storage unit according to the target space arrangement unit value and the available space of the target storage unit.
After the target storage units are screened out, the execution subject may determine a target space for each target storage unit. It will be appreciated that after the target storage unit is spatially arranged, the capacity of the new storage unit is equal to the target space. For example, if the target storage unit has a capacity of 32G and the target space is 20G, the new storage unit has a capacity of 20G.
In some optional implementation manners of this embodiment, the step 407 may be specifically implemented by the following steps:
Step 4071, determining an integer value corresponding to the ratio of the available space to the target space arrangement unit value.
Step 4072, calculating the product of the target space arrangement unit value and the integer value.
Step 4073, taking the difference between the available space and the product as the target space.
In this implementation, the execution subject may first calculate the ratio of the available space of the target storage unit to the target space arrangement unit value. The ratio may not be an integer, in which case the execution subject may calculate the integer corresponding to the ratio, for example by rounding the ratio down. The execution subject may then calculate the product of the target space arrangement unit value and the integer value. Finally, the difference between the available space and the product is taken as the target space. For example, if the available space is 7G and the target space arrangement unit value is 2G, the ratio is 7/2 = 3.5, the corresponding integer is 3, the product is 3 × 2 = 6, and the difference is 7 - 6 = 1. That is, the target space is 1G.
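Steps 4071 to 4073 can be written out as a short function. This sketch follows the worked example (7G available, 2G unit value, integer 3), which corresponds to rounding the ratio down; the function name and the use of whole gigabytes are assumptions.

```python
def target_space(available_gb: int, unit_value_gb: int) -> int:
    """Compute the target space following steps 4071 to 4073.

    Assumption: the integer value is the ratio rounded down,
    as in the worked example in the text.
    """
    if unit_value_gb <= 0:
        raise ValueError("unit value must be positive")
    whole_units = available_gb // unit_value_gb  # step 4071
    product = whole_units * unit_value_gb        # step 4072
    return available_gb - product                # step 4073


result_gb = target_space(7, 2)  # 7 // 2 = 3, 3 * 2 = 6, 7 - 6 = 1
```

For non-negative inputs this is the remainder of the available space divided by the unit value; the three steps are kept separate here only to mirror the text.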
Step 408, based on the target space arrangement unit value, the target space and the metadata of the target storage unit, migration task information is generated.
After the execution subject obtains the target space arrangement unit value and the target space, it can generate migration task information by combining the metadata of the target storage unit. Specifically, the execution subject may directly use the target space arrangement unit value, the target space, and the metadata of the target storage unit as the migration task information. Alternatively, the execution subject may extract partial information from the metadata, such as the identification, the stored data, and the like, to form the migration task information.
In some optional implementations of this embodiment, the step 408 may be specifically implemented by the following steps not shown in fig. 4: determining the address of the new storage unit required for data migration of the target storage unit; and generating migration task information according to the target space arrangement unit value, the address, the target space, and the metadata of the target storage unit.
In this implementation, for the data stored in the target storage unit, the execution subject may first generate the address of a new storage unit for storing the data. In particular, the execution subject may generate the address of the new storage unit according to the addresses of the storage units that hold data associated with that data. Then, migration task information is generated according to the target space arrangement unit value, the address, the target space, and the metadata of the target storage unit.
In some practical applications, taking the Baidu object storage as an example, data sent by a user may be called an object. When storing an object, the execution subject may divide it into a plurality of blobs, each of which is stored in a vlet (storage unit). Different vlets can be stored on different data nodes, and a plurality of data nodes can form one database fragment. In this way, multiple blobs of the same object may be stored in different data tables, different data nodes, or even different database fragments. When generating the address of the new storage unit, the execution subject may consider the topology of the distributed database, the storage medium of the data nodes, and the amount of data they store. For example, to ensure reliability, the new vlet should not be placed on the same data node as the previous vlet (preferably, it is placed in a different computer room, cluster, network, and data node). It can be understood that if the new vlet and the previous vlet are located on the same data node, downtime of that data node may cause the vlets on it to be lost, which reduces data reliability. As another example, if the storage medium of the data node where the current vlet is located is an HDD (Hard Disk Drive), the storage medium of the data node at the address of the new vlet is preferably also an HDD, so that read latency does not differ greatly between the two. As a further example, if the storage usage of one data node has reached 90% while that of another data node is 20%, the data node at 20% should be selected as the data node for the address of the new vlet.
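The three placement considerations above (a different data node, the same storage medium, and the lowest usage) can be combined in a small selection routine. This is a sketch under stated assumptions: the `DataNode` record, its fields, and `choose_node` are hypothetical, and treating the storage medium as a hard requirement rather than a preference is a simplification made here.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical description of a data node; field names are assumptions.
    node_id: str
    medium: str           # e.g. "HDD" or "SSD"
    used_fraction: float  # share of the node's storage already in use


def choose_node(nodes, current_node_id, current_medium):
    """Pick the least-used node that differs from the current node and
    uses the same storage medium; return None if no node qualifies."""
    eligible = [
        n for n in nodes
        if n.node_id != current_node_id and n.medium == current_medium
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda n: n.used_fraction)


nodes = [
    DataNode("node-1", "HDD", 0.50),  # node holding the current vlet
    DataNode("node-2", "HDD", 0.90),
    DataNode("node-3", "HDD", 0.20),
    DataNode("node-4", "SSD", 0.05),
]
chosen = choose_node(nodes, "node-1", "HDD")
```

A fuller version would also compare computer room, cluster, and network, which this sketch omits.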
Step 409, selecting a target migration task from the waiting queue, and outputting the target migration task to the database fragment corresponding to the target migration task.
In this embodiment, the execution subject may generate migration task information for each target storage unit in each data table, and thereby obtain a set of migration task information. The execution subject may add the migration task information to a waiting queue, which stores migration tasks that have not yet been executed. Each time, the execution subject may select a target migration task from the waiting queue, where the target migration task may be the migration task whose target space arrangement unit value is the largest. The execution subject then sends the target migration task to the corresponding database fragment so that the database fragment executes it. Specifically, the execution subject may first determine the data table corresponding to the target migration task, and then determine the database fragment where that data table is located. Before executing the target migration task, the database fragment may add it to an execution queue, and then take one migration task out of the execution queue at a time for execution.
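As a non-limiting illustration, a waiting queue that always yields the migration task with the largest target space arrangement unit value might be sketched in Python with a binary heap. The `WaitQueue` class and the dictionary key `unit_value` are hypothetical names chosen for this example; the embodiment does not specify the data structure.

```python
import heapq
from typing import Any, Dict, List, Tuple


class WaitQueue:
    """Holds migration tasks that have not been executed yet (illustrative only)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Dict[str, Any]]] = []
        self._seq = 0  # tie-breaker: tasks with equal unit values keep insertion order

    def add(self, task: Dict[str, Any]) -> None:
        # heapq is a min-heap, so the unit value is negated to pop the maximum first.
        heapq.heappush(self._heap, (-task["unit_value"], self._seq, task))
        self._seq += 1

    def pop_target(self) -> Dict[str, Any]:
        """Return the task with the largest unit value; raises IndexError if empty."""
        return heapq.heappop(self._heap)[2]
```

The sequence number also prevents Python from comparing two task dictionaries when their unit values are equal, which would otherwise raise a `TypeError`.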
Step 410, in response to receiving the migration success message sent by the database fragment, deleting the metadata of the target storage unit, and acquiring and storing the metadata of the new storage unit.
In this embodiment, after the database fragment executes the migration task, that is, after the data in the target storage unit has been migrated to the new storage unit according to the migration task information, the database fragment may send a migration success message to the execution subject. After receiving the migration success message, the execution subject may delete the metadata of the target storage unit, acquire the metadata of the new storage unit from the database fragment, and store it, thereby updating the metadata of the storage units.
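As a non-limiting illustration, the metadata update performed on receipt of a migration success message might look like the following Python sketch. The in-memory dictionary `metadata_store` and the callback `fetch_metadata` are assumptions standing in for whatever metadata service and fragment interface an implementation actually uses.

```python
from typing import Any, Callable, Dict


def on_migration_success(
    metadata_store: Dict[str, Dict[str, Any]],
    old_unit_id: str,
    new_unit_id: str,
    fetch_metadata: Callable[[str], Dict[str, Any]],
) -> None:
    """Replace the metadata of the migrated storage unit with that of the new one."""
    # Assumption of this sketch: fetch first, so a failed fetch leaves the old entry intact.
    new_meta = fetch_metadata(new_unit_id)
    # Delete the metadata of the target (old) storage unit.
    metadata_store.pop(old_unit_id, None)
    # Store the metadata of the new storage unit.
    metadata_store[new_unit_id] = new_meta
```

The embodiment lists deletion before acquisition; the sketch swaps the two only as a defensive choice and the end state is the same.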
The space arrangement method for the distributed database provided by the above embodiment of the present application may be applied to a KV storage engine without append writes. By automatically calculating the target space arrangement unit value, the method dynamically performs space arrangement on the storage units without affecting the reading, writing, and deleting of data by users. As a delete-heavy, write-light workload continues to run, a new target space arrangement unit value can be generated in real time and occupied disk space can be reclaimed in real time, which improves disk utilization while still meeting the requirements of high throughput and low latency.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a spatial arrangement apparatus for a distributed database, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the space arrangement apparatus 500 of the distributed database of the present embodiment includes: a statistical unit 501, a determining unit 502, a screening unit 503, a generating unit 504 and an output unit 505.
The statistical unit 501 is configured to count, for each data table in each database fragment in the distributed database, the data write requests and data delete requests for the data table, and determine the data amount of data to be written and the data amount of data to be deleted of the data table.
A determining unit 502 configured to determine a target spatial arrangement unit value according to the data amount of the data to be written and the data amount of the data to be deleted.
The screening unit 503 is configured to determine, according to the metadata of each storage unit of the data table acquired in advance, a storage unit satisfying a preset sorting condition in each storage unit of the data table as a target storage unit.
A generating unit 504 configured to generate migration task information based on the target spatial arrangement unit value and the metadata of the target storage unit.
An output unit 505 configured to output the migration task information for performing space arrangement on each database fragment.
In some optional aspects of this embodiment, the determining unit 502 may be further configured to: determining the ratio of the data volume of the data to be written to the data volume of the data to be deleted as a target ratio; and determining a target space arrangement unit value according to the corresponding relation between the preset ratio and the space arrangement unit value and the target ratio.
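As a non-limiting illustration, looking up the target space arrangement unit value from a preset correspondence between ratio and unit value might be sketched in Python as follows. The thresholds in `RATIO_BOUNDS`, the values in `UNIT_VALUES`, the direction of the mapping and the handling of a zero delete volume are all hypothetical; the embodiment only states that such a correspondence is preset.

```python
from bisect import bisect_right

# Hypothetical preset correspondence: a ratio below 0.25 maps to 4, a ratio in
# [0.25, 0.5) maps to 8, a ratio in [0.5, 1.0) maps to 16, and anything else to 32.
RATIO_BOUNDS = [0.25, 0.5, 1.0]
UNIT_VALUES = [4, 8, 16, 32]


def target_unit_value(write_amount: int, delete_amount: int) -> int:
    """Map the write/delete data-volume ratio to a space arrangement unit value."""
    if delete_amount == 0:
        # Assumption: with nothing to delete, treat the ratio as unbounded.
        return UNIT_VALUES[-1]
    target_ratio = write_amount / delete_amount
    # bisect_right finds the interval of the preset table that contains the ratio.
    return UNIT_VALUES[bisect_right(RATIO_BOUNDS, target_ratio)]
```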
In some optional aspects of this embodiment, the metadata includes a running state, an available space, and a creation time. The screening unit 503 may be further configured to: respectively writing the data to be written into each storage unit and deleting the data to be deleted from each storage unit, and determining the available space of each storage unit after processing; determining the storage units that are running normally among the storage units of the data table as candidate storage units; and taking, as target storage units, the candidate storage units that satisfy at least one of the following conditions: the available space is larger than the target space arrangement unit value; the length of time between the creation time and the current time is larger than a preset length of time.
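As a non-limiting illustration, the screening of target storage units might be sketched in Python as follows. The `UnitMeta` class and its field names are hypothetical, and the available space is assumed to have already been updated for the pending writes and deletes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitMeta:
    """Hypothetical metadata record for one storage unit."""
    unit_id: str
    running_normally: bool
    available_space: int  # after applying the pending writes and deletes
    created_at: float     # creation time, in seconds since the epoch


def select_targets(
    units: List[UnitMeta], unit_value: int, now: float, max_age_seconds: float
) -> List[UnitMeta]:
    """Return the storage units that satisfy at least one arrangement condition."""
    # Only storage units that are running normally are candidates.
    candidates = [u for u in units if u.running_normally]
    return [
        u
        for u in candidates
        if u.available_space > unit_value or (now - u.created_at) > max_age_seconds
    ]
```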
In some optional aspects of this embodiment, the generating unit 504 may be further configured to: determining the target space of the target storage unit according to the target space arrangement unit value and the available space of the target storage unit; and generating migration task information based on the target space arrangement unit value, the target space and the metadata of the target storage unit.
In some optional aspects of this embodiment, the generating unit 504 may be further configured to: determining an integer value corresponding to the ratio of the available space to the target space arrangement unit value; calculating the product value of the target space arrangement unit value and the integer value; and taking the difference value between the available space and the product value as the target space.
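As a non-limiting illustration, the three-step calculation of the target space might be written in Python as below. The sketch assumes that the "integer value corresponding to the ratio" is obtained by rounding down, which the embodiment does not state explicitly; under that assumption the result equals the remainder of the available space divided by the unit value.

```python
def target_space(available_space: int, unit_value: int) -> int:
    """Compute the target space from the available space and the unit value."""
    if unit_value <= 0:
        raise ValueError("unit_value must be positive")
    # Step 1: integer value of the ratio (assumed here to be the floor).
    integer_value = available_space // unit_value
    # Step 2: product of the unit value and the integer value.
    product_value = unit_value * integer_value
    # Step 3: difference between the available space and the product.
    return available_space - product_value
```

For example, with an available space of 70 and a unit value of 16, the integer value is 4, the product is 64, and the target space is 6.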
In some optional aspects of this embodiment, the generating unit 504 may be further configured to: determining the address of a new storage unit required for data migration of the target storage unit; and generating migration task information according to the target space arrangement unit value, the address, the target space and the metadata of the target storage unit.
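As a non-limiting illustration, the migration task information might be represented in Python as a small record that bundles the four items named above. The `MigrationTask` class, its field names, the address format in the usage note and the floor-based target space calculation are assumptions made only for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MigrationTask:
    """Hypothetical container for migration task information."""
    unit_value: int                  # target space arrangement unit value
    new_unit_address: str            # address of the new storage unit
    target_space: int                # target space of the target storage unit
    source_metadata: Dict[str, Any]  # metadata of the target storage unit


def build_task(
    unit_value: int,
    new_unit_address: str,
    available_space: int,
    source_metadata: Dict[str, Any],
) -> MigrationTask:
    """Assemble migration task information for one target storage unit."""
    # Target space, assuming the integer value of the ratio is its floor.
    space = available_space - unit_value * (available_space // unit_value)
    return MigrationTask(unit_value, new_unit_address, space, source_metadata)
```

A call such as `build_task(16, "node-c/vlet-7", 70, {"unit_id": "v1"})` would yield a task whose target space is 6; the address string is a made-up placeholder.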
In some optional aspects of this embodiment, the output unit 505 may be further configured to: selecting a target migration task from a waiting queue, and outputting the target migration task to the database fragment corresponding to the target migration task, wherein the waiting queue includes the generated migration task information.
In some optional aspects of this embodiment, the apparatus 500 may further include an updating unit, not shown in fig. 5, configured to: in response to receiving a migration success message sent by the database fragment, deleting the metadata of the target storage unit, wherein the migration success message is sent by the database fragment after migrating the data of the target storage unit to a new storage unit according to the migration task information; and acquiring and storing the metadata of the new storage unit.
It should be understood that the units 501 to 505 recited in the spatial arrangement apparatus 500 of the distributed database correspond to respective steps in the method described with reference to fig. 2. Thus, the operations and features described above with respect to the method for spatially organizing a distributed database are equally applicable to the apparatus 500 and the units included therein, and will not be described again here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device that executes the space arrangement method of a distributed database according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is the non-transitory computer readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the space arrangement method of the distributed database provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the space arrangement method of the distributed database provided by the present application.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the space arrangement method of the distributed database in the embodiments of the present application (for example, the statistical unit 501, the determining unit 502, the screening unit 503, the generating unit 504, and the output unit 505 shown in fig. 5). The processor 601 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 602, that is, implements the space arrangement method of the distributed database in the above method embodiments.
The memory 602 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the electronic device that executes the space arrangement method of the distributed database, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memories located remotely from the processor 601, and these remote memories may be connected over a network to the electronic device that executes the space arrangement method of the distributed database. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device that executes the space arrangement method of the distributed database may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or in other manners, and fig. 6 takes connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device that executes the space arrangement method of the distributed database. Examples of the input device include a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, and joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the problem that the existing distributed storage system cannot effectively arrange the storage space is solved, and the space arrangement of the whole distributed database is realized by arranging the space of each storage unit of each data table.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (18)

1. A spatial arrangement method of a distributed database, comprising:
for each data table in each database fragment in the distributed database, counting data writing requests and data deleting requests aiming at the data table, and determining the data volume of data to be written and the data volume of data to be deleted of the data table;
determining a target space arrangement unit value according to the data volume of the data to be written and the data volume of the data to be deleted;
determining storage units meeting preset sorting conditions in the storage units of the data table as target storage units according to the pre-acquired metadata of the storage units of the data table;
generating migration task information based on the target space arrangement unit value and the metadata of the target storage unit;
and outputting the migration task information for carrying out space arrangement on each database fragment.
2. The method of claim 1, wherein the determining a target space arrangement unit value according to the data amount of the data to be written and the data amount of the data to be deleted comprises:
determining the ratio of the data volume of the data to be written to the data volume of the data to be deleted as a target ratio;
and determining the target space arrangement unit value according to the corresponding relation between the preset ratio and the space arrangement unit value and the target ratio.
3. The method of claim 1, wherein the metadata includes a run state, an available space, and a creation time; and
the determining, according to the pre-obtained metadata of each storage unit of the data table, a storage unit meeting a preset sorting condition in each storage unit of the data table as a target storage unit includes:
respectively writing the data to be written into each storage unit and deleting the data to be deleted from each storage unit, and determining the available space of each storage unit after processing;
determining a storage unit which normally operates in each storage unit of the data table as a candidate storage unit;
taking, as target storage units, the candidate storage units that satisfy at least one of the following conditions: the available space is larger than the target space arrangement unit value; the length of time between the creation time and the current time is larger than a preset length of time.
4. The method of claim 3, wherein the generating migration task information based on the target space arrangement unit value and the metadata of the target storage unit comprises:
determining a target space of the target storage unit according to the target space arrangement unit value and the available space of the target storage unit;
and generating migration task information based on the target space arrangement unit value, the target space and the metadata of the target storage unit.
5. The method of claim 4, wherein the determining a target space of the target storage unit according to the target space arrangement unit value and the available space of the target storage unit comprises:
determining an integer value corresponding to the ratio of the available space to the target space arrangement unit value;
calculating a product value of the target space arrangement unit value and the integer value;
and taking the difference value of the available space and the product value as the target space.
6. The method of claim 4, wherein the generating migration task information based on the target space arrangement unit value, the target space, and the metadata of the target storage unit comprises:
determining the address of a new storage unit required for data migration of the target storage unit;
and generating migration task information according to the target space arrangement unit value, the address, the target space and the metadata of the target storage unit.
7. The method of claim 1, wherein the outputting the migration task information comprises:
and selecting a target migration task from a waiting queue, and outputting the target migration task to a database fragment corresponding to the target migration task, wherein the waiting queue comprises the generated migration task information.
8. The method of claim 1, wherein the method further comprises:
in response to receiving a migration success message sent by a database fragment, deleting the metadata of the target storage unit, wherein the migration success message is sent by the database fragment after migrating the data of the target storage unit to a new storage unit according to the migration task information;
and acquiring and storing the metadata of the new storage unit.
9. A spatial arrangement apparatus for a distributed database, comprising:
the statistical unit is configured to count a data writing request and a data deleting request aiming at each data table in each database fragment in the distributed database, and determine the data volume of data to be written and the data volume of data to be deleted of the data table;
a determining unit configured to determine a target spatial arrangement unit value according to the data amount of the data to be written and the data amount of the data to be deleted;
the screening unit is configured to determine storage units meeting preset sorting conditions in the storage units of the data table as target storage units according to the metadata of the storage units of the data table acquired in advance;
a generation unit configured to generate migration task information based on the target spatial arrangement unit value and metadata of the target storage unit;
and the output unit is configured to output the migration task information for performing spatial arrangement on each database fragment.
10. The apparatus of claim 9, wherein the determining unit is further configured to:
determining the ratio of the data volume of the data to be written to the data volume of the data to be deleted as a target ratio;
and determining the target space arrangement unit value according to the corresponding relation between the preset ratio and the space arrangement unit value and the target ratio.
11. The apparatus of claim 9, wherein the metadata comprises a running state, an available space, and a creation time; and
the screening unit is further configured to:
respectively writing the data to be written into each storage unit and deleting the data to be deleted from each storage unit, and determining the available space of each storage unit after processing;
determining a storage unit which normally operates in each storage unit of the data table as a candidate storage unit;
taking, as target storage units, the candidate storage units that satisfy at least one of the following conditions: the available space is larger than the target space arrangement unit value; the length of time between the creation time and the current time is larger than a preset length of time.
12. The apparatus of claim 11, wherein the generating unit is further configured to:
determining a target space of the target storage unit according to the target space arrangement unit value and the available space of the target storage unit;
and generating migration task information based on the target space arrangement unit value, the target space and the metadata of the target storage unit.
13. The apparatus of claim 12, wherein the generating unit is further configured to:
determining an integer value corresponding to the ratio of the available space to the target space arrangement unit value;
calculating a product value of the target space arrangement unit value and the integer value;
and taking the difference value of the available space and the product value as the target space.
14. The apparatus of claim 12, wherein the generating unit is further configured to:
determining the address of a new storage unit required for data migration of the target storage unit;
and generating migration task information according to the target space arrangement unit value, the address, the target space and the metadata of the target storage unit.
15. The apparatus of claim 9, wherein the output unit is further configured to:
and selecting a target migration task from a waiting queue, and outputting the target migration task to a database fragment corresponding to the target migration task, wherein the waiting queue comprises the generated migration task information.
16. The apparatus of claim 9, wherein the apparatus further comprises an update unit configured to:
in response to receiving a migration success message sent by a database fragment, deleting the metadata of the target storage unit, wherein the migration success message is sent by the database fragment after migrating the data of the target storage unit to a new storage unit according to the migration task information;
and acquiring and storing the metadata of the new storage unit.
17. An electronic device for spatial arrangement of a distributed database, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
CN202010696933.4A 2020-07-20 2020-07-20 Distributed database space arrangement method, device, equipment and storage medium Pending CN111831752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010696933.4A CN111831752A (en) 2020-07-20 2020-07-20 Distributed database space arrangement method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010696933.4A CN111831752A (en) 2020-07-20 2020-07-20 Distributed database space arrangement method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111831752A true CN111831752A (en) 2020-10-27

Family

ID=72923692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010696933.4A Pending CN111831752A (en) 2020-07-20 2020-07-20 Distributed database space arrangement method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111831752A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948477A (en) * 2021-03-31 2021-06-11 北京金山云网络技术有限公司 Data downloading method and device, electronic equipment and storage medium
CN113641670A (en) * 2021-07-09 2021-11-12 北京百度网讯科技有限公司 Data storage and data retrieval method and device, electronic equipment and storage medium
CN113778984A (en) * 2021-08-16 2021-12-10 维沃移动通信(杭州)有限公司 Processing component selection method and device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488173A (en) * 2015-12-01 2016-04-13 四川效率源信息安全技术股份有限公司 Method for recovering and extracting historical records of 360 browser
CN105637491A (en) * 2014-09-26 2016-06-01 华为技术有限公司 File migration method and apparatus and storage device
CN106611364A (en) * 2015-10-22 2017-05-03 中国电信股份有限公司 Storage fragmentation arrangement method and device
CN107315840A (en) * 2017-07-20 2017-11-03 郑州云海信息技术有限公司 The management method and device of memory space in database
CN107368260A (en) * 2017-06-30 2017-11-21 北京奇虎科技有限公司 Memory space method for sorting, apparatus and system based on distributed system
CN107656939A (en) * 2016-07-26 2018-02-02 南京中兴新软件有限责任公司 File wiring method and device
CN110413595A (en) * 2019-06-28 2019-11-05 万翼科技有限公司 A kind of data migration method and relevant apparatus applied to distributed data base
CN110674108A (en) * 2019-08-30 2020-01-10 中国人民财产保险股份有限公司 Data processing method and device
CN110780814A (en) * 2019-10-10 2020-02-11 苏州浪潮智能科技有限公司 Stored data sorting method and device
US10657154B1 (en) * 2017-08-01 2020-05-19 Amazon Technologies, Inc. Providing access to data within a migrating data partition
CN111324596A (en) * 2020-03-06 2020-06-23 腾讯科技(深圳)有限公司 Data migration method and device for database cluster and electronic equipment

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105637491A (en) * 2014-09-26 2016-06-01 华为技术有限公司 File migration method and apparatus and storage device
CN106611364A (en) * 2015-10-22 2017-05-03 中国电信股份有限公司 Storage fragmentation arrangement method and device
CN105488173A (en) * 2015-12-01 2016-04-13 四川效率源信息安全技术股份有限公司 Method for recovering and extracting historical records of 360 browser
CN107656939A (en) * 2016-07-26 2018-02-02 南京中兴新软件有限责任公司 File wiring method and device
CN107368260A (en) * 2017-06-30 2017-11-21 北京奇虎科技有限公司 Memory space method for sorting, apparatus and system based on distributed system
CN107315840A (en) * 2017-07-20 2017-11-03 郑州云海信息技术有限公司 The management method and device of memory space in database
US10657154B1 (en) * 2017-08-01 2020-05-19 Amazon Technologies, Inc. Providing access to data within a migrating data partition
CN110413595A (en) * 2019-06-28 2019-11-05 万翼科技有限公司 A kind of data migration method and relevant apparatus applied to distributed data base
CN110674108A (en) * 2019-08-30 2020-01-10 中国人民财产保险股份有限公司 Data processing method and device
CN110780814A (en) * 2019-10-10 2020-02-11 苏州浪潮智能科技有限公司 Stored data sorting method and device
CN111324596A (en) * 2020-03-06 2020-06-23 腾讯科技(深圳)有限公司 Data migration method and device for database cluster and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
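徐俊; 何连跃; 严巍巍; 陈博; 徐照淼: "Space reclamation mechanism based on aggregation units in a massive small-file system" (海量小文件系统中基于聚合单元的空间回收机制), 计算机应用 (Journal of Computer Applications), no. 1, 30 June 2018 (2018-06-30) *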

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948477A (en) * 2021-03-31 2021-06-11 北京金山云网络技术有限公司 Data downloading method and device, electronic equipment and storage medium
CN113641670A (en) * 2021-07-09 2021-11-12 北京百度网讯科技有限公司 Data storage and data retrieval method and device, electronic equipment and storage medium
CN113641670B (en) * 2021-07-09 2023-08-11 北京百度网讯科技有限公司 Data storage and data retrieval method and device, electronic equipment and storage medium
CN113778984A (en) * 2021-08-16 2021-12-10 维沃移动通信(杭州)有限公司 Processing component selection method and device

Similar Documents

Publication Publication Date Title
CN111831752A (en) Distributed database space arrangement method, device, equipment and storage medium
CN112015775A (en) Label data processing method, device, equipment and storage medium
CN112269789B (en) Method and device for storing data, and method and device for reading data
CN111523001B (en) Method, device, equipment and storage medium for storing data
CN111966677B (en) Data report processing method and device, electronic equipment and storage medium
CN111045985A (en) File storage processing method, server, electronic device and storage medium
CN111880914A (en) Resource scheduling method, resource scheduling apparatus, electronic device, and storage medium
CN112084366A (en) Method, apparatus, device and storage medium for retrieving image
CN110619002A (en) Data processing method, device and storage medium
CN111913808A (en) Task allocation method, device, equipment and storage medium
CN112565356B (en) Data storage method and device and electronic equipment
CN111506401A (en) Automatic driving simulation task scheduling method and device, electronic equipment and storage medium
CN112559522A (en) Data storage method and device, query method, electronic device and readable medium
CN111767321A (en) Node relation network determining method and device, electronic equipment and storage medium
US8903871B2 (en) Dynamic management of log persistence
CN113364877A (en) Data processing method, device, electronic equipment and medium
CN111782147A (en) Method and apparatus for cluster scale-up
CN113792038A (en) Method and apparatus for storing data
CN112069155A (en) Data multidimensional analysis model generation method and device
US11036710B2 (en) Scalable selection management
CN112527527A (en) Consumption speed control method and device of message queue, electronic equipment and medium
CN111523000A (en) Method, device, equipment and storage medium for importing data
CN113778973A (en) Data storage method and device
CN112328807A (en) Anti-cheating method, device, equipment and storage medium
CN111581049A (en) Method, device, equipment and storage medium for monitoring running state of distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination