CN113377780A - Database fragmentation method and device, electronic equipment and readable storage medium

Info

Publication number: CN113377780A
Application number: CN202110768338.1A
Authority: CN
Inventors: 汪磊; 苏杭; 李宽; 蒋文伟; 岳猛; 段石石; 刘长伟; 谭钧心; 王军正
Original assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Current assignee: Hangzhou Netease Cloud Music Technology Co Ltd
Priority date: 2021-07-07
Filing date: 2021-07-07
Publication date: 2021-09-10

Abstract

The application discloses a database fragmentation method and device, electronic equipment and a readable storage medium, and belongs to the field of electronic equipment. In the embodiments of the disclosure, a target data table on each node database is determined, along with a first fragment configuration parameter and a second fragment configuration parameter. A first interval in the target data table is iteratively segmented based on a first preset fragment policy to obtain N first alternative fragments, and each first alternative fragment is divided according to a second preset fragment policy to obtain M second alternative fragments. Finally, the N × M second alternative fragments are classified to determine target intervals in the target data table, and the target fragments in the node database are obtained by segmenting according to the target intervals. This reduces, to a certain extent, the difference in the amount of data stored on each target fragment, and avoids the situation in which data reading stalls because too much data is stored on individual fragments.

Description

Database fragmentation method and device, electronic equipment and readable storage medium

Technical Field

The application belongs to the technical field of electronic equipment, and particularly relates to a database fragmentation method and device, electronic equipment and a readable storage medium.

Background

With the rapid development of electronic technology and the increasing requirements for data storage, the use of multi-node databases to store data is more and more extensive, and the storage capacity in the databases is also larger and larger. In order to manage data stored in the database conveniently, the database is often fragmented, that is, each node database includes a plurality of fragments, and each fragment stores a plurality of data.

In the prior art, a database is usually fragmented by directly splitting all the data stored in it and taking the result as the fragments of the database. Because this splitting is simplistic, the time required to read data may differ from fragment to fragment, which reduces the efficiency of reading from the database.

Disclosure of Invention

In order to overcome the problems in the related art, the present disclosure provides a database sharding method, an apparatus, an electronic device, and a storage medium.

According to a first aspect of the present disclosure, there is provided a database fragmentation method applied to a multi-node database, the method including:

determining a target data table on each node database, and determining a first fragmentation configuration parameter and a second fragmentation configuration parameter;

for any one node database, performing iterative segmentation on a first interval in the target data table based on a first preset fragmentation strategy to obtain N first alternative fragments; the first preset fragmentation strategy is determined according to the first fragmentation configuration parameter; N is a positive integer;

dividing a second interval in each first alternative fragment according to a second preset fragmentation strategy to obtain M second alternative fragments; the second preset fragmentation strategy is determined according to the second fragmentation configuration parameter; M is a positive integer;

and classifying the N × M second alternative fragments, and determining a target interval in the target data table so as to obtain the target fragments in the node database according to the segmentation of the target interval.

Optionally, the first fragmentation configuration parameter is a target iteration number;

the iterative segmentation is performed on the first interval in the target data table based on the first preset fragmentation strategy to obtain N first alternative fragments, including:

determining the first interval in the target data table according to a target identification value stored in the target data table;

and carrying out iterative segmentation on the first interval according to the target iteration times by utilizing a bisection method to obtain the N first alternative fragments.

Optionally, the performing iterative segmentation on the first interval according to the target iteration number by using a bisection method to obtain the N first candidate slices includes:

under the condition that the number of times of executing the binary segmentation operation is less than the target iteration number, sequentially and circularly executing the following steps:

determining a first endpoint pair according to the data stored in the first interval;

according to the first endpoint pair, executing the binary segmentation operation on the first interval to obtain a plurality of first segments;

determining a second endpoint pair according to the data stored in the first segment, and respectively executing the binary segmentation operation on the plurality of first segments according to the second endpoint pair to obtain a plurality of second segments;

stopping the loop execution step and executing the following steps when the number of times of executing the binary division operation reaches the target iteration number:

and taking the plurality of second fragments obtained after the step of stopping the circular execution as the N first alternative fragments.

Optionally, the determining a first endpoint pair according to the data stored in the first interval includes:

traversing the data correspondingly stored in the first interval, and taking a target identification value corresponding to the data when the data is not empty as an effective value;

and selecting a valid value at an endpoint in the first interval as the first endpoint pair to screen out a target identification value with empty data stored at the endpoint.

Optionally, the second fragmentation configuration parameter is a target division number;

the dividing a second interval in the first alternative fragments according to a second preset fragment strategy to obtain M second alternative fragments includes:

evenly dividing the second interval according to the target division number to obtain X third segments; X is a positive integer;

for each third segment, traversing the data correspondingly stored in the third segment, and selecting the segment with the data not empty at the end point as the second alternative segment, thereby obtaining the M second alternative segments; said M is not greater than said X.

Optionally, the classifying the N × M second candidate segments and determining the target interval in the target data table includes:

classifying the N x M second alternative fragments according to the number P of target fragments by using a preset classification algorithm, and taking each classification result as a target interval so as to obtain P target intervals in the target data table; and P is a positive integer.

Optionally, the method further includes:

receiving a request for acquiring data;

and in response to the data acquisition request, allowing data to be read on the target fragment.

Optionally, the determining the target data table on each node database includes:

reading metadata stored in the database;

and determining a target data table in each node database according to the metadata.

According to a second aspect of the present disclosure, there is provided a database sharding apparatus, applied to a multi-node database, the apparatus including:

the first determining module is used for determining a target data table on each node database and determining a first fragmentation configuration parameter and a second fragmentation configuration parameter;

the segmentation module is used for performing, for any one node database, iterative segmentation on a first interval in the target data table based on a first preset fragmentation strategy to obtain N first alternative fragments; the first preset fragmentation strategy is determined according to the first fragmentation configuration parameter; N is a positive integer;

the dividing module is used for dividing a second interval in each first alternative fragment according to a second preset fragmentation strategy to obtain M second alternative fragments; the second preset fragmentation strategy is determined according to the second fragmentation configuration parameter; M is a positive integer;

and the second determining module is used for classifying the N x M second alternative fragments and determining a target interval in the target data table so as to obtain the target fragments in the node database according to the segmentation of the target interval.

Optionally, the segmentation module is further specifically configured to:

Optionally, the dividing module is further specifically configured to:

the dividing module is further specifically configured to:

Optionally, the second determining module is further specifically configured to:

Optionally, the apparatus further comprises:

the receiving module is used for receiving a data acquisition request;

and the reading module is used for responding to the data acquisition request and allowing data to be read on the target fragment.

Optionally, the first determining module is further specifically configured to:

reading metadata stored in the database;

In accordance with a third aspect of the present disclosure, there is provided an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the database sharding method of any one of the first aspect.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium, wherein instructions, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the database sharding method according to any one of the first aspect.

According to a fifth aspect of the present disclosure, there is provided a computer program product comprising readable program instructions which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the steps of the database sharding method as in any one of the above embodiments.

Compared with the related art, the method has the following advantages and positive effects:

In summary, the database sharding method provided in this disclosure determines a target data table on each node database, together with a first sharding configuration parameter and a second sharding configuration parameter. For any node database, it iteratively segments a first interval in the target data table based on a first preset sharding policy, which is determined according to the first sharding configuration parameter, to obtain N first candidate shards, where N is a positive integer. It then divides a second interval in each first candidate shard according to a second preset sharding policy, which is determined according to the second sharding configuration parameter, to obtain M second candidate shards, where M is a positive integer. Finally, it classifies the N × M second candidate shards to determine target intervals in the target data table, and obtains the target shards in the node database by segmenting according to the target intervals. Because the target shards are obtained through iterative segmentation followed by finer-grained division of the data table, the difference in the amount of data stored on each target shard is reduced to a certain extent, and the situation in which data reading stalls because too much data is stored on individual shards can be avoided. In addition, because segmentation is performed per data table on each node database, the number of connections established to read data can be reduced to a certain extent, which improves the efficiency of reading from the database.

Drawings

Fig. 1 is a flowchart illustrating steps of a database sharding method according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a database shard splitting process provided by the embodiment of the present disclosure;

Fig. 3 is a schematic diagram of another database shard splitting process provided by the embodiment of the present disclosure;

Fig. 4 is a schematic diagram of a database shard splitting flow provided by the embodiment of the present disclosure;

Fig. 5 is a schematic diagram of a database shard provided by the prior art;

Fig. 6 is a schematic diagram of a database shard provided by an embodiment of the present disclosure;

Fig. 7 is a block diagram of a database sharding apparatus provided by an embodiment of the present disclosure;

Fig. 8 is a block diagram illustrating an apparatus for database sharding in accordance with an exemplary embodiment;

Fig. 9 is a block diagram illustrating an apparatus for database sharding in accordance with an exemplary embodiment.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 is a flowchart of the steps of a database fragmentation method provided by an embodiment of the present disclosure. The method may be applied to a multi-node database and, as shown in Fig. 1, may include:

Step 101, determining a target data table on each node database, and determining a first fragmentation configuration parameter and a second fragmentation configuration parameter.

In the embodiment of the present disclosure, the multi-node database may be a distributed database comprising a plurality of node databases. The distributed database stores data on the node databases separately, and the electronic devices hosting the node databases may be placed at different locations and connected to one another through a network, forming a large database that is physically distributed but logically unified and complete.

In the embodiment of the present disclosure, the target data table may be a network virtual table used to temporarily store data in a database, and each node database may hold one target data table recording the data stored in that node database. Determining the target data table on each node database may consist of reading the data table stored in the node database and using it as the target data table of that node database, so that the corresponding target data tables on the plurality of node databases are obtained. Since the data stored in each node database may differ, the target data tables on the node databases may also differ.

In this embodiment of the disclosure, the first fragmentation configuration parameter and the second fragmentation configuration parameter may be set by a user in advance according to the actual situation. The first fragmentation configuration parameter may be a parameter characterizing the iterative segmentation of the target data table, for example at least one of an iteration number, an iteration mode and an iteration range. The second fragmentation configuration parameter may be a parameter characterizing the fine-grained division, for example at least one of a division mode, a division number and a division size. This disclosure is not limited in this respect. Determining the two parameters may consist of reading the information set by the user for the first fragmentation configuration parameter and using it as the first fragmentation configuration parameter, and likewise reading the information set by the user for the second fragmentation configuration parameter and using it as the second fragmentation configuration parameter.

It should be noted that the first fragmentation configuration parameter and the second fragmentation configuration parameter may be stored in the distributed storage system, so as to control each node database to perform fragmentation operation according to the same first fragmentation configuration parameter and second fragmentation configuration parameter. Or may be separately stored in each node database, so as to control each node database to perform fragmentation operation according to the first fragmentation configuration parameter and the second fragmentation configuration parameter in the node database, which is not limited in this disclosure. Further, the first fragmentation configuration parameter and the second fragmentation configuration parameter recorded on each node database may be different, and each node database may perform segmentation on the node database according to the first fragmentation configuration parameter and the second fragmentation configuration parameter recorded on the node database, so that the fragmentation operation on each node database may be implemented.
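As a minimal sketch of how the two configuration parameters might be held, assume that the first parameter is a target iteration number and the second a target division number. The `ShardingConfig` class, its field names and the `config_for_node` helper are hypothetical names introduced for illustration and are not part of the disclosure. The sketch shows a shared configuration that an individual node database may override:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShardingConfig:
    """Hypothetical container for the two fragmentation configuration parameters."""
    target_iterations: int  # first parameter: number of bisection rounds
    target_divisions: int   # second parameter: number of even divisions per candidate

    def __post_init__(self):
        if self.target_iterations < 0 or self.target_divisions < 1:
            raise ValueError("iterations must be >= 0 and divisions >= 1")


def config_for_node(shared, overrides, node):
    """Return the configuration for `node`, falling back to the shared values."""
    return replace(shared, **overrides.get(node, {}))


shared = ShardingConfig(target_iterations=3, target_divisions=1000)
overrides = {"node-b": {"target_iterations": 4}}  # node-b records its own value
print(config_for_node(shared, overrides, "node-a"))
print(config_for_node(shared, overrides, "node-b"))
```

Here `node-a` fragments with the shared parameters, while `node-b` uses its own iteration number and the shared division number.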

Step 102, for any node database, performing iterative segmentation on a first interval in the target data table based on a first preset fragmentation strategy to obtain N first alternative fragments; the first preset fragmentation strategy is determined according to the first fragmentation configuration parameter; N is a positive integer.

In this embodiment of the disclosure, the first preset fragmentation strategy may be a strategy for performing iterative fragmentation on the target data table, and specifically, the iterative fragmentation strategy may be determined according to the first fragmentation configuration parameter, for example, if the first fragmentation configuration parameter is an iteration number, the first preset fragmentation strategy may perform iterative fragmentation on the target data table according to the iteration number, and if the first fragmentation configuration parameter is a binary iteration mode, the first preset fragmentation strategy may perform iterative fragmentation on the target data table according to the binary iteration mode.

In the embodiment of the present disclosure, iteratively segmenting the first interval in the target data table based on the first preset fragmentation strategy to obtain N first alternative fragments may proceed as follows: the first preset fragmentation strategy corresponding to the first fragmentation configuration parameter is determined, the segmentation operation is performed on the first interval in the target data table according to that strategy, and the fragments produced by the segmentation are used as the first alternative fragments, giving N first alternative fragments. The first interval in the target data table may be an interval whose endpoints hold non-empty stored data, and the interval may be expressed over the numbers assigned to the rows of the target data table. For example, the first interval [1,100] represents all data between the first row and the hundredth row of the target data table. Selecting an interval whose endpoints hold non-empty data helps, to a certain extent, avoid uneven data storage when the database is subsequently fragmented.

It should be noted that in this step, for any node database, a partitioning policy on the node database is determined, and a corresponding partitioning operation is performed, so that a plurality of first candidate shards are obtained on each node database. The fragmentation strategies on each node database may be the same, so that the number of the first alternative fragments obtained on each node database may be the same, and the fragmentation strategies on each node database may also be different, so that the number of the first alternative fragments obtained on each node database may be different.

Step 103, dividing a second interval in each first alternative fragment according to a second preset fragmentation strategy to obtain M second alternative fragments; the second preset fragmentation strategy is determined according to the second fragmentation configuration parameter; M is a positive integer.

In this embodiment of the disclosure, the second preset fragmentation policy may be a policy for partitioning the first candidate fragmentation, and specifically, the partitioning policy may be determined according to a second fragmentation configuration parameter, for example, if the second fragmentation configuration parameter is an average partitioning manner, the second preset fragmentation policy may partition the target data table according to the average partitioning manner, and if the second fragmentation configuration parameter is the number of times of partitioning, the second preset fragmentation policy may partition the target data table according to the number of times of partitioning.

In the embodiment of the present disclosure, the second interval in a first alternative fragment may be the interval, found by reading the data stored in the first alternative fragment, whose endpoints hold non-empty stored data. For example, for the first alternative fragment [1,10000], if the data stored in [1,200] is empty and the data stored in [851,10000] is empty, then [201,850] may be selected as the second interval in the first alternative fragment. Dividing the second interval in the first alternative fragment according to the second preset fragmentation strategy to obtain M second alternative fragments may proceed as follows: the second preset fragmentation strategy corresponding to the second fragmentation configuration parameter is determined, the second interval in the first alternative fragment is divided according to that strategy, and the fragments produced by the division are used as the second alternative fragments, giving M second alternative fragments.
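The [1,10000] example can be sketched as follows, under the assumption that the identification values holding non-empty data are available as a sorted list and that the second preset fragmentation strategy is an even division. The function names `second_interval` and `divide_evenly` are illustrative only and do not come from the disclosure:

```python
from bisect import bisect_left, bisect_right


def second_interval(ids, lo, hi):
    """Shrink [lo, hi] to the smallest and largest stored IDs inside it.

    `ids` is a sorted list of identification values that hold non-empty data.
    Returns None when the range contains no data at all.
    """
    left = bisect_left(ids, lo)
    right = bisect_right(ids, hi)
    if left == right:
        return None
    return ids[left], ids[right - 1]


def divide_evenly(lo, hi, parts):
    """Split the closed interval [lo, hi] into closed sub-intervals of near-equal width."""
    width = hi - lo + 1
    parts = min(parts, width)  # never produce more parts than there are values
    bounds = [lo + (width * i) // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(parts)]


stored = list(range(201, 851))           # data exists only for IDs 201..850
trimmed = second_interval(stored, 1, 10000)
print(trimmed)                           # (201, 850)
print(divide_evenly(*trimmed, 5))
```

With a division number of 5, the trimmed interval [201,850] yields five sub-intervals of 130 identification values each.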

Step 104, classifying the N × M second alternative fragments, and determining a target interval in the target data table so as to obtain the target fragments in the node database by segmenting according to the target interval.

In the embodiment of the present disclosure, N first candidate shards may be obtained on any node database, and M second candidate shards may be obtained from each first candidate shard after division, so that N × M second candidate shards may be obtained on any node database. The N × M second candidate shards may be classified by grouping them according to a common standard, for example by placing every 10 second candidate shards into one class, or by computing with a preset classification algorithm and classifying the second candidate shards according to the result, which is not limited in this disclosure.

In this embodiment of the present disclosure, a plurality of intervals corresponding to a plurality of second candidate fragments belonging to the same category may be merged, the merged interval is used as a target interval in the target data table, and if the second candidate fragments are categorized into twenty categories, twenty target intervals may be correspondingly obtained. Further, in each node database, a plurality of target fragments can be obtained by performing segmentation according to the target interval, and if 10 target intervals are obtained, 10 target fragments can be obtained by performing segmentation according to the target interval.
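The disclosure leaves the classification algorithm open. Purely as an illustration, the sketch below assumes that each second candidate fragment is known as a `(lo, hi, row_count)` tuple and groups consecutive candidates greedily so that each merged target interval holds roughly the same number of rows. The `group_candidates` function and the row-count balancing rule are assumptions, not the algorithm of the disclosure:

```python
def group_candidates(candidates, target_shards):
    """Greedily merge consecutive candidates into at most `target_shards` intervals.

    `candidates` is a list of (lo, hi, row_count) tuples sorted by `lo`.
    A group is closed once its accumulated row count reaches the ideal share
    of the rows that are still unassigned.
    """
    if target_shards < 1:
        raise ValueError("target_shards must be at least 1")
    remaining_rows = sum(count for _, _, count in candidates)
    groups = []
    start, acc = None, 0
    for index, (lo, hi, count) in enumerate(candidates):
        if start is None:
            start = lo                      # a new group opens at this candidate
        acc += count
        groups_left = target_shards - len(groups)
        ideal = remaining_rows / groups_left
        is_last = index == len(candidates) - 1
        if is_last or (groups_left > 1 and acc >= ideal):
            groups.append((start, hi, acc))  # merged interval and its row count
            remaining_rows -= acc
            start, acc = None, 0
    return groups


candidates = [(1, 100, 50), (101, 200, 10), (201, 300, 40), (301, 400, 60), (401, 500, 40)]
print(group_candidates(candidates, 2))  # [(1, 300, 100), (301, 500, 100)]
```

Each returned tuple corresponds to one target interval, so asking for P groups produces at most P target fragments.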

Optionally, the operation of determining the target data table on each node database in the embodiment of the present disclosure may specifically include:

Step 1011, reading the metadata stored in the database.

In the embodiment of the present disclosure, the metadata may be information describing attributes of stored data in a database, for example, the metadata may be information indicating a data storage location, information indicating a data storage sequence, or information indicating data search, which is not limited in the present disclosure. Reading the metadata stored in the database may be reading information at a specified position in the database as metadata in the database, for example, the database may be a relational database, and the first column in the database may be a position for recording the metadata, so that the information recorded in the first column may be read as metadata in the database.

Step 1012, determining a target data table in each node database according to the metadata.

In the embodiment of the present disclosure, since the metadata is often information for indicating a data storage location, the data stored in each node database can be determined by the storage location indicated in the metadata, and thus the target data table in each node database can be determined.
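A minimal sketch of steps 1011 and 1012, assuming for illustration that each node database is an SQLite database and that its built-in `sqlite_master` catalog plays the role of the metadata. The node names, the `play_log` table and the rule of selecting the target table by name are assumptions made for this example:

```python
import sqlite3


def list_tables(conn):
    """Read the SQLite catalog (the metadata here) and return user table names."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def target_tables(nodes, wanted):
    """Map each node name to the wanted table if that node's metadata lists it."""
    return {node: wanted for node, conn in nodes.items() if wanted in list_tables(conn)}


nodes = {"node-a": sqlite3.connect(":memory:"), "node-b": sqlite3.connect(":memory:")}
nodes["node-a"].execute("CREATE TABLE music_account (id INTEGER PRIMARY KEY, name TEXT)")
nodes["node-b"].execute("CREATE TABLE play_log (id INTEGER PRIMARY KEY)")
print(target_tables(nodes, "music_account"))  # {'node-a': 'music_account'}
```

Only the node whose metadata lists the table is given a target data table, which reflects the point that the target data tables may differ between node databases.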

Optionally, in this embodiment of the present disclosure, the first fragmentation configuration parameter may be a target iteration number, and the operation of iteratively segmenting the first interval in the target data table based on the first preset fragmentation policy to obtain N first candidate fragments may specifically include:

Step 1021, determining the first interval in the target data table according to the target identification value stored in the target data table.

In the embodiment of the present disclosure, the target identification value may be a primary key in the node database, that is, a column of numerical values in the target data table, and each target identification value may identify one row of stored data. For example, the target identification value of the first row of data in the target data table may be set to 0001. Determining the first interval in the target data table according to the target identification values stored in the target data table may consist of determining the target identification value of the first row and the target identification value of the last row, and then determining the first interval from these two values. For example, if the target identification value of the first row is 0001 and that of the last row is 1000, the first interval in the target data table may be [1,1000].

Step 1022, performing iterative segmentation on the first interval according to the target iteration number by using a bisection method to obtain the N first alternative fragments.

In the embodiment of the present disclosure, the first interval may be subjected to binary segmentation, and the segments obtained by the binary segmentation are continuously subjected to binary segmentation until the number of times of binary segmentation operation reaches the target number of iterations, and the N segments obtained by final segmentation are used as the N first alternative segments.

Optionally, in the embodiment of the present disclosure, the performing iterative segmentation on the first interval according to the target iteration number by using a bisection method to obtain the N first candidate slices specifically includes:

under the condition that the number of times of executing the binary segmentation operation is less than the target iteration number, sequentially and circularly executing the following steps:

determining a first endpoint pair according to the data stored in the first interval;

according to the first endpoint pair, executing the binary segmentation operation on the first interval to obtain a plurality of first segments;

determining a second endpoint pair according to the data stored in the first segment, and respectively executing the binary segmentation operation on the plurality of first segments according to the second endpoint pair to obtain a plurality of second segments;

stopping the loop execution step and executing the following step when the number of times of executing the binary segmentation operation reaches the target iteration number:

taking the plurality of second segments obtained after the loop execution step is stopped as the N first alternative fragments.

In the embodiment of the present disclosure, the endpoint pair may be composed of a head endpoint and a tail endpoint of the interval or the segment. In this step, a plurality of first segments are obtained by determining a first endpoint pair of a first interval and performing a binary segmentation operation once according to the first endpoint pair, then, for each first segment, a second endpoint pair on the first segment is determined and a binary segmentation operation once according to the second endpoint pair is performed to obtain a plurality of second segments, the binary segmentation operation is performed in a loop, the loop execution step is stopped until the number of times of performing the binary segmentation operation reaches a target iteration number, and a plurality of second segments obtained after the loop execution step is stopped are used as N first candidate segments. By circularly executing the binary segmentation operation, the segmentation granularity can be further refined while data are stored in the segments obtained by segmentation.
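The loop described above can be sketched in memory as follows. The sketch assumes that the identification values holding data are available as a sorted list, that each round splits a fragment at its integer midpoint, and that a half containing no data is dropped. The names `tighten` and `iterative_bisect` are illustrative and not taken from the disclosure:

```python
from bisect import bisect_left, bisect_right


def tighten(ids, lo, hi):
    """Return the (min, max) stored IDs inside [lo, hi], or None if there are none."""
    left, right = bisect_left(ids, lo), bisect_right(ids, hi)
    return (ids[left], ids[right - 1]) if left < right else None


def iterative_bisect(ids, iterations):
    """Bisect the ID range `iterations` times, re-deriving the endpoint pair each round."""
    fragments = [(ids[0], ids[-1])] if ids else []
    for _ in range(iterations):
        next_round = []
        for lo, hi in fragments:
            mid = (lo + hi) // 2
            halves = [(lo, mid), (mid + 1, hi)] if lo < hi else [(lo, hi)]
            for half_lo, half_hi in halves:
                pair = tighten(ids, half_lo, half_hi)
                if pair is not None:          # drop halves that hold no data
                    next_round.append(pair)
        fragments = next_round
    return fragments


sparse = [1, 2, 3, 4, 1000, 1001, 5000, 9000, 9999, 10000]
print(iterative_bisect(sparse, 2))
# [(1, 1001), (5000, 5000), (9000, 9000), (9999, 10000)]
```

Because the endpoint pair is recomputed after every split, each resulting fragment begins and ends on an identification value that actually stores data.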

Optionally, in this embodiment of the present disclosure, the determining a first endpoint pair according to the data stored in the first interval includes:

Step S21, traversing the data stored correspondingly in the first interval, and taking the target identification value corresponding to non-empty data as an effective value.

In this embodiment of the disclosure, stored data information may be queried one by one based on the data indicated by the first interval, when the data stored in the row is not null data, the target identification value corresponding to the row may be used as an effective value, and when the data stored in the row is null data, the target identification value corresponding to the row may be used as an invalid value.

Step S22, selecting the effective values at the endpoints of the first interval as the first endpoint pair, so as to screen out target identification values whose stored data at the endpoints is empty.

In this embodiment of the present disclosure, it may be queried whether the target identification value corresponding to an endpoint of the first interval is an effective value. When it is, the endpoint is used as an endpoint of the first endpoint pair. When it is not, the adjacent point in the direction of the midpoint of the first interval is taken as the point to be tested, and the check is repeated until a point to be tested whose target identification value is an effective value is found; that point is then used as an endpoint of the first endpoint pair.

In the embodiment of the present disclosure, the effective value at the endpoint is used as the first endpoint pair to screen out the target identification value where the stored data at the endpoint is empty, that is, the point where the stored data between the intervals is empty can be screened out, so that it can be ensured that the data is stored in the sliced piece obtained by slicing, so that the blank data between the sliced pieces can be filtered out, and the data amount stored in each sliced piece can be more average.

For example, suppose that the target data table is music_account and the target identifier is the primary key id, and that the number of target fragments is determined to be 10, the target iteration number is 3, and the target division number is 1000. The first interval corresponding to the primary key in the current target data table may be queried with the following statement, and the ranges between segments in which the stored data is empty are then filtered out by iterative binary segmentation:

SELECT max(id) max_id, min(id) min_id FROM music_account;

Assuming that the obtained result is [1, 10000000], that is, the minimum ID is 1 and the maximum ID is 10000000, two segments can be obtained by binary segmentation: [1, 5000000] and [5000000, 10000000]. The primary key range is then calculated for each of the two segments:

SELECT max(id) max_id, min(id) min_id FROM music_account WHERE id >= 1 AND id < 5000000;

SELECT max(id) max_id, min(id) min_id FROM music_account WHERE id >= 5000000 AND id <= 10000000;

By analogy, after the binary segmentation yields 2 segments, the range query is performed again on each segment and each segment is bisected again. When the binary segmentation operation has been performed three times according to the target iteration number, 8 (that is, 2 × 2 × 2) first candidate segments can finally be obtained.
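
The iterative binary segmentation described above can be sketched as follows. This is only a minimal Python illustration under stated assumptions, not the implementation of the embodiment: the table is replaced by a sorted in-memory list `ids` of the IDs whose rows are not empty, the hypothetical helper `id_range` plays the role of the min/max range query, and the two halves are taken as [lo, mid] and [mid + 1, hi] so that they do not overlap.

```python
from bisect import bisect_left, bisect_right


def id_range(ids, lo, hi):
    """Return (min_id, max_id) of the stored IDs in [lo, hi], or None if there are none.

    Stands in for a min/max range query on the table (assumption: `ids` is sorted).
    """
    i = bisect_left(ids, lo)
    j = bisect_right(ids, hi)
    if i >= j:
        return None
    return ids[i], ids[j - 1]


def bisect_candidates(ids, iterations):
    """Bisect the ID range `iterations` times, trimming each half to stored IDs."""
    if not ids:
        return []
    segments = [(ids[0], ids[-1])]
    for _ in range(iterations):
        next_segments = []
        for lo, hi in segments:
            mid = (lo + hi) // 2
            # Hypothetical choice: non-overlapping halves [lo, mid] and [mid + 1, hi].
            for a, b in ((lo, mid), (mid + 1, hi)):
                trimmed = id_range(ids, a, b)
                if trimmed is not None:  # drop halves that store no data
                    next_segments.append(trimmed)
        segments = next_segments
    return segments


# Small data set with gaps: IDs exist only in 1..50, 200..202, 800..888 and 990..1000.
ids = (list(range(1, 51)) + list(range(200, 203))
       + list(range(800, 889)) + list(range(990, 1001)))
print(bisect_candidates(ids, 2))
# [(1, 50), (200, 202), (800, 888), (990, 1000)]
```

With a gap-free ID range, three iterations of this sketch yield 8 segments; with gaps, empty halves are dropped, so fewer segments may result.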

Optionally, in this embodiment of the present disclosure, the second fragmentation configuration parameter may be a target division number, and the operation of dividing the second interval in the first candidate segment according to the second preset fragmentation policy to obtain M second candidate segments specifically includes:

Step 1031, evenly dividing the second interval according to the target division number to obtain X third segments, where X is a positive integer.

In the embodiment of the present disclosure, if the target division number is 1000, the second interval may be divided into 1000 equal parts according to the target identification values in the second interval, and each part is used as a third segment, so that 1000 third segments are obtained.

Step 1032, for each third segment, traversing the data stored in the third segment, and selecting the segments whose data stored at the endpoints is not empty as second candidate segments, thereby obtaining the M second candidate segments, where M is not greater than X.

In this embodiment of the present disclosure, for each third segment, the endpoints of the third segment are determined, and whether the data stored at each endpoint is empty is queried. When the data stored at an endpoint is not empty, the segment delimited by the endpoint pair is used as a second candidate segment. When the data stored at an endpoint is empty, a point in the direction of the midpoint of the third segment is taken as a point to be tested, and whether the data stored at the point to be tested is empty is determined; this is repeated until the data stored at the point to be tested is not empty, and the segment delimited by that point is used as a second candidate segment.

It should be noted that all the data stored in a third segment may be empty, and such a segment is screened out when the segments whose data stored at the endpoints is not empty are selected, so M may be less than X. For example, if 100 third segments are obtained by even division and all the data stored in 4 of them is empty, 96 second candidate segments are obtained by selecting the segments whose data stored at the endpoints is not empty.
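
Steps 1031 and 1032 can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the implementation of the embodiment: the stored IDs are held in a sorted list `ids`, the hypothetical function `fine_grained_split` divides a closed range [lo, hi] into pieces of equal width (rounded up), and each non-empty piece is shrunk to the smallest and largest stored ID inside it.

```python
from bisect import bisect_left, bisect_right


def fine_grained_split(ids, lo, hi, parts):
    """Evenly divide [lo, hi] into at most `parts` pieces and keep the non-empty ones.

    Each kept piece is shrunk to the smallest and largest stored ID inside it.
    """
    span = hi - lo + 1
    width = max(1, -(-span // parts))  # ceiling division, at least one ID per piece
    result = []
    for start in range(lo, hi + 1, width):
        end = min(start + width - 1, hi)
        i = bisect_left(ids, start)
        j = bisect_right(ids, end)
        if i < j:  # the piece stores at least one ID
            result.append((ids[i], ids[j - 1]))
    return result


ids = list(range(1, 41)) + list(range(61, 101))  # IDs 41..60 are missing
pieces = fine_grained_split(ids, 1, 100, 10)
print(len(pieces))  # 8: two of the X = 10 pieces store no data, so M < X
```

Here X is 10 and M is 8, which illustrates why M is not greater than X.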

For example, fig. 2 is a schematic diagram of a database fragment splitting process provided in the embodiment of the present disclosure. As shown in fig. 2, the ID range in a node database is [1, 10000000]. First, one binary segmentation operation is performed on [1, 10000000] to obtain [1, 5000000] and [5000000, 10000000]; after the ranges in which the stored data is empty are screened out, the ID ranges are [1, 2020000] and [8000000, 10000000]. Second, one binary segmentation operation is performed on each of [1, 2020000] and [8000000, 10000000]. Bisecting [1, 2020000] yields [1, 1010000] and [1010000, 2020000]; after the empty ranges are screened out, the ID range in [1, 1010000] is [1, 500000] and the ID range in [1010000, 2020000] is [2000000, 2020000]. Bisecting [8000000, 10000000] yields [8000000, 9000000] and [9000000, 10000000]; after the empty ranges are screened out, the ID range in [8000000, 9000000] is [8000000, 8880000] and the ID range in [9000000, 10000000] is [9900000, 10000000]. Finally, [1, 500000], [2000000, 2020000], [8000000, 8880000] and [9900000, 10000000] are each evenly divided into 1000 parts; for example, the 1000 parts of [1, 500000] are [1, 500], [500, 1000], …, [499500, 500000].

Optionally, in the embodiment of the present disclosure, the operation of classifying the N × M second candidate segments and determining the target interval in the target data table may specifically include:

Step 1041, classifying the N × M second candidate segments according to the number P of target fragments by using a preset classification algorithm, and taking each classification result as a target interval, thereby obtaining P target intervals in the target data table, where P is a positive integer.

In this embodiment of the present disclosure, the preset classification algorithm may be the division-remainder method in hash classification, a Bayes classification algorithm, or an Artificial Neural Network (ANN) algorithm, which is not limited in the present disclosure. Assuming that the preset classification algorithm is the division-remainder method in hash classification, the N × M second candidate segments are numbered, each number is divided by the number P of target fragments, and the remainder is taken as the hash address. The N × M second candidate segments are classified according to their hash addresses to obtain P classification results, and each classification result is used as one target interval, so that P target intervals in the target data table are obtained.
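
The division-remainder classification can be sketched as follows. This is a minimal Python illustration, with assumptions not stated in the text: the hypothetical function `classify_by_remainder` numbers the candidate segments from 0 in list order, and each target interval is represented as the list of candidate segments assigned to it.

```python
def classify_by_remainder(candidates, p):
    """Group candidate segments into p target intervals by label % p."""
    targets = [[] for _ in range(p)]
    for label, segment in enumerate(candidates):
        targets[label % p].append(segment)  # the remainder acts as the hash address
    return targets


# Ten illustrative candidate segments of width 500, classified into P = 3 groups.
candidates = [(i * 500 + 1, (i + 1) * 500) for i in range(10)]
targets = classify_by_remainder(candidates, 3)
print([len(t) for t in targets])  # [4, 3, 3]
```

Because consecutive labels fall into different groups, each target interval receives nearly the same number of candidate segments, differing by at most one.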

It should be noted that the number P of target fragments may be determined according to the second fragmentation configuration parameter, that is, according to the target division number. For example, P may be set to one hundredth of the target division number, so that P is 10 when the target division number is 1000. Alternatively, P may be determined by specific data input by the user, independently of the value of the second fragmentation configuration parameter; for example, if the user inputs "P = 5", the number of target fragments is determined to be 5. The second fragmentation configuration parameter characterizes fine-grained division, so the division number it specifies is usually large, whereas, to ensure the processing efficiency of the database, the number of fragments in the database is usually smaller than the number of fine-grained divisions. Therefore, the number of target fragments is determined in addition to the fine-grained division parameter.

For example, fig. 3 is a schematic diagram of another database fragment splitting process provided by the embodiment of the present disclosure. As shown in fig. 3, the iterative segmentation operation is performed on node database 1 to obtain candidate fragment 1 of [1, 500], candidate fragment 2 of [500, 1000], candidate fragment 3 of [1500, 2000], …, and candidate fragment 1000 of [9999500, 10000000]. A hash classification operation is then performed on the candidate fragments to obtain target fragment 1, target fragment 2, target fragment 3, and so on. The data stored in each target fragment may correspond to one processing service, namely processing service 1, processing service 2, processing service 3, and so on, respectively. The above operations are likewise performed on the other node databases in the distributed database, which is not described herein again, so that each target fragment on each node database corresponds to a different processing service.

For example, fig. 4 is a schematic flowchart of database fragment splitting provided by the embodiment of the present disclosure. As shown in fig. 4: step S1, determining whether the database is a distributed database, that is, whether the database has multiple physical nodes, and performing step S2 when it is a distributed database or step S3 when it is not; step S2, fragmenting according to the database nodes; step S3, performing binary search on the data table of each database node to filter out the parts in which the stored data is empty; step S4, performing fine-grained fragmentation on each filtered region; and step S5, classifying the obtained fine-grained fragments by hashing to obtain a plurality of target fragments.
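
The flow of steps S1 to S5 can be sketched end to end as follows. This is a compact Python illustration under stated assumptions, not the implementation of the embodiment: each node database is represented by a hypothetical in-memory collection of its stored IDs, a database with a single node is simply a mapping with one entry, halves are taken as non-overlapping closed ranges, and the names `trim`, `shard_node` and `shard_database` are introduced only for this example.

```python
from bisect import bisect_left, bisect_right


def trim(ids, lo, hi):
    """Shrink [lo, hi] to the smallest and largest stored IDs inside it, or None if empty."""
    i, j = bisect_left(ids, lo), bisect_right(ids, hi)
    return (ids[i], ids[j - 1]) if i < j else None


def shard_node(ids, iterations, parts, p):
    """Apply steps S3 to S5 to the sorted ID list of one node database."""
    if not ids:
        return [[] for _ in range(p)]
    # S3: repeated bisection, dropping ranges that store no data.
    segments = [(ids[0], ids[-1])]
    for _ in range(iterations):
        halves = []
        for lo, hi in segments:
            mid = (lo + hi) // 2
            halves += [(lo, mid), (mid + 1, hi)]
        segments = [t for t in (trim(ids, a, b) for a, b in halves) if t]
    # S4: fine-grained even division of every remaining range.
    fine = []
    for lo, hi in segments:
        width = max(1, -(-(hi - lo + 1) // parts))  # ceiling division
        for start in range(lo, hi + 1, width):
            t = trim(ids, start, min(start + width - 1, hi))
            if t:
                fine.append(t)
    # S5: division-remainder classification into p target fragments.
    targets = [[] for _ in range(p)]
    for label, segment in enumerate(fine):
        targets[label % p].append(segment)
    return targets


def shard_database(nodes, iterations, parts, p):
    """S1/S2: handle every physical node separately (a single node is one entry)."""
    return {name: shard_node(sorted(ids), iterations, parts, p)
            for name, ids in nodes.items()}


nodes = {"node1": range(1, 101), "node2": [5, 6, 7, 900, 901]}
result = shard_database(nodes, iterations=1, parts=5, p=2)
print(result["node2"])
# [[(5, 5), (7, 7), (901, 901)], [(6, 6), (900, 900)]]
```

In this sketch the sparse node "node2" still ends up with target fragments of nearly equal size, because the empty range between 7 and 900 is discarded before classification.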

Optionally, the embodiments of the present disclosure may further perform the following steps:

Step S51, receiving a data acquisition request.

In the embodiment of the present disclosure, the database may receive a data acquisition request sent by another electronic device. The data acquisition request may carry characteristics of the data to be acquired, for example, the storage time or the attributes of the data to be acquired.

Step S52, in response to the data acquisition request, allowing data to be read on the target fragment.

In the embodiment of the disclosure, the database responds to the data acquisition request, and allows the electronic device to read data on the target segment according to the data acquisition request. Compared with the prior art in which the data stored in the database is fragmented as a whole, the embodiment of the present disclosure performs fragmentation based on the data table on each node database, and thus the situation that the same data is stored in a plurality of node databases can be reduced to a certain extent, and the number of connections established with the databases can be reduced.

For example, fig. 5 is a schematic diagram of database fragments provided in the prior art. As shown in fig. 5, a distributed database includes node database 1, node database 2, node database 3, node database 4, and node database 5. Node database 1 stores ten thousand pieces of data with a data interval of [0, 10000]; node database 2 stores five thousand pieces of data with a data interval of [10000, 20000]; node database 3 stores one thousand pieces of data with a data interval of [20000, 30000]; node database 4 stores five thousand pieces of data with a data interval of [30000, 40000]; and node database 5 stores five thousand pieces of data with a data interval of [40000, 50000]. When electronic device 1 needs to read data in the distributed database, it needs to establish connections with node database 1, node database 2, and node database 4; when electronic device 2 needs to read data, it needs to establish connections with node database 1, node database 2, and node database 4; and when electronic device 3 needs to read data, it needs to establish connections with node database 1, node database 2, and node database 5. That is, because node database 1 stores the most data, it needs to maintain connections with electronic device 1, electronic device 2, and electronic device 3 at the same time; and because node database 3 stores the least data, the probability that an electronic device reads data from it is the smallest, and most of the time node database 3 does not need to establish a connection with any electronic device.

It can be seen that, in the prior art, fragmentation is performed according to all the data stored in the distributed database, so the amounts of data stored in the node databases differ: some node databases store more data and some store less. As a result, the data that an electronic device needs to read is often spread across a plurality of node databases, and the electronic device has to establish connections with those node databases at the same time. A node database with a large number of connections has to process many read transactions simultaneously, which often causes it to crash and lengthens the time the electronic device needs to read data from it, thereby reducing the reading efficiency of the database. Further, since whether a connection can be established between a node database and an electronic device usually depends on the communication state of both, establishing connections with a plurality of node databases at the same time places high demands on the communication state, and connection failures are likely to occur, which also reduces the efficiency with which the electronic device reads data.

For example, fig. 6 is a schematic diagram of database fragments provided in the embodiment of the present disclosure. As shown in fig. 6, the distributed database includes node database 1, node database 2, node database 3, node database 4, and node database 5, and each node database is divided into 10 fragments. When electronic device 1 needs to read data in the distributed database, it only needs to establish a connection with the fragments in node database 1; when electronic device 2 needs to read data, it only needs to establish a connection with the fragments in node database 2; and when electronic device 3 needs to read data, it only needs to establish a connection with the fragments in node database 3. Thus, in the embodiment of the present disclosure, each node database is segmented into a plurality of fragments based on the data stored in that node database, so that the amounts of data stored in the fragments do not differ greatly. When an electronic device reads data, the number of connections established with different node databases can usually be reduced: the data on multiple fragments in a node database can be read simultaneously by establishing a connection with only that one node database. This greatly reduces the connection failures caused by establishing multiple connections, makes it easier for the electronic device to read data from the database, and thereby improves the reading efficiency of the database.

Fig. 7 is a block diagram of a database sharding apparatus provided by an embodiment of the present disclosure, where the apparatus is applied to a multi-node database. As shown in fig. 7, the apparatus 40 may include:

a first determining module 401, configured to determine a target data table on each node database, and determine a first fragmentation configuration parameter and a second fragmentation configuration parameter;

a segmentation module 402, configured to, for any node database, iteratively segment a first interval in the target data table based on a first preset fragmentation policy to obtain N first candidate segments; the first preset fragmentation policy is determined according to the first fragmentation configuration parameter; N is a positive integer;

a dividing module 403, configured to, for each first candidate segment, divide a second interval in the first candidate segment according to a second preset fragmentation policy to obtain M second candidate segments; the second preset fragmentation policy is determined according to the second fragmentation configuration parameter; M is a positive integer;

a second determining module 404, configured to classify the N × M second candidate segments, and determine a target interval in the target data table, so as to obtain a target segment in the node database according to the target interval segmentation.

To sum up, the database sharding apparatus provided in this disclosure may determine a target data table on each node database, and determine a first fragmentation configuration parameter and a second fragmentation configuration parameter. Then, for any node database, a first interval in the target data table is iteratively segmented based on a first preset fragmentation policy to obtain N first candidate segments, where the first preset fragmentation policy is determined according to the first fragmentation configuration parameter and N is a positive integer. Next, for each first candidate segment, a second interval in the first candidate segment is divided according to a second preset fragmentation policy to obtain M second candidate segments, where the second preset fragmentation policy is determined according to the second fragmentation configuration parameter and M is a positive integer. Finally, the N × M second candidate segments are classified to determine a target interval in the target data table, so that the target fragment in the node database is obtained by segmentation according to the target interval. In this way, the target fragments are obtained through iterative segmentation followed by fine-grained division of the data table, so the difference in the amount of data stored on each target fragment can be reduced to a certain extent, and the situation in which data reading stalls because too much data is stored on individual fragments can be avoided. In addition, because segmentation is performed on the data table of each node database, the number of connections that must be established to read data can be reduced to a certain extent, which improves the reading efficiency of the database.

the segmentation module 402 is further specifically configured to:

Optionally, the segmentation module 402 is further specifically configured to:

the dividing module 403 is further specifically configured to:

Optionally, the second determining module 404 is further specifically configured to:

Optionally, the apparatus 40 further includes:

the receiving module is used for receiving a data acquisition request;

Optionally, the first determining module 401 is further specifically configured to:

reading metadata stored in the database;

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

According to an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor, and a memory for storing processor-executable instructions, wherein the processor is configured to execute the instructions to perform the steps of the database sharding method as in any of the above embodiments.

There is also provided, according to an embodiment of the present disclosure, a non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the steps of the database sharding method as in any one of the above embodiments.

There is further provided, according to an embodiment of the present disclosure, a computer program product comprising readable program code which, when executed by a processor of a mobile terminal, enables the mobile terminal to perform the steps of the database sharding method as in any one of the above embodiments.

FIG. 8 is a block diagram illustrating an apparatus for database sharding in accordance with an exemplary embodiment. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 8, the apparatus 500 may include one or more of the following components: processing component 502, memory 504, power component 506, multimedia component 508, audio component 510, input/output (I/O) interface 512, sensor component 514, and communication component 516.

The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the database sharding method described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.

The memory 504 is configured to store various types of data to support operation at the device 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 506 provides power to the various components of the device 500. The power components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.

The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.

The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the device 500. For example, the sensor assembly 514 may detect an open/closed state of the device 500 and the relative positioning of components, such as a display and keypad of the apparatus 500. The sensor assembly 514 may also detect a change in the position of the apparatus 500 or a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, the orientation or acceleration/deceleration of the apparatus 500, and a change in the temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described database fragmentation methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the apparatus 500 to perform the database sharding method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

FIG. 9 is a block diagram illustrating an apparatus for database sharding in accordance with an exemplary embodiment. For example, the apparatus 600 may be provided as a server. Referring to fig. 9, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the database sharding methods described above.

The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input/output (I/O) interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A database fragmentation method applied to a multi-node database comprises the following steps:

2. The method of claim 1, wherein the first fragmentation configuration parameter is a target number of iterations;

3. The method according to claim 2, wherein the iteratively segmenting the first interval according to the target iteration number by using the bisection method to obtain the N first candidate segments includes:

4. The method of claim 3, wherein determining the first endpoint pair from the data stored in the first interval comprises:

5. The method of claim 1, wherein the second fragmentation configuration parameter is a target division number;

6. The method according to claim 1, wherein the classifying the N × M second candidate segments and determining the target interval in the target data table comprises:

7. The method according to any one of claims 1 to 6, further comprising:

receiving a request for acquiring data;

8. A database sharding apparatus, for application to a multi-node database, the apparatus comprising:

9. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the database sharding method of any one of claims 1 to 7.

10. A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the database sharding method of any one of claims 1 to 7.