US20230153326A9 - Space partitioning method for database table, device and storage medium - Google Patents
- Publication number
- US20230153326A9 (application US17/288,897)
- Authority
- US
- United States
- Prior art keywords
- time period
- data amount
- data
- target
- database table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- Embodiments of the present disclosure relate to the field of database technologies, and particularly to a space partitioning method and apparatus for a database table, and a device and a storage medium thereof.
- database tables can be used to store data.
- the data amount stored in the database table gradually increases over time.
- Embodiments of the present disclosure provide a space partitioning method and apparatus for a database table, and a device and a storage medium thereof.
- the technical solutions are as follows.
- a space partitioning method for a database table includes:
- the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period;
- the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
- before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further includes:
- determining the first data amount within the first time period and the second data amount within the second time period of the database table includes:
- determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
- determining the number of target regions based on the third data amount within the target time period includes: determining the number of target regions by formula (1): k = ⌈n/m⌉, wherein
- k is the number of target regions;
- n is the third data amount within the target time period;
- m indicates a maximum storage capacity of a single region; and
- ⌈ ⌉ represents a rounding-up operation.
- the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period upon elapse of each of the plurality of time periods.
- a space partitioning device for a database table includes:
- a processor and a memory configured to store a computer program, wherein the processor, when running the computer program, is caused to perform a space partitioning method for a database table including:
- the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period;
- the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
- before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further includes:
- determining the first data amount within the first time period and the second data amount within the second time period of the database table includes:
- determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
- determining the number of target regions based on the third data amount within the target time period includes: determining the number of target regions by formula (1): k = ⌈n/m⌉, wherein
- k is the number of target regions;
- n is the third data amount within the target time period;
- m indicates a maximum storage capacity of a single region; and
- ⌈ ⌉ represents a rounding-up operation.
- the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods.
- a non-volatile computer-readable storage medium storing instructions therein, wherein the instructions, when executed by a processor, cause the processor to perform the steps of the method according to the first aspect.
- a computer program product including at least one instruction therein is provided, wherein the at least one instruction, when executed by a computer, causes the computer to perform the steps of the method according to the first aspect.
- a computer device including a processor and a memory storing a computer program
- the processor when running the computer program, is caused to perform the method according to the first aspect.
- FIG. 1 is a schematic structural diagram showing an LSTM network model according to an embodiment
- FIG. 2 is a flowchart showing a space partitioning method for a database table according to an embodiment
- FIG. 3 is a schematic diagram of data comparison according to another embodiment
- FIG. 4 is a schematic structural diagram of a space partitioning apparatus for a database table according to an embodiment
- FIG. 5 is a schematic structural diagram of a space partitioning apparatus for a database table according to another embodiment.
- FIG. 6 is a schematic structural diagram of a computer device according to an embodiment.
- Three-sigma refers to a method for eliminating error data, and may also be called the Pauta criterion.
- The LSTM network model is a recurrent neural network (RNN) and, essentially, a gated RNN.
- "Gated RNN" here means that, compared with a plain RNN, the LSTM network additionally includes three gates: an input gate, a forget gate, and an output gate. Through these three gates, the information to be forgotten and output in the LSTM network model is controlled.
- The LSTM network model internally includes a plurality of units, each of which includes the above three gates. Furthermore, each unit includes some weights and functions (such as a tanh function), wherein the weight of each unit depends on the context rather than being a fixed value.
- the LSTM network model internally transmits information through a cell state, and controls the discard or addition of information through the gates.
- each unit is constructed with a sigmoid function which may be used to determine whether there is information that needs to be forgotten in this unit according to the output of the previous unit and the input of this unit.
- the forget gate generates values within the interval (0, 1) through a sigmoid function to control the information that needs to be forgotten.
- candidate values within the interval (−1, 1) are generated by the tanh function to control whether new information needs to be added.
- a filtering degree of the current cell state is controlled by the output gate, that is, the information that needs to be output is integrated; the tanh function then bounds the output, wherein values of the output information are in the interval (−1, 1).
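The gating described above can be sketched as a single LSTM cell step. This is a minimal numpy illustration, not the patent's implementation; the stacked weight layout and parameter names are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step (illustrative shapes: W holds 4 stacked gate
    weight matrices applied to the concatenated [h_prev; x] vector)."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W[0] @ z + b[0])   # forget gate, values in (0, 1)
    i = sigmoid(W[1] @ z + b[1])   # input gate, values in (0, 1)
    g = np.tanh(W[2] @ z + b[2])   # candidate values, in (-1, 1)
    o = sigmoid(W[3] @ z + b[3])   # output gate, values in (0, 1)
    c = f * c_prev + i * g         # cell state: discard old, add new
    h = o * np.tanh(c)             # hidden state / unit output
    return h, c
```

The sigmoid gates bound their outputs to (0, 1) while the tanh nonlinearity bounds candidate values and the unit output to (−1, 1), matching the gate roles described above.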
- the LSTM network model generally includes a plurality of layers, each of which includes at least one node.
- FIG. 1 is a schematic structural diagram of an LSTM network model according to an example embodiment
- the LSTM network model includes an input layer, a hidden layer, and an output layer, wherein the above units are usually located in the hidden layer.
- the input layer includes a node X1
- the hidden layer includes nodes H1, H2, and H3, and the output layer includes a node Y1.
- the input data is processed by each layer in the network model over time steps, wherein one time step may be understood as the time period in which one input is processed.
- Because the LSTM network model has a memory function, it can predict data in the next time period based on data in the previous time period and data in the current time period.
- databases such as HBase are provided with a plurality of sets of partitioning policies.
- data migration may be required after the database tables are partitioned by using the plurality of sets of the partitioning policies, adversely affecting the storage performance.
- the plurality of sets of the partitioning policies are independent of time factors, so the partitioned data cannot be aggregated within a given period of time, thereby greatly reducing the efficiency of subsequent operations such as data query and deletion.
- the embodiments of the present disclosure provide a space partitioning method for a database table, which can solve the problems existing in the partitioning policy provided by the database per se.
- the space partitioning method according to the embodiments of the present disclosure may be performed by a computer device which may be configured to manage a database table, for example, performing space partitioning on the database table, and allocating each region obtained after the partitioning to a respective storage node of the database.
- one storage node may be equivalent to a storage device, that is, the storage node may also be understood as the storage device.
- the computer device may be a tablet computer, a desktop computer, a notebook computer, a portable computer, or the like, which is not limited in the embodiments of the present disclosure.
- the database is provided with a database table partitioning policy, and a storage device can partition a database table according to that policy. Specifically, the database table usually involves a plurality of storage intervals, each of which corresponds to an interval label range.
- This partitioning policy generally partitions a space by taking the middle value of the interval label range as the partitioning point. For example, if the interval labels of the plurality of storage intervals of one database table range from 1 to 100, the database table is partitioned into a first region and a second region by taking the middle label 50 as the partitioning point, wherein the interval labels of the storage intervals in the first region range from 1 to 50, and those in the second region range from 51 to 100.
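The built-in midpoint policy in this example can be sketched as follows (a hypothetical helper, not code from the patent):

```python
def split_at_midpoint(lo, hi):
    """Split an inclusive interval-label range [lo, hi] into two regions
    at the middle label, mirroring the built-in policy's example."""
    mid = (lo + hi) // 2
    return (lo, mid), (mid + 1, hi)
```

For the range 1 to 100 this yields the regions (1, 50) and (51, 100), as in the example.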
- When a region is full, data migration may be required. For example, when the data amount in the first region reaches the upper limit of the storage space corresponding to the first region, and the storage device where the first region is located has no other available storage space, it is necessary to migrate the data in the first region to another storage device that allocates storage space for it. Thus, the storage performance of the database is reduced.
- Embodiments of the present disclosure provide a space partitioning method.
- FIG. 2 is a flowchart showing a space partitioning method for a database table according to an example embodiment. The method may be performed by a computer device and includes the following implementation steps.
- step 201 the computer device acquires a plurality of groups of data by pre-partitioning, based on a time stamp of data in a database table, the data in the database table according to a predetermined period length.
- a space configured to store the data to be stored in the database table is partitioned based on the data stored in the database table.
- the database table is initialized first, that is, the stored data is pre-partitioned.
- the data stored in the database table may be pre-partitioned according to the predetermined period length. It may also be understood as pre-partitioning the space where the data is stored in the database table.
- the pre-partitioning means grouping the stored data, such that the plurality of groups of data can be obtained. Each group of data corresponds to a pre-partitioned region, and corresponds to a time period.
- the predetermined period length is customized by a user according to actual needs, or is defaulted by the computer device.
- the predetermined period length is several days, weeks or months, which is not limited by the embodiments of the present disclosure.
- Since each datum stored in the database table usually has a time stamp, the data stored in the database table can be partitioned into a plurality of groups, with one week as the time period; and each group of data is stored in a region acquired by pre-partitioning the space.
- each region acquired by pre-partitioning the space corresponds to one time period.
- a time range corresponding to the region obtained by the pre-partitioning may be 20180601-20180607, 20180608-20180614, 20180615-20180621, etc. Taking 20180601 as an example for illustration, 2018 represents the year and 0601 represents June 1. Further, the region corresponding to 20180601-20180607 is configured to store the data in the database table whose time stamp is within this time period.
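Generating the weekly period labels from the example above could look like this (the function name is an illustrative assumption; the label format follows the 20180601-20180607 example):

```python
from datetime import date, timedelta

def weekly_periods(start, weeks):
    """Label consecutive 7-day pre-partitioned regions as
    ('YYYYMMDD', 'YYYYMMDD') start/end pairs."""
    periods = []
    for k in range(weeks):
        s = start + timedelta(days=7 * k)
        e = s + timedelta(days=6)
        periods.append((s.strftime("%Y%m%d"), e.strftime("%Y%m%d")))
    return periods
```

Starting from June 1, 2018, the first three periods are 20180601-20180607, 20180608-20180614, and 20180615-20180621, matching the time ranges in the example.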
- the data in the database table is pre-partitioned according to the time granularity. In this way, when the data in the database need to be queried or deleted later, the data in a certain time period can be quickly obtained from the corresponding region obtained by the pre-partitioning according to the time period in which the data to be operated falls, thereby improving the data operation efficiency.
- the data stored in the database table are grouped in advance, which can facilitate the rapid counting of the data amount in a certain time period later.
- each pre-partitioned region may be further partitioned. Specifically, each pre-partitioned region is further partitioned according to the number of storage nodes configured to store the data. For example, when the number of storage nodes in the database is three, each pre-partitioned region obtained by the partitioning is further partitioned into three sub-regions; and each sub-region in the three sub-regions is distributed to each of the three storage nodes. In this way, when the data in each pre-partitioned region needs to be read later, the data can be read from the three storage nodes respectively, thereby ensuring the load balance.
- each group of data is stored in a separate region.
- grouping is equivalent to pre-partitioning the space.
- Since each group of data is stored in one pre-partitioned region, after the data in a pre-partitioned region is grouped again, the data in each sub-group is stored in a separate region, which is equivalent to further partitioning the pre-partitioned region.
- a hash value is added at the end of the time period corresponding to each pre-partitioned region. This hash value is intended to distinguish the different sub-regions corresponding to the same pre-partitioned region. For example, if three sub-regions are obtained by further partitioning the pre-partitioned region corresponding to the time period 20180601-20180607, time ranges corresponding to the three sub-regions may be recorded as 20180601-2018060101, 2018060101-2018060102, and 2018060102-20180607, respectively. The last two numbers of 2018060101 are the added hash values. Similarly, the last two numbers of 2018060102 are also the added hash values.
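The hash-suffix boundaries from the example could be produced as in this sketch (the two-digit suffix scheme is inferred from the 2018060101/2018060102 example):

```python
def sub_region_ranges(start_key, end_key, n_sub):
    """Split one pre-partitioned region into n_sub sub-regions by
    appending two-digit hash suffixes to the start key as interior
    split points (suffix scheme taken from the example)."""
    splits = [f"{start_key}{i:02d}" for i in range(1, n_sub)]
    bounds = [start_key] + splits + [end_key]
    return list(zip(bounds[:-1], bounds[1:]))
```

Splitting the region 20180601-20180607 into three sub-regions reproduces the three ranges given in the example.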
- step 201 is an optional step. That is, in some embodiments, step 202 may be directly executed without step 201 , which is not limited in the embodiments of the present disclosure.
- step 202 the computer device determines a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored.
- the computer device may perform the operation of determining the first data amount within the first time period and the second data amount within the second time period of the database table. For example, if the target time period is from Sep. 1, 2018 to Sep. 7, 2018, the computer device performs this operation on Aug. 31, 2018.
- the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period. For example, if a time series corresponding to the target time period is t+1, a time series corresponding to the first time period is t, and a time series corresponding to the second time period is t−1. In this way, the accuracy of predicting a third data amount within the target time period can be guaranteed later.
- the specific process of determining the first data amount within the first time period and the second data amount within the second time period of the database table includes: determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from a plurality of groups of data.
- the target time period is 20180901-20180907, that is, the data to be stored is from Sep. 1, 2018 to Sep. 7, 2018, the first time period is 20180825-20180831, and the second time period is 20180818-20180824. That is, the computer device, based on the plurality of groups of the data after grouping, counts the data amount from Aug. 25, 2018 to Aug. 31, 2018 to obtain the first data amount, and counts the data amount from Aug. 18, 2018 to Aug. 24, 2018 to obtain the second data amount.
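Counting the per-period data amounts from the grouped timestamps might be sketched as follows (the epoch date and helper names are illustrative assumptions):

```python
from collections import Counter
from datetime import date, timedelta

def period_label(ts, epoch=date(2018, 6, 1)):
    """Map a timestamp to its 7-day period label; the epoch (first day
    covered by the table) is a hypothetical value."""
    k = (ts - epoch).days // 7
    s = epoch + timedelta(days=7 * k)
    e = s + timedelta(days=6)
    return f"{s:%Y%m%d}-{e:%Y%m%d}"

def amounts_by_period(timestamps):
    """Count the data amount (row count) stored in each period."""
    return Counter(period_label(t) for t in timestamps)
```

Because the data was already grouped by period during pre-partitioning, counting the first and second data amounts reduces to reading off the counts for the two period labels.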
- first and second time periods are adjacent ones and the first time period is the previous time period of the target time period is taken for illustration here.
- the first time period and the second time period may not be adjacent to each other, or the first time period may not be the previous time period of the target time period.
- the target time period is 20180901-20180907
- the first time period may be selected as 20180818-20180824
- the second time period may be selected as 20180607-20180613.
- the computer device calls a target network model, inputs the first data amount and the second data amount into the target network model, and outputs a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period.
- the first data amount and the second data amount are input into the target network model, and then sequentially processed by the input layer, the hidden layer, and the output layer to output the third data amount which is a predicted data amount of the data to be stored.
- the target network model is obtained by training, based on a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods, a neural network model.
- the data amount within the plurality of time periods and the data amount of one time period after each of the plurality of time periods can be acquired.
- the neural network model is trained to obtain the target network model.
- the time period after each of the plurality of time periods is one time period adjacent to each time period. For example, if the time series corresponding to each of the time periods is t, the time series corresponding to one time period after each of the plurality of time periods is t+1.
- the computer device acquires a plurality of pieces of data, groups the data according to a time stamp of the acquired data and the predetermined period length, and counts the data amount of each group of data to obtain a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods.
- the acquired data is converted and preprocessed, and the processed data is input into the neural network model for iterative training to obtain the target network model.
- the data size in Table 1 is the compressed data size.
- the three-sigma rule is adopted to detect errors and eliminate bad data, such that the retained data is distributed in the range (μ − 3σ, μ + 3σ); that is, data distributed outside this range is eliminated, wherein
- μ represents the mean of the data; and
- σ represents the standard deviation of the data.
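A minimal sketch of the three-sigma elimination, assuming the sample mean and standard deviation are used:

```python
import numpy as np

def three_sigma_filter(values):
    """Keep only values inside (mu - 3*sigma, mu + 3*sigma), where mu
    and sigma are the sample mean and standard deviation."""
    x = np.asarray(values, dtype=float)
    mu, sigma = x.mean(), x.std()
    return x[np.abs(x - mu) < 3 * sigma]
```

A single extreme outlier among otherwise stable weekly data amounts falls outside the three-sigma band and is dropped before training.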
- the computer device counts the acquired data with a predetermined period length. For example, if the predetermined period length is one week, the counted data is shown in Table 2 below.
- the data in Table 2 may be converted further after the counting is completed.
- the data in Table 3 is converted into feature data and label data.
- the data amount of the time period t is used as the feature data input to the neural network; and the data amount of the time period t+1 is used as the label data for comparison with the predicted value output by the neural network model.
- the converted data is shown in Table 3 below.
- Table 3 is converted into a matrix A, wherein each row is a (feature, label) pair:
- A = [[b₁, b₂], [b₂, b₃], …, [bₜ, bₜ₊₁]]
- bₜ represents a data amount of the time period t; and
- matrix A defines the row-and-column structure for training the data; the data in Table 3 is saved to a target file according to the data structure of the matrix A.
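Building matrix A from a series of per-period data amounts can be sketched as:

```python
import numpy as np

def feature_label_matrix(amounts):
    """Build matrix A whose rows pair the data amount of period t (the
    feature) with that of period t+1 (the label)."""
    b = np.asarray(amounts, dtype=float)
    return np.column_stack([b[:-1], b[1:]])
```

Each row shifts the series by one period, so a sequence of T amounts yields T−1 training rows.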
- the target file is a csv format file, and the data of the csv file is shown in Table 4.
- the data stored in the target file may also be loaded and preprocessed. That is, the data is normalized into the same range, so as to prevent neurons in the neural network from generating too-high or too-low values when the values of the data are too large or too small, for example, when the neurons adopt the sigmoid function as the activation function.
- Data preprocessing normalizes the data to between 0 and 1 using the following formula (2): X_normal = (X − X_min) / (X_max − X_min), wherein
- X represents the data to be processed;
- X_max is the maximum value and X_min is the minimum value; and
- X_normal represents the normalized data.
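Formula (2), min-max normalization, as a one-line helper:

```python
def min_max_normalize(x, x_min, x_max):
    """Formula (2): X_normal = (X - X_min) / (X_max - X_min),
    mapping the data into the range [0, 1]."""
    return (x - x_min) / (x_max - x_min)
```

The minimum maps to 0, the maximum to 1, and every other value to a proportionate point in between.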
- the construction process generally includes: setting a weight of the neural network model, selecting a weight updater, determining the number of layers of the neural network model, determining the number of nodes in each layer, selecting a loss function and an activation function which are to be adopted, and performing other operations.
- a group of small random numbers is randomly generated as the initial weights of the neural network model; the stochastic gradient descent method is adopted to optimize a cost function, and the error is back-propagated accordingly; and the weights and thresholds in the neural network model are adjusted iteratively.
- the Xavier strategy is selected as the weight initialization strategy.
- the weight updater includes two parameters, namely, a learning rate and a momentum, wherein the learning rate represents an adjustment range of the weight in each iteration; and the momentum can influence the direction of the weight adjustment. Finding suitable parameters to determine a suitable weight updater can effectively improve convergence, thereby preventing the neural network model from falling into a local minimum.
- a three-layer neural network is constructed, and is as shown in FIG. 1 .
- the first layer is the input layer configured to receive the input data and transmit the received data to the next layer, and can receive the input data by a single node.
- the third layer is the output layer, and is configured to output a predicted value by a single node.
- the loss function in the neural network model is the mean squared error function commonly used in regression, and the activation function of the output layer is the identity function.
- After constructing the neural network model, it is necessary to load the preprocessed data into the neural network model to train it, so as to obtain the target network model.
- the serial numbers in Table 4 are used as the time series input into the neural network model.
- After inputting the time series and the preprocessed data into the neural network model, the predicted value is output.
- the weight of the neural network model is adjusted according to the error between the predicted value and the actual value, the step of inputting the data is repeated, and the training ends when the error over the whole data sample set falls within the specified range, so as to obtain the target network model.
- the target network model is verified further.
- The test data, shown in Table 5, further includes feature data and label data.
- the test data is input into the target network model.
- a predicted value is output, and is compared with the label data to verify whether the predictive ability of the target network model achieves the expected effect.
- the predicted value output by the target network model is compared with the label data in the test data, and a comparison diagram obtained by a tracing point method is shown in FIG. 3 .
- the data amount within the time period of [1, 86] is the real data for training the neural network model.
- the data amount corresponding to curve 1 is the predicted value output by the target network model
- the data amount corresponding to curve 2 is the real data amount or label data in the test data.
- a difference between the predicted value and the label data in the test data can be found visually in the comparison. When the two values are close, it can be determined that the target network model meets an actual demand, that is, the data amount in the target time period can be successfully predicted.
- the predicted value and the label data are relatively close at the beginning, but the difference between the two gradually increases over time. This is because, for each of the time periods after time period 86, only the data amount of the next time period is predicted by the target network model, and the target network model is not trained on the data amounts of the time periods after time period 86. In other words, the target network model can generally accurately predict the data amount of the time period nearest to the current time period, but there is a difference between the predicted value and the real value for time periods farther from the current time period.
- the data in the csv file can be updated based on the data of the target time period while predicting the data amount every time.
- the training of the target network model continues based on the updated data, so as to enable the predictive ability of the target network model to be more accurate.
- step 204 the computer device determines the number of target regions based on the third data amount within the target time period.
- the number of target regions k is determined by formula (1): k=┌n/m┐, wherein “┌ ┐” represents a rounding-up operation.
- m is set according to actual requirements. For example, if the optimal storage range of a single region is [5 G, 10 G], 10 G is selected as the upper storage limit of the single region. That is, the value of m is selected as 10 G. Further, it is assumed that the third data amount n determined through the target network model is 100 G, it can be determined that the number of target regions is 10.
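Formula (1) is a ceiling division; a brief sketch of the example above (sizes in gigabytes; the function name is introduced here for illustration):

```python
import math

def target_region_count(n_gb, m_gb):
    """k = ceil(n / m): n is the predicted third data amount, m is the
    upper storage limit chosen for a single region."""
    return math.ceil(n_gb / m_gb)

print(target_region_count(100, 10))  # the example above: 10 regions
print(target_region_count(101, 10))  # rounding up: 11 regions
```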
- In step 205, the computer device partitions, based on the number of target regions, a space in the database table configured to store the data to be stored.
- the time period corresponding to the space for storing the data to be stored is 20180901-20180907.
- the space corresponding to the time period 20180901-20180907 is partitioned into 10 regions. That is, when storing the data in the time period 20180901-20180907, the data is stored in the 10 regions.
- the data amount to be stored is predicted first, and the space prepared in the database table for storing the data to be stored is partitioned according to the predicted data amount. In this way, costly splitting of the data when writing into the database table is avoided, an optimal load balancing effect is ensured, and the operation of the storage system is more stable.
- step 202 is performed again to continue to partition the space corresponding to the next time period of the target time period according to the above implementation process.
- the process of judging whether the database table needs to be partitioned continuously may include: judging whether the target time period exceeds a deadline date of the database table, wherein the deadline date is configured to indicate the deadline for storing the data in the database table; and when the target time period exceeds the deadline date, it is determined that there is no need to continue partitioning. Otherwise, when the target time period does not exceed the deadline date, it is determined that the partitioning needs to be continued.
- the database table is generally configured with the deadline date.
- the deadline date is Jan. 1, 2019, which means that the data after Jan. 1, 2019 is not stored in the database table.
- when the target time period is 20180901-20180907, the deadline date is not exceeded, and it is determined that the partitioning needs to be continued.
- when the target time period is 20190901-20190907, the deadline date is exceeded, and it is determined that there is no need to continue the partitioning.
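One plausible reading of the deadline check above compares the end of the target time period against the deadline date (the `needs_partitioning` helper and the date-based comparison are assumptions for illustration):

```python
from datetime import date

DEADLINE = date(2019, 1, 1)  # deadline date configured for the database table

def needs_partitioning(period_start, period_end):
    """Continue partitioning only while the target time period does not
    exceed the deadline for storing data in the table."""
    return period_end < DEADLINE

print(needs_partitioning(date(2018, 9, 1), date(2018, 9, 7)))  # True
print(needs_partitioning(date(2019, 9, 1), date(2019, 9, 7)))  # False
```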
- a first data amount within a first time period of a database table and a second data amount within a second time period of the database table are determined, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored.
- the first data amount and the second data amount are input into a target network model, and a third data amount within the target time period is output. That is, the data amount within the target time period is predicted by the target network model.
- a number of target regions is determined based on the predicted third data amount, and a space in the database table configured to store the data to be stored is partitioned based on the number of target regions.
- a data amount of data to be stored in a time period is predicted before storing the data, and then space partitioning is performed based on the predicted data amount, which avoids the data migration that would otherwise be necessary when space partitioning is performed in a fixed manner, thereby improving data storage performance.
- FIG. 4 is a structural schematic diagram of a space partitioning apparatus for a database table according to an example embodiment.
- the apparatus may be practiced by hardware, software or a combination thereof, and may include:
- a determining module 410 configured to determine a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored;
- a calling module 420 configured to call a target network model, input the first data amount and the second data amount into the target network model, and output a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period;
- a partitioning module 430 configured to determine a number of target regions based on the third data amount within the target time period, and partition, based on the number of target regions, a space in the database table configured to store the data to be stored.
- the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
- the apparatus further includes:
- a pre-partitioning module 440 configured to acquire a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length;
- the determining module 410 is configured to determine the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
- the partitioning module 430 is configured to determine the number of target regions by formula k=┌n/m┐ based on the third data amount within the target time period, wherein k is the number of target regions, n is the third data amount within the target time period, m indicates a maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
- the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods.
- a first data amount within a first time period and a second data amount within a second time period of the database table are determined, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored.
- the first data amount and the second data amount are input into a target network model, and a third data amount within the target time period is output. That is, the data amount within the target time period is predicted by the target network model.
- a number of target regions is determined based on the predicted third data amount, and a space in the database table configured to store the data to be stored is partitioned based on the number of target regions.
- a data amount of data to be stored in a time period is predicted before storing the data, and then space partitioning is performed based on the predicted data amount, which avoids the data migration that would otherwise be necessary when space partitioning is performed in a fixed manner, thereby improving data storage performance.
- the space partitioning apparatus for the database table according to this embodiment is illustrated with the division of the above functional modules taken merely as an example during space partitioning of a database table.
- in practical applications, the functions may be assigned to different functional modules as required. That is, the apparatus is divided into different functional modules to implement all or part of the functions described above.
- the apparatus according to this embodiment is based on the same inventive concept as the method according to the above embodiments. For details, reference may be made to the method embodiments, which is not described herein any further.
- FIG. 6 is a schematic diagram of a structure of a computer device according to an example embodiment.
- the computer device 600 includes a central processing unit (CPU) 601 , a system memory 604 including a random-access memory (RAM) 602 and a read-only memory (ROM) 603 , and a system bus 605 that connects the system memory 604 and the central processing unit 601 .
- the computer device 600 further includes a basic input/output system (I/O system) 606 that facilitates transfer of information between components within the computer, and a mass storage device 607 for storing an operating system 613 , an application 614 , and other program modules 615 .
- the basic input/output system 606 includes a display 608 for displaying the information and an input device 609 , such as a mouse or keyboard, for the user to input information.
- the display 608 and the input device 609 are both connected to the CPU 601 via an input output controller 610 that is connected to the system bus 605 .
- the basic I/O system 606 may further include an input output controller 610 for receiving and processing the input from a plurality of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input and output controller 610 further provides output to a display screen, a printer, or other types of output devices.
- the mass storage device 607 is connected to the CPU 601 by a mass storage controller (not shown) connected to the system bus 605 .
- the mass storage device 607 and its associated computer-readable media provide non-volatile storage for the computer device 600 . That is, the mass storage device 607 may include a computer readable medium (not shown), such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
- the computer-readable medium may include a computer storage medium and a communication medium.
- the computer storage medium includes volatile and nonvolatile, removable and non-removable mediums implemented by any method or technology for storing the information, such as, computer readable instructions, data structures, program modules or other data.
- the computer storage medium includes a RAM, a ROM, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or other solid-state storage devices, a CD-ROM, a digital versatile disc (DVD) or other optical storage devices, a tape cartridge, a magnetic tape, a magnetic disk storage device or other magnetic storage devices.
- the computer storage medium is not limited to the above ones.
- the computer device 600 may also be connected, for operation, to a remote computer over a network such as the Internet. That is, the computer device 600 may be connected to the network 612 through a network interface unit 611 connected to the system bus 605 , or may be connected to other types of networks or remote computer systems (not shown) via the network interface unit 611 .
- the memory further stores one or more programs which are configured to be executed by the CPU.
- the one or more programs include instructions for performing the space partitioning method for the database table according to the embodiments of the present disclosure.
- Embodiments of the present disclosure further provide a non-transitory computer-readable storage medium storing instructions which, when executed by a computer device, cause the computer device to perform the space partitioning method for the database table according to the embodiments of the present disclosure.
- Embodiments of the present disclosure further provide a computer program product, which, when running in a computer, causes the computer to execute the space partitioning method for the database table according to the embodiments of the present disclosure.
Abstract
Disclosed is a space partitioning method for a database table, including: determining a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored; calling a target network model, inputting the first data amount and the second data amount into the target network model, and outputting a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and determining a number of target regions based on the third data amount within the target time period, and partitioning, based on the number of target regions, a space in the database table configured to store the data to be stored.
Description
- This application is a U.S. National Phase Application of International Application No. PCT/CN2019/113310, filed on Oct. 25, 2019, which claims priority to Chinese Patent Application No. 201811253560.2, filed on Oct. 25, 2018 and entitled “DATABASE TABLE AREA SEGMENTATION METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM,” the contents of each of which is incorporated by reference herein in its entirety.
- Embodiments of the present disclosure relate to the field of database technologies, and particularly, relate to a space partitioning method and apparatus for a database table, and a device and a storage medium thereof.
- With the development of database technologies, database tables can be used to store data. In a business scenario using a database table to store the data, the data amount stored in the database table gradually increases over time. To reduce the storage pressure of the database, it is usually necessary to partition a space of the database table, so as to store the data in the database tables in units of regions formed by partitioning.
- Embodiments of the present disclosure provide a space partitioning method and apparatus for a database table, and a device and a storage medium thereof. The technical solutions are as follows.
- In a first aspect, a space partitioning method for a database table is provided. The method includes:
- determining a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored;
- calling a target network model, inputting the first data amount and the second data amount into the target network model, and outputting a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and
- determining a number of target regions based on the third data amount within the target time period, and partitioning, based on the number of target regions, a space in the database table configured to store the data to be stored.
- Optionally, the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
- Optionally, before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further includes:
- acquiring a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; and
- determining the first data amount within the first time period and the second data amount within the second time period of the database table includes:
- determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
- Optionally, determining the number of target regions based on the third data amount within the target time period includes:
- determining the number of target regions by formula:
- k=┌n/m┐
- based on the third data amount within the target time period;
- wherein k is the number of target regions, n is the third data amount within the target time period, m indicates a maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
- Optionally, the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period upon elapse of each of the plurality of time periods.
- In a second aspect, a space partitioning device for a database table is provided. The device includes:
- a processor and a memory configured to store a computer program, wherein the processor, when running the computer program, is caused to perform a space partitioning method for a database table including:
- determining a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored;
- calling a target network model, inputting the first data amount and the second data amount into the target network model, and outputting a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and
- determining a number of target regions based on the third data amount within the target time period, and partitioning, based on the number of target regions, a space in the database table configured to store the data to be stored.
- Optionally, the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
- Optionally, before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further includes:
- acquiring a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; and
- determining the first data amount within the first time period and the second data amount within the second time period of the database table includes:
- determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
- Optionally, determining the number of target regions based on the third data amount within the target time period includes:
- determining the number of target regions by formula:
- k=┌n/m┐
- based on the third data amount within the target time period;
- wherein k is the number of target regions, n is the third data amount within the target time period, m indicates a maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
- Optionally, the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods.
- In a third aspect, a non-volatile computer-readable storage medium storing instructions therein is provided, wherein the instructions, when executed by a processor, cause the processor to perform the steps of the method according to the first aspect.
- In a fourth aspect, a computer program product including at least one instruction therein is provided, wherein the at least one instruction, when executed by a computer, causes the computer to perform the steps of the method according to the first aspect.
- In a fifth aspect, a computer device including a processor and a memory storing a computer program is provided, wherein the processor, when running the computer program, is caused to perform the method according to the first aspect.
- To describe the technical solutions in the embodiments of the present disclosure more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
- FIG. 1 is a schematic structural diagram showing an LSTM network model according to an embodiment;
- FIG. 2 is a flowchart showing a space partitioning method for a database table according to an embodiment;
- FIG. 3 is a schematic diagram of data comparison according to another embodiment;
- FIG. 4 is a schematic structural diagram of a space partitioning apparatus for a database table according to an embodiment;
- FIG. 5 is a schematic structural diagram of a space partitioning apparatus for a database table according to another embodiment; and
- FIG. 6 is a schematic structural diagram of a computer device according to an embodiment.
- For clearer descriptions of the objectives, technical solutions, and advantages of the present disclosure, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
- Prior to detailed introduction of the embodiments of the present disclosure, the terms, application scenarios and implementation environments involved in the embodiments of the present disclosure are briefly introduced.
- First, the terms involved in the embodiments of the present disclosure are introduced.
- Three-sigma refers to a method for eliminating erroneous data, and may also be called the Laida criterion.
- Long short-term memory (LSTM) network model is a time recurrent neural network (RNN), and essentially, is a gated RNN. "Gated" here means that compared with a plain RNN, the LSTM network additionally includes three gates: an input gate, a forget gate and an output gate. Through the three gates, the information that needs to be forgotten and output in the LSTM network model is controlled. Specifically, the LSTM network model internally includes a plurality of units, each of which includes the above three gates. Furthermore, each unit further includes some weights and functions (such as a tanh function), wherein the weight of each unit depends on the context, rather than being a fixed value. The LSTM network model internally transmits information through a cell state, and controls the discarding or addition of information through the gates. In an embodiment, each unit is constructed with a sigmoid function which may be used to determine, according to the output of the previous unit and the input of this unit, whether there is information that needs to be forgotten in this unit. The forget gate generates values within an interval of [0, 1] to control the information that needs to be forgotten. In addition, values within the interval of [0, 1] are generated by the input gate to control whether new information needs to be added, and candidate values within the interval of [-1, 1] are generated by the tanh function. After that, a filtering degree of the current cell state is controlled by the output gate, that is, the information that needs to be output is integrated; and the tanh function is applied to the cell state before output, wherein its values are in the interval of [-1, 1].
- Next, the internal structure of the LSTM network model is introduced. The LSTM network model generally includes a plurality of layers, each of which includes at least one node. Referring to FIG. 1, which is a schematic structural diagram of an LSTM network model according to an example embodiment, the LSTM network model includes an input layer, a hidden layer, and an output layer, wherein the above units are usually located in the hidden layer. In the LSTM network model, the input layer includes a node X1, the hidden layer includes nodes H1, H2, and H3, and the output layer includes a node Y1. In each time step, the input data can be processed by each layer in the network model, wherein one time step may be understood as a time period for processing the corresponding data.
- It should be noted that as the LSTM network model has a memory function, it can predict data in the next time period based on data in the previous time period and data in the current time period.
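The gating behaviour described above can be illustrated by one step of a scalar LSTM unit. This is a schematic sketch, not the model of the embodiments: the shared scalar weight `w` and bias `b` are arbitrary stand-ins, and a real unit would use separate learned weights per gate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w=0.5, b=0.0):
    """One step of a scalar LSTM unit: the forget/input/output gates emit
    values in [0, 1] via sigmoid, while tanh emits candidates in [-1, 1]."""
    z = w * x + w * h_prev + b
    f = sigmoid(z)               # forget gate: how much old state to keep
    i = sigmoid(z)               # input gate: how much new info to add
    g = math.tanh(z)             # candidate cell value, in [-1, 1]
    o = sigmoid(z)               # output gate: how much state to expose
    c = f * c_prev + i * g       # updated cell state
    h = o * math.tanh(c)         # unit output
    return h, c

h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0)
```

Because the cell state `c` is carried from step to step, the unit retains a memory of earlier time periods, which is what lets the model predict the next period's data amount from the previous and current ones.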
- Secondly, the application scenarios involved in the embodiments of the present disclosure are briefly introduced.
- In some usage scenarios of database tables, space partitioning is generally necessary to avoid the huge storage pressure on a database caused by using a single space to store data. At present, databases such as HBase are provided with a plurality of sets of partitioning policies. However, data migration may be required after the database tables are partitioned by using these partitioning policies, adversely affecting the storage performance. In addition, these partitioning policies are independent of time factors, such that the partitioned data cannot be aggregated within a certain period of time, thereby greatly reducing the efficiency of subsequent operations such as data query and deletion. For this reason, the embodiments of the present disclosure provide a space partitioning method for a database table, which can solve the problems existing in the partitioning policies provided by the database per se. For the detailed process of the method, reference may be made to the embodiment shown in FIG. 2 below.
- Finally, the implementation environments related to the embodiments of the present disclosure are briefly introduced.
- The space partitioning method according to the embodiments of the present disclosure may be performed by a computer device which may be configured to manage a database table, for example, performing space partitioning on the database table, and allocating each region obtained after the partitioning to a respective storage node of the database. It should be noted that in some embodiments, one storage node may be equivalent to a storage device, that is, the storage node may also be understood as the storage device.
- In some embodiments, the computer device may be a tablet computer, a desktop computer, a notebook computer, a portable computer, or the like, which is not limited in the embodiments of the present disclosure.
- After introducing the terms, the application scenarios and the implementation environments involved in the embodiments of the present disclosure, the method according to the embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.
- At present, the database is provided with a database table partitioning policy, and a storage device can partition a database table according to this policy. Specifically, the database table usually involves a plurality of storage intervals, which correspond to an interval label range. This partitioning policy generally partitions a space by taking the middle value of the interval label range as the partitioning point. For example, if the interval labels of a plurality of storage intervals of one database table include 1 to 100, the database table is partitioned into a first region and a second region by taking the middle label 50 as the partitioning point, wherein the interval labels of the storage intervals in the first region include 1 to 50, and the interval labels of the storage intervals in the second region include 51 to 100.
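The fixed midpoint policy in this example amounts to the following (a sketch; the function name is introduced here for illustration):

```python
def midpoint_split(low, high):
    """Split an interval label range [low, high] into two regions at the
    middle label, as in the fixed partitioning policy described above."""
    mid = (low + high) // 2
    return (low, mid), (mid + 1, high)

print(midpoint_split(1, 100))  # ((1, 50), (51, 100))
```

Note that the split point depends only on the label range, not on how much data each half will actually hold, which is the root cause of the migration problem discussed next.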
- When the amount of data is large, the data is stored in the database table in units of regions, and the storage space allocated for a single region on a storage device is limited. As a result, in the space partitioning of the database table, if the data amount in a partitioned region reaches the upper limit of the storage space corresponding to this region, data migration may be required. For example, when the data amount in the first region reaches the upper limit of the storage space corresponding to the first region, if the storage device where the first region is located has no other available storage space, the data in the first region needs to be migrated from that storage device to another storage device which allocates a storage space for it. Thus, the storage performance of the database is reduced.
- Embodiments of the present disclosure provide a space partitioning method. Referring to FIG. 2, which is a flowchart showing a space partitioning method for a database table according to an example embodiment, the method may be performed by a computer device. The method includes the following implementation steps.
- In step 201, the computer device acquires a plurality of groups of data by pre-partitioning, based on a time stamp of data in a database table, the data in the database table according to a predetermined period length.
- The predetermined period length is customized by a user according to actual needs, or takes a default value of the computer device. For example, the predetermined period length is several days, weeks or months, which is not limited by the embodiments of the present disclosure.
- For convenience of understanding, an example in which the predetermined period length is set to be one week is taken for illustration here. The data stored in the database table can be partitioned into a plurality of groups with one week as the time period as each datum stored in the database table usually has the time stamp; and each group of data is stored in a region acquired by pre-partitioning the space. In this way, each region acquired by pre-partitioning the space corresponds to one time period. For example, a time range corresponding to the region obtained by the pre-partitioning may be 20180601-20180607, 20180608-20180614, 20180615-20180621, etc. Taking 20180601 as an example for illustration, 2018 represents the year and 0601 represents June 1. Further, the region corresponding to 20180601-20180607 is configured to store the data in the database table whose time stamp is within this time period.
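The weekly grouping described above amounts to bucketing records by time stamp. A minimal sketch under the example's assumptions (weekly periods starting 20180601; `period_key` is a name introduced here):

```python
from datetime import date, timedelta

PERIOD_START = date(2018, 6, 1)
PERIOD_DAYS = 7  # predetermined period length: one week

def period_key(stamp):
    """Map a record's time stamp to the key of its weekly pre-partitioned
    region, e.g. 20180601-20180607 for the first week."""
    offset = (stamp - PERIOD_START).days // PERIOD_DAYS
    start = PERIOD_START + timedelta(days=offset * PERIOD_DAYS)
    end = start + timedelta(days=PERIOD_DAYS - 1)
    return f"{start:%Y%m%d}-{end:%Y%m%d}"

groups = {}
for stamp in [date(2018, 6, 1), date(2018, 6, 7), date(2018, 6, 8)]:
    groups.setdefault(period_key(stamp), []).append(stamp)
# groups now holds one list of records per pre-partitioned region
```

Counting the data amount of a time period then reduces to summing over the corresponding group.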
- It is worth mentioning that the data in the database table is pre-partitioned according to the time granularity. In this way, when the data in the database need to be queried or deleted later, the data in a certain time period can be quickly obtained from the corresponding region obtained by the pre-partitioning according to the time period in which the data to be operated falls, thereby improving the data operation efficiency. In addition, the data stored in the database table are grouped in advance, which can facilitate the rapid counting of the data amount in a certain time period later.
- Further, when the data in a region is stored on one storage node, the data can only be read from that one storage node during reading, and the load balancing effect is thus poor. Therefore, after the data of the plurality of pre-partitioned regions is obtained by the pre-partitioning, each pre-partitioned region may be further partitioned. Specifically, each pre-partitioned region is further partitioned according to the number of storage nodes configured to store the data. For example, when the number of storage nodes in the database is three, each pre-partitioned region obtained by the partitioning is further partitioned into three sub-regions; and each sub-region in the three sub-regions is distributed to one of the three storage nodes. In this way, when the data in a pre-partitioned region needs to be read later, the data can be read from the three storage nodes respectively, thereby ensuring the load balance.
- It should be noted that for the above wording that the plurality of pre-partitioned regions are acquired by pre-partitioning the data stored in the database table, as the stored data is allocated in one region, after grouping the data, each group of data is stored in a separate region. Thus, grouping is equivalent to pre-partitioning the space. Similarly, for the wording that each pre-partitioned region needs to be further partitioned, as each group of data is stored in one pre-partitioned region, after the data in the pre-partitioned regions is grouped again, the data in each sub-group is stored in a separate region, which is equivalent to further partitioning the pre-partitioned region.
- Further, in order to distinguish all the sub-regions obtained after each pre-partitioned region is further partitioned, a hash value is added at the end of the time period corresponding to each pre-partitioned region. This hash value is intended to distinguish the different sub-regions corresponding to the same pre-partitioned region. For example, if three sub-regions are obtained by further partitioning the pre-partitioned region corresponding to the time period 20180601-20180607, the time ranges corresponding to the three sub-regions may be recorded as 20180601-2018060101, 2018060101-2018060102, and 2018060102-20180607, respectively. The last two digits of 2018060101 are the added hash value; similarly, the last two digits of 2018060102 are the added hash value.
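The sub-region boundaries with appended hash suffixes can be generated mechanically from a period label and a node count. The sketch below reproduces the worked example above; the function name and two-digit suffix format are illustrative assumptions, not part of the disclosure:

```python
def sub_regions(period: str, nodes: int) -> list:
    """Split one pre-partitioned region into `nodes` sub-regions by
    appending two-digit hash suffixes to the period's start key."""
    start, end = period.split("-")
    # Interior split keys: start key + '01', '02', ... as in the example.
    keys = [start] + [f"{start}{i:02d}" for i in range(1, nodes)] + [end]
    return [f"{keys[i]}-{keys[i + 1]}" for i in range(nodes)]

print(sub_regions("20180601-20180607", 3))
# -> ['20180601-2018060101', '2018060101-2018060102', '2018060102-20180607']
```

Each of the three resulting key ranges is then assigned to one of the three storage nodes.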
- It should be noted that
step 201 is an optional step. That is, in some embodiments, step 202 may be directly executed without step 201, which is not limited in the embodiments of the present disclosure. - In
step 202, the computer device determines a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored. - In some embodiments, when determining that the current time reaches the target time period, the computer device may perform the operation of determining the first data amount within the first time period and the second data amount within the second time period of the database table. For example, if the target time period is from Sep. 1, 2018 to Sep. 7, 2018, the computer device performs this operation on Aug. 31, 2018.
- In a possible embodiment, the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period. For example, if a time series corresponding to the target time period is t+1, a time series corresponding to the first time period is t, and a time series corresponding to the second time period is t−1. In this way, the accuracy of predicting a third data amount within the target time period can be guaranteed later.
- Further, the specific process of determining the first data amount within the first time period and the second data amount within the second time period of the database table includes: determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from a plurality of groups of data.
- For example, if the target time period is 20180901-20180907, that is, the data to be stored is from Sep. 1, 2018 to Sep. 7, 2018, the first time period is 20180825-20180831, and the second time period is 20180818-20180824. That is, the computer device, based on the plurality of groups of the data after grouping, counts the data amount from Aug. 25, 2018 to Aug. 31, 2018 to obtain the first data amount, and counts the data amount from Aug. 18, 2018 to Aug. 24, 2018 to obtain the second data amount.
- An example in which the first and second time periods are adjacent ones and the first time period is the previous time period of the target time period is taken for illustration here. In another embodiment, the first time period and the second time period may not be adjacent to each other, or the first time period may not be the previous time period of the target time period. For example, taking the above example for illustration, if the target time period is 20180901-20180907, the first time period may be selected as 20180818-20180824, and the second time period may be selected as 20180607-20180613.
- In
step 203, the computer device calls a target network model, inputs the first data amount and the second data amount into the target network model, and outputs a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period. - For example, if a model structure of the target network model is as shown in
FIG. 1 , the first data amount and the second data amount are input into the target network model, and then sequentially processed by the input layer, the hidden layer, and the output layer to output the third data amount which is a predicted data amount of the data to be stored. - Further, the target network model is obtained by training, based on a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods, a neural network model.
- That is, before calling the target network model, it is generally necessary to perform training to obtain the target network model. In the training process, the data amount within the plurality of time periods and the data amount of one time period after each of the plurality of time periods can be acquired. After that, based on the data amount within the plurality of time periods and the data amount of one time period after each of the plurality of time periods, the neural network model is trained to obtain the target network model.
- As an example, the time period after each of the plurality of time periods is one time period adjacent to each time period. For example, if the time series corresponding to each of the time periods is t, the time series corresponding to one time period after each of the plurality of time periods is t+1.
- Next, the training process is introduced. The computer device acquires a plurality of pieces of data, groups the data according to a time stamp of the acquired data and the predetermined period length, and counts the data amount of each group of data to obtain a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods. The acquired data is converted and preprocessed, and the processed data is input into the neural network model for iterative training to obtain the target network model.
- When acquiring the data, data with similar business types, similar data structures and similar time distribution may be selected. For example, the selected data is shown in Table 1.
-
TABLE 1

Date          Data Size
2016 Jul. 1   1.3 G
2016 Jul. 2   1.4 G
2016 Jul. 3   1.2 G
2016 Jul. 4   0.8 G
2016 Jul. 5   1.5 G
2016 Jul. 6   1.3 G
2016 Jul. 7   1.2 G
2016 Jul. 8   1.0 G
2016 Jul. 9   0.6 G
2016 Jul. 10  0.8 G
2016 Jul. 11  1.7 G
2016 Jul. 12  1.7 G
2016 Jul. 13  1.6 G
2016 Jul. 14  1.5 G
. . .         . . .

- If the database adopts compressed storage, the data size in Table 1 is the compressed data size. After that, the three-sigma rule is adopted to detect errors and eliminate bad data, such that the retained data is distributed in the range of (u − 3a, u + 3a); that is, data distributed outside this range is eliminated. Herein, u represents the mean of the data, and a represents the standard deviation of the data. Further, the computer device counts the acquired data with the predetermined period length. For example, if the predetermined period length is one week, the counted data is shown in Table 2 below.
-
TABLE 2

Date          Data Size  Time Period  Total Data Size
2016 Jul. 1   1.7 G      1            12.6 G
2016 Jul. 2   1.9 G
2016 Jul. 3   1.6 G
2016 Jul. 4   2.2 G
2016 Jul. 5   1.9 G
2016 Jul. 6   2.1 G
2016 Jul. 7   1.2 G
2016 Jul. 8   1.8 G      2            12.7 G
2016 Jul. 9   1.6 G
2016 Jul. 10  1.8 G
2016 Jul. 11  1.7 G
2016 Jul. 12  1.7 G
2016 Jul. 13  1.9 G
2016 Jul. 14  2.2 G
. . .         . . .      . . .        . . .

- Further, in order to facilitate subsequent training of the neural network model, the data in Table 2 may be converted further after the counting is completed. For example, the data in Table 2 is converted into feature data and label data. The data amount of the time period t is used as the feature data input to the neural network; and the data amount of the time period t+1 is used as the label data for comparison with the predicted value output by the neural network model. The converted data is shown in Table 3 below.
-
TABLE 3

Feature Data  Label Data
12.6 G        12.7 G
12.7 G        . . .
. . .         . . .

- Further, Table 3 is converted into a matrix:
-
      | b_1  b_2     |
  A = | b_2  b_3     |
      | ...  ...     |
      | b_t  b_(t+1) |
- b_t represents the data amount of the time period t; the matrix A defines the row and column structure of the training data; and the data in Table 3 is saved to a target file according to the data structure of the matrix A. For example, the target file is a file in the csv format, and the data of the csv file is shown in Table 4.
-
TABLE 4

No.  Feature, Label
1    12.6, 12.7
2    12.7, 12.8
3    12.8, 12.9
4    12.9, 13.1
5    13.1, 13.2
6    13.2, 13.3
7    13.3, 13.3
8    13.3, 13.4
9    13.4, 13.5
10   13.5, 13.4
11   13.4, 13.4
12   13.4, 13.4
13   13.4, 13.4
14   13.4, 13.4
15   13.4, 13.4
16   13.4, 13.4
17   13.4, 13.3
18   13.3, 13.3
19   13.3, 13.2
20   13.2, 13.1
21   13.1, 13.1
22   13.1, 13.0
23   13.0, 13.0
24   13.0, 13.0
25   13.0, 12.9
26   12.9, 12.9
27   12.9, 12.9
28   12.9, 13.0
29   13.0, 13.0
30   13.0, 13.1
31   13.1, 13.2
32   13.2, 13.2
33   13.2, 13.3
34   13.3, 13.3
35   13.3, 13.4
36   13.4, 13.4
37   13.4, 13.5
38   13.5, 13.4
39   13.4, 13.4
40   13.4, 13.3
41   13.3, 13.3
42   13.3, 13.2
43   13.2, 13.1
44   13.1, 13.1
45   13.1, 13.0
46   13.0, 13.0
47   13.0, 12.9
48   12.9, 12.9
49   12.9, 12.6
50   12.6, 12.6
51   12.6, 12.5
52   12.5, 12.5
53   12.5, 12.4
54   12.4, 12.4
55   12.4, 12.3
56   12.3, 12.3
57   12.3, 12.2
58   12.2, 12.2
59   12.2, 12.1
60   12.1, 12.1
61   12.1, 12.0
62   12.0, 12.1
63   12.1, 12.1
64   12.1, 12.2
65   12.2, 12.2
66   12.2, 12.1
67   12.1, 12.1
68   12.1, 12.1
69   12.1, 12.0
70   12.0, 12.0
71   12.0, 11.9
72   11.9, 11.9
73   11.9, 11.8
74   11.8, 11.8
75   11.8, 11.7
76   11.7, 11.7
77   11.7, 11.7
78   11.7, 11.6
79   11.6, 11.6
80   11.6, 11.5
81   11.5, 11.6
82   11.6, 11.5
83   11.5, 11.4
84   11.4, 11.3
85   11.3, 11.2
86   11.2, 11.2

- Further, the data stored in the target file may also be loaded and preprocessed. That is, the data is normalized to be converted into the same data range, so as to prevent neurons in the neural network from generating too high or too low values when the values of the data are too large or too small. For example, when the neurons adopt the sigmoid function as the activation function
- f(x) = 1 / (1 + e^(−x)),
- the too large or too small values will cause derivatives of neurons to approach zero, thereby adversely affecting the training results. Data preprocessing can normalize the data between 0 and 1. The following formula (2) is used for data preprocessing:
- X_normal = (X − X_min) / (X_max − X_min)   (2)
- X represents the data to be processed; X_max is the maximum value and X_min is the minimum value; and X_normal represents the normalized data.
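The data-preparation steps of this section (three-sigma elimination, counting by period, feature/label conversion, and formula (2) normalization) can be sketched together in Python. All function names below are illustrative assumptions, not identifiers from the disclosure:

```python
import statistics

def three_sigma_filter(daily_sizes):
    """Eliminate bad data outside (u - 3a, u + 3a), per the three-sigma rule."""
    u = statistics.mean(daily_sizes)
    a = statistics.pstdev(daily_sizes)
    return [x for x in daily_sizes if u - 3 * a < x < u + 3 * a]

def weekly_totals(daily_sizes, days_per_period=7):
    """Count the daily sizes into one total per predetermined period."""
    return [round(sum(daily_sizes[i:i + days_per_period]), 1)
            for i in range(0, len(daily_sizes), days_per_period)]

def feature_label_pairs(totals):
    """Pair b_t (feature) with b_(t+1) (label): one row of matrix A each."""
    return [(totals[t], totals[t + 1]) for t in range(len(totals) - 1)]

def normalize(values):
    """Formula (2): X_normal = (X - X_min) / (X_max - X_min), into [0, 1]."""
    x_min, x_max = min(values), max(values)
    return [(x - x_min) / (x_max - x_min) for x in values]

# Daily sizes (in G) for the two weeks of Table 2.
daily = [1.7, 1.9, 1.6, 2.2, 1.9, 2.1, 1.2,
         1.8, 1.6, 1.8, 1.7, 1.7, 1.9, 2.2]
totals = weekly_totals(three_sigma_filter(daily))
print(totals)                       # -> [12.6, 12.7], matching Table 2
print(feature_label_pairs(totals))  # -> [(12.6, 12.7)], one csv row
```

The normalized feature values are what would then be fed to the network; writing the numbered `serial, feature, label` rows out with the csv module yields the target file described above.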
- After that, it is necessary to construct a neural network model. The construction process generally includes: setting a weight of the neural network model, selecting a weight updater, determining the number of layers of the neural network model, determining the number of nodes in each layer, selecting a loss function and an activation function which are to be adopted, and performing other operations.
- In a possible embodiment, a group of smaller random numbers is randomly generated as initial weights of the neural network model; the random gradient descent method is adopted to optimize a cost function, and an error is reversely transmitted accordingly; and the weight and thresholds in the neural network model are adjusted constantly. The XAVIER strategy is selected as the initial strategy of the weights.
- The weight updater includes two parameters, namely, a learning rate and a momentum, wherein the learning rate represents an adjustment range of the weight in each iteration; and the momentum can influence the direction of the weight adjustment. Finding suitable parameters to determine a suitable weight updater can effectively improve convergence, thereby preventing the neural network model from falling into a local minimum.
- In the embodiment of the present disclosure, as the data amount is the only factor affecting the neural network model, a three-layer neural network is constructed, and is as shown in
FIG. 1. The first layer is the input layer, which is configured to receive the input data through a single node and transmit the received data to the next layer. The second layer is the hidden layer, which is configured to construct the LSTM network; and the number of nodes in the hidden layer is determined by the Kolmogorov theorem, which gives s = 2*n + 1, wherein s represents the number of hidden nodes and n represents the number of nodes in the input layer. The third layer is the output layer, which is configured to output a predicted value through a single node. - As an example, the loss function in the neural network model is a common mean square error function in regression, and the activation function is an IDENTITY function.
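The s = 2*n + 1 sizing rule fixes the whole layout once the input width is known; a minimal sketch (the 1-3-1 layout follows from the single input node used here):

```python
def hidden_nodes(n: int) -> int:
    """Kolmogorov rule from the text: s = 2 * n + 1 hidden nodes."""
    return 2 * n + 1

# With a single input node (the data amount) and a single output node,
# the three-layer network has a 1-3-1 layout.
layers = (1, hidden_nodes(1), 1)
print(layers)  # -> (1, 3, 1)
```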
- After constructing the neural network model, it is necessary to load the preprocessed data into the neural network model to train the neural network model, so as to obtain the target network model. It should be noted that as the LSTM network model needs time-series information, it is required to add a time dimension to the preprocessed data. For example, the serial numbers in Table 4 are used as the time series input into the neural network model. After inputting the time series and the preprocessed data into the neural network model, the predicted value is output. The weight of the neural network model is adjusted according to the error between the predicted value and the actual value, the step of inputting the data is repeated, and the training ends when the error over the whole data sample set falls within the specified range, so as to obtain the target network model.
- Further, after the target network model is obtained, the target network model is verified further. For example, it is assumed that the test data is as shown in Table 5; the test data likewise includes feature data and label data. The test data is input into the target network model. A predicted value is output, and is compared with the label data to verify whether the predictive ability of the target network model achieves the expected effect.
-
TABLE 5

No.  Feature, Label
1    11.2, 11.2
2    11.2, 11.1
3    11.1, 11.0
4    11.0, 10.9
5    10.9, 10.9
6    10.9, 10.8
7    10.8, 10.7
8    10.7, 10.7
9    10.7, 10.7
10   10.7, 10.8
11   10.8, 11.0
12   11.0, 11.1
13   11.1, 11.2
14   11.2, 11.3
15   11.3, 11.4
16   11.4, 11.5
17   11.5, 11.6
18   11.6, 11.7
19   11.7, 11.8
20   11.8, 11.9
21   11.9, 12.0
22   12.0, 12.2
23   12.2, 12.3
24   12.3, 12.4
25   12.4, 12.5
26   12.5, 12.4
27   12.4, 12.6
28   12.6, 12.7
29   12.7, 12.8
30   12.8, 13.0
31   13.0, 13.1
32   13.1, 13.2
33   13.2, 13.3
34   13.3, 13.5
35   13.5, 13.6
36   13.6, 13.7
37   13.7, 13.7
38   13.7, 13.6
39   13.6, 13.5
40   13.5, 13.7
41   13.7, 13.6
42   13.6, 13.5
43   13.5, 13.7
44   13.7, 13.4
45   13.4, 13.3
46   13.3, 13.3
47   13.3, 13.4
48   13.4, 13.4
49   13.4, 13.6
50   13.6, 13.7
51   13.7, 13.8
52   13.8, 13.8
53   13.8, 13.9
54   13.9, 14.1
55   14.1, 14.0
56   14.0, 14.0
57   14.0, 14.1
58   14.1, 14.2
59   14.2, 14.4
60   14.4, 14.5
61   14.5, 14.6
62   14.6, 14.8
63   14.8, 15.0
64   15.0, 15.1
65   15.1, 15.3
66   15.3, 15.4
67   15.4, 15.5
68   15.5, 15.7
69   15.7, 15.8
70   15.8, 16.0
71   16.0, 16.2
72   16.2, 16.3
73   16.3, 16.5
74   16.5, 16.6
75   16.6, 16.8
76   16.8, 17.0
77   17.0, 17.2
78   17.2, 17.3
79   17.3, 17.5
80   17.5, 17.7
81   17.7, 17.8
82   17.8, 18.0
83   18.0, 18.2
84   18.2, 18.4
85   18.4, 18.5
86   18.5, 18.7
87   18.7, 18.9
88   18.9, 19.1
89   19.1, 18.7
90   18.7, 18.6
91   18.6, 18.8
92   18.8, 19.0
93   19.0, 19.1
94   19.1, 18.7
95   18.7, 18.8
96   18.8, 18.6

- The predicted value output by the target network model is compared with the label data in the test data, and a comparison diagram obtained by a tracing point method is shown in
FIG. 3. In FIG. 3, the data amount within the time period of [1, 86] is the real data for training the neural network model. In the time period beyond the range, the data amount corresponding to curve 1 is the predicted value output by the target network model, and the data amount corresponding to curve 2 is the real data amount or label data in the test data. In FIG. 3, the difference between the predicted value and the label data in the test data can be found visually in the comparison. When the two values are close, it can be determined that the target network model meets the actual demand, that is, the data amount in the target time period can be successfully predicted.
FIG. 3 , the predicted value and the label data are relatively close at the beginning, but the difference between the two gradually increases over time. This is because in each of the time periods after the time period 86, only the data amount of the next time period is predicted by the target network model, but the target network model is not trained based on the data amount of the time period after the time period 86. In other words, the target network model can generally accurately predict the data amount of the next time period nearest to the current time period, but there is a difference between the predicted value and the real value of the data amount in the time period farther from the current time period. Therefore, in order to ensure the prediction accuracy of the target network model, the data in the csv file can be updated based on the data of the target time period while predicting the data amount every time. In addition, the training of the target network model continues based on the updated data, so as to enable the predictive ability of the target network model to be more accurate. - In
step 204, the computer device determines the number of target regions based on the third data amount within the target time period. - In a possible embodiment, based on the third data amount within the target time period, the number of target regions is determined by formula (1):
- k = ⌈ n / m ⌉   (1)
- Here, k is the number of target regions; n is the third data amount within the target time period; m indicates the maximum storage capacity of a single region; and "⌈ ⌉" represents a rounding-up operation.
- In addition, m is set according to actual requirements. For example, if the optimal storage range of a single region is [5 G, 10 G], 10 G is selected as the upper storage limit of the single region. That is, the value of m is selected as 10 G. Further, assuming that the third data amount n determined through the target network model is 100 G, it can be determined that the number of target regions is 10.
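Formula (1) and the 100 G example can be checked with a one-line computation; the function name is illustrative:

```python
import math

def target_regions(n: float, m: float) -> int:
    """Formula (1): k = ceil(n / m) regions, for n G of predicted data
    and an m G upper storage limit per single region."""
    return math.ceil(n / m)

# 100 G predicted for the target time period, 10 G per region -> 10 regions.
print(target_regions(100, 10))  # -> 10
print(target_regions(101, 10))  # -> 11; any remainder needs one more region
```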
- In
step 205, the computer device partitions, based on the number of target regions, a space in the database table configured to store the data to be stored. - Continuing to take the above example for illustration, the time period corresponding to the space for storing the data to be stored is 20180901-20180907. When the number of target regions is 10, the space corresponding to the time period 20180901-20180907 is partitioned into 10 regions. That is, when storing the data in the time period 20180901-20180907, the data is stored in the 10 regions.
- In this way, before storing the data, the data amount to be stored is predicted first, and the space prepared for storing the data to be stored in the database table is partitioned according to the data amount, such that costly splitting of the data when writing into the database table is avoided, thereby ensuring that the optimal load balancing effect is achieved, and enabling the operation of the storage system to be more stable.
- Further, after performing space partitioning on the database table, the computer device can also judge whether the database table needs to be partitioned continuously. When it is determined that the partitioning needs to be continued,
step 202 is performed again to continue to partition the space corresponding to the next time period of the target time period according to the above implementation process. - Further, the process of judging whether the database table needs to be partitioned continuously may include: judging whether the target time period exceeds a deadline date of the database table, wherein the deadline date is configured to indicate the deadline for storing the data in the database table; and when the target time period exceeds the deadline date, it is determined that there is no need to continue partitioning. Otherwise, when the target time period does not exceed the deadline date, it is determined that the partitioning needs to be continued.
- That is, the database table is generally configured with the deadline date. For example, the deadline date is Jan. 1, 2019, which means that the data after Jan. 1, 2019 is not stored in the database table. If the target time period is 20180901-20180907, it means that the deadline date is not exceeded, and it is determined that the partitioning needs to be continued at this time. If the target time period is 20190901-20190907, it means that the deadline is exceeded, and it is determined that there is no need to continue the partitioning.
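The deadline-driven repetition of steps 202-205 can be sketched as a loop over successive weekly target periods. This is a sketch under the assumption that a period is still processed when it begins on or before the deadline date; the prediction and partitioning work is elided as a comment:

```python
from datetime import date, timedelta

def partition_until_deadline(start: date, deadline: date):
    """Repeat steps 202-205 for successive weekly target periods until
    the target period begins after the table's deadline date."""
    periods = []
    while start <= deadline:                 # judge against the deadline
        end = start + timedelta(days=6)
        periods.append(f"{start:%Y%m%d}-{end:%Y%m%d}")
        # ... predict the data amount and partition the space here ...
        start = end + timedelta(days=1)      # move to the next time period
    return periods

print(partition_until_deadline(date(2018, 12, 22), date(2019, 1, 1)))
# -> ['20181222-20181228', '20181229-20190104']
```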
- In the embodiments of the present disclosure, a first data amount within a first time period of a database table and a second data amount within a second time period of the database table are determined, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored. The first data amount and the second data amount are input into a target network model, and a third data amount within the target time period is output. That is, the data amount within the target time period is predicted by the target network model. Thus, a number of target regions is determined based on the predicted third data amount, and a space in the database table configured to store the data to be stored is partitioned based on the number of target regions. In other words, a data amount of data to be stored in a time period is predicted before storing the data, and then space partitioning is performed based on the predicted data amount, which avoids the data migration that would otherwise be necessary when space partitioning is performed in a fixed manner, thereby improving data storage performance.
-
FIG. 4 is a structural schematic diagram of a space partitioning apparatus for a database table according to an example embodiment. The apparatus may be practiced by hardware, software or a combination thereof, and may include: - a determining
module 410, configured to determine a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored; - a
calling module 420, configured to call a target network model, input the first data amount and the second data amount into the target network model, and output a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and - a
partitioning module 430, configured to determine a number of target regions based on the third data amount within the target time period, and partition, based on the number of target regions, a space in the database table configured to store the data to be stored. - Optionally, the first and second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
- Optionally, referring to
FIG. 5 , the apparatus further includes: - a
pre-partitioning module 440, configured to acquire a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; - wherein the determining
module 410 is configured to determine the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data. - Optionally, the
partitioning module 430 is configured to: - determine the number of target regions by formula:
- k = ⌈ n / m ⌉
- based on the third data amount within the target time period;
- wherein k is the number of target regions, n is the third data amount within the target time period, m indicates a maximum storage capacity of a single region, and "⌈ ⌉" represents a rounding-up operation.
- Optionally, the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period after each of the plurality of time periods.
- In the embodiments of the present disclosure, a first data amount within a first time period and a second data amount within a second time period of the database table are determined, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored. The first data amount and the second data amount are input into a target network model, and a third data amount within the target time period is output. That is, the data amount within the target time period is predicted by the target network model. Thus, a number of target regions is determined based on the predicted third data amount, and a space in the database table configured to store the data to be stored is partitioned based on the number of target regions. In other words, a data amount of data to be stored in a time period is predicted before storing the data, and then space partitioning is performed based on the predicted data amount, which avoids the data migration that would otherwise be necessary when space partitioning is performed in a fixed manner, thereby improving data storage performance.
- It should be noted that the space partitioning apparatus for the database table according to this embodiment is illustrated by only taking division of all the functional modules as an example during space partitioning of a database table. In practice, the functions may be implemented by the different functional modules as required. That is, the apparatus is divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus according to this embodiment is based on the same inventive concept as the method according to the above embodiments. For details, reference may be made to the method embodiments, which is not described herein any further.
-
FIG. 6 is a schematic diagram of a structure of a computer device according to an example embodiment. - Specifically, the
computer device 600 includes a central processing unit (CPU) 601, a system memory 604 including a random-access memory (RAM) 602 and a read-only memory (ROM) 603, and a system bus 605 that connects the system memory 604 and the central processing unit 601. The computer device 600 further includes a basic input/output system (I/O system) 606 that facilitates the transfer of information between respective units within the computer, and a mass storage device 607 for storing an operating system 613, an application 614, and other program modules 615. - The basic input/
output system 606 includes a display 608 for displaying the information and an input device 609, such as a mouse or keyboard, for the user to input information. The display 608 and the input device 609 are both connected to the CPU 601 via an input output controller 610 that is connected to the system bus 605. The basic I/O system 606 may further include an input output controller 610 for receiving and processing the input from a plurality of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input and output controller 610 further provides output to a display screen, a printer, or other types of output devices. - The
mass storage device 607 is connected to the CPU 601 by a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and the related computer-readable media provide non-volatile storage for the computer device 600. That is, the mass storage device 607 may include a computer-readable medium (not shown), such as a hard disk or a compact disc read-only memory (CD-ROM) drive. - Without loss of generality, the computer-readable medium may include a computer storage medium and a communication medium. The computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing the information, such as computer-readable instructions, data structures, program modules or other data. The computer storage medium includes a RAM, a ROM, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or other solid-state storage devices, a CD-ROM, a digital versatile disc (DVD) or other optical storage devices, a tape cartridge, a magnetic tape, a magnetic disk storage device or other magnetic storage devices. A person skilled in the art appreciates that the computer storage medium is not limited to the above ones. The
aforesaid system memory 604 and mass storage device 607 may be collectively referred to as a memory. - According to various embodiments of the present disclosure, the
computer device 600 may also be connected over a network, such as the Internet, to a remote computer on the network for operation. That is, the computer device 600 may be connected to the network 612 through a network interface unit 611 connected to the system bus 605, or may be connected to other types of networks or remote computer systems (not shown) with the network interface unit 611.
- Embodiments of the present disclosure further provide a non-transitory computer-readable storage medium. When being executed by a processor of a computer device, the space partitioning method for the database table according to the embodiments of the present disclosure can be performed by the computer device.
- Embodiments of the present disclosure further provide a computer program product, which, when running in a computer, causes the computer to execute the space partitioning method for the database table according to the embodiments of the present disclosure.
- It may be understood by an ordinary person skilled in the art that all or part of steps in the above embodiments may be performed by hardware or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium such as a ROM, a magnetic disk, an optical disc, or the like.
- Described above are only preferred embodiments of the present disclosure, which are not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present disclosure should be included within the scope of protection of the present disclosure.
Claims (20)
1. A space partitioning method for a database table, comprising:
determining a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored;
calling a target network model, inputting the first data amount and the second data amount into the target network model, and outputting a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and
determining a number of target regions based on the third data amount within the target time period, and partitioning, based on the number of target regions, a space in the database table configured to store the data to be stored.
2. The method according to claim 1 , wherein the first and the second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
3. The method according to claim 1 , wherein before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further comprises:
acquiring a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; and
wherein determining the first data amount within the first time period and the second data amount within the second time period of the database table comprises:
determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
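Outside the claim language, the pre-partitioning and counting steps of claim 3 can be sketched as follows; the record layout (timestamp, payload) and the fixed period length are illustrative assumptions, not part of the claimed method.

```python
from collections import defaultdict

def count_per_period(records, period_seconds):
    """Group records into fixed-length time buckets by timestamp and
    count the data amount in each bucket (sketch; record format assumed)."""
    counts = defaultdict(int)
    for timestamp, _payload in records:
        bucket = int(timestamp // period_seconds)  # index of the time period
        counts[bucket] += 1
    return dict(counts)

# Records as (unix_timestamp, payload) pairs; predetermined period of 60 s.
records = [(5, "a"), (30, "b"), (65, "c"), (130, "d"), (140, "e")]
print(count_per_period(records, 60))  # {0: 2, 1: 1, 2: 2}
```

The first and second data amounts of claim 1 would then simply be the counts of the two buckets preceding the target time period.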
4. The method according to claim 1 , wherein determining the number of target regions based on the third data amount within the target time period comprises:
determining the number of target regions by the formula k=┌n/m┐ based on the third data amount within the target time period;
wherein k is the number of target regions, n is the third data amount within the target time period, m is the maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
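The region-count formula of claim 4 is a ceiling division; a minimal illustration (the example numbers are arbitrary, not taken from the disclosure):

```python
import math

def target_region_count(n, m):
    """Number of target regions k = ceil(n / m), where n is the predicted
    (third) data amount and m the maximum storage capacity of one region."""
    if m <= 0:
        raise ValueError("region capacity must be positive")
    return math.ceil(n / m)

# 2,500,000 predicted rows with 1,000,000 rows per region -> 3 regions.
print(target_region_count(2_500_000, 1_000_000))  # 3
```

Rounding up ensures the partitioned space can always hold the predicted amount, at the cost of at most one partially filled region.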
5. The method according to claim 1 , wherein the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period upon elapse of each of the plurality of time periods.
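Claim 5 trains the model on pairs of consecutive data amounts labeled with the amount of the following period. The construction of such training pairs can be sketched as below; the list of historical amounts is a hypothetical example, and the neural network itself is omitted.

```python
def build_training_pairs(amounts):
    """Build (input, label) pairs as claim 5 describes: the data amounts of
    a previous and a current period predict the amount of the next period."""
    return [
        ((amounts[i - 1], amounts[i]), amounts[i + 1])
        for i in range(1, len(amounts) - 1)
    ]

# Hypothetical per-period data amounts for five consecutive periods.
history = [100, 120, 150, 200, 260]
for (prev_amt, curr_amt), next_amt in build_training_pairs(history):
    print(prev_amt, curr_amt, "->", next_amt)
```

At prediction time, the most recent two amounts (the first and second data amounts of claim 1) are fed to the trained model to obtain the third data amount.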
6. A space partitioning device for a database table, comprising:
a processor and a memory configured to store a computer program, wherein the processor, when running the computer program, is caused to perform a space partitioning method for a database table comprising:
determining a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored;
calling a target network model, inputting the first data amount and the second data amount into the target network model, and outputting a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and
determining a number of target regions based on the third data amount within the target time period, and partitioning, based on the number of target regions, a space in the database table configured to store the data to be stored.
7. The device according to claim 6 , wherein the first and the second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
8. The device according to claim 6 , wherein before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further comprises:
acquiring a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; and
wherein determining the first data amount within the first time period and the second data amount within the second time period of the database table comprises:
determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
9. The device according to claim 6 , wherein determining the number of target regions based on the third data amount within the target time period comprises:
determining the number of target regions by the formula k=┌n/m┐ based on the third data amount within the target time period;
wherein k is the number of target regions, n is the third data amount within the target time period, m is the maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
10. The device according to claim 6 , wherein the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period upon elapse of each of the plurality of time periods.
11. A non-volatile computer-readable storage medium storing instructions therein, wherein the instructions, when executed by a processor, cause the processor to perform the method as defined in claim 1 .
12. A computer device comprising a processor and a memory configured to store a computer program, wherein the processor, when running the computer program, is caused to perform a space partitioning method for a database table, comprising:
determining a first data amount within a first time period and a second data amount within a second time period of the database table, wherein the first time period and the second time period are prior to a target time period corresponding to data to be stored;
calling a target network model, inputting the first data amount and the second data amount into the target network model, and outputting a third data amount within the target time period, wherein the target network model is configured to predict a data amount of a next time period based on data amounts of a previous time period and a current time period; and
determining a number of target regions based on the third data amount within the target time period, and partitioning, based on the number of target regions, a space in the database table configured to store the data to be stored.
13. The computer device according to claim 12 , wherein the first and the second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
14. The computer device according to claim 12 , wherein before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further comprises:
acquiring a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; and
wherein determining the first data amount within the first time period and the second data amount within the second time period of the database table comprises:
determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
15. The computer device according to claim 12 , wherein determining the number of target regions based on the third data amount within the target time period comprises:
determining the number of target regions by the formula k=┌n/m┐ based on the third data amount within the target time period;
wherein k is the number of target regions, n is the third data amount within the target time period, m is the maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
16. The computer device according to claim 12 , wherein the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period upon elapse of each of the plurality of time periods.
17. The non-volatile computer-readable storage medium according to claim 11 , wherein the first and the second time periods are adjacent ones, and the first time period is a previous time period of the target time period.
18. The non-volatile computer-readable storage medium according to claim 11 , wherein before determining the first data amount within the first time period and the second data amount within the second time period of the database table, the method further comprises:
acquiring a plurality of groups of data by pre-partitioning, based on a time stamp of data in the database table, the data in the database table according to a predetermined period length; and
wherein determining the first data amount within the first time period and the second data amount within the second time period of the database table comprises:
determining the first data amount within the first time period and the second data amount within the second time period of the database table by counting a data amount in the first time period and counting a data amount in the second time period from the plurality of groups of data.
19. The non-volatile computer-readable storage medium according to claim 11 , wherein determining the number of target regions based on the third data amount within the target time period comprises:
determining the number of target regions by the formula k=┌n/m┐ based on the third data amount within the target time period;
wherein k is the number of target regions, n is the third data amount within the target time period, m is the maximum storage capacity of a single region, and “┌ ┐” represents a rounding-up operation.
20. The non-volatile computer-readable storage medium according to claim 11 , wherein the target network model is obtained by training a neural network model based on a data amount within a plurality of time periods and a data amount of one time period upon elapse of each of the plurality of time periods.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811253560.2 | 2018-10-25 | ||
CN201811253560.2A CN111104569B (en) | 2018-10-25 | 2018-10-25 | Method, device and storage medium for partitioning database table |
PCT/CN2019/113310 WO2020083381A1 (en) | 2018-10-25 | 2019-10-25 | Database table area segmentation method and apparatus, device, and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220004564A1 US20220004564A1 (en) | 2022-01-06 |
US20230153326A9 true US20230153326A9 (en) | 2023-05-18 |
Family
ID=70330912
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/288,897 Pending US20230153326A9 (en) | 2018-10-25 | 2019-10-25 | Space partitioning method for database table, device and storage medium |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230153326A9 (en) |
EP (1) | EP3872654B1 (en) |
CN (1) | CN111104569B (en) |
WO (1) | WO2020083381A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11694031B2 (en) * | 2020-11-30 | 2023-07-04 | International Business Machines Corporation | Identifying routine communication content |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150254285A1 (en) * | 2012-10-04 | 2015-09-10 | Alcatel Lucent | Data logs management in a multi-client architecture |
CN105701027A (en) * | 2016-02-24 | 2016-06-22 | 中国联合网络通信集团有限公司 | Prediction method and device for data memory space |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7480662B2 (en) * | 2003-07-03 | 2009-01-20 | Oracle International Corporation | Fact table storage in a decision support system environment |
US8560639B2 (en) * | 2009-04-24 | 2013-10-15 | Microsoft Corporation | Dynamic placement of replica data |
US9104674B1 (en) * | 2010-04-14 | 2015-08-11 | Inmar, Inc. | System, method and computer program product for time sharing access control to data |
US20140181085A1 (en) * | 2012-12-21 | 2014-06-26 | Commvault Systems, Inc. | Data storage system for analysis of data across heterogeneous information management systems |
CN103345508B (en) * | 2013-07-04 | 2016-09-21 | 北京大学 | A kind of date storage method being applicable to community network figure and system |
CN104408189B (en) * | 2014-12-15 | 2018-11-09 | 北京国双科技有限公司 | The methods of exhibiting and device of keyword ranking |
GB2547712A (en) * | 2016-02-29 | 2017-08-30 | Fujitsu Ltd | Method and apparatus for generating time series data sets for predictive analysis |
US10380188B2 (en) * | 2016-08-05 | 2019-08-13 | International Business Machines Corporation | Distributed graph databases that facilitate streaming data insertion and queries by reducing number of messages required to add a new edge by employing asynchronous communication |
CN107730087A (en) * | 2017-09-20 | 2018-02-23 | 平安科技(深圳)有限公司 | Forecast model training method, data monitoring method, device, equipment and medium |
2018
- 2018-10-25 CN CN201811253560.2A patent/CN111104569B/en active Active
2019
- 2019-10-25 WO PCT/CN2019/113310 patent/WO2020083381A1/en unknown
- 2019-10-25 EP EP19875393.1A patent/EP3872654B1/en active Active
- 2019-10-25 US US17/288,897 patent/US20230153326A9/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150254285A1 (en) * | 2012-10-04 | 2015-09-10 | Alcatel Lucent | Data logs management in a multi-client architecture |
CN105701027A (en) * | 2016-02-24 | 2016-06-22 | 中国联合网络通信集团有限公司 | Prediction method and device for data memory space |
Also Published As
Publication number | Publication date |
---|---|
EP3872654A1 (en) | 2021-09-01 |
WO2020083381A1 (en) | 2020-04-30 |
EP3872654A4 (en) | 2022-01-05 |
CN111104569B (en) | 2023-10-20 |
EP3872654B1 (en) | 2023-10-18 |
US20220004564A1 (en) | 2022-01-06 |
CN111104569A (en) | 2020-05-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |