CN110019187A - A data distribution method, apparatus and device - Google Patents


Info

Publication number
CN110019187A
Authority
CN
China
Legal status
Granted
Application number
CN201710816443.1A
Other languages
Chinese (zh)
Other versions
CN110019187B (en)
Inventor
雷尚顺
Current Assignee
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201710816443.1A
Publication of CN110019187A
Application granted
Publication of CN110019187B
Legal status: Active


Classifications

    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of this application disclose a data distribution method, apparatus and device. The method includes: dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions; obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volumes meet a preset condition, where the number of simulated partition data sets meeting the condition equals the number of actual partitions; and dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition. The method divides the data to be partitioned evenly into the actual partitions, avoids the extra time consumed by overall data queries or other operations when one actual partition holds too much data, and improves overall response speed and operating efficiency.

Description

A data distribution method, apparatus and device
Technical field
This application relates to the field of computer technology, and in particular to a data distribution method, apparatus and device.
Background
Distributed databases are commonly used in big data processing, and partitioning data is a common basic requirement. That is, when storing data, a distributed database can spread the data across multiple partitions, for example across multiple databases, data tables or data files.
In the prior art, to meet actual business needs, data is usually distributed based on an attribute of the data; that is, data having the same attribute value is assigned to the same partition, so that a query on that attribute can be answered within one partition without a cross-partition query.
However, this approach has a shortcoming: the data may be distributed unevenly, so that the amounts of data assigned to the partitions differ considerably, with some partitions receiving significantly more data than others. Queries on the partitions holding more data then take longer than on the other partitions, which degrades response speed and operating efficiency.
Summary of the invention
Embodiments of this application provide a data distribution method, apparatus and device, to solve the problem of uneven data partitioning in the current technology.
An embodiment of this application provides a data distribution method, in which the actual partitions required for the data to be partitioned have already been determined. The method includes:
dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volumes meet a preset condition, where the number of simulated partition data sets meeting the condition equals the number of actual partitions;
dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition.
Based on the same idea, an embodiment of this application provides a data distribution apparatus, including:
a first division module, configured to divide the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
an obtaining module, configured to obtain, from the multiple simulated partition data sets, the simulated partition data sets whose data volumes meet a preset condition, where the number of simulated partition data sets meeting the condition equals the number of actual partitions;
a second division module, configured to divide the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition.
Correspondingly, an embodiment of this application further provides a data distribution device, the device including:
a memory storing a data partitioning program;
a processor which, after receiving the data to be partitioned, calls the data partitioning program in the memory and executes:
dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volumes meet a preset condition, where the number of simulated partition data sets meeting the condition equals the number of actual partitions;
dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition.
An embodiment of this application further provides a non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform:
dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volumes meet a preset condition, where the number of simulated partition data sets meeting the condition equals the number of actual partitions;
dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition.
Compared with the prior art, the embodiments of this application perform simulated partitioning on the data to be partitioned. When the data volumes of the data sets obtained by simulated partitioning meet a preset condition, the data to be partitioned is divided evenly into the actual partitions according to the data division result of those data sets. This avoids the extra time consumed by overall data queries or other operations when one actual partition holds too much data, and improves overall response speed and operating efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system architecture provided by an embodiment of this application;
Fig. 2 is a flowchart of the method provided by an embodiment of this application;
Fig. 3 is a flowchart of part of the method provided by an embodiment of this application;
Fig. 4 is a flowchart of part of the method provided by an embodiment of this application;
Fig. 5a is a flowchart of part of the method provided by an embodiment of this application;
Fig. 5b to Fig. 5d are schematic diagrams illustrating the number mapping provided by an embodiment of this application;
Fig. 6 is a schematic structural diagram of the apparatus provided by an embodiment of this application.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of this application clearer, the technical solutions of this application are described clearly and completely below with reference to specific embodiments of this application and the corresponding drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the protection scope of this application.
Based on the foregoing, it should be noted that an actual partition is an entity that physically exists, such as one of multiple databases, data tables or data files, whereas a simulated partition is obtained by simulation using a corresponding model or algorithm; that is, a simulated partition does not actually exist.
The data to be partitioned may be data that has not yet been allocated to any actual partition, or data that already exists in the actual partitions but is unevenly distributed and needs to be repartitioned.
In the embodiments of this application, the data distribution method can use the architecture shown in Fig. 1. The database in this architecture may be Greenplum Database, an RDB database, a UDB database, an RDS database, SQL, MySQL, Teradata, or another database product based on an MPP (Massively Parallel Processing) architecture.
The data allocation process provided by the embodiments of this application is described in detail below based on the architecture shown in Fig. 1. As shown in Fig. 2, the process includes the following steps:
Step S201: divide the data to be partitioned according to the set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions.
The number of simulated partitions may be calculated using a corresponding partitioning model or partitioning algorithm, or set directly by an operator.
As noted above, in the embodiments of this application the data to be partitioned may be data not yet allocated to any actual partition, or data already in the actual partitions that is unevenly distributed and needs to be repartitioned. In either case, a corresponding partitioning model or partitioning algorithm performs a simulated division of the data to be partitioned according to the number of simulated partitions.
As one feasible approach in the embodiments of this application, the set number of simulated partitions may be dynamic. Specifically, during simulated partitioning, the number of simulated partitions can be increased step by step starting from the number of actual partitions plus 1. Each change in the number of simulated partitions yields a set of simulated partition data sets with a different data distribution, and this continues until the data volumes of some of the simulated partition data sets meet the preset condition.
Note that dividing the data to be partitioned as described above does not require actually moving the data into each data set; only the data volume of each data set needs to be known. Of course, if the actual situation requires it, the data to be partitioned may also be divided into the data sets.
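Because only the data volume of each simulated set is needed, the simulation can keep a counter per set number instead of copying records. The Python sketch below assumes string attribute values and uses CRC-32 (`zlib.crc32`) modulo the simulated partition count as the division rule; both the hash choice and the function name `simulated_volumes` are illustrative assumptions, not something the text prescribes.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def simulated_volumes(attribute_values: Iterable[str], num_simulated: int) -> List[int]:
    """Return the data volume of simulated sets 0..num_simulated-1.

    No record is copied: each value is hashed to a set number and only a
    per-set counter is kept (CRC-32 is an assumed, illustrative hash).
    """
    if num_simulated <= 0:
        raise ValueError("num_simulated must be positive")
    counts = Counter(
        zlib.crc32(value.encode("utf-8")) % num_simulated for value in attribute_values
    )
    # Counter returns 0 for set numbers that received no data.
    return [counts[i] for i in range(num_simulated)]


# Hypothetical input: 30 dates, each appearing in 3 records.
dates = [f"2017-09-{day:02d}" for day in range(1, 31)] * 3
volumes = simulated_volumes(dates, 6)
```

Since equal values always hash to the same set, every volume here is a multiple of 3, which is an easy sanity check on the rule.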
Step S203: from the multiple simulated partition data sets, obtain the simulated partition data sets whose data volumes meet the preset condition, where the number of simulated partition data sets meeting the preset condition equals the number of actual partitions.
In the embodiments of this application, the preset condition may include: the differences between the data volumes of the data sets do not exceed a certain range.
For example, the data volume of each simulated partition data set is obtained and as many data sets as there are actual partitions are chosen, where the preset condition is that the difference between the maximum and minimum data volumes does not exceed a certain value. Suppose there are 5 actual partitions, 1,500 groups of data are divided into 6 simulated partition data sets, and the preset condition is that, for some 5 of the data sets, the difference between the maximum and minimum data volumes does not exceed 60. If the simulated data volumes are 100, 200, 200, 300, 350 and 350, the simulation result does not meet the preset condition. If the simulated data volumes are 100, 260, 270, 280, 290 and 300, the five data sets with volumes 260, 270, 280, 290 and 300 meet the preset condition.
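The selection in the example above can be sketched in Python. This is a minimal sketch, assuming the condition is exactly "max minus min is at most a threshold"; the function name `select_balanced_sets` and the policy of keeping the tightest qualifying group are assumptions for illustration. After sorting the volumes, any qualifying group can be replaced by adjacent entries in sorted order, so a sliding window is enough.

```python
from typing import List, Optional, Sequence


def select_balanced_sets(sizes: Sequence[int], k: int, max_spread: int) -> Optional[List[int]]:
    """Return the numbers of k simulated sets whose volumes satisfy
    max - min <= max_spread, or None if no such k sets exist.

    After sorting by volume, any qualifying group can be replaced by k
    adjacent entries, so a sliding window over the sorted order suffices.
    Among qualifying windows, the one with the smallest spread is kept
    (an assumed tie-breaking policy).
    """
    if k <= 0 or k > len(sizes):
        return None
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    best = None
    for start in range(len(order) - k + 1):
        window = order[start:start + k]
        spread = sizes[window[-1]] - sizes[window[0]]
        if spread <= max_spread and (best is None or spread < best[0]):
            best = (spread, window)
    return None if best is None else sorted(best[1])


# The two simulation results from the example: 5 actual partitions, threshold 60.
print(select_balanced_sets([100, 200, 200, 300, 350, 350], 5, 60))  # None
print(select_balanced_sets([100, 260, 270, 280, 290, 300], 5, 60))  # [1, 2, 3, 4, 5]
```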
Where necessary, the preset condition may also include that the ratio of the total data volume of the qualifying data sets to the volume of the data to be partitioned exceeds a certain value, or that the minimum data volume among the qualifying data sets exceeds a certain value. For example, in the foregoing example, the preset condition may also include that the total data volume of the five data sets is no less than 80% of the total volume of the data to be partitioned. Under this condition, the five data sets with volumes 260, 270, 280, 290 and 300 (1,400 of the 1,500 groups, about 93%) still meet the preset condition.
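The combined condition can be expressed as one predicate. The sketch below is illustrative only: the name `meets_condition`, its parameters, and the use of inclusive comparisons (`>=`, `<=`) at the thresholds are assumptions, since the text leaves the exact boundary treatment open.

```python
from typing import Sequence


def meets_condition(selected: Sequence[int], total: int, max_spread: int,
                    min_coverage: float = 0.0, min_volume: int = 0) -> bool:
    """Check a candidate group of simulated set volumes against the preset condition.

    max_spread:   largest allowed difference between max and min volume
    min_coverage: smallest allowed share of the total data held by the group
    min_volume:   smallest allowed volume of any single set in the group
    """
    if not selected or total <= 0:
        return False
    spread_ok = max(selected) - min(selected) <= max_spread
    coverage_ok = sum(selected) / total >= min_coverage
    volume_ok = min(selected) >= min_volume
    return spread_ok and coverage_ok and volume_ok


group = [260, 270, 280, 290, 300]  # 1,400 of 1,500 groups
print(meets_condition(group, 1500, 60, min_coverage=0.8))   # True
print(meets_condition(group, 1500, 60, min_coverage=0.95))  # False
```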
In the feasible approach mentioned above, different data division results can be obtained by changing the number of simulated partitions, until the data volumes of the data sets in some division result meet the preset condition. Continuing the previous example, when the simulated data volumes are 100, 200, 200, 300, 350 and 350, the simulation result does not meet the preset condition, so the number of simulated partitions can be changed to 7. If the simulated data volumes are then 100, 150, 230, 240, 250, 260 and 270, then with 7 simulated partitions the 5 simulated partition data sets with volumes 230, 240, 250, 260 and 270 meet the preset condition.
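The search over simulated partition counts can be sketched as a loop. This is a sketch under stated assumptions: `sizes_for` is a hypothetical callback that returns the volume of each simulated set for a given count (in practice it would hash and count the data), `max_simulated` is an assumed upper bound so the search terminates (the text sets no limit), and the first qualifying window is accepted.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def find_simulated_partitioning(
    sizes_for: Callable[[int], Sequence[int]],
    actual_partitions: int,
    max_spread: int,
    max_simulated: int,
) -> Optional[Tuple[int, List[int]]]:
    """Try simulated partition counts actual_partitions+1, +2, ... up to
    max_simulated and return (count, qualifying set numbers) for the first
    count where actual_partitions sets have volumes within max_spread."""
    k = actual_partitions
    for n in range(actual_partitions + 1, max_simulated + 1):
        sizes = sizes_for(n)  # only volumes are needed at this stage
        order = sorted(range(n), key=lambda i: sizes[i])
        for start in range(n - k + 1):
            window = order[start:start + k]
            if sizes[window[-1]] - sizes[window[0]] <= max_spread:
                return n, sorted(window)
    return None  # no count up to max_simulated satisfied the condition


# Volumes from the example, keyed by simulated partition count.
example = {6: [100, 200, 200, 300, 350, 350], 7: [100, 150, 230, 240, 250, 260, 270]}
print(find_simulated_partitioning(example.__getitem__, 5, 60, 7))  # (7, [2, 3, 4, 5, 6])
```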
Step S205: divide the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition.
Continuing the previous example, the data to be partitioned is divided into the actual partitions in the same way as the data in the data sets with volumes 230, 240, 250, 260 and 270. The data volumes in the actual partitions are therefore also 230, 240, 250, 260 and 270 (before the data of the remaining simulated data sets, which is handled separately as described later, is placed), so the data volumes of the actual partitions are essentially equal.
Through the above simulated partitioning, as many simulated partition data sets as there are actual partitions can be obtained, with essentially equal data volumes. In the actual partitions, the data to be partitioned is then divided according to the data division result of the qualifying simulated partition data sets, giving a division result with essentially equal data volumes.
In the foregoing method, the embodiments of this application perform simulated partitioning on the data to be partitioned. When the data volumes of the data sets obtained by simulated partitioning meet the preset condition, the data is divided into the actual partitions according to the data division result of those data sets. The data to be partitioned is thus divided evenly into the actual partitions, which avoids the extra time consumed by overall data queries or other operations when one actual partition holds too much data, and improves overall response speed and operating efficiency.
As an implementable solution of the embodiments of this application, dividing the data to be partitioned according to the set number of simulated partitions in step S201 may specifically include:
dividing data to be partitioned that has the same attribute value into the same simulated partition data set, according to the number of simulated partitions and an attribute of the data to be partitioned.
For example, a transaction record generally includes attributes such as the IDs of both parties, the date and the transaction amount. If the date is chosen as the partitioning attribute, transactions on the same day are ultimately divided into the same partition; if the transaction amount is chosen, transactions with the same amount are ultimately divided into the same partition.
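The transaction example can be illustrated with a small Python sketch. The record fields (`payer`, `payee`, `date`, `amount`) are hypothetical names for the attributes listed above, and CRC-32 of the value's string form modulo the simulated count is an assumed division rule; the point shown is only that the chosen attribute decides which records end up together.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def divide_by_attribute(records: List[dict], attribute: str,
                        num_simulated: int) -> Dict[int, List[dict]]:
    """Put records sharing a value of `attribute` into the same simulated set.

    The set number is CRC-32 of the value's string form modulo num_simulated
    (an assumed rule), so equal values always yield equal set numbers.
    """
    sets: Dict[int, List[dict]] = defaultdict(list)
    for record in records:
        set_number = zlib.crc32(str(record[attribute]).encode("utf-8")) % num_simulated
        sets[set_number].append(record)
    return dict(sets)


# Hypothetical transaction records with the attributes named in the text.
transactions = [
    {"payer": "u1", "payee": "u2", "date": "2017-09-01", "amount": 50},
    {"payer": "u3", "payee": "u1", "date": "2017-09-01", "amount": 80},
    {"payer": "u2", "payee": "u4", "date": "2017-09-02", "amount": 50},
]
by_date = divide_by_attribute(transactions, "date", 6)      # same day -> same set
by_amount = divide_by_attribute(transactions, "amount", 6)  # same amount -> same set
```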
More specifically, as shown in Fig. 3, the process of dividing by attribute may include the following steps:
Step S301: choose an attribute contained in the data to be partitioned.
Usually, when a single data table is divided into partitions, there is no restriction on the choice of attribute. When multiple related data sets are divided, however, an attribute common to all of them should usually be chosen, so that through the subsequent partitioning steps, data with the same attribute value corresponds to the same simulated partition data set.
Step S303: for each attribute value of the attribute, determine a characterization value that uniquely corresponds to that attribute value.
That is, an algorithm or mapping (such as a hash algorithm) is applied to the attribute value in the data so that the attribute value corresponds to exactly one characterization value. As a result, data having the same attribute value, whether within the same data or across different data, corresponds to the same characterization value. The form of the characterization value depends on the algorithm; specifically, it may be a hash value, a random number or a digest value.
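A digest-based characterization value can be sketched as follows. This is one possible choice, assuming values are converted with `str()` and that the first 8 bytes of a SHA-256 digest are enough; the function name is hypothetical. Python's built-in `hash()` is avoided deliberately because string hashing is salted per process, so it would not give the same value on every node.

```python
import hashlib


def characterization_value(attribute_value) -> int:
    """Map an attribute value to a stable non-negative integer.

    The first 8 bytes of a SHA-256 digest are read as a big-endian integer,
    so the result lies in [0, 2**64) and is the same in every process.
    Python's built-in hash() is avoided because string hashing is salted
    per process (see PYTHONHASHSEED).
    """
    digest = hashlib.sha256(str(attribute_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Same attribute value -> same characterization value, in any process.
print(characterization_value("2017-09-01") == characterization_value("2017-09-01"))  # True
```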
Step S305: determine, according to the characterization value and the number of simulated partitions, the simulated partition data set corresponding to the characterization value.
That is, relying on the uniqueness of the characterization value, each characterization value is mapped to exactly one simulated partition data set according to the number of simulated partitions, so that data with the same characterization value corresponds to the same simulated partition data set. On this basis, subsequent steps divide data with the same attribute value into the same actual partition, so that when multiple different data sets are joined on that attribute, no cross-partition query is needed.
As a concrete implementation in the embodiments of this application, step S305 (determining, according to the characterization value and the number of simulated partitions, the simulated partition data set corresponding to the characterization value) can be implemented by the following method, as shown in Fig. 4:
Step S401: number the simulated partition data sets with integers from 0 to the number of simulated partitions minus 1. Other numbering schemes, such as letters of the English alphabet in order, may of course also be used; an assignment algorithm then maps those numbers to the integers from 0 to the number of simulated partitions minus 1.
Step S403: perform a modulo operation on the characterization value and the number of simulated partitions. That is, obtain the remainder of the characterization value divided by the number of simulated partitions; this remainder is always an integer from 0 to the number of simulated partitions minus 1.
Step S405: take the result of the modulo operation as the number of the simulated partition data set corresponding to the characterization value.
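Steps S401 to S405 amount to a remainder computation plus a numbering scheme. The sketch below shows both the integer numbering and the letter-label alternative; the function names are hypothetical, and single uppercase letters are assumed as labels, which limits that variant to 26 sets.

```python
import string
from typing import List


def simulated_set_number(characterization_value: int, num_simulated: int) -> int:
    """Steps S403/S405: the remainder of the characterization value divided by
    the number of simulated partitions is the simulated set number."""
    if num_simulated <= 0:
        raise ValueError("num_simulated must be positive")
    return characterization_value % num_simulated  # always in 0..num_simulated-1


def letter_labels(num_simulated: int) -> List[str]:
    """Alternative numbering from step S401: label i is the i-th letter,
    so labels A, B, C, ... correspond to integer numbers 0, 1, 2, ..."""
    if num_simulated > len(string.ascii_uppercase):
        raise ValueError("more sets than single-letter labels")
    return list(string.ascii_uppercase[:num_simulated])


number = simulated_set_number(17, 7)     # 17 = 2 * 7 + 3
print(number, letter_labels(7)[number])  # 3 D
```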
The above embodiment determines the unique simulated partition data set corresponding to each characterization value. In this process, the modulo operation is only one example of a usable algorithm; on this basis, any scheme that determines the simulated partition data set corresponding to the characterization value merely by changing the algorithm should fall within the scope of protection of this application.
As an implementable solution of the embodiments of this application, step S205 (dividing the data to be partitioned in the actual partitions according to the data division result of the simulated partition data sets whose data volumes meet the condition) includes the following steps:
I. Establish a correspondence between each qualifying simulated partition data set and the actual partitions;
II. According to the correspondence, divide the data in each qualifying simulated partition data set into its corresponding actual partition.
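Steps I and II can be sketched together. This minimal sketch assumes simulated sets are held as lists of record identifiers keyed by set number, and uses ascending order of set number as the correspondence, which is only one of the admissible one-to-one mappings; `assign_to_actual` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def assign_to_actual(simulated_sets: Dict[int, List[str]],
                     qualifying: Sequence[int]) -> Dict[int, List[str]]:
    """Step I: map the qualifying simulated set numbers one-to-one onto actual
    partitions 0..k-1 (here in ascending order of set number, one possible
    correspondence). Step II: move each qualifying set's records there."""
    if len(set(qualifying)) != len(qualifying):
        raise ValueError("qualifying set numbers must be distinct")
    correspondence = {sim: act for act, sim in enumerate(sorted(qualifying))}
    actual: Dict[int, List[str]] = {act: [] for act in range(len(correspondence))}
    for sim, act in correspondence.items():
        actual[act].extend(simulated_sets.get(sim, []))
    return actual


sets = {0: ["r1"], 1: ["r2", "r3"], 2: ["r4"], 3: ["r5"]}
print(assign_to_actual(sets, [1, 3]))  # {0: ['r2', 'r3'], 1: ['r5']}
```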
Specifically, establishing the correspondence between each qualifying simulated partition data set and the actual partitions includes the following implementation, as shown in Fig. 5a:
Step S502: number the simulated partition data sets and the actual partitions in order. For example, if the number of actual partitions has been determined to be 5 and the number of simulated partitions is 7, the simulated partition data sets are numbered 0 to 6 in order and the actual partitions are numbered 0 to 4 in order.
Step S504: when the numbers of the simulated partition data sets whose data volumes meet the preset condition are consecutive, determine the offset of those numbers.
In practice, the numbers of the qualifying simulated partition data sets may be either non-consecutive or consecutive.
For non-consecutive numbers, for example when, in the numbering example above, the 5 simulated partition data sets numbered 0, 1, 3, 4 and 5 meet the preset condition, a suitable mapping algorithm is chosen to map these numbers to the actual partition numbers 0, 1, 2, 3 and 4. The mapping may follow the numbering order, or may map the numbers one by one in a different order, as shown in Fig. 5b.
For consecutive numbers, the offset of the qualifying simulated partition data set numbers relative to the actual partition numbers is determined. For example, if in the numbering above the simulated partition data sets numbered 1, 2, 3, 4 and 5 meet the preset condition, the offset relative to the actual partitions numbered 0, 1, 2, 3 and 4 is 1.
Step S506: according to the offset, shift the numbers of the simulated partition data sets whose data volumes meet the preset condition so that they are identical to the actual partition numbers.
That is, the numbers 1, 2, 3, 4 and 5 are shifted left by one as a whole, so that after the shift the qualifying simulated partition data sets are numbered 0, 1, 2, 3 and 4, as shown in Fig. 5c.
Step S508: establish a one-to-one correspondence between each actual partition and the qualifying simulated partition data set with the same number, as shown in Fig. 5d.
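The offset mapping for consecutive numbers, and an ordered mapping for the non-consecutive case, can be sketched as below. The function names are hypothetical, and the ordered mapping is just one of the admissible mappings shown in Fig. 5b.

```python
from typing import Dict, Sequence, Tuple


def offset_mapping(qualifying: Sequence[int]) -> Tuple[int, Dict[int, int]]:
    """Steps S504-S508 for consecutive numbers: the offset is the smallest
    qualifying number, and shifting every number by it yields the actual
    partition number 0..k-1."""
    numbers = sorted(qualifying)
    if not numbers:
        raise ValueError("no qualifying simulated sets")
    offset = numbers[0]
    if numbers != list(range(offset, offset + len(numbers))):
        raise ValueError("numbers are not consecutive; use an explicit mapping")
    return offset, {n: n - offset for n in numbers}


def ordered_mapping(qualifying: Sequence[int]) -> Dict[int, int]:
    """Non-consecutive numbers: map them in ascending order to 0..k-1,
    which is only one of the admissible one-to-one mappings."""
    return {n: i for i, n in enumerate(sorted(qualifying))}


print(offset_mapping([1, 2, 3, 4, 5]))   # (1, {1: 0, 2: 1, 3: 2, 4: 3, 5: 4})
print(ordered_mapping([0, 1, 3, 4, 5]))  # {0: 0, 1: 1, 3: 2, 4: 3, 5: 4}
```

For consecutive numbers the ordered mapping yields the same result; the offset form simply needs one subtraction per number once the offset is known.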
It is easy to see that numbering with consecutive integers starting from 0 in the foregoing embodiments is only an example. In actual processing, numbering does not have to start from zero, nor do the numbers have to be consecutive integers. It is only necessary that the numbers have a definite order and that the offset between numbers can be determined; the one-to-one correspondence between the actual partitions and the qualifying simulated partition data sets can then be determined from the offset. For example, a set of numbers given by the English letters B to G can be shifted left by one unit to obtain the numbers A to F. On this basis, any scheme that realizes the one-to-one correspondence merely by changing the numbering method should fall within the scope of protection of this application.
In the above scheme, when the numbers of the qualifying simulated partition data sets are consecutive, a single number offset suffices to map the simulated partition data sets onto the actual partitions, which is more convenient in practical applications.
In addition, as an optional solution of the embodiments of this application, after the data in each qualifying simulated partition data set has been divided into its corresponding actual partition, the method further includes: dividing the data in the simulated partition data sets whose data volumes do not meet the condition into the actual partitions according to the number of actual partitions and an attribute of the data to be partitioned, so that data to be partitioned with the same attribute value is divided into the same actual partition.
Specifically, this includes the following steps:
choosing an attribute contained in the data of the simulated partition data sets whose data volumes do not meet the condition. Note that this may be the attribute already used in the earlier division, or a new attribute may be chosen. For example, if the earlier division was by date, the date can be chosen again, or the amount can be chosen instead;
for each attribute value of the attribute, determining its uniquely corresponding characterization value, where the characterization value is at least one of a hash value, a random number or a digest value;
determining, according to the characterization value and the number of actual partitions, the actual partition corresponding to the characterization value;
dividing the data of the simulated partition data sets whose data volumes do not meet the condition into the actual partition corresponding to the characterization value.
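The four steps above can be sketched as follows. This sketch assumes the actual partitions are numbered 0 to n-1 and held as lists, uses a SHA-256 prefix as the characterization value and the modulo rule as the partition choice; `place_remaining` and the record fields are hypothetical.

```python
import hashlib
from typing import Dict, List


def place_remaining(actual: Dict[int, List[dict]], remaining: List[dict],
                    attribute: str) -> Dict[int, List[dict]]:
    """Add records from the non-qualifying simulated sets to the actual
    partitions (assumed numbered 0..len(actual)-1) by hashing `attribute`,
    so that records sharing an attribute value land in the same actual
    partition. The input dictionary is not modified."""
    num_actual = len(actual)
    if num_actual == 0:
        raise ValueError("at least one actual partition is required")
    result = {p: list(records) for p, records in actual.items()}
    for record in remaining:
        digest = hashlib.sha256(str(record[attribute]).encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # characterization value
        result[value % num_actual].append(record)  # modulo actual partition count
    return result


partitions = {0: [], 1: [], 2: []}
leftover = [
    {"date": "2017-09-03", "amount": 20},
    {"date": "2017-09-04", "amount": 20},
    {"date": "2017-09-04", "amount": 35},
]
by_amount = place_remaining(partitions, leftover, "amount")  # amount instead of date
```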
The implementation of determining the actual partition corresponding to the characterization value has been described in detail in the foregoing method and is not repeated here.
The data in the simulated partition data sets whose data volumes do not meet the condition is usually a minority. In practical application scenarios, the following scheme may also be adopted: according to the number of actual partitions, the data in the simulated partition data sets whose data volumes do not meet the condition (referred to below as the remaining data, for convenience) is divided evenly into the actual partitions. Even division may mean an average division in the strict sense, or an approximately average, that is roughly even, division. For example, if there are 4 actual partitions, the remaining data can be divided evenly into 4 portions according to the number of actual partitions, each portion going to one actual partition. Alternatively, the average amount of remaining data per actual partition (average = amount of remaining data / 4) can be calculated, and, using this average as a baseline, the amount of remaining data placed in each actual partition is allowed to vary within a set tolerance.
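Both variants of even division can be sketched in Python. This is an illustrative sketch: contiguous chunking for the strict variant and an absolute tolerance around the average for the approximate variant are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(remaining: Sequence[T], num_actual: int) -> List[List[T]]:
    """Strictly even variant: split into num_actual contiguous chunks whose
    sizes differ by at most one record."""
    if num_actual <= 0:
        raise ValueError("num_actual must be positive")
    base, extra = divmod(len(remaining), num_actual)
    chunks, start = [], 0
    for i in range(num_actual):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        chunks.append(list(remaining[start:start + size]))
        start += size
    return chunks


def within_tolerance(chunk_sizes: Sequence[int], total: int, tolerance: float) -> bool:
    """Approximate variant: each partition's share of the remaining data may
    deviate from the average total / len(chunk_sizes) by at most `tolerance`."""
    average = total / len(chunk_sizes)
    return all(abs(size - average) <= tolerance for size in chunk_sizes)


chunks = split_evenly(list(range(10)), 4)
print([len(c) for c in chunks])                 # [3, 3, 2, 2]
print(within_tolerance([3, 3, 2, 2], 10, 0.5))  # True
```

Unlike the attribute-based placement, this even split does not keep records with equal attribute values together, which is the trade-off for a more balanced result.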
It is also worth noting that, in the embodiments of this application, if the number of simulated partitions is changed by reducing it below the number of actual partitions, the data to be partitioned can still be distributed evenly. However, the resulting division is then based not on the number of actual partitions but on a smaller number of partitions. Although this also achieves the purpose of this application, it leaves some actual partitions unused, which is usually not what is expected in practice.
Based on the same idea, the present invention further provides a data distribution device which, as shown in FIG. 6, comprises:
First division module 601, configured to divide the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
Obtaining module 602, configured to obtain, from the multiple simulated partition data sets, the simulated partition data sets whose data volume satisfies a preset condition, where the number of simulated partition data sets satisfying the condition equals the number of actual partitions;
Second division module 603, configured to divide the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition.
Further, the first division module 601 is configured to divide, according to the number of simulated partitions and an attribute of the data to be partitioned, data having the same attribute value into the same simulated partition data set.
Further, the first division module 601 is configured to select an attribute contained in the data to be partitioned; determine, for the attribute value of that attribute, a characteristic value uniquely corresponding to the attribute value; and determine, according to the characteristic value and the number of simulated partitions, the simulated partition data set corresponding to the characteristic value, where the characteristic value includes at least a hash value, a random number, or a digest value.
Further, the first division module 601 is configured to number the simulated partition data sets, take the characteristic value modulo the number of simulated partitions, and use the result of the modulo operation as the number of the simulated partition data set corresponding to the characteristic value, where the simulated partition data sets are numbered with the integers from 0 to the number of simulated partitions minus 1.
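A minimal Python sketch of the modulo step is shown below. It assumes that the characteristic value is derived from an MD5 digest of the attribute value (the text allows a hash value, random number, or digest value) and that records are dictionaries; the names `characteristic_value`, `simulated_partition_id` and `partition_into_simulated` are hypothetical. A digest is used instead of Python's built-in `hash()`, whose result for strings varies between processes unless `PYTHONHASHSEED` is fixed.

```python
import hashlib
from typing import Dict, List, Sequence


def characteristic_value(attribute_value: object) -> int:
    """Stable integer for an attribute value (assumption: MD5 of its repr)."""
    digest = hashlib.md5(repr(attribute_value).encode("utf-8")).hexdigest()
    return int(digest, 16)


def simulated_partition_id(attribute_value: object, num_simulated: int) -> int:
    """Simulated partition number in [0, num_simulated - 1] via modulo."""
    return characteristic_value(attribute_value) % num_simulated


def partition_into_simulated(
    records: Sequence[dict], attribute: str, num_simulated: int
) -> Dict[int, List[dict]]:
    """Group records into simulated partition data sets numbered 0..num_simulated-1."""
    if num_simulated <= 0:
        raise ValueError("num_simulated must be positive")
    sets: Dict[int, List[dict]] = {i: [] for i in range(num_simulated)}
    for record in records:
        sets[simulated_partition_id(record[attribute], num_simulated)].append(record)
    return sets
```

Because the partition number depends only on the attribute value, every record with the same value lands in the same simulated partition data set, which is the property the first division module 601 relies on.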
Further, the first division module 601 is configured to establish a correspondence between each simulated partition data set satisfying the condition and the actual partitions, and, according to the correspondence, divide the data in each simulated partition data set satisfying the condition into the corresponding actual partition.
Further, the second division module 603 is configured to number each simulated partition data set and each actual partition in order; when the numbers of the multiple simulated partition data sets whose data volume satisfies the preset condition are consecutive, determine the offset of those numbers; shift, according to the offset, the numbers of the simulated partition data sets satisfying the preset condition so that they coincide with the actual partition numbers; and establish a one-to-one correspondence between each actual partition and the simulated partition satisfying the preset condition that has the same number.
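The offset rule can be illustrated with a short Python sketch. It assumes that both simulated and actual partitions are numbered from 0, and the function name `map_qualifying_to_actual` is hypothetical. For example, if the qualifying simulated partitions are numbered 5, 6, 7 and 8 and there are 4 actual partitions, the offset is 5 and the shifted numbers 0 to 3 coincide with the actual partition numbers. Because only the consecutive case is described here, the sketch rejects non-consecutive input rather than guessing a mapping.

```python
from typing import Dict, Sequence


def map_qualifying_to_actual(qualifying_ids: Sequence[int], num_actual: int) -> Dict[int, int]:
    """Return {simulated partition number: actual partition number} via an offset.

    Hypothetical helper; assumes 0-based numbering on both sides.
    """
    ids = sorted(qualifying_ids)
    if num_actual <= 0 or len(ids) != num_actual:
        raise ValueError("need exactly one qualifying simulated partition per actual partition")
    offset = ids[0]  # smallest qualifying number
    if ids != list(range(offset, offset + num_actual)):
        # The offset rule only applies when the qualifying numbers are consecutive.
        raise ValueError("qualifying partition numbers are not consecutive")
    # Shifting by the offset makes the numbers coincide with 0..num_actual-1.
    return {sim_id: sim_id - offset for sim_id in ids}
```

With the example above, `map_qualifying_to_actual([7, 5, 8, 6], 4)` returns `{5: 0, 6: 1, 7: 2, 8: 3}`.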
Further, the device also includes a third division module 604, configured to divide the data of each simulated partition data set whose data volume does not satisfy the condition into the actual partitions according to the number of actual partitions and the attribute of the data to be partitioned, so that data having the same attribute value are divided into the same actual partition.
Further, the third division module 604 selects the attribute contained in the data of each simulated partition data set whose data volume does not satisfy the condition; determines, for the attribute value of that attribute, a characteristic value uniquely corresponding to the attribute value; determines, according to the characteristic value and the number of actual partitions, the actual partition corresponding to the characteristic value; and divides the data of each simulated partition data set whose data volume does not satisfy the condition into the actual partition corresponding to the characteristic value, where the characteristic value includes at least a hash value, a random number, or a digest value.
Further, the third division module 604 divides the data of each simulated partition data set whose data volume does not satisfy the condition evenly among the actual partitions according to the number of actual partitions.
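The attribute-based routing of the remaining data can be sketched as below, again assuming an MD5-based characteristic value and dictionary records; `characteristic_value` and `route_remaining` are hypothetical names. Taking the characteristic value modulo the number of actual partitions sends every record with a given attribute value to the same actual partition.

```python
import hashlib
from typing import Dict, List, Sequence


def characteristic_value(attribute_value: object) -> int:
    """Stable integer for an attribute value (assumption: MD5 of its repr)."""
    return int(hashlib.md5(repr(attribute_value).encode("utf-8")).hexdigest(), 16)


def route_remaining(
    remaining: Sequence[dict], attribute: str, num_actual: int
) -> Dict[int, List[dict]]:
    """Assign each remaining record to actual partition (characteristic value mod num_actual)."""
    if num_actual <= 0:
        raise ValueError("num_actual must be positive")
    partitions: Dict[int, List[dict]] = {i: [] for i in range(num_actual)}
    for record in remaining:
        actual_id = characteristic_value(record[attribute]) % num_actual
        partitions[actual_id].append(record)
    return partitions
```

Unlike a plain even split, this keeps same-valued records together at the cost of exact balance across actual partitions.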
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions, steps, or modules recited in the claims may be performed in an order different from that of the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Correspondingly, an embodiment of this application further provides data distribution equipment. The equipment includes a server, and the server includes:
A memory storing a data partitioning program;
A processor which, after receiving the data to be partitioned, calls the data partitioning program in the memory and executes:
Dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
Obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volume satisfies a preset condition, where the number of simulated partition data sets satisfying the condition equals the number of actual partitions;
Dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition.
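For orientation, the three steps executed by the processor can be combined into one Python sketch. The preset data-volume condition is not specified in this section, so the sketch takes it as a pluggable `select` function with a hypothetical default (the simulated sets with the largest data volume). The MD5 characteristic value, the 0-based numbering, and all function names are likewise assumptions. Data in the simulated sets that do not satisfy the condition is routed by characteristic value modulo the number of actual partitions, following the claim 8 variant.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Record = dict


def characteristic_value(value: object) -> int:
    # Stable across processes, unlike the salted built-in hash() for str.
    return int(hashlib.md5(repr(value).encode("utf-8")).hexdigest(), 16)


def largest_n(sizes: Dict[int, int], n: int) -> List[int]:
    """Hypothetical preset condition: the n largest simulated sets (ties -> lower number)."""
    return sorted(sizes, key=lambda sim_id: (-sizes[sim_id], sim_id))[:n]


def distribute(
    records: Sequence[Record],
    attribute: str,
    num_simulated: int,
    num_actual: int,
    select: Callable[[Dict[int, int], int], List[int]] = largest_n,
) -> Dict[int, List[Record]]:
    if num_actual <= 0 or num_simulated <= num_actual:
        raise ValueError("need 0 < num_actual < num_simulated")

    # Step 1: simulated partition number = characteristic value mod num_simulated.
    simulated: Dict[int, List[Record]] = {i: [] for i in range(num_simulated)}
    for r in records:
        simulated[characteristic_value(r[attribute]) % num_simulated].append(r)

    # Step 2: keep exactly num_actual simulated sets that satisfy the condition.
    qualifying = sorted(select({i: len(v) for i, v in simulated.items()}, num_actual))
    if len(set(qualifying)) != num_actual:
        raise ValueError("select() must return num_actual distinct partition numbers")

    # Step 3: the k-th qualifying set in numeric order goes to actual partition k.
    # For consecutive numbers this equals subtracting the offset qualifying[0].
    actual: Dict[int, List[Record]] = {i: [] for i in range(num_actual)}
    for actual_id, sim_id in enumerate(qualifying):
        actual[actual_id].extend(simulated[sim_id])

    # Remaining sets: route by characteristic value mod num_actual.
    chosen = set(qualifying)
    for sim_id, recs in simulated.items():
        if sim_id not in chosen:
            for r in recs:
                actual[characteristic_value(r[attribute]) % num_actual].append(r)
    return actual
```

Pairing the qualifying sets with actual partitions in ascending numeric order gives the same result as the offset rule whenever their numbers are consecutive; for non-consecutive numbers it is only one possible correspondence.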
Based on the same inventive idea, an embodiment of this application further provides a corresponding non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform:
Dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, where the number of simulated partition data sets is greater than the number of actual partitions;
Obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volume satisfies a preset condition, where the number of simulated partition data sets satisfying the condition equals the number of actual partitions;
Dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device, equipment, and medium embodiments are substantially similar to the method embodiment and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiment, which is not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). However, as technology has developed, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development, and the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, ASICs, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means it contains for implementing various functions may also be regarded as structures within that hardware component. Indeed, the means for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into various units. Of course, when this application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described relatively briefly; for relevant details, refer to the description of the method embodiment.
The above descriptions are merely embodiments of this application and are not intended to limit it. Those skilled in the art may make various modifications and variations to this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (11)

1. A data distribution method, characterized in that the actual partitions required for the data to be partitioned have been determined, the method comprising:
Dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, wherein the number of simulated partition data sets is greater than the number of actual partitions;
Obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volume satisfies a preset condition, wherein the number of simulated partition data sets satisfying the condition equals the number of actual partitions;
Dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition.
2. The data distribution method according to claim 1, characterized in that dividing the data to be partitioned according to the set number of simulated partitions comprises:
Dividing, according to the number of simulated partitions and an attribute of the data to be partitioned, data having the same attribute value into the same simulated partition data set.
3. The data distribution method according to claim 2, characterized in that dividing, according to the number of simulated partitions and the attribute of the data to be partitioned, data having the same attribute value into the same simulated partition data set comprises:
Selecting an attribute contained in the data to be partitioned;
Determining, for the attribute value of the attribute, a characteristic value uniquely corresponding to the attribute value;
Determining, according to the characteristic value and the number of simulated partitions, the simulated partition data set corresponding to the characteristic value;
Wherein the characteristic value includes at least a hash value, a random number, or a digest value.
4. The data distribution method according to claim 3, characterized in that determining, according to the characteristic value and the number of simulated partitions, the simulated partition data set corresponding to the characteristic value comprises:
Numbering the simulated partition data sets;
Taking the characteristic value modulo the number of simulated partitions;
Determining the result of the modulo operation as the number of the simulated partition data set corresponding to the characteristic value.
5. The data distribution method according to claim 2, characterized in that dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition comprises:
Establishing a correspondence between each simulated partition data set satisfying the condition and the actual partitions;
Dividing, according to the correspondence, the data in each simulated partition data set satisfying the condition into the respective corresponding actual partition.
6. The data distribution method according to claim 5, characterized in that establishing the correspondence between each simulated partition data set satisfying the condition and the actual partitions comprises:
Numbering each simulated partition data set and each actual partition in order;
When the numbers of the multiple simulated partition data sets whose data volume satisfies the preset condition are consecutive, determining the offset of the numbers of the simulated partition data sets satisfying the preset condition;
Shifting, according to the offset, the numbers of the simulated partition data sets whose data volume satisfies the preset condition so that they coincide with the actual partition numbers;
Establishing a one-to-one correspondence between each actual partition and the simulated partition satisfying the preset condition that has the same number.
7. The data distribution method according to claim 5, characterized in that, after dividing, according to the correspondence, the data in each simulated partition data set satisfying the condition into the respective corresponding actual partition, the method further comprises:
Dividing, according to the number of actual partitions and the attribute of the data to be partitioned, the data of each simulated partition data set whose data volume does not satisfy the condition into the actual partitions, so that data having the same attribute value are divided into the same actual partition.
8. The data distribution method according to claim 7, characterized in that dividing, according to the number of actual partitions and the attribute of the data to be partitioned, the data of each simulated partition data set whose data volume does not satisfy the condition into the actual partitions comprises:
Selecting the attribute contained in the data of each simulated partition data set whose data volume does not satisfy the condition;
Determining, for the attribute value of the attribute, a characteristic value uniquely corresponding to the attribute value;
Determining, according to the characteristic value and the number of actual partitions, the actual partition corresponding to the characteristic value;
Dividing the data of each simulated partition data set whose data volume does not satisfy the condition into the actual partition corresponding to the characteristic value;
Wherein the characteristic value includes at least a hash value, a random number, or a digest value.
9. The data distribution method according to claim 1, characterized in that, after dividing, according to the correspondence, the data in each simulated partition data set satisfying the condition into the respective corresponding actual partition, the method further comprises:
Dividing, according to the number of actual partitions, the data of each simulated partition data set whose data volume does not satisfy the condition evenly among the actual partitions.
10. A data distribution device, characterized by comprising:
A first division module, configured to divide the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, wherein the number of simulated partition data sets is greater than the number of actual partitions;
An obtaining module, configured to obtain, from the multiple simulated partition data sets, the simulated partition data sets whose data volume satisfies a preset condition, wherein the number of simulated partition data sets satisfying the condition equals the number of actual partitions;
A second division module, configured to divide the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition.
11. Data distribution equipment, the equipment comprising:
A memory storing a data partitioning program;
A processor which, after receiving the data to be partitioned, calls the data partitioning program in the memory and executes:
Dividing the data to be partitioned according to a set number of simulated partitions to obtain multiple simulated partition data sets, wherein the number of simulated partition data sets is greater than the number of actual partitions;
Obtaining, from the multiple simulated partition data sets, the simulated partition data sets whose data volume satisfies a preset condition, wherein the number of simulated partition data sets satisfying the condition equals the number of actual partitions;
Dividing the data to be partitioned into the actual partitions according to the data division result of the simulated partition data sets whose data volume satisfies the condition.
CN201710816443.1A 2017-09-12 2017-09-12 Data distribution method, device and equipment Active CN110019187B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710816443.1A CN110019187B (en) 2017-09-12 2017-09-12 Data distribution method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710816443.1A CN110019187B (en) 2017-09-12 2017-09-12 Data distribution method, device and equipment

Publications (2)

Publication Number Publication Date
CN110019187A true CN110019187A (en) 2019-07-16
CN110019187B CN110019187B (en) 2023-05-12

Family

ID=67186258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710816443.1A Active CN110019187B (en) 2017-09-12 2017-09-12 Data distribution method, device and equipment

Country Status (1)

Country Link
CN (1) CN110019187B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877141A (en) * 2009-11-18 2010-11-03 南京师范大学 Three-dimensional intersection detection algorithm based on space scanning strategy
CN103544258A (en) * 2013-10-16 2014-01-29 国家计算机网络与信息安全管理中心 Cardinal number estimating method and cardinal number estimating device under multi-section query condition of big data
CN103714098A (en) * 2012-09-29 2014-04-09 伊姆西公司 Method and system used for sectioning data base
CN106156159A (en) * 2015-04-16 2016-11-23 阿里巴巴集团控股有限公司 A table join processing method, device and cloud computing system
CN107025137A (en) * 2016-11-24 2017-08-08 阿里巴巴集团控股有限公司 A resource query method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
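Gao Yufei (高宇飞): "Research on Data Skew Handling Methods under the MapReduce Computing Model" (MapReduce计算模型下数据倾斜处理方法的研究), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) *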

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221477A (en) * 2020-01-10 2020-06-02 烽火云科技有限公司 OSD (on screen display) disk allocation method and system
CN111221477B (en) * 2020-01-10 2023-08-22 烽火云科技有限公司 OSD (on Screen display) disc distribution method and system
CN112905596A (en) * 2021-03-05 2021-06-04 北京中经惠众科技有限公司 Data processing method and device, computer equipment and storage medium
CN112905596B (en) * 2021-03-05 2024-02-02 北京中经惠众科技有限公司 Data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110019187B (en) 2023-05-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40010846

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20211009

Address after: Room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou, Zhejiang

Applicant after: Alibaba (China) Co.,Ltd.

Address before: P.O. Box 847, 4th floor, Grand Cayman capital building, British Cayman Islands

Applicant before: ALIBABA GROUP HOLDING Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20211129

Address after: 310000 No. 12, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province

Applicant after: Aliyun Computing Co.,Ltd.

Address before: 310056 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou, Zhejiang

Applicant before: Alibaba (China) Co.,Ltd.

GR01 Patent grant