CN110502505A - Data migration method and device - Google Patents
- Publication number
- CN110502505A (application CN201910806491.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- partition
- migration
- skew
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2255—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data migration method and device. A partition column is first set for the migration table, and a degree of parallelism is set for the table's data migration task. A preset hash algorithm is then used to hash-partition the data in the migration table, yielding the data volume of each partition, and the partition data skew is calculated from those volumes. When the partition data skew is greater than a skew threshold, hash partitioning is performed again; when it is not greater than the skew threshold, the data migration task of the migration table is executed in parallel. By keeping the partition data skew no greater than the skew threshold, the invention distributes data evenly across partitions so that the migration task runs in parallel with a balanced load, which avoids data skew and improves the efficiency and quality of Spark data migration.
Description
Technical field
The present invention relates to the field of data migration technology, and more particularly to a data migration method and device.
Background
With the rapid development of big data and artificial intelligence technologies, new technologies are gradually being applied across industries. Commercial banks are progressively adopting big data technology and deeply integrating it with their banking service strategies, laying a foundation for the development of financial technology; big data has broad application prospects in financial fields such as anti-money laundering and anti-fraud. For a long time, however, banks' business data has been stored mainly in relational databases. As the use of big data deepens, there is an urgent need to migrate data from relational databases to big data platforms quickly and stably.
The industry currently relies on data migration tools such as Sqoop and Spark. Sqoop is convenient, but because it is implemented with Map/Reduce, intermediate data must be written to disk, so migration efficiency is low, and problems such as character set transcoding between different databases must also be handled. Spark's partitioned mode is more efficient, but when the values of the partition field are discrete, data skew occurs easily: a large amount of data is concentrated on one or a few machines for computation, which slows the entire migration process and lowers migration efficiency.
Summary of the invention
In view of this, the present invention provides a data migration method and device that avoid data skew when Spark is used for data migration.
To achieve the above object, the present invention provides the following technical solutions:
A data migration method, comprising:
Setting a partition column of a migration table, and setting a degree of parallelism for the data migration task of the migration table;
Hash-partitioning the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
Calculating a partition data skew according to the data volume of each partition;
When the partition data skew is greater than a skew threshold, returning to the step of hash-partitioning the data in the migration table using the preset hash algorithm;
When the partition data skew is not greater than the skew threshold, executing the data migration task of the migration table in parallel.
Optionally, setting the partition column of the migration table comprises:
Obtaining the basic information of the migration table, and setting the partition column according to that basic information.
Optionally, setting the degree of parallelism for the data migration task of the migration table comprises:
Setting the degree of parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the degree of parallelism is a prime number less than the total number of CPU cores of the compute nodes in the Spark cluster.
Optionally, hash-partitioning the data in the migration table using the preset hash algorithm to obtain the data volume of each partition comprises:
Generating a first random number and a second random number;
For each data record in the migration table, executing the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, key is the partition column value of the record, len is its length, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
When the loop iteration ends, obtaining the final hash value of the record, and taking the remainder of the final hash value divided by the degree of parallelism to obtain the hash value of the record;
Determining the partition corresponding to the record according to its hash value.
Optionally, calculating the partition data skew according to the data volume of each partition comprises:
Determining the maximum data volume and the minimum data volume among the partitions;
Calculating the data volume difference between the maximum data volume and the minimum data volume;
Calculating the ratio of the data volume difference to the total data volume of the migration table to obtain the partition data skew.
A data migration device, comprising:
A setting unit, configured to set a partition column of a migration table and to set a degree of parallelism for the data migration task of the migration table;
A hash partitioning unit, configured to hash-partition the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
A skew calculation unit, configured to calculate a partition data skew according to the data volume of each partition, to trigger the hash partitioning unit when the partition data skew is greater than a skew threshold, and to trigger a task execution unit when the partition data skew is not greater than the skew threshold;
The task execution unit, configured to execute the data migration task of the migration table in parallel.
Optionally, the setting unit comprises:
A partition column setting subunit, configured to obtain the basic information of the migration table and to set the partition column according to that basic information.
Optionally, the setting unit comprises:
A degree-of-parallelism setting subunit, configured to set the degree of parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the degree of parallelism is a prime number less than the total number of CPU cores of the compute nodes in the Spark cluster.
Optionally, the hash partitioning unit is specifically configured to:
Generate a first random number and a second random number;
For each data record in the migration table, execute the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, key is the partition column value of the record, len is its length, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
When the loop iteration ends, obtain the final hash value of the record, and take the remainder of the final hash value divided by the degree of parallelism to obtain the hash value of the record;
Determine the partition corresponding to the record according to its hash value.
Optionally, the skew calculation unit is specifically configured to:
Determine the maximum data volume and the minimum data volume among the partitions;
Calculate the data volume difference between the maximum data volume and the minimum data volume;
Calculate the ratio of the data volume difference to the total data volume of the migration table to obtain the partition data skew.
Compared with the prior art, the present invention has the following beneficial effects:
In the data migration method disclosed by the present invention, a partition column is first set for the migration table and a degree of parallelism is set for its data migration task. A preset hash algorithm is then used to hash-partition the data in the migration table, yielding the data volume of each partition, from which the partition data skew is calculated. When the partition data skew is greater than the skew threshold, hash partitioning is performed again; when it is not greater than the skew threshold, the data migration task of the migration table is executed in parallel. Keeping the partition data skew no greater than the skew threshold during migration distributes the data evenly across partitions, so that the migration task runs in parallel with a balanced load, which avoids data skew and improves the efficiency and quality of Spark data migration.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a data migration method disclosed in an embodiment of the present invention;
Fig. 2 is a flow diagram of a hash partitioning method disclosed in an embodiment of the present invention;
Fig. 3 is a structural diagram of a data migration device disclosed in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
This embodiment discloses a data migration method applied to data migration scenarios under the Spark framework. Referring to Fig. 1, the method comprises the following steps:
S101: Set the partition column of the migration table, and set the degree of parallelism of the data migration task of the migration table.
The migration table is the database table whose data needs to be migrated.
Before migration, the basic information of the migration table must first be obtained, such as the database driver, address, user name, password, database name, and the name of the migration table; the data migration task of the migration table can be generated from this basic information.
The partition column is set according to the basic information of the migration table. The partition column is a column with few repeated values in the migration table, which makes it suitable for hash partitioning; for example, a primary key column or a unique key column can serve as the partition column.
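As an illustration of choosing a column with few repeated values, the following is a minimal Python sketch and not part of the disclosed method. It assumes that a sample of rows is available as dictionaries; the function name pick_partition_column and the distinct-ratio heuristic are hypothetical choices made for this example only.

```python
from typing import Dict, List, Sequence


def pick_partition_column(rows: Sequence[Dict[str, object]],
                          candidates: List[str]) -> str:
    """Return the candidate column with the highest ratio of distinct values.

    Hypothetical helper: the patent text only says the partition column
    should have few repeated values (e.g. a primary or unique key).
    """
    if not rows:
        raise ValueError("need at least one sample row")

    def distinct_ratio(column: str) -> float:
        # Number of distinct values divided by number of sampled rows.
        return len({row[column] for row in rows}) / len(rows)

    # max() keeps the first candidate when several share the best ratio.
    return max(candidates, key=distinct_ratio)


sample = [
    {"id": 1, "name": "zhangsan", "city": "Beijing"},
    {"id": 2, "name": "lisi", "city": "Beijing"},
    {"id": 3, "name": "zhangsan", "city": "Shanghai"},
]
print(pick_partition_column(sample, ["id", "name", "city"]))
```

In this sample the id column has no repeated values, so it is selected, which matches the text's suggestion of using a primary key or unique key column.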
The degree of parallelism is the number of jobs executed in parallel while Spark migrates the data.
Specifically, the degree of parallelism of the data migration task is set according to the total number of CPU cores of the compute nodes in the Spark cluster; it is a prime number less than that total. Because the degree of parallelism is prime, it has no divisors other than 1 and itself, so that the jobs of the data migration task can be assigned evenly to the CPU cores in the Spark cluster.
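The rule that the degree of parallelism is a prime number less than the total number of CPU cores can be sketched as follows. The text only requires some prime below the core count; picking the largest such prime is an assumption of this sketch, and the names is_prime and choose_parallelism are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, sufficient for core counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_parallelism(total_cores: int) -> int:
    """Return the largest prime strictly less than total_cores.

    Assumption: "largest" is our choice; the source only says the value
    is prime and less than the total number of CPU cores.
    """
    for candidate in range(total_cores - 1, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("no prime exists below the given core count")


print(choose_parallelism(16))  # 13
print(choose_parallelism(32))  # 31
```

Because the result is always below the core count, every migration job can be given its own CPU core.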
S102: Hash-partition the data in the migration table using the preset hash algorithm to obtain the data volume of each partition.
First, a first random number and a second random number are generated; the first random number may be a 5-digit number and the second a 6-digit number. The hash partitioning method shown in Fig. 2 is then executed for each data record in the migration table, and comprises the following steps:
S201: Calculate hash = hash * a + key.charAt(i);
where the initial value of hash is 0 and i = {0, ..., len-1}, i.e. the initial value of i is 0; key is the partition column value of the target data and len is its length; a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value.
The target data is the data record in the migration table on which the hash partitioning method is currently being executed.
Take a partition column that stores names as an example, with Zhang San as the partition column value of the target data. The corresponding character string is zhangsan, so key is zhangsan and len is 8; i = 0 refers to the character z, and key.charAt(0) is 0x7a.
S202: Determine whether i is equal to len-1;
If not, execute S203: calculate i = i + 1 and a = a * b, and return to S201;
If so, execute S204: obtain the final hash value of the target data;
S205: Take the remainder of the final hash value divided by the degree of parallelism to obtain the hash value of the target data;
S206: Determine the partition corresponding to the target data according to its hash value.
Because the hash value of the target data is the remainder of its final hash value divided by the degree of parallelism, it is an integer in the range [0, Pd), where Pd is the degree of parallelism. A hash value of 0 corresponds to the partition numbered 0, a hash value of 1 corresponds to the partition numbered 1, and so on, which gives the partition for every data record.
It should be noted that the hash partitioning method disclosed in this embodiment uses key.charAt(i) to map a character-type column to a numeric-type column, so that any character string in the migration table is mapped to a number within a determined range. As a result, when Spark is used to migrate a relational database table in parallel, the user does not need to provide a numeric field as the partition column. This adds support for non-numeric partition columns and broadens the scope in which Spark can migrate relational database tables in parallel.
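Steps S201 to S206 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: ord(ch) stands in for key.charAt(i), the function names are hypothetical, and Python integers never overflow, whereas a fixed-width integer implementation (as the charAt notation suggests) would also have to deal with overflow and a possibly negative remainder.

```python
import random
from typing import Iterable, List


def partition_index(key: str, a: int, b: int, parallelism: int) -> int:
    """Map a partition column value to a partition number in [0, parallelism)."""
    hash_value = 0
    for ch in key:
        # S201: hash = hash * a + key.charAt(i)
        hash_value = hash_value * a + ord(ch)
        # S203: a = a * b (updating after the last character has no effect)
        a = a * b
    # S205: remainder of the final hash value divided by the parallelism
    return hash_value % parallelism


def partition_sizes(keys: Iterable[str], a: int, b: int,
                    parallelism: int) -> List[int]:
    """Count how many records fall into each partition."""
    sizes = [0] * parallelism
    for key in keys:
        sizes[partition_index(key, a, b, parallelism)] += 1
    return sizes


# The first random number has 5 digits and the second has 6 digits.
rng = random.Random(0)
first = rng.randint(10_000, 99_999)
second = rng.randint(100_000, 999_999)
names = ["zhangsan", "lisi", "wangwu", "zhaoliu"]
print(partition_sizes(names, first, second, 13))
```

The list returned by partition_sizes is the per-partition data volume that the next step uses to measure skew.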
S103: Calculate the partition data skew according to the data volume of each partition.
Determine the maximum data volume and the minimum data volume among the partitions;
Calculate the data volume difference between the maximum data volume and the minimum data volume;
Calculate the ratio of the data volume difference to the total data volume of the migration table to obtain the partition data skew.
Specifically, the partition data skew is d = (MAX(T) - MIN(T)) / SUM(T),
where MAX(T) is the maximum data volume among the partitions, MIN(T) is the minimum data volume among the partitions, and SUM(T) is the total data volume of the migration table.
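The skew formula d = (MAX(T) - MIN(T)) / SUM(T) translates directly into code. In the sketch below the function name partition_skew is hypothetical, and returning 0.0 for an empty table is an assumption made to avoid dividing by zero; the source does not discuss that case.

```python
from typing import Sequence


def partition_skew(sizes: Sequence[int]) -> float:
    """Compute d = (max - min) / total over the per-partition data volumes."""
    total = sum(sizes)
    if total == 0:
        # Assumption: an empty table is treated as perfectly balanced.
        return 0.0
    return (max(sizes) - min(sizes)) / total


# Three partitions holding 40, 35 and 25 records: (40 - 25) / 100
print(partition_skew([40, 35, 25]))  # 0.15
```

A perfectly even split gives a skew of 0, and the value approaches 1 as the data concentrates in a single partition.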
S104: Determine whether the partition data skew is greater than the skew threshold;
If so, return to S102; that is, regenerate the first random number and the second random number and perform hash partitioning again.
If not, execute S105: execute the data migration task of the migration table in parallel.
When the partition data skew is not greater than the skew threshold, the data migration task of the migration table is submitted to Spark. Spark divides the task into multiple data migration jobs, whose number equals both the number of partitions and the degree of parallelism. Each CPU core of a node in the Spark cluster is assigned at most one data migration job, and the CPU cores that have been assigned jobs execute them in parallel.
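The loop formed by S102 to S104 (partition, measure skew, and repartition with new random numbers until the skew is within the threshold) can be sketched as below. This is a minimal self-contained sketch: the function names, the max_attempts cap, and the injectable random generator are assumptions added for the example, since the text describes returning to S102 without stating a retry limit. Submitting the task to Spark (S105) is outside the scope of the sketch.

```python
import random
from typing import List, Optional, Sequence, Tuple


def partition_index(key: str, a: int, b: int, parallelism: int) -> int:
    """Hash a partition column value as in S201 to S205."""
    hash_value = 0
    for ch in key:
        hash_value = hash_value * a + ord(ch)
        a = a * b
    return hash_value % parallelism


def skew(sizes: Sequence[int]) -> float:
    """d = (max - min) / total, with 0.0 for an empty table (assumption)."""
    total = sum(sizes)
    return (max(sizes) - min(sizes)) / total if total else 0.0


def find_balanced_partitioning(
    keys: Sequence[str],
    parallelism: int,
    threshold: float,
    max_attempts: int = 100,          # hypothetical safety cap
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, List[int], float]:
    """Redraw (a, b) until the partition data skew is within the threshold."""
    rng = rng or random.Random()
    for _ in range(max_attempts):
        a = rng.randint(10_000, 99_999)      # 5-digit first random number
        b = rng.randint(100_000, 999_999)    # 6-digit second random number
        sizes = [0] * parallelism
        for key in keys:
            sizes[partition_index(key, a, b, parallelism)] += 1
        d = skew(sizes)
        if d <= threshold:                   # S104: not greater than threshold
            return a, b, sizes, d
    raise RuntimeError("no partitioning met the skew threshold")


keys = [f"user{i}" for i in range(1000)]
a, b, sizes, d = find_balanced_partitioning(keys, 7, 0.2, rng=random.Random(42))
print(sizes, d)
```

The accepted pair of random numbers would then be reused to assign each record to its partition when the migration task actually runs.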
As can be seen, in the data migration method disclosed in this embodiment, a partition column is first set for the migration table and a degree of parallelism is set for its data migration task. A preset hash algorithm is then used to hash-partition the data in the migration table, yielding the data volume of each partition, from which the partition data skew is calculated. When the partition data skew is greater than the skew threshold, hash partitioning is performed again; when it is not greater than the skew threshold, the data migration task of the migration table is executed in parallel. Keeping the partition data skew no greater than the skew threshold during migration distributes the data evenly across partitions, so that the migration task runs in parallel with a balanced load, which avoids data skew and improves the efficiency and quality of Spark data migration.
Based on the data migration method disclosed in the above embodiment, this embodiment correspondingly discloses a data migration device. Referring to Fig. 3, the device comprises:
A setting unit 301, configured to set a partition column of a migration table and to set a degree of parallelism for the data migration task of the migration table;
A hash partitioning unit 302, configured to hash-partition the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
A skew calculation unit 303, configured to calculate a partition data skew according to the data volume of each partition, to trigger the hash partitioning unit 302 when the partition data skew is greater than a skew threshold, and to trigger a task execution unit 304 when the partition data skew is not greater than the skew threshold;
The task execution unit 304, configured to execute the data migration task of the migration table in parallel.
Optionally, the setting unit 301 comprises:
A partition column setting subunit, configured to obtain the basic information of the migration table and to set the partition column according to that basic information.
Optionally, the setting unit 301 comprises:
A degree-of-parallelism setting subunit, configured to set the degree of parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the degree of parallelism is a prime number less than the total number of CPU cores of the compute nodes in the Spark cluster.
Optionally, the hash partitioning unit 302 is specifically configured to:
Generate a first random number and a second random number;
For each data record in the migration table, execute the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, key is the partition column value of the record, len is its length, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
When the loop iteration ends, obtain the final hash value of the record, and take the remainder of the final hash value divided by the degree of parallelism to obtain the hash value of the record;
Determine the partition corresponding to the record according to its hash value.
Optionally, the skew calculation unit 303 is specifically configured to:
Determine the maximum data volume and the minimum data volume among the partitions;
Calculate the data volume difference between the maximum data volume and the minimum data volume;
Calculate the ratio of the data volume difference to the total data volume of the migration table to obtain the partition data skew.
The data migration device disclosed in this embodiment keeps the partition data skew no greater than the skew threshold during data migration, so that data is distributed evenly across partitions and the data migration task runs in parallel with a balanced load. This avoids data skew and improves the efficiency and quality of Spark data migration.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data migration method, characterized by comprising:
Setting a partition column of a migration table, and setting a degree of parallelism for the data migration task of the migration table;
Hash-partitioning the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
Calculating a partition data skew according to the data volume of each partition;
When the partition data skew is greater than a skew threshold, returning to the step of hash-partitioning the data in the migration table using the preset hash algorithm;
When the partition data skew is not greater than the skew threshold, executing the data migration task of the migration table in parallel.
2. The method according to claim 1, characterized in that setting the partition column of the migration table comprises:
Obtaining the basic information of the migration table, and setting the partition column according to that basic information.
3. The method according to claim 1, characterized in that setting the degree of parallelism for the data migration task of the migration table comprises:
Setting the degree of parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the degree of parallelism is a prime number less than the total number of CPU cores of the compute nodes in the Spark cluster.
4. The method according to claim 1, characterized in that hash-partitioning the data in the migration table using the preset hash algorithm to obtain the data volume of each partition comprises:
Generating a first random number and a second random number;
For each data record in the migration table, executing the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, key is the partition column value of the record, len is its length, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
When the loop iteration ends, obtaining the final hash value of the record, and taking the remainder of the final hash value divided by the degree of parallelism to obtain the hash value of the record;
Determining the partition corresponding to the record according to its hash value.
5. The method according to claim 1, characterized in that calculating the partition data skew according to the data volume of each partition comprises:
Determining the maximum data volume and the minimum data volume among the partitions;
Calculating the data volume difference between the maximum data volume and the minimum data volume;
Calculating the ratio of the data volume difference to the total data volume of the migration table to obtain the partition data skew.
6. A data migration device, characterized by comprising:
A setting unit, configured to set a partition column of a migration table and to set a degree of parallelism for the data migration task of the migration table;
A hash partitioning unit, configured to hash-partition the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
A skew calculation unit, configured to calculate a partition data skew according to the data volume of each partition, to trigger the hash partitioning unit when the partition data skew is greater than a skew threshold, and to trigger a task execution unit when the partition data skew is not greater than the skew threshold;
The task execution unit, configured to execute the data migration task of the migration table in parallel.
7. The device according to claim 6, characterized in that the setting unit comprises:
A partition column setting subunit, configured to obtain the basic information of the migration table and to set the partition column according to that basic information.
8. The device according to claim 6, characterized in that the setting unit comprises:
A degree-of-parallelism setting subunit, configured to set the degree of parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the degree of parallelism is a prime number less than the total number of CPU cores of the compute nodes in the Spark cluster.
9. The device according to claim 6, characterized in that the hash partitioning unit is specifically configured to:
Generate a first random number and a second random number;
For each data record in the migration table, execute the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, key is the partition column value of the record, len is its length, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
When the loop iteration ends, obtain the final hash value of the record, and take the remainder of the final hash value divided by the degree of parallelism to obtain the hash value of the record;
Determine the partition corresponding to the record according to its hash value.
10. The device according to claim 6, characterized in that the skew calculation unit is specifically configured to:
Determine the maximum data volume and the minimum data volume among the partitions;
Calculate the data volume difference between the maximum data volume and the minimum data volume;
Calculate the ratio of the data volume difference to the total data volume of the migration table to obtain the partition data skew.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910806491.1A CN110502505A (en) | 2019-08-29 | 2019-08-29 | A kind of data migration method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910806491.1A CN110502505A (en) | 2019-08-29 | 2019-08-29 | A kind of data migration method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110502505A true CN110502505A (en) | 2019-11-26 |
Family
ID=68590441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910806491.1A Pending CN110502505A (en) | 2019-08-29 | 2019-08-29 | A kind of data migration method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110502505A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112650736A (en) * | 2020-12-31 | 2021-04-13 | 中国农业银行股份有限公司 | Data migration method and device |
CN113778727A (en) * | 2020-06-19 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170371892A1 (en) * | 2016-06-22 | 2017-12-28 | Aol Advertising Inc. | Systems and methods for dynamic partitioning in distributed environments |
CN107562542A (en) * | 2017-09-06 | 2018-01-09 | 腾讯科技(深圳)有限公司 | distributed data processing system data partition method and device |
CN108334596A (en) * | 2018-01-31 | 2018-07-27 | 华南师范大学 | A kind of massive relation data efficient concurrent migration method towards big data platform |
CN108572873A (en) * | 2018-04-24 | 2018-09-25 | 中国科学院重庆绿色智能技术研究院 | A kind of load-balancing method and device solving the problems, such as Spark data skews |
CN110069502A (en) * | 2019-04-24 | 2019-07-30 | 东南大学 | Data balancing partition method and computer storage medium based on Spark framework |
Non-Patent Citations (4)
Title |
---|
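XK_一步一步来: "Several classic hash algorithms", CSDN * |
Wang Cheng (王诚) et al.: "Consistent-hashing load-balancing optimization based on a greedy algorithm", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) * |
阿飞_: "Why the modulo operation in hash functions should use a prime number: an analysis of the reasons", CSDN * |
Huang Chaojie (黄超杰): "Research on balanced data allocation algorithms in Spark", China Master's Theses Full-text Database, Information Science and Technology * |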
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110602148B (en) | | Method and device for generating a block's state tree and verifying on-chain data |
RU2724136C1 (en) | | Data processing method and device |
CN106406896B (en) | | Blockchain block construction method based on parallel pipeline technology |
CN110737664B (en) | | Method and device for synchronizing blockchain nodes |
CN107368259A (en) | | Method and apparatus for writing business data into a blockchain system |
US10992459B2 (en) | | Updating a state Merkle tree |
TW201823988A (en) | | Block data checking method and device |
US10908833B2 (en) | | Data migration method for a storage system after expansion and storage system |
CN109903049A (en) | | Blockchain transaction data storage method, device, equipment and storage medium |
CN110502505A (en) | | A kind of data migration method and device |
CN106126334A (en) | | Probabilistic deduplication-aware workload migration |
CN106407224A (en) | | Method and device for file compaction in KV (Key-Value)-Store system |
EP3961461A1 (en) | | Method and apparatus for obtaining number for transaction-accessed variable in blockchain in parallel |
CN109408590A (en) | | Expansion method, device, equipment and storage medium for a distributed database |
CN110245145A (en) | | Method and apparatus for synchronizing structure from a relational database to a Hadoop database |
CN106406762A (en) | | Data deduplication method and device |
CN108763536A (en) | | Database access method and device |
CN110287179A (en) | | Equipment, device and method for filling missing data attribute values |
CN107798120B (en) | | Data conversion method and device |
CN102541622A (en) | | Method for placing load-related virtual machines |
CN109582649A (en) | | Metadata storage method, device, equipment and readable storage medium |
CN103825946A (en) | | Network-aware virtual machine placement method |
CN110298517A (en) | | Logistics transportation scheduling method, device and equipment based on parallel computing |
CN106326005A (en) | | Automatic parameter tuning method for iterative MapReduce operation |
CN106648891A (en) | | MapReduce model-based task execution method and apparatus |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-11-26 |