CN110502505A - Data migration method and device - Google Patents

Data migration method and device

Info

Publication number
CN110502505A
Authority
CN
China
Prior art keywords
data
partition
migration
skew
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910806491.1A
Other languages
Chinese (zh)
Inventor
苏新锋
薛飞
王会武
赵焕芳
王太宁
吴洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-29
Filing date
2019-08-29
Publication date
2019-11-26
Application filed by Agricultural Bank of China
Priority to CN201910806491.1A
Publication of CN110502505A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data migration method and device. The partition column of the table to be migrated (the migration table) is set first, together with the parallelism of the table's data migration task. A preset hash algorithm is then used to hash-partition the data in the migration table, yielding the data volume of each partition, and the partition data skew is calculated from those volumes. When the partition data skew is greater than the skew threshold, hash partitioning is performed again; when it is not greater than the skew threshold, the data migration task of the migration table is executed in parallel. By keeping the partition data skew at or below the skew threshold, the invention distributes data evenly across partitions, so that the data migration task runs in parallel with a balanced load, data skew is avoided, and the efficiency and quality of Spark data migration are improved.

Description

Data migration method and device
Technical field
The present invention relates to the technical field of data migration, and in particular to a data migration method and device.
Background
With the rapid development of big data and artificial intelligence technologies, new technologies are gradually being applied across industries. Commercial banks are progressively adopting big data technology and deeply integrating it with their banking service strategies, laying a foundation for the development of financial technology; big data has broad application prospects in financial fields such as anti-money laundering and anti-fraud. For a long time, however, banks' business data has mainly been stored in relational databases. As the use of big data deepens, migrating data from relational databases to a big data platform quickly and stably has become a problem in urgent need of a solution.
The industry currently relies on data migration tools such as Sqoop and Spark. Sqoop is convenient, but because it is implemented with Map/Reduce, intermediate data must be written to disk, so migration efficiency is low, and issues such as character set transcoding between different databases must also be handled. Spark's partition mode is more efficient, but when the values of the partition field are discrete, data skew occurs easily: a large amount of data is concentrated on one or a few machines for computation, which slows the entire migration process and lowers data migration efficiency.
Summary of the invention
In view of this, the present invention provides a data migration method and device that avoid data skew during data migration with Spark.
To achieve the above object, the present invention provides the following technical solutions:
A data migration method, comprising:
setting the partition column of a migration table, and setting the parallelism of the data migration task of the migration table;
hash-partitioning the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
calculating the partition data skew according to the data volume of each partition;
when the partition data skew is greater than a skew threshold, returning to the step of hash-partitioning the data in the migration table using the preset hash algorithm;
when the partition data skew is not greater than the skew threshold, executing the data migration task of the migration table in parallel.
Optionally, setting the partition column of the migration table comprises:
obtaining the basic information of the migration table, and setting the partition column according to that basic information.
Optionally, setting the parallelism of the data migration task of the migration table comprises:
setting the parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the parallelism is a prime number smaller than that total.
Optionally, hash-partitioning the data in the migration table using the preset hash algorithm to obtain the data volume of each partition comprises:
generating a first random number and a second random number;
for each data item in the migration table, executing the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, the partition column value of the data item is key with length len, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
when the loop iteration ends, obtaining the final hash value of the data item, and taking the final hash value modulo the parallelism to obtain the hashed value of the data item;
determining the partition of the data item according to its hashed value.
Optionally, calculating the partition data skew according to the data volume of each partition comprises:
determining the maximum data volume and the minimum data volume among the partitions;
calculating the difference between the maximum data volume and the minimum data volume;
calculating the ratio of that difference to the total data volume of the migration table to obtain the partition data skew.
A data migration device, comprising:
a setting unit, configured to set the partition column of a migration table and to set the parallelism of the data migration task of the migration table;
a hash partitioning unit, configured to hash-partition the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
a skew calculation unit, configured to calculate the partition data skew according to the data volume of each partition, to trigger the hash partitioning unit when the partition data skew is greater than a skew threshold, and to trigger a task execution unit when the partition data skew is not greater than the skew threshold;
the task execution unit, configured to execute the data migration task of the migration table in parallel.
Optionally, the setting unit comprises:
a partition column setting subunit, configured to obtain the basic information of the migration table and to set the partition column according to that basic information.
Optionally, the setting unit comprises:
a parallelism setting subunit, configured to set the parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the parallelism is a prime number smaller than that total.
Optionally, the hash partitioning unit is specifically configured to:
generate a first random number and a second random number;
for each data item in the migration table, execute the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, the partition column value of the data item is key with length len, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
when the loop iteration ends, obtain the final hash value of the data item, and take the final hash value modulo the parallelism to obtain the hashed value of the data item;
determine the partition of the data item according to its hashed value.
Optionally, the skew calculation unit is specifically configured to:
determine the maximum data volume and the minimum data volume among the partitions;
calculate the difference between the maximum data volume and the minimum data volume;
calculate the ratio of that difference to the total data volume of the migration table to obtain the partition data skew.
Compared with the prior art, the beneficial effects of the present invention are as follows:
In the data migration method disclosed by the present invention, the partition column of the migration table is set first, together with the parallelism of the table's data migration task. A preset hash algorithm is then used to hash-partition the data in the migration table, yielding the data volume of each partition, and the partition data skew is calculated from those volumes. When the partition data skew is greater than the skew threshold, hash partitioning is performed again; when it is not greater than the skew threshold, the data migration task of the migration table is executed in parallel. Keeping the partition data skew at or below the skew threshold during migration distributes data evenly across partitions, so that the data migration task runs in parallel with a balanced load, data skew is avoided, and the efficiency and quality of Spark data migration are improved.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data migration method disclosed in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a hash partitioning method disclosed in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data migration device disclosed in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
This embodiment discloses a data migration method applied to data migration scenarios based on the Spark framework. Referring to Fig. 1, the data migration method disclosed in this embodiment comprises the following steps:
S101: set the partition column of the migration table, and set the parallelism of the data migration task of the migration table;
The migration table is the database table whose data needs to be migrated.
Before data migration, the basic information of the migration table must first be obtained, such as the database driver, address, user name, password, database name, and migration table name; the data migration task of the migration table can be generated from this basic information.
The partition column is set according to the basic information of the migration table. The partition column is a column of the migration table with few repeated values, which makes hash partitioning on it straightforward; for example, a primary key column or a unique key column can serve as the partition column of the migration table.
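As an illustration of choosing a column with few repeated values, the following is a minimal Python sketch. It assumes the rows of the table are available as dictionaries; the helper name pick_partition_column and the distinct-value ratio used as the selection criterion are assumptions made for this example and are not specified by the patent text.

```python
from typing import Dict, List, Sequence


def pick_partition_column(rows: Sequence[Dict[str, object]], candidates: List[str]) -> str:
    """Return the candidate column with the highest share of distinct values.

    Hypothetical helper: the source only says the partition column should
    have few repeated values (for example a primary key or unique key).
    """
    if not rows:
        raise ValueError("rows must not be empty")

    def distinct_ratio(column: str) -> float:
        values = [row[column] for row in rows]
        return len(set(values)) / len(values)

    # max() keeps the first candidate on ties, so list preferred columns first.
    return max(candidates, key=distinct_ratio)


# Made-up sample rows for demonstration only.
rows = [
    {"id": 1, "name": "zhangsan", "branch": "A"},
    {"id": 2, "name": "lisi", "branch": "A"},
    {"id": 3, "name": "zhangsan", "branch": "B"},
]
chosen = pick_partition_column(rows, ["branch", "name", "id"])
```

Here the id column is chosen because every one of its values is distinct, which matches the suggestion that a primary key or unique key column is a natural partition column.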
The parallelism is the number of jobs executed in parallel while Spark migrates the data.
Specifically, the parallelism of the data migration task of the migration table is set according to the total number of CPU cores of the compute nodes in the Spark cluster, where the parallelism is a prime number smaller than that total. Because the parallelism is a prime number, it has no divisors other than 1 and itself, so that the jobs of the data migration task can be distributed evenly across the CPU cores in the Spark cluster.
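The parallelism rule can be sketched as follows. The text only requires a prime number smaller than the total core count; picking the largest such prime, and the function names is_prime and choose_parallelism, are assumptions made for this example.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check, sufficient for core counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_parallelism(total_cores: int) -> int:
    """Return the largest prime strictly below total_cores.

    Assumption: the source does not say which prime to pick, only that the
    parallelism is prime and smaller than the total number of CPU cores.
    """
    for candidate in range(total_cores - 1, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("need at least 3 cores to find a prime below the core count")


parallelism = choose_parallelism(16)  # a cluster with 16 cores in total
```

For a cluster with 16 cores this yields a parallelism of 13, leaving a few cores unused by the migration jobs.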
S102: hash-partition the data in the migration table using the preset hash algorithm to obtain the data volume of each partition;
First, a first random number and a second random number are generated; the first random number can be a 5-digit number and the second a 6-digit number. The hash partitioning method shown in Fig. 2 is then executed for each data item in the migration table, comprising the following steps:
S201: calculate hash = hash * a + key.charAt(i);
where the initial value of hash is 0 and i = {0, ..., len-1}, that is, the initial value of i is 0; the partition column value of the target data item is key with length len; a denotes the first random number and b the second random number; and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value.
The target data item is the data item in the migration table on which the hash partitioning method is currently being executed.
Take a partition column that holds names as an example, with the partition column value of the target data item being Zhang San: the corresponding string is zhangsan, so key is zhangsan and len is 8; i = 0 refers to the first character, z, and key.charAt(i) is that character's code, given as 0x5a (the code of an uppercase Z; a lowercase z is 0x7a).
S202: judge whether i is equal to len-1;
If not, execute S203: calculate i = i + 1 and a = a * b, then return to S201;
If so, execute S204: obtain the final hash value of the target data item;
S205: take the final hash value modulo the parallelism to obtain the hashed value of the target data item;
S206: determine the partition of the target data item according to its hashed value.
Because the hashed value of the target data item is the remainder of its final hash value divided by the parallelism, it is an integer in [0, Pd), where Pd is the parallelism. A data item whose hashed value is 0 belongs to the partition numbered 0, a data item whose hashed value is 1 belongs to the partition numbered 1, and so on, which gives the partition of every data item.
Note that the hash partitioning method disclosed in this embodiment uses the key.charAt(i) function to map character-type columns to numeric values, so that any character column in the migration table is mapped to numbers in a fixed range. As a result, when Spark is used to migrate relational database tables in parallel, the user does not need to supply a numeric field as the partition column; non-numeric partition columns are supported, which widens the range of use of Spark's parallel migration of relational database tables.
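Steps S201 to S206 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: ord() stands in for key.charAt(i), Python integers do not overflow (a Java or Scala long would wrap around and would need separate handling of negative remainders), and the function name hash_partition and the sample keys are hypothetical.

```python
import random
from collections import Counter


def hash_partition(key: str, a: int, b: int, parallelism: int) -> int:
    """Map a string partition-column value to a partition number in [0, parallelism)."""
    h = 0
    for i, ch in enumerate(key):
        h = h * a + ord(ch)      # S201: hash = hash * a + key.charAt(i)
        if i < len(key) - 1:     # S202: last character not reached yet
            a = a * b            # S203: a = a * b
    return h % parallelism       # S205: remainder with respect to the parallelism


# First random number with 5 digits, second with 6 digits, as the text suggests.
a = random.randint(10_000, 99_999)
b = random.randint(100_000, 999_999)
parallelism = 7

# Made-up non-numeric keys; counting them per partition gives the data volumes.
keys = [f"user{n:04d}" for n in range(1000)]
sizes = Counter(hash_partition(k, a, b, parallelism) for k in keys)
```

The Counter holds the data volume of each partition that received at least one key; partitions that received none are simply absent from it and have to be treated as having volume 0.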
S103: calculate the partition data skew according to the data volume of each partition;
Determine the maximum data volume and the minimum data volume among the partitions;
Calculate the difference between the maximum data volume and the minimum data volume;
Calculate the ratio of that difference to the total data volume of the migration table to obtain the partition data skew.
Specifically, the data skew d = (MAX(T) - MIN(T)) / SUM(T),
where MAX(T) is the maximum data volume among the partitions, MIN(T) is the minimum data volume among the partitions, and SUM(T) is the total data volume of the migration table.
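The formula d = (MAX(T) - MIN(T)) / SUM(T) translates directly into code. In the sketch below, the function name partition_skew, the handling of an empty table, and the sample volumes are assumptions made for this example; the list is expected to contain one entry per partition, with 0 for partitions that received no data.

```python
from typing import Sequence


def partition_skew(sizes: Sequence[int]) -> float:
    """Compute d = (MAX(T) - MIN(T)) / SUM(T) over the per-partition data volumes."""
    total = sum(sizes)
    if total == 0:
        # Assumption: an empty table is treated as perfectly balanced.
        return 0.0
    return (max(sizes) - min(sizes)) / total


# Made-up volumes for three partitions: (45 - 25) / 100
d = partition_skew([30, 25, 45])
```

With these volumes the skew is 0.2; a perfectly even split gives 0, and a split in which one partition holds all the data gives 1.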
S104: judge whether the partition data skew is greater than the skew threshold;
If so, return to S102; that is, regenerate the first random number and the second random number and perform hash partitioning again.
If not, execute S105: execute the data migration task of the migration table in parallel.
When the data skew is not greater than the skew threshold, the data migration task of the migration table is submitted to Spark. Spark divides the task into multiple data migration jobs, whose number equals both the number of partitions and the parallelism. Each CPU core of a node in the Spark cluster is assigned at most one data migration job, and the CPU cores that have been assigned jobs execute them in parallel.
It can be seen that, in the data migration method disclosed in this embodiment, the partition column of the migration table is set first, together with the parallelism of the table's data migration task. A preset hash algorithm is then used to hash-partition the data in the migration table, yielding the data volume of each partition, and the partition data skew is calculated from those volumes. When the partition data skew is greater than the skew threshold, hash partitioning is performed again; when it is not greater than the skew threshold, the data migration task of the migration table is executed in parallel. Keeping the partition data skew at or below the skew threshold during migration distributes data evenly across partitions, so that the data migration task runs in parallel with a balanced load, data skew is avoided, and the efficiency and quality of Spark data migration are improved.
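The loop formed by steps S102 to S104 can be sketched end to end in plain Python, without Spark. Several details are assumptions made for this example: the function names, the max_attempts bound (the text itself places no limit on the number of retries), the optional seed, and the sample keys and threshold.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def hash_partition(key: str, a: int, b: int, parallelism: int) -> int:
    """Iterated hash of S201 to S205, with ord() standing in for key.charAt(i)."""
    h = 0
    for i, ch in enumerate(key):
        h = h * a + ord(ch)
        if i < len(key) - 1:
            a = a * b
    return h % parallelism


def partition_until_balanced(
    keys: Sequence[str],
    parallelism: int,
    threshold: float,
    max_attempts: int = 20,      # hypothetical safety bound, not in the source
    seed: Optional[int] = None,
) -> Tuple[int, int, List[int], float]:
    """Redraw the two random numbers until the skew is not greater than the threshold."""
    if not keys:
        raise ValueError("keys must not be empty")
    rng = random.Random(seed)
    for _ in range(max_attempts):
        a = rng.randint(10_000, 99_999)       # first random number (5 digits)
        b = rng.randint(100_000, 999_999)     # second random number (6 digits)
        counts = Counter(hash_partition(k, a, b, parallelism) for k in keys)
        # Include empty partitions as 0 so that MIN(T) is computed correctly.
        sizes = [counts.get(p, 0) for p in range(parallelism)]
        skew = (max(sizes) - min(sizes)) / len(keys)
        if skew <= threshold:                 # S104: not greater than the threshold
            return a, b, sizes, skew
    raise RuntimeError("no balanced partitioning found within max_attempts")


keys = [f"acct-{n:06d}" for n in range(5000)]
a, b, sizes, skew = partition_until_balanced(keys, parallelism=7, threshold=1.0)
```

The returned pair (a, b) fixes the partition of every key, so the same pair would be passed to whatever partition function the actual migration job uses. A threshold of 1.0 always accepts the first attempt and is used here only to keep the example deterministic; a realistic threshold would be much smaller.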
Corresponding to the data migration method disclosed in the above embodiment, this embodiment discloses a data migration device. Referring to Fig. 3, the device comprises:
a setting unit 301, configured to set the partition column of a migration table and to set the parallelism of the data migration task of the migration table;
a hash partitioning unit 302, configured to hash-partition the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
a skew calculation unit 303, configured to calculate the partition data skew according to the data volume of each partition, to trigger the hash partitioning unit 302 when the partition data skew is greater than a skew threshold, and to trigger a task execution unit 304 when the partition data skew is not greater than the skew threshold;
the task execution unit 304, configured to execute the data migration task of the migration table in parallel.
Optionally, the setting unit 301 comprises:
a partition column setting subunit, configured to obtain the basic information of the migration table and to set the partition column according to that basic information.
Optionally, the setting unit 301 comprises:
a parallelism setting subunit, configured to set the parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the parallelism is a prime number smaller than that total.
Optionally, the hash partitioning unit 302 is specifically configured to:
generate a first random number and a second random number;
for each data item in the migration table, execute the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, the partition column value of the data item is key with length len, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
when the loop iteration ends, obtain the final hash value of the data item, and take the final hash value modulo the parallelism to obtain the hashed value of the data item;
determine the partition of the data item according to its hashed value.
Optionally, the skew calculation unit 303 is specifically configured to:
determine the maximum data volume and the minimum data volume among the partitions;
calculate the difference between the maximum data volume and the minimum data volume;
calculate the ratio of that difference to the total data volume of the migration table to obtain the partition data skew.
The data migration device disclosed in this embodiment keeps the partition data skew at or below the skew threshold during data migration, which distributes data evenly across partitions, so that the data migration task runs in parallel with a balanced load, data skew is avoided, and the efficiency and quality of Spark data migration are improved.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data migration method, characterized by comprising:
setting the partition column of a migration table, and setting the parallelism of the data migration task of the migration table;
hash-partitioning the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
calculating the partition data skew according to the data volume of each partition;
when the partition data skew is greater than a skew threshold, returning to the step of hash-partitioning the data in the migration table using the preset hash algorithm;
when the partition data skew is not greater than the skew threshold, executing the data migration task of the migration table in parallel.
2. The method according to claim 1, characterized in that setting the partition column of the migration table comprises:
obtaining the basic information of the migration table, and setting the partition column according to that basic information.
3. The method according to claim 1, characterized in that setting the parallelism of the data migration task of the migration table comprises:
setting the parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the parallelism is a prime number smaller than that total.
4. The method according to claim 1, characterized in that hash-partitioning the data in the migration table using the preset hash algorithm to obtain the data volume of each partition comprises:
generating a first random number and a second random number;
for each data item in the migration table, executing the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, the partition column value of the data item is key with length len, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
when the loop iteration ends, obtaining the final hash value of the data item, and taking the final hash value modulo the parallelism to obtain the hashed value of the data item;
determining the partition of the data item according to its hashed value.
5. The method according to claim 1, characterized in that calculating the partition data skew according to the data volume of each partition comprises:
determining the maximum data volume and the minimum data volume among the partitions;
calculating the difference between the maximum data volume and the minimum data volume;
calculating the ratio of that difference to the total data volume of the migration table to obtain the partition data skew.
6. A data migration device, characterized by comprising:
a setting unit, configured to set the partition column of a migration table and to set the parallelism of the data migration task of the migration table;
a hash partitioning unit, configured to hash-partition the data in the migration table using a preset hash algorithm to obtain the data volume of each partition;
a skew calculation unit, configured to calculate the partition data skew according to the data volume of each partition, to trigger the hash partitioning unit when the partition data skew is greater than a skew threshold, and to trigger a task execution unit when the partition data skew is not greater than the skew threshold;
the task execution unit, configured to execute the data migration task of the migration table in parallel.
7. The device according to claim 6, characterized in that the setting unit comprises:
a partition column setting subunit, configured to obtain the basic information of the migration table and to set the partition column according to that basic information.
8. The device according to claim 6, characterized in that the setting unit comprises:
a parallelism setting subunit, configured to set the parallelism of the data migration task of the migration table according to the total number of CPU cores of the compute nodes in the Spark cluster, where the parallelism is a prime number smaller than that total.
9. The device according to claim 6, characterized in that the hash partitioning unit is specifically configured to:
generate a first random number and a second random number;
for each data item in the migration table, execute the following loop iteration:
hash = hash * a + key.charAt(i);
a = a * b;
where the initial value of hash is 0, i = {0, ..., len-1}, the partition column value of the data item is key with length len, a denotes the first random number, b denotes the second random number, and key.charAt(i) denotes the numeric value corresponding to the i-th character of the partition column value;
when the loop iteration ends, obtain the final hash value of the data item, and take the final hash value modulo the parallelism to obtain the hashed value of the data item;
determine the partition of the data item according to its hashed value.
10. The device according to claim 6, characterized in that the skew calculation unit is specifically configured to:
determine the maximum data volume and the minimum data volume among the partitions;
calculate the difference between the maximum data volume and the minimum data volume;
calculate the ratio of that difference to the total data volume of the migration table to obtain the partition data skew.
CN201910806491.1A 2019-08-29 2019-08-29 Data migration method and device Pending CN110502505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806491.1A CN110502505A (en) 2019-08-29 2019-08-29 Data migration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910806491.1A CN110502505A (en) 2019-08-29 2019-08-29 Data migration method and device

Publications (1)

Publication Number Publication Date
CN110502505A true CN110502505A (en) 2019-11-26

Family

ID=68590441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806491.1A Pending CN110502505A (en) 2019-08-29 2019-08-29 Data migration method and device

Country Status (1)

Country Link
CN (1) CN110502505A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650736A (en) * 2020-12-31 2021-04-13 中国农业银行股份有限公司 Data migration method and device
CN113778727A (en) * 2020-06-19 2021-12-10 北京沃东天骏信息技术有限公司 Data processing method and device, electronic equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170371892A1 (en) * 2016-06-22 2017-12-28 Aol Advertising Inc. Systems and methods for dynamic partitioning in distributed environments
CN107562542A (en) * 2017-09-06 2018-01-09 腾讯科技(深圳)有限公司 distributed data processing system data partition method and device
CN108334596A (en) * 2018-01-31 2018-07-27 华南师范大学 A kind of massive relation data efficient concurrent migration method towards big data platform
CN108572873A (en) * 2018-04-24 2018-09-25 中国科学院重庆绿色智能技术研究院 A kind of load-balancing method and device solving the problems, such as Spark data skews
CN110069502A (en) * 2019-04-24 2019-07-30 东南大学 Data balancing partition method and computer storage medium based on Spark framework

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
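XK_一步一步来: "几种经典的hash算法" [Several classic hash algorithms], 《CSDN》 *
王诚 等: "基于贪心算法的一致性哈希负载均衡优化" [Consistent-hashing load-balancing optimization based on a greedy algorithm], 《南京邮电大学学报(自然科学版)》 [Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)] *
阿飞_: "散列函数中求模运算为什么要使用素数,原因分析" [Why the modulo operation in hash functions uses a prime number: an analysis of the reasons], 《CSDN》 *
黄超杰: "Spark中的数据均衡分配算法研究" [Research on balanced data allocation algorithms in Spark], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *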

Similar Documents

Publication Publication Date Title
CN110602148B (en) Method and device for generating state tree of block and verifying data on chain
RU2724136C1 (en) Data processing method and device
CN106406896B (en) Block chain block building method based on parallel Pipeline technology
CN110737664B (en) Method and device for synchronizing block chain link points
CN107368259A (en) A kind of method and apparatus that business datum is write in the catenary system to block
US10992459B2 (en) Updating a state Merkle tree
TW201823988A (en) Block data checking method and device
US10908833B2 (en) Data migration method for a storage system after expansion and storage system
CN109903049A (en) A kind of block chain transaction data storage method, device, equipment and storage medium
CN110502505A (en) A kind of data migration method and device
CN106126334A (en) The workload migration of probability data de-duplication perception
CN106407224A (en) Method and device for file compaction in KV (Key-Value)-Store system
EP3961461A1 (en) Method and apparatus for obtaining number for transaction-accessed variable in blockchain in parallel
CN109408590A (en) Expansion method, device, equipment and the storage medium of distributed data base
CN110245145A (en) Structure synchronization method and apparatus of the relevant database to Hadoop database
CN106406762A (en) A repeated data deleting method and device
CN108763536A (en) Data bank access method and device
CN110287179A (en) A kind of filling equipment of shortage of data attribute value, device and method
CN107798120B (en) Data conversion method and device
CN102541622A (en) Method for placing load-related virtual machine
CN109582649A (en) A kind of metadata storing method, device, equipment and readable storage medium storing program for executing
CN103825946A (en) Virtual machine placement method based on network perception
CN110298517A (en) A kind of logistics transportation dispatching method, device and equipment based on parallel computation
CN106326005A (en) Automatic parameter tuning method for iterative MapReduce operation
CN106648891A (en) MapReduce model-based task execution method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20191126)