CN111144701A - ETL job scheduling resource classification evaluation method under distributed environment - Google Patents

Info

Publication number
CN111144701A
Authority
CN
China
Prior art keywords
etl
index
server
etl server
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911225107.5A
Other languages
Chinese (zh)
Other versions
CN111144701B (en)
Inventor
杜海
唐伟力
苗青鹏
吴迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN201911225107.5A
Publication of CN111144701A
Application granted
Publication of CN111144701B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for classifying and evaluating ETL job scheduling resources in a distributed environment, comprising the following steps: step one, determining an index system for evaluating the performance of an ETL server; step two, determining the ETL job type; step three, performing cluster analysis on ETL server index data based on the index system to obtain ETL server candidate sets whose classes correspond to the ETL job types; step four, establishing an index evaluation matrix and calculating the information entropy of each index and its entropy weight; step five, ranking the servers within each ETL server candidate set; and step six, after determining the ETL job type as in step two, selecting the top-ranked ETL server from the candidate set of the corresponding class. The invention uses cluster analysis and an evaluation method based on information entropy to evaluate ETL server performance, and automatically matches ETL jobs that have not yet entered the execution state to servers, so that idle ETL server resources are fully used and computing resources are allocated dynamically and in quasi-real time.

Description

ETL job scheduling resource classification evaluation method under distributed environment
Technical Field
The invention belongs to the technical field of networks, and particularly relates to a method for classifying and evaluating ETL job scheduling resources in a distributed environment.
Background
The rapid development of information technology has brought explosive growth in data construction. Data warehouses keep growing in scale and their architectures are becoming more complex. The most important link in building a data warehouse is ETL data access: data extraction, transformation and loading account for about 80% of the workload of data warehouse construction. Distributed data access is currently the mainstream approach, because data access jobs can be spread over a cluster of relatively cheap computers, but it inevitably raises the problem of how to distribute ETL jobs reasonably and make full use of the computing resources of the ETL servers. In a medium-scale data warehouse in the field of network operation and maintenance, more than one hundred ETL (extract-transform-load) access tasks are typically kept in the ready state to meet statistical requirements such as data analysis and daily, weekly and monthly reports, while the ETL server cluster generally consists of about twelve ordinary computers or virtual machines. If a simple "whoever is idle takes the job" strategy is adopted, the resources of the ETL server cluster are wasted, as shown in fig. 1.
Generally, data access tasks fall into three types, namely data extraction, data transformation and data loading, which differ in how heavily they consume CPU, memory and I/O resources. If an ETL server is running an ETL job that occupies a large share of I/O resources for a long time, its CPU resources remain idle, which is a great waste with respect to ETL jobs that need CPU computing resources. Similarly, on a computer performing memory-intensive work, idle I/O capacity is wasted with respect to ETL jobs that only need simple table-to-table data replication. If ETL jobs can be matched accurately to the idle resources of ETL servers, the efficiency of ETL task execution can be greatly improved. This requires first determining the type of each ETL job, then accurately evaluating the service capability of the idle resources of each ETL server, and finally matching the ETL jobs to be executed with the ETL servers precisely, as shown in fig. 2.
Disclosure of Invention
Object of the invention: in view of the above technical problem, the invention provides a method for classifying and evaluating ETL job scheduling resources in a distributed environment.
The technical solution adopted by the invention to solve the technical problem is as follows:
A method for classifying and evaluating ETL job scheduling resources in a distributed environment comprises the following steps:
step one, determining an index system for evaluating the performance of an ETL server;
step two, determining the type of each ETL job by calculating the comprehensive evaluation value of the ETL job;
step three, performing cluster analysis on ETL server index data based on the index system to obtain ETL server candidate sets whose classes correspond to the ETL job types;
step four, for each ETL server candidate set, establishing an index evaluation matrix and calculating the information entropy of each index and its entropy weight;
step five, calculating, from the result of step four, the distance between the ideal point and the index evaluation of each ETL server, and ranking the servers in each ETL server candidate set by that distance;
and step six, after determining the ETL job types as in step two, forming queues by job type and selecting the top-ranked target server from the ETL server queue of the corresponding class.
In summary, by adopting the above technical solution, the invention has the following beneficial effects:
The invention uses cluster analysis and an evaluation method based on information entropy to classify ETL servers and ETL jobs in a fine-grained way, evaluates ETL server performance on the basis of information entropy theory, and automatically matches and executes the classified ETL jobs. It makes full use of idle ETL server resources, allocates computing resources dynamically and in quasi-real time, improves the efficiency of data access, and adapts to the growing demands of data warehouses.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of simple policy allocation scheduling resources;
FIG. 2 is a schematic diagram of a classification evaluation policy scheduling resource;
FIG. 3 is a flow diagram of a method for evaluating and scheduling ETL jobs by ETL server classification in a distributed environment according to the present invention.
Detailed Description
As shown in fig. 3, the method of the present invention for classifying and evaluating ETL job scheduling resources in a distributed environment comprises the following steps:
step one, determining an index system for evaluating the performance of an ETL server;
dividing indexes for evaluating the performance of the ETL server into a CPU class, a memory class and an I/O class; the indexes of the CPU class comprise the utilization rate of the CPU and the occupancy rate of a CPU process queue; the indexes of the memory class comprise memory occupancy rate, buffer zone non-waiting rate and Redo buffer zone non-waiting rate; the indicator of the I/O class includes an I/O busy rate. Specifically, the index system for evaluating the performance of the ETL server is shown in table 1.
Table 1:
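[Table 1 appears as an image in the original publication and is not reproduced here; it covers the six indexes named above, grouped into the CPU, memory and I/O classes.]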
step two, determining the type of each ETL job by calculating the comprehensive evaluation value of the ETL job;
The ETL job types are divided into a CPU type, a memory type and an I/O type, corresponding to the index classes of the index system for evaluating ETL server performance. For each ETL job, the data volume, the transformation workload (number of transformation components) and the loading workload (number of loading components) are taken as inputs. After their weights are determined (the influence factors of CPU utilization, memory occupancy and I/O busy rate, respectively), the values are substituted into the comprehensive evaluation formula to calculate a comprehensive evaluation value, and the type of the ETL job is then determined from that value.
The specific method comprises the following steps:
(1) Denote the ETL job type by ETLS and define the comprehensive evaluation value of ETLS as:
ETLS=DS*α+TS*β+LD*γ
where DS is the data volume, TS is the number of transformation components, LD is the number of loading components, and α, β and γ are the influence factors of CU (CPU utilization), MA (memory occupancy) and IU (I/O busy rate), respectively. Each factor takes a value in the open interval (0, 1) and can be adjusted according to expert knowledge and test results.
(2) Determine the ETL job type from the calculated comprehensive evaluation value according to the following formula:
[Formula image not reproduced in this text: the rule that maps the comprehensive evaluation value ETLS to the CPU, memory or I/O job type.]
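The weighted sum in item (1) can be illustrated with a minimal Python sketch. The `EtlJob` container, the function name and the sample factor values are hypothetical and not part of the source; the rule that maps ETLS to a job type is given only as an image, so it is deliberately left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class EtlJob:
    """Hypothetical container for the three inputs of the ETLS formula."""
    name: str
    data_volume: float    # DS: data volume (unit not specified in the source)
    transform_count: int  # TS: number of transformation components
    load_count: int       # LD: number of loading components


def comprehensive_value(job: EtlJob, alpha: float, beta: float, gamma: float) -> float:
    """Return ETLS = DS*alpha + TS*beta + LD*gamma."""
    for factor in (alpha, beta, gamma):
        # The text restricts each influence factor to the open interval (0, 1).
        if not 0.0 < factor < 1.0:
            raise ValueError("influence factors must lie in (0, 1)")
    return (job.data_volume * alpha
            + job.transform_count * beta
            + job.load_count * gamma)


if __name__ == "__main__":
    job = EtlJob("daily_report", data_volume=100.0, transform_count=4, load_count=2)
    # Factor values below are made up for illustration only.
    print(comprehensive_value(job, alpha=0.5, beta=0.3, gamma=0.2))
```

Because DS, TS and LD are on different scales, a practical implementation would probably rescale them before weighting; the source does not say whether it does so.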
step three, performing cluster analysis on ETL server index data based on the index system to obtain ETL server candidate sets whose classes correspond to the ETL job types;
The cluster analysis borrows the idea of fuzzy classification from fuzzy mathematics. Its basic idea is as follows: an index vector set is established in an m-dimensional space; the number of classes and an initial classification are given first, yielding the initial membership of each vector; the cluster center of each initial class is calculated; and the process is then iterated until each vector belongs to some class with a definite degree of membership.
Specifically, for the actual number n of ETL servers, a six-dimensional vector space is established based on the index system for evaluating ETL server performance, and the following steps are executed:
step 3.1, determining the number of initial classes and the initial classification of the ETL servers to obtain the initial membership degrees;
Let the ETL servers be initially divided into S classes C_j (j ≤ S), and let X_i be a vector in the six-dimensional vector space. The initial membership u_ij is given by:
[Formula image not reproduced in this text: initial membership u_ij of vector X_i in class C_j.]
Further, the number of initial classes S equals the number of ETL job types; that is, the ETL servers are initially divided into 3 classes, corresponding to the CPU class, the memory class and the I/O class.
Step 3.2, calculating the distance from each vector to each class center using the Euclidean distance;
At initialization, let v_j be the cluster center of C_j. The distance from each vector to the cluster center is expressed as:
[Formula image not reproduced in this text: Euclidean distance between X_i and the cluster center v_j at iteration l.]
where l is the iteration number, m is the number of evaluation indexes, and the symbol shown as an image in the original denotes the l-th iteration value of the k-th component of v_j;
step 3.3, calculating the membership degree of each vector, expressed as:
[Formula image not reproduced in this text: updated membership u_ij.]
where α is an empirical constant governing the convergence rate, generally set to α > 1;
step 3.4, judging the convergence of the membership degrees; if the following condition is met, the iteration from step 3.2 ends;
[Formula image not reproduced in this text: convergence condition on the membership degrees, with threshold ε.]
where ε is an empirical constant, generally set to 0.05;
step 3.5, calculating the new cluster center of each ETL server class;
[Formula image not reproduced in this text: updated cluster center.]
where V_j^(l+1) is the cluster center of class C_j;
step 3.6, assigning each ETL server to one of the typed ETL server candidate sets (e.g. CPU type, memory type or I/O type) according to the cluster centers and the class subscript of the server's vector.
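A minimal Python sketch of steps 3.1 to 3.6 follows. The formulas for these steps appear only as images in this text, so the sketch substitutes the standard fuzzy c-means update rules; whether they match the patent's formulas term for term is an assumption. The function names, the crisp 0/1 initial membership, the default α = 2 and the `max_iter` cap are also illustrative assumptions. Only ε = 0.05, α > 1 and the Euclidean distance come from the text.

```python
import math


def euclidean(x, v):
    """Euclidean distance between an index vector and a cluster center (step 3.2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))


def update_memberships(points, centers, alpha):
    """Standard fuzzy c-means membership update (assumed form of step 3.3)."""
    exponent = 2.0 / (alpha - 1.0)
    memberships = []
    for x in points:
        d = [euclidean(x, v) for v in centers]
        if any(dist == 0.0 for dist in d):
            # A vector lying exactly on a center belongs fully to that center.
            hits = [1.0 if dist == 0.0 else 0.0 for dist in d]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append([
            1.0 / sum((d[j] / d[r]) ** exponent for r in range(len(centers)))
            for j in range(len(centers))
        ])
    return memberships


def update_centers(points, u, alpha, n_classes):
    """Membership-weighted mean of the vectors (assumed form of step 3.5)."""
    dim = len(points[0])
    centers = []
    for j in range(n_classes):
        w = [u[i][j] ** alpha for i in range(len(points))]
        total = sum(w)
        if total == 0.0:
            raise ValueError("every class needs at least one member")
        centers.append([
            sum(w[i] * points[i][k] for i in range(len(points))) / total
            for k in range(dim)
        ])
    return centers


def fuzzy_c_means(points, init_labels, n_classes=3, alpha=2.0, eps=0.05, max_iter=100):
    """Cluster server index vectors; returns (labels, centers, memberships)."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    # Step 3.1 (assumption): crisp 0/1 membership taken from the initial classification.
    u = [[1.0 if init_labels[i] == j else 0.0 for j in range(n_classes)]
         for i in range(len(points))]
    centers = update_centers(points, u, alpha, n_classes)
    for _ in range(max_iter):
        new_u = update_memberships(points, centers, alpha)      # steps 3.2 and 3.3
        delta = max(abs(new_u[i][j] - u[i][j])
                    for i in range(len(points)) for j in range(n_classes))
        u = new_u
        if delta <= eps:                                        # step 3.4
            break
        centers = update_centers(points, u, alpha, n_classes)   # step 3.5
    # Step 3.6: assign each server to the class with the highest membership.
    labels = [max(range(n_classes), key=lambda j: row[j]) for row in u]
    return labels, centers, u
```

The sketch keeps the order of the steps in the text, checking convergence before recomputing the centers. Each row of `points` would hold the six index values of one ETL server.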
Step four, for each ETL server candidate set, establishing an index evaluation matrix and calculating the information entropy of each index and its entropy weight;
Based on the principle of information entropy, the indexes with the greatest influence on the ETL servers are identified, and the entropy of each index and its weight in the evaluation model are evaluated quantitatively.
The method specifically comprises the following steps:
step 4.1, establishing an index evaluation matrix X;
X = (x_ij)_{n×m}
where n is the size of the ETL server candidate set (the number of servers being evaluated), m is the number of index types for evaluating ETL server performance (six), and i and j are the row and column indices of the matrix;
step 4.2, normalizing the indexes;
(1) positively correlated indexes:
[Formula image not reproduced in this text: normalization of a positively correlated index.]
where a_ij is the data set of a given index;
(2) negatively correlated indexes:
[Formula image not reproduced in this text: normalization of a negatively correlated index.]
where a_ij is the data set of a given index;
step 4.3, calculating the information entropy of each index;
E_j = -k ∑_{i=1}^{n} r_ij ln r_ij,  j = 1, …, m
where k is a constant, generally set to k = 1;
Information entropy represents the degree of disorder of information: the larger the entropy of an index, the less information the index carries and the smaller its contribution to the overall evaluation.
Step 4.4, calculating the information deviation degree d_j
d_j = 1 - E_j
Step 4.5, calculating the entropy weight of each index;
[Formula image not reproduced in this text: entropy weight w_j of index j.]
From the weights w_j of the indexes of an ETL server, a comprehensive performance score can be calculated.
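Steps 4.1 to 4.5 can be sketched in Python under stated assumptions. The normalization and weight formulas are images in this text, so the sketch assumes min-max scaling followed by conversion to column proportions, and the common entropy-weight form w_j = d_j / Σ d_j. It also defaults to k = 1/ln(n), a widely used choice that keeps E_j within [0, 1], whereas the text sets k = 1; pass `k=1.0` to follow the text literally. All function and parameter names are hypothetical.

```python
import math


def min_max(column, positive=True):
    """Scale one index column to [0, 1]; reversed for negatively correlated indexes."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [1.0] * len(column)  # constant column: every server looks the same
    if positive:
        return [(a - lo) / (hi - lo) for a in column]
    return [(hi - a) / (hi - lo) for a in column]


def entropy_weights(matrix, positive_flags, k=None):
    """Return one entropy weight per index (column) of an n x m evaluation matrix."""
    n, m = len(matrix), len(matrix[0])
    if n < 2:
        raise ValueError("need at least two servers to compare")
    if k is None:
        k = 1.0 / math.log(n)  # assumption; the source text uses k = 1
    deviations = []
    for j in range(m):
        r = min_max([row[j] for row in matrix], positive_flags[j])
        total = sum(r)
        p = [v / total for v in r]                     # proportions within the column
        e_j = -k * sum(v * math.log(v) for v in p if v > 0.0)  # 0 * ln 0 treated as 0
        deviations.append(1.0 - e_j)                   # d_j = 1 - E_j
    d_sum = sum(deviations)
    if d_sum == 0.0:
        raise ValueError("all indexes are uninformative")
    return [d / d_sum for d in deviations]             # assumed form of w_j
```

A column in which every server has the same value carries no discriminating information, so its weight comes out at approximately zero, which matches the remark in the text that higher entropy means a smaller contribution to the evaluation.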
Step five, calculating, from the result of step four, the distance between the ideal point and the index evaluation of each ETL server, and ranking the servers in each ETL server candidate set by that distance;
The method specifically comprises the following steps:
step 5.1, taking the index evaluation matrix established in step 4.1
X = (x_ij)_{n×m}
After normalization, each row is multiplied by the entropy weights w_j to obtain the attribute matrix
B = (b_ij)
Step 5.2, calculating the ideal point
[Formula images not reproduced in this text: definition of the ideal point.]
Step 5.3, calculating the distance between each ETL server and the ideal point using the formula:
[Formula image not reproduced in this text: distance between an ETL server and the ideal point.]
step 5.4, sorting the ETL servers within each ETL server candidate set by distance; the smaller the distance, the better the server meets the target-selection requirement.
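A minimal Python sketch of steps 5.1 to 5.4 follows. The ideal-point and distance formulas are images in this text, so two assumptions are made and should be read as such: the ideal point is taken as the column-wise maximum of the weighted attribute matrix B (which presumes every index has already been normalized so that larger is better), and the distance is Euclidean. The function and parameter names are hypothetical.

```python
import math


def rank_by_ideal_point(names, normalized, weights):
    """Return (server name, distance) pairs sorted from best to worst.

    names      : server identifiers, one per row
    normalized : n x m matrix of normalized index values (larger is better)
    weights    : m entropy weights w_j
    """
    # Step 5.1: weight every normalized row to obtain the attribute matrix B.
    b = [[w * r for w, r in zip(weights, row)] for row in normalized]
    # Step 5.2 (assumption): the ideal point is the best value in each column.
    ideal = [max(col) for col in zip(*b)]
    # Step 5.3 (assumption): Euclidean distance from each server to the ideal point.
    distances = [
        (name, math.sqrt(sum((bij - pj) ** 2 for bij, pj in zip(row, ideal))))
        for name, row in zip(names, b)
    ]
    # Step 5.4: a smaller distance means a better match.
    return sorted(distances, key=lambda pair: pair[1])
```

Because `sorted` is stable, servers with equal distances keep their input order.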
Step six, the type of each ETL job is calculated and determined as in step two, and the jobs form three types of ETL job queues awaiting allocation (CPU type, memory type and I/O type), ordered by the scheduled time in the job plan. After step five, the ETL servers that serve as allocation targets form three types of ETL server queues (CPU type, memory type and I/O type); each queue is sorted by distance from the ideal point, and the smaller the distance, the higher the priority. When ETL jobs are allocated, a "first come, first served" strategy is used: the highest-priority server in the ETL server queue of the corresponding type is selected first and is removed from the queue once the allocation is complete, so that computing resources are ultimately allocated reasonably.
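The allocation in step six can be sketched as follows. This is a minimal sketch, assuming both sets of queues are plain dictionaries of deques keyed by type; the `dispatch` function and the sample names are hypothetical. Job queues are assumed to be already ordered by scheduled time and server queues by ascending distance from the ideal point.

```python
from collections import deque


def dispatch(job_queues, server_queues):
    """Pair jobs with servers of the same type, first come, first served.

    job_queues    : dict mapping a type to a deque of jobs, earliest scheduled first
    server_queues : dict mapping a type to a deque of servers, best ranked first
    Returns a list of (job, server) pairs; both queues are consumed in place.
    """
    assignments = []
    for job_type, jobs in job_queues.items():
        servers = server_queues.get(job_type, deque())
        while jobs and servers:
            job = jobs.popleft()        # the earliest waiting job of this type
            server = servers.popleft()  # highest priority server, removed once used
            assignments.append((job, server))
    return assignments


if __name__ == "__main__":
    jobs = {"cpu": deque(["job1", "job2"]), "io": deque(["job3"])}
    servers = {"cpu": deque(["srv_a"]), "io": deque(["srv_b", "srv_c"])}
    print(dispatch(jobs, servers))
```

Jobs that find no server of their type stay in their queue, which suits a scheduler that re-ranks the servers and calls `dispatch` again on the next cycle.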
As can be seen from the above, the present invention has the following beneficial effects:
The invention uses cluster analysis and an evaluation method based on information entropy to classify ETL servers and ETL jobs in a fine-grained way, evaluates ETL server performance on the basis of information entropy theory, and automatically matches ETL jobs that have not yet entered the execution state to servers. It makes full use of idle ETL server resources, allocates computing resources dynamically, in quasi-real time and with relatively high accuracy, improves the efficiency of data access, and adapts to the growing demands of data warehouses.

Claims (10)

1. A method for classifying and evaluating ETL job scheduling resources in a distributed environment, characterized by comprising the following steps:
step one, determining an index system for evaluating the performance of an ETL server;
step two, determining the type of each ETL job by calculating the comprehensive evaluation value of the ETL job;
step three, performing cluster analysis on ETL server index data based on the index system to obtain ETL server candidate sets whose classes correspond to the ETL job types;
step four, for each ETL server candidate set, establishing an index evaluation matrix and calculating the information entropy of each index and its entropy weight;
step five, calculating, from the result of step four, the distance between the ideal point and the index evaluation of each ETL server, and ranking the servers in each ETL server candidate set by that distance;
and step six, after determining the ETL job types as in step two, forming queues by job type and selecting the top-ranked target server from the ETL server queue of the corresponding class.
2. The method for classification and evaluation of ETL job scheduling resources in a distributed environment according to claim 1, wherein the method of the first step is: dividing indexes for evaluating the performance of the ETL server into a CPU class, a memory class and an I/O class; the indexes of the CPU class comprise the utilization rate of the CPU and the occupancy rate of a CPU process queue; the indexes of the memory class comprise memory occupancy rate, buffer zone non-waiting rate and Redo buffer zone non-waiting rate; the indicator of the I/O class includes an I/O busy rate.
3. The method for classifying and evaluating ETL job scheduling resources in a distributed environment according to claim 2, wherein the method of step two comprises:
(1) denoting the ETL job type by ETLS and defining the comprehensive evaluation value of ETLS as:
ETLS=DS*α+TS*β+LD*γ
where DS is the data volume, TS is the number of transformation components, LD is the number of loading components, and α, β and γ are the influence factors of CU, MA and IU, respectively, each taking a value in the range (0, 1);
(2) determining the ETL job type from the calculated comprehensive evaluation value according to the following formula:
[Formula image not reproduced in this text: the rule that maps the comprehensive evaluation value ETLS to the ETL job type.]
4. The method according to claim 2, wherein the method of step three is to establish, according to the actual number n of ETL servers, a six-dimensional vector space based on the index system for evaluating ETL server performance, and then perform the following steps:
step 3.1, determining the number of initial classes and the initial classification of the ETL servers to obtain the initial membership degrees;
letting the ETL servers be initially divided into S classes C_j (j ≤ S), with X_i a vector in the six-dimensional vector space, the initial membership u_ij is given by:
[Formula image not reproduced in this text: initial membership u_ij of vector X_i in class C_j.]
step 3.2, calculating the distance from each vector to each class center using the Euclidean distance;
at initialization, letting v_j be the cluster center of C_j, the distance from each vector to the cluster center is expressed as:
[Formula image not reproduced in this text: Euclidean distance between X_i and the cluster center v_j at iteration l.]
where l is the iteration number, m is the number of evaluation indexes, and the symbol shown as an image in the original denotes the l-th iteration value of the k-th component of v_j;
step 3.3, calculating the membership degree of each vector, expressed as:
[Formula image not reproduced in this text: updated membership u_ij.]
where α is an empirical constant governing the convergence rate;
step 3.4, judging the convergence of the membership degrees; if the following condition is met, the iteration from step 3.2 ends;
[Formula image not reproduced in this text: convergence condition on the membership degrees, with threshold ε.]
where ε is an empirical constant;
step 3.5, calculating the new cluster center of each ETL server class;
[Formula image not reproduced in this text: updated cluster center.]
where V_j^(l+1) is the cluster center of class C_j;
step 3.6, assigning each ETL server to one of the typed ETL server candidate sets according to the cluster centers and the class subscript of the server's vector.
5. The method of claim 4, wherein the initial classification S of the ETL server corresponds to the number of ETL job types.
6. The method for classifying and evaluating ETL job scheduling resources in a distributed environment according to claim 4, wherein α > 1.
7. The method of claim 4, wherein ε is 0.05.
8. The method for classifying and evaluating ETL job scheduling resources in a distributed environment according to claim 1, wherein the method of step four is:
step 4.1, establishing an index evaluation matrix X;
X = (x_ij)_{n×m}
where n is the size of the ETL server candidate set (the number of servers being evaluated), m is the number of index types for evaluating ETL server performance (six), and i and j are the row and column indices of the matrix;
step 4.2, normalizing the indexes;
(1) positively correlated indexes:
[Formula image not reproduced in this text: normalization of a positively correlated index.]
where a_ij is the data set of a given index;
(2) negatively correlated indexes:
[Formula image not reproduced in this text: normalization of a negatively correlated index.]
where a_ij is the data set of a given index;
step 4.3, calculating the information entropy of each index;
E_j = -k ∑_{i=1}^{n} r_ij ln r_ij,  j = 1, …, m
where k is a constant;
step 4.4, calculating the information deviation degree d_j
d_j = 1 - E_j
step 4.5, calculating the entropy weight of each index;
[Formula image not reproduced in this text: entropy weight w_j of index j.]
9. The method of claim 8, wherein k is 1.
10. The method for classifying and evaluating ETL job scheduling resources in a distributed environment according to claim 1, wherein the method of step five is:
(1) calculating the ideal point according to the calculation result of step four:
obtaining the index evaluation matrix from the calculation result of step four
X = (x_ij)_{n×m}
and, after normalization, multiplying each row by the entropy weights w_j to obtain the attribute matrix
B = (b_ij)
(2) finding the ideal point
[Formula images not reproduced in this text: definition of the ideal point.]
(3) calculating the distance between each ETL server and the ideal point using the formula:
[Formula image not reproduced in this text: distance between an ETL server and the ideal point.]
(4) sorting the servers in each ETL server candidate set by distance; the smaller the distance, the better the server meets the target-selection requirement.
CN201911225107.5A 2019-12-04 2019-12-04 ETL job scheduling resource classification evaluation method under distributed environment Active CN111144701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911225107.5A CN111144701B (en) 2019-12-04 2019-12-04 ETL job scheduling resource classification evaluation method under distributed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911225107.5A CN111144701B (en) 2019-12-04 2019-12-04 ETL job scheduling resource classification evaluation method under distributed environment

Publications (2)

Publication Number Publication Date
CN111144701A true CN111144701A (en) 2020-05-12
CN111144701B CN111144701B (en) 2022-03-22

Family

ID=70517617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911225107.5A Active CN111144701B (en) 2019-12-04 2019-12-04 ETL job scheduling resource classification evaluation method under distributed environment

Country Status (1)

Country Link
CN (1) CN111144701B (en)

Also Published As

Publication number Publication date
CN111144701B (en) 2022-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant