CN112130968A - Method and device for distributing data - Google Patents

Method and device for distributing data

Info

Publication number: CN112130968A
Application number: CN201910557427.4A
Authority: CN (China)
Prior art keywords: input data, cluster, sampling, value, processes
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 王红雁
Current Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to: CN201910557427.4A
Publication of: CN112130968A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for distributing data, and relates to the field of computer technology. One embodiment of the method comprises: sampling input data according to the nodes in a cluster and determining a sampling result; predicting the distribution characteristics of the input data from the sampling result; dividing the input data into regions according to the distribution characteristics; and matching the input data of each divided region with the processes in the cluster. This embodiment addresses the technical problem caused by imbalanced input data in a computing engine, so that the time taken to run a large amount of input data on different nodes of the cluster is more balanced, thereby improving the running efficiency of the computing engine.

Description

Method and device for distributing data
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for distributing data.
Background
When a computing engine processes data, some of the tasks executed in the same time period may consume significantly more execution time than the others. A task whose execution time is significantly higher is called a Straggler-type task. The prior art employs a speculative execution mechanism to reduce the occurrence of Straggler-type tasks.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
when a Straggler-type task is caused by non-uniform input data, the speculative execution mechanism cannot effectively solve the technical defect that such a task consumes a long time, which further increases the overall completion time of the executed tasks.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for distributing data, so that the time for a large amount of input data to run on different nodes in a cluster is more balanced, and the technical effect of improving the running efficiency of the computing engine is achieved.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of allocating data, including:
sampling input data according to nodes in the cluster, and determining a sampling result;
predicting the distribution characteristics of the input data according to the sampling result;
dividing the input data into regions according to the distribution characteristics;
and matching the input data of each divided region with the processes in the cluster.
Optionally, sampling the input data according to nodes in the cluster and determining a sampling result includes:
calculating the total record count of the cluster according to the resilient distributed dataset;
determining the number of samples to extract from each resilient distributed dataset region according to the total record count and the sampling proportion;
sampling the corresponding resilient distributed dataset region according to the number of samples to extract, and determining the key values of the samples extracted from the region and the record count of each key value;
and determining the set of tuples generated by combining each key value with its record count as the sampling result.
Optionally, predicting a distribution characteristic of the input data according to the sampling result includes:
constructing a distribution histogram of the input data according to the sampling result;
and predicting the distribution characteristics of the input data according to the distribution histogram.
Optionally, the number of the regions is an integer multiple of the number of processor cores in the cluster.
Optionally, dividing the input data into regions according to the distribution characteristics includes:
determining the upper limit of the number of sampling results contained in the area to be divided;
sorting the sampling results according to the record number;
judging whether the space occupied by the area with the maximum residual capacity in the cluster is larger than the space occupied by the sampling result corresponding to the maximum record number;
if so, distributing the sampling result corresponding to the maximum record number to the area with the maximum residual capacity, and modifying the residual capacity of the area;
if not, determining a first storage space occupied by the area with the maximum residual capacity; determining sub-key value pairs of sampling results corresponding to the maximum record number with the same size as the first storage space; assigning the sub-key-value pair to an area where the remaining capacity is the largest; and matching the sub-key value pairs which are not allocated in the sampling result corresponding to the maximum record number with other areas.
Optionally, before matching the input data of each divided region with the processes in the cluster, the method includes:
initializing the weight, computing capability value and monitoring count of the process corresponding to each node in the cluster;
for a process, if both the CPU utilization and the memory utilization reach the upper limit, incrementing the computing capability value and the monitoring count; if the CPU utilization and/or the memory utilization does not reach the upper limit, incrementing only the monitoring count;
when the monitoring count reaches a preset adjustment period, comparing the computing capability value with a preset first threshold and a preset second threshold; when the computing capability value is larger than the preset first threshold, increasing the weight of the process; when the computing capability value is smaller than the preset second threshold, decreasing the weight of the process; when the computing capability value is smaller than the preset first threshold and larger than the second threshold, leaving the weight of the process unchanged;
wherein the preset first threshold is greater than the preset second threshold.
Optionally, before matching the input data of each divided region with the processes in the cluster, the method includes:
monitoring performance parameters of the corresponding processes of the node;
wherein the performance parameters include: CPU utilization rate and memory utilization rate.
Optionally, monitoring the CPU utilization of the process corresponding to the node includes:
determining the CPU utilization rate of the process corresponding to the node by using a negative feedback mechanism;
the calculation formula of the CPU utilization rate of the process corresponding to the node is as follows:
Figure BDA0002107255920000031
where CU_i(t_j) is the CPU utilization of the process at time t_j; CU'_i(t_j) is the detected value of the process at time t_j; and CU_i(t_{j-1}) is the CPU utilization of the process at the previous time t_{j-1}.
Optionally, the calculation formula for monitoring the memory utilization rate of the process corresponding to the node is as follows:
MU_i = (totalMem - remainingMem) / totalMem
where MU_i is the memory utilization of the process; remainingMem is the remaining memory of the process; and totalMem is the memory initially configured for the process.
Optionally, the calculation formula for initializing the weight of the process corresponding to the node in the cluster is as follows:
W_i = Speed_cpu × (1 - R_cpu) × (1 - R_mem)
where Speed_cpu is the CPU clock frequency of the node corresponding to the process; R_cpu is the average CPU utilization of the process corresponding to the node; R_mem is the memory utilization of the process corresponding to the node; and W_i is the initial weight of the process.
Optionally, matching the input data of each divided region with a process in the cluster, including:
calculating a performance factor of each process according to the weight of the processes in the cluster;
arranging the processes in the cluster according to the size of the performance factor;
judging whether the number of cores of the CPU available for the process with the maximum performance factor is larger than the number of cores required for inputting data after the area division is completed; and if so, distributing the input data of the divided areas to the corresponding processes, and updating the available core number of the distributed processes.
Optionally, the calculation formula for calculating the performance factor of each process is:
f_avg = (W_1 + W_2 + ... + W_num) / num
f_i = W_i / f_avg
where W_i is the weight of the process; num is the number of all processes running the input data in the cluster; f_avg is the average of the weights of all processes running the input data in the cluster; and f_i is the performance factor.
According to another aspect of the embodiments of the present invention, there is provided an apparatus for distributing data, including:
the sampling result determining module is used for sampling the input data according to the nodes in the cluster and determining the sampling result;
the distribution characteristic prediction module is used for predicting the distribution characteristics of the input data according to the sampling result;
the region dividing module is used for dividing the input data into regions according to the distribution characteristics;
and the data distribution module is used for matching the input data of each divided region with the processes in the cluster.
Optionally, sampling the input data according to nodes in the cluster and determining a sampling result includes:
calculating the total record count of the cluster according to the resilient distributed dataset;
determining the number of samples to extract from each resilient distributed dataset region according to the total record count and the sampling proportion;
sampling the corresponding resilient distributed dataset region according to the number of samples to extract, and determining the key values of the samples extracted from the region and the record count of each key value;
and determining the set of tuples generated by combining each key value with its record count as the sampling result.
Optionally, predicting a distribution characteristic of the input data according to the sampling result includes:
constructing a distribution histogram of the input data according to the sampling result;
and predicting the distribution characteristics of the input data according to the distribution histogram.
Optionally, the number of the regions is an integer multiple of the number of processor cores in the cluster.
Optionally, dividing the input data into regions according to the distribution characteristics includes:
determining the upper limit of the number of sampling results contained in the area to be divided;
sorting the sampling results according to the record number;
judging whether the space occupied by the area with the maximum residual capacity in the cluster is larger than the space occupied by the sampling result corresponding to the maximum record number;
if so, distributing the sampling result corresponding to the maximum record number to the area with the maximum residual capacity, and modifying the residual capacity of the area;
if not, determining a first storage space occupied by the area with the maximum residual capacity; determining sub-key value pairs of sampling results corresponding to the maximum record number with the same size as the first storage space; assigning the sub-key-value pair to an area where the remaining capacity is the largest; and matching the sub-key value pairs which are not allocated in the sampling result corresponding to the maximum record number with other areas.
Optionally, before matching the input data of each divided region with the processes in the cluster, the method includes:
initializing the weight, computing capability value and monitoring count of the process corresponding to each node in the cluster;
for a process, if both the CPU utilization and the memory utilization reach the upper limit, incrementing the computing capability value and the monitoring count; if the CPU utilization and/or the memory utilization does not reach the upper limit, incrementing only the monitoring count;
when the monitoring count reaches a preset adjustment period, comparing the computing capability value with a preset first threshold and a preset second threshold; when the computing capability value is larger than the preset first threshold, increasing the weight of the process; when the computing capability value is smaller than the preset second threshold, decreasing the weight of the process; when the computing capability value is smaller than the preset first threshold and larger than the second threshold, leaving the weight of the process unchanged;
wherein the preset first threshold is greater than the preset second threshold.
Optionally, before matching the input data of each divided region with the processes in the cluster, the method includes:
monitoring performance parameters of the corresponding processes of the node;
wherein the performance parameters include: CPU utilization rate and memory utilization rate.
Optionally, monitoring the CPU utilization of the process corresponding to the node includes:
determining the CPU utilization rate of the process corresponding to the node by using a negative feedback mechanism;
the calculation formula of the CPU utilization rate of the process corresponding to the node is as follows:
Figure BDA0002107255920000061
where CU_i(t_j) is the CPU utilization of the process at time t_j; CU'_i(t_j) is the detected value of the process at time t_j; and CU_i(t_{j-1}) is the CPU utilization of the process at the previous time t_{j-1}.
Optionally, the calculation formula for monitoring the memory utilization rate of the process corresponding to the node is as follows:
MU_i = (totalMem - remainingMem) / totalMem
where MU_i is the memory utilization of the process; remainingMem is the remaining memory of the process; and totalMem is the memory initially configured for the process.
Optionally, the calculation formula for initializing the weight of the process corresponding to the node in the cluster is as follows:
W_i = Speed_cpu × (1 - R_cpu) × (1 - R_mem)
where Speed_cpu is the CPU clock frequency of the node corresponding to the process; R_cpu is the average CPU utilization of the process corresponding to the node; R_mem is the memory utilization of the process corresponding to the node; and W_i is the initial weight of the process.
Optionally, matching the input data of each divided region with a process in the cluster, including:
calculating a performance factor of each process according to the weight of the processes in the cluster;
arranging the processes in the cluster according to the size of the performance factor;
judging whether the number of cores of the CPU available for the process with the maximum performance factor is larger than the number of cores required for inputting data after the area division is completed; and if so, distributing the input data of the divided areas to the corresponding processes, and updating the available core number of the distributed processes.
Optionally, the calculation formula for calculating the performance factor of each process is:
f_avg = (W_1 + W_2 + ... + W_num) / num
f_i = W_i / f_avg
where W_i is the weight of the process; num is the number of all processes running the input data in the cluster; f_avg is the average of the weights of all processes running the input data in the cluster; and f_i is the performance factor.
According to another aspect of the embodiments of the present invention, there is provided an electronic device for distributing data, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for distributing data provided by the present invention.
According to a further aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method of distributing data provided by the present invention.
One embodiment of the above invention has the following advantages or benefits:
aiming at the technical problem caused by unbalance of input data in the computer engine, the invention adopts the technical means of dividing the input data into areas in a balanced manner according to the distribution characteristics of the input data, thereby achieving the purpose that the time for running a large amount of input data on different nodes in a cluster is more balanced, and further achieving the technical effect of improving the running efficiency of the computer engine.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a method of distributing data according to an embodiment of the invention;
FIG. 2 is a flow chart of partitioning input data into regions according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of the assignment of input data to designated areas in accordance with an embodiment of the present invention;
FIG. 4 is a flow diagram of process weight adjustment in accordance with an embodiment of the present invention;
FIG. 5 is a flow diagram of assigning input data to corresponding processes in accordance with an embodiment of the present invention;
FIG. 6 is a block diagram illustrating an architecture of a cluster for distributing data according to an embodiment of the present invention;
FIG. 7 is a detailed flow diagram according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the main blocks of an apparatus for distributing data according to an embodiment of the present invention;
FIG. 9 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 10 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a method of distributing data according to an embodiment of the present invention, as shown in fig. 1,
step S101, sampling input data according to nodes in a cluster, and determining a sampling result;
step S102, predicting the distribution characteristics of input data according to the sampling result;
step S103, dividing the input data into regions according to the distribution characteristics;
and step S104, matching the input data of each divided area with the processes in the cluster.
During the execution of a computing engine (including but not limited to memory-based iterative computing frameworks), data imbalance problems may occur due to the imbalanced distribution of the input data and the default allocation algorithm. Tasks with significantly higher execution times may then appear in the computing engine; these are called Straggler-type tasks (redundant tasks).
When the tasks to be executed or the input data are unbalanced, the computing engine cannot avoid generating Straggler-type tasks, which in turn lengthens the time required to complete the tasks on the computing engine.
To address the technical problem caused by imbalanced input data in the computing engine, the invention divides the input data into regions in a balanced manner according to the distribution characteristics of the input data, so that the time taken to run a large amount of input data on different nodes in the cluster is more balanced, thereby achieving the technical effect of improving the running efficiency of the computing engine.
The technical defect of uneven input data is mainly caused by imbalance in the upstream data distribution: when data is distributed with the default partitioning function, neither the imbalanced distribution of the data nor the difference in computing capacity between nodes in the cluster is considered. For a heterogeneous cluster, the difference in computing power between the processes (Executors) corresponding to the nodes (Workers) in the cluster needs to be taken into account.
The invention can determine the difference of each node in the cluster by monitoring the parameters of the nodes. Therefore, before matching the input data of each divided region with the processes in the cluster, optionally, the method includes:
monitoring performance parameters of the corresponding processes of the node;
wherein the performance parameters include: CPU utilization rate and memory utilization rate.
The CPU utilization CU_i of the process corresponding to a node can be determined from two consecutive samples, and is calculated as:
CU_i = (total_process_2 - total_process_1) / (total_cpu_2 - total_cpu_1)
In the above formula, total_cpu_1 represents the CPU time consumed by all CPUs in the cluster at the 1st sampling, total_cpu_2 represents the CPU time consumed by all CPUs in the cluster at the 2nd sampling, total_process_1 represents the CPU time consumed by the specified process at the 1st sampling, and total_process_2 represents the CPU time consumed by the specified process at the 2nd sampling.
During the execution of the computing engine, the collected CPU utilization contains a certain error caused by network fluctuations and by tasks suddenly stopping or starting. In order to reduce this error, optionally, monitoring the performance parameters of the process corresponding to a node includes: determining the CPU utilization of the process corresponding to the node by using a negative feedback mechanism, that is, adding an adjustment based on the previous value to the current value. The specific negative feedback mechanism may adopt any of various existing negative feedback formulas; the negative feedback formula used in this embodiment is as follows:
Figure BDA0002107255920000111
where X(t-1) is the value calculated at the previous monitoring time; X'(t) is the monitored value at the current monitoring time; A is used to control the amplitude of the adjustment and may be set to a value close to X(t-1); and X(t) is the final result.
Applying the formula of the negative feedback mechanism to the monitoring of the CPU utilization rate of the process, wherein the calculation formula for determining the CPU utilization rate of the process is as follows:
Figure BDA0002107255920000112
where CU_i(t_j) is the CPU utilization of the process at time t_j; CU'_i(t_j) is the detected value of the process at time t_j; and CU_i(t_{j-1}) is the CPU utilization of the process at the previous time t_{j-1}.
From the above calculation formula of the CPU utilization it can be seen that, at the initial time, the CPU utilization CU_i(t_j) of the current process is its monitored value CU'_i(t_j); at each subsequent time, the CPU utilization CU_i(t_j) of the current process is determined jointly by the CPU utilization CU_i(t_{j-1}) at the previous time and the detected value CU'_i(t_j) at the current time. The value of A in the negative feedback formula may be set to CU'_i(t_j), and the detected value CU'_i(t_j) of the CPU utilization of the process at each time may be determined from historical records.
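To make the two monitoring steps above concrete, the following Python sketch first computes the two-sample utilization as the ratio of the process's CPU-time increment to the total CPU-time increment, and then applies a feedback correction. The exact negative feedback formula in the filing is an image that is not reproduced in this text, so the smoothing update shown here (the previous value corrected by a fraction of the difference from the detection) is only an assumed form consistent with the piecewise behaviour described above; the function names, the gain parameter and the example tick values are illustrative.

```python
def raw_cpu_utilization(total_cpu_1, total_process_1, total_cpu_2, total_process_2):
    """Two-sample CPU utilization: the process's share of the CPU time that
    elapsed between sample 1 and sample 2 (see the total_cpu_* and
    total_process_* quantities defined above)."""
    delta_cpu = total_cpu_2 - total_cpu_1
    if delta_cpu <= 0:                      # guard against an empty interval
        return 0.0
    return (total_process_2 - total_process_1) / delta_cpu


def smoothed_cpu_utilization(prev_value, detected_value, gain=0.5):
    """Feedback-style correction of the detected CPU utilization.

    prev_value     -- CU_i(t_{j-1}), the value computed at the previous moment
                      (None at the first monitoring moment)
    detected_value -- CU'_i(t_j), the raw detection at the current moment
    gain           -- amplitude of the adjustment (a stand-in for the role of A)
    """
    if prev_value is None:                  # first moment: use the detection directly
        return detected_value
    return prev_value + gain * (detected_value - prev_value)


# Example: the raw two-sample measurement says the process used 100 of the
# 400 CPU ticks that elapsed, and a transient spike in later detections is
# only partially followed by the smoothed value.
detected = raw_cpu_utilization(total_cpu_1=1000, total_process_1=300,
                               total_cpu_2=1400, total_process_2=400)
history = None
for value in [detected, 0.27, 0.90, 0.30]:
    history = smoothed_cpu_utilization(history, value)
    print(round(history, 3))
```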
In practical application, the remaining memory can be obtained through two hash tables in the block manager master endpoint class (BlockManagerMasterEndpoint): the block manager identifier table (blockManagerIdByExecutor, which caches the mapping between a process and the block manager identifier it owns) and the block manager information table (blockManagerInfo, which caches the mapping between a block manager identifier and its specific information). blockManagerIdByExecutor maintains the correspondence between a process and its block manager identifier (BlockManagerId), blockManagerInfo maintains the correspondence between a block manager identifier and the corresponding block manager information (BlockManagerInfo), and the block manager information stores the memory usage status of each block manager identifier. By following the chain between the two hash tables, the remaining memory (remainingMem) of each process can be obtained from blockManagerInfo, while the memory size allocated to the process is determined when the application is submitted.
Specifically, the calculation formula for monitoring the memory utilization rate of the process corresponding to the node is as follows:
MU_i = (totalMem - remainingMem) / totalMem
where MU_i is the memory utilization of the process; remainingMem is the remaining memory of the process; and totalMem is the memory initially configured for the process.
The initial weight of the process corresponding to a node in the cluster can be determined from the clock frequency of the node's CPU, the CPU utilization and the memory utilization. The initial value is related to the node's hardware attributes and to the magnitude of these utilization parameters: the higher the clock frequency of the node's CPU and the lower the CPU utilization and memory utilization, the larger the idle computing capacity of the corresponding process and the larger the weight assigned to it. Optionally, the calculation formula for initializing the weight of the process corresponding to a node in the cluster is as follows:
W_i = Speed_cpu × (1 - R_cpu) × (1 - R_mem)
where Speed_cpu is the CPU clock frequency of the node corresponding to the process; R_cpu is the average CPU utilization of the process corresponding to the node; R_mem is the memory utilization of the process corresponding to the node; and W_i is the initial weight of the process.
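As a minimal illustration of the memory utilization and initial weight formulas above, the following Python sketch uses illustrative units and numbers; the function names are not from the patent.

```python
def memory_utilization(remaining_mem, total_mem):
    """MU_i: share of the initially configured memory that is already in use."""
    return (total_mem - remaining_mem) / total_mem


def initial_weight(cpu_speed, avg_cpu_utilization, mem_utilization):
    """W_i = Speed_cpu * (1 - R_cpu) * (1 - R_mem).

    A faster CPU and lower CPU/memory utilization mean more idle computing
    power, hence a larger initial weight for the process.
    """
    return cpu_speed * (1.0 - avg_cpu_utilization) * (1.0 - mem_utilization)


# Example: a 2.4 GHz node whose process averages 30% CPU load and has used
# 4 GiB of its 16 GiB configured memory.
mu = memory_utilization(remaining_mem=12 * 1024, total_mem=16 * 1024)
print(mu)                             # 0.25
print(initial_weight(2.4, 0.30, mu))  # 2.4 * 0.7 * 0.75, approximately 1.26
```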
Optionally, sampling the input data according to nodes in the cluster, and determining a sampling result, including:
calculating the total record count of the cluster according to the Resilient Distributed Dataset (RDD); each region and its record count can be collected at the Driver end;
determining the number of samples to extract from each RDD region according to the total record count and the sampling proportion;
because a higher sampling proportion gives a more accurate estimate of the data distribution but also takes correspondingly longer, the sample size per region (sampleSizePerPartition) can be determined according to the number of regions and the record count of each region, which improves sampling efficiency while preserving accuracy;
sampling the corresponding RDD region according to the number of samples to extract, and determining the key values of the samples extracted from the region and the record count of each key value;
and determining the set of tuples generated by combining each key value with its record count as the sampling result.
The set of tuples can take the form {(K1, C1), (K2, C2), ..., (Ki, Ci)}, where Ki denotes a key value and Ci denotes the number of records corresponding to Ki.
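As a rough stand-in for this sampling step (which in practice would run over RDD regions and be collected at the Driver end), the following pure-Python sketch treats each region as a list of (key, value) records and produces the (key, count) tuples described above, plus the merged histogram used in the next step. The function names and the per-region sample-size rule are illustrative assumptions.

```python
import random
from collections import Counter

def sample_regions(regions, sample_fraction=0.1, seed=42):
    """Sample each region of key-value records and count the sampled keys.

    regions         -- list of regions, each a list of (key, value) records
    sample_fraction -- fraction of each region's records to extract

    Returns one Counter per region mapping sampled key -> record count,
    i.e. the tuples {(K1, C1), (K2, C2), ...} described above.
    """
    rng = random.Random(seed)
    results = []
    for region in regions:
        sample_size = max(1, int(len(region) * sample_fraction))
        sampled = rng.sample(region, min(sample_size, len(region)))
        results.append(Counter(key for key, _ in sampled))
    return results


def distribution_histogram(per_region_counts):
    """Merge the per-region samples into an overall key -> count histogram."""
    total = Counter()
    for counts in per_region_counts:
        total.update(counts)
    return total


# Example with one heavily skewed key "a".
regions = [[("a", 1)] * 80 + [("b", 1)] * 20, [("a", 1)] * 50 + [("c", 1)] * 50]
samples = sample_regions(regions, sample_fraction=0.2)
print(distribution_histogram(samples))   # key "a" clearly dominates the sample
```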
Optionally, predicting a distribution characteristic of the input data according to the sampling result includes:
according to the sampling result, a distribution histogram of input data is constructed to assist in analyzing the distribution of the input data; and then the distribution characteristics of the input data are predicted according to the distribution histogram.
When the number of regions is smaller than and/or not equal to an integer multiple of the number of pre-allocated CPU cores, the resources of the cluster may not be fully utilized. In order to make maximum use of the pre-allocated resources, the number of regions in the cluster may be set before the input data is partitioned according to the distribution characteristics. The number of regions is an integer multiple of the number of processor cores in the cluster, and the storage space of each region is approximately the same, i.e. partitionNum = λ × Core_app, where Core_app is the number of CPU cores allocated to the application, λ ≥ 1, and the specific multiple can be set according to the actual situation.
Fig. 2 is a flow chart of partitioning input data into regions according to an embodiment of the present invention. As shown in fig. 2, dividing the input data into regions according to the distribution characteristics includes:
S201: determining, according to the collected set of key-value tuples, the upper limit P_avg of the number of sampling results that a region to be divided can contain;
S202: sorting the sampling results (tuples) according to their record counts Ci;
S203: matching the region P with the largest remaining capacity in the cluster with the sampling result whose record count is the largest;
S204: judging whether the space available in the region with the largest remaining capacity in the cluster is larger than the space occupied by the sampling result corresponding to the largest record count;
S205: if so, allocating the sampling result corresponding to the largest record count to the region with the largest remaining capacity, and updating the remaining capacity of that region;
S206: if not, determining a first storage space of the region with the largest remaining capacity; determining the sub-key-value pairs of the sampling result corresponding to the largest record count that are of the same size as the first storage space; allocating these sub-key-value pairs to the region with the largest remaining capacity; and matching the sub-key-value pairs of that sampling result which have not yet been allocated with other regions.
In the process of dividing the input data into regions, each key-value pair can be traversed to determine the corresponding region.
FIG. 3 is a schematic illustration of input data being assigned to designated areas in accordance with an embodiment of the present invention.
As shown in fig. 3, if the remaining capacity of a region is less than the record count Ci, data of the size of the region's remaining capacity is allocated to that region, the allocated part of Ki is marked as Ki_1, and the region to which Ki_1 belongs is recorded; the remainder of Ki then continues to be allocated to the region with the largest remaining capacity. If the remaining capacity of the next region is still insufficient, the allocated keys are marked in turn as Ki_2, Ki_3, ..., Ki_n (1 ≤ n ≤ partitionNum) and their region attribution is recorded, until Ki has been completely allocated. In this way, (Ki, Ci) is converted into (Ki_1, Ci_1), (Ki_2, Ci_2), ..., (Ki_n, Ci_n), i.e. the original Ki is split into several new Ki_j (1 ≤ j ≤ n) according to the size of each Ci_j.
When a key appears in the set of sampling results, it can be allocated as shown in fig. 3; otherwise, a default hash algorithm can be used to assign the key to a corresponding region.
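The greedy division of figures 2 and 3 can be sketched in Python as follows, assuming that the per-region upper bound P_avg is the total sampled record count divided by the number of regions; the function name, the sub-key naming scheme (Ki_1, Ki_2, ...) and the overflow handling are illustrative.

```python
from math import ceil

def divide_into_regions(key_counts, num_regions):
    """Greedy, skew-aware division of sampled keys over num_regions regions.

    key_counts  -- dict key -> sampled record count
    num_regions -- partitionNum; per the text, an integer multiple of the
                   CPU cores allocated to the application

    Returns (assignment, remaining) where assignment maps each key or split
    sub-key such as "a_2" to a region index.
    """
    total = sum(key_counts.values())
    p_avg = ceil(total / num_regions)        # upper bound of records per region
    remaining = [p_avg] * num_regions        # remaining capacity of each region
    assignment = {}

    # Largest keys first, as in step S202.
    for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        piece = 0
        while count > 0:
            region = max(range(num_regions), key=lambda r: remaining[r])
            take = min(count, remaining[region])
            if take == 0:                    # all regions full: overflow to the emptiest
                take = count
            whole = piece == 0 and take == count
            name = key if whole else f"{key}_{piece + 1}"
            assignment[name] = region
            remaining[region] -= take
            count -= take
            piece += 1
    return assignment, remaining


# Example: the skewed key "a" is split into a_1, a_2, a_3 across regions.
print(divide_into_regions({"a": 90, "b": 20, "c": 10}, num_regions=4))
```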
The idle computing power of each process is dynamically changed along with the resource use condition, and the idle computing power of each process needs to be measured in the process of task execution so as to facilitate the decision making in the process of task allocation.
FIG. 4 is a flow chart of process weight adjustment according to an embodiment of the present invention.
As shown in fig. 4, before matching the input data of each divided region with the processes in the cluster, optionally, the method includes:
S401: initializing the weight of each process corresponding to a node in the cluster, its computing capability value Capability_i (S4011) and its monitoring count Count_i (S4012);
S402: calculating the CPU utilization (S4021) and the memory utilization (S4022) of the process;
S403: for a process, if its CPU utilization CU_i has reached the upper bound CU_upperbound and its memory utilization MU_i has also reached the upper bound MU_upperbound, incrementing the computing capability value and the monitoring count; if the CPU utilization and/or the memory utilization has not reached the upper bound, incrementing only the monitoring count;
S404: when the monitoring count reaches the preset adjustment period T, comparing the computing capability value with a preset first threshold αT and a preset second threshold βT; when the computing capability value is larger than the preset first threshold, increasing the weight of the process; when the computing capability value is smaller than the preset second threshold, decreasing the weight of the process; when the computing capability value is smaller than the preset first threshold and larger than the second threshold, leaving the weight of the process unchanged;
where the preset first threshold is greater than the preset second threshold, and α and β are adjustment factors. In practical applications, both α and β may be set to a fraction greater than 0 and less than 1; typically, α is set to 0.6-0.9 and β to 0.1-0.5.
For example, if α is set to 0.7 and β to 0.4, and 10 checks are performed in one period: if the process is found idle at least 8 times, more tasks should be assigned to it; if it is found idle at most 3 times, the weight of the process should be reduced, i.e., the input data assigned to the process is reduced. The values of α and β can be adjusted according to the actual operating conditions.
Each time a weight adjustment is completed, the computing capability value Capability_i and the monitoring count Count_i may be reset to 0 to facilitate subsequent weight adjustments.
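A compact Python sketch of one adjustment period of this feedback loop (fig. 4) follows; representing the monitoring results as a list of booleans and using a fixed weight step are illustrative simplifications.

```python
def adjust_weight(weight, capability_flags, alpha=0.7, beta=0.4, step=0.1):
    """One adjustment period of the process-weight feedback loop.

    capability_flags -- one boolean per monitoring moment in the period; True
                        marks a moment counted toward Capability_i, which the
                        worked example above treats as the process being idle.
    alpha, beta      -- adjustment factors; the thresholds are alpha*T and
                        beta*T, with T the length of the period (Count_i).
    step             -- how much the weight is raised or lowered (illustrative).
    """
    period = len(capability_flags)          # Count_i has reached the period T
    capability = sum(capability_flags)      # Capability_i at the end of the period
    if capability > alpha * period:
        return weight + step                # mostly idle: assign it more work
    if capability < beta * period:
        return max(0.0, weight - step)      # mostly busy: assign it less work
    return weight                           # otherwise leave the weight unchanged


# Example matching the text: idle in 8 of 10 checks -> weight increased;
# idle in only 3 of 10 checks -> weight decreased.
print(adjust_weight(1.0, [True] * 8 + [False] * 2))   # 1.1
print(adjust_weight(1.0, [True] * 3 + [False] * 7))   # 0.9
```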
Since the computing power of each process in the computing engine may differ, distributing data to processes in the prior-art manner may have the technical drawback of uneven computation.
The embodiment of the invention determines the computing efficiency of a process by calculating its performance factor, and can dynamically adjust the weight of the process according to the performance factor. The adjustment of the weights and the allocation of the input data may be performed asynchronously; optionally, the latest weight information of the processes is obtained at the time the input data is allocated. After the data-balancing region division, the amount of data that each region can accommodate is approximately the same. In order to reduce algorithm complexity while meeting the optimization goal as far as possible, a greedy strategy can be adopted for task allocation.
FIG. 5 is a flow diagram of assigning input data to corresponding processes in accordance with an embodiment of the present invention.
As shown in fig. 5, the specific steps of assigning the input data to the corresponding process are as follows:
S501: calculating the performance factor of each process according to the weight of the process corresponding to the node in the cluster. Specifically, the calculation formulas of the performance factor are as follows:
f_avg = (W_1 + W_2 + ... + W_num) / num
f_i = W_i / f_avg
where W_i is the weight of the process; num is the number of all processes running the input data in the cluster; f_avg is the average of the weights of all processes running the input data in the cluster; and f_i is the performance factor. The higher the performance factor f_i, the better the performance of the process and the higher its computing efficiency.
S502, arranging the processes in the cluster according to the size of the performance factor;
S503: judging whether the number of CPU cores available to the process with the largest performance factor is larger than the number of cores (1 core by default) required by the input data of a divided region; if so, allocating the input data of that region to the corresponding process and updating the number of available cores of that process, so that subsequent input data can be run on the process with the highest performance factor. If the task allocation requires multiple rounds of polling to complete, the above steps may be repeated.
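The following Python sketch illustrates this greedy assignment, assuming the performance factor of a process is its weight divided by the average weight (consistent with the definitions above); the executor names, core counts and weights are illustrative.

```python
def performance_factors(weights):
    """f_i for every process: its weight relative to the average weight f_avg."""
    f_avg = sum(weights.values()) / len(weights)
    return {proc: w / f_avg for proc, w in weights.items()}


def assign_regions(region_cores, free_cores, weights):
    """Greedy assignment of divided regions to processes (fig. 5).

    region_cores -- cores required by each region's input data (1 by default)
    free_cores   -- currently available CPU cores of each process
    weights      -- latest process weights, used to derive performance factors

    Each region goes to the process with the highest performance factor that
    still has enough free cores; that process's free core count is updated.
    """
    factors = performance_factors(weights)
    placement = {}
    for region, needed in region_cores.items():
        # Processes ordered by performance factor, best first.
        for proc in sorted(factors, key=factors.get, reverse=True):
            if free_cores[proc] >= needed:
                placement[region] = proc
                free_cores[proc] -= needed
                break
    return placement


# Example: the better executor is filled first, then the next one is used.
weights = {"executor-1": 1.2, "executor-2": 0.8}
cores = {"executor-1": 2, "executor-2": 2}
print(assign_regions({"r0": 1, "r1": 1, "r2": 1}, cores, weights))
```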
Fig. 6 is a schematic diagram illustrating an architecture of a cluster for distributing data according to an embodiment of the present invention.
As shown in fig. 6, there is one master node and multiple slave nodes in the cluster.
The master node samples and estimates the input data according to each slave node and determines the overall distribution of the input data. The master node may then repartition the input data into regions according to this distribution.
The process corresponding to each slave node is monitored, the weight of each process is adjusted, and finally the input data is reallocated.
Fig. 7 is a detailed flow diagram according to an embodiment of the invention.
As shown in fig. 7, the method comprises the following steps:
S1: monitoring the average CPU utilization and the memory utilization of each slave node, and initializing the weight information of each process (Executor) after the process is started;
S2: each slave node samples the input data according to the sampling proportion and then sends its local sampling information to the master node (Master);
S3: the master node collects the sampling information of all slave nodes, builds a histogram of the input data distribution according to the sampling proportion, and predicts the overall characteristics of the input data distribution;
S4: dividing the input data into a plurality of regions according to its distribution, where the number of regions is an integer multiple of the total number of cores of all processes, and a large key can be split during region division;
S5: calculating the performance factor of each process, and allocating the input data of each data region to the process with the highest performance factor according to a greedy strategy;
Throughout the whole procedure, the weights of the processes are dynamically adjusted according to load and resource utilization, and S5 may be repeated until the input data of every data region has been allocated.
Fig. 8 is a schematic diagram of main blocks of an apparatus for distributing data according to an embodiment of the present invention.
As shown in fig. 8, according to still another aspect of the embodiment of the present invention, there is provided an apparatus 800 for distributing data, including:
a sampling result determining module 801, configured to sample input data according to a node in a cluster, and determine a sampling result;
a distribution characteristic prediction module 802, configured to predict a distribution characteristic of the input data according to the sampling result;
a region dividing module 803, configured to divide a region of the input data according to the distribution characteristics;
and a data distribution module 804, configured to match the input data of each divided region with a process in the cluster.
Fig. 9 illustrates an exemplary system architecture 900 to which the method of allocating data or the apparatus for allocating data of embodiments of the present invention may be applied.
As shown in fig. 9, the system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905 (this architecture is merely an example, and the components included in a particular architecture may be adapted according to application specific situations). Network 904 is the medium used to provide communication links between terminal devices 901, 902, 903 and server 905. Network 904 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 901, 902, 903 to interact with a server 905 over a network 904 to receive or send messages and the like. The terminal devices 901, 902, 903 may have installed thereon various messenger client applications such as, for example only, a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, social platform software, etc.
The terminal devices 901, 902, 903 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 905 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 901, 902, 903. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the data distribution method provided by the embodiment of the present invention is generally executed by the server 905, and accordingly, the data distribution device is generally disposed in the server 905.
It should be understood that the number of terminal devices, networks, and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 10, a block diagram of a computer system 1000 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the computer system 1000 includes a central processing module (CPU)1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The driver 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is mounted into the storage section 1008 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program executes the above-described functions defined in the system of the present invention when executed by the central processing module (CPU) 1001.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not form a limitation on the modules themselves in some cases, and for example, the sending module may also be described as a "module sending a picture acquisition request to a connected server".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise:
sampling input data according to nodes in the cluster, and determining a sampling result;
predicting the distribution characteristics of the input data according to the sampling result;
dividing the input data into regions according to the distribution characteristics;
and matching the input data of each divided region with the processes in the cluster.
According to the technical scheme of the embodiment of the invention, the following beneficial effects can be achieved:
aiming at the technical problem caused by unbalance of input data in the computer engine, the invention adopts the technical means of dividing the input data into areas in a balanced manner according to the distribution characteristics of the input data, thereby achieving the purpose that the time for running a large amount of input data on different nodes in a cluster is more balanced, and further achieving the technical effect of improving the running efficiency of the computer engine.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of distributing data, comprising:
sampling input data according to nodes in the cluster, and determining a sampling result;
predicting the distribution characteristics of the input data according to the sampling result;
dividing the input data into regions according to the distribution characteristics;
and matching the input data of each divided region with the processes in the cluster.
2. The method of claim 1, wherein sampling the input data according to nodes in the cluster and determining a sampling result comprises:
calculating the total record count of the cluster according to the resilient distributed dataset;
determining the number of samples to extract from each resilient distributed dataset region according to the total record count and the sampling proportion;
sampling the corresponding resilient distributed dataset region according to the number of samples to extract, and determining the key values of the samples extracted from the region and the record count of each key value;
and determining the set of tuples generated by combining each key value with its record count as the sampling result.
3. The method of claim 1, wherein predicting the distribution characteristic of the input data based on the sampling result comprises:
constructing a distribution histogram of the input data according to the sampling result;
and predicting the distribution characteristics of the input data according to the distribution histogram.
4. The method of claim 1,
the number of the regions is an integral multiple of the number of the processor cores in the cluster.
5. The method of claim 2, wherein partitioning the input data into regions according to the distribution characteristics comprises:
determining an upper limit on the number of sampling results contained in a region to be divided;
sorting the sampling results by their record counts;
judging whether the remaining space of the region with the largest remaining capacity in the cluster is larger than the space occupied by the sampling result with the largest record count;
if so, allocating the sampling result with the largest record count to the region with the largest remaining capacity, and updating the remaining capacity of that region;
if not, determining a first storage space of the region with the largest remaining capacity; determining, within the sampling result with the largest record count, sub key-value pairs whose size equals the first storage space; assigning those sub key-value pairs to the region with the largest remaining capacity; and matching the sub key-value pairs of that sampling result which have not been allocated with other regions.
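The division of claim 5 amounts to a greedy, capacity-bounded placement with key splitting. The following sketch assumes a per-region capacity expressed in records; the names divide_into_regions and capacity are illustrative.

```python
def divide_into_regions(sampling_result, num_regions, capacity):
    """Greedily place (key, record count) pairs, largest first, into the region
    with the most remaining capacity; split a key's records when they do not fit."""
    regions = [{"entries": [], "remaining": capacity} for _ in range(num_regions)]
    # Sort sampling results by record count, largest first.
    for key, count in sorted(sampling_result, key=lambda kv: kv[1], reverse=True):
        while count > 0:
            target = max(regions, key=lambda r: r["remaining"])
            if target["remaining"] >= count:
                # The whole sampling result fits into the emptiest region.
                target["entries"].append((key, count))
                target["remaining"] -= count
                count = 0
            else:
                # Split off a sub key-value pair equal to the free space,
                # then continue placing the remainder in other regions.
                fit = target["remaining"]
                if fit == 0:
                    raise ValueError("total capacity exceeded")
                target["entries"].append((key, fit))
                target["remaining"] = 0
                count -= fit
    return regions
```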
6. The method of claim 5, wherein before matching the input data of each divided region with the processes in the cluster, the method comprises:
initializing the weight, computing capacity value and monitoring count of the process corresponding to each node in the cluster;
for a process, if both the CPU utilization and the memory utilization reach their upper limits, incrementing the computing capacity value and the monitoring count; if the CPU utilization and/or the memory utilization does not reach the upper limit, incrementing the monitoring count;
when the monitoring count reaches a preset adjustment period, comparing the computing capacity value with a preset first threshold and a preset second threshold; when the computing capacity value is larger than the preset first threshold, increasing the weight of the process; when the computing capacity value is smaller than the preset second threshold, decreasing the weight of the process;
wherein the preset first threshold is greater than the preset second threshold.
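A minimal sketch of the weight-adjustment procedure of claim 6 is given below; the concrete thresholds, weight step sizes and the 90% utilization upper limits are assumptions, not values taken from the specification.

```python
class ProcessWeight:
    """Tracks one process's weight, computing capacity value and monitoring count."""

    def __init__(self, weight=1.0, adjust_period=10,
                 first_threshold=8, second_threshold=2):
        self.weight = weight
        self.capacity_value = 0          # computing capacity value
        self.monitor_count = 0           # monitoring count
        self.adjust_period = adjust_period
        self.first_threshold = first_threshold    # must exceed second_threshold
        self.second_threshold = second_threshold

    def observe(self, cpu_util, mem_util, cpu_limit=0.9, mem_limit=0.9):
        if cpu_util >= cpu_limit and mem_util >= mem_limit:
            # Both CPU and memory at their upper limits: increment both counters.
            self.capacity_value += 1
            self.monitor_count += 1
        else:
            # Otherwise only the monitoring count is incremented.
            self.monitor_count += 1
        if self.monitor_count >= self.adjust_period:
            # Adjustment period reached: compare against the two thresholds.
            if self.capacity_value > self.first_threshold:
                self.weight += 0.1
            elif self.capacity_value < self.second_threshold:
                self.weight = max(0.1, self.weight - 0.1)
            self.capacity_value = 0
            self.monitor_count = 0
```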
7. The method of claim 6, wherein before matching the input data of each divided region with the processes in the cluster, the method comprises:
monitoring performance parameters of the processes corresponding to the nodes;
wherein the performance parameters include: CPU utilization and memory utilization.
8. The method of claim 7, wherein monitoring CPU utilization of the process corresponding to the node comprises:
determining CPU detection values of the process corresponding to the node at adjacent moments;
and determining the CPU utilization of the process corresponding to the node from the CPU detection values at the adjacent moments by using a negative feedback mechanism.
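Claim 8 leaves the negative feedback mechanism unspecified; one common realization is to damp each new CPU detection value against the previous estimate (exponential smoothing), which the following sketch assumes.

```python
def smoothed_cpu_utilization(detections, alpha=0.5):
    """Combine CPU detection values taken at adjacent moments so that a single
    spike is damped by feeding the deviation from the previous estimate back
    with a damping factor (exponential smoothing is an assumption here)."""
    estimate = detections[0]
    for value in detections[1:]:
        # The estimate moves toward the latest detection by a fraction alpha.
        estimate = estimate + alpha * (value - estimate)
    return estimate

# Example: adjacent detections 0.2, 0.9, 0.3 yield a damped utilization value.
print(smoothed_cpu_utilization([0.2, 0.9, 0.3]))
```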
9. The method of claim 6, wherein matching the input data of each divided region with processes in the cluster comprises:
calculating a performance factor for each process according to the weights of the processes in the cluster;
ordering the processes in the cluster by the magnitude of their performance factors;
judging whether the number of CPU cores available to the process with the largest performance factor is larger than the number of cores required by the input data of a divided region; and if so, allocating the input data of the divided region to the corresponding process, and updating the number of available cores of the allocated process.
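As a rough sketch of claim 9, the matching step can be expressed as follows; treating the process weight itself as the performance factor, and the dictionary field names, are assumptions for illustration.

```python
def match_regions_to_processes(regions, processes):
    """regions: list of (region_id, cores_required); processes: list of dicts
    with 'name', 'weight' and 'available_cores'."""
    for proc in processes:
        proc["performance_factor"] = proc["weight"]    # assumed definition
    assignment = {}
    for region_id, cores_required in regions:
        # Order processes by performance factor, best first.
        candidates = sorted(processes,
                            key=lambda p: p["performance_factor"], reverse=True)
        best = candidates[0]
        if best["available_cores"] > cores_required:
            # Allocate the region and update the process's available core count.
            assignment[region_id] = best["name"]
            best["available_cores"] -= cores_required
    return assignment

# Hypothetical usage.
procs = [{"name": "p1", "weight": 1.2, "available_cores": 8},
         {"name": "p2", "weight": 0.8, "available_cores": 4}]
print(match_regions_to_processes([("r0", 2), ("r1", 3)], procs))
```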
10. An apparatus for distributing data, comprising:
the sampling result determining module is used for sampling the input data according to the nodes in the cluster and determining the sampling result;
the distribution characteristic prediction module is used for predicting the distribution characteristics of the input data according to the sampling result;
the region dividing module is used for dividing the input data into regions according to the distribution characteristics;
and the data distribution module is used for matching the input data of each divided region with the processes in the cluster.
11. An electronic device for distributing data, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN201910557427.4A 2019-06-25 2019-06-25 Method and device for distributing data Pending CN112130968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910557427.4A CN112130968A (en) 2019-06-25 2019-06-25 Method and device for distributing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910557427.4A CN112130968A (en) 2019-06-25 2019-06-25 Method and device for distributing data

Publications (1)

Publication Number Publication Date
CN112130968A true CN112130968A (en) 2020-12-25

Family

ID=73849576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910557427.4A Pending CN112130968A (en) 2019-06-25 2019-06-25 Method and device for distributing data

Country Status (1)

Country Link
CN (1) CN112130968A (en)

Similar Documents

Publication Publication Date Title
Kumar et al. Deadline constrained based dynamic load balancing algorithm with elasticity in cloud environment
CN108089921B (en) Server for cloud big data operation architecture and operation resource optimization method thereof
US9021477B2 (en) Method for improving the performance of high performance computing applications on Cloud using integrated load balancing
US8131843B2 (en) Adaptive computing using probabilistic measurements
CN107273185B (en) Load balancing control method based on virtual machine
US8024737B2 (en) Method and a system that enables the calculation of resource requirements for a composite application
CN111786895A (en) Method and apparatus for dynamic global current limiting
Al-Dulaimy et al. Type-aware virtual machine management for energy efficient cloud data centers
KR20170029263A (en) Apparatus and method for load balancing
CN109196807B (en) Network node and method of operating a network node for resource distribution
Li An adaptive overload threshold selection process using Markov decision processes of virtual machine in cloud data center
Nie et al. Energy-aware multi-dimensional resource allocation algorithm in cloud data center
Li et al. PageRankVM: A pagerank based algorithm with anti-collocation constraints for virtual machine placement in cloud datacenters
Song et al. Server consolidation energy-saving algorithm based on resource reservation and resource allocation strategy
Shalu et al. Artificial neural network-based virtual machine allocation in cloud computing
CN112000460A (en) Service capacity expansion method based on improved Bayesian algorithm and related equipment
Surya et al. Prediction of resource contention in cloud using second order Markov model
Nehra et al. Efficient resource allocation and management by using load balanced multi-dimensional bin packing heuristic in cloud data centers
US10594620B1 (en) Bit vector analysis for resource placement in a distributed system
Li et al. Data allocation in scalable distributed database systems based on time series forecasting
CN112685167A (en) Resource using method, electronic device and computer program product
CN112130968A (en) Method and device for distributing data
Wang et al. Model-based scheduling for stream processing systems
Tutov Models and methods of resources allocation of infocommunication system in cloud data centers
Singhi et al. A load balancing approach for increasing the resource utilization by minimizing the number of active servers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination