WO2023246709A1 - Data processing method, apparatus, device and system - Google Patents

Data processing method, apparatus, device and system

Info

Publication number
WO2023246709A1
WO2023246709A1 · PCT/CN2023/101119 · CN2023101119W
Authority
WO
WIPO (PCT)
Prior art keywords
data
task
reduction
data processing
tasks
Application number
PCT/CN2023/101119
Other languages
English (en)
French (fr)
Inventor
徐华
包小明
朱策
孙宏伟
王兴隆
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023246709A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5011: Allocation of resources, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016: Allocation of resources, the resource being the memory
    • G06F 9/5027: Allocation of resources, the resource being a machine, e.g. CPUs, servers, terminals

Definitions

  • the present application relates to the field of data processing, and in particular, to a data processing method, device, equipment and system.
  • the control node in the computer cluster divides the job into multiple execution stages, and each execution stage includes map tasks and reduce tasks. After the computing nodes execute multiple mapping tasks, multiple reduction tasks are executed in parallel on the result data of the mapping tasks to improve job processing performance. If the amount of data in a single reduction task is too large, data overflow (spill) may occur because the computing node runs out of memory, reducing the processing performance of the reduction task. If the data volume of a single reduction task is too small, too many reduction tasks will be started, resulting in a lot of overhead. Therefore, how to set the number of reduction tasks to improve reduction task processing performance is an issue that urgently needs to be solved.
  • This application provides data processing methods, devices, equipment and systems, thereby improving reduction task processing performance by reasonably setting the number of reduction tasks.
  • a data processing method is provided for a data processing system that includes a control node and multiple computing nodes. Multiple second computing nodes among the multiple computing nodes execute data processing tasks in parallel to obtain result data; the control node estimates the data volume of the result data and obtains memory information of the first computing node, among the multiple computing nodes, that performs the reduction task. Furthermore, the control node determines the number of reduction tasks based on the data volume and the memory information, and each second computing node partitions the result data generated by executing the data processing task according to that number, each partition corresponding to one reduction task; the first computing node performs reduction processing on the partitioned data of the multiple second computing nodes.
  • compared with pre-configuring the number of reduction tasks, which can make the data volume of a reduction task too large or too small, makes the number hard to adjust, and lowers reduction processing performance, the solution provided by this application automatically adjusts and optimizes the number of tasks based on the parameters that affect reduction processing performance. That is, the number of reduction tasks is determined from the data volume of the result data generated after the data processing task is executed and the memory information of the computing nodes that execute the reduction tasks, so that the storage capacity of the memory of those nodes accommodates, as far as possible, the data volume of the reduction tasks. This avoids the out-of-memory data overflow that may occur when a single reduction task has too much data, and avoids the overhead of starting too many reduction tasks when a single reduction task has too little data. Therefore, before the reduction tasks are executed, their number is set flexibly and dynamically to improve reduction task processing performance.
  • the control node can use various methods to estimate the amount of result data after the current data processing task is executed.
  • control node can estimate the amount of result data after the current data processing task is executed based on historical data.
  • the control node estimating the data volume of the result data generated after the data processing task is executed includes: obtaining historical data generated when previously completed data processing tasks were executed, the historical data including the data volume of the result data generated by the completed data processing tasks; and estimating, based on the historical data, the data volume of the result data generated after the data processing task is executed.
  • relevant data of the data processing task is collected in real time to estimate the amount of result data after the current data processing task is executed.
  • the control node estimating the data volume of the result data generated after the data processing task is executed includes: within a period of time after the parallel execution of the data processing tasks begins, sampling the result data generated by the multiple second computing nodes executing the data processing tasks; and estimating, based on the sampled result data, the data volume of the result data generated after the data processing tasks are completed.
  • the period of time may refer to the time used to sample the result data generated by multiple second computing nodes executing data processing tasks.
  • in some embodiments, the result data generated by the multiple second computing nodes executing the data processing tasks is sampled while the tasks are being executed in parallel.
  • in other embodiments, the result data generated by the multiple second computing nodes executing the data processing tasks is sampled after the parallel execution has completed.
  • the control node estimating the data volume of the result data generated after the data processing task is executed includes: before the multiple second computing nodes execute the data processing tasks, sampling the data to be processed on the multiple second computing nodes and instructing them to process the sampled data; and estimating, based on the processing results of the sampled data, the data volume of the result data generated after the data processing tasks are executed.
  • in this way, the control node uses a small amount of data to estimate the overall data volume of the data processing task, reducing the resources consumed by the estimation.
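  • as a rough illustration of this sampling-based estimation, the sketch below scales the input-to-output ratio observed on a few sampled splits up to the full input; the function name, variable names and figures are illustrative assumptions, not an API or values from the patent.

```python
def estimate_total_output(sample_in_bytes, sample_out_bytes, total_input_bytes):
    """Scale the input->output expansion ratio observed on a small sample
    of splits up to the full input to predict the result-data volume."""
    if sum(sample_in_bytes) == 0:
        raise ValueError("empty sample")
    ratio = sum(sample_out_bytes) / sum(sample_in_bytes)  # output bytes per input byte
    return ratio * total_input_bytes

MB = 1 << 20
# Three sampled 64 MB splits produced 80, 96 and 88 MB of map output,
# so 100 GB of input is predicted to produce about 137.5 GB of result data.
estimated = estimate_total_output([64 * MB] * 3,
                                  [80 * MB, 96 * MB, 88 * MB],
                                  100 * 1024 * MB)
```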
  • the memory information is the memory size.
  • the control node determining the number of reduction tasks based on the data volume and the memory information includes: dividing the data volume by the memory size and then rounding up to obtain the number of reduction tasks.
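  • a minimal sketch of this rounding-up computation follows; the byte figures are illustrative assumptions, not values from the patent.

```python
import math

def reduce_task_count(estimated_result_bytes: int, reducer_memory_bytes: int) -> int:
    """Divide the estimated result-data volume by the memory size of a node
    that executes reduction tasks, then round up."""
    return math.ceil(estimated_result_bytes / reducer_memory_bytes)

GB = 1 << 30
# e.g. 1.2 TB of estimated map output and 4 GB of usable reducer memory
print(reduce_task_count(int(1.2 * 1024 * GB), 4 * GB))  # -> 308
```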
  • the number of first computing nodes is equal to the number of reduction tasks, and one first computing node executes one reduction task.
  • the number of first computing nodes is less than the number of reduction tasks, and one first computing node executes multiple reduction tasks.
  • the memory sizes of the first computing nodes are the same. Therefore, the control node determines the amount of data for each computing node to process the reduction task based on the memory size of the computing node, and tries to make the storage capacity of the memory of the computing node that performs the reduction task meet the amount of data for the reduction task.
  • a second aspect provides a control apparatus, which includes the modules for executing the method of the control node in the first aspect or any possible design of the first aspect.
  • a third aspect provides a control device, which includes at least one processor and a memory, the memory being used to store a set of computer instructions; when the processor, acting as the control node in the first aspect or any possible implementation of the first aspect, executes the set of computer instructions, it performs the operational steps of the method of the control node in the first aspect or any possible implementation of the first aspect.
  • a fourth aspect provides a data processing system, which includes a control node and multiple computing nodes; the control node is used to execute the method of the control node in the first aspect or any possible design of the first aspect, and the computing nodes are used to execute the method of the computing nodes in the first aspect or any possible design of the first aspect.
  • a fifth aspect provides a computer-readable storage medium including computer software instructions; when the computer software instructions run in a computing device, they cause the computing device to perform the operational steps of the method in the first aspect or any possible implementation of the first aspect.
  • a sixth aspect provides a computer program product; when the computer program product runs on a computer, it causes the computing device to perform the operational steps of the method described in the first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic architectural diagram of a data processing system provided by this application.
  • Figure 2 is a schematic flow chart of a data processing method provided by this application.
  • FIG. 3 is a schematic diagram of a data processing process provided by this application.
  • FIG. 4 is a schematic structural diagram of a control device provided by this application.
  • FIG. 5 is a schematic structural diagram of a control device provided by this application.
  • Big data is a collection of data that cannot be captured, managed and processed with conventional software tools within a certain time frame. Since the large amounts of data contained in big data are interrelated, data analysis methods, models or tools are used to analyze big data, mine the data relationships within it, and use those relationships to make predictions or decisions. For example, analyzing users' shopping-trend data and pushing items a user is likely to purchase improves the user's shopping experience. Big data is therefore characterized by large data volume, fast data growth, diverse data types and high utilization value. Because the data volume of big data jobs is very large, a single computing node cannot meet the computing needs, so the data is typically processed in a distributed manner. Big data jobs can also be called big data services.
  • MapReduce is a distributed programming model used to decompose big data jobs into map tasks and reduce tasks: multiple computing nodes execute the map tasks to obtain intermediate data, and reduce tasks are executed on the intermediate data. Intermediate data can also be called map data or shuffle data.
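  • to make the model concrete, here is a hedged, single-process toy illustration of the map/reduce decomposition (a word count; it is not code from the patent).

```python
from collections import defaultdict

def map_task(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def reduce_task(pairs):
    """Reduce: sum the counts for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

splits = ["apple pear apple", "pear pear banana"]
intermediate = [kv for s in splits for kv in map_task(s)]  # the shuffle data
print(reduce_task(intermediate))  # {'apple': 2, 'pear': 3, 'banana': 1}
```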
  • Shuffle refers to a stage of task processing in the MapReduce model, that is, the process in which data is processed and exchanged according to rules from the node where the map task is located to the node where the reduce task is located, which usually results in a large amount of network transmission.
  • Task parallelism is used to indicate the granularity at which big data jobs are divided. Given the resource limitations of a single computing node and the processing-time requirements of big data jobs, the number of reduction tasks in a task execution stage is determined according to the task parallelism; in each task execution stage, multiple computing nodes process multiple reduction tasks in parallel, improving reduction task processing performance.
  • the overflow (spill) mechanism means that, when data is being processed, if the memory does not have enough storage space for the data, part of the data overflows to disk for storage. Although this mechanism can effectively alleviate the problem of insufficient memory, disk access is slow, so the performance of data processing may be significantly reduced.
  • a computer cluster refers to a group of computers that are loosely or tightly connected to work together, usually used to perform large-scale jobs. Deploying a cluster usually improves overall performance through task parallelism and is more cost-effective than a single computer of comparable speed or availability. The computers are connected to each other through a network, and each computer runs its own operating system instance. In most cases, each computer uses the same hardware and the same operating system; in some cases, different operating systems can be used on different hardware.
  • this application provides a data processing method: multiple second computing nodes among multiple computing nodes execute data processing tasks in parallel to obtain result data; the control node estimates the data volume of the result data and obtains the memory information of the first computing node, among the multiple computing nodes, that performs the reduction task.
  • the control node then determines the number of reduction tasks based on the data volume and the memory information, and each second computing node partitions the result data generated by executing the data processing task according to that number, each partition corresponding to one reduction task; the first computing node performs reduction processing on the partitioned data of the multiple second computing nodes.
  • compared with pre-configuring the number of reduction tasks, the solution provided by this application automatically adjusts and optimizes the number of tasks based on the parameters that affect reduction processing performance, so that the storage capacity of the memory of the nodes executing the reduction tasks accommodates, as far as possible, the data volume of the reduction tasks. This avoids the out-of-memory data overflow that may occur when a single reduction task has too much data, and avoids the overhead of starting too many reduction tasks when a single reduction task has too little data. Therefore, before the reduction tasks are executed, their number is set flexibly and dynamically to improve reduction task processing performance.
  • FIG. 1 is a schematic architectural diagram of a data processing system provided by this application.
  • a data processing system can be an entity architecture that performs distributed processing of application data.
  • the data processing system 100 includes a control node 110 and a plurality of servers 120 connected to the control node 110 .
  • Multiple servers 120 may form a computer cluster.
  • Multiple servers 120 may be interconnected via network 121.
  • the network 121 may refer to an enterprise's internal network (such as: Local Area Network (LAN)) or the Internet.
  • each server 120 includes multiple processors or processor cores, and virtual machines or containers may also be deployed on the processors or processor cores.
  • the control node 110 can allocate tasks based on processors, processor cores, virtual machines or containers.
  • the processor or processor core runs a process or thread for executing a task.
  • a computing node described in this application corresponds to a processor core, a virtual machine or a container.
  • Computing nodes may refer to processes or threads.
  • One computing node is used to execute at least one task.
  • a control node is provided in the computing cluster, and the control node can also be called a manager.
  • the control node 110 may control computing resources allocated to jobs to be executed so that high-priority jobs can be executed preferentially.
  • the control node 110 can monitor the execution status of a job and change the resource allocation to the job according to the policy.
  • the control node 110 is specifically used to generate an execution plan for a job, that is, to split a job into multiple tasks that can be allocated to multiple computing resources for execution. The tasks can be divided into multiple execution stages; tasks in the same stage can be executed in parallel, and all tasks are scheduled to completion in parallel or serially. When all tasks end, the job is marked as completed.
  • computing nodes can perform distributed processing of jobs based on the MapReduce model.
  • the control node 110 indicates a second computing node that performs a map task and a first computing node that performs a reduce task.
  • Mapping tasks and reduction tasks can be set by developer users. For example, tasks include addition, subtraction, weighting, string splicing, intersection or union of data, and other operations.
  • the second computing node reads the sharded data.
  • the second computing node performs the mapping task on the fragmented data to obtain the intermediate data, and stores the intermediate data.
  • the second computing node stores the intermediate data into the storage space of the second computing node or the global memory pool.
  • the first computing node reads the intermediate data.
  • the first computing node that performs the reduction task reads the intermediate data from the storage space or global memory pool of the first computing node.
  • the first computing node performs a reduction task on the intermediate data, obtains the result of the reduction task, and stores the result of the reduction task.
  • the storage media of the computing nodes and the storage media of the storage nodes in the data processing system are uniformly addressed to form a global memory pool, and any node in the system can access the storage space in the global memory pool. This application does not limit the storage space for storing intermediate data.
  • control node 110 is also used to estimate the data volume of the result data generated after the data processing task is executed, and to obtain the memory information of the first computing node among the multiple computing nodes that performs the reduction task. Furthermore, the control node determines the number of reduction tasks based on the amount of data and memory information, and instructs the second computing node that performs the data processing task to partition the result data generated by the execution of the data processing task according to the number, and each partition corresponds to a reduction task; The first computing node performs reduction processing on the partitioned data of the plurality of second computing nodes. Partition is used to indicate the data obtained by dividing the result data generated by executing the data processing task according to the number of reduction tasks. The number of reduction tasks is equal to the number of partitions.
  • for example, the job is to classify a pile of fruit by type and count the number of each type. If the pile is large, it can be divided into M piles, M mapping tasks are started, and each mapping task counts the number of each type of fruit in one of the M piles. If a single reduction task is started to sum the counts of each fruit across the result data of the M mapping tasks, data overflow may occur because the computing node runs out of memory, or the processing time may be long due to the large amount of computation. If one reduction task is started for each type of fruit, for example, the pile includes 100 kinds of fruit and 100 reduction tasks are started, too many reduction tasks are started and a lot of overhead results.
  • the result data of M mapping tasks can be partitioned according to the type of fruit, N reduction tasks can be started, and each reduction task counts the number of at least one type of fruit. For example, if 100 kinds of fruits are divided into 5 groups, 5 reduction tasks will be started, and each reduction task will count the number of 20 kinds of fruits.
  • the types of fruits counted in different reduction tasks may be different. For example, if the result data of each of the M mapping tasks contains a large number of the same type of fruit (such as apples), a separate reduction task can be started to count the quantities for this type of fruit. Another reduction task can count the number of at least two kinds of fruits in the result data of each of the M mapping tasks.
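  • the grouping of keys into partitions can be implemented in several ways; a common choice, used here purely as an illustrative assumption since the patent only requires that each partition correspond to one reduction task, is hashing the key modulo the number of reduction tasks N.

```python
def partition_id(key: str, num_reduce_tasks: int) -> int:
    # Real engines use a stable hash; Python's built-in hash of strings is
    # salted per process, which is fine for a single-run illustration.
    return hash(key) % num_reduce_tasks

N = 5  # number of reduction tasks determined by the control node
map_output = [("apple", 12), ("pear", 7), ("banana", 3), ("apple", 5)]
partitions = {p: [] for p in range(N)}
for fruit, count in map_output:
    partitions[partition_id(fruit, N)].append((fruit, count))
# Reduction task p then sums the counts of every fruit routed to partition p.
```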
  • control node 110 instructs one computing node or multiple computing nodes to execute N reduction tasks.
  • if one computing node executes the N reduction tasks, it performs the N reduction tasks in sequence.
  • if multiple computing nodes execute the N reduction tasks, they execute the N reduction tasks in parallel.
  • the jobs here are usually large-scale jobs that require more computing resources to be processed in parallel. This application does not limit the nature and quantity of the jobs. Most tasks are executed concurrently or in parallel, while some tasks depend on data generated by other tasks. This application places no restrictions on the number of tasks or the data of tasks that can be executed in parallel.
  • jobs can be submitted to the control node 110 from any suitable source.
  • This application does not limit the location where assignments are submitted, nor does it limit the specific mechanism for users to submit assignments.
  • user 131 operates client 133 to submit job 132 to control node 110 .
  • the client 133 can be installed with a client program.
  • the client 133 runs the client program to display a user interface (UI).
  • the user 131 operates the user interface to access the distributed file system and distributed database to obtain data and instruct the processing of big data.
  • the client 133 may be a computer connected to the network 140, and may also be called a workstation. Different clients can share resources on the network (such as computing resources and storage resources).
  • client 133 is connected to control node 110 through network 140, which may be the Internet, or other networks. Therefore, users can submit jobs to the control node 110 from a remote location.
  • Control node 110 may obtain input data from a database.
  • the data processing system 100 can also provide parameter tuning services, that is, provide storage, reading and processing of data of completed data processing tasks, as well as a data volume estimation interface for the current task, etc.
  • the function of the parameter tuning service may be provided by a control node or a computing node in the data processing system 100 .
  • computing nodes that perform reduction tasks call these interfaces to store and read the data of completed data processing tasks, to estimate the data volume of the result data of the current data processing task, and so on, and then adjust the partitioning dynamically.
  • computing nodes provide parameter tuning services.
  • the control node 110 obtains the data volume of the result data generated after the data processing task is executed from the computing node, as well as the memory information of the computing node that executes the reduction task, and determines the number of reduction tasks.
  • the computing node that provides the parameter tuning service determines the number of reduction tasks based on the data volume of the result data generated after the data processing task is executed and the memory information of the computing node that performs the reduction task, and feeds the number of reduction tasks back to the control node 110, so the control node 110 does not need to determine the number of reduction tasks by itself.
  • the computing node that provides parameter tuning services can also determine the data volume of the processing results of multiple mapping tasks based on the data volume of historical mapping tasks and the data volume of the processing results of historical mapping tasks, and then feedback multiple data volumes to the control node 110.
  • the control node 110 determines the number of reduction tasks based on the data volume of the processing results of the mapping tasks and the memory information of the computing node that executes the reduction tasks.
  • data processing system 100 may also include a storage cluster.
  • the storage cluster contains at least two storage nodes 150.
  • a storage node 150 includes one or more controllers, network cards, and multiple hard disks.
  • Hard drives are used to store data. For example, store the processing results of a job to the hard disk.
  • when a computing node performs a mapping task or a reduction task, it reads the data to be processed from the hard disk.
  • the hard disk can be a magnetic disk or other type of storage medium, such as a solid state drive or a shingled magnetic recording hard drive.
  • Network cards are used to communicate with the computing nodes contained in a computer cluster.
  • the controller is used to write data to the hard disk or read data from the hard disk according to the read/write data request sent by the computing node. In the process of reading and writing data, the controller needs to convert the address carried in the read/write data request into an address that the hard disk can recognize.
  • storage clusters store and manage large amounts of data based on distributed file systems and distributed databases.
  • Figure 2 is a schematic flow chart of a data processing method provided by this application. It is assumed here that the control node instructs the first computing node to perform a reduction task, and instructs a plurality of second computing nodes to perform data processing tasks (eg, mapping tasks).
  • the control node and the computing node may be the control node 110 and the computing node in FIG. 1 . As shown in Figure 2, the method includes the following steps.
  • Step 210 The control node obtains the service request.
  • the client responds to user operations and sends service requests to the control node.
  • the control node can receive service requests sent by the client through the LAN or the Internet.
  • a business request may include business identification and business data.
  • the business identifier is used to uniquely indicate a business.
  • the business data may be data for distributed processing of big data by computing nodes or identification data indicating data for distributed processing of big data.
  • the user operation may refer to an operation in which the user operates the big data user interface to submit a big data job.
  • Big data operations include data analysis business, data query business and data modification business, etc.
  • big data operations refer to analyzing customers' personal data and purchasing behavior data to describe user portraits and classify customers, so that targeted products or preferential products can be recommended to specific customers, improve customer satisfaction, stabilize customer relationships, etc.
  • big data operations refer to analyzing the historical sales of products to predict future sales, discovering the reasons for sales decline or increase, and recommending constructive suggestions to increase sales.
  • the control node divides the business into multiple execution stages and determines the computing nodes that execute the tasks. Each stage includes mapping tasks and reduction tasks. An execution stage can execute multiple tasks in parallel.
  • the control node can instruct idle computing nodes in the system to perform the task; or, according to the computing requirements and delay requirements of the task, select computing nodes that meet those requirements from the system to perform the task. This application does not limit how the control node schedules the computing nodes that execute tasks.
  • the control node sends a control instruction to at least one computing node performing a task, instructing the computing node to perform the task on the data indicated by the service request. For example, the control node sends a control instruction to the second computing node to instruct the second computing node to perform the mapping task. After the control node determines the number of reduction tasks based on the data volume of the result data generated after the mapping task is executed and the memory information of the computing node that performs the reduction task, the control node sends a control instruction to the first computing node to instruct the first calculation Nodes perform reduction tasks.
  • Step 220 The control node estimates the data volume of the result data generated after the data processing task is executed, and obtains the memory information of the first computing node among the multiple computing nodes that performs the reduction task.
  • the control node can estimate the data volume of the result data of the data processing task in advance, so that the result data of the data processing task is recombined according to that data volume to obtain intermediate data, avoiding problems such as data overflow when reduction tasks are performed on the intermediate data and thereby improving the processing performance of reduction tasks.
  • control node can collect the data volume of the result data of the data processing task in real time.
  • for example, within a period of time after the multiple second computing nodes begin executing the data processing tasks in parallel, the control node samples the result data they generate, and estimates from the sampled result data the data volume of the result data generated after the data processing tasks are executed.
  • the period of time may refer to the time used to sample the result data generated by multiple second computing nodes executing data processing tasks.
  • the computing node scans the data volume of the processing result of the mapping task, and reports the data volume of the processing result of the mapping task to the control node 110 .
  • the control node can obtain the amount of data of the processing results of multiple mapping tasks.
  • after a computing node completes the execution of a mapping task, it scans a proportion of the mapping task's processing result, obtains the data volume of the scanned data, and reports that data volume to the control node 110, which uses it to estimate the data volume of the mapping task's full processing result.
  • the ratio can be preset based on experience.
  • after a computing node completes the execution of multiple mapping tasks, it scans the processing results of the multiple mapping tasks according to the proportion and reports the data volume of the scanned data to the control node 110, which uses it to estimate the data volume of the mapping tasks' processing results.
  • the control node can also estimate the data volume of the result data generated after the data processing task is executed based on sampled data to be processed. Specifically, before the multiple second computing nodes execute the data processing task, the data to be processed on the multiple second computing nodes is sampled, and the multiple second computing nodes are instructed to process the sampled data; the data volume of the result data generated after the data processing task is executed is then estimated from the processing results of the sampled data.
  • the control node can estimate the data volume of the result data of the current data processing task based on the data volume of the result data of the completed data processing task. That is, the control node obtains historical data generated when previously completed data processing tasks are executed, and estimates the amount of result data generated after the data processing tasks are executed based on the historical data. Historical data includes the amount of data resulting from completed data processing tasks.
  • control node 110 trains the neural network based on the data volume of the historical mapping task and the data volume of the processing result of the historical mapping task, so that the neural network has the function of estimating the data volume of the processing result of the mapping task based on the data of the mapping task.
  • the control node 110 may input the data of the mapping task into the neural network and output the data amount of the processing result of the mapping task.
  • the control node 110 establishes a fitting relationship based on the data volumes of historical mapping tasks and the data volumes of the processing results of those historical mapping tasks, so that the control node 110 can determine the data volumes of the processing results of multiple mapping tasks from the data volumes of the multiple mapping tasks and the fitting relationship, for example y = F(x) (formula (1)), where x represents the data volume of a mapping task and y represents the data volume of the processing result of the mapping task. This embodiment does not limit the expression form of F(x); for example, F(x) may take a linear form such as a*x + b, where a and b represent parameters obtained by training on the data volumes of historical mapping tasks and the data volumes of their processing results to establish the fitting relationship.
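  • a hedged sketch of one way such a fit could be obtained, assuming the linear form mentioned above (ordinary least squares on pairs of historical input and output volumes; the numbers are illustrative).

```python
def fit_linear(history):
    """Ordinary least squares for y = a*x + b, where x is a historical mapping
    task's input data volume and y the data volume of its processing result."""
    n = len(history)
    sx = sum(x for x, _ in history)
    sy = sum(y for _, y in history)
    sxx = sum(x * x for x, _ in history)
    sxy = sum(x * y for x, y in history)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

history = [(100, 130), (200, 250), (400, 510)]  # (input MB, output MB)
a, b = fit_linear(history)
predicted = a * 300 + b  # estimated output volume for a 300 MB mapping task
```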
  • Step 230 The control node determines the number of reduction tasks based on the amount of data and memory information.
  • the control node may receive memory information of the first computing node that performs the reduction task.
  • the memory information is the memory size.
  • the storage space of the computing node that executes the reduction task is used to store at least one of data required to execute the reduction task, data generated during task execution, and processing results from executing the reduction task. If the storage space of the computing node cannot meet the demand for data storage when executing the reduction task, data overflow may occur and the processing performance of the reduction task will be reduced.
  • multiple first computing nodes that perform reduction tasks may have the same memory size, and the control node may obtain the memory information of the first computing nodes and determine the number of reduction tasks based on the memory information. For example, the control node divides the data volume by the memory size and then rounds it (such as rounding up or rounding down) to obtain the number of reduction tasks, so that data overflow is avoided when the first computing nodes perform the reduction tasks on the divided partitions, thereby improving the processing performance of reduction tasks.
  • for example, the number of reduction tasks satisfies the following formula (2): P > S/M, where P represents the number of reduction tasks, S represents the estimated data volume of the result data generated after the data processing task is executed, and M represents the memory size of a computing node that performs the reduction task.
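  • as a hedged numerical illustration (the figures are assumptions, not values from the patent): if the estimated result-data volume is S = 1228.8 GB and each node that executes reduction tasks has M = 4 GB of memory, then S/M = 307.2, and the rounded-up value P = 308 satisfies formula (2).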
  • Step 240 After the multiple second computing nodes execute the data processing task in parallel, each second computing node partitions the result data generated by executing the data processing task according to the determined number, and each partition corresponds to one reduction task.
  • the control node instructs the second computing node that performs the data processing task to divide the result data of the data processing task according to the number of reduction tasks.
  • each second computing node partitions the result data generated by executing the data processing tasks according to the determined number of reduction tasks to obtain intermediate data.
  • Each partition corresponds to a reduction task.
  • the second computing node stores the intermediate data to the storage location indicated by the control indication.
  • the control indication includes the physical address of the storage space where the intermediate data is stored.
  • the physical address indicates any of: the storage space of the second computing node that performs the mapping task (such as a local storage medium or an extended local storage medium), the storage space of computing nodes in the computing cluster other than the node that performs the mapping task, the storage space of a storage node in the storage cluster, the storage space of the global memory pool, or the storage space of an extended global storage medium.
  • the control node sends a control instruction to the computing node that executes the mapping task, instructing the computing node that executes the mapping task to divide the processing results of the mapping task according to the number of reduction tasks, obtain intermediate data (ie, partitioned data), and store the intermediate data.
  • the intermediate data includes multiple data blocks, and the number of multiple data blocks represents the number of reduction tasks.
  • this application does not limit the order in which the control node determines the number of reduction tasks and the data processing tasks are executed.
  • for example, the number of reduction tasks can be determined first.
  • the control node may estimate the data amount of the result data of the current data processing task based on the data amount of the result data of the completed data processing task.
  • Step 250 The first computing node performs reduction processing on the partitioned data of the plurality of second computing nodes.
  • the control node instructs the first computing node to perform a reduction task on the intermediate data.
  • the first computing node obtains the intermediate data from the storage space of the second computing node or obtains the intermediate data from the global memory pool, and performs the reduction task on the intermediate data.
  • control node instructs a number of first computing nodes equal to the number of reduction tasks to perform the reduction tasks, that is, one first computing node performs one reduction task.
  • control node instructs the first computing nodes that are less than the number of the reduction tasks to perform the reduction tasks, that is, one first computing node performs multiple reduction tasks.
  • in this way, the data volume of the processing results of the mapping tasks is estimated, and the number of reduction tasks is automatically determined based on that data volume and the memory size of the computing nodes that execute the reduction tasks.
  • the computing nodes that execute the mapping tasks divide the processing results of the mapping tasks according to the number of reduction tasks to generate intermediate data. This avoids the out-of-memory data overflow, and the resulting loss of reduction processing performance, that may occur when a single reduction task has too much data, and avoids the problem of too many reduction tasks being started, with a large amount of overhead, when a single reduction task has too little data. Therefore, before the reduction tasks are executed, their number is set flexibly and dynamically to improve the processing performance of reduction tasks.
  • the control node determining the number of reduction tasks based on the estimated data volume of the result data generated after the data processing task is executed and the memory size of the computing nodes executing the reduction tasks can also be described as the control node determining the task parallelism based on those two quantities. Task parallelism is used to indicate how the result data produced by a data processing task is partitioned.
  • the changes can be directly seen through the job's user interaction interface or the job's running log, thereby determining whether the task parallelism has been dynamically adjusted.
  • for example, the number of reduction tasks in the original physical plan indicates that the processing results of the mapping tasks are divided into 200 data blocks, while the number of reduction tasks in the optimized physical plan indicates that the processing results of the mapping tasks are divided into 500 data blocks.
  • control node and the computing node include corresponding hardware structures and/or software modules that perform each function.
  • the units and method steps of each example described in conjunction with the embodiments disclosed in this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software driving the hardware depends on the specific application scenarios and design constraints of the technical solution.
  • Figure 4 is a schematic structural diagram of a possible control device provided by this embodiment. These control devices can be used to implement the functions of the control nodes in the above method embodiments, and therefore can also achieve the beneficial effects of the above method embodiments.
  • the The control device may be the control node 110 as shown in Figure 1, or may be a module (such as a chip) applied to the server.
  • the control device 400 includes a communication module 410 , a processing module 420 and a storage module 430 .
  • the control device 400 is used to implement the functions of the control node 110 in the method embodiment shown in FIG. 1 .
  • Communication module 410 used to obtain service requests.
  • the communication module 410 is used to perform step 210 in FIG. 2 .
  • the processing module 420 is used to estimate the data volume of the result data, obtain the memory information of the first computing node among the multiple computing nodes that performs the reduction task, and determine the number of reduction tasks based on the data volume and memory information. For example, the processing module 420 is used to perform step 220 and step 230 in FIG. 2 .
  • the storage module 430 is used to store the number of reduction tasks, memory size, data volume of result data generated after the data processing task is executed, historical data and intermediate data generated when the completed data processing task is executed, etc.
  • the processing module 420 is specifically configured to obtain historical data generated when a previously completed data processing task is executed, where the historical data includes the amount of result data generated by the completed data processing task; according to the Historical data estimates the amount of result data generated after the data processing task is executed.
  • in a possible implementation, the processing module 420 is specifically configured to sample, within a period of time after the multiple second computing nodes begin to execute the data processing task in parallel, the result data generated by the multiple second computing nodes executing the data processing task, and to estimate the data volume of the result data generated after the data processing task is executed based on the sampled result data.
  • in a possible implementation, the processing module 420 is specifically configured to sample, before the multiple second computing nodes execute the data processing task, the data to be processed in the multiple second computing nodes and instruct the multiple second computing nodes to process the sampled data; and to estimate the data volume of the result data generated after the data processing task is executed based on the processing results of the data to be processed.
  • the processing module 420 is specifically configured to divide the data amount by the memory size and then round up to obtain the number of the reduction tasks.
  • FIG. 5 is a schematic structural diagram of a control device 500 provided in this embodiment.
  • the control device 500 includes a processor 510, a bus 520, a memory 530, a communication interface 540, and a memory unit 550 (which may also be called a main memory unit).
  • the processor 510, the memory 530, the memory unit 550 and the communication interface 540 are connected through a bus 520.
  • the processor 510 can be a CPU, and the processor 510 can also be another general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor can be a microprocessor or any conventional processor, etc.
  • the processor can also be a graphics processing unit (GPU), a neural network processing unit (NPU), a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of this application.
  • the communication interface 540 is used to implement communication between the control device 500 and external devices or devices.
  • for example, the communication interface 540 is used to send a control instruction to instruct a computing node to perform the mapping task, or to instruct a computing node to partition the result data produced by performing the data processing task according to the number of reduction tasks.
  • the communication interface 540 is used to receive control instructions and report the data amount of the processing result of the mapping task to the control node 110 .
  • Bus 520 may include a path for transmitting information between the components described above, such as processor 510, memory unit 550, and storage 530.
  • the bus 520 may also include a power bus, a control bus, a status signal bus, etc.
  • the various buses are labeled bus 520 in the figure.
  • the bus 520 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, etc.
  • the bus 520 can be divided into an address bus, a data bus, a control bus, etc.
  • control device 500 may include multiple processors.
  • the processor may be a multi-CPU processor.
  • a processor here may refer to one or more devices, circuits, and/or computing units for processing data (eg, computer program instructions).
  • the processor 510 can estimate the data volume of the result data, obtain the memory information of the first computing node among the multiple computing nodes that performs the reduction task, and determine the number of reduction tasks based on the data volume and the memory information.
  • when the control device 500 is used to implement the functions of a computing node shown in Figure 1, the processor 510 can partition the result data generated by executing the data processing task according to the number of reduction tasks, and perform reduction processing on the partitioned data of the computing nodes.
  • FIG. 5 only takes the control device 500 including a processor 510 and a memory 530 as an example.
  • the processor 510 and the memory 530 are respectively used to indicate a type of device or equipment.
  • the quantity of each type of device or equipment can be determined based on business needs.
  • the memory unit 550 may correspond to the storage medium used to store information such as the number of reduction tasks and intermediate data in the above method embodiment.
  • Memory unit 550 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache.
  • by way of example rather than limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM) and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the memory 530 may correspond to the storage medium used to store computer instructions, storage policies and other information in the above method embodiments, for example, a magnetic disk, such as a mechanical hard disk or a solid state hard disk.
  • the above control device 500 may be a general device or a special device.
  • the control device 500 may be an edge device (eg, a box carrying a chip with processing capabilities) or the like.
  • the control device 500 may also be a server or other device with computing capabilities.
  • the control device 500 may correspond to the control device 400 in this embodiment, and may correspond to the corresponding subject executing any method according to Figure 2; the above-mentioned and other operations and/or functions of the modules in the control device 400 are respectively intended to implement the corresponding processes of the methods in Figure 2. For the sake of brevity, they are not described again here.
  • the method steps in this embodiment can be implemented by hardware or by a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (random access memory, RAM), flash memory, read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM) , PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), register, hard disk, mobile hard disk, CD-ROM or other well-known in the art any other form of storage media.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage media may be located in an ASIC.
  • the ASIC can be located in the control device.
  • the processor and the storage medium can also be present in the control device as discrete components.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user equipment, or other programmable device.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
  • for example, the computer program or instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center that integrates one or more available media.
  • the available media may be magnetic media, such as floppy disks, hard disks and magnetic tapes; optical media, such as digital video discs (DVDs); or semiconductor media, such as solid state drives (solid state drive, SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a data processing method, apparatus, device and system, relating to the field of data processing. In the data processing system, a control node estimates the data volume of the result data produced by multiple second computing nodes executing data processing tasks in parallel, and determines the number of reduction tasks according to the data volume and the memory information of the first computing nodes that execute the reduction tasks; each second computing node partitions the result data produced by executing the data processing tasks according to that number, each partition corresponding to one reduction task; the first computing nodes perform reduction processing on the partitioned data of the multiple second computing nodes. In this way, the storage capacity of the memory of the computing nodes executing the reduction tasks is made to accommodate, as far as possible, the data volume of the reduction tasks, avoiding the data volume of a single reduction task being too large or too small; thus, before the reduction tasks are executed, their number is set flexibly and dynamically, improving the processing performance of the reduction tasks.

Description

Data processing method, apparatus, device and system
This application claims priority to the Chinese patent application No. 202210731652.7, entitled "A data processing method", filed with the China National Intellectual Property Administration on June 25, 2022, and to the Chinese patent application No. 202211460871.2, entitled "Data processing method, apparatus, device and system", filed with the China National Intellectual Property Administration on November 17, 2022, both of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the field of data processing, and in particular to a data processing method, apparatus, device and system.
Background
Currently, a control node in a computer cluster divides a job into multiple execution stages, each of which includes map tasks and reduce tasks. After the computing nodes finish executing multiple map tasks, multiple reduce tasks are executed in parallel on the result data of the map tasks to improve job processing performance. If the data volume of a single reduce task is too large, data spill may occur because the computing node runs out of memory, degrading the processing performance of the reduce task. If the data volume of a single reduce task is too small, too many reduce tasks are started, incurring a large overhead. Therefore, how to set the number of reduce tasks to improve their processing performance is a problem that urgently needs to be solved.
Summary
This application provides a data processing method, apparatus, device and system, which improve reduce task processing performance by setting the number of reduce tasks reasonably.
In a first aspect, a data processing method is provided. A data processing system includes a control node and multiple computing nodes. Multiple second computing nodes among the computing nodes execute data processing tasks in parallel to obtain result data; the control node estimates the data volume of the result data and obtains the memory information of the first computing nodes, among the multiple computing nodes, that execute the reduction tasks. The control node then determines the number of reduction tasks according to the data volume and the memory information; each second computing node partitions the result data produced by executing the data processing tasks according to that number, each partition corresponding to one reduction task; the first computing nodes perform reduction processing on the partitioned data of the multiple second computing nodes.
Compared with pre-configuring the number of reduction tasks, which can make the data volume of a reduction task too large or too small, make the number hard to adjust, and lower reduction processing performance, the solution provided by this application automatically adjusts and optimizes the number of tasks based on the parameters that affect reduction processing performance. That is, the number of reduction tasks is determined from the data volume of the result data produced after the data processing tasks are executed and the memory information of the computing nodes that execute the reduction tasks, so that the storage capacity of the memory of those nodes accommodates, as far as possible, the data volume of the reduction tasks. This avoids, as far as possible, the out-of-memory data spill that may occur when a single reduction task has too much data, and avoids the overhead of starting too many reduction tasks when a single reduction task has too little data. Therefore, before the reduction tasks are executed, their number is set flexibly and dynamically, improving reduction task processing performance.
The control node can estimate the data volume of the result data of the current data processing task in several ways.
With reference to the first aspect, in one possible implementation, the control node may estimate, from historical data, the data volume of the result data produced after the current data processing task is executed.
The control node estimating the data volume of the result data produced after the data processing task is executed includes: obtaining historical data produced when previously completed data processing tasks were executed, the historical data including the data volume of the result data produced by the completed data processing tasks; and estimating, from the historical data, the data volume of the result data produced after the data processing task is executed.
Since most tasks run periodically, estimating the data volume of the current task's result data from the historical data volume both ensures the accuracy of the estimate and reduces the resources the estimation consumes.
With reference to the first aspect, in another possible implementation, data related to the data processing task is collected in real time to estimate the data volume of the result data produced after the current data processing task is executed.
The control node estimating the data volume of the result data produced after the data processing task is executed includes: within a period of time after the multiple second computing nodes start executing the data processing task in parallel, sampling the result data produced by the multiple second computing nodes executing the data processing task; and estimating, from the sampled result data, the data volume of the result data produced after the data processing task finishes. The period of time may refer to the time used to sample the result data produced by the multiple second computing nodes executing the data processing task. In some embodiments, the result data is sampled while the multiple second computing nodes are executing the data processing task in parallel. In other embodiments, the result data is sampled after the multiple second computing nodes have finished executing the data processing task in parallel.
With reference to the first aspect, in another possible implementation, the control node estimating the data volume of the result data produced after the data processing task is executed includes: before the multiple second computing nodes execute the data processing task, sampling the to-be-processed data on the multiple second computing nodes and instructing them to process the sampled to-be-processed data; and estimating, from the processing result of the to-be-processed data, the data volume of the result data produced after the data processing task is executed.
In this way, the control node estimates the overall data volume of the data processing task from a small amount of data, reducing the resources the estimation consumes.
With reference to the first aspect, in another possible implementation, the memory information is a memory size, and the control node determining the number of reduce tasks according to the data volume and the memory information includes: dividing the data volume by the memory size and rounding up to obtain the number of reduce tasks.
With reference to the first aspect, in another possible implementation, the number of first computing nodes equals the number of reduce tasks, and one first computing node executes one reduce task.
With reference to the first aspect, in another possible implementation, the number of first computing nodes is smaller than the number of reduce tasks, and one first computing node executes multiple reduce tasks.
With reference to the first aspect, in another possible implementation, the first computing nodes have the same memory size. The control node thus determines, from the memory size of the computing nodes, the data volume each node processes per reduce task, so that the memory capacity of the nodes executing the reduce tasks accommodates the data volume of the reduce tasks as far as possible.
In a second aspect, a control apparatus is provided. The apparatus includes modules for executing the method of the control node in the first aspect or any possible design of the first aspect.
In a third aspect, a control device is provided. The control device includes at least one processor and a memory, the memory storing a set of computer instructions; when the processor executes the set of computer instructions as the control node in the first aspect or any possible implementation of the first aspect, it performs the operational steps of the method of the control node in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a data processing system is provided. The data processing system includes a control node and multiple computing nodes; the control node is configured to execute the method of the control node in the first aspect or any possible design of the first aspect, and the computing nodes are configured to execute the method of the computing nodes in the first aspect or any possible design of the first aspect.
In a fifth aspect, a computer-readable storage medium is provided, including computer software instructions; when the computer software instructions run on a computing device, the computing device performs the operational steps of the method in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, a computer program product is provided; when the computer program product runs on a computer, the computing device performs the operational steps of the method in the first aspect or any possible implementation of the first aspect.
On the basis of the implementations provided in the above aspects, this application may be further combined to provide more implementations.
Brief Description of the Drawings
Figure 1 is a schematic architectural diagram of a data processing system provided by this application;
Figure 2 is a schematic flowchart of a data processing method provided by this application;
Figure 3 is a schematic diagram of a data processing procedure provided by this application;
Figure 4 is a schematic structural diagram of a control apparatus provided by this application;
Figure 5 is a schematic structural diagram of a control device provided by this application.
Detailed Description
For ease of understanding, the main terms involved in this application are explained first.
Big data is a collection of data that cannot be captured, managed, and processed with conventional software tools within a certain time range. Because the large amounts of data contained in big data are interrelated, analytical methods, models, or tools are used to analyze big data, mine the data relationships within it, and use those relationships for prediction or decision-making. For example, analyzing users' shopping-trend data makes it possible to push items users are likely to buy, improving the shopping experience. Big data is therefore characterized by large data volume, fast data growth, diverse data types, and high utility value. Because the data volume of a big-data job is very large, a single computing node cannot meet the computing demand, and the data is usually processed in a distributed manner. A big-data job may also be called a big-data service.
MapReduce is a distributed programming model used to decompose a big-data job into map tasks and reduce tasks; multiple computing nodes execute the map tasks to obtain intermediate data, and the reduce tasks are executed on the intermediate data. The intermediate data may also be called map data or shuffle data.
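For illustration only, the following is a minimal, single-process sketch of the map/shuffle/reduce flow described above; the word-count workload, the function names, and the fixed choice of two reduce tasks are illustrative assumptions, not part of this application.

    from collections import defaultdict

    def map_task(split):
        # Map: emit a (word, 1) pair for every word in one input split.
        return [(word, 1) for word in split.split()]

    def shuffle(map_outputs, num_reduce_tasks):
        # Shuffle: route every key to one of num_reduce_tasks partitions,
        # so that all values of the same key land in the same partition.
        partitions = [defaultdict(list) for _ in range(num_reduce_tasks)]
        for output in map_outputs:
            for key, value in output:
                partitions[hash(key) % num_reduce_tasks][key].append(value)
        return partitions

    def reduce_task(partition):
        # Reduce: aggregate all values collected for each key.
        return {key: sum(values) for key, values in partition.items()}

    splits = ["apple pear apple", "pear pear banana"]
    map_outputs = [map_task(s) for s in splits]            # map stage
    partitions = shuffle(map_outputs, num_reduce_tasks=2)  # shuffle stage
    results = [reduce_task(p) for p in partitions]         # reduce stage
    print(results)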
Shuffle refers to a stage of task processing in the MapReduce model, i.e., the process in which the data produced on the nodes running the map tasks is exchanged, according to rules, to the nodes running the reduce tasks; it usually generates a large amount of network traffic.
Task parallelism indicates the granularity into which a big-data job is divided. Given the resource limits of a single computing node and the required processing time of the big-data job, the number of reduce tasks in each execution stage is determined from the task parallelism; in each execution stage, multiple computing nodes process multiple reduce tasks in parallel, improving reduce-task processing performance.
The spill mechanism means that, during data processing, if memory does not have enough storage space to hold the data being processed, part of the data spills to disk for storage. Although this mechanism effectively relieves memory shortage, disk access is slow, so data-processing performance may drop sharply.
A computer cluster is a group of loosely or tightly connected computers that work together, typically used to execute large jobs. Deploying a cluster usually improves overall performance through task parallelism and is more cost-effective than a single computer of comparable speed or availability. The computers are interconnected through a network, and each runs its own operating-system instance. In most cases each computer uses the same hardware and the same operating system; in some cases, different operating systems may run on different hardware.
To solve the problem that reduce-task processing performance is low because the number of reduce tasks is set unreasonably, this application provides a data processing method: multiple second computing nodes among multiple computing nodes execute a data processing task in parallel to obtain result data; the control node estimates the data volume of the result data and obtains memory information of the first computing nodes, among the multiple computing nodes, that execute the reduce tasks. The control node then determines the number of reduce tasks according to the data volume and the memory information; each second computing node partitions the result data produced by executing the data processing task according to that number, each partition corresponding to one reduce task; and the first computing nodes perform reduction processing on the partitioned data of the multiple second computing nodes. Compared with preconfiguring the number of reduce tasks, which leads to reduce tasks whose data volume is too large or too small, makes the number difficult to adjust, and lowers processing performance, this solution automatically tunes the number of tasks based on the parameters that affect reduce-task processing performance: the number of reduce tasks is determined from the data volume of the result data produced by executing the data processing task and from the memory information of the computing nodes executing the reduce tasks, so that the memory capacity of those nodes accommodates the data volume of each reduce task as far as possible. This avoids the memory-shortage and data-spill problem that may arise when a single reduce task carries too much data, as well as the heavy overhead of launching too many reduce tasks when each carries too little data. The number of reduce tasks is thus set flexibly and dynamically before the reduce tasks are executed, improving their processing performance.
Figure 1 is a schematic architectural diagram of a data processing system provided by this application. The data processing system may be a physical architecture that processes application data in a distributed manner. As shown in Figure 1, the data processing system 100 includes a control node 110 and multiple servers 120 connected to the control node 110. The multiple servers 120 may form a computer cluster and may be interconnected through a network 121. The network 121 may be an enterprise internal network (e.g., a local area network (LAN)) or the Internet. Each server 120 contains multiple processors or processor cores, and virtual machines or containers may be deployed on the processors or processor cores. The control node 110 may assign tasks based on processors, processor cores, virtual machines, or containers. A processor or processor core runs processes or threads to execute tasks. A computing node in this application corresponds to a processor core, a virtual machine, or a container; a computing node may refer to a process or a thread, and one computing node executes at least one task.
In a computing cluster, multiple jobs wait to be executed. If too many computing resources (e.g., multiple computing nodes or one computing node) are allocated to a single job, the performance of other jobs may suffer. A control node, which may also be called a manager, is therefore provided in the computing cluster. For example, the control node 110 can control the computing resources allocated to jobs awaiting execution so that high-priority jobs are executed first. The control node 110 can monitor the execution state of a job and change its resource allocation according to a policy.
The control node 110 is specifically configured to generate an execution plan for a job, i.e., to split a job into multiple tasks that can be assigned to multiple computing resources for execution. The tasks can be divided into multiple execution stages, and tasks in the same stage can be executed in parallel. All tasks are completed, in parallel or serially, under scheduling, and the completion of all tasks marks the completion of a job. In some embodiments, the computing nodes may process the job in a distributed manner based on the MapReduce model.
The control node 110 designates the second computing nodes that execute the map tasks and the first computing nodes that execute the reduce tasks. The map tasks and reduce tasks may be set by a developer; for example, a task may include operations such as addition, subtraction, weighting, string concatenation, or taking the intersection or union of data sets. A second computing node reads split data, executes a map task on the split data to obtain intermediate data, and stores the intermediate data; for example, the second computing node stores the intermediate data in its own storage space or in a global memory pool. A first computing node reads the intermediate data; for example, a first computing node executing a reduce task reads the intermediate data from its own storage space or from the global memory pool. The first computing node executes the reduce task on the intermediate data, obtains the result of the reduce task, and stores it. The storage media of the computing nodes and storage nodes in the data processing system are uniformly addressed to form a global memory pool, and any node in the system can access the storage space in the global memory pool. This application does not limit the storage space in which the intermediate data is stored.
In this application, the control node 110 is further configured to estimate the data volume of the result data produced after the data processing task is executed and to obtain memory information of the first computing nodes, among the multiple computing nodes, that execute the reduce tasks. The control node then determines the number of reduce tasks according to the data volume and the memory information and instructs the second computing nodes executing the data processing task to partition the result data according to that number, each partition corresponding to one reduce task; the first computing nodes perform reduction processing on the partitioned data of the multiple second computing nodes. A partition refers to the data obtained by dividing the result data of the data processing task according to the number of reduce tasks; the number of reduce tasks equals the number of partitions.
For example, suppose the job is to classify a pile of fruit by kind and count the number of each kind. If the pile is large, it can be divided into M piles. M map tasks are launched, each counting the number of each kind of fruit in one of the M piles. If a single reduce task is launched to sum, across the result data of the M map tasks, the counts of every kind, data spill may occur because of insufficient memory on the computing node, or the processing time may be long because of the heavy computation. If one reduce task is launched per fruit kind — say the pile contains 100 kinds, so 100 reduce tasks are launched — too many reduce tasks are launched, incurring heavy overhead. The result data of the M map tasks can therefore be partitioned by fruit kind and N reduce tasks launched, each counting at least one kind. For instance, dividing the 100 kinds into 5 groups yields 5 reduce tasks, each counting 20 kinds. In some embodiments, different reduce tasks may count different kinds: if one kind (e.g., apples) is numerous in the result data of every one of the M map tasks, a dedicated reduce task can be launched to count that kind alone, while another reduce task counts at least two other kinds across the result data of the M map tasks. The control node 110 then instructs one or more computing nodes to execute the N reduce tasks; a single computing node executes the N reduce tasks sequentially, while multiple computing nodes execute them in parallel.
The jobs here are typically large jobs that require substantial computing resources for parallel processing; this application does not limit the nature or number of the jobs. Most tasks execute concurrently or in parallel, while some tasks depend on data produced by other tasks. This application limits neither the number of tasks nor the data of the tasks that can execute in parallel.
It should be noted that a job can be submitted to the control node 110 from any suitable source; this application limits neither where a job is submitted nor the specific mechanism by which a user submits it. In Figure 1, for example, user 131 operates client 133 to submit job 132 to the control node 110. The client 133 may have a client program installed; running the client program displays a user interface (UI) through which user 131 accesses a distributed file system and a distributed database to obtain data and to direct the processing of the big-data job's data. Client 133 may be a computer connected to network 140, also called a workstation. Different clients can share resources on the network (e.g., computing resources and storage resources). In this example, client 133 connects to the control node 110 through network 140, which may be the Internet or another network, so a user can submit jobs to the control node 110 from a remote location. The control node 110 can obtain input data from a database.
In some embodiments, the data processing system 100 may also provide a parameter-tuning service, i.e., storage, reading, and processing of the data of completed data processing tasks, as well as interfaces for estimating the data volume of the current task. The parameter-tuning service may be provided by a control node or a computing node in the data processing system 100. The computing nodes executing the reduce tasks call these interfaces to store and read the data of completed data processing tasks, to estimate the data volume of the result data of the current data processing task, and so on, and then adjust the partitioning dynamically.
For example, a computing node provides the parameter-tuning service. The control node 110 obtains from the computing node the data volume of the result data produced after the data processing task is executed and the memory information of the computing nodes executing the reduce tasks, and determines the number of reduce tasks. Alternatively, the computing node providing the parameter-tuning service determines the number of reduce tasks from the data volume of the result data produced after the data processing task is executed and from the memory information of the computing nodes executing the reduce tasks, and feeds the number of reduce tasks back to the control node 110, so the control node 110 does not have to determine it itself.
As another example, the computing node providing the parameter-tuning service may also determine the data volume of the processing results of multiple map tasks from the data volume of historical map tasks and of their processing results, and feed that data volume back to the control node 110; the control node 110 then determines the number of reduce tasks from the data volume of the processing results of the multiple map tasks and the memory information of the computing nodes executing the reduce tasks.
Optionally, the data processing system 100 may further include a storage cluster containing at least two storage nodes 150. A storage node 150 includes one or more controllers, a network interface card, and multiple hard disks. The hard disks store data; for example, the processing result of a job is stored on a hard disk, or a computing node reads to-be-processed data from a hard disk when executing a map or reduce task. A hard disk may be a magnetic disk or another type of storage medium, such as a solid-state drive or a shingled magnetic recording drive. The network interface card communicates with the computing nodes in the computer cluster. The controller writes data to or reads data from the hard disks according to read/write requests sent by the computing nodes; in doing so, it translates the addresses carried in the read/write requests into addresses the hard disks can recognize. In some embodiments, the storage cluster stores and manages large amounts of data based on a distributed file system and a distributed database.
The implementation of the data processing method provided by the embodiments of this application is described in detail below with reference to the accompanying drawings.
Figure 2 is a schematic flowchart of a data processing method provided by this application. Assume here that the control node instructs the first computing nodes to execute the reduce tasks and instructs the multiple second computing nodes to execute the data processing task (e.g., map tasks). The control node and computing nodes may be the control node 110 and the computing nodes in Figure 1. As shown in Figure 2, the method includes the following steps.
Step 210: The control node obtains a service request.
In response to a user operation, the client sends a service request to the control node; the control node may receive the service request sent by the client over a local area network or the Internet. The service request may include a service identifier and service data. The service identifier uniquely identifies a service. The service data may be the data on which the computing nodes perform distributed big-data processing, or identification data indicating that data.
The user operation may be an operation in which the user submits a big-data job through a big-data user interface. Big-data jobs include data-analysis services, data-query services, data-modification services, and so on. For example, a big-data job may analyze customers' personal data and purchasing-behavior data to build user profiles and classify customers, so that targeted or discounted products can be recommended to specific customers, improving customer satisfaction and strengthening customer relationships. As another example, a big-data job may analyze a product's historical sales to forecast future sales, identify the reasons sales fell or rose, and recommend constructive suggestions for increasing sales.
For example, the control node determines how the service is divided into multiple execution stages and which computing nodes execute the tasks. Each stage includes map tasks and reduce tasks, and one execution stage can execute multiple tasks in parallel. The control node may instruct idle computing nodes in the system to execute the tasks, or select, based on the computing and latency requirements of the tasks, computing nodes in the system that satisfy those requirements; this application does not limit how the control node schedules the computing nodes that execute the tasks.
In some embodiments, the control node sends a control instruction to at least one computing node executing a task, instructing the computing node to execute the task on the data indicated by the service request. For example, the control node sends a control instruction to the second computing nodes instructing them to execute the map tasks. After determining the number of reduce tasks from the data volume of the result data produced by the map tasks and the memory information of the computing nodes executing the reduce tasks, the control node sends a control instruction to the first computing nodes instructing them to execute the reduce tasks.
Step 220: The control node estimates the data volume of the result data produced after the data processing task is executed and obtains memory information of the first computing nodes, among the multiple computing nodes, that execute the reduce tasks.
If the data volume of the result data of the data processing task is large, then when a first computing node executes a reduce task on that result data, its storage space may not meet the storage demand during execution, which can cause data spill and degrade task processing performance. Therefore, before the first computing nodes store the result data of the data processing task, the control node can estimate the data volume of that result data in advance, so that the result data can be recombined according to the estimated volume to obtain the intermediate data, avoiding problems such as data spill when reduce tasks are executed on the intermediate data and thereby improving reduce-task processing performance.
Approach 1: The control node collects the data volume of the result data of the data processing task in real time.
In some embodiments, within a period of time after the multiple second computing nodes start executing the data processing task in parallel, the control node samples the result data produced by the multiple second computing nodes executing the data processing task and estimates, from the sampled result data, the data volume of the result data produced after the data processing task finishes. The period of time may refer to the time used to sample the result data produced by the multiple second computing nodes executing the data processing task.
For example, after finishing a map task, a computing node scans the data volume of the map task's processing result and reports it to the control node 110; the control node can thus obtain the data volume of the processing results of multiple map tasks.
As another example, after finishing a map task, a computing node scans a proportion of the map task's processing result, obtains the data volume of the scanned data, and reports the scanned data volume to the control node 110, which estimates the data volume of the map task's processing result from the scanned data volume. The proportion may be preset based on experience.
As another example, after finishing multiple map tasks, a computing node scans a proportion of the processing results of the multiple map tasks and reports the scanned data volume to the control node 110, which estimates the data volume of the map tasks' processing results from the scanned data volume.
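As an illustrative sketch of this proportional scanning (the sampling proportion, the sizes, and the function name below are assumptions for illustration, not part of this application), the full result-data volume can be estimated by scaling the scanned volume up by the preset proportion:

    def estimate_total_output(scanned_bytes, sample_fraction):
        # Scale the volume found in a scanned fraction of the map output
        # up to an estimate of the full result-data volume.
        if not 0 < sample_fraction <= 1:
            raise ValueError("sample_fraction must be in (0, 1]")
        return scanned_bytes / sample_fraction

    # E.g., scanning 5% of the map output found 512 MiB of result data,
    # so the full result data is estimated at roughly 10 GiB.
    estimated = estimate_total_output(scanned_bytes=512 * 2**20,
                                      sample_fraction=0.05)
    print(f"estimated result-data volume: {estimated / 2**30:.1f} GiB")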
In other embodiments, the control node estimates the data volume of the result data produced after the data processing task is executed from sampled input of the data processing task. Specifically, before the multiple second computing nodes execute the data processing task, the to-be-processed data on the multiple second computing nodes is sampled and they are instructed to process the sampled to-be-processed data; the data volume of the result data produced after the data processing task is executed is then estimated from the processing result of the to-be-processed data.
Approach 2: The control node can estimate the data volume of the result data of the current data processing task from the data volume of the result data of completed data processing tasks. That is, the control node obtains historical data produced when previously completed data processing tasks were executed and estimates, from the historical data, the data volume of the result data produced after the data processing task is executed. The historical data includes the data volume of the result data produced by the completed data processing tasks.
For example, the control node 110 trains a neural network on the data volumes of historical map tasks and of their processing results, so that the neural network can estimate the data volume of a map task's processing result from the map task's data. The control node 110 can feed the map task's data into the neural network and obtain the data volume of the map task's processing result as output.
As another example, the control node 110 builds a fitted relationship from the data volumes of historical map tasks and of their processing results, so that the control node 110 can determine the data volume of the processing results of multiple map tasks from the data volumes of the multiple map tasks and the fitted relationship. The fitted relationship satisfies the following formula (1).
y = F(x)    (1)
where x denotes the data volume of a map task and y the data volume of the map task's processing result. This embodiment does not limit the form of F(x). For example, F(x) = ax + b, or F(x) = ax² + b, where a and b are parameters that can be obtained by fitting the data volumes of historical map tasks and of their processing results.
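For the linear case F(x) = ax + b, the parameters a and b can be obtained by least-squares fitting over historical observations, for example as in the following sketch (the historical figures are illustrative assumptions, not measured data):

    import numpy as np

    # Historical observations: map-task input volumes and the volumes of
    # their processing results, in GiB (illustrative figures).
    x_hist = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    y_hist = np.array([0.6, 1.1, 2.3, 4.4, 9.1])

    # Least-squares fit of F(x) = a*x + b.
    a, b = np.polyfit(x_hist, y_hist, deg=1)

    def estimate_output(x):
        # Estimated result-data volume for a map task with input volume x.
        return a * x + b

    print(f"F(x) = {a:.3f}x + {b:.3f}; F(10) ≈ {estimate_output(10.0):.2f} GiB")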
Step 230: The control node determines the number of reduce tasks according to the data volume and the memory information.
The control node may receive the memory information of the first computing nodes that execute the reduce tasks; the memory information is a memory size. The storage space of a computing node executing a reduce task stores at least one of: the data needed to execute the reduce task, the data produced during task execution, and the processing result of the reduce task. If the computing node's storage space cannot meet the storage demand during execution of the reduce task, data spill may occur, degrading reduce-task processing performance.
The multiple first computing nodes executing the reduce tasks may have the same memory size, in which case the control node can obtain the memory information of a first computing node and determine the number of reduce tasks from that memory information. For example, the control node divides the data volume by the memory size and rounds (up or down) to obtain the number of reduce tasks, so that when the first computing nodes execute reduce tasks on the resulting partitions, data spill is avoided and reduce-task processing performance improves. The number of reduce tasks satisfies the following formula (2).
P > S/M    (2)
where P denotes the number of reduce tasks, S the estimated data volume of the result data produced after the data processing task is executed, and M the memory size of the computing nodes executing the reduce tasks.
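An illustrative calculation of formula (2) with ceiling rounding (the sizes are assumptions for illustration only):

    import math

    def reduce_task_count(estimated_volume_bytes, reducer_memory_bytes):
        # P = ceil(S / M): launch enough reduce tasks that each
        # partition fits within one reduce task's memory.
        return math.ceil(estimated_volume_bytes / reducer_memory_bytes)

    S = 10 * 2**30  # estimated result-data volume: 10 GiB
    M = 2 * 2**30   # memory size of each node executing reduce tasks: 2 GiB
    P = reduce_task_count(S, M)
    print(f"number of reduce tasks: {P}")  # number of reduce tasks: 5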
Step 240: After the multiple second computing nodes executing the data processing task execute it in parallel, each second computing node partitions the result data produced by executing the data processing task according to the number, each partition corresponding to one reduce task.
The control node instructs the second computing nodes executing the data processing task to divide the result data of the data processing task according to the number of reduce tasks. After the multiple second computing nodes execute the data processing task in parallel, each second computing node partitions the result data it produced according to the determined number of reduce tasks, obtaining intermediate data; each partition corresponds to one reduce task. The second computing node stores the intermediate data at the storage location indicated by the control instruction. For example, the control instruction includes the physical address of the storage space for the intermediate data, which may indicate any of: the storage space of the second computing node executing the map task (e.g., a local storage medium or an extended local storage medium), the storage space of computing nodes in the computing cluster other than the node executing the map task, the storage space of a storage node in the storage cluster, the storage space of the global memory pool, or the storage space of an extended global storage medium.
The control node sends a control instruction to the computing nodes executing the map tasks, instructing them to divide the map tasks' processing results according to the number of reduce tasks, obtain intermediate data (i.e., the partitioned data), and store the intermediate data. The intermediate data includes multiple data blocks, and the number of data blocks equals the number of reduce tasks.
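This application does not prescribe a specific partitioning rule; one common realization is hash partitioning of the map-output keys into as many data blocks as there are reduce tasks, sketched below for illustration under that assumption:

    def partition_map_output(map_output, num_reduce_tasks):
        # Split one map task's (key, value) results into as many data
        # blocks as there are reduce tasks; records whose keys hash to
        # the same value modulo the task count land in the same block.
        blocks = [[] for _ in range(num_reduce_tasks)]
        for key, value in map_output:
            blocks[hash(key) % num_reduce_tasks].append((key, value))
        return blocks

    map_output = [("apple", 3), ("pear", 5), ("banana", 2), ("apple", 1)]
    blocks = partition_map_output(map_output, num_reduce_tasks=5)
    # blocks[i] is the partition that reduce task i later fetches and reduces.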
It should be noted that this application does not limit the order in which the control node determines the number of reduce tasks and the data processing task is executed; the number of reduce tasks may be determined first. For example, the control node may estimate the data volume of the result data of the current data processing task from the data volume of the result data of completed data processing tasks.
Step 250: The first computing nodes perform reduction processing on the partitioned data of the multiple second computing nodes.
The control node instructs the first computing nodes to execute the reduce tasks on the intermediate data; a first computing node fetches the intermediate data from the storage space of the second computing nodes or from the global memory pool and executes the reduce task on the intermediate data.
In some embodiments, the control node instructs a number of first computing nodes equal to the number of reduce tasks to execute them, i.e., one first computing node executes one reduce task.
In other embodiments, the control node instructs fewer first computing nodes than the number of reduce tasks to execute them, i.e., one first computing node executes multiple reduce tasks.
Thus, after each map task finishes and before the map task's processing result is stored, the data volume of the processing result is estimated, and a reasonable number of reduce tasks is evaluated automatically from that data volume and the memory size of the computing nodes executing the reduce tasks. The computing nodes executing the map tasks divide the map results into intermediate data according to that number. This avoids the memory-shortage and data-spill problem that may arise when a single reduce task carries too much data, which would degrade reduce-task processing performance, as well as the heavy overhead of launching too many reduce tasks when each carries too little data. The number of reduce tasks is therefore set flexibly and dynamically before the reduce tasks are executed, improving their processing performance.
Optionally, the control node determining the number of reduce tasks from the estimated data volume of the result data produced after the data processing task is executed and from the memory size of the computing nodes executing the reduce tasks can equivalently be described as the control node determining the task parallelism from those quantities. The task parallelism indicates the result of partitioning the result data produced by the data processing task.
In another possible implementation, after the task parallelism is adjusted automatically, the change can be seen directly in the job's user interface or run log, making it possible to confirm whether the task parallelism was dynamically adjusted. For example, as shown in Figure 3, the number of reduce tasks in the original physical plan indicates that the map tasks' processing results are divided into 200 data blocks, while the number of reduce tasks in the optimized physical plan indicates that they are divided into 500 data blocks.
It can be understood that, to implement the functions in the above embodiments, the control node and the computing nodes include corresponding hardware structures and/or software modules for executing each function. Those skilled in the art will readily appreciate that, in combination with the units and method steps of the examples described in the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the particular application scenario and design constraints of the technical solution.
The data processing method provided by this embodiment has been described in detail above with reference to Figures 1 to 3; the control apparatus and control device provided by this embodiment are described below with reference to Figures 4 and 5.
Figure 4 is a schematic structural diagram of a possible control apparatus provided by this embodiment. These control apparatuses can implement the functions of the control node in the above method embodiments and can therefore also achieve the beneficial effects of those method embodiments. In this embodiment, the control apparatus may be the control node 110 shown in Figure 1, or a module (e.g., a chip) applied to a server.
As shown in Figure 4, the control apparatus 400 includes a communication module 410, a processing module 420, and a storage module 430. The control apparatus 400 implements the functions of the control node 110 in the method embodiment shown in Figure 1.
The communication module 410 is configured to obtain a service request. For example, the communication module 410 performs step 210 in Figure 2.
The processing module 420 is configured to estimate the data volume of the result data, obtain memory information of the first computing nodes, among the multiple computing nodes, that execute the reduce tasks, and determine the number of reduce tasks according to the data volume and the memory information. For example, the processing module 420 performs steps 220 and 230 in Figure 2.
The storage module 430 stores the number of reduce tasks, the memory size, the data volume of the result data produced after the data processing task is executed, the historical data produced when completed data processing tasks were executed, the intermediate data, and the like.
Optionally, the processing module 420 is specifically configured to obtain historical data produced when previously completed data processing tasks were executed, the historical data including the data volume of the result data produced by the completed data processing tasks, and to estimate, from the historical data, the data volume of the result data produced after the data processing task is executed.
Optionally, the processing module 420 is specifically configured to sample, within a period of time after the multiple second computing nodes start executing the data processing task in parallel, the result data produced by the multiple second computing nodes executing the data processing task, and to estimate, from the sampled result data, the data volume of the result data produced after the data processing task finishes.
Optionally, the processing module 420 is specifically configured to sample, before the multiple second computing nodes execute the data processing task, the to-be-processed data on the multiple second computing nodes, instruct the multiple second computing nodes to process the sampled to-be-processed data, and estimate, from the processing result of the to-be-processed data, the data volume of the result data produced after the data processing task is executed.
Optionally, the processing module 420 is specifically configured to divide the data volume by the memory size and round up to obtain the number of reduce tasks.
Figure 5 is a schematic structural diagram of a control device 500 provided by this embodiment. As shown, the control device 500 includes a processor 510, a bus 520, a memory 530, a communication interface 540, and a memory unit 550 (which may also be called a main-memory unit). The processor 510, the memory 530, the memory unit 550, and the communication interface 540 are connected through the bus 520.
It should be understood that, in this embodiment, the processor 510 may be a CPU, or another general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The processor may also be a graphics processing unit (GPU), a neural network processing unit (NPU), a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of this application's solution.
The communication interface 540 enables the control device 500 to communicate with external devices or components. In this embodiment, when the control device 500 implements the functions of the control node 110 shown in Figure 1, the communication interface 540 sends control instructions instructing the computing nodes to execute the map tasks or to partition the result data produced by executing the data processing task according to the number of reduce tasks. When the control device 500 implements the functions of a computing node shown in Figure 1, the communication interface 540 receives control instructions and reports to the control node 110 the data volume of the map tasks' processing results.
The bus 520 may include a path for transferring information among the above components (e.g., the processor 510, the memory unit 550, and the memory 530). Besides a data bus, the bus 520 may include a power bus, a control bus, a status-signal bus, and so on; for clarity, the various buses are all labeled bus 520 in the figure. The bus 520 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), and so on. The bus 520 can be divided into an address bus, a data bus, a control bus, and so on.
As an example, the control device 500 may include multiple processors. A processor may be a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or computing units for processing data (e.g., computer program instructions). In this embodiment, when the control device 500 implements the functions of the control node 110 shown in Figure 1, the processor 510 can estimate the data volume of the result data, obtain memory information of the first computing nodes, among the multiple computing nodes, that execute the reduce tasks, and determine the number of reduce tasks according to the data volume and the memory information. When the control device 500 implements the functions of a computing node shown in Figure 1, the processor 510 can partition the result data produced by executing the data processing task according to the number of reduce tasks and perform reduction processing on the computing nodes' partitioned data.
It is worth noting that Figure 5 takes a control device 500 with one processor 510 and one memory 530 only as an example; here, the processor 510 and the memory 530 each indicate a class of components or devices, and in specific embodiments the number of each type of component or device can be determined according to service requirements.
The memory unit 550 may correspond to the storage medium used in the above method embodiments to store information such as the number of reduce tasks and the intermediate data. The memory unit 550 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The memory 530 may correspond to the storage medium used in the above method embodiments to store information such as computer instructions and storage policies, for example, a disk such as a mechanical hard disk or a solid-state drive.
The above control device 500 may be a general-purpose device or a dedicated device. For example, the control device 500 may be an edge device (e.g., a box carrying a chip with processing capability). Optionally, the control device 500 may also be a server or another device with computing capability.
It should be understood that the control device 500 according to this embodiment may correspond to the control apparatus 400 in this embodiment and may correspond to the corresponding body executing any method in Figure 2; the above and other operations and/or functions of the modules in the control apparatus 400 respectively implement the corresponding flows of the methods in Figure 2 and, for brevity, are not repeated here.
The method steps in this embodiment can be implemented by hardware or by a processor executing software instructions. The software instructions can be composed of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium; the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may be located in the control device. The processor and the storage medium may also exist as discrete components in the control device.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the flows or functions described in the embodiments of this application are executed wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVDs)), or semiconductor media (e.g., solid-state drives (SSDs)). The above are only specific implementations of this application, but the protection scope of this application is not limited thereto; any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions shall all fall within the protection scope of this application. The protection scope of this application shall therefore be subject to the protection scope of the claims.

Claims (16)

  1. A data processing method, applied to a data processing system, wherein the data processing system comprises a control node and a plurality of computing nodes, and the method comprises:
    the control node estimating a data volume of result data produced after a data processing task is executed, and obtaining memory information of first computing nodes, among the plurality of computing nodes, that execute reduce tasks;
    the control node determining the number of the reduce tasks according to the data volume and the memory information;
    a plurality of second computing nodes, among the plurality of computing nodes, that execute the data processing task executing the data processing task in parallel, each second computing node partitioning, according to the number, the result data produced by executing the data processing task, each partition corresponding to one reduce task; and
    the first computing nodes performing reduction processing on the data partitioned by the plurality of second computing nodes.
  2. The method according to claim 1, wherein the control node estimating the data volume of the result data produced after the data processing task is executed comprises:
    obtaining historical data produced when previously completed data processing tasks were executed, wherein the historical data comprises the data volume of result data produced by the completed data processing tasks; and
    estimating, according to the historical data, the data volume of the result data produced after the data processing task is executed.
  3. The method according to claim 1, wherein the control node estimating the data volume of the result data produced after the data processing task is executed comprises:
    sampling, within a period of time after the plurality of second computing nodes start executing the data processing task in parallel, the result data produced by the plurality of second computing nodes executing the data processing task; and
    estimating, according to the sampled result data, the data volume of the result data produced after the data processing task is completed.
  4. The method according to claim 1, wherein the control node estimating the data volume of the result data produced after the data processing task is executed comprises:
    sampling, before the plurality of second computing nodes execute the data processing task, to-be-processed data on the plurality of second computing nodes, and instructing the plurality of second computing nodes to process the sampled to-be-processed data; and
    estimating, according to a processing result of the to-be-processed data, the data volume of the result data produced after the data processing task is executed.
  5. The method according to any one of claims 1 to 4, wherein the memory information is a memory size, and the control node determining the number of the reduce tasks according to the data volume and the memory information comprises:
    dividing the data volume by the memory size and rounding up to obtain the number of the reduce tasks.
  6. The method according to any one of claims 1 to 5, wherein the number of the first computing nodes equals the number of the reduce tasks, and one first computing node executes one reduce task.
  7. The method according to any one of claims 1 to 5, wherein the number of the first computing nodes is smaller than the number of the reduce tasks, and one first computing node executes a plurality of reduce tasks.
  8. A control apparatus, applied to a control node in a data processing system, wherein the data processing system further comprises a plurality of computing nodes, and the apparatus comprises:
    a processing module, configured to estimate a data volume of result data produced after a data processing task is executed and obtain memory information of first computing nodes, among the plurality of computing nodes, that execute reduce tasks;
    the processing module being further configured to determine the number of the reduce tasks according to the data volume and the memory information; and
    the processing module being further configured to instruct each second computing node to partition, according to the number, the result data produced by executing the data processing task, and to instruct the first computing nodes to perform reduction processing on the data partitioned by the plurality of second computing nodes, each partition corresponding to one reduce task.
  9. The apparatus according to claim 8, wherein, in estimating the data volume of the result data produced after the data processing task is executed, the processing module is specifically configured to:
    obtain historical data produced when previously completed data processing tasks were executed, wherein the historical data comprises the data volume of result data produced by the completed data processing tasks; and
    estimate, according to the historical data, the data volume of the result data produced after the data processing task is executed.
  10. The apparatus according to claim 8, wherein, in estimating the data volume of the result data produced after the data processing task is executed, the processing module is specifically configured to:
    sample, within a period of time after the plurality of second computing nodes start executing the data processing task in parallel, the result data produced by the plurality of second computing nodes executing the data processing task; and
    estimate, according to the sampled result data, the data volume of the result data produced after the data processing task is completed.
  11. The apparatus according to claim 8, wherein, in estimating the data volume of the result data produced after the data processing task is executed, the processing module is specifically configured to:
    sample, before the plurality of second computing nodes execute the data processing task, to-be-processed data on the plurality of second computing nodes, and instruct the plurality of second computing nodes to process the sampled to-be-processed data; and
    estimate, according to a processing result of the to-be-processed data, the data volume of the result data produced after the data processing task is executed.
  12. The apparatus according to any one of claims 8 to 11, wherein the memory information is a memory size, and, in determining the number of the reduce tasks according to the data volume and the memory information, the processing module is specifically configured to:
    divide the data volume by the memory size and round up to obtain the number of the reduce tasks.
  13. The apparatus according to any one of claims 8 to 12, wherein the number of the first computing nodes equals the number of the reduce tasks, and one first computing node executes one reduce task.
  14. The apparatus according to any one of claims 8 to 12, wherein the number of the first computing nodes is smaller than the number of the reduce tasks, and one first computing node executes a plurality of reduce tasks.
  15. A control device, wherein the control device comprises a memory and at least one processor, the memory being configured to store a set of computer instructions; when the processor executes the set of computer instructions, the operational steps of the control node in the method according to any one of claims 1 to 7 are performed.
  16. A data processing system, wherein the data processing system comprises a control node and a plurality of computing nodes; the control node is configured to perform the operational steps of the control node in the method according to any one of claims 1 to 7, and the computing nodes are configured to perform the operational steps of the computing nodes in the method according to any one of claims 1 to 7.
PCT/CN2023/101119 2022-06-25 2023-06-19 Data processing method, apparatus, device, and system WO2023246709A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210731652.7 2022-06-25
CN202210731652 2022-06-25
CN202211460871.2A CN117290062A (zh) 2022-06-25 2022-11-17 Data processing method, apparatus, device, and system
CN202211460871.2 2022-11-17

Publications (1)

Publication Number Publication Date
WO2023246709A1 true WO2023246709A1 (zh) 2023-12-28

Family

ID=89257714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101119 WO2023246709A1 (zh) 2022-06-25 2023-06-19 Data processing method, apparatus, device, and system

Country Status (2)

Country Link
CN (1) CN117290062A (zh)
WO (1) WO2023246709A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970520A (zh) * 2013-01-31 2014-08-06 International Business Machines Corporation Resource management method, apparatus, and architecture system in a MapReduce architecture
US20150365474A1 (en) * 2014-06-13 2015-12-17 Fujitsu Limited Computer-readable recording medium, task assignment method, and task assignment apparatus
CN111782385A (zh) * 2019-04-04 2020-10-16 EMC IP Holding Company LLC Method, electronic device, and computer program product for processing tasks
CN113342266A (zh) * 2021-05-20 2021-09-03 普赛微科技(杭州)有限公司 Distributed computing method and system based on non-volatile memory, and storage medium
CN114510325A (zh) * 2020-11-17 2022-05-17 Huawei Technologies Co., Ltd. Task scheduling method, apparatus, and system

Also Published As

Publication number Publication date
CN117290062A (zh) 2023-12-26

Similar Documents

Publication Publication Date Title
US11656911B2 (en) Systems, methods, and apparatuses for implementing a scheduler with preemptive termination of existing workloads to free resources for high priority items
US11425194B1 (en) Dynamically modifying a cluster of computing nodes used for distributed execution of a program
US10514951B2 (en) Systems, methods, and apparatuses for implementing a stateless, deterministic scheduler and work discovery system with interruption recovery
US11294726B2 (en) Systems, methods, and apparatuses for implementing a scalable scheduler with heterogeneous resource allocation of large competing workloads types using QoS
US9647955B2 (en) Systems, methods, and devices for dynamic resource monitoring and allocation in a cluster system
US8826277B2 (en) Cloud provisioning accelerator
US20200104230A1 (en) Methods, apparatuses, and systems for workflow run-time prediction in a distributed computing system
WO2010137455A1 (ja) 計算機システム、方法、およびプログラム
US11023148B2 (en) Predictive forecasting and data growth trend in cloud services
US20140195673A1 (en) DYNAMICALLY BALANCING EXECUTION RESOURCES TO MEET A BUDGET AND A QoS of PROJECTS
CN104050042A (zh) Etl作业的资源分配方法及装置
CN110737717B (zh) 一种数据库迁移方法及装置
CN115033340A (zh) 一种宿主机的选择方法及相关装置
EP3739449B1 (en) Prescriptive cloud computing resource sizing based on multi-stream data sources
CN112000460A (zh) 一种基于改进贝叶斯算法的服务扩缩容的方法及相关设备
US11675515B2 (en) Intelligent partitioning engine for cluster computing
WO2023193814A1 (zh) Data processing method, apparatus, device, and system for a converged system
WO2023246709A1 (zh) Data processing method, apparatus, device, and system
CN107493205B (zh) Method and apparatus for predicting capacity-expansion performance of a device cluster
He et al. Queuing-oriented job optimizing scheduling in cloud mapreduce
Zacheilas et al. A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads
US20210397485A1 (en) Distributed storage system and rebalancing processing method
CN117093335A (zh) Task scheduling method and apparatus for a distributed storage system
Ferikoglou Resource aware GPU scheduling in Kubernetes infrastructure
WO2024036940A1 (zh) Container management method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23826357

Country of ref document: EP

Kind code of ref document: A1