CN114741165A - Processing method of data processing platform, computer equipment and storage device

Info

Publication number
CN114741165A
Authority
CN
China
Prior art keywords
task
execution node
node
data
executing
Prior art date
Legal status
Pending
Application number
CN202210199249.4A
Other languages
Chinese (zh)
Inventor
李康
张淑云
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202210199249.4A
Publication of CN114741165A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a processing method of a data processing platform, a computer device, and a storage device. The method comprises the following steps: receiving a task processing request for data to be processed; determining a management and control execution node that satisfies the task processing request; performing affinity scheduling, based on the management and control execution node, on a first task that executes the task processing request, so as to allocate a first execution node; executing the first task on the data to be processed by using the first execution node to obtain intermediate processing data; performing affinity scheduling, based on the management and control execution node, on a second task that executes the task processing request, so as to allocate a second execution node; and executing the second task on the intermediate processing data by using the second execution node to obtain a processing result. This scheme can improve task execution efficiency.

Description

Processing method of data processing platform, computer equipment and storage device
Technical Field
The present application relates to the technical field of data processing platforms, and in particular to a processing method for a data processing platform, a computer device, and a storage device.
Background
With the continuous development of information science, data has entered an era of massive growth, and an increasing number of distributed computing platforms are used for data processing and analysis.
Existing big data processing platforms provide a data stream processing model and can perform functions such as data processing, task scheduling, data caching, and data communication. In distributed data processing systems such as MapReduce and Spark, a large number of intermediate results must be transmitted over the network; when the amount of computation is large, this increases network transmission pressure and therefore reduces the platform's task processing efficiency.
Disclosure of Invention
The main technical problem addressed by this application is to provide a processing method for a data processing platform, a computer device, and a storage device that can improve task execution efficiency.
In order to solve the above problem, a first aspect of the present application provides a processing method for a data processing platform, the method comprising: receiving a task processing request for data to be processed; determining a management and control execution node that satisfies the task processing request; performing affinity scheduling, based on the management and control execution node, on a first task that executes the task processing request, so as to allocate a first execution node; executing the first task on the data to be processed by using the first execution node to obtain intermediate processing data; performing affinity scheduling, based on the management and control execution node, on a second task that executes the task processing request, so as to allocate a second execution node; and executing the second task on the intermediate processing data by using the second execution node to obtain a processing result.
In order to solve the above problem, a second aspect of the present application provides a computer device, which includes a memory and a processor coupled to each other, wherein the memory stores program data, and the processor is configured to execute the program data to implement any step of the processing method of the data processing platform.
In order to solve the above problem, a third aspect of the present application provides a storage device, which stores program data capable of being executed by a processor, the program data being used to implement any one of the steps of the processing method of the data processing platform.
According to the above scheme, a task processing request for data to be processed is received, and a management and control execution node that satisfies the task processing request is determined; affinity scheduling is performed, based on the management and control execution node, on a first task that executes the task processing request, so as to allocate a first execution node; the first task is executed on the data to be processed by the first execution node to obtain intermediate processing data; affinity scheduling is performed, based on the management and control execution node, on a second task that executes the task processing request, so as to allocate a second execution node; and the second task is executed on the intermediate processing data by the second execution node to obtain a processing result. When the management and control execution node satisfies both the first task and the second task, both tasks are scheduled on the management and control execution node, so that they are executed on the same node. This reduces the transmission of intermediate processing data over the network and the reading and writing of intermediate processing data on disk, and therefore improves task execution efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the present application, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without inventive effort. Wherein:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a processing method of the data processing platform of the present application;
FIG. 2 is a flowchart illustrating an embodiment of step S12 in FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of step S13 of FIG. 1;
FIG. 4 is a flowchart illustrating an embodiment of step S15 of FIG. 1;
FIG. 5 is a flowchart illustrating an embodiment of step S16 of FIG. 1;
FIG. 6 is a block diagram of an embodiment of a processing device of the data processing platform of the present application;
FIG. 7 is a schematic block diagram of an embodiment of a computer apparatus of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a storage device according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first" and "second" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Through long-term research, the inventors of the present application made the following observations, taking the data processing platform MapReduce as an example. MapReduce is a computing model, framework, and platform oriented to parallel processing of big data and is used for parallel operations on large-scale data sets (larger than 1 TB). It can be divided into two stages, Map (mapping) and Reduce (reduction): MapReduce abstracts a complex data processing computation into multiple groups of Map and Reduce steps, which are executed by multiple mapper and reducer functions respectively. Different mappers and reducers are distributed to different computing nodes, which enables efficient distributed computing on a large-scale computing cluster.
When MapReduce processes data, the intermediate processing data obtained after a Map task is executed is stored in the local storage space of the computing node that ran it. When a Reduce task is executed, the computing node at that stage copies the intermediate processing data of each Map task to its own local storage space over HTTP (HyperText Transfer Protocol) and then executes the Reduce task on the intermediate processing data. When the amount of processed data is large, the network bandwidth and disk I/O (input/output) performance of the environment therefore limit the data processing efficiency of the data processing platform.
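As a non-limiting illustration of this conventional flow, the following Python sketch simulates a toy word-count job in which each Map task spills its output to a node-local path and the Reduce side must fetch every Map output before reducing; the directory layout and helper names are invented for the example and do not come from any real MapReduce implementation.

    # Illustrative sketch only: a toy word count that mimics the conventional
    # MapReduce flow described above. Every Map output is written to a
    # node-local directory, and the Reduce side must fetch it before reducing;
    # in a real cluster that fetch is an HTTP copy whenever the Reduce node
    # differs from the Map node.
    import os
    import tempfile
    from collections import defaultdict

    def run_map(node_dir, split_id, text):
        """Map stage: emit (word, 1) pairs and spill them to node-local disk."""
        path = os.path.join(node_dir, "map-output-%d.txt" % split_id)
        with open(path, "w") as f:
            for word in text.split():
                f.write(word + "\t1\n")
        return path  # this path exists only on the node that ran the Map task

    def fetch(path):
        """Shuffle step: read one Map output (a network copy in a real cluster)."""
        with open(path) as f:
            return [(w, int(c)) for w, c in (line.split("\t") for line in f)]

    def run_reduce(all_pairs):
        """Reduce stage: aggregate the fetched (word, count) pairs."""
        counts = defaultdict(int)
        for pairs in all_pairs:
            for word, one in pairs:
                counts[word] += one
        return dict(counts)

    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as node_local:
            splits = ["big data platform", "data processing platform"]
            paths = [run_map(node_local, i, s) for i, s in enumerate(splits)]
            print(run_reduce([fetch(p) for p in paths]))

Even in this toy form, the cost structure is visible: every Map output is materialized on disk and must be read again, and across nodes transferred over the network, before the Reduce stage can start.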
In order to solve the above problems, the present application provides the following embodiments, each of which is described in detail below.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a processing method of a data processing platform according to an embodiment of the present application. The method may comprise the steps of:
s11: and receiving a task processing request of data to be processed.
A client can send a task processing request for data to be processed to the data processing platform, and the data processing platform receives the task processing request so that it can process the data to be processed.
In some embodiments, the data processing platform may include a programming model platform for data processing, or may be a parallel computing model, and the data processing performed by the platform may be divided into a first task and a second task. For example, platforms such as MapReduce and Spark may include a Map task and a Reduce task. The data processing platform of the present application may also be another data processing platform for parallel computing, which is not limited in this application.
In some embodiments, the task processing request may include multiple tasks for parallel computing, for example a first task that performs mapping processing and a second task that performs reduction calculation. For example, the first task is a Map task and the second task is a Reduce task. The first task and the second task of the present application may also be other parallel processing tasks, which is not limited in this application.
S12: and determining a control execution node meeting the task processing request.
After the data processing platform receives the task processing request sent by the client, it can determine a management and control execution node that satisfies the task processing request; the management and control execution node can control the operation of MapReduce.
S13: and performing affinity scheduling on a first task for executing the task processing request based on the control execution node so as to allocate the first execution node.
After the first task is started, a first task request may be initiated, and affinity scheduling may be performed on the first task executing the task processing request based on the management and control execution node, so that the first execution node for executing the first task may be allocated.
In some embodiments, affinity scheduling may be performed on the first task executing the task processing request based on the affinity information of the management and control execution node.
In some embodiments, if the management and control execution node satisfies the first task, the management and control execution node is used as the first execution node. Satisfying the first task means that the resources of the management and control execution node meet the computing resources required by the first task and that its affinity information meets the processing requirements of the first task; in that case the first task may be scheduled on the management and control execution node, that is, the management and control execution node is used as the first execution node.
In some embodiments, if the management and control execution node does not satisfy the first task, the first task may be scheduled on another node, and that other node may be used as the first execution node.
In some embodiments, if the management and control execution node includes a plurality of nodes and at least some of those nodes satisfy the first task, those nodes may be used as the first execution node; if the resources of those nodes do not satisfy the first task, they may be used together with other nodes as the first execution node.
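As a non-limiting illustration, the following sketch expresses the allocation decision described above in plain Python; the data model (nodes with free resources and labels, tasks with resource and affinity requirements) is assumed for the example, and the same decision applies to the second task in step S15 below.

    # Minimal sketch (assumed data model, not the platform's API) of the
    # allocation decision: prefer a management and control execution node whose
    # resources and affinity information satisfy the task, otherwise fall back
    # to other cluster nodes.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        free_cpu: float
        free_mem_gb: float
        labels: dict = field(default_factory=dict)

    @dataclass
    class Task:
        cpu: float
        mem_gb: float
        required_labels: dict = field(default_factory=dict)  # e.g. {"mr-job": "job-42"}

    def satisfies(node, task):
        resource_ok = node.free_cpu >= task.cpu and node.free_mem_gb >= task.mem_gb
        affinity_ok = all(node.labels.get(k) == v for k, v in task.required_labels.items())
        return resource_ok and affinity_ok

    def allocate(managed_nodes, other_nodes, task):
        """Return the node(s) that will execute the task."""
        preferred = [n for n in managed_nodes if satisfies(n, task)]
        if preferred:
            return preferred  # schedule on the management and control execution node(s)
        # the managed node(s) cannot host the task: use any other node that can
        return [n for n in other_nodes if satisfies(n, task)]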
S14: and executing a first task on the data to be processed by utilizing the first execution node to obtain intermediate processing data.
The first execution node executes the first task on the data to be processed; for example, if the first task is a Map task, the data to be processed is mapped to obtain intermediate processing data.
In some embodiments, the first task is executed by the first execution node to obtain intermediate processing data, and the intermediate processing data is stored under a local path of the first execution node. Here, the local path refers to a local path of the first execution node; for example, when the first execution node performs mapping processing on the data to be processed, the resulting intermediate processing data may be stored in the local storage space of the first execution node.
S15: and performing affinity scheduling on a second task executing the task processing request based on the control execution node so as to allocate the second execution node.
After the second task is started, a second task request may be initiated, and affinity scheduling may be performed on the second task executing the task processing request based on the managed execution node, so that a second execution node for executing the second task may be allocated.
In some embodiments, affinity scheduling may be performed on the second task executing the task processing request based on the affinity information of the management and control execution node.
In some embodiments, if the management and control execution node satisfies the second task, the management and control execution node is used as the second execution node. Satisfying the second task means that the resources of the management and control execution node meet the computing resources required by the second task, or that its affinity information meets the processing requirements of the second task; in that case the second task may be scheduled on the management and control execution node, that is, the management and control execution node is used as the second execution node.
In some embodiments, if the management and control execution node does not satisfy the second task, the second task may be scheduled on another node, and that other node is used as the second execution node.
In some embodiments, if the management and control execution node includes a plurality of nodes and at least some of those nodes satisfy the second task, those nodes may be used as the second execution node; if the resources of those nodes do not satisfy the second task, they may be used together with other nodes as the second execution node.
S16: and executing a second task on the intermediate processing data by using a second execution node to obtain a processing result.
The second execution node executes the second task on the intermediate processing data; for example, if the second task is a Reduce task, reduction calculation is performed on the intermediate processing data to obtain the processing result.
In some embodiments, the second execution node may obtain intermediate processing data required for executing the second task, where the intermediate processing data is obtained by the first execution node performing mapping processing on the data to be processed.
In some embodiments, when the management and control execution node satisfies both the first task and the second task, both tasks may be scheduled on the management and control execution node, so that the first task and the second task are executed on the same node; in this case the second execution node can directly read the intermediate processing data to execute the second task.
In this embodiment, a task processing request for data to be processed is received, and a management and control execution node that satisfies the task processing request is determined; affinity scheduling is performed, based on the management and control execution node, on a first task that executes the task processing request, so as to allocate a first execution node; the first task is executed on the data to be processed by the first execution node to obtain intermediate processing data; affinity scheduling is performed, based on the management and control execution node, on a second task that executes the task processing request, so as to allocate a second execution node; and the second task is executed on the intermediate processing data by the second execution node to obtain a processing result. When the management and control execution node satisfies both the first task and the second task, both tasks are scheduled on the management and control execution node, so that they are executed on the same node. This reduces the transmission of intermediate processing data over the network and the reading and writing of intermediate processing data on disk, and therefore improves task execution efficiency.
In some embodiments, referring to fig. 2, the step S12 of determining the managed execution node that satisfies the task processing request may include the following steps:
s121: and calling the management container corresponding to the task processing request by using the task manager, and setting the affinity information of the nodes for the management container.
In some embodiments, MapReduce may be run using Yarn on Kubernetes, where Yarn is the cluster resource management system of Hadoop and Kubernetes is a container orchestration framework, i.e., a container cluster management system.
The client may send a task processing request for data to be processed to the data processing platform, and after receiving the task processing request sent by the client, the data processing platform may call an interface of Kubernetes to pull up a task by using a node manager (NodeManager), that is, start a management container of a task manager (MRAppMaster). The task manager is a role in the MapReduce operation process and can be used for controlling the operation of MapReduce. The NodeManager, which is an agent for each machine (or node), may be used to manage the operation of tasks, monitor the resource usage of applications, and report to the MRAppMaster.
In some embodiments, node affinity information may be set for the management container, that is, the affinity information of the node is added to the management container.
S122: and calling the nodes meeting the preset conditions as the control execution nodes meeting the task processing request by utilizing the affinity information of the nodes.
Using the node affinity information, a node that meets the preset conditions can be taken as the management and control execution node that satisfies the task processing request. Since no container with this affinity information exists on Kubernetes at this time, the management container/task manager can be scheduled on any node that meets the preset conditions.
In some embodiments, the preset conditions include the same node satisfying the affinity scheduling condition and/or a different node satisfying the resource scheduling condition. For example, the task manager may be scheduled on a node that satisfies the preset conditions: if a node satisfies the resource scheduling condition, that is, its resources satisfy the task processing request, or if it satisfies affinity scheduling, that node may be used as a management and control execution node. The node satisfying the affinity scheduling condition may be a single node or multiple nodes, for example multiple different nodes satisfying the affinity scheduling condition or multiple different nodes satisfying resource scheduling, which is not limited by the present application.
In some embodiments, the task manager may be scheduled onto the management and control execution node, so that the management and control execution node launches the task manager/management container.
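As a non-limiting illustration, the following sketch shows how such a management container could be started as a Kubernetes pod that carries a job label which later task containers can target through affinity scheduling. The label keys, pod name, image, and the direct use of the Kubernetes Python client are assumptions made for the example; in the platform described here the container is pulled up through the node manager rather than by the job itself.

    # Sketch only: start the MRAppMaster management container as a pod and attach
    # a job label that later task pods can reference in their affinity terms.
    from kubernetes import client, config

    def launch_app_master(job_id, namespace="default"):
        config.load_kube_config()  # or config.load_incluster_config() inside the cluster
        pod = client.V1Pod(
            metadata=client.V1ObjectMeta(
                name="mr-appmaster-" + job_id,
                labels={"mr-job": job_id, "mr-role": "appmaster"},  # affinity anchor
            ),
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="appmaster",
                        image="example/mr-appmaster:latest",
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "1", "memory": "2Gi"},
                        ),
                    )
                ],
            ),
        )
        # No pod with this label exists yet, so the scheduler may place the
        # management container on any node that meets the preset conditions.
        client.CoreV1Api().create_namespaced_pod(namespace=namespace, body=pod)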
In some embodiments, referring to fig. 3, in step S13, if the management and control execution node satisfies the first task, the step of using the management and control execution node as the first execution node may include the following steps:
s131: a first task is initiated with the task manager to invoke the first task container and to set affinity information for the nodes for the first task container.
After the task manager is started, it may initiate a request to the node manager to run the first task, that is, the Map task in MapReduce. After receiving the request, the node manager may call a Kubernetes interface to pull up a first task container, which is the container of the Map task, and node affinity information may be added to the first task container.
S132: and if the management and control execution node corresponding to the task manager meets the first task, scheduling the first task to the management and control execution node so as to take the management and control execution node as the first execution node.
At this time, a container carrying the affinity information, namely the management container, is already running on Kubernetes. When the management and control execution node corresponding to the task manager satisfies the condition of the first task, that is, when the node's affinity information satisfies the affinity scheduling and the node's resources can execute the first task, or when at least part of the management and control execution node satisfies the first task, the management and control execution node satisfying the first task may be used as the first execution node and the first task is scheduled to it. In other words, the first task is scheduled on the node where the MRAppMaster is located, so that this node executes the first task. The node where the MRAppMaster is located may be a plurality of nodes, and the plurality of nodes may be all nodes satisfying affinity scheduling.
In some embodiments, if the management and control execution node corresponding to the task manager does not satisfy the first task, the first task may be further scheduled to another node, so that the other node is used as the first execution node to execute the first task.
In some embodiments, after the first execution node completes execution of the first task on the data to be processed, the obtained intermediate processing data may be output to a local storage space/disk of the first execution node, and meanwhile, the first execution node may further send a storage path of the intermediate processing data and node information to the task manager, where the node information is node information of the first execution node.
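As a non-limiting illustration, the following sketch builds the affinity information that a first task (Map task) container could carry so that Kubernetes co-locates it with the MRAppMaster pod of the same job. The label keys and the choice of a required (rather than preferred) pod affinity term are assumptions made for the example.

    # Sketch only: a Map-task pod whose pod affinity targets the MRAppMaster pod
    # of the same job at node granularity, so that both land on the same host
    # when that host has sufficient resources.
    from kubernetes import client

    def map_task_affinity(job_id):
        return client.V1Affinity(
            pod_affinity=client.V1PodAffinity(
                required_during_scheduling_ignored_during_execution=[
                    client.V1PodAffinityTerm(
                        label_selector=client.V1LabelSelector(
                            match_labels={"mr-job": job_id, "mr-role": "appmaster"},
                        ),
                        # same value of this key means "same node"
                        topology_key="kubernetes.io/hostname",
                    )
                ]
            )
        )

    def map_task_pod(job_id, split_id):
        return client.V1Pod(
            metadata=client.V1ObjectMeta(
                name="mr-maptask-%s-%d" % (job_id, split_id),
                labels={"mr-job": job_id, "mr-role": "maptask"},
            ),
            spec=client.V1PodSpec(
                restart_policy="Never",
                affinity=map_task_affinity(job_id),
                containers=[client.V1Container(name="maptask", image="example/mr-task:latest")],
            ),
        )

The second task container of step S151 below could carry the same affinity term, so that when the node hosting the MRAppMaster has enough capacity, the management container, the Map task, and the Reduce task all run on one host.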
In some embodiments, referring to fig. 4, in step S15, if the management and control execution node satisfies the second task, the step of using the management and control execution node as the second execution node may include the following steps:
s151: and initiating a second task by using the task manager to call the second task container and set the affinity information of the node for the second task container.
After the first task finishes running, the task manager can initiate a request to the node manager to run the second task, that is, the Reduce task in MapReduce. After receiving the request, the node manager can call a Kubernetes interface to pull up a second task container, which is the container of the Reduce task, and node affinity information can be added to the second task container.
S152: and if the control execution node corresponding to the task manager meets the second task, scheduling the second task to the control execution node so as to take the control execution node as the second execution node.
If the management and control execution node corresponding to the task manager satisfies the condition of the second task, that is, when the node's affinity information satisfies the affinity scheduling and the node's resources can execute the second task, or when at least part of the management and control execution node satisfies the second task, the management and control execution node satisfying the second task may be used as the second execution node and the second task is scheduled to it. In other words, the second task is scheduled on the node where the MRAppMaster is located, so that this node executes the second task. The node where the MRAppMaster is located may be a plurality of nodes, and the plurality of nodes may be all nodes satisfying affinity scheduling.
In some embodiments, if the management and control execution node corresponding to the task manager does not satisfy the second task, the second task may instead be scheduled to another node, so that that other node is used as the second execution node to execute the second task.
In some embodiments, since containers with the affinity information are already running on Kubernetes, namely the management container and the first task container, the second task is scheduled to the management and control execution node (which is then used as the second execution node) if the management and control execution node satisfies the second task, and the second task is scheduled to the first execution node (which is then used as the second execution node) if the first execution node satisfies the second task.
In some embodiments, after the second execution node starts the Reduce task, it may send its node information, which may be host information of the second execution node, to the MRAppMaster, and may also request from the MRAppMaster the storage path of the intermediate processing data produced by the corresponding Map task.
In this embodiment, by using the affinity information of Kubernetes nodes and scheduling the first task and the second task through affinity, network transmission of intermediate data can be avoided, which increases task execution speed and improves the processing efficiency of the data processing platform for tasks.
In some embodiments, referring to fig. 5, in step S16, executing the second task on the intermediate processing data by using the second execution node to obtain the processing result may include the following steps:
s161: and judging whether the first execution node and the second execution node are the same node.
After the first task is executed by the first execution node, the obtained intermediate processing data is stored under the local path of the first execution node, and the first execution node sends the storage path of the intermediate processing data and its node information to the task manager.
When the second task is started, the second execution node sends node information of the second execution node to the task manager, and acquires storage information of the intermediate processing data from the task manager, wherein the storage information comprises the node information of the first execution node storing the intermediate processing data and a storage path of the intermediate processing data, so as to acquire the intermediate processing data through the storage information.
In some embodiments, the task manager may judge whether the second execution node and the first execution node are the same node by comparing the node information of the second execution node with the storage information of the intermediate processing data. If the node information of the first execution node in the storage information is consistent with the node information of the second execution node, the second execution node is judged to be the same node as the first execution node; if they are inconsistent, the second execution node is judged to be a different node from the first execution node. Whether the first execution node and the second execution node are the same node may also be judged in other ways, which is not limited in this application.
If it is determined that the nodes are the same node, step S162 is performed.
If it is determined that the nodes are not the same node, step S163 is executed.
S162: and returning the local path of the intermediate processing data to the second execution node, so that the second execution node accesses the local path and executes a second task on the intermediate processing data to obtain a processing result.
Because the second execution node is the same node as the first execution node, the task manager can return the local path of the intermediate processing data to the second execution node, so that the second execution node can access the local path and read the intermediate processing data to execute the second task and obtain the processing result.
In some embodiments, after receiving the local path, the second execution node may read intermediate processing data stored in the local path of the second execution node, that is, read intermediate processing data stored in the local storage space, execute a second task on the intermediate processing data by using the second execution node, obtain a processing result, and output the processing result of the task processing request.
S163: and returning the network path of the intermediate processing data to the second execution node, so that the second execution node accesses the network path and executes a second task on the intermediate processing data to obtain a processing result.
Because the second execution node is not the same node as the first execution node, the task manager returns a network path of the intermediate processing data to the second execution node; the network path may include the storage information of the intermediate processing data, which comprises the node information of the first execution node storing the intermediate processing data and the local path under which the first execution node stores it.
In some embodiments, the network path may be an HTTP address for accessing the intermediate processing data; the HTTP address may indicate the node information of the first execution node storing the intermediate processing data and the local path under which that data is stored on the first execution node.
In some embodiments, the second execution node may obtain the intermediate processing data stored under the local path of the first execution node by using the network path, and store the intermediate processing data under the local path of the second execution node, that is, copy the intermediate processing data under the local path of the second execution node by accessing the network path.
The second execution node then reads the intermediate processing data stored under its local path and executes the second task on it, that is, after executing the Reduce task it obtains the processing result and outputs the processing result of the task processing request.
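As a non-limiting illustration, the following sketch shows the decision of steps S161 to S163 and the two read paths of the second execution node; the helper names, port, and URL format are invented for the example and do not correspond to the actual interface of the MRAppMaster.

    # Sketch only (invented helpers, not MRAppMaster's real interface): the task
    # manager's same-node decision and the Reduce side's two ways of obtaining
    # the intermediate processing data.
    import urllib.request
    from typing import NamedTuple

    class IntermediateLocation(NamedTuple):
        node: str        # host that produced and stores the Map output
        local_path: str  # path of the Map output on that host

    def resolve_path(loc, reduce_node):
        """Task-manager side: return a local path when both tasks ran on the
        same node, otherwise an HTTP network path pointing at the Map node."""
        if loc.node == reduce_node:
            return loc.local_path
        return "http://%s:8080/shuffle?path=%s" % (loc.node, loc.local_path)

    def fetch_intermediate(path_or_url, reduce_local_dir):
        """Reduce side: read in place for a local path; copy over HTTP otherwise."""
        if not path_or_url.startswith("http://"):
            return path_or_url  # same node: no network transfer, no extra disk copy
        dst = reduce_local_dir + "/fetched-map-output"
        urllib.request.urlretrieve(path_or_url, dst)  # cross-node copy
        return dst

When both tasks are scheduled on the same node, resolve_path returns the local path and fetch_intermediate reads it in place, which is exactly the situation the affinity scheduling above tries to create.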
In this embodiment, it is judged whether the first execution node and the second execution node are the same node. If they are, the local path of the intermediate processing data is returned to the second execution node, so that the second execution node accesses the local path and executes the second task on the intermediate processing data; if not, the network path of the intermediate processing data is returned, so that the second execution node accesses the network path and executes the second task on the intermediate processing data. This reduces the transmission of intermediate processing data over the network and the pressure on the network environment while the task processing request runs. In addition, copying of the intermediate processing data from the first execution node to the second execution node is reduced, which lowers disk read/write pressure and increases task execution speed, thereby improving the task processing efficiency of the data processing platform.
For the above embodiments, the present application further provides a processing device of a data processing platform, please refer to fig. 6, and fig. 6 is a schematic structural diagram of an embodiment of the processing device of the data processing platform of the present application. The processing means 20 of the data processing platform comprises a receiving module 21, a determining module 22, a scheduling module 23, a first executing module 24 and a second executing module 25.
The receiving module 21 is configured to receive a task processing request for data to be processed.
The determination module 22 is configured to determine a managed execution node that satisfies the task processing request.
The scheduling module 23 is configured to perform affinity scheduling, based on the management and control execution node, on a first task that executes the task processing request, so as to allocate the first execution node, and is further configured to perform affinity scheduling, based on the management and control execution node, on a second task that executes the task processing request, so as to allocate the second execution node.
The first executing module 24 is configured to execute a first task on the data to be processed by using the first executing node, so as to obtain intermediate processing data.
The second executing module 25 is configured to execute a second task on the intermediate processing data by using a second executing node, so as to obtain a processing result.
In some embodiments, the data processing platform comprises a programming model platform for data processing; the first task comprises performing a mapping process; the second task includes performing a reduction calculation.
In some embodiments, the determining module 22 is configured to determine a management and control execution node that satisfies the task processing request by: calling a management container corresponding to the task processing request by using a task manager, and setting node affinity information for the management container; and, using the node affinity information, taking a node that meets preset conditions as the management and control execution node that satisfies the task processing request.
In some embodiments, the scheduling module 23 is configured to perform affinity scheduling, based on the management and control execution node, on the first task executing the task processing request to allocate the first execution node by: if the management and control execution node satisfies the first task, taking the management and control execution node as the first execution node.
In some embodiments, the scheduling module 23 is further configured to: initiate the first task with the task manager to call a first task container and set node affinity information for the first task container; and, if the management and control execution node corresponding to the task manager satisfies the first task, schedule the first task to the management and control execution node so as to use the management and control execution node as the first execution node.
In some embodiments, the scheduling module 23 is configured to perform affinity scheduling, based on the management and control execution node, on the second task executing the task processing request to allocate the second execution node by: if the management and control execution node satisfies the second task, taking the management and control execution node as the second execution node.
In some embodiments, the scheduling module 23 is further configured to: initiate the second task with the task manager to call a second task container and set node affinity information for the second task container; and, if the management and control execution node corresponding to the task manager satisfies the second task, schedule the second task to the management and control execution node so as to use the management and control execution node as the second execution node.
In some embodiments, the first executing module 24 is configured to execute the first task on the data to be processed by using the first executing node, and obtain intermediate processing data, and includes: the first task is executed by the first execution node to obtain intermediate processing data, and the intermediate processing data is stored under a local path of the first execution node.
In some embodiments, the second executing module 25 is configured to execute the second task on the intermediate processing data by using the second executing node, and obtain the processing result, including: judging whether the first execution node and the second execution node are the same node or not; if so, returning the local path of the intermediate processing data to the second execution node, so that the second execution node accesses the local path and executes a second task on the intermediate processing data; if not, returning the network path of the intermediate processing data to the second execution node, so that the second execution node accesses the network path and executes the second task on the intermediate processing data. Wherein the local path comprises a local path of the first executing node.
In some embodiments, the second execution module 25 is configured to read intermediate processing data stored in the local path of the second execution node; and executing a second task on the intermediate processing data by using a second execution node to obtain a processing result.
In some embodiments, the second execution module 25 is configured to obtain the intermediate processing data stored under the local path of the first execution node by using the network path, and store the intermediate processing data under the local path of the second execution node; and executing a second task on the intermediate processing data by using a second execution node to obtain a processing result.
In some embodiments, the second execution module 25 is configured to determine whether the first execution node and the second execution node are the same node, and includes: sending the node information of the second execution node to a task manager; and obtaining storage information of the intermediate processing data from the task manager, wherein the storage information comprises node information of a first execution node storing the intermediate processing data; judging whether the second execution node and the first execution node are the same node or not by utilizing the node information of the second execution node and the storage information of the intermediate processing data; and if the node information of the first execution node in the storage information is consistent with the node information of the second execution node, judging that the second execution node is the same as the first execution node.
The specific implementation of this embodiment can refer to the implementation process of the above embodiment, and is not described herein again.
With respect to the above embodiments, the present application provides a computer device, please refer to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the computer device of the present application. The computer device 30 comprises a memory 31 and a processor 32, wherein the memory 31 and the processor 32 are coupled to each other, the memory 31 stores program data, and the processor 32 is configured to execute the program data to implement the steps of any of the embodiments of the processing method of the data processing platform.
In the present embodiment, the processor 32 may also be referred to as a CPU (Central Processing Unit). The processor 32 may be an integrated circuit chip having signal processing capabilities. The processor 32 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The general purpose processor may be a microprocessor or the processor 32 may be any conventional processor or the like.
The specific implementation of this embodiment can refer to the implementation process of the above embodiment, and is not described herein again.
For the method of the above embodiment, it can be implemented in the form of a computer program, so that the present application provides a storage device, please refer to fig. 8, where fig. 8 is a schematic structural diagram of an embodiment of the storage device of the present application. The storage device 40 stores therein program data 41 executable by a processor, and the program data 41 is executable by the processor to implement the steps of any of the embodiments of the processing method of the data processing platform.
The specific implementation of this embodiment can refer to the implementation process of the above embodiment, and is not described herein again.
The storage device 40 of the present embodiment may be a medium that can store the program data 41, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or it may be a server that stores the program data 41; the server may transmit the stored program data 41 to another device for execution, or may itself run the stored program data 41.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a storage device, which is a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only an example of the present application, and is not intended to limit the scope of the present application, and all equivalent structures or equivalent processes performed by the present application and the contents of the attached drawings, which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (12)

1. A processing method for a data processing platform, the method comprising:
receiving a task processing request for data to be processed;
determining a management and control execution node that satisfies the task processing request;
performing affinity scheduling, based on the management and control execution node, on a first task that executes the task processing request, so as to allocate a first execution node;
executing the first task on the data to be processed by using the first execution node to obtain intermediate processing data;
performing affinity scheduling, based on the management and control execution node, on a second task that executes the task processing request, so as to allocate a second execution node;
and executing the second task on the intermediate processing data by using the second execution node to obtain a processing result.
2. The processing method of claim 1, wherein the executing the second task on the intermediate processing data by using the second execution node comprises:
judging whether the first execution node and the second execution node are the same node;
if so, returning a local path of the intermediate processing data to the second execution node, so that the second execution node accesses the local path and executes the second task on the intermediate processing data;
if not, returning the network path of the intermediate processing data to the second execution node, so that the second execution node accesses the network path and executes the second task on the intermediate processing data.
3. The processing method of claim 2, wherein the local path comprises a local path of the first execution node, and the executing the first task on the data to be processed by using the first execution node to obtain intermediate processing data comprises:
and executing the first task by using the first execution node to obtain the intermediate processing data, and storing the intermediate processing data under a local path of the first execution node.
4. The processing method of claim 3, wherein the second execution node accessing the local path and executing the second task on the intermediate processing data comprises:
reading the intermediate processing data stored under the local path of the second execution node;
and executing the second task on the intermediate processing data by using the second execution node to obtain the processing result.
5. The processing method of claim 3, wherein the second execution node accessing the network path and executing the second task on the intermediate processing data comprises:
acquiring the intermediate processing data stored under the local path of the first execution node by using the network path, and storing the intermediate processing data under the local path of the second execution node;
and executing the second task on the intermediate processing data by using the second execution node to obtain the processing result.
6. The processing method according to claim 2, wherein the determining a management and control execution node that satisfies the task processing request comprises:
calling a management container corresponding to the task processing request by using a task manager, and setting the affinity information of the nodes for the management container;
and calling the nodes meeting preset conditions as the control execution nodes meeting the task processing request by utilizing the affinity information of the nodes.
7. The processing method of claim 6, wherein the judging whether the first execution node and the second execution node are the same node comprises:
sending the node information of the second execution node to the task manager; and
obtaining storage information of the intermediate processing data from the task manager, wherein the storage information comprises node information of a first execution node storing the intermediate processing data;
judging whether the second execution node and the first execution node are the same node or not by utilizing the node information of the second execution node and the storage information of the intermediate processing data; and if the node information of the first execution node in the storage information is consistent with the node information of the second execution node, determining that the second execution node and the first execution node are the same node.
8. The processing method according to claim 6,
the performing affinity scheduling, based on the management and control execution node, on a first task that executes the task processing request so as to allocate a first execution node comprises:
if the management and control execution node meets the first task, taking the management and control execution node as the first execution node;
the performing affinity scheduling, based on the management and control execution node, on a second task that executes the task processing request so as to allocate a second execution node comprises:
and if the management and control execution node meets the second task, taking the management and control execution node as the second execution node.
9. The processing method according to claim 8,
if the management and control execution node satisfies the first task, taking the management and control execution node as the first execution node comprises:
initiating the first task by using the task manager to call a first task container and set the affinity information of the node for the first task container;
if the management and control execution node corresponding to the task manager meets the first task, scheduling the first task to the management and control execution node so as to take the management and control execution node as the first execution node;
if the management and control execution node satisfies the second task, taking the management and control execution node as the second execution node, including:
initiating the second task by using the task manager to call a second task container and setting the affinity information of the node for the second task container;
and if the control execution node corresponding to the task manager meets the second task, scheduling the second task to the control execution node, so that the control execution node is used as the second execution node.
10. The processing method of claim 1,
the data processing platform comprises a programming model platform for data processing;
the first task comprises performing a mapping process; the second task includes performing a reduction calculation.
11. A computer device comprising a memory and a processor coupled to each other, the memory having stored therein program data for execution by the processor to perform the steps of the method of any one of claims 1 to 10.
12. A storage device, characterized by program data stored therein which can be executed by a processor for carrying out the steps of the method according to any one of claims 1 to 10.
CN202210199249.4A 2022-03-02 2022-03-02 Processing method of data processing platform, computer equipment and storage device Pending CN114741165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210199249.4A CN114741165A (en) 2022-03-02 2022-03-02 Processing method of data processing platform, computer equipment and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210199249.4A CN114741165A (en) 2022-03-02 2022-03-02 Processing method of data processing platform, computer equipment and storage device

Publications (1)

Publication Number Publication Date
CN114741165A true CN114741165A (en) 2022-07-12

Family

ID=82274577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210199249.4A Pending CN114741165A (en) 2022-03-02 2022-03-02 Processing method of data processing platform, computer equipment and storage device

Country Status (1)

Country Link
CN (1) CN114741165A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117176728A (en) * 2023-07-04 2023-12-05 北京百星电子系统有限公司 Industrial Internet of things dispatching method and dispatching system based on cloud edge cooperative technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination