CN115061790B - SPARK KMEANS core allocation method and system for ARM two-way server - Google Patents


Info

Publication number
CN115061790B
Authority
CN
China
Prior art keywords
mode
task
spark
kmeans
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210652744.6A
Other languages
Chinese (zh)
Other versions
CN115061790A (en)
Inventor
王晓飞
魏健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210652744.6A priority Critical patent/CN115061790B/en
Publication of CN115061790A publication Critical patent/CN115061790A/en
Application granted granted Critical
Publication of CN115061790B publication Critical patent/CN115061790B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a SPARK KMEANS core allocation method and system for an ARM two-way server. A warmup script is added at the Driver end, and running it submits a first calculation mode and a second calculation mode in sequence. According to the current calculation mode, the Spark Context applies to the resource manager for Executor processes, and a first core allocation mode and a second core allocation mode are defined. A first task or a second task is submitted according to the current calculation mode. The current task is executed and the task execution time under each calculation mode is acquired; the task execution times are compared to determine the calculation mode with the shortest time, and all iterative calculations are completed using that mode. The system comprises a judging module, a calculation mode submitting module, a Spark Context creating module, a core allocation mode determining module, a resource allocation module, a task submitting module, a task execution time acquisition module, a calculation mode determining module and an iterative calculation module. The application can effectively improve the calculation efficiency of Spark.

Description

SPARK KMEANS core allocation method and system for ARM two-way server
Technical Field
The application relates to the technical field of SPARK KMEANS computing performance on ARM (Advanced RISC Machines, a family of RISC microprocessors) servers, and in particular to a SPARK KMEANS core allocation method and system for an ARM two-way server. SPARK KMEANS here denotes the clustering model in machine learning on Spark that uses the KMeans algorithm to calculate cluster center points.
Background
With the development of artificial intelligence technology, user demand for artificial intelligence and high-performance computing keeps increasing. ARM servers are being applied more and more widely because they build on a large body of RISC (Reduced Instruction Set Computer) processors, related technologies and software, and offer high cost performance and low energy consumption. Cloud service providers have successively introduced servers based on the ARM architecture. When memory or hard disk is accessed, how SPARK KMEANS task cores are allocated on a mainstream ARM two-way server and which SPARK KMEANS iterative calculation mode is adopted are important factors for such many-core, high-energy-consumption servers.
Currently, SPARK KMEANS task core allocation and iterative calculation in an ARM two-way server generally rely on the CCIX protocol for cross-path access; the protocol is based on the PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) standard. As shown in Fig. 1, a Spark Context (the main entry point of Spark functionality) is created first. The Spark Context then applies to a resource manager for Executor resources, the allocated Executor processes apply to the Spark Context for tasks, and the Spark Context distributes the Kmeans application program to the Executors. The Spark Context builds a DAG (Directed Acyclic Graph), which is decomposed into task sets and sent to a task scheduler; finally, the task scheduler sends the tasks to the Executors for execution. That is, in the prior art the task cores are allocated by the operating system on its own, generally according to load balancing.
However, compared with an Intel two-way server, an ARM two-way server shows a larger gap in bandwidth and latency for cross-path memory access and cross-path hard disk access. The gap is especially pronounced in memory-intensive applications such as Spark (a big-data in-memory computing component used for online real-time computing). This gap leads to poor cross-path performance and therefore to low Spark calculation efficiency.
Disclosure of Invention
The application provides a SPARK KMEANS core allocation method and system for an ARM two-way server, to solve the problem of low Spark calculation efficiency caused by the prior-art method.
To solve the above technical problem, the embodiments of the application disclose the following technical solutions:
A SPARK KMEANS core allocation method for an ARM two-way server, the method comprising:
judging whether the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, wherein CPU0 is the CPU directly connected to the hard disk;
if not, directly allocating the current SPARK KMEANS task to CPU0 and completing all iterative calculations of SPARK KMEANS;
if yes, adding a warmup (data preheating) script at the Driver end, and submitting a first calculation mode and a second calculation mode in sequence by running the warmup script, wherein the first calculation mode sets the core number of the current SPARK KMEANS task to the core number of CPU0, and the second calculation mode keeps the core number of the current SPARK KMEANS task unchanged;
creating a Spark Context at the Driver end according to the SPARK KMEANS task;
according to the current calculation mode, applying, by the Spark Context, to a resource manager for Executor processes, and defining a first core allocation mode and a second core allocation mode, wherein allocating all current Executor processes to CPU0 is defined as the first core allocation mode, and allocating the current Executor processes to CPU0 first and then allocating the remaining Executor processes to CPU1 is defined as the second core allocation mode;
allocating the Executor processes to the corresponding CPUs according to the current core allocation mode;
submitting a first task or a second task through a task scheduler according to the current calculation mode, wherein the first task matches the first calculation mode and the second task matches the second calculation mode;
executing the current task and acquiring the task execution time under each calculation mode, wherein the task execution time comprises the calculation running time and the cross-path data transmission time;
comparing the task execution times under the different calculation modes, and determining the calculation mode with the shortest time;
completing all iterative calculations of SPARK KMEANS on the Executors using the calculation mode with the shortest time.
Optionally, submitting the first calculation mode and the second calculation mode in sequence by running the warmup script includes:
defining the first calculation mode and the second calculation mode at the Driver end according to the SPARK KMEANS task;
performing a first iteration on the first calculation mode and the second calculation mode, and submitting the first calculation mode and the second calculation mode in sequence.
Optionally, allocating the Executor processes to the corresponding CPUs according to the current core allocation mode includes:
if the current core allocation mode is the first core allocation mode, allocating all Executor processes to CPU0;
if the current core allocation mode is the second core allocation mode, allocating Executor processes to CPU0 first and allocating the remaining Executor processes to CPU1.
Optionally, submitting the first task or the second task through the task scheduler according to the current calculation mode includes:
the Executor processes allocated to the CPUs sending task requests to the Spark Context;
the Spark Context distributing the Kmeans application program to the Executors according to the requests;
the Spark Context building a DAG, decomposing the DAG into task sets, and sending the task sets to the task scheduler;
if the current calculation mode is the first calculation mode, the task scheduler marking the task set as the first task, the first task using the first core allocation mode;
if the current calculation mode is the second calculation mode, the task scheduler marking the task set as the second task, the second task using the second core allocation mode;
sending the first task or the second task to the Executors.
Optionally, executing the current task and acquiring the task execution time under each calculation mode includes:
if the current task is the first task, acquiring the calculation running time in the first calculation mode;
if the current task is the second task, acquiring the calculation running time and the cross-path data transmission time in the second calculation mode, and summing the two.
Optionally, after the first iteration is performed on the first calculation mode and the second calculation mode and the two modes are submitted in sequence, the method further includes:
performing a second iteration on the first calculation mode and the second calculation mode;
submitting the first calculation mode and the second calculation mode in sequence after the second iteration, until the current task is executed and the task execution time under each calculation mode after the two iterations is acquired.
Optionally, executing the current task and acquiring the task execution time under each calculation mode after the two iterations includes:
for the first calculation mode, averaging the task times of the two iterations to obtain the task execution time in the first calculation mode;
for the second calculation mode, averaging the task times of the two iterations to obtain the task execution time in the second calculation mode.
Optionally, after all iterative calculations of SPARK KMEANS are completed on the Executors using the calculation mode with the shortest time, the method further includes:
releasing all SPARK KMEANS computing resources running on the Executors.
A SPARK KMEANS core allocation system for an ARM two-way server, the system comprising:
a judging module, used for judging whether the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, wherein CPU0 is the CPU directly connected to the hard disk; if not, the current SPARK KMEANS task is directly allocated to CPU0 to complete all iterative calculations of SPARK KMEANS; if yes, the calculation mode submitting module is started;
a calculation mode submitting module, used for adding a warmup script at the Driver end when the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, and submitting a first calculation mode and a second calculation mode in sequence by running the warmup script, wherein the first calculation mode sets the core number of the current SPARK KMEANS task to the core number of CPU0, and the second calculation mode keeps the core number of the current SPARK KMEANS task unchanged;
a Spark Context creating module, used for creating a Spark Context at the Driver end according to the acquired SPARK KMEANS task;
a core allocation mode determining module, used for applying to the resource manager for Executor processes through the Spark Context according to the current calculation mode, and defining a first core allocation mode and a second core allocation mode, wherein allocating all current Executor processes to CPU0 is defined as the first core allocation mode, and allocating the current Executor processes to CPU0 first and then allocating the remaining Executor processes to CPU1 is defined as the second core allocation mode;
a resource allocation module, used for allocating the Executor processes to the corresponding CPUs according to the current core allocation mode;
a task submitting module, used for submitting a first task or a second task through the task scheduler according to the current calculation mode, wherein the first task matches the first calculation mode and the second task matches the second calculation mode;
a task execution time acquisition module, used for executing the current task and acquiring the task execution time under each calculation mode, wherein the task execution time comprises the calculation running time and the cross-path data transmission time;
a calculation mode determining module, used for comparing the task execution times under the different calculation modes and determining the calculation mode with the shortest time;
an iterative calculation module, used for completing all iterative calculations of SPARK KMEANS using the calculation mode with the shortest time.
Optionally, the calculation mode submitting module includes:
a calculation mode defining unit, used for defining the first calculation mode and the second calculation mode at the Driver end according to the acquired SPARK KMEANS task;
a first iterative calculation unit, used for performing a first iteration on the first calculation mode and the second calculation mode and submitting the first calculation mode and the second calculation mode in sequence.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
The application provides a SPARK KMEANS core allocation method for an ARM two-way server, which mainly addresses the situation in which the core number of the SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server. When this condition is not met, the current SPARK KMEANS task is directly allocated to CPU0 to complete all iterative calculations of SPARK KMEANS. When the condition is met, a warmup script is added at the Driver end, and a first calculation mode and a second calculation mode are submitted in sequence by running the warmup script. A Spark Context is then created at the Driver end according to the acquired SPARK KMEANS task; according to the current calculation mode, the Spark Context applies to the resource manager for Executor processes, and a first core allocation mode and a second core allocation mode are defined. Next, the Executor processes are allocated to the corresponding CPUs according to the current core allocation mode, and the first task or the second task is submitted through the task scheduler according to the current calculation mode. The current task is then executed and the task execution time under each calculation mode is acquired; the task execution times are compared and the calculation mode with the shortest time is determined. Finally, all iterative calculations of SPARK KMEANS are completed on the Executors using the calculation mode with the shortest time.
In this embodiment, a warmup script is added at the Driver end and two calculation modes are submitted; each mode is iterated only once, the running time of single-path execution is compared with that of execution involving cross-path transmission, the calculation mode with the shortest time is determined, and all iterative calculations of SPARK KMEANS are completed using that mode. Compared with the prior art, in which cores are allocated by the operating system on its own or according to load balancing, the method takes the current practical application scenario into account and uses a single iteration to determine, by comparison, the calculation mode that is actually fastest, so that the calculation efficiency of Spark can be effectively improved. In this embodiment, a second iteration may also be performed on the first calculation mode and the second calculation mode; for each calculation mode, the average task execution time of the two iterations is calculated, and the calculation mode with the shortest time is determined by comparing the two averages.
The application also provides a SPARK KMEANS core allocation system for an ARM two-way server, which mainly comprises a judging module, a calculation mode submitting module, a Spark Context creating module, a core allocation mode determining module, a resource allocation module, a task submitting module, a task execution time acquisition module, a calculation mode determining module and an iterative calculation module. The judging module first determines whether the condition handled by the system is met; the calculation mode submitting module then adds a warmup script at the Driver end and runs it. In this embodiment, each calculation mode in the calculation mode submitting module corresponds to a core allocation mode in the core allocation mode determining module, which ensures that each calculation mode is matched with its core allocation mode. The task execution time required to execute the corresponding task in each calculation mode can therefore be calculated accurately, which improves the accuracy of the task execution time comparison. Because the calculation mode submitting module uses the warmup script, the first calculation mode and the second calculation mode need to be iterated only once and are submitted in sequence. The scheme can thus take the current practical application scenario into account and use a single iteration to determine, by comparison, the calculation mode that is actually fastest, so that the calculation efficiency of Spark can be effectively improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a method for performing SPARK KMEANS task core allocation in an ARM two-way server in the background art;
Fig. 2 is a flow chart of a SPARK KMEANS core allocation method for an ARM two-way server according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a SPARK KMEANS core allocation method for an ARM two-way server according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a SPARK KMEANS core allocation system for an ARM two-way server according to an embodiment of the present application.
Detailed Description
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
For a better understanding of the present application, embodiments of the present application are explained in detail below with reference to the drawings.
Example 1
Referring to Fig. 2, Fig. 2 is a flow chart of a SPARK KMEANS core allocation method for an ARM two-way server according to an embodiment of the present application. As can be seen from Fig. 2, the SPARK KMEANS core allocation method for the ARM two-way server in this embodiment mainly includes the following steps:
S0: judging whether the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, wherein CPU0 is the CPU directly connected to the hard disk.
If the core number of the current SPARK KMEANS task is smaller than the core number of CPU0 in the ARM two-way server, the current SPARK KMEANS task is directly allocated to CPU0 and all iterative calculations of SPARK KMEANS are completed there. That is, when the core number of the current SPARK KMEANS task is smaller than the core number of CPU0, the subsequent steps of this embodiment do not apply.
If the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, step S1 is executed: adding a warmup script at the Driver end, and submitting a first calculation mode and a second calculation mode in sequence by running the warmup script. The first calculation mode sets the core number of the current SPARK KMEANS task to the core number of CPU0, that is, the task core number is reduced to the core number of CPU0. The second calculation mode keeps the core number of the current SPARK KMEANS task unchanged.
Specifically, submitting the first calculation mode and the second calculation mode in sequence by running the warmup script includes the following steps:
S11: defining the first calculation mode and the second calculation mode at the Driver end according to the acquired SPARK KMEANS task.
S12: performing a first iteration on the first calculation mode and the second calculation mode, and submitting the first calculation mode and the second calculation mode in sequence.
Iterating each of the two calculation modes only once saves time and improves the efficiency of SPARK KMEANS core allocation. Note that the two calculation modes must not be submitted simultaneously; they need to be submitted one after the other to avoid confusion, which ensures the accuracy of the calculation mode determination.
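Steps S0 and S11 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `ComputeMode` and `plan_warmup_modes` and the labels "mode1" and "mode2" are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComputeMode:
    name: str        # illustrative label: "mode1" or "mode2"
    task_cores: int  # cores the SPARK KMEANS task will request


def plan_warmup_modes(task_cores: int, cpu0_cores: int) -> List[ComputeMode]:
    """Return the modes the warmup script should submit, in order.

    An empty list means step S0 answered "no": the task needs fewer
    cores than CPU0 has, so it runs on CPU0 directly without a warmup.
    """
    if task_cores < cpu0_cores:
        return []
    return [
        ComputeMode("mode1", cpu0_cores),  # shrink the task to CPU0's core count
        ComputeMode("mode2", task_cores),  # keep the requested core count
    ]
```

For a task requesting 96 cores on a server whose CPU0 has 64 cores, the sketch returns a 64-core first mode followed by a 96-core second mode; the order of the list reflects the requirement that the modes be submitted in sequence.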
With continued reference to Fig. 2, after the warmup script has been added at the Driver end and run to submit the first calculation mode and the second calculation mode in sequence, step S2 is executed: creating a Spark Context at the Driver end according to the acquired SPARK KMEANS task.
S3: according to the current calculation mode, the Spark Context applies to the resource manager for Executor processes, and a first core allocation mode and a second core allocation mode are defined.
Allocating all current Executor processes to CPU0 is defined as the first core allocation mode; allocating the current Executor processes to CPU0 first and then allocating the remaining Executor processes to CPU1 is defined as the second core allocation mode.
S4: allocating the Executor processes to the corresponding CPUs according to the current core allocation mode.
Specifically, step S4 includes the following:
S41: if the current core allocation mode is the first core allocation mode, all Executor processes are allocated to CPU0.
S42: if the current core allocation mode is the second core allocation mode, Executor processes are allocated to CPU0 first and the remaining Executor processes are allocated to CPU1.
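The two core allocation modes of steps S41 and S42 can be illustrated with a small placement function. The source does not say how an Executor is bound to a CPU or how many cores each Executor holds, so the function name `place_executors`, the per-Executor core list and the "still fits on CPU0" rule are assumptions made for this sketch.

```python
from typing import Dict, List


def place_executors(executor_cores: List[int], cpu0_cores: int,
                    allocation_mode: int) -> Dict[str, List[int]]:
    """Map executor indices to "CPU0" or "CPU1".

    allocation_mode 1: every executor goes to CPU0. No capacity check is
        made, on the assumption that the task was already shrunk to
        CPU0's core count by the first calculation mode.
    allocation_mode 2: an executor goes to CPU0 whenever it still fits
        there, otherwise to CPU1.
    """
    placement: Dict[str, List[int]] = {"CPU0": [], "CPU1": []}
    free_on_cpu0 = cpu0_cores
    for index, cores in enumerate(executor_cores):
        if allocation_mode == 1 or cores <= free_on_cpu0:
            placement["CPU0"].append(index)
            free_on_cpu0 -= cores
        else:
            placement["CPU1"].append(index)
    return placement
```

With six 16-core Executors and a 64-core CPU0, the second mode places the first four Executors on CPU0 and the last two on CPU1, so only those two depend on cross-path access to the hard disk attached to CPU0.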
With continued reference to Fig. 2, S5: submitting the first task or the second task through the task scheduler according to the current calculation mode.
The first task matches the first calculation mode and the second task matches the second calculation mode. Specifically, step S5 further includes the following:
S51: the Executor processes allocated to the CPUs send task requests to the Spark Context.
S52: the Spark Context distributes the Kmeans application program to the Executors according to the requests.
S53: the Spark Context builds a DAG, decomposes the DAG into task sets and sends the task sets to the task scheduler.
S54: if the current calculation mode is the first calculation mode, the task scheduler marks the task set as the first task, and the first task uses the first core allocation mode.
S55: if the current calculation mode is the second calculation mode, the task scheduler marks the task set as the second task, and the second task uses the second core allocation mode.
S56: the first task or the second task is sent to the Executors.
With continued reference to Fig. 2, after the first task or the second task has been submitted through the task scheduler according to the current calculation mode, step S6 is executed: executing the current task and acquiring the task execution time under each calculation mode. The task execution time in this embodiment comprises the calculation running time and the cross-path data transmission time.
Specifically, if the current task is the first task, the calculation running time in the first calculation mode is acquired and taken as the task execution time in the first calculation mode. If the current task is the second task, the calculation running time and the cross-path data transmission time in the second calculation mode are acquired and summed, and the sum is taken as the task execution time in the second calculation mode.
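The rule for step S6 amounts to a two-branch function. The sketch below assumes the two time components have already been measured in seconds; the function name and parameters are illustrative only.

```python
def task_execution_time(mode: int, run_seconds: float,
                        cross_path_seconds: float = 0.0) -> float:
    """Task execution time as defined for step S6.

    Mode 1 runs entirely on CPU0, so only the calculation running time
    counts. Mode 2 spans CPU0 and CPU1, so the cross-path data
    transmission time is added to the calculation running time.
    """
    if mode == 1:
        return run_seconds
    if mode == 2:
        return run_seconds + cross_path_seconds
    raise ValueError(f"unknown calculation mode: {mode}")
```

The comparison in the later steps is therefore between a single-path run with fewer cores and a run with more cores that also pays for cross-path transfers.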
Further, after step S12, this embodiment also includes step S13: performing a second iteration on the first calculation mode and the second calculation mode.
S14: submitting the first calculation mode and the second calculation mode in sequence after the second iteration, until the current task is executed and the task execution time under each calculation mode after the two iterations is acquired.
Specifically, in step S14, the task execution time under each calculation mode after the two iterations is obtained as follows:
S141: for the first calculation mode, the task times of the two iterations are averaged to obtain the task execution time in the first calculation mode;
S142: for the second calculation mode, the task times of the two iterations are averaged to obtain the task execution time in the second calculation mode.
That is, for the first calculation mode, two task execution times are obtained after the two iterations, and their average is taken as the final task execution time in the first calculation mode. The same applies to the second calculation mode. Step S7 is then performed. Adding the warmup script at the Driver end and running it twice is equivalent to performing two iterations, so the task execution times obtained for the different calculation modes are closer to the actual task execution times. This improves the accuracy of the task execution time under each calculation mode, provides a basis for the subsequent determination of the calculation mode with the shortest time, makes that determination more accurate, and further improves the calculation efficiency of Spark.
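The averaging of steps S141 and S142 can be written in a few lines. The dictionary layout and the mode labels are assumptions for illustration; the sample values in the usage note are made up, not measurements from the source.

```python
from statistics import mean
from typing import Dict, Sequence


def averaged_times(samples: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average the per-iteration task times of each calculation mode."""
    return {mode: mean(times) for mode, times in samples.items()}
```

For example, `averaged_times({"mode1": [10.0, 12.0], "mode2": [9.0, 9.5]})` yields 11.0 for the first mode and 9.25 for the second, and those two averages are what step S7 compares.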
With continued reference to Fig. 2, after the task execution times under the different calculation modes have been acquired, step S7 is executed: comparing the task execution times under the different calculation modes and determining the calculation mode with the shortest time.
S8: all iterative calculations of SPARK KMEANS are completed on the Executors using the calculation mode with the shortest time.
Further, the SPARK KMEANS core allocation method for the ARM two-way server in this embodiment also includes step S9: releasing all SPARK KMEANS computing resources running on the Executors.
Releasing the computing resources saves resource space and improves the calculation efficiency of Spark.
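Steps S7 and S8 can be sketched as a selection followed by a loop. Here `run_iteration` is a hypothetical callable standing in for one SPARK KMEANS iteration in a given mode, and the sketch assumes the warmup iterations are not counted toward `total_iterations`, a point the source leaves open.

```python
from typing import Callable, Dict


def finish_with_fastest(measured: Dict[str, float],
                        run_iteration: Callable[[str], float],
                        total_iterations: int) -> str:
    """Pick the mode with the shortest measured time and run all iterations in it."""
    # S7: compare the warmup timings; on a tie the first mode listed wins.
    fastest = min(measured, key=lambda mode: measured[mode])
    # S8: every remaining iteration uses the selected mode only.
    for _ in range(total_iterations):
        run_iteration(fastest)
    return fastest
```

The design point is that the decision is made once, from measurements taken on the actual workload and hardware, instead of being left to the operating system's default placement.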
The principle of SPARK KMEANS core allocation method for the ARM two-way server in this embodiment in practical application can be seen in fig. 3. As can be seen from fig. 3, the following steps may be adopted in practical application to implement SPARK KMEANS core allocation for the ARM two-way server.
1) According to part 0.1 of fig. 3, a warmup function is added at the Driver end.
After a SPARK KMEANS task is received, reducing the total core number of the task to the core number of CPU0 (the CPU directly connected with the hard disk) is defined as calculation mode 1; leaving the total core number of the task unchanged is defined as calculation mode 2. Both calculation modes are iterated only once and are submitted in sequence.
2) According to part 1.1 of fig. 3, the Driver end creates a SparkContext according to the task, and the SparkContext applies to the resource manager for Executor resources. In calculation mode 1, the resource manager allocates all Executors to CPU0, which is defined as core allocation mode 1. In calculation mode 2, the cores required by the task are first allocated to CPU0 and the remaining required cores are allocated to CPU1, which is defined as core allocation mode 2.
3) According to part 2.1 of fig. 3, the core allocation mode is determined and the Executor processes are obtained.
4) The allocated Executor processes apply to the SparkContext for tasks, and the SparkContext distributes the Kmeans application to the Executors.
5) The SparkContext constructs a DAG graph, decomposes the DAG graph into task sets, and sends the task sets to the task scheduler.
6) According to part 3.1 of fig. 3, in calculation mode 1 the task scheduler marks the task set as task 1, and task 1 uses core allocation mode 1 and records the calculation running time T1. In calculation mode 2 the task scheduler marks the task set as task 2, and task 2 uses core allocation mode 2 and records the calculation running time T2 and the cross-path data transmission time T. The tasks are sent to the Executors for execution.
7) Task 1 and task 2 are each completed on the Executors.
8) According to part 4.1 of fig. 3, if T1 is less than T2+T, the calculation task that completes all iterations using calculation mode 1 is submitted to the Driver; if T1 is greater than or equal to T2+T, the calculation task that completes all iterations using calculation mode 2 is submitted to the Driver. All resources are released after the run.
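The decision logic of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and dictionary keys are hypothetical, the core counts in the usage note are made up, and the actual submission to Spark is not shown.

```python
def candidate_modes(task_cores, cpu0_cores):
    """Return the core count to use in each calculation mode that must be timed.

    If the task needs fewer cores than CPU0 provides, no comparison is needed
    and the task runs entirely on CPU0.
    """
    if task_cores < cpu0_cores:
        return {"direct_cpu0": task_cores}
    # Mode 1 shrinks the task to CPU0's core count; mode 2 keeps it unchanged.
    return {"mode_1": cpu0_cores, "mode_2": task_cores}


def choose_mode(t1, t2, t_cross):
    """Pick mode 1 if T1 < T2 + T, otherwise mode 2 (a tie goes to mode 2)."""
    return "mode_1" if t1 < t2 + t_cross else "mode_2"
```

For example, a task requesting 96 cores on a server whose CPU0 has 64 cores would time mode 1 with 64 cores and mode 2 with 96 cores, then run all remaining iterations in whichever mode `choose_mode` returns.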
Example two
Referring to fig. 4 on the basis of the embodiments shown in fig. 2 and fig. 3, fig. 4 is a schematic structural diagram of a SPARK KMEANS core distribution system for an ARM two-way server according to an embodiment of the present application. As can be seen from fig. 4, the SPARK KMEANS core distribution system for the ARM two-way server in this embodiment mainly includes: the system comprises a judging module, a computing mode submitting module, a Spark Context creating module, a core allocation mode determining module, a resource allocation module, a task submitting module, a task execution time obtaining module, a computing mode determining module and an iterative computing module.
The judging module is used for judging whether the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, where CPU0 is the CPU directly connected with the hard disk. If not, the current SPARK KMEANS task is directly allocated to CPU0 and all iterative calculations of SPARK KMEANS are completed there; if so, the computing mode submitting module is started.
The computing mode submitting module is used for adding a warmup script at the Driver end when the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, and sequentially submitting a first computing mode and a second computing mode by running the warmup script, wherein the first computing mode is: setting the core number of the current SPARK KMEANS task to the core number of CPU0, and the second computing mode is: keeping the core number of the current SPARK KMEANS task unchanged.
The Spark Context creating module is used for creating a Spark Context at the Driver end according to the acquired SPARK KMEANS task.
The core allocation mode determining module is used for applying to the resource manager, through the Spark Context and according to the current computing mode, for allocation of Executor processes, and for defining a first core allocation mode and a second core allocation mode, wherein allocating all current Executor processes to CPU0 is defined as the first core allocation mode, and first allocating current Executor processes to CPU0 and then allocating the remaining Executor processes to CPU1 is defined as the second core allocation mode.
The resource allocation module is used for allocating Executor processes to the corresponding CPUs according to the current core allocation mode.
The task submitting module is used for submitting a first task or a second task through the task scheduler according to the current computing mode, wherein the first task matches the first computing mode and the second task matches the second computing mode.
The task execution time acquisition module is used for executing the current task and acquiring the task execution time in the different computing modes, wherein the task execution time comprises the calculation running time and the cross-path data transmission time.
The computing mode determining module is used for comparing the task execution times in the different computing modes and determining the computing mode with the shortest time.
The iterative calculation module is used for completing all iterative calculations of SPARK KMEANS using the computing mode with the shortest time.
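The two core allocation modes handled by the core allocation mode determining module and the resource allocation module can be sketched as below. This is a hedged illustration, not the patented implementation: `allocate_cores`, its parameters, and the error handling are assumptions, and the sketch works on core counts rather than on real Executor processes.

```python
def allocate_cores(task_cores, cpu0_cores, cpu1_cores, allocation_mode):
    """Split a task's cores across the two CPUs of a two-way server.

    Mode 1 places everything on CPU0 (the CPU attached to the disk).
    Mode 2 fills CPU0 first and puts the remainder on CPU1.
    """
    if allocation_mode == 1:
        # Assumption: the task has already been capped at CPU0's core count.
        return {"CPU0": min(task_cores, cpu0_cores), "CPU1": 0}
    if allocation_mode == 2:
        on_cpu0 = min(task_cores, cpu0_cores)
        on_cpu1 = task_cores - on_cpu0
        if on_cpu1 > cpu1_cores:
            raise ValueError("task requests more cores than the two CPUs provide")
        return {"CPU0": on_cpu0, "CPU1": on_cpu1}
    raise ValueError("allocation_mode must be 1 or 2")
```

Any cores that land on CPU1 in mode 2 are the ones that incur the cross-path data transmission time, which is why that time is added to the running time only for the second computing mode.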
Further, the computing mode submitting module includes: a calculation mode definition unit and a first iterative calculation unit. The computing mode defining unit is used for defining a first computing mode and a second computing mode at the Driver end according to the obtained SPARK KMEANS tasks; the first iterative computing unit is used for carrying out first iteration on the first computing mode and the second computing mode and submitting the first computing mode and the second computing mode in sequence.
The computing mode submitting module further includes a second iterative computing unit, which is used for performing a second iteration on the first computing mode and the second computing mode and sequentially submitting the first computing mode and the second computing mode after the second iteration.
For the parts of this embodiment that are not described in detail, reference may be made to the first embodiment shown in fig. 2 and fig. 3; the two embodiments may be referred to each other, and the details are not repeated here.
The foregoing is only a specific embodiment of the application, provided to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A SPARK KMEANS core allocation method for an ARM two-way server, the method comprising:
judging whether the core number of the current SPARK KMEANS task is larger than or equal to the core number of a CPU0 in the ARM two-way server, wherein the CPU0 is a CPU directly connected with a hard disk;
If not, directly distributing the current SPARK KMEANS tasks to the CPU0, and completing all iterative calculations of SPARK KMEANS;
If yes, adding warmup scripts at the Driver end, and sequentially submitting a first calculation mode and a second calculation mode by running the warmup scripts, wherein the first calculation mode is as follows: setting the current SPARK KMEANS task core number as the core number of the CPU0, wherein the second calculation mode is as follows: keeping the current SPARK KMEANS task cores unchanged;
creating Spark Context at the Driver end according to the SPARK KMEANS tasks;
according to the current computing mode, the Spark Context applies to a resource manager for allocation of Executor processes, and a first core allocation mode and a second core allocation mode are defined, wherein allocating all current Executor processes to CPU0 is defined as the first core allocation mode, and first allocating current Executor processes to CPU0 and then allocating the remaining Executor processes to CPU1 is defined as the second core allocation mode;
according to the current core allocation mode, allocating Executor processes to the corresponding CPUs;
submitting a first task or a second task through a task scheduler according to the current computing mode, wherein the first task is matched with the first computing mode, and the second task is matched with the second computing mode;
executing the current task, and acquiring the task execution time in the different calculation modes, wherein the task execution time comprises: the calculation running time and the cross-path data transmission time;
comparing task execution time under different calculation modes, and determining the calculation mode with the shortest time;
completing all iterative calculations of SPARK KMEANS on the Executor using the calculation mode with the shortest time.
2. The SPARK KMEANS core allocation method for an ARM two-way server according to claim 1, wherein the method for sequentially submitting the first computing mode and the second computing mode by running the warmup script comprises:
Defining a first calculation mode and a second calculation mode at a Driver end according to the SPARK KMEANS tasks;
and performing first iteration on the first computing mode and the second computing mode, and submitting the first computing mode and the second computing mode in sequence.
3. The SPARK KMEANS core allocation method for an ARM two-way server according to claim 1, wherein said allocating Executor processes to the corresponding CPUs according to the current core allocation mode includes:
if the current core allocation mode is the first core allocation mode, all Executor processes are allocated to the CPU0;
if the current core allocation mode is the second core allocation mode, Executor processes are first allocated to CPU0 and the remaining Executor processes are allocated to CPU1.
4. The SPARK KMEANS core allocation method for an ARM two-way server according to claim 1, wherein said submitting the first task or the second task through the task scheduler according to the current calculation mode includes:
The Executor processes distributed in the CPU send requests for applying tasks to Spark Context;
the Spark Context distributes the Kmeans application program to Executor according to the request;
The Spark Context constructs a DAG graph, decomposes the DAG graph into task sets, and sends the task sets to a task scheduler;
If the current computing mode is the first computing mode, the task scheduler marks the task set as a first task, and the first task uses the first core allocation mode;
If the current computing mode is the second computing mode, the task scheduler marks the task set as a second task, and the second task uses the second core allocation mode;
The first task or the second task is sent to Executor.
5. The SPARK KMEANS core allocation method for an ARM two-way server according to claim 1, wherein the executing the current task and obtaining task execution times in different computing modes includes:
if the current task is the first task, acquiring the calculation running time in the first calculation mode;
and if the current task is the second task, acquiring the calculation running time and the cross-path data transmission time in the second calculation mode, and summing the calculation running time and the cross-path data transmission time.
6. The SPARK KMEANS core allocation method for an ARM two-way server of claim 2, wherein after performing a first iteration on the first computing mode and the second computing mode and submitting the first computing mode and the second computing mode in sequence, the method further comprises:
performing a second iteration on the first computing mode and the second computing mode;
And sequentially submitting the first calculation mode and the second calculation mode after the second iteration until the current task is executed, and acquiring task execution time under different calculation modes after the two iterations.
7. The SPARK KMEANS core allocation method for an ARM two-way server according to claim 6, wherein the performing the current task and obtaining task execution times in different calculation modes after two iterations include:
for the first calculation mode, averaging task time after two iterations to serve as task execution time in the first calculation mode;
And averaging task time after two iterations aiming at the second calculation mode to be used as task execution time in the second calculation mode.
8. The SPARK KMEANS core allocation method for an ARM two-way server according to any one of claims 1-7, wherein after all iterative calculations of SPARK KMEANS are completed on the Executor using the calculation mode with the shortest time, the method further comprises:
All SPARK KMEANS computing resources running on Executor are released.
9. A SPARK KMEANS core distribution system for an ARM two-way server, the system comprising:
The judging module is used for judging whether the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, wherein CPU0 is the CPU directly connected with a hard disk; if not, directly allocating the current SPARK KMEANS task to CPU0 to complete all iterative calculations of SPARK KMEANS; if so, starting the computing mode submitting module;
The computing mode submitting module is used for adding a warmup script at the Driver end when the core number of the current SPARK KMEANS task is greater than or equal to the core number of CPU0 in the ARM two-way server, and sequentially submitting a first computing mode and a second computing mode by running the warmup script, wherein the first computing mode is: setting the core number of the current SPARK KMEANS task to the core number of CPU0, and the second computing mode is: keeping the core number of the current SPARK KMEANS task unchanged;
the Spark Context creation module is used for creating Spark Context at the Driver end according to the SPARK KMEANS tasks acquired;
The core allocation mode determining module is used for applying to the resource manager, through the Spark Context and according to the current computing mode, for allocation of Executor processes, and for defining a first core allocation mode and a second core allocation mode, wherein allocating all current Executor processes to CPU0 is defined as the first core allocation mode, and first allocating current Executor processes to CPU0 and then allocating the remaining Executor processes to CPU1 is defined as the second core allocation mode;
The resource allocation module is used for allocating Executor processes to corresponding CPUs according to the current core allocation mode;
the task submitting module is used for submitting a first task or a second task through the task scheduler according to the current computing mode, wherein the first task is matched with the first computing mode, and the second task is matched with the second computing mode;
The task execution time acquisition module is used for executing the current task and acquiring the task execution time in the different computing modes, wherein the task execution time comprises: the calculation running time and the cross-path data transmission time;
The computing mode determining module is used for comparing task execution time under different computing modes and determining the computing mode with the shortest time;
and the iterative calculation module is used for completing all iterative calculations of SPARK KMEANS using the computing mode with the shortest time.
10. The SPARK KMEANS core distribution system for an ARM two-way server of claim 9, wherein the computing mode submitting module comprises:
The computing mode defining unit is used for defining a first computing mode and a second computing mode at the Driver end according to the obtained SPARK KMEANS tasks;
The first iterative computing unit is used for carrying out first iteration on the first computing mode and the second computing mode and submitting the first computing mode and the second computing mode in sequence.
CN202210652744.6A 2022-06-10 2022-06-10 SPARK KMEANS core allocation method and system for ARM two-way server Active CN115061790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210652744.6A CN115061790B (en) 2022-06-10 2022-06-10 SPARK KMEANS core allocation method and system for ARM two-way server

Publications (2)

Publication Number Publication Date
CN115061790A (en) 2022-09-16
CN115061790B (en) 2024-05-14

Family

ID=83200095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210652744.6A Active CN115061790B (en) 2022-06-10 2022-06-10 SPARK KMEANS core allocation method and system for ARM two-way server

Country Status (1)

Country Link
CN (1) CN115061790B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107040407A (en) * 2017-03-15 2017-08-11 成都中讯创新科技股份有限公司 A kind of HPCC dynamic node operational method
CN107291550A (en) * 2017-06-22 2017-10-24 华中科技大学 A kind of Spark platform resources dynamic allocation method and system for iterated application
WO2019113508A1 (en) * 2017-12-07 2019-06-13 Fractal Industries, Inc. A system and methods for multi-language abstract model creation for digital environment simulations
WO2022001209A1 (en) * 2020-06-30 2022-01-06 深圳前海微众银行股份有限公司 Job execution method, apparatus and system, and computer-readable storage medium

Also Published As

Publication number Publication date
CN115061790A (en) 2022-09-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant