CN114048045A

CN114048045A - Communication performance prediction method for communication competition among parallel application cores

Info

Publication number: CN114048045A
Application number: CN202111295681.5A
Authority: CN
Inventors: 肖利民; 王泽红; 韩萌; 徐向荣; 朱乃威; 常佳辉; 王志鹏
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-11-03
Filing date: 2021-11-03
Publication date: 2022-02-15
Anticipated expiration: 2041-11-03
Also published as: CN114048045B

Abstract

The invention discloses a communication performance prediction method for inter-core communication contention in parallel applications. The method comprises four steps: first, constructing a point-to-point communication performance model that accounts for inter-core communication contention under a multi-core architecture; second, acquiring the communication time sequence and process distribution of the parallel application; third, measuring communication performance parameters in the application's running environment according to the communication performance model; and fourth, predicting the communication overhead of the parallel application by combining the model with the application's communication time sequence. The method predicts parallel application communication performance in multi-core high-performance computing environments and can quickly and accurately characterize the overhead of a single communication under inter-core communication contention. The communication overhead of a running parallel application can therefore be predicted accurately, which allows the effect of a communication optimization scheme to be evaluated and guides the optimization of parallel application communication.

Description

Communication performance prediction method for communication competition among parallel application cores

Technical field:

The invention relates to a method for analyzing and predicting parallel application communication performance, and in particular to a method for predicting the communication performance of parallel applications that run in multi-core high-performance computing environments and experience inter-core communication contention.

Background art:

With the widespread use of multi-core architectures in modern parallel computing, high-performance computing clusters have shifted from flat networks of single-processor nodes to more complex hierarchical structures. A typical cluster consists of a large number of nodes, each containing several multi-core processors that share memory. Compared with single-core nodes, the computing cores within a multi-core node can communicate with one another at lower cost; however, during inter-node communication these cores compete with one another for link bandwidth and network resources, which causes additional communication overhead.

As parallel applications grow in scale, communication overhead increasingly limits their overall performance, so optimizing communication performance can effectively improve overall application performance. Evaluating the effect of an optimization scheme is a key step in designing it. Because most optimization schemes are designed and implemented iteratively, the exact effect of each iteration could be obtained by implementing the scheme and running the parallel application's communication performance test under it; however, repeatedly executing the application throughout the iterations incurs a large overhead and prolongs the design cycle. As an efficient means of evaluation, communication performance prediction avoids the cost of actually running the application during the design process and provides a lower-cost and more accurate way to evaluate the performance of each iteration of an optimization scheme.

Existing application communication performance prediction methods based on point-to-point communication models give fairly accurate predictions for single-core node environments, but are limited when predicting the communication performance of parallel applications running on multi-core architectures. On a multi-core architecture, communication between computing cores on different nodes is affected by other cores of the same node that communicate at the same time, which introduces additional communication overhead. Existing point-to-point communication models do not quantify inter-core communication contention and therefore cannot provide accurate predictions for parallel applications on multi-core architectures.

Summary of the invention:

To address these problems, the invention provides a communication performance prediction method for inter-core communication contention in parallel applications, which predicts the communication performance of parallel applications running on multi-core architectures. The method first constructs a point-to-point communication performance model that accounts for inter-core communication contention under a multi-core architecture, then acquires the communication time sequence and process distribution of the parallel application, then measures communication performance parameters in the application's running environment according to the model, and finally predicts the communication overhead of the parallel application by combining the model with the application's communication time sequence, thereby predicting parallel application communication performance in a multi-core high-performance computing environment. The specific steps are as follows:

(a) Construct a point-to-point communication performance model for inter-core communication contention under a multi-core architecture. A point-to-point communication during which other point-to-point communications are simultaneously issued from the same source node is described by a communication model with inter-core communication contention. Following the LogGPS model, a point-to-point communication is decomposed into parts described by the following parameters: the minimum CPU time $O$ to process a send or receive request; the per-byte CPU overhead for processing a message, $O_s$ on the sender and $O_r$ on the receiver; the gap $g$ between two consecutive sends or receives by the CPU; the link latency $L$; the message length $k$; the basic per-byte communication time $G$; the additional overhead $h$ incurred by the network interface card in processing a communication request because of inter-core communication contention; and the additional per-byte overhead $C$ caused by inter-core communication contention. Under inter-core communication contention, the total time of one point-to-point communication is $T = 2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ change as the number of contending inter-core communications increases;

(b) Acquire the communication time sequence and process distribution of the parallel application. Starting from the parallel application, obtain the number of processes, collect all communication operations of each process with an existing parallel application analysis method, and merge the operations of all processes in time order into a complete communication time sequence of the application. Obtain the process distribution from the default layout of the running environment or from a task layout specified by the user; that is, derive the mapping between processes and nodes from the layout, and thereby identify the nodes involved in the parallel application;

(c) Measure the network performance parameters of the parallel application's communication environment. To characterize the communication overhead of the parallel application with the model constructed in step (a), and based on the nodes involved in the application identified in step (b), measure both the contention-free point-to-point round-trip communication parameters between different computing cores of the involved nodes and the point-to-point round-trip communication parameters under inter-core communication contention. For the communication between each pair of computing cores, $2 + m$ measurement procedures are designed, whose measured times are denoted $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$, where $m$ is the number of cores of the node hosting the source computing core. Combining the overhead expressions of these measured times yields a system of equations from which every parameter described in step (a) can be solved, thereby characterizing point-to-point communication under different levels of inter-core communication contention;

(d) Calculate the overall communication overhead of the parallel application according to its communication time sequence.
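The bookkeeping in step (b) can be illustrated with a minimal Python sketch. It assumes a hypothetical record type `CommOp` and a hypothetical default "block" layout in which consecutive ranks fill one node before moving to the next; a real environment's default layout, or a user-specified one, may differ, and the method does not prescribe any particular data format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class CommOp:
    rank: int     # process that issues the operation
    peer: int     # partner process (-1 when not applicable, e.g. synchronization)
    start: float  # start time of the operation
    size: int     # message size in bytes
    kind: str     # "send", "recv", "wait" or "sync"

def block_layout(num_procs: int, cores_per_node: int) -> Dict[int, int]:
    """Map rank -> node, filling each node before the next (assumed default layout)."""
    return {rank: rank // cores_per_node for rank in range(num_procs)}

def merge_timeline(per_process_ops: List[List[CommOp]]) -> List[CommOp]:
    """Merge the per-process operation lists into one list ordered by start time."""
    merged = [op for ops in per_process_ops for op in ops]
    # Ties on start time are broken by rank so the result is deterministic.
    return sorted(merged, key=lambda op: (op.start, op.rank))

layout = block_layout(num_procs=8, cores_per_node=4)
involved_nodes = sorted(set(layout.values()))
print(involved_nodes)  # [0, 1]
```

With 8 processes and 4 cores per node, ranks 0 to 3 map to node 0 and ranks 4 to 7 map to node 1, so the application involves nodes 0 and 1.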

The following notation describes the components into which a point-to-point round-trip communication is decomposed:

The specific process of step (a) is as follows:

(a-1) Computing cores on the same node communicate with lower latency and higher bandwidth than computing cores on different nodes. Therefore, for parallel applications whose inter-process communication is evenly distributed, the overall communication overhead depends mainly on the more expensive cross-node inter-process communication. Moreover, inter-core communication contention matters mainly when the source and target computing cores are on different nodes; when they are on the same node, the extra overhead caused by contention is not significant. Apart from link latency, communication bandwidth and message size, the overhead of a given inter-node communication is mainly affected by other communications issued from the same source node at the same time: the more cores of the source node communicate simultaneously, the larger the extra overhead of that communication.

(a-2) Based on (a-1), the point-to-point communication model distinguishes two cases according to whether the source and target computing cores are on the same node. When they are on the same node (intra-node communication), the point-to-point time is nearly the same with or without inter-core communication contention, and the total time is $T = 2O + L + k(O_s + O_r + G)$. When they are on different nodes (inter-node communication), the total time without inter-core communication contention is $T = 2O + L + k(O_s + O_r + G)$, and with inter-core communication contention it is $T = 2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ change as the number of contending inter-core communications increases.
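The piecewise model of (a-2) can be written as a small Python function. This is a sketch under the assumption that all parameters are already expressed in consistent units (for example seconds and seconds per byte); the names `P2PParams` and `p2p_time` are hypothetical, and the gap parameter between consecutive sends or receives is omitted because it does not appear in the cost formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class P2PParams:
    O: float    # minimum CPU time to process one send or receive request
    O_s: float  # per-byte CPU overhead on the sender
    O_r: float  # per-byte CPU overhead on the receiver
    L: float    # link latency
    G: float    # basic per-byte communication time

def p2p_time(p: P2PParams, k: int, same_node: bool,
             h: float = 0.0, C: float = 0.0) -> float:
    """Predicted time of one point-to-point message of k bytes, per (a-2)."""
    # Contention-free part, shared by intra-node and inter-node communication.
    base = 2 * p.O + p.L + k * (p.O_s + p.O_r + p.G)
    if same_node:
        # Intra-node: the contention terms are treated as negligible.
        return base
    # Inter-node: h and C depend on the number of contending communications;
    # h = C = 0 reproduces the contention-free inter-node case.
    return base + 2 * h + k * C

# Illustrative values only (arbitrary time units), not measured data.
params = P2PParams(O=1.0, O_s=0.25, O_r=0.25, L=2.0, G=0.5)
print(p2p_time(params, k=8, same_node=False))                 # 12.0
print(p2p_time(params, k=8, same_node=False, h=0.5, C=0.25))  # 15.0
```

With these illustrative numbers, contention adds $2h + kC = 1 + 2 = 3$ time units to an 8-byte inter-node message, while an intra-node message of the same size keeps the contention-free cost regardless of $h$ and $C$.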

The specific process of step (c) is as follows:

(c-1) For every computing core of the nodes involved in the parallel application, perform point-to-point round-trip communication measurements between that core and every other core. In each measurement, the core that first sends and then receives the message is the source computing core and its node is the source node; the core that first receives and then sends the message is the target computing core and its node is the target node.

(c-2) For inter-core point-to-point round-trip communication, measure the times $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$ using prescribed message-sending procedures; the communication behavior of $t_1$, $t_2$ and $t_3$ is shown in FIG. 2, FIG. 3 and FIG. 4, respectively. In the $t_1$ test, the interval $w$ between the source computing core calling the send operation and calling the receive operation is set far larger than the message round-trip overhead. The $t_3$ test measures the case with inter-core communication contention: letting $i$ range from 1 to $m$, the maximum number of cores of the source node, $t_{3,i}$ is measured for each $i$. A set of $t_1$, $t_2$ and $t_3$ values is measured for each of several message sizes $k$.

(c-3) Based on the performance model of (a), the time overhead expressions of the measurement procedures $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$ are as follows:

From this system of equations, the model parameters are expressed as follows:

where the function LS is the least-squares fitting slope formula:

and the formula for averaging the $z$ values $a_1$ to $a_z$:

The network model parameters $O$, $O_s$, $O_r$, $L$, $G$, $h$ and $C$ are solved in turn from these expressions. Parameters $O$, $O_s$ and $O_r$ depend only on the source and target computing cores; $L$ and $G$ depend on the overall network environment; and $h$ and $C$ depend on the number of simultaneous communications from the same node during inter-core communication.

(c-4) Average the corresponding per-core parameters within the nodes to obtain the network model parameters of inter-node point-to-point communication and of intra-node point-to-point communication.
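The equation set of (c-3) is not reproduced in this text, so the following Python sketch only illustrates the two numerical tools it names, under stated assumptions. It assumes LS is the ordinary least-squares slope, and it assumes one already has one-way times for the same core pair at several message sizes $k$, both without contention and at one fixed contention level $i$. By the formulas of (a-2), their difference is $2h + kC$, so a line fitted to the differences has slope $C$ and intercept $2h$; repeating the fit for each level $i$ would give $h$ and $C$ as functions of the number of contending communications. `average_params` shows the per-node averaging of (c-4). All function names and the input format are hypothetical.

```python
from statistics import fmean

def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def fit_contention(ks, t_free, t_contended):
    """Estimate (h, C) from one-way times without/with contention at sizes ks.

    Assumes t_contended - t_free = 2h + k*C, as implied by the (a-2) formulas.
    """
    deltas = [tc - tf for tf, tc in zip(t_free, t_contended)]
    C = ls_slope(ks, deltas)
    intercept = fmean(deltas) - C * fmean(ks)  # least-squares intercept = 2h
    return intercept / 2, C

def average_params(per_core):
    """Average each named parameter over the cores of a node, as in (c-4)."""
    keys = per_core[0].keys()
    return {key: fmean(d[key] for d in per_core) for key in keys}

# Synthetic example: differences of 1, 2, 3 at k = 0, 4, 8.
h, C = fit_contention([0, 4, 8], t_free=[10.0, 12.0, 14.0],
                      t_contended=[11.0, 14.0, 17.0])
print(h, C)  # 0.5 0.25
```

Fitting a slope rather than solving from a single message size makes the estimate of $C$ less sensitive to noise at any one size, which is presumably why a least-squares formula is used.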

The specific process of step (d) is as follows:

(d-1) Using the application communication time sequence obtained in step (b), convert the process-level communication operations into communication operations of the involved nodes. The operations include send, receive, wait and synchronization, and each records its source node, target node, start time, data size, operation type and similar information, giving a node-based communication time sequence for the whole application.

(d-2) Predict the communication overhead of each operation in the time sequence obtained in (d-1). Select the network model parameter values obtained in (c-3) according to the nodes involved in the operation. Because inter-node point-to-point communication is affected by other inter-node communications issued from the same node at the same time, select the $h$ and $C$ values according to the number of such simultaneous inter-node communications. Substitute the selected parameter values into the inter-node point-to-point model of (a-2) to predict the overhead of each communication operation.

(d-3) Combining the ordering of the steps in the application communication time sequence with the predicted time of each communication step, calculate the predicted overall communication overhead of the application.
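Steps (d-1) to (d-3) can be sketched in Python as below. Several choices here are assumptions rather than details from the text: only send operations are modeled, as hypothetical tuples `(rank, peer, start, size)`; "at the same time" is approximated as "same start time"; contention levels beyond the largest measured one reuse that level's $h$ and $C$; and the operations of one process are assumed to execute back to back, with the overall overhead taken as the largest per-process total. Parameter dictionaries use the keys `O`, `O_s`, `O_r`, `L` and `G`.

```python
from collections import Counter, defaultdict

def p2p_time(p, k, h=0.0, C=0.0):
    # Model from (a-2); h = C = 0 gives the contention-free cost.
    return 2 * p["O"] + 2 * h + p["L"] + k * (p["O_s"] + p["O_r"] + p["G"] + C)

def predict_total(sends, rank_to_node, intra, inter, contention):
    """Predict overall communication overhead from a list of sends."""
    # (d-1) Lift process-level sends to node-level sends.
    node_ops = [(rank, rank_to_node[rank], rank_to_node[peer], start, size)
                for rank, peer, start, size in sends]
    # Count inter-node sends leaving each node at each start time.
    concurrent = Counter((src, start) for _, src, dst, start, _ in node_ops
                         if src != dst)
    per_rank = defaultdict(float)
    for rank, src, dst, start, size in node_ops:
        if src == dst:
            cost = p2p_time(intra, size)  # intra-node: no contention terms
        else:
            # (d-2) Pick h and C for this contention level (clamped).
            n = concurrent[(src, start)]
            h, C = contention[min(n, max(contention))]
            cost = p2p_time(inter, size, h, C)
        per_rank[rank] += cost  # (d-3) one process's ops run back to back
    return max(per_rank.values(), default=0.0)

# Illustrative parameters (arbitrary time units), not measured data.
intra = {"O": 1.0, "O_s": 0.25, "O_r": 0.25, "L": 2.0, "G": 0.5}
inter = {"O": 1.0, "O_s": 0.25, "O_r": 0.25, "L": 4.0, "G": 0.5}
contention = {1: (0.0, 0.0), 2: (0.5, 0.25)}  # level -> (h, C)
rank_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
sends = [(0, 2, 0.0, 8), (1, 3, 0.0, 8), (0, 1, 1.0, 8)]
print(predict_total(sends, rank_to_node, intra, inter, contention))  # 29.0
```

In this example ranks 0 and 1 on node 0 both send to node 1 at time 0.0, so each of those messages is charged at contention level 2 (17.0 units instead of 14.0); rank 0's later intra-node send costs 12.0, giving a predicted total of 29.0.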

The invention has the following beneficial effects:

Existing application communication performance prediction methods based on point-to-point communication models can accurately predict application communication overhead in networks of single-processor nodes, but cannot give good predictions for parallel applications that run on multi-core architectures and experience inter-core communication contention. The invention provides a parallel application communication prediction method suited to multi-core high-performance computing environments. It can quickly and accurately characterize the overhead of a single communication under inter-core communication contention, so that the communication overhead of a running parallel application can be accurately predicted, the effect of communication optimization schemes can be evaluated, and the optimization of parallel application communication can be guided.

Description of the drawings:

FIG. 1 is a flowchart of the communication performance prediction method for inter-core communication contention in parallel applications according to the present invention;

FIG. 2 is a diagram of the communication behavior of the point-to-point round-trip test $t_1$ used to obtain the communication performance model parameters in the present invention;

FIG. 3 is a diagram of the communication behavior of the point-to-point round-trip test $t_2$ used to obtain the communication performance model parameters in the present invention;

FIG. 4 is a diagram of the communication behavior of the point-to-point round-trip test $t_3$ used to obtain the communication performance model parameters in the present invention;

FIG. 5 is a schematic diagram showing how the model parameters $h$ and $C$ vary with the number of contending inter-core communications in the high-performance computing environment of a certain multi-core architecture.

Specific embodiments:

The present invention is described in further detail below with reference to the accompanying drawings.

Referring to FIGS. 1 to 5, a communication performance prediction method for inter-core communication contention in parallel applications comprises the following steps:

(a) Construct a point-to-point communication performance model for inter-core communication contention under a multi-core architecture. A point-to-point communication during which other point-to-point communications are simultaneously issued from the same source node is described by a communication model with inter-core communication contention. Following the LogGPS model, a point-to-point communication is decomposed into parts described by the following parameters: the minimum CPU time $O$ to process a send or receive request; the per-byte CPU overhead for processing a message, $O_s$ on the sender and $O_r$ on the receiver; the gap $g$ between two consecutive sends or receives by the CPU; the link latency $L$; the message length $k$; the basic per-byte communication time $G$; the additional overhead $h$ incurred by the network interface card in processing a communication request because of inter-core communication contention; and the additional per-byte overhead $C$ caused by inter-core communication contention. Under inter-core communication contention, the total time of one point-to-point communication is $T = 2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ change as the number of contending inter-core communications increases;

Consider a parallel application running in a multi-core high-performance computing environment, with P processes laid out on N nodes and M computing cores used per node. With reference to FIGS. 2, 3 and 4, the prediction of its communication performance under inter-core communication contention is carried out in the following 4 steps:

(a) Construct the point-to-point communication performance prediction model for the multi-core high-performance computing environment.

(a-1) Computing cores on the same node communicate with lower latency and higher bandwidth than computing cores on different nodes. Therefore, for parallel applications whose inter-process communication is evenly distributed, the overall communication overhead depends mainly on the more expensive cross-node inter-process communication. Moreover, inter-core communication contention matters mainly when the source and target computing cores are on different nodes; when they are on the same node, the extra overhead caused by contention is not significant. Apart from link latency, communication bandwidth and message size, the overhead of a given inter-node communication is mainly affected by other communications issued from the same source node at the same time: the more cores of the source node communicate simultaneously, the larger the overhead of that communication. FIG. 5 shows, for the supercomputing environment of a certain multi-core architecture, how the overheads $h$ and $C$, obtained from point-to-point round-trip measurements between a pair of computing cores on different source and target nodes, vary with the number of simultaneous communications from the source node.

(b) Acquire the application's communication time sequence and process distribution in the same way as described above.

(c) Network performance parameters of a parallel application communication environment are measured.

(c-1) Perform point-to-point round-trip communication measurements between each of the N x 4 computing cores involved in the parallel application and every other core. In each measurement, the core that first sends and then receives the message is the source computing core and its node is the source node; the core that first receives and then sends the message is the target computing core and its node is the target node.

(c-2) Each of the M computing cores of node $N_i$ performs point-to-point round-trip communication measurements with all cores of the other N-1 nodes, measuring $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$ with prescribed message-sending procedures; the communication behavior of $t_1$, $t_2$ and $t_3$ is shown in FIG. 2, FIG. 3 and FIG. 4, respectively. As a supplement to (a-2), when the source and target computing cores are on the same node, only $t_1$ and $t_2$ are measured: core $j$ of the M cores of node $N_i$ performs round-trip measurements with the other M-1 cores of that node and measures only $t_1$ and $t_2$. In the $t_1$ test, the interval $w$ between the source computing core calling the send operation and calling the receive operation is set far larger than the message round-trip overhead. The $t_3$ test measures the case with inter-core communication contention, with $i$ ranging from 1 to $m$, the number of cores of the source node. A set of $t_1$, $t_2$ and $t_3$ values is measured for each of several message sizes $k$.

where the function LS is the least-squares fitting slope formula:

and the formula for averaging the $z$ values $a_1$ to $a_z$:
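For the embodiment's layout of N nodes with M cores each, the set of measurements described in (c-2) can be enumerated as below. This Python sketch is illustrative only: cores are identified by hypothetical `(node, core)` pairs, each ordered pair of distinct cores is treated as one source/target measurement, inter-node pairs receive the $2 + m$ tests $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$ with $m = M$, and intra-node pairs receive only $t_1$ and $t_2$.

```python
from itertools import product

def measurement_plan(num_nodes, cores_per_node):
    """List (src_core, dst_core, tests) for every ordered pair of distinct cores."""
    cores = [(node, c) for node in range(num_nodes) for c in range(cores_per_node)]
    # Inter-node pairs: t1, t2 plus one contention test per level i = 1..m.
    inter_tests = ("t1", "t2") + tuple(f"t3_{i}" for i in range(1, cores_per_node + 1))
    plan = []
    for src, dst in product(cores, cores):
        if src == dst:
            continue  # a core is never measured against itself
        same_node = src[0] == dst[0]
        tests = ("t1", "t2") if same_node else inter_tests
        plan.append((src, dst, tests))
    return plan

plan = measurement_plan(num_nodes=2, cores_per_node=4)
print(len(plan), sum(len(tests) for _, _, tests in plan))  # 56 240
```

For N = 2 and M = 4 this yields 56 ordered core pairs: 24 intra-node pairs with 2 tests each and 32 inter-node pairs with 6 tests each, 240 tests in total (each repeated for every message size $k$).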

Claims

1. A communication performance prediction method for inter-core communication contention in parallel applications, characterized by comprising the following steps:

(a) constructing a point-to-point communication performance model for inter-core communication contention under a multi-core architecture, wherein a point-to-point communication during which other point-to-point communications are simultaneously issued from the same source node is described by a communication model with inter-core communication contention; following the LogGPS model, a point-to-point communication is decomposed into parts described by the following parameters: the minimum CPU time $O$ to process a send or receive request, the per-byte CPU overhead for processing a message ($O_s$ on the sender and $O_r$ on the receiver), the gap $g$ between two consecutive sends or receives by the CPU, the link latency $L$, the message length $k$, the basic per-byte communication time $G$, the additional overhead $h$ incurred by the network interface card in processing a communication request because of inter-core communication contention, and the additional per-byte overhead $C$ caused by inter-core communication contention; and under inter-core communication contention, the total time of one point-to-point communication is $T = 2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ change as the number of contending inter-core communications increases;

(c) measuring the network performance parameters of the parallel application's communication environment: to characterize the communication overhead of the parallel application with the model constructed in step (a), measuring, based on the nodes involved in the parallel application acquired in step (b), both the contention-free point-to-point round-trip communication parameters between different computing cores of the involved nodes and the point-to-point round-trip communication parameters under inter-core communication contention; for the communication between each pair of computing cores, designing $2 + m$ measurement procedures whose measured times are denoted $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$, where $m$ is the number of cores of the node hosting the source computing core; and combining the overhead expressions of these measured times into a system of equations from which every parameter described in step (a) is solved, thereby characterizing point-to-point communication under different levels of inter-core communication contention;

2. The communication performance prediction method for inter-core communication contention in parallel applications according to claim 1, wherein the specific process of step (a) comprises:

(a-1) computing cores on the same node communicate with lower latency and higher bandwidth than computing cores on different nodes, so for parallel applications whose inter-process communication is evenly distributed, the overall communication overhead depends mainly on the more expensive cross-node inter-process communication; inter-core communication contention matters mainly when the source and target computing cores are on different nodes, and when they are on the same node the extra overhead caused by contention is not significant; apart from link latency, communication bandwidth and message size, the overhead of a given inter-node communication is mainly affected by other communications issued from the same source node at the same time, and the more cores of the source node communicate simultaneously, the larger the extra overhead of that communication;

(a-2) based on step (a-1), dividing the point-to-point communication model into two cases according to whether the source and target computing cores are on the same node: when they are on the same node (intra-node communication), the point-to-point time is nearly the same with or without inter-core communication contention, and the total time is $T = 2O + L + k(O_s + O_r + G)$; when they are on different nodes (inter-node communication), the total time without inter-core communication contention is $T = 2O + L + k(O_s + O_r + G)$, and with inter-core communication contention it is $T = 2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ change as the number of contending inter-core communications increases.

3. The communication performance prediction method for inter-core communication contention in parallel applications according to claim 1, wherein the specific process of step (c) comprises:

(c-1) performing point-to-point round-trip communication measurements between each computing core of the nodes involved in the parallel application and every other core, wherein in each measurement the core that first sends and then receives the message is taken as the source computing core and its node as the source node, and the core that first receives and then sends the message is taken as the target computing core and its node as the target node;

(c-2) for inter-core point-to-point round-trip communication, measuring the times $t_1$, $t_2$ and $t_{3,1}$ to $t_{3,m}$ using prescribed message-sending procedures, wherein in the $t_1$ test the interval $w$ between the source computing core calling the send operation and calling the receive operation is set far larger than the message round-trip overhead; the $t_3$ test measures the case with inter-core communication contention, with $i$ ranging from 1 to $m$, the maximum number of cores of the source node; and a set of $t_1$, $t_2$ and $t_3$ values is measured for each of several message sizes $k$;

wherein the function LS is the least-squares fitting slope formula:

and the formula for averaging the $z$ values $a_1$ to $a_z$:

sequentially solving the network model parameters $O$, $O_s$, $O_r$, $L$, $G$, $h$ and $C$ according to the model parameter expressions, wherein $O$, $O_s$ and $O_r$ are related only to the source and target computing cores, $L$ and $G$ to the overall network environment, and $h$ and $C$ to the number of simultaneous communications from the same node during inter-core communication;

4. The communication performance prediction method for inter-core communication contention in parallel applications according to claim 1, wherein the specific process of step (d) comprises:

(d-1) according to the application communication time sequence obtained in step (b), converting the process-level communication operations into communication operations of the involved nodes, wherein the operations include send, receive, wait and synchronization, and each operation records its source node, target node, start time, data size, operation type and similar information, thereby obtaining a node-based communication time sequence for the whole application;

(d-2) predicting the communication overhead of each communication operation according to the communication time sequence obtained in step (d-1): selecting the corresponding network model parameter values from (c-3) according to the nodes involved in the operation; because inter-node point-to-point communication is affected by other inter-node communications issued from the same node at the same time, selecting the $h$ and $C$ values according to the number of such simultaneous inter-node communications; and substituting the selected parameter values into the inter-node point-to-point model of step (a-2), thereby predicting the communication overhead of each communication operation;