CN114048045A - Communication performance prediction method for inter-core communication competition in parallel applications

Publication number
CN114048045A
Authority
CN
China
Prior art keywords
communication
point
core
node
inter
Prior art date
Legal status
Granted
Application number
CN202111295681.5A
Other languages
Chinese (zh)
Other versions
CN114048045B (en
Inventor
肖利民
王泽红
韩萌
徐向荣
朱乃威
常佳辉
王志鹏
Current Assignee
Beihang University
Original Assignee
Beihang University

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a communication performance prediction method for inter-core communication competition in parallel applications, comprising the following steps: first, constructing a point-to-point communication performance model that accounts for inter-core communication competition under a multi-core architecture; second, acquiring the communication timing sequence and process distribution of the parallel application; third, measuring communication performance indexes in the application's running environment according to the communication performance model; and fourth, predicting the communication overhead of the parallel application in combination with the application communication timing sequence. The method predicts the communication performance of parallel applications in a high-performance computing environment with a multi-core architecture. It describes the overhead of a single communication under inter-core communication competition quickly and accurately, so that the communication overhead of a running parallel application can be predicted accurately, the effect of a communication optimization scheme can be evaluated, and the optimization of parallel application communication can be guided.

Description

Communication performance prediction method for inter-core communication competition in parallel applications
Technical field:
The invention relates to a method for analyzing and predicting the communication performance of parallel applications, and in particular to a method for predicting the communication performance of parallel applications that are subject to inter-core communication competition and run in a high-performance computing environment with a multi-core architecture.
Background art:
With the widespread adoption of multi-core architectures in modern parallel computing, high-performance computing clusters have shifted from single-level networks of single-processor nodes to more complex hierarchical structures. A high-performance computing cluster generally consists of a large number of nodes, each containing several multi-core processors that share memory. Compared with single-core nodes, the computing cores within a multi-core node can communicate with one another at lower cost through intra-node communication. During inter-node communication, however, these cores compete with one another for link bandwidth and network resources, which incurs additional communication overhead.
As the scale of parallel applications grows, communication overhead gradually becomes an important factor limiting their overall performance, so optimizing communication performance effectively helps to improve overall application performance. Evaluating the effect of an optimization scheme is a key step in designing it. The design and implementation of most optimization schemes is an iterative process. An accurate measure of the optimization effect at each iteration can be obtained by implementing the scheme and running a communication performance test of the parallel application under it, but repeatedly executing the application during iteration incurs substantial overhead and lengthens the design cycle of the scheme. As an efficient means of evaluation, application communication performance prediction avoids the cost of actually running the application during the design process and provides a lower-cost and more accurate way to evaluate each iteration of the scheme.
Existing application communication performance prediction methods based on point-to-point communication models can provide fairly accurate predictions for single-core node environments, but they are limited when predicting the communication performance of parallel applications running on multi-core architectures. On a multi-core architecture, when a parallel application runs, communication between computing cores on different nodes is affected by other cores in the same node that are communicating at the same time, which causes additional communication overhead. Existing point-to-point communication models do not measure inter-core communication competition and therefore cannot provide accurate predictions for parallel applications on multi-core architectures.
Summary of the invention:
In view of the above problems, the invention provides a communication performance prediction method for inter-core communication competition in parallel applications, used to predict the communication performance of parallel applications running on a multi-core architecture. The method first constructs a point-to-point communication performance model that accounts for inter-core communication competition under a multi-core architecture, then acquires the communication timing sequence and process distribution of the parallel application, then measures communication performance indexes in the application's running environment according to the communication performance model, and finally predicts the communication overhead of the parallel application in combination with the application communication timing sequence, thereby predicting parallel application communication performance in a high-performance computing environment with a multi-core architecture. The specific steps are as follows:
(a) constructing a point-to-point communication performance model that accounts for inter-core communication competition under a multi-core architecture; the situation in which, when a point-to-point communication occurs, other point-to-point communications exist at the communication source node at the same time is described as a communication model with inter-core communication competition; with reference to the LogGPS model, a point-to-point communication process is decomposed into several parts, each described by a parameter: the minimum time overhead $O$ for the CPU to process a communication send or receive request, the per-byte overhead $O_s$ or $O_r$ for the CPU to process a message, the time interval $g$ between two consecutive sends or receives by the CPU, the link communication delay $L$, the length $k$ of the sent message, the basic time $G$ required to communicate a message of unit length, the extra overhead $h$ for the network card to process a communication request caused by inter-core communication competition, and the extra overhead $C$ per unit message length caused by inter-core communication competition; under inter-core communication competition, the total time overhead of one point-to-point communication is $2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ vary as the number of competing inter-core communications increases;
(b) acquiring the communication timing sequence and process distribution of the parallel application; starting from the parallel application, the number of its processes is obtained, all communication operations of each process are obtained using an existing parallel application analysis method, and the communication operations of all processes are arranged in time order into a complete communication timing sequence of the parallel application; the process distribution is obtained from the default layout of the running environment or a task layout specified by the user, that is, the mapping between processes and nodes is obtained from the layout, which in turn gives the information on the nodes involved in the parallel application;
(c) measuring network performance parameters of the parallel application communication environment; in order to use the model constructed in step (a) to characterize the communication overhead of the parallel application, and based on the node information obtained in step (b), the point-to-point round-trip communication parameters between different computing cores of the involved nodes are measured both without competition and with inter-core communication competition; for each pair of communicating computing cores, 2+m measurement processes are designed, and the measured times are recorded as $t_1$, $t_2$, and $t_{31}$ to $t_{3m}$, where m is the number of cores of the node on which the communication source computing core is located; the overhead expressions of the measured times are combined into a system of equations from which the parameters described in step (a) can be solved, thereby characterizing the point-to-point communication process under different levels of inter-core communication competition;
(d) calculating the overall communication overhead of the parallel application according to the communication timing sequence.
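As an illustration of step (b), the following minimal Python sketch merges per-process communication records into one time-ordered sequence and derives the process-to-node mapping. The `CommOp` record, the `merge_traces` and `block_layout` helpers, and the block layout itself are assumptions made for this example; the method only requires some existing trace-analysis tool and whatever layout the running environment or the user actually specifies.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class CommOp:
    """One communication operation recorded for a process (hypothetical record)."""
    start: float  # start time of the operation in the trace
    src: int      # rank of the sending process
    dst: int      # rank of the receiving process
    size: int     # message length k in bytes
    kind: str     # operation type, e.g. "send", "recv", "wait", "sync"


def merge_traces(per_process_ops):
    """Merge per-process lists (each already time-ordered) into one sequence."""
    return list(heapq.merge(*per_process_ops, key=lambda op: op.start))


def block_layout(num_procs, cores_per_node):
    """Map rank -> node index, assuming ranks fill one node before the next."""
    return {rank: rank // cores_per_node for rank in range(num_procs)}


# Two processes with their own time-ordered traces.
trace0 = [CommOp(0.0, 0, 2, 1024, "send"), CommOp(3.0, 0, 1, 64, "send")]
trace1 = [CommOp(1.0, 1, 3, 2048, "send")]

timeline = merge_traces([trace0, trace1])
layout = block_layout(num_procs=4, cores_per_node=2)

# Nodes that take part in any communication of the application.
nodes_involved = sorted(
    {layout[op.src] for op in timeline} | {layout[op.dst] for op in timeline}
)
```

A user-specified task layout would simply replace `block_layout` with a dictionary read from the job configuration.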
The following notation is provided to describe the parts split during a point-to-point round trip communication:
Figure BDA0003336497430000021
Figure BDA0003336497430000031
The specific process of step (a) is as follows.
(a-1) Communication between computing cores on the same node has lower latency and higher bandwidth than communication between computing cores on different nodes. Therefore, for parallel applications whose inter-process communication is evenly distributed, the overall communication overhead depends mainly on the more expensive cross-node inter-process communication. Moreover, the influence of inter-core communication competition appears mainly when the source computing core and the target computing core are located on different nodes; when they are located on the same node, the extra communication overhead caused by inter-core communication competition is not significant. For a given communication between different nodes, apart from link delay, communication bandwidth, and message size, the communication overhead is mainly affected by other communications issued from the same source node at the same time: the more cores of the source node communicate simultaneously, the larger the additional overhead of that communication.
(a-2) Based on (a-1), the point-to-point communication model distinguishes two cases according to whether the source computing core and the target computing core are located on the same node. When they are on the same node (intra-node communication), the time overhead of a point-to-point communication is nearly the same with and without inter-core communication competition, and the total time overhead is $2O + L + k(O_s + O_r + G)$. When they are on different nodes (inter-node communication), the total time overhead without inter-core communication competition is $2O + L + k(O_s + O_r + G)$, and the total time overhead with inter-core communication competition is $2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ vary as the number of competing inter-core communications increases.
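The two cases of (a-2) can be sketched as a single cost function. This is a minimal sketch: the `ModelParams` container, the `p2p_cost` name, and the numeric values are illustrative assumptions, and only the two cost expressions come from the text.

```python
from dataclasses import dataclass


@dataclass
class ModelParams:
    O: float        # minimum CPU overhead per send or receive request
    O_s: float      # per-byte CPU overhead on the sending side
    O_r: float      # per-byte CPU overhead on the receiving side
    L: float        # link communication delay
    G: float        # basic time per unit message length
    h: float = 0.0  # extra network-card request overhead under competition
    C: float = 0.0  # extra per-unit-length overhead under competition


def p2p_cost(k, p, same_node, competing):
    """Total time of one point-to-point communication of length k."""
    base = 2 * p.O + p.L + k * (p.O_s + p.O_r + p.G)
    if same_node or not competing:
        # Intra-node, or inter-node without competition: 2O + L + k(O_s + O_r + G)
        return base
    # Inter-node with competition: 2O + 2h + L + k(O_s + O_r + G + C)
    return base + 2 * p.h + k * p.C


# Illustrative values only (arbitrary time units).
params = ModelParams(O=1.0, O_s=0.25, O_r=0.25, L=2.0, G=0.5, h=0.5, C=0.125)
free_cost = p2p_cost(100, params, same_node=False, competing=False)
contended_cost = p2p_cost(100, params, same_node=False, competing=True)
```

In practice `h` and `C` would be looked up for the current number of competing communications rather than fixed, and intra-node communication would use its own measured parameter set.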
The specific process of step (c) is as follows.
(c-1) For every computing core of the nodes involved in the parallel application, point-to-point round-trip communication measurements are performed between that core and all other cores. For one measurement between a pair of computing cores, the core that first sends a message and then receives one is the source computing core, and the node on which it is located is the source node; the core that first receives a message and then sends one is the target computing core, and the node on which it is located is the target node.
(c-2) For point-to-point round-trip communication between computing cores, the times $t_1$, $t_2$, and $t_{31}$ to $t_{3m}$ are measured using specified message sending procedures. The communication behaviors of $t_1$, $t_2$, and $t_3$ are shown in FIG. 2, FIG. 3, and FIG. 4, respectively. In $t_1$, the time interval $w$ between the source computing core calling the send command and calling the receive command is set to be much larger than the message round-trip overhead. The $t_3$ measurements are made in the presence of inter-core communication competition, with $i$ ranging from 1 to $m$, the maximum number of cores of the source node. A group of $t_1$, $t_2$, and $t_3$ values is measured for each of several message sizes $k$.
(c-3) Based on the performance model of (a), the time overhead expressions of the measurement processes $t_1$, $t_2$, and $t_{31}$ to $t_{3m}$ are obtained as follows:
Figure BDA0003336497430000041
From this system of equations, the model parameter expressions are obtained as follows:
Figure BDA0003336497430000042
where the function LS is the least-squares fitting slope formula:
Figure BDA0003336497430000043
Figure BDA0003336497430000044
denotes the average of the values:
Figure BDA0003336497430000045
The network model parameters $O$, $O_s$, $O_r$, $L$, $G$, $h$, and $C$ are solved in turn from the model parameter expressions. The parameters $O$, $O_s$, and $O_r$ depend only on the source computing core and the target computing core; $L$ and $G$ depend on the overall network environment; and $h$ and $C$ depend on the number of simultaneous communications within the same node during inter-core communication.
(c-4) The corresponding per-core parameters within each node are averaged to obtain the network model parameters for inter-node point-to-point communication and for intra-node point-to-point communication.
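Because the model of step (a) is linear in the message length $k$, a least-squares fit over several message sizes separates the fixed terms (intercept) from the per-byte terms (slope). The sketch below is a simplified illustration of that idea, not the patent's own equation system, whose expressions for $t_1$, $t_2$, and $t_{3i}$ appear only in the figures. It assumes one-way times are already available for the uncontended and contended cases; `ls_fit` uses the standard least-squares slope and intercept formulas, and `contention_params` is a hypothetical helper.

```python
def ls_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def contention_params(sizes, t_free, t_contended):
    """Estimate h and C by comparing uncontended and contended fits.

    Uncontended: T(k) = (2O + L)      + k (O_s + O_r + G)
    Contended:   T(k) = (2O + 2h + L) + k (O_s + O_r + G + C)
    so C is the slope difference and h is half the intercept difference.
    """
    slope0, icpt0 = ls_fit(sizes, t_free)
    slope1, icpt1 = ls_fit(sizes, t_contended)
    return (icpt1 - icpt0) / 2, slope1 - slope0


# Synthetic measurements generated from known parameters (illustrative only).
sizes = [64, 128, 256, 512]
t_free = [4.0 + 1.0 * k for k in sizes]         # 2O + L = 4, per-byte sum = 1
t_contended = [5.0 + 1.125 * k for k in sizes]  # adds 2h = 1 and C = 0.125
h_est, C_est = contention_params(sizes, t_free, t_contended)
```

Repeating the contended fit for each competition level $i$ from 1 to $m$ would give one $(h, C)$ pair per level.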
The specific process of step (d) is as follows.
(d-1) According to the application communication timing sequence obtained in step (b), the process-level communication operations in the sequence are converted into communication operations between the involved nodes. The communication operations include send, receive, wait, and synchronize, and each operation records the source node, target node, start time, communication data size, communication operation type, and so on, which yields a node-based overall application communication timing sequence.
(d-2) The communication overhead of each communication operation is predicted according to the communication timing sequence obtained in step (d-1). The corresponding network model parameter values from (c-3) are selected according to the nodes involved in the operation. Inter-node point-to-point communication is affected by other inter-node communications of the same node at the same time, so the $h$ and $C$ values are selected according to the number of simultaneous inter-node communications at that node. The selected parameter values are substituted into the inter-node point-to-point communication model of step (a-2) to predict the communication overhead of each communication operation.
(d-3) Combining the ordering of the steps in the application communication timing sequence, the predicted overall communication overhead of the application is calculated from the time overhead of each communication step.
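Steps (d-2) and (d-3) can be sketched as follows. Several simplifications here are assumptions of the example and are not stated in the text: operations count as simultaneous when their start times lie within a `window`, a single base parameter set is used for all node pairs, and the overall overhead is taken as the plain sum of the per-operation costs, which treats the operations as strictly sequential. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeOp:
    start: float   # start time of the operation
    src_node: int  # node issuing the message
    dst_node: int  # node receiving the message
    size: int      # message length k


def count_concurrent(op, ops, window):
    """Inter-node operations leaving op.src_node within `window` of op.start."""
    return sum(
        1
        for other in ops
        if other.src_node == op.src_node
        and other.src_node != other.dst_node
        and abs(other.start - op.start) <= window
    )


def predict_total(ops, base, h_table, c_table, window=0.0):
    """Sum the predicted cost of every operation in the node-level sequence."""
    total = 0.0
    for op in ops:
        cost = 2 * base["O"] + base["L"] + op.size * (
            base["O_s"] + base["O_r"] + base["G"]
        )
        if op.src_node != op.dst_node:
            n = count_concurrent(op, ops, window)  # includes op itself
            if n > 1:
                # Select h and C for this competition level, as in (d-2).
                cost += 2 * h_table[n] + op.size * c_table[n]
        total += cost
    return total


base = {"O": 1.0, "O_s": 0.25, "O_r": 0.25, "L": 2.0, "G": 0.5}
h_table = {2: 0.5}    # h for two simultaneous inter-node communications
c_table = {2: 0.125}  # C for two simultaneous inter-node communications
ops = [
    NodeOp(0.0, 0, 1, 100),  # competes with the next operation
    NodeOp(0.0, 0, 1, 100),
    NodeOp(5.0, 1, 1, 100),  # intra-node, no competition term
]
total = predict_total(ops, base, h_table, c_table)
```

A fuller version would honor waits and synchronizations when combining the per-operation costs instead of summing them.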
The invention has the following beneficial effects:
Existing application communication performance prediction methods based on point-to-point communication models can accurately predict application communication overhead in a network of single-processor nodes, but cannot provide good predictions for parallel applications that run on a multi-core architecture and are subject to inter-core communication competition. The invention provides a parallel application communication prediction method suited to multi-core architectures in a high-performance computing environment. It describes the overhead of a single communication under inter-core communication competition quickly and accurately, so that the communication overhead of a running parallel application can be predicted accurately, the effect of a communication optimization scheme can be evaluated, and the optimization of parallel application communication can be guided.
Description of the drawings:
FIG. 1 is a flowchart of the communication performance prediction method for inter-core communication competition in parallel applications according to the invention;
FIG. 2 is a communication behavior diagram of the point-to-point round-trip test $t_1$ used when obtaining the communication performance model parameters of the invention;
FIG. 3 is a communication behavior diagram of the point-to-point round-trip test $t_2$ used when obtaining the communication performance model parameters of the invention;
FIG. 4 is a communication behavior diagram of the point-to-point round-trip test $t_3$ used when obtaining the communication performance model parameters of the invention;
FIG. 5 is a schematic diagram of how the model parameters $h$ and $C$ vary with the number of competing inter-core communications in a high-performance computing environment with a multi-core architecture.
Detailed description:
The invention is described in further detail below with reference to the accompanying drawings.
Referring to FIGS. 1 to 5, a communication performance prediction method for inter-core communication competition in parallel applications comprises the following steps:
(a) constructing a point-to-point communication performance model that accounts for inter-core communication competition under a multi-core architecture; the situation in which, when a point-to-point communication occurs, other point-to-point communications exist at the communication source node at the same time is described as a communication model with inter-core communication competition; with reference to the LogGPS model, a point-to-point communication process is decomposed into several parts, each described by a parameter: the minimum time overhead $O$ for the CPU to process a communication send or receive request, the per-byte overhead $O_s$ or $O_r$ for the CPU to process a message, the time interval $g$ between two consecutive sends or receives by the CPU, the link communication delay $L$, the length $k$ of the sent message, the basic time $G$ required to communicate a message of unit length, the extra overhead $h$ for the network card to process a communication request caused by inter-core communication competition, and the extra overhead $C$ per unit message length caused by inter-core communication competition; under inter-core communication competition, the total time overhead of one point-to-point communication is $2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ vary as the number of competing inter-core communications increases;
(b) acquiring the communication timing sequence and process distribution of the parallel application; starting from the parallel application, the number of its processes is obtained, all communication operations of each process are obtained using an existing parallel application analysis method, and the communication operations of all processes are arranged in time order into a complete communication timing sequence of the parallel application; the process distribution is obtained from the default layout of the running environment or a task layout specified by the user, that is, the mapping between processes and nodes is obtained from the layout, which in turn gives the information on the nodes involved in the parallel application;
(c) measuring network performance parameters of the parallel application communication environment; in order to use the model constructed in step (a) to characterize the communication overhead of the parallel application, and based on the node information obtained in step (b), the point-to-point round-trip communication parameters between different computing cores of the involved nodes are measured both without competition and with inter-core communication competition; for each pair of communicating computing cores, 2+m measurement processes are designed, and the measured times are recorded as $t_1$, $t_2$, and $t_{31}$ to $t_{3m}$, where m is the number of cores of the node on which the communication source computing core is located; the overhead expressions of the measured times are combined into a system of equations from which the parameters described in step (a) can be solved, thereby characterizing the point-to-point communication process under different levels of inter-core communication competition;
(d) calculating the overall communication overhead of the parallel application according to the communication timing sequence.
Consider a parallel application running in a high-performance computing environment with a multi-core architecture, and assume that it has P processes laid out on N nodes, each node using M computing cores. The parallel application communication performance prediction method under inter-core communication competition on a multi-core architecture is described with reference to FIGS. 2, 3, and 4, and is implemented in the following 4 steps:
(a) Construct a point-to-point communication performance prediction model for a high-performance computing environment with a multi-core architecture.
(a-1) Communication between computing cores on the same node has lower latency and higher bandwidth than communication between computing cores on different nodes. Therefore, for parallel applications whose inter-process communication is evenly distributed, the overall communication overhead depends mainly on the more expensive cross-node inter-process communication. Moreover, the influence of inter-core communication competition appears mainly when the source computing core and the target computing core are located on different nodes; when they are located on the same node, the extra communication overhead caused by inter-core communication competition is not significant. For a given communication between different nodes, apart from link delay, communication bandwidth, and message size, the communication overhead is mainly affected by other communications issued from the same source node at the same time: the more cores of the source node communicate simultaneously, the larger the overhead of that communication. FIG. 5 shows how the overhead values $h$ and $C$, obtained by point-to-point round-trip measurement between a pair of computing cores on a source node and a target node in a supercomputing environment with a multi-core architecture, vary with the number of communications at the source node.
(a-2) Based on (a-1), the point-to-point communication model distinguishes two cases according to whether the source computing core and the target computing core are located on the same node. When they are on the same node (intra-node communication), the time overhead of a point-to-point communication is nearly the same with and without inter-core communication competition, and the total time overhead is $2O + L + k(O_s + O_r + G)$. When they are on different nodes (inter-node communication), the total time overhead without inter-core communication competition is $2O + L + k(O_s + O_r + G)$, and the total time overhead with inter-core communication competition is $2O + 2h + L + k(O_s + O_r + G + C)$, where $h$ and $C$ vary as the number of competing inter-core communications increases.
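For reference, the model of (a-2) can be written compactly as a piecewise function. The symbol $T(k, n)$ and the notation $h(n)$, $C(n)$ for the dependence on the number $n$ of simultaneous communications at the source node are introduced here for illustration only; the text itself only states that $h$ and $C$ vary with that number.

```latex
T(k, n) =
\begin{cases}
  2O + L + k\,(O_s + O_r + G),
    & \text{intra-node, or inter-node without competition},\\[4pt]
  2O + 2h(n) + L + k\,\bigl(O_s + O_r + G + C(n)\bigr),
    & \text{inter-node with competition}.
\end{cases}
```

The difference between the two branches is $2h(n) + k\,C(n)$, which is the entire extra cost the model attributes to inter-core communication competition.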
(b) Obtain the communication timing sequence and process distribution of the application, in the same way as described above.
(c) Network performance parameters of a parallel application communication environment are measured.
(c-1) For each of the N × M computing cores involved in the parallel application, point-to-point round-trip communication measurements are performed between that core and all other cores. For one measurement between a pair of computing cores, the core that first sends a message and then receives one is the source computing core, and the node on which it is located is the source node; the core that first receives a message and then sends one is the target computing core, and the node on which it is located is the target node.
(c-2) The M computing cores of each computing node $N_i$ perform point-to-point round-trip communication measurements with all cores of the other N-1 nodes, and the times $t_1$, $t_2$, and $t_{31}$ to $t_{3m}$ are measured using specified message sending procedures. The communication behaviors of $t_1$, $t_2$, and $t_3$ are shown in FIG. 2, FIG. 3, and FIG. 4, respectively. As a supplement following (a-2), when the source and target computing cores are located on the same node, only $t_1$ and $t_2$ are measured; that is, when computing core $M_j$ of node $N_i$ performs point-to-point round-trip measurements with the other M-1 cores in the same node, only $t_1$ and $t_2$ need to be measured. In $t_1$, the time interval $w$ between the source computing core calling the send command and calling the receive command is set to be much larger than the message round-trip overhead. The $t_3$ measurements are made in the presence of inter-core communication competition, with $i$ ranging from 1 to $m$, the number of cores of the source node. A group of $t_1$, $t_2$, and $t_3$ values is measured for each of several message sizes $k$.
(c-3) Based on the performance model of (a), the time overhead expressions of the measurement processes $t_1$, $t_2$, and $t_{31}$ to $t_{3m}$ are obtained as follows:
Figure BDA0003336497430000071
From this system of equations, the model parameter expressions are obtained as follows:
Figure BDA0003336497430000081
where the function LS is the least-squares fitting slope formula:
Figure BDA0003336497430000082
Figure BDA0003336497430000083
denotes the average of the values:
Figure BDA0003336497430000084
The network model parameters $O$, $O_s$, $O_r$, $L$, $G$, $h$, and $C$ are solved in turn from the model parameter expressions. The parameters $O$, $O_s$, and $O_r$ depend only on the source computing core and the target computing core; $L$ and $G$ depend on the overall network environment; and $h$ and $C$ depend on the number of simultaneous communications within the same node during inter-core communication.
(c-4) The corresponding per-core parameters within each node are averaged to obtain the network model parameters for inter-node point-to-point communication and for intra-node point-to-point communication.
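The averaging of step (c-4) might look like the following sketch, which groups the parameters measured for each pair of cores by the pair of nodes they belong to and takes the mean of each parameter. The data layout (a dictionary keyed by core pairs) and the function name are assumptions of this example.

```python
from collections import defaultdict
from statistics import mean


def node_level_params(core_params, core_to_node):
    """Average per-core-pair parameters into per-node-pair parameters.

    core_params:  {(src_core, dst_core): {"O": ..., "L": ..., ...}}
    core_to_node: {core_id: node_id}
    Pairs whose cores share a node yield intra-node parameters,
    keyed as (node, node).
    """
    grouped = defaultdict(list)
    for (src, dst), params in core_params.items():
        grouped[(core_to_node[src], core_to_node[dst])].append(params)
    return {
        pair: {name: mean(p[name] for p in plist) for name in plist[0]}
        for pair, plist in grouped.items()
    }


# Illustrative values for 2 nodes with 2 cores each (only two parameters shown).
core_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
core_params = {
    (0, 2): {"O": 1.0, "L": 2.0},  # inter-node pair
    (1, 3): {"O": 3.0, "L": 4.0},  # inter-node pair
    (0, 1): {"O": 0.5, "L": 0.5},  # intra-node pair
}
node_params = node_level_params(core_params, core_to_node)
```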
(d) Calculate the overall communication overhead of the parallel application according to the communication timing sequence.
(d-1) According to the application communication timing sequence obtained in step (b), the process-level communication operations in the sequence are converted into communication operations between the involved nodes. The communication operations include send, receive, wait, and synchronize, and each operation records the source node, target node, start time, communication data size, communication operation type, and so on, which yields a node-based overall application communication timing sequence.
(d-2) The communication overhead of each communication operation is predicted according to the communication timing sequence obtained in step (d-1). The corresponding network model parameter values from (c-3) are selected according to the nodes involved in the operation. Inter-node point-to-point communication is affected by other inter-node communications of the same node at the same time, so the $h$ and $C$ values are selected according to the number of simultaneous inter-node communications at that node. The selected parameter values are substituted into the inter-node point-to-point communication model of step (a-2) to predict the communication overhead of each communication operation.
(d-3) Combining the ordering of the steps in the application communication timing sequence, the predicted overall communication overhead of the application is calculated from the time overhead of each communication step.
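The conversion in step (d-1) from process-level operations to node-level operations can be sketched as below. The record types and field names are hypothetical; the fields follow the contents listed in (d-1), namely source node, target node, start time, data size, and operation type, with an added `inter_node` flag for convenience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcOp:
    start: float   # start time of the operation
    src_rank: int  # sending process
    dst_rank: int  # receiving process
    size: int      # communication data size
    kind: str      # "send", "recv", "wait", or "sync"


@dataclass(frozen=True)
class NodeOp:
    start: float
    src_node: int
    dst_node: int
    size: int
    kind: str
    inter_node: bool  # True when source and target nodes differ


def to_node_ops(proc_ops, rank_to_node):
    """Rewrite each process-level operation in terms of nodes, in time order."""
    node_ops = []
    for op in proc_ops:
        src, dst = rank_to_node[op.src_rank], rank_to_node[op.dst_rank]
        node_ops.append(NodeOp(op.start, src, dst, op.size, op.kind, src != dst))
    return sorted(node_ops, key=lambda o: o.start)


rank_to_node = {0: 0, 1: 0, 2: 1, 3: 1}
proc_ops = [
    ProcOp(2.0, 0, 1, 64, "send"),    # both ranks on node 0
    ProcOp(0.0, 1, 2, 1024, "send"),  # node 0 to node 1
]
node_ops = to_node_ops(proc_ops, rank_to_node)
```

The resulting node-level sequence is what the per-operation cost prediction of (d-2) would consume.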

Claims (4)

1. A communication performance prediction method for communication competition among parallel application cores is characterized by comprising the following steps:
(a) constructing a point-to-point communication performance model that accounts for inter-core communication competition under a multi-core architecture; the case in which other point-to-point communications exist in the communication source node at the same time as a given point-to-point communication is described as a communication model with inter-core communication competition; with reference to the LogGPS model, the point-to-point communication process is decomposed into several parts described by parameters, including the minimum time overhead O for the CPU to process a communication send or receive request, the per-byte overhead O_s or O_r for the CPU to process a message, the time interval g between two consecutive sends or receives by the CPU, the link communication delay L, the length k of the sent message, the basic time G required to communicate a message of unit length, the extra overhead h for the network card to process a communication request caused by inter-core communication competition, and the extra overhead C per unit message length caused by inter-core communication competition; in the presence of inter-core communication competition, the total time overhead of one point-to-point communication is 2O + 2h + L + k(O_s + O_r + G + C), where h and C vary as the number of competing inter-core communications increases;
(b) acquiring the communication time sequence and the process distribution of the parallel application; starting from the parallel application, the number of its processes is obtained, all communication operations of each process are obtained using an existing parallel application analysis method, and the communication operations of all processes are sorted in time order into a complete communication time sequence of the parallel application; the process distribution is obtained from the default layout of the operating environment or from a task layout specified by the user, that is, the mapping between processes and nodes is obtained from the layout, thereby obtaining the node information involved in the parallel application;
(c) measuring network performance parameters of the parallel application communication environment; in order to describe the communication overhead of the parallel application using the model constructed in step (a), and based on the information on the nodes involved in the parallel application obtained in step (b), the point-to-point round-trip communication parameters without competition and the point-to-point round-trip communication parameters with inter-core communication competition are measured between the different computing cores of the nodes involved; for each pair of communicating computing cores, 2+m measurement processes are designed, with the measured times recorded as t_1, t_2, and t_{3,1} to t_{3,m}, where m is the number of cores of the node where the communication source computing core is located; by combining the overhead expressions of the measured times, a system of equations is constructed from which each parameter described in step (a) can be solved, and the parameter values are solved, thereby characterizing the point-to-point communication process under different degrees of inter-core communication competition;
(d) calculating the overall communication overhead of the parallel application according to the communication time sequence.
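Step (b) can be illustrated with a short Python sketch. The block layout rule, the tuple format of a trace entry, and both function names are hypothetical; in practice the layout comes from the runtime or the user, and the per-process traces come from whatever analysis tool is used.

```python
import heapq


def block_layout(num_procs, cores_per_node):
    """Hypothetical default layout: consecutive ranks fill one node before the next."""
    return {rank: rank // cores_per_node for rank in range(num_procs)}


def merge_traces(per_process_ops):
    """Merge per-process operation lists into one time-ordered sequence.

    per_process_ops: {rank: [(start_time, op_type, peer_rank, size), ...]},
                     each list already sorted by start_time.
    Returns a list of (start_time, rank, op_type, peer_rank, size) tuples.
    """
    streams = (
        [(t, rank, kind, peer, size) for (t, kind, peer, size) in ops]
        for rank, ops in per_process_ops.items()
    )
    # heapq.merge performs a k-way merge of already sorted inputs.
    return list(heapq.merge(*streams))
```

With four processes and two cores per node, `block_layout(4, 2)` places ranks 0 and 1 on node 0 and ranks 2 and 3 on node 1, which is the process-to-node mapping that later steps rely on.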
2. The method for predicting communication performance of parallel application inter-core communication competition according to claim 1, wherein the specific process of the step (a) comprises:
(a-1) computing cores on the same node have lower communication latency and higher communication bandwidth than computing cores distributed on different nodes; therefore, for a parallel application whose inter-process communication is evenly distributed, the overall communication overhead of the application depends mainly on cross-node inter-process communication, which has the higher communication overhead; meanwhile, the influence of inter-core communication competition appears mainly when the source computing core and the target computing core are located in different nodes, whereas when they are located in the same node the additional communication overhead caused by inter-core communication competition is not significant; for a given communication between different nodes, besides link delay, communication bandwidth, and message size, the communication overhead is mainly affected by other communications issued from the same source node at the same time, and the more cores of the source node that communicate at the same time, the larger the additional overhead of that communication;
(a-2) according to step (a-1), the point-to-point communication model is divided into two types depending on whether the source computing core and the target computing core are located in the same node; when they are located in the same node, that is, for intra-node communication, the time overhead of point-to-point communication is nearly the same with and without inter-core communication competition, and the total time overhead of point-to-point communication is 2O + L + k(O_s + O_r + G); when they are located in different nodes, that is, for inter-node communication, the total time overhead of point-to-point communication is 2O + L + k(O_s + O_r + G) without inter-core communication competition and 2O + 2h + L + k(O_s + O_r + G + C) with inter-core communication competition, where h and C vary as the number of competing inter-core communications increases.
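As a worked illustration of the two inter-node cases in step (a-2), the following LaTeX block evaluates both expressions for one message. All numeric values are assumed for illustration only and are not measurements from the patent.

```latex
% Assumed illustrative values (not measured data):
% O = 1 us, L = 2 us, O_s = O_r = 0.001 us/B, G = 0.002 us/B,
% h = 0.5 us, C = 0.001 us/B, k = 1000 B
\begin{align*}
T_{\text{no competition}} &= 2O + L + k\,(O_s + O_r + G) \\
  &= 2(1) + 2 + 1000\,(0.001 + 0.001 + 0.002) = 8~\mu\text{s} \\
T_{\text{competition}} &= 2O + 2h + L + k\,(O_s + O_r + G + C) \\
  &= 2(1) + 2(0.5) + 2 + 1000\,(0.001 + 0.001 + 0.002 + 0.001) = 10~\mu\text{s} \\
T_{\text{competition}} - T_{\text{no competition}} &= 2h + kC = 1 + 1 = 2~\mu\text{s}
\end{align*}
```

The difference between the two cases is exactly 2h + kC, so under these assumed values the competition penalty has a fixed part (2h) and a part that grows with message length (kC).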
3. The method for predicting communication performance of parallel application inter-core communication competition according to claim 1, wherein the specific process of the step (c) comprises:
(c-1) performing point-to-point round-trip communication measurements between each computing core of the nodes involved in the parallel application and every computing core other than itself, wherein for one inter-core communication measurement, the computing core that first sends the message and then receives the message is taken as the source computing core, the node where the source computing core is located is taken as the source node, the computing core that first receives the message and then sends the message is taken as the target computing core, and the node where the target computing core is located is taken as the target node;
(c-2) measuring the times t_1, t_2, and t_{3,1} to t_{3,m} by specifying the message sending procedure of the inter-core point-to-point round-trip communication; in t_1, the time interval w between the source computing core calling the send command and calling the receive command is made far larger than the message round-trip overhead; t_{3,i} is measured in the presence of inter-core communication competition, with i ranging from 1 to the maximum value m of the number of cores of the source node; a group of t_1, t_2, and t_3 values is measured under different message sizes k;
(c-3) obtaining, based on the performance model of step (a), the time overhead expressions of the measurement processes t_1, t_2, and t_{3,1} to t_{3,m} as follows:
Figure FDA0003336497420000021
the model parameter expression can be obtained according to the equation set as follows:
Figure FDA0003336497420000031
wherein the function LS is the least-squares fitting slope formula:
Figure FDA0003336497420000032
wherein the symbol shown in Figure FDA0003336497420000033 denotes the average of the a values of z:
Figure FDA0003336497420000034
sequentially solving the network model parameters O, O_s, O_r, L, G, h, and C according to the model parameter expressions, wherein the parameters O, O_s, and O_r depend only on the source computing core and the target computing core, L and G depend on the overall network environment, and h and C depend on the number of simultaneous communications within the same node during inter-core communication;
(c-4) averaging the corresponding core parameters within each node to obtain the network model parameters for inter-node point-to-point communication and for intra-node point-to-point communication.
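The exact LS expression of step (c-3) is given only as an image on this page and is not reproduced here. The sketch below therefore uses the standard ordinary least-squares slope, on the assumption that this is what LS denotes; the function name `ls_slope` is hypothetical.

```python
def ls_slope(xs, ys):
    """Ordinary least-squares slope of y against x.

    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For any measured quantity that is linear in the message size k, fitting the times against k isolates the per-byte coefficient. For example, under the no-competition model 2O + L + k(O_s + O_r + G), the slope with respect to k is O_s + O_r + G, while the fixed terms 2O + L only shift the intercept.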
4. The method for predicting communication performance of parallel application inter-core communication competition according to claim 1, wherein the specific process of the step (d) comprises:
(d-1) according to the application communication time sequence obtained in the step (b), converting the process communication operation in the application communication time sequence into the communication operation of the related node, wherein the communication operation comprises sending, receiving, waiting and synchronizing, and each communication operation comprises the contents of a source node, a target node, a starting time, a communication data size, a communication operation type and the like, so that the node-based application overall communication time sequence is obtained;
(d-2) predicting the communication overhead of each communication operation in the communication time sequence obtained in step (d-1); selecting the corresponding network model parameter values from step (c-3) according to the nodes involved in the operation; since inter-node point-to-point communication is affected by other inter-node communications issued from the same node at the same time, selecting the h and C parameter values that correspond to the number of simultaneous inter-node communications at that node; and substituting the selected network model parameter values into the inter-node point-to-point communication model of step (a-2), thereby predicting the communication overhead of each communication operation;
(d-3) combining the ordering relations among the steps in the application communication time sequence, and calculating the predicted overall communication overhead of the application based on the time overhead of each communication step.
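The conversion in step (d-1) can be sketched in Python as a simple rewrite of rank-level records into node-level records. The dictionary keys and the function name `to_node_ops` are assumptions for illustration; the patent only lists the fields that each node-level operation should carry.

```python
def to_node_ops(process_ops, rank_to_node):
    """Rewrite rank-level operations as node-level operations.

    process_ops:  list of dicts with keys "src_rank", "dst_rank", "start",
                  "size", "type" (hypothetical record format)
    rank_to_node: {rank: node_id}, the mapping obtained from the process layout
    """
    node_ops = []
    for op in process_ops:
        node_ops.append({
            "src_node": rank_to_node[op["src_rank"]],
            "dst_node": rank_to_node[op["dst_rank"]],
            "start": op["start"],
            "size": op["size"],
            "type": op["type"],  # e.g. "send", "recv", "wait", "sync"
        })
    # Keep the node-level sequence ordered by start time.
    return sorted(node_ops, key=lambda o: o["start"])
```

An operation whose source and target ranks map to the same node ends up with identical `src_node` and `dst_node`, which is how the intra-node model of step (a-2) would be selected for it.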

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111295681.5A CN114048045B (en) 2021-11-03 2021-11-03 Communication performance prediction method for communication competition between parallel application cores

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111295681.5A CN114048045B (en) 2021-11-03 2021-11-03 Communication performance prediction method for communication competition between parallel application cores

Publications (2)

Publication Number Publication Date
CN114048045A true CN114048045A (en) 2022-02-15
CN114048045B CN114048045B (en) 2024-06-21

Family

ID=80206984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111295681.5A Active CN114048045B (en) 2021-11-03 2021-11-03 Communication performance prediction method for communication competition between parallel application cores

Country Status (1)

Country Link
CN (1) CN114048045B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6925431B1 (en) * 2000-06-06 2005-08-02 Microsoft Corporation Method and system for predicting communication delays of detailed application workloads
US20100292980A1 (en) * 2009-05-14 2010-11-18 International Business Machines Corporation Application resource model composition from constituent components
CN112383443A (en) * 2020-09-22 2021-02-19 北京航空航天大学 Parallel application communication performance prediction method running in RDMA communication environment
CN113259482A (en) * 2021-06-21 2021-08-13 北京卡普拉科技有限公司 Many-to-many communication mode optimization method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114048045B (en) 2024-06-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant