CN115834594A - Data collection method for improving high-performance computing application - Google Patents
- Publication number
- CN115834594A (application CN202211435481.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data collection method for improving high-performance computing applications and relates to the field of high-performance computing.
Description
Technical Field
The invention relates to a data collection method for improving high-performance computing applications and belongs to the technical field of high-performance computing.
Background
High-performance computing (HPC) refers to computing systems and environments that typically use many processors or several computers organized in a cluster. HPC systems come in many types, ranging from large clusters of standard computers to highly specialized hardware.
HPC systems have more and more computing resources. Their range of services is gradually expanding, their user groups are increasingly complex, and user requirements are becoming more and more diverse.
Chinese patent application publication No. CN111611125A discloses a method and apparatus for improving performance data collection for high-performance computing applications, which includes: a performance data comparator of a source node for collecting performance data of an application of the source node from the host fabric interface at a polling frequency; an interface for transmitting a write-back instruction to the host fabric interface, the write-back instruction causing data to be written to a memory address location of the source node to trigger a wake-up mode; and a frequency selector for starting the polling frequency as a first polling frequency for a sleep mode and increasing the polling frequency to a second polling frequency in response to the data in the memory address location identifying the wake-up mode. However, during file transmission, the continuous increase in the number of computing nodes makes environment configuration difficult, so deployment is slow, application programs start slowly with high startup delay, and data collection is inefficient and inaccurate.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data collection method for improving high-performance computing applications that solves the following problems: as the number of computing nodes keeps increasing, environment configuration becomes difficult, deployment is slow, application programs start slowly with high startup delay, and data collection is inefficient and inaccurate.
The technical scheme adopted by the invention is as follows: a data collection method for improving high performance computing applications, the method comprising the steps of:
S1: setting an execution environment;
A private execution environment is created for each user by using a layered file system and process isolation. The execution environment is started when the user logs in and is deployed automatically when the user runs an application program. The execution environment comprises a login node, computing nodes, a shared memory, a file system, and topology-aware P2P;
S2: deploying computing nodes;
An upper-layer and a lower-layer file system are set in the execution environment, where the upper layer holds the files required by the program and the lower layer holds the other files. The files required by the current application program are transmitted first, the application program is then started, and the remaining files are transmitted last. A threshold is set on the number of computing nodes used by the files: when the number of computing nodes is less than the threshold, the application program and its execution environment are deployed directly to the computing nodes through the shared memory; when the number is larger than the threshold, the execution environment is deployed to the computing nodes by topology-aware P2P transmission;
By setting up the upper-layer and lower-layer file systems, an execution environment lighter than a container is realized: using only two overlay file system layers avoids the space overhead of container images, and the lightweight design also reduces the network transmission pressure of environment deployment. Moreover, because the transmission mode is chosen by comparing the number of computing nodes with the threshold (shared memory below it, topology-aware P2P above it), the advantages of the different file transmission modes at different scales are combined and network transmission efficiency is improved;
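The threshold rule in S2 can be sketched as a small selection function. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Transport` and `choose_transport` and the threshold value used in the usage note are hypothetical, and because the text does not say what happens when the node count equals the threshold, the sketch assumes that case falls to P2P.

```python
from enum import Enum


class Transport(Enum):
    """Transmission modes named in S2 (identifiers are illustrative)."""
    SHARED_MEMORY = "shared_memory"
    TOPOLOGY_AWARE_P2P = "topology_aware_p2p"


def choose_transport(num_compute_nodes: int, threshold: int) -> Transport:
    """Pick a transmission mode from the number of computing nodes.

    Below the threshold the shared memory is used; otherwise
    topology-aware P2P is used. Treating equality as P2P is an
    assumption, since the source leaves that case open.
    """
    if num_compute_nodes < threshold:
        return Transport.SHARED_MEMORY
    return Transport.TOPOLOGY_AWARE_P2P
```

For example, with a hypothetical threshold of 1080, `choose_transport(15, 1080)` selects the shared memory and `choose_transport(17560, 1080)` selects topology-aware P2P.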
S3: deploying an execution environment;
An isolated process tree is created for each user; when the user exits, only the root process of the process tree needs to be killed. An overlay file system with only two layers is used: the node directory serves as the lower layer, and an empty directory is added as the upper layer for each user. When the execution environment is deployed automatically, the user's upper layer is synchronized to the corresponding computing nodes;
Because an isolated process tree is created for each user and deployment is automatic, the burden of manually configuring an execution environment on the computing nodes is removed from the user, and the user's privacy is protected;
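The two-layer layout described in S3 resembles the way a Linux overlay file system is mounted, assuming that is the overlay implementation in use (the text does not name one). The sketch below only builds the `mount` argument list and does not run it. The function name and the directory paths in the usage note are hypothetical; the `workdir` option is included because Linux overlayfs requires one whenever a writable upper layer is given.

```python
def overlay_mount_command(lower: str, upper: str, work: str, merged: str) -> list:
    """Build the argument list for mounting a two-layer overlay.

    lower  : read-only node directory shared by all users
    upper  : per-user directory, initially empty, that receives all changes
    work   : scratch directory required by Linux overlayfs
    merged : mount point where the user sees the combined view
    """
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]
```

Calling `overlay_mount_command("/", "/env/alice/upper", "/env/alice/work", "/env/alice/root")` yields a command whose upper layer alone would need to be synchronized to the computing nodes.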
When the number of computing nodes is small and the application program uses little capacity, shared storage has an obvious advantage; however, when the number of computing nodes is large, traffic congestion is easily caused even if the transmitted file is small, and P2P transmission therefore has the obvious advantage;
S3.1: point-to-point;
A list of proxy nodes and, for each proxy node, a list of its slave nodes are maintained. When a user runs an application program, the node list used by the application is analyzed to generate the tree structure for P2P transmission, with the user's login node as the root node of the tree;
The computing nodes are divided into proxy nodes and slave nodes.
S3.2: nodes;
The proxy nodes are located at the top of the tree. If a proxy node is used by the application program, the slave nodes in its node list are child nodes of that proxy node. If a proxy node is not in the application's node list, it is in an idle state; in that case the utilization rate of its slave nodes is calculated and compared with a utilization threshold, which is set to 50%. If the utilization rate is greater than 50%, the idle proxy node is added to the P2P tree and temporarily set to the allocated state, and the slave nodes in the node list are added to the tree as its child nodes. If the proxy node is not idle, its slave nodes are treated as isolated nodes, and the isolated nodes are finally placed in the last layer of the tree;
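The tree construction in S3.1 and S3.2 can be sketched as follows. This is one possible reading rather than the definitive algorithm: the function and variable names are hypothetical, the sketch assumes that slave nodes whose proxy is not added to the tree become the isolated nodes, and it attaches isolated nodes round-robin beneath proxies already in the tree so that they land in the last layer.

```python
def build_p2p_tree(login, groups, job_nodes, util_threshold=0.5):
    """Build a parent -> children mapping for P2P transmission.

    login     : name of the login node (root of the tree)
    groups    : dict mapping each proxy node to the list of its slave nodes
    job_nodes : set of nodes used by the user's application
    Returns (tree, borrowed), where borrowed lists the idle proxies that
    were temporarily set to the allocated state.
    """
    tree = {login: []}
    borrowed, isolated = [], []
    for proxy, slaves in groups.items():
        used = [s for s in slaves if s in job_nodes]
        if proxy in job_nodes:
            # Proxy is part of the job: its used slaves become its children.
            tree[login].append(proxy)
            tree[proxy] = used
        elif slaves and len(used) / len(slaves) > util_threshold:
            # Idle proxy with busy slaves: borrow it for the transfer.
            tree[login].append(proxy)
            tree[proxy] = used
            borrowed.append(proxy)
        else:
            # Assumption: these slaves are the "isolated nodes".
            isolated.extend(used)
    # Place isolated nodes in the last layer (round-robin is an assumption).
    parents = list(tree[login]) or [login]
    for i, node in enumerate(isolated):
        tree[parents[i % len(parents)]].append(node)
    return tree, borrowed
```

With three groups of three slaves each, a job that uses proxy `p0`, two slaves of idle proxy `p1`, and one slave of idle proxy `p2` would borrow `p1` (utilization 2/3) and treat the lone slave of `p2` as isolated.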
S3.3: transfer;
After the tree structure is created on the login node, it is transmitted to the next layer of proxy nodes together with the file. Each proxy node then finds its own child nodes according to the tree structure, continues the transmission, and waits for their transmission-complete signals. After a proxy node has received the transmission-complete signals of all its child nodes, it generates its own transmission-complete signal and returns it to its parent node. Finally, once the login node has received the confirmation signals from the first-layer proxy nodes, the whole transmission process is finished, and the temporarily occupied proxy nodes are set back to the idle state;
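The forwarding and acknowledgement flow of S3.3 can be simulated with a short recursive sketch. It is a sequential model for illustration only (a real transfer would forward to children in parallel), and the names `transmit`, `deploy`, and the `state` dictionary are hypothetical.

```python
def transmit(tree, node, visited):
    """Visit node, forward to its children, and report completion.

    Returns True once every child has returned its own completion
    signal; a leaf has no children and completes immediately.
    """
    visited.append(node)
    acks = [transmit(tree, child, visited) for child in tree.get(node, [])]
    return all(acks)


def deploy(tree, login, borrowed, state):
    """Run the transfer from the login node and release borrowed proxies."""
    visited = []
    finished = transmit(tree, login, visited)
    if finished:
        # Whole transfer confirmed: temporarily occupied proxies go idle.
        for proxy in borrowed:
            state[proxy] = "idle"
    return visited
```

For a tree such as `{"login": ["p0", "p1"], "p0": ["s0", "s1"], "p1": ["s2"]}`, the login node only treats the transfer as finished after both `p0` and `p1` have themselves heard back from all of their children.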
S4: quick response;
S4.1: early start;
If files on which the application program depends appear in the upper layer, they are added to the urgent part; the remaining files of the user's upper-layer file system form the lag part. Once the urgent part has been transmitted, the execution environment is started directly on the corresponding computing nodes and the application program is launched;
S4.2: delayed transmission;
The files of the lag part are transmitted, and a function performance model for high-performance computing is established;
The topology-aware execution environment service is used for rapid and agile application deployment in high-performance computing. By generating a tree structure and distinguishing proxy nodes from slave nodes, it provides a private execution environment for each user of the high-performance computing system and realizes rapid automatic deployment and execution of application programs. The method also provides a mechanism of stepwise transmission and early start to reduce application startup delay; compared with traditional container-based application deployment, it is faster and effectively reduces network load;
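The split used by S4.1 and S4.2 can be sketched as a simple partition of the upper-layer files. This is an illustrative sketch: the function name and the file names in the usage note are hypothetical, and how the dependency list is obtained is outside the sketch.

```python
def split_upper_layer(upper_files, dependencies):
    """Partition upper-layer files into an urgent part and a lag part.

    Files the application depends on are urgent and are sent before the
    application starts; everything else is sent afterwards.
    """
    deps = set(dependencies)
    urgent = [f for f in upper_files if f in deps]
    lag = [f for f in upper_files if f not in deps]
    return urgent, lag
```

Given upper-layer files `["app", "libfoo.so", "notes.txt"]` and dependencies `["app", "libfoo.so"]`, only the first two files would have to arrive before the application is launched.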
S5: automatic performance modeling;
S5.1: segmented modeling;
The real performance dataset of the function, $C = [C_1, C_2, \ldots, C_i, \ldots, C_n]$, is traversed, where $C_i = [X_i, Y_i]$ and $n$ is the number of data point pairs. Taking $C_i$ as the segmentation point, $[C_1, C_2, \ldots, C_i]$ and $[C_i, \ldots, C_n]$ are each fitted using trust-region reflective least squares, and the mean square error after segmentation is calculated. If, among the $n-2$ candidate segmentations, a two-segment model has a mean square error that is below the threshold and is the minimum, its segmentation point is taken as the optimal segmentation point, the piecewise performance model built at that point is taken as the performance model of the function, and the performance model is applied to each computing node;
The method first sets up an execution environment and uses the topology-aware execution environment service for rapid and agile application deployment in high-performance computing. It generates a tree structure that distinguishes proxy nodes from slave nodes, establishes a function performance model for high-performance computing, and performs segmented modeling by traversing the real performance dataset of the function. This achieves full-coverage modeling in high-performance computing and yields an accurate function performance model that describes the program's computing behavior completely and in fine detail. The more accurate model in turn improves the efficiency and accuracy of data collection and, together with the execution environment, remedies the defects of slow and disordered data collection;
S5.2: loop;
If the mean square errors of the two-segment models selected from the $n-2$ segmentations are larger than the threshold, S5.1 is repeated;
If the mean square error of any two-segment model selected from the $n-2$ segmentations is smaller than the threshold, S6 is carried out;
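The search for the optimal segmentation point in S5.1 can be sketched as below. To keep the example dependency-free, it swaps in ordinary closed-form linear least squares for each segment instead of the trust-region reflective solver named in the text, and it assumes each segment is modeled by a straight line. Accepting a split only when both segment errors are below the threshold, and ranking splits by the sum of the two errors, are also assumptions; all function names are hypothetical.

```python
def fit_line(points):
    """Closed-form least-squares line y = a*x + b through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def mse(points, model):
    """Mean of squared differences between predicted and true values."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


def best_split(points, threshold):
    """Try the n-2 interior points as segmentation points.

    Returns (index, left_model, right_model, total_error) for the split
    with the smallest total error among those whose two segment errors
    are both below the threshold, or None if no split qualifies.
    """
    best = None
    for i in range(1, len(points) - 1):
        left, right = points[: i + 1], points[i:]  # both segments share C_i
        left_model, right_model = fit_line(left), fit_line(right)
        err_l, err_r = mse(left, left_model), mse(right, right_model)
        if err_l < threshold and err_r < threshold:
            total = err_l + err_r
            if best is None or total < best[3]:
                best = (i, left_model, right_model, total)
    return best
```

A `None` result corresponds to the branch in S5.2 where no segmentation meets the threshold and S5.1 has to be repeated.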
s6: data collection:
data collection is performed using each computing node to which the performance model is applied.
Preferably, in S3 above, the tree width is set according to the performance of each compute node.
Preferably, the user has the right to customize their own execution environment; in addition, each node runs a daemon process that is responsible only for starting, stopping, and deleting execution environments and for performing P2P transmission;
Letting the user customize the execution environment and setting up the daemon process improves the reliability of the execution environment.
Preferably, if some nodes fail while P2P transmission is running, the transmission is blocked in the P2P transmission tree; after a timeout, the transmission path affected by the network fault is reported to the root node;
In this way, P2P transmission can also serve as an auxiliary tool for monitoring the network state.
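The fault report described above can be sketched as a search for the tree paths that end at a failed node. This is an illustration under assumptions: the timeout itself is not modeled (the set of failed nodes is given directly), and the function name is hypothetical.

```python
def find_blocked_paths(tree, node, failed, path=()):
    """Return the root-to-node paths on which transmission is blocked.

    tree   : parent -> children mapping of the P2P transmission tree
    failed : set of nodes that did not respond before the timeout
    A failed node blocks its whole subtree, so the search stops there.
    """
    path = path + (node,)
    if node in failed:
        return [path]
    blocked = []
    for child in tree.get(node, []):
        blocked.extend(find_blocked_paths(tree, child, failed, path))
    return blocked
```

The returned paths are what the root node would receive, which is how the transfer doubles as a coarse network health check.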
Preferably, the mean square error in S5.1 is obtained by taking the difference between the predicted value and the true value, squaring it, and averaging; the calculation formula is as follows:
$$\mathrm{MSE} = E\left[(\text{predicted value} - \text{true value})^2\right]$$
where MSE is the mean square error and $E$ denotes averaging;
The mean square error reflects the deviation between the predicted value and the true value well.
Preferably, the topology-aware P2P comprises a multi-layer tree structure built from a plurality of multi-port routers, in which the switching-chip ports of the routers on every other layer are directly connected; the layer above the layer containing the router switching chips with directly connected ports holds a first router group formed by one router, and the router group of each remaining layer has n+2 routers, where n is the layer number;
Arranging the first router group and, in each remaining layer, a router group with two more routers than the layer number further shortens the deployment time of topology-aware P2P file transmission and reduces application startup delay.
Preferably, in S4.2, when a file needs to be read or written, its existence is checked first: if the file exists, the native system call is entered directly; if it does not, the file is still being transmitted as part of the lag part, and the call is made after the transmission is finished.
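The existence check before file access can be sketched as a wrapper around `open`. This is a minimal sketch, not the mechanism the patent actually uses to intercept system calls: polling the file system and the `poll` and `timeout` parameters are assumptions, and the function name is hypothetical.

```python
import os
import tempfile
import time


def open_when_ready(path, mode="r", poll=0.05, timeout=5.0):
    """Open a file, waiting first if it has not yet been transmitted.

    If the file exists, it is opened immediately. Otherwise it is
    assumed to belong to the lag part still in transit, and the call
    waits (by polling, an assumption) until it appears or times out.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} was not delivered within {timeout}s")
        time.sleep(poll)
    return open(path, mode)


# Usage: a file that is already present is opened without waiting.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "lagging.dat")
    with open(path, "w") as f:
        f.write("ready")
    with open_when_ready(path) as f:
        content = f.read()
```

A production version would also need to tell a fully transmitted file from a partially written one, which this sketch does not attempt.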
Preferably, in S3.2, if the utilization rate is less than 50%, the idle proxy node is added to the P2P tree, and the proxy node is temporarily set to an unallocated state for use at any time.
The beneficial effects of the invention, compared with the prior art, are as follows:
1) The invention sets up an execution environment and uses the topology-aware execution environment service for rapid and agile application deployment in high-performance computing. It generates a tree structure that distinguishes proxy nodes from slave nodes, establishes a function performance model for high-performance computing, and performs segmented modeling by traversing the real performance dataset of the function. This achieves full-coverage modeling in high-performance computing and yields an accurate function performance model that describes the program's computing behavior completely and in fine detail. The more accurate model in turn improves the efficiency and accuracy of data collection and, together with the execution environment, remedies the defects of slow and disordered data collection;
2) By setting up upper-layer and lower-layer file systems, the invention realizes an execution environment lighter than a container: using only two overlay file system layers avoids the space overhead of container images, and the lightweight design also reduces the network transmission pressure of environment deployment. Moreover, a threshold is set on the number of computing nodes used by the files: when the number is less than the threshold, the application program and its execution environment are deployed directly to the computing nodes through the shared memory; when the number is larger than the threshold, the execution environment is deployed by topology-aware P2P transmission. The advantages of the different file transmission modes at different scales are thereby combined, and network transmission efficiency is improved;
3) Through the topology-aware execution environment service, the invention supports rapid and agile application deployment in high-performance computing. By generating a tree structure and distinguishing proxy nodes from slave nodes, it provides a private execution environment for each user of the high-performance computing system and realizes rapid automatic deployment and execution of application programs. A topology-aware P2P method is designed to reduce deployment time, and a mechanism of stepwise transmission and early start is provided to reduce application startup delay.
Drawings
FIG. 1 is a flow chart of a data collection method for improving high performance computing applications in accordance with the present invention;
FIG. 2 is a block diagram of an execution environment for a data collection method for improving high performance computing applications in accordance with the present invention;
FIG. 3 is a tree structure diagram of topology aware P2P for improving the data collection method of high performance computing applications of the present invention;
FIG. 4 is a graph of the results of the deployment time experiment for transmitting an 18 MB execution environment file according to the present invention;
FIG. 5 is a graph of the results of the deployment time experiment for transmitting a 336 MB execution environment file according to the present invention;
FIG. 6 shows the network load of each transmission mode for the data collection method of the present invention for improving high performance computing applications.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1: as shown in FIGS. 1-6, this embodiment provides a data collection method for improving high-performance computing applications, comprising the following steps:
S1: setting an execution environment;
A private execution environment is created for each user by using a layered file system and process isolation. The execution environment is started when the user logs in and is deployed automatically when the user runs an application program. The execution environment comprises a login node, computing nodes, a shared memory, a file system, and topology-aware P2P;
S2: deploying computing nodes;
An upper-layer and a lower-layer file system are set in the execution environment, where the upper layer holds the files required by the program and the lower layer holds the other files. The files required by the current application program are transmitted first, the application program is then started, and the remaining files are transmitted last. A threshold is set on the number of computing nodes used by the files: when the number of computing nodes is less than the threshold, the application program and its execution environment are deployed directly to the computing nodes through the shared memory; when the number is larger than the threshold, the execution environment is deployed to the computing nodes by topology-aware P2P transmission;
By setting up the upper-layer and lower-layer file systems, an execution environment lighter than a container is realized: using only two overlay file system layers avoids the space overhead of container images, and the lightweight design also reduces the network transmission pressure of environment deployment. Moreover, because the transmission mode is chosen by comparing the number of computing nodes with the threshold (shared memory below it, topology-aware P2P above it), the advantages of the different file transmission modes at different scales are combined and network transmission efficiency is improved;
In this embodiment, an experiment is performed in which, in addition to sbcast, a topology-unaware P2P transmission method, random P2P, is added. The tree shapes of random P2P and topology-aware P2P are expected to be identical; the only difference is that in random P2P the positions of the proxy nodes and slave nodes in the tree structure are random. Cluster sizes of 15, 120, 1080, 8760, and 17560 nodes are selected for testing, and files of 18 MB and 336 MB are transmitted respectively;
The deployment times (in seconds) of the one-to-one, shared-storage, sbcast, topology-aware P2P, and random P2P methods and of the method of the present invention are given in Tables 1 and 2:
TABLE 1
TABLE 2
Table 1 gives the deployment time required by each transmission method for the 18 MB file, and Table 2 gives the deployment time required for the 336 MB file;
The data in the tables above are plotted as line graphs. As seen from FIGS. 4-5, the deployment time of the one-to-one method increases linearly with the number of computing nodes, so its efficiency is low. The behavior of the shared-storage-based method changes once the number of computing nodes exceeds 1080 and also depends on the file size; it is strongest when the number of computing nodes is small, where it is more efficient than any other method.
Among the three P2P methods, topology-aware P2P always has the shortest deployment time, and its advantage becomes more and more obvious as the number of computing nodes increases. For TEES application deployment, when the number of computing nodes reaches 17560, topology-aware P2P is 65% faster than random P2P and 63% faster than sbcast. For container-based deployment, topology-aware P2P is 21% faster than random P2P and 25% faster than sbcast.
Topology-aware P2P reduces the time spent establishing connections, so its acceleration effect is greater when the transmitted file is small.
In this embodiment, the network load of each transmission method is also tested by monitoring traffic; the test results are shown in Table 3 below.
TABLE 3
The table above shows that the one-to-one method and the shared-storage-based method have similar network loads. The difference between the two is that in the one-to-one method the login node experiences great network pressure, whereas in the shared-storage-based method this pressure is transferred to the shared storage. Compared with these two methods, the topology-aware P2P method reduces the network load by 75% in the case of large-scale nodes.
In fig. 4, the order of the data of the pie chart is arranged in the order of the number of nodes below.
The network load of random P2P and sbcast is similar, and the topology-aware P2P reduces the network load by more than 85% compared to both methods.
S3: deploying an execution environment;
An isolated process tree is created for each user; when the user exits, only the root process of the process tree needs to be killed. An overlay file system with only two layers is used: the node directory serves as the lower layer, and an empty directory is added as the upper layer for each user. When the execution environment is deployed automatically, the user's upper layer is synchronized to the corresponding computing nodes;
Because an isolated process tree is created for each user and deployment is automatic, the burden of manually configuring an execution environment on the computing nodes is removed from the user, and the user's privacy is protected.
When the number of computing nodes is small and the application program uses little capacity, shared storage has an obvious advantage; however, when the number of computing nodes is large, traffic congestion is easily caused even if the transmitted file is small, and P2P transmission therefore has the obvious advantage.
S3.1: point-to-point;
A list of proxy nodes and, for each proxy node, a list of its slave nodes are maintained. When a user runs an application program, the node list used by the application is analyzed to generate the tree structure for P2P transmission, with the user's login node as the root node of the tree;
The computing nodes are divided into proxy nodes and slave nodes.
S3.2: nodes;
The proxy nodes are located at the top of the tree. If a proxy node is used by the application program, the slave nodes in its node list are child nodes of that proxy node. If a proxy node is not in the application's node list, it is in an idle state; in that case the utilization rate of its slave nodes is calculated and compared with a utilization threshold, which is set to 50%. If the utilization rate is greater than 50%, the idle proxy node is added to the P2P tree and temporarily set to the allocated state, and the slave nodes in the node list are added to the tree as its child nodes. If the proxy node is not idle, its slave nodes are called isolated nodes, and the isolated nodes are finally placed in the last layer of the tree;
In this embodiment, a plurality of computing nodes are first integrated into one node group. On our Tianhe platform, 8 computing nodes are integrated into one node group. However, only one of the 8 nodes (the proxy node) has a high-speed network card and is directly connected to the intermediate topology. That node therefore has better network performance, and traffic between the other 7 nodes and the intermediate topology must pass through the proxy node.
In view of this particular intra-cluster topology, we designed P2P with topology awareness. We maintain a list of proxy nodes and, for each proxy node, a list of its slave nodes. When a user submits a job (runs an application), the list of nodes used by the user's application is analyzed to generate the tree structure for P2P transmission. The user's login node is taken as the root node of the tree.
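The proxy and slave lists described above can be derived from a flat node list once the group size is known. The sketch below is illustrative only: the node names are hypothetical, and it assumes that the first node of each group of 8 is the proxy, which the text does not specify.

```python
def group_nodes(nodes, group_size=8):
    """Split a flat node list into node groups.

    Returns a dict mapping each proxy node to the list of its slave
    nodes. Treating the first node of every group as the proxy is an
    assumption made for illustration.
    """
    groups = {}
    for start in range(0, len(nodes), group_size):
        proxy, *slaves = nodes[start : start + group_size]
        groups[proxy] = slaves
    return groups
```

For 16 nodes named `cn0` to `cn15`, this gives two groups with proxies `cn0` and `cn8`, each with 7 slave nodes.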
S3.3: transfer;
After the tree structure is created on the login node, it is transmitted to the next layer of proxy nodes together with the file. Each proxy node then finds its own child nodes according to the tree structure, continues the transmission, and waits for their transmission-complete signals. When a proxy node has received the transmission-complete signals of all its child nodes, it generates its own transmission-complete signal and returns it to its parent node. Finally, once the login node has received the confirmation signals from the first-layer proxy nodes, the whole transmission process is finished, and the temporarily occupied proxy nodes are set back to the idle state;
S4: Quick response;
S4.1: Starting in advance;
If files that the application program depends on appear in the upper layer, they are added to the urgent part; the remaining files of the user's upper file system form the lag part. After transmission of the urgent part is finished, the execution environment is started directly on the corresponding computing nodes to launch the application program;
S4.2: Delayed transmission:
The files of the lag part are transmitted, and a function performance model for high-performance computing is established;
The topology-aware execution environment service (TEES) deploys application programs quickly and agilely in high-performance computing. By generating a tree structure that distinguishes proxy nodes from subordinate nodes, it provides a private execution environment for each user of the high-performance computing system and automates rapid deployment and execution of the application program. A topology-aware P2P method is designed to reduce deployment time, and within it a mechanism of staged transmission and early start is provided to reduce the start-up delay of the application program.
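The division of the upper layer used by the early-start mechanism can be sketched as below. The sketch assumes that the dependency analysis has already produced a list of file paths; how that analysis is performed is not specified here, and the function and file names are hypothetical.

```python
def split_upper_layer(upper_files, dependencies):
    """Split the user's upper-layer files into an urgent part and a lag part."""
    deps = set(dependencies)
    urgent = [f for f in upper_files if f in deps]   # sent before the start
    lag = [f for f in upper_files if f not in deps]  # sent after the start
    return urgent, lag

upper = ["bin/app", "lib/libfoo.so", "data/notes.txt"]
urgent, lag = split_upper_layer(upper, ["bin/app", "lib/libfoo.so", "/usr/lib/libc.so"])
```

Dependencies that live only in the lower layer (such as `/usr/lib/libc.so` in this example) are ignored, since the lower layer is already present on every computing node.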
In this embodiment, when a user logs in to a login node, a TEES daemon on the login node will start an execution environment for the user. The user will log into this private execution environment. In such an execution environment, the user can directly use the standard system environment. At the same time, the user has the right to do any customization. The user is free to develop and debug applications and configure the environment. All this occurs in the upper file system. The changes made by the user do not affect the real system environment.
When a user launches an application by submitting it to a resource management system such as SLURM or PBS, the daemon on the login node performs the following functions: it analyzes the list of computing nodes and selects a transmission method (shared storage or topology-aware P2P); if the topology-aware P2P method is used, it generates the tree structure. Meanwhile, the daemon performs dependency analysis on the user's application and divides the upper layer into an urgent part and a lag part. The urgent part is then deployed to the compute nodes immediately, an execution environment is started on each node, and the application is run. Finally, the lag part is transmitted.
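The selection of the transmission method can be sketched as a comparison against a node-count threshold, as set out in step S2 of the method. The threshold value below is purely hypothetical, since no value is given, and the handling of a job whose size equals the threshold is an assumption of this sketch.

```python
# Hypothetical cut-over point; the source sets a threshold but gives no value.
NODE_COUNT_THRESHOLD = 128

def choose_transmission(job_nodes, threshold=NODE_COUNT_THRESHOLD):
    """Pick the deployment path from the number of compute nodes in the job."""
    if len(job_nodes) < threshold:
        return "shared_storage"
    # The source does not say what happens at exactly the threshold;
    # this sketch treats that case as P2P.
    return "topology_aware_p2p"

small_job = choose_transmission(["n%d" % i for i in range(10)])
large_job = choose_transmission(["n%d" % i for i in range(1000)])
```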
Users only need to develop applications and configure the environment on the login node; the application can then be run directly on the computing nodes through the resource management system. The whole deployment and start-up process is very fast, which makes for a good user experience.
In this embodiment, the start-up delay (in seconds) of the execution environment was measured, and the results are as follows:
TABLE 4
In the table above, the method of the present invention combines topology-aware P2P with shared storage; this transmission method deploys and starts a typical application program on 17560 computing nodes within 3 seconds. The container-based method uses the sbcast deployment method and has an application start-up delay of about 30 seconds, which is 20 times slower than the method of the invention; using topology-aware P2P in the container-based method reduces its start-up delay by about 25%.
S5: Automatic performance modeling;
S5.1: Segmented modeling;
The real function-performance data set $C = [C_1, C_2, \ldots, C_i, \ldots, C_n]$ is traversed, where $C_i = [X_i, Y_i]$ and $n$ is the number of data point pairs. Taking $C_i$ as the segmentation point, $[C_1, C_2, \ldots, C_i]$ and $[C_i, \ldots, C_n]$ are fitted separately using trust-region reflective least squares, and the mean square error after segmentation is calculated. Among the $n-2$ candidate segmentations, if the mean square errors of both segment models are smaller than a threshold value and the mean square error is the minimum, that segmentation point is taken as the optimal segmentation point; the segmented performance model established with it is used as the performance model of the function, and this performance model is applied to each computing node;
The method first sets up the execution environment and, through the topology-aware execution environment service, deploys the application program quickly and agilely in high-performance computing, generating a tree structure that distinguishes proxy nodes from subordinate nodes. It then builds the function performance model by traversing the real function-performance data set and modeling it in segments. This achieves full-coverage modeling in high-performance computing and yields an accurate performance model of each function that depicts the computing behavior of the program comprehensively and in fine detail. The improved model accuracy in turn improves the efficiency and accuracy of data collection and, together with the execution environment, overcomes the defects of slow and disordered data collection.
S5.2: Looping;
If the mean square errors of both segment models selected from the n-2 segmentations are larger than the threshold value, S5.1 is repeated;
If the mean square error of either of the two segment models selected from the n-2 segmentations is smaller than the threshold value, S6 is carried out;
S6: Data collection:
Data collection is performed using each computing node to which the performance model is applied.
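The search for the optimal segmentation point in S5.1 can be illustrated with a minimal sketch. Two simplifications are assumed here and are not taken from the source: each segment is modeled as a straight line fitted by ordinary closed-form least squares (standing in for the trust-region reflective least-squares fit), and "the mean square error is minimum" is read as the smallest sum of the two segment errors. Function names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares line; returns (slope, intercept, mse)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    mse = sum((slope * x + intercept - y) ** 2 for x, y in points) / n
    return slope, intercept, mse

def best_split(data, threshold):
    """Try the n-2 interior points as segmentation points.

    Returns (total_mse, index, left_model, right_model), or None when no
    candidate keeps both segment errors below the threshold.
    """
    best = None
    for i in range(1, len(data) - 1):
        left = fit_line(data[: i + 1])   # [C_1 .. C_i]
        right = fit_line(data[i:])       # [C_i .. C_n], sharing C_i
        if left[2] < threshold and right[2] < threshold:
            total = left[2] + right[2]
            if best is None or total < best[0]:
                best = (total, i, left[:2], right[:2])
    return best

# Rises linearly up to x = 2, then stays flat.
data = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2)]
result = best_split(data, threshold=0.01)
```

A return value of `None` corresponds to the case in S5.2 where the procedure has to be repeated, for example with more data points.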
Preferably, in S3, the tree width is set according to the performance of each compute node.
Preferably, the user has the right to customize his or her own execution environment, and each node runs a daemon process that is only responsible for starting, stopping and deleting execution environments and for performing P2P transmission;
Letting the user customize the execution environment while a dedicated daemon process manages it improves the reliability of the execution environment.
Preferably, while P2P transmission is in operation, if some nodes fail, transmission is blocked in the P2P transmission tree; after a timeout, the transmission path with the network fault is reported to the root node;
In this way, P2P transmission can also serve as an auxiliary tool for monitoring the network state.
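The reporting of faulty transmission paths can be sketched as follows. This is a simulation under stated assumptions: the tree is a `{parent: [children]}` dictionary, and membership in a `failed` set stands in for a real timeout while waiting for a child's completion signal. The names are hypothetical.

```python
def find_broken_paths(tree, node, failed, path=()):
    """Return the root-to-node paths on which transmission was blocked."""
    path = path + (node,)
    broken = []
    for child in tree.get(node, []):
        if child in failed:                 # stands in for a timeout
            broken.append(path + (child,))  # the subtree below is cut off
        else:
            broken.extend(find_broken_paths(tree, child, failed, path))
    return broken

tree = {"login": ["p0", "p1"], "p0": ["s1"], "p1": ["s2"]}
report = find_broken_paths(tree, "login", failed={"s1"})
```

Each reported path starts at the root and ends at the unreachable node, which is the information an operator would need to locate the faulty link.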
Preferably, in S6.1, the mean square error is obtained by taking the difference between the predicted value and the true value, squaring it, and averaging; the calculation formula is as follows:
$\mathrm{MSE} = E\left[(\text{predicted value} - \text{true value})^2\right]$;
where MSE is the mean square error and E denotes averaging;
The mean square error reflects the deviation between the predicted value and the true value well.
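As a worked illustration of the formula, the mean square error of a list of predictions can be computed as below. The function name and the sample values are illustrative only.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and true values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

# Squared differences are 0, 0 and 4, so the mean is 4/3.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```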
Preferably, the topology-aware P2P includes a multi-layer tree structure built from a plurality of multi-port routers, in which the switching-chip ports of the routers on every other layer are directly connected; the layer above the layer containing the router switching chips with the directly connected ports is provided with a first router group formed by one router, and the router group of each of the remaining layers contains n+2 routers, where n is the layer number;
With the first router group and, in the remaining layers, router groups that each contain two more routers than the layer number, the deployment time of topology-aware P2P file transmission is further shortened and the start-up delay of the application program is reduced.
Preferably, in S4.2, when a file needs to be read or written, its existence is checked first; if the file exists, the native system call is entered directly; if not, the file is still being transmitted in the lag part, and the call proceeds after its transmission is finished.
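The existence check before a file access can be sketched at user level as follows. This is only a stand-in for the behavior described: it polls the file system rather than intercepting the system call, it assumes that a file becomes visible only once its transfer is complete, and the polling interval and timeout are hypothetical parameters.

```python
import os
import time

def open_when_ready(path, mode="r", poll_interval=0.05, timeout=5.0):
    """Open `path`, waiting first if the lag part has not delivered it yet."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} was not delivered within {timeout} s")
        time.sleep(poll_interval)  # file is still being transmitted
    return open(path, mode)        # file exists: proceed with the normal call
```

If the file is already present, the function falls straight through to `open`, so an application whose files have all arrived pays only the cost of one existence check.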
Preferably, in S3.2, if the utilization rate is less than 50%, the idle proxy node is added to the P2P tree, and the proxy node is temporarily set to an unallocated state for ready access.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and therefore, the scope of the present invention should be determined by the scope of the claims.
Claims (8)
1. A data collection method for improving high performance computing applications, characterized by: the method comprises the following steps:
S1: Setting an execution environment;
A private execution environment is created for each user by using a layered file system and process isolation; the execution environment is started for the user when the user logs in and is deployed automatically when the user runs an application program; the execution environment comprises a login node, computing nodes, a shared memory, a file system and topology-aware P2P;
S2: Deploying computing nodes;
An upper-layer and a lower-layer file system are set in the execution environment, wherein the upper layer holds the files required by the program and the lower layer holds the other files; the files required by the current application program are transmitted first, the application program is then started, and the remaining files are transmitted last; a threshold value is set on the number of computing nodes used by the files; when the number of computing nodes used by the files is less than the threshold value, the application program and its execution environment are deployed directly to the computing nodes through the shared memory; when the number of computing nodes used by the files is greater than the threshold value, the execution environment is deployed to the computing nodes by topology-aware P2P transmission;
S3: Deploying an execution environment;
An isolated process tree is created for each user, so that only the root process of the process tree needs to be killed when the user exits; an overlay file system with only two layers is used, with the node directory as the lower layer and an empty directory added as the upper layer for each user; when automatic deployment of the execution environment is carried out, the user's upper layer is synchronized to the corresponding computing nodes;
S3.1: Point-to-point;
A list of proxy nodes and, for each proxy node, a list of its subordinate nodes are set; when a user runs an application program, the node list used by the user's application program is analyzed to generate the tree structure for topology-aware P2P transmission, wherein the user's login node is the root node of the tree;
S3.2: Nodes;
The proxy nodes are positioned at the top of the tree; if a subordinate node used by the application program is in the subordinate-node list of a certain proxy node, that subordinate node is a child node of the proxy node; if the proxy node is not in the application program's node list, the node is in an idle state; the utilization rate of its subordinate nodes is then calculated and compared with a utilization threshold of 50%; if the utilization rate is more than 50%, the idle proxy node is added to the P2P tree and temporarily set to the allocated state, and the subordinate nodes in its node list are added to the tree as child nodes of the proxy node; if the proxy node is not idle, the subordinate nodes are adjusted to be isolated nodes, and finally the isolated nodes are placed in the last layer of the tree;
S3.3: Transferring:
After the tree structure is created on the login node, it is transmitted to the next layer of proxy nodes together with the file; each proxy node then finds its own child nodes according to the tree structure, continues the transmission, and waits for their transmission-completion signals; after a proxy node has received the completion signals of all its child nodes, it generates its own completion signal and returns it to its parent node; finally, when the login node has received the confirmation signals from the first-layer proxy nodes, the whole transmission process is finished, and the temporarily occupied proxy nodes are set back to the idle state;
S4: Quick response;
S4.1: Starting in advance;
If files that the application depends on appear in the upper layer, they are added to the urgent part of the user's upper-layer file system; the remaining files of the user's upper file system form the lag part; after transmission of the urgent part is finished, the execution environment is started directly on the corresponding computing nodes to launch the application program;
S4.2: Delayed transmission:
The files of the lag part are transmitted, and a function performance model for high-performance computing is established;
S5: Automatic performance modeling;
S5.1: Segmented modeling;
The real function-performance data set $C = [C_1, C_2, \ldots, C_i, \ldots, C_n]$ is traversed, where $C_i = [X_i, Y_i]$ and $n$ is the number of data point pairs; taking $C_i$ as the segmentation point, $[C_1, C_2, \ldots, C_i]$ and $[C_i, \ldots, C_n]$ are fitted separately using trust-region reflective least squares, and the mean square error after segmentation is calculated; among the $n-2$ candidate segmentations, if the mean square errors of both segment models are smaller than a threshold value and the mean square error is the minimum, that segmentation point is taken as the optimal segmentation point, the segmented performance model established with it is used as the performance model of the function, and the performance model of the function is applied to each computing node;
S5.2: Looping;
If the mean square errors of both segment models selected from the n-2 segmentations are larger than the threshold value, S5.1 is repeated; if the mean square error of either of the two segment models selected from the n-2 segmentations is smaller than the threshold value, S6 is carried out;
S6: Data collection:
Data collection is performed using each computing node to which the performance model is applied.
2. A method of data collection for improving high performance computing applications according to claim 1, wherein: in S3, the tree width is set according to the performance of each computing node.
3. A method of data collection for improving high performance computing applications according to claim 2, wherein: the user can customize the execution environment, and each proxy node runs a daemon process that is only responsible for starting, stopping and deleting execution environments and for performing topology-aware P2P transmission.
4. A method of data collection for improving high performance computing applications according to claim 3, wherein: while P2P transmission is in operation, if a node fails, transmission is blocked in the topology-aware P2P transmission tree; after a timeout, the transmission path with the network fault is reported to the root node.
5. A method of data collection for improving high performance computing applications according to claim 1, wherein: in S6.1, the mean square error is obtained by taking the difference between the predicted value and the true value, squaring it, and averaging; the calculation formula is as follows:
$\mathrm{MSE} = E\left[(\text{predicted value} - \text{true value})^2\right]$;
where MSE is the mean square error and E denotes averaging.
6. A method of data collection for improving high performance computing applications according to claim 5, wherein: the topology-aware P2P comprises a multi-layer tree structure built from a plurality of multi-port routers, in which the switching-chip ports of the routers on every other layer are directly connected; the layer above the layer containing the router switching chips with the directly connected ports is provided with a first router group formed by one router, and the router group of each of the remaining layers contains n+2 routers, where n is the layer number.
7. A method of data collection for improving high performance computing applications according to claim 1, wherein: in S4.2, when a file needs to be read or written, its existence is checked first; if the file exists, the native system call is entered directly; if not, the file is still being transmitted in the lag part, and the call proceeds after its transmission is finished.
8. A method of data collection for improving high performance computing applications according to claim 1, wherein: in S3.2, if the utilization rate is less than 50%, the idle proxy node is added to the P2P tree, and the proxy node is temporarily set to an unallocated state for ready access.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211435481.XA CN115834594B (en) | 2022-11-16 | 2022-11-16 | Data collection method for improving high-performance computing application |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115834594A (en) | 2023-03-21 |
CN115834594B (en) | 2024-04-19 |
Family
ID=85528528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211435481.XA Active CN115834594B (en) | 2022-11-16 | 2022-11-16 | Data collection method for improving high-performance computing application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115834594B (en) |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1747446A (en) * | 2005-10-21 | 2006-03-15 | 清华大学 | Application layer group broadcasting method with integrated type and distributed type combination |
US20080144548A1 (en) * | 2006-12-14 | 2008-06-19 | Elster Electricity, Llc | Optimization of redundancy and throughput in an automated meter data collection system using a wireless network |
CN101883039A (en) * | 2010-05-13 | 2010-11-10 | 北京航空航天大学 | Data transmission network of large-scale clustering system and construction method thereof |
CN104022911A (en) * | 2014-06-27 | 2014-09-03 | 哈尔滨工业大学 | Content route managing method of fusion type content distribution network |
CN104380660A (en) * | 2012-04-13 | 2015-02-25 | 思杰系统有限公司 | Systems and methods for trap monitoring in multi-core and cluster systems |
CN110022299A (en) * | 2019-03-06 | 2019-07-16 | 浙江天脉领域科技有限公司 | A kind of method of ultra-large distributed network computing |
CN110177020A (en) * | 2019-06-18 | 2019-08-27 | 北京计算机技术及应用研究所 | A kind of High-Performance Computing Cluster management method based on Slurm |
CN110866046A (en) * | 2019-10-28 | 2020-03-06 | 北京大学 | Extensible distributed query method and device |
CN110990448A (en) * | 2019-10-28 | 2020-04-10 | 北京大学 | Distributed query method and device supporting fault tolerance |
CN111046065A (en) * | 2019-10-28 | 2020-04-21 | 北京大学 | Extensible high-performance distributed query processing method and device |
CN111131146A (en) * | 2019-11-08 | 2020-05-08 | 北京航空航天大学 | Multi-supercomputing center software system deployment and incremental updating method in wide area environment |
CN112055048A (en) * | 2020-07-29 | 2020-12-08 | 北京智融云河科技有限公司 | P2P network communication method and system for high-throughput distributed account book |
CN112445675A (en) * | 2019-09-02 | 2021-03-05 | 无锡江南计算技术研究所 | Large-scale parallel program performance data rapid collection method based on layer tree network |
CN113285457A (en) * | 2021-05-19 | 2021-08-20 | 山东大学 | Distributed economic dispatching method and system for regional power system under non-ideal communication |
CN113630269A (en) * | 2021-07-29 | 2021-11-09 | 中国人民解放军国防科技大学 | Topology-aware-based high-performance computing system operating environment deployment acceleration method and system |
CN114079567A (en) * | 2020-08-21 | 2022-02-22 | 东北大学秦皇岛分校 | Block chain-based universal IP tracing system and method |
US20220321480A1 (en) * | 2021-03-25 | 2022-10-06 | Itron, Inc. | Enforcing access to endpoint resources |
Non-Patent Citations (3)
Title |
---|
DAMIA CASTELLÀ, HECTOR BLANCO, FRANCESC GINÉ & FRANCESC SOLSONA: "A Computing Resource Discovery Mechanism over a P2P Tree Topology", 《SPRINGERLINK》, 31 December 2010 (2010-12-31) * |
SHIVANGI SURATI A, DEVESH C. JINWALA A, SANJAY GARG B: "A survey of simulators for P2P overlay networks with a case study of the P2P tree overlay using an event-driven simulator", 《SCIENCEDIRECT》, 10 August 2017 (2017-08-10) * |
夏中: "基于对等网络的视频调度策略研究与实现", 《中国优秀硕士学位论文全文数据库》, 15 March 2022 (2022-03-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN115834594B (en) | 2024-04-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||