CN115834594B - Data collection method for improving high-performance computing application - Google Patents

Data collection method for improving high-performance computing application

Info

Publication number
CN115834594B
CN115834594B (application number CN202211435481.XA)
Authority
CN
China
Prior art keywords
node
transmission
execution environment
nodes
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211435481.XA
Other languages
Chinese (zh)
Other versions
CN115834594A (en)
Inventor
甘润东
龙玉江
王策
李洵
卫薇
卢仁猛
钟掖
龙娜
陈卿
陈利民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd filed Critical Guizhou Power Grid Co Ltd
Priority to CN202211435481.XA priority Critical patent/CN115834594B/en
Publication of CN115834594A publication Critical patent/CN115834594A/en
Application granted granted Critical
Publication of CN115834594B publication Critical patent/CN115834594B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data collection method for improving high-performance computing application, which relates to the field of high-performance computing.

Description

Data collection method for improving high-performance computing application
Technical Field
The invention relates to a data collection method for improving high-performance computing application, and belongs to the technical field of high-performance computing.
Background
High-performance computing (HPC) refers to computing systems and environments that typically use many processors or several computers organized as a cluster; HPC systems range from large clusters of standard computers to highly specialized hardware.
HPC systems have ever more computing resources, their service scope is gradually expanding, their user base is increasingly complex, and the diversification of user demands is more and more prominent.
Chinese patent application publication No. CN111611125A discloses a method and apparatus for improving performance data collection for high-performance computing applications, comprising: a performance data comparator of the source node for collecting performance data of an application of the source node from the host fabric interface at a polling frequency; an interface for transmitting a write-back instruction to the host fabric interface, the write-back instruction causing data to be written to a memory address location of the source node to trigger the wake mode; and a frequency selector for initiating the polling frequency at a first polling frequency for a sleep mode, and increasing the polling frequency to a second polling frequency in response to the data in the memory address location identifying the wake mode. During file transmission, the high-performance computing involved in that patent suffers from slow deployment, slow application start-up, high start-up delay, and low data collection efficiency and accuracy, owing to the continual growth in computing nodes and the difficulty of environment configuration.
Disclosure of Invention
The invention aims to solve the technical problems that: the data collection method for improving high-performance computing application solves the problems that the deployment speed is low, the starting speed of an application program is low, the starting delay is high, and the data collection efficiency and accuracy are low in the data collection process due to the fact that computing nodes are continuously increased and environment configuration is difficult.
The technical scheme adopted by the invention is as follows: a data collection method for improving high performance computing applications, the method comprising the steps of:
S1: setting an execution environment;
Creating a private execution environment for a user by using a layered file system and process isolation, starting the execution environment for the user when the user logs in, and automatically deploying the execution environment when the user runs an application program, wherein the execution environment comprises a login node, a computing node, a shared memory, a file system and topology perception P2P;
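The two-layer design in step S1 can be illustrated with a minimal Python sketch of an overlay lookup: a per-user writable upper layer stacked on a shared read-only lower layer (the node directory). The class and paths are invented for illustration and are not part of the patent; process isolation is omitted here.

```python
class TwoLayerOverlay:
    """Minimal sketch of a two-layer overlay file system:
    a per-user writable upper layer over a shared read-only
    lower layer (the node directory)."""

    def __init__(self, lower):
        self.lower = dict(lower)  # shared, read-only system files
        self.upper = {}           # per-user modifications only

    def write(self, path, data):
        # All user changes land in the upper layer; the real
        # system environment below is never modified.
        self.upper[path] = data

    def read(self, path):
        # The upper layer shadows the lower one on lookup.
        if path in self.upper:
            return self.upper[path]
        return self.lower[path]

env = TwoLayerOverlay({"/bin/app": "system binary"})
env.write("/home/user/config", "user settings")
print(env.read("/bin/app"))           # served from the lower layer
print(env.read("/home/user/config"))  # served from the upper layer
```

Because only the small upper layer differs per user, deploying a user's environment to a compute node means synchronizing just that upper directory, which is the basis of the lightweight deployment claimed in S2.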
S2: deploying computing nodes;
The execution environment sets up an upper file system and a lower file system: the upper layer holds the files required by the program, and the lower layer holds the other files. The files required by the current application are transmitted first, the application is then started, and the remaining files are transmitted last. A threshold is set according to the number of computing nodes used by the file: when the number of computing nodes used by the file is smaller than the threshold, the application and its execution environment are deployed directly on the computing nodes via shared storage; when the number of computing nodes used by the file is greater than the threshold, P2P transmission optimized for the particular topology is used;
By arranging the upper and lower file system layers, an execution environment lighter than a container is realized: using only two overlay file system layers avoids the space overhead of a container image, and this lightweight design also reduces the network transmission pressure of environment deployment. A threshold is set according to the number of computing nodes used by the file: below the threshold, the application and its execution environment are deployed directly on the computing nodes via shared storage; above it, the execution environment is deployed to the computing nodes using topology-aware P2P transmission. This reasonably combines the advantages of different file transmission modes at different scales and improves network transmission efficiency;
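The threshold decision in S2 amounts to a simple dispatch on node count. The sketch below is illustrative; the patent does not fix a concrete threshold value, so the 1080 used here is only an assumption taken from the crossover observed in the embodiment's experiments.

```python
def choose_transmission(num_nodes, threshold):
    """Pick the deployment channel from the node count, as in step S2:
    shared storage below the threshold, topology-aware P2P at or above it."""
    if num_nodes < threshold:
        return "shared-storage"
    return "topology-aware-p2p"

# Illustrative threshold; the patent leaves the exact value open.
THRESHOLD = 1080

print(choose_transmission(120, THRESHOLD))   # shared-storage
print(choose_transmission(8760, THRESHOLD))  # topology-aware-p2p
```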
S3: deploying an execution environment;
Creating an isolated process tree for each user, only killing the root process of the process tree when the user exits, using a cover file system with only two layers, using node directories as the lower layer of the cover file system, superposing an empty directory as the upper layer of each user, and synchronizing the upper layers of the users to corresponding computing nodes when the automatic deployment of an execution environment is realized;
By creating an isolated process tree for each user and deploying automatically, the burden of manually configuring the execution environment on the computing nodes is removed from the user, and the user's privacy is protected;
When the number of computing nodes is small and the capacity used by the application is small, shared storage has an obvious advantage; however, when the number of computing nodes is large, even a small transmitted file easily causes traffic congestion, so topology-aware P2P transmission then has the obvious advantage;
S3.1: point-to-point;
Setting a list of proxy nodes and a list of subordinate nodes of each proxy node, and analyzing a node list used by a user application program to generate a P2P transmission tree structure when the user runs the application program, wherein a user login node is a root node of the tree;
The computing nodes are divided into proxy nodes and subordinate nodes.
S3.2: nodes;
The proxy nodes are located at the top of the tree. If a node used by the application appears in the subordinate-node list of a proxy node, that subordinate node becomes a child of the proxy node in the tree. If the proxy node itself is not in the application's node list, the proxy node is idle; the utilization rate of its subordinate nodes is then calculated, with the utilization threshold set to 50%. If the utilization exceeds 50%, the idle proxy node is added to the P2P tree and temporarily set to the allocated state, and the subordinate nodes in its list are added to the tree as its children. If the proxy node is not idle, the subordinate nodes are adjusted to be isolated nodes, which are finally placed at the last layer of the tree;
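One possible reading of the S3.1–S3.2 tree construction can be sketched in Python. All names are illustrative, and since the patent's clauses on idle-proxy handling differ between S3.2 and the later "preferably" clause, this sketch follows S3.2: an idle proxy is borrowed only when its group's utilization exceeds the 50% threshold, and otherwise its in-job subordinates become isolated nodes on the last layer.

```python
def build_p2p_tree(login, proxies, subordinates, job_nodes, util_threshold=0.5):
    """Sketch of S3.1-S3.2: build the P2P transmission tree.
    proxies: list of proxy node ids; subordinates: {proxy: [its nodes]};
    job_nodes: nodes requested by the user's application."""
    job = set(job_nodes)
    tree = {login: []}   # parent -> children; the login node is the root
    isolated = []        # subordinates whose proxy cannot join the tree
    for proxy in proxies:
        subs_in_job = [n for n in subordinates[proxy] if n in job]
        if not subs_in_job:
            continue                          # group unused by this job
        if proxy in job:
            # Proxy itself belongs to the job: attach it and its children.
            tree[login].append(proxy)
            tree[proxy] = subs_in_job
        else:
            # Proxy is idle; borrow it only if its group is busy enough.
            util = len(subs_in_job) / len(subordinates[proxy])
            if util > util_threshold:
                tree[login].append(proxy)     # temporarily "allocated"
                tree[proxy] = subs_in_job
            else:
                isolated.extend(subs_in_job)  # pushed to the last layer
    if isolated and tree[login]:
        # Hang isolated nodes under the last proxy in the tree.
        tree[tree[login][-1]].extend(isolated)
    elif isolated:
        tree[login].extend(isolated)
    return tree

tree = build_p2p_tree(
    login="login0",
    proxies=["p1", "p2"],
    subordinates={"p1": ["a", "b", "c", "d"], "p2": ["e", "f", "g", "h"]},
    job_nodes=["p1", "a", "b", "e"],   # p2 is idle and its group is 25% used
)
print(tree)
```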
S3.3: transmission;
After the tree structure is created on the login node, it is transmitted to the next layer of proxy nodes together with the file; each proxy node then finds its child nodes according to the tree structure, continues the transmission, and waits for a transmission-completion signal. When a proxy node has received the completion signals of all its child nodes, it generates its own completion signal and returns it to its parent node. Finally, when the login node receives the confirmation signals from the first-layer proxy nodes, the whole transmission process is complete, and the temporarily occupied proxy nodes are set back to the idle state;
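The downward file flow and upward completion signals of S3.3 can be sketched as a recursion over the tree built above. This is a single-process simulation for illustration (real transmission would be concurrent across nodes); the function name and "done" signal are invented for the example.

```python
def transmit(tree, node, payload, received):
    """Sketch of S3.3: the file flows down from the root; a completion
    signal flows back up once every child subtree has confirmed."""
    received[node] = payload                 # file arrives at this node
    for child in tree.get(node, []):         # forward to each child and
        assert transmit(tree, child, payload, received) == "done"  # wait
    return "done"                            # signal back to the parent

received = {}
status = transmit({"login": ["p1", "p2"], "p1": ["a", "b"]},
                  "login", "env.tar", received)
print(status, sorted(received))  # every node in the tree got the file
```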
S4: a quick response;
S4.1: starting in advance;
If a dependent file of the application appears in the upper layer, the file is added to the urgent part; the remaining files of the user's upper file system form the hysteresis part. When the transmission of the urgent part is complete, the execution environment is started directly on the corresponding computing nodes to launch the application;
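The urgent/hysteresis split of S4.1 is a partition of the upper layer by the application's dependency set. A minimal sketch, with invented file names as the example data:

```python
def split_upper_layer(upper_files, dependency_files):
    """S4.1 sketch: files the application depends on form the urgent
    part; the rest of the user's upper layer lags behind (S4.2)."""
    deps = set(dependency_files)
    urgent = [f for f in upper_files if f in deps]
    lagging = [f for f in upper_files if f not in deps]
    return urgent, lagging

# Illustrative upper layer and dependency scan results.
urgent, lagging = split_upper_layer(
    ["app.bin", "libhpc.so", "notes.txt", "dataset.csv"],
    ["app.bin", "libhpc.so"],
)
print(urgent)   # transmitted first, application starts immediately after
print(lagging)  # transmitted later, while the application already runs
```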
S4.2: hysteresis transmission;
Transmitting the file of the hysteresis part, and establishing a function performance model of high-performance calculation;
The topology-aware execution environment service is used for fast and agile application deployment in high-performance computing. By generating a tree structure and distinguishing proxy nodes from subordinate nodes, a private execution environment is provided for each user of the high-performance computing system, realizing fast, automatic deployment of the application and of its execution process. In addition, a topology-aware P2P method is designed to reduce deployment time, and mechanisms of staged transmission and early start-up are provided to reduce the start-up delay of the application; compared with traditional container-based application deployment, the method is faster and effectively reduces the network load;
S5: automatic performance modeling;
S5.1: segmented modeling;
The real function-performance data set C = [C_1, C_2, ..., C_i, ..., C_n] is traversed, where C_i = [X_i, Y_i] and n is the number of data-point pairs. Taking C_i as the split point, [C_1, ..., C_i] and [C_i, ..., C_n] are each fitted with the trust-region reflective least-squares method. The post-split mean square errors are computed for the n-2 candidate splits; if both segment models of a candidate have mean square error below the threshold and the combined error is minimal, that split point is taken as the optimal split point, the piecewise performance model built on it is taken as the performance model of the function, and the model is applied to every computing node;
The method first sets up the execution environment and, through the topology-aware execution environment service, deploys applications quickly and agilely in high-performance computing; the function performance model of the high-performance computation is created by generating a tree structure and distinguishing proxy nodes from subordinate nodes, and segmented modeling is then performed by traversing the real function-performance data set. This realizes full-coverage modeling in high-performance computing, creates an accurate performance model for each function, comprehensively and finely characterizes the computational behavior of the program, effectively improves model accuracy, and improves the efficiency and accuracy of data collection; together with the execution environment, it remedies the defects of slow and disordered data collection;
S5.2: cycling;
If the mean square errors of the two segments selected from the n-2 segmentations are greater than the threshold, S5.1 is repeated;
if the mean square error of each of the two segments selected from the n-2 segmentations is smaller than the threshold, proceed to S6;
S6: data collection;
And collecting data by utilizing each computing node to which the performance model is applied.
Preferably, in S3 above, the tree width is set according to the performance of each computing node.
Preferably, the user has the right to customize his or her own execution environment, and each node runs a daemon responsible only for starting, stopping, and deleting the execution environment and for executing P2P transmission;
letting the user customize the execution environment and setting up the daemon improves the reliability of the execution environment.
Preferably, during operation, if some nodes fail, transmission along the affected branches of the P2P transmission tree is blocked; after a timeout, the transmission path of the network failure is reported to the root node;
through this behavior, the P2P transmission can also serve as an auxiliary tool for network status monitoring.
Preferably, in S5.1 above, the mean square error is obtained by taking the differences between the predicted and true values and averaging their squares, according to the formula:
MSE = E[(predicted value - true value)²];
where MSE is the mean square error and E denotes the mean (expectation) operator;
the mean square error can better reflect the deviation between the predicted value and the true value.
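The MSE formula above amounts to a one-line computation; a small sketch with invented sample values:

```python
def mse(predicted, actual):
    """Mean square error: MSE = E[(predicted - actual)^2],
    the mean of the squared deviations."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# (0.5 - 1.0)^2 = 0.25 and (2.0 - 2.0)^2 = 0, averaging to 0.125
print(mse([0.5, 2.0], [1.0, 2.0]))
```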
Preferably, the topology-aware P2P includes a multi-layer tree structure constructed from a plurality of multi-port routers, with the switch-chip ports of routers on alternate layers directly connected; the layer above the multi-port router switch chips holds a first router group formed by a single router, and the router group of each other layer contains n+2 routers, where n is the layer number;
by setting the first router group and the plurality of router groups, each remaining group holding two routers more than its layer number, the deployment time of topology-aware P2P during file transmission is reduced, and the start-up delay of the application is lowered.
Preferably, in S4.2 above, when a file needs to be read or written, its existence is determined first; if it exists, the natural system call is entered directly; if not, the file is still being transmitted in the hysteresis part and is called after the transmission completes.
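The existence check and wait-for-lagging-transfer behavior can be sketched as below. The function name, the `transferred` set, and the polling loop are all illustrative assumptions; a real implementation would intercept the open at the file-system layer rather than poll.

```python
import os
import time

def open_when_ready(path, transferred, timeout=30.0, poll=0.1):
    """S4.2 sketch: if the file already exists, fall through to the
    normal system call; otherwise it is still in the hysteresis part,
    so wait until its transfer is marked complete.  `transferred` is
    an illustrative set of paths whose lagging transfer has finished."""
    deadline = time.monotonic() + timeout
    while path not in transferred and not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"lagging transfer of {path} not finished")
        time.sleep(poll)
    return path  # the caller now proceeds with the real open()
```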
Preferably, in S3.2, if the utilization rate is less than 50%, the idle proxy node is added to the P2P tree, and the proxy node is temporarily set to an unassigned state for access at any time.
The invention has the beneficial effects that: compared with the prior art, the invention has the following effects:
1) The invention first establishes the execution environment and uses the topology-aware execution environment service for fast and agile application deployment in high-performance computing; it builds the function performance model of the high-performance computation by generating a tree structure and distinguishing proxy nodes from subordinate nodes, and then performs segmented modeling by traversing the real function-performance data set. This realizes full-coverage modeling in high-performance computing, creates an accurate performance model for each function, comprehensively and finely characterizes the computational behavior of the program, effectively improves model accuracy, and improves the efficiency and accuracy of data collection; together with the execution environment, it remedies the defects of slow and disordered data collection;
2) By arranging the upper and lower file systems, the invention realizes an execution environment lighter than a container: using only two overlay file system layers avoids the space overhead of a container image, and the lightweight design also reduces the network transmission pressure of environment deployment. A threshold is set according to the number of computing nodes used by the file: below the threshold, the application and its execution environment are deployed directly on the computing nodes via shared storage; above it, P2P transmission optimized for the particular topology is used, reasonably combining the advantages of different file transmission modes at different scales and improving network transmission efficiency;
3) The invention is used for rapidly and agilely deploying the application program in high-performance computing through the topology-aware execution environment service, and provides a private execution environment for each user in the high-performance computing system by generating a tree structure and distinguishing the proxy node from the subordinate node, thereby realizing rapid and automatic deployment of the application program and the execution process thereof; in addition, a P2P method based on topology perception is designed to reduce deployment time, and in the method, a mechanism of step-by-step transmission and early starting is also provided to reduce starting delay of an application program, so that compared with the traditional container-based application program deployment, the method is faster and can effectively reduce network load.
Drawings
FIG. 1 is a flow chart of a data collection method for improving high performance computing applications of the present invention;
FIG. 2 is a block diagram of an execution environment for a data collection method for improving high performance computing applications in accordance with the present invention;
FIG. 3 is a tree structure diagram of a topology aware P2P for improving the data collection method of high performance computing applications of the present invention;
FIG. 4 is a graph of experimental results of deployment time for an 18 MB file transmitted by the execution environment according to the present invention;
FIG. 5 is a graph of experimental results of deployment time for a 336 MB file transmitted by the execution environment according to the present invention;
FIG. 6 is a transmission mode network load state of the data collection method for improving high performance computing applications of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
Example 1: as shown in FIGS. 1-6, the data collection method of this embodiment for improving high-performance computing applications includes the following steps:
S1: setting an execution environment;
Creating a private execution environment for a user by using a layered file system and process isolation, starting the execution environment for the user when the user logs in, and automatically deploying the execution environment when the user runs an application program, wherein the execution environment comprises a login node, a computing node, a shared memory, a file system and topology perception P2P;
S2: deploying computing nodes;
The execution environment sets up an upper file system and a lower file system: the upper layer holds the files required by the program, and the lower layer holds the other files. The files required by the current application are transmitted first, the application is then started, and the remaining files are transmitted last. A threshold is set according to the number of computing nodes used by the file: when the number of computing nodes used by the file is smaller than the threshold, the application and its execution environment are deployed directly on the computing nodes via shared storage; when the number of computing nodes used by the file is greater than the threshold, the execution environment is deployed to the computing nodes using topology-aware P2P transmission;
By arranging the upper and lower file system layers, an execution environment lighter than a container is realized: using only two overlay file system layers avoids the space overhead of a container image, and this lightweight design also reduces the network transmission pressure of environment deployment. A threshold is set according to the number of computing nodes used by the file: below the threshold, the application and its execution environment are deployed directly on the computing nodes via shared storage; above it, P2P transmission optimized for the particular topology is used. This reasonably combines the advantages of different file transmission modes at different scales and improves network transmission efficiency;
In this embodiment, experiments were performed. Besides sbcast, a topology-unaware P2P transmission method, random P2P, was added: its tree shape is identical to that of topology-aware P2P, the only difference being that the positions of its proxy and subordinate nodes in the tree structure are random. Cluster sizes of 15, 120, 1080, 8760, and 17560 nodes were selected for testing, transmitting 18 MB and 336 MB files respectively;
The deployment times (in seconds) for one-to-one, shared storage, sbcast, topology-aware P2P, random P2P, and the method of the present invention are shown in Tables 1 and 2 below:
TABLE 1
TABLE 2
Table 1 gives the deployment time required by each transmission method for an 18 MB file, and Table 2 the deployment time for a 336 MB file;
The data of the tables are plotted as line graphs. As seen in FIGS. 4-5, the deployment time of the one-to-one method increases linearly with the number of computing nodes, so its efficiency is low. When the number of computing nodes exceeds 1080, the shared-storage-based method loses its edge, and its performance also depends on the file size; but when the number of computing nodes is small, the shared-storage-based approach is more efficient than any other approach.
Among the three P2P methods, topology-aware P2P always has the shortest deployment time, and its advantage becomes increasingly obvious as the number of computing nodes grows. In TEES application deployment, topology-aware P2P is 65% faster than random P2P and 63% faster than sbcast when the number of computing nodes reaches 17560. For container-based deployment, topology-aware P2P is 21% faster than random P2P and 25% faster than sbcast.
When the transmitted file is smaller, topology-aware P2P accelerates more, because it reduces the time spent establishing connections; therefore, for small files, topology-aware P2P has the better acceleration effect.
In this embodiment, the network load of each transmission method is also measured, i.e., the traffic is monitored; the experimental results are shown in Table 3 below.
TABLE 3
As can be seen from the above table, the one-to-one method and the shared-storage-based method have similar network loads; the difference is that in the one-to-one approach the login node bears significant network pressure, whereas in the shared-storage-based approach this pressure is transferred to the shared storage. Compared with these two methods, the topology-aware P2P method reduces the network load by 75% in the large-scale node case.
In FIG. 4, the data series of the chart are arranged in order of the number of nodes below.
The network loads of random P2P and sbcast are similar; compared with both methods, topology-aware P2P reduces the network load by more than 85%.
S3: deploying an execution environment;
Creating an isolated process tree for each user, only killing the root process of the process tree when the user exits, using a cover file system with only two layers, using node directories as the lower layer of the cover file system, superposing an empty directory as the upper layer of each user, and synchronizing the upper layers of the users to corresponding computing nodes when the automatic deployment of an execution environment is realized;
By creating an isolated process tree for each user and enabling automatic deployment, the burden of manually configuring an execution environment on a computing node by the user is reduced, and privacy protection of the user is achieved.
When the number of computing nodes is small and the capacity used by the application is small, shared storage has an obvious advantage; however, when the number of computing nodes is large, even a small transmitted file easily causes traffic congestion, so topology-aware P2P transmission then has the obvious advantage.
S3.1: point-to-point;
Setting a list of proxy nodes and a list of subordinate nodes of each proxy node, and analyzing a node list used by a user application program to generate a P2P transmission tree structure when the user runs the application program, wherein a user login node is a root node of the tree;
The computing nodes are divided into proxy nodes and subordinate nodes.
S3.2: nodes;
The proxy nodes are located at the top of the tree. If a node used by the application appears in the subordinate-node list of a proxy node, that subordinate node becomes a child of the proxy node in the tree. If the proxy node itself is not in the application's node list, the proxy node is idle; the utilization rate of its subordinate nodes is then calculated, with the utilization threshold set to 50%. If the utilization exceeds 50%, the idle proxy node is added to the P2P tree and temporarily set to the allocated state, and the subordinate nodes in its list are added to the tree as its children. If the proxy node is not idle, the subordinate nodes are adjusted to be isolated nodes, which are finally placed at the last layer of the tree;
In this embodiment, several computing nodes are first integrated into one node group. On our Tianhe platform, 8 computing nodes form one node group, but only one of the 8 nodes (the proxy node) has a high-speed network card and is directly connected to the intermediate topology. That node therefore has better network performance, and interactions between the other 7 nodes and the intermediate topology must pass through the proxy node.
Considering this special intra-group topology, we have designed P2P with topology aware capabilities. We maintain a list of proxy nodes and a list of slave nodes for each proxy node. When a user submits a job (runs an application), the list of nodes used by the user application will be analyzed to generate a tree structure for the P2P transmission. The login node of the user is considered the root node of the tree.
S3.3: transmission;
After the tree structure is created on the login node, it is transmitted to the next layer of proxy nodes together with the file; each proxy node then finds its child nodes according to the tree structure, continues the transmission, and waits for a transmission-completion signal. When a proxy node has received the completion signals of all its child nodes, it generates its own completion signal and returns it to its parent node. Finally, when the login node receives the confirmation signals from the first-layer proxy nodes, the whole transmission process is complete, and the temporarily occupied proxy nodes are set back to the idle state;
S4: a quick response;
S4.1: starting in advance;
If a dependent file of the application appears in the upper layer, the file is added to the urgent part; the remaining files of the user's upper file system form the hysteresis part. When the transmission of the urgent part is complete, the execution environment is started directly on the corresponding computing nodes to launch the application;
S4.2: hysteresis transmission;
Transmitting the file of the hysteresis part, and establishing a function performance model of high-performance calculation;
The topology-aware execution environment service is used for fast and agile application deployment in high-performance computing. By generating a tree structure and distinguishing proxy nodes from subordinate nodes, a private execution environment is provided for each user of the high-performance computing system, realizing fast, automatic deployment of the application and of its execution process. In addition, a topology-aware P2P method is designed to reduce deployment time, and mechanisms of staged transmission and early start-up are provided to reduce the start-up delay of the application; compared with traditional container-based application deployment, the method is faster and effectively reduces the network load.
In this embodiment, when a user logs into a login node, the TEES daemon on the login node will launch an execution environment for the user. The user will log into this private execution environment. In such an execution environment, the user can directly use a standard system environment. At the same time, the user is entitled to any customization. The user is free to develop and debug applications and configure environments. All of this occurs in the upper level file system. The modifications made by the user do not affect the real system environment.
When a user runs an application by submitting it to a resource management system such as SLURM or PBS, the daemon on the login node performs the following steps: it analyzes the list of compute nodes, selects a transmission method (shared storage or topology-aware P2P), and, if the topology-aware P2P method is used, generates a tree structure. At the same time, the daemon performs dependency analysis on the user application and divides the upper layer into an urgent part and a lagging part. The urgent part is then deployed immediately on the compute nodes, an execution environment is started on each node, and the application is run. Finally, the lagging part is transmitted.
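The daemon's transmission-method choice might be sketched as below; the numeric threshold is an assumption, since the embodiment sets it "according to the number of computing nodes" without fixing a value.

```python
# Sketch of the daemon's decision: small jobs deploy via shared storage,
# large jobs via topology-aware P2P. The threshold value is assumed.
SHARED_STORAGE_THRESHOLD = 64   # assumption, not specified in the patent

def choose_transport(compute_nodes):
    """Pick the deployment transport for the given node list."""
    if len(compute_nodes) < SHARED_STORAGE_THRESHOLD:
        return "shared-storage"
    return "topology-aware-p2p"   # a transmission tree is generated next
```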
Only application development and environment configuration are needed on the login node; the application can then run directly on the computing nodes through the resource management system. The whole deployment and start-up process is very fast and produces a good user experience.
In this embodiment, the start-up delay (in seconds) of the execution environment was measured; the results are shown in Table 4 below:
Table 4
In the table above, the method of the invention combines topology-aware P2P with shared storage; this transmission method achieves deployment and start-up of a typical application on 17,560 computing nodes within 3 seconds. The container-based approach, which uses sbcast for deployment, has an application start-up delay of about 30 seconds, 20 times slower than the present method; using topology-aware P2P within the container-based approach reduces start-up delay by only about 25%.
S5: automatic performance modeling;
S5.1: segmented modeling;
By traversing the real function-performance data set C = [C_1, C_2, ..., C_i, ..., C_n], where C_i = [X_i, Y_i] and n is the number of data-point pairs, each C_i is taken in turn as a candidate segmentation point, and [C_1, ..., C_i] and [C_i, ..., C_n] are each fitted using the trust-region reflective least-squares method. The mean square error of each segmentation is computed; if the mean square error of the two segment models selected from the n-2 candidate segmentations is smaller than the threshold and is the minimum, that point is taken as the optimal segmentation point, the segmented performance model built at that point is taken as the performance model of the function, and the performance model of the function is applied to each computing node;
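A minimal sketch of the breakpoint search in S5.1, assuming simple linear segment models; `np.polyfit` stands in here for the trust-region reflective least-squares solver the text names (in practice, e.g., SciPy's `least_squares(method='trf')`).

```python
# Hypothetical sketch: try every interior point C_i as a breakpoint, fit each
# side separately, and keep the breakpoint with the smallest combined MSE.
import numpy as np

def best_breakpoint(x, y, deg=1):
    """Return (index, mse) of the best segmentation point over the data."""
    n = len(x)
    best = (None, float("inf"))
    for i in range(2, n - 2):            # leave at least 2+ points per side
        mse = 0.0
        for xs, ys in ((x[:i + 1], y[:i + 1]), (x[i:], y[i:])):
            coef = np.polyfit(xs, ys, deg)            # fit one segment
            mse += float(np.mean((np.polyval(coef, xs) - ys) ** 2))
        if mse < best[1]:
            best = (i, mse)
    return best

# Piecewise-linear test data with a kink at x = 5.
x = np.arange(10, dtype=float)
y = np.where(x < 5, 2 * x, 10 + 5 * (x - 5))
i, mse = best_breakpoint(x, y)
```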
By setting up the execution environment and serving it through the topology-aware execution environment service for fast and agile application deployment in high-performance computing, generating a tree structure to distinguish proxy nodes from subordinate nodes, establishing the function performance model of the high-performance computation, and then performing segmented modeling by traversing the real function-performance data set, full-coverage modeling in high-performance computing is achieved: accurate performance models are built for all functions, the computing behavior of programs is described comprehensively and in fine detail, model accuracy is effectively improved, and the efficiency and accuracy of high-performance-computing data collection are improved, remedying the defects of slow and disordered data collection in cooperation with the execution environment.
S5.2: cycling;
If the mean square error of both segment models selected from the n-2 candidate segmentations is greater than the threshold, S5.1 is repeated;
if the mean square error of either segment model selected from the n-2 candidate segmentations is smaller than the threshold, the method proceeds to S6;
S6: data collection:
Data is collected using each computing node to which the performance model has been applied.
Preferably, in S3, the tree width is set according to the performance of each computing node.
Preferably, the user has the right to customize his or her own execution environment, and each node runs a daemon that is only responsible for starting, stopping and deleting the execution environment and performing P2P transmission;
the user-customized execution environment and the daemon improve the reliability of the execution environment.
Preferably, during operation, if some nodes fail, transmission through them is blocked in the P2P transmission tree; after a timeout, the transmission path affected by the network failure is reported to the root node;
through this behavior, P2P transmission can also serve as an auxiliary tool for network status monitoring.
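The failure-reporting behavior can be sketched as follows; the ack/timeout bookkeeping and the reported path format are assumptions for illustration.

```python
# Hypothetical sketch: any child that fails to acknowledge within the
# timeout causes its path to be reported to the root (the login node),
# letting P2P transmission double as a network-status probe.
def collect_failed_paths(children, acks, within_timeout):
    """Return the transmission paths whose child never acknowledged in time."""
    failed = []
    for child in children:
        if not (acks.get(child) and within_timeout.get(child)):
            failed.append(("login", child))   # path reported to the root node
    return failed

failed = collect_failed_paths(
    ["proxy1", "proxy2"],
    acks={"proxy1": True, "proxy2": False},      # proxy2 never acked
    within_timeout={"proxy1": True, "proxy2": True},
)
```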
Preferably, in S5.1, the mean square error is obtained by taking the difference between the predicted value and the true value, squaring it, and averaging; the calculation formula is as follows:
MSE = E[(predicted value − true value)²];
where MSE is the mean square error and E is the mean (expectation) operator;
the mean square error well reflects the deviation between the predicted value and the true value.
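A quick numerical check of the formula, with made-up values:

```python
# MSE = E[(predicted - true)^2]: the mean of the squared residuals.
predicted = [2.0, 4.0, 6.0]
true = [1.0, 4.0, 8.0]
mse = sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)
# residuals 1, 0, -2 -> squared 1, 0, 4 -> mean 5/3
```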
Preferably, the topology-aware P2P comprises a multi-layer tree structure constructed from a plurality of multi-port routers, with the switch-chip ports of the routers in alternate layers directly connected; the layer above the layer containing the multi-port router switch chips is provided with a first router group formed by a single router, and the router group of each other layer contains n+2 routers, where n is the layer number;
by setting the first router group and the plurality of router groups, with each remaining router group holding two more routers than its layer number, the topology-aware P2P deployment time during file transmission is reduced, and the start-up delay of the application program is reduced.
Preferably, in S4.2, when a file needs to be read or written, its existence is determined first; if it exists, the native system call is entered directly; if not, the file is still being transmitted in the lagging part and is accessed after the transmission is completed.
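A minimal sketch of this read path, with `wait_for_transfer` as a hypothetical hook standing in for the lagging-part transfer mechanism:

```python
# Sketch of the S4.2 read path: if the file already exists in the upper
# layer, fall straight through to the native system call; otherwise wait
# until the lagging transfer delivers it.
import os
import tempfile

def open_when_ready(path, wait_for_transfer):
    if not os.path.exists(path):
        wait_for_transfer(path)        # file is still in the lagging part
    return open(path, "rb")            # native system call from here on

# Demo: the "transfer" writes the file on demand.
tmpdir = tempfile.mkdtemp()
lag_file = os.path.join(tmpdir, "lag.bin")
with open_when_ready(lag_file, lambda p: open(p, "wb").write(b"ok")) as f:
    data = f.read()
```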
Preferably, in S3.2, if the utilization rate is less than 50%, the idle proxy node is added to the P2P tree and temporarily set to the unassigned state, so that it can be accessed at any time.
The foregoing is merely illustrative of the present invention, and the scope of the present invention is not limited thereto, and any person skilled in the art can easily think about variations or substitutions within the scope of the present invention, and therefore, the scope of the present invention shall be defined by the scope of the appended claims.

Claims (8)

1. A data collection method for improving high performance computing applications, comprising the following steps:
S1: setting an execution environment;
Creating a private execution environment for a user by using a layered file system and process isolation; starting the execution environment for the user when the user logs in; and automatically deploying the execution environment when the user runs an application program, wherein the execution environment comprises a login node, computing nodes, shared storage, a file system and topology-aware P2P;
s2: deploying computing nodes;
The execution environment sets an upper file system and a lower file system, wherein the upper file system holds the files required by the program and the lower file system holds the other files. The files required by the current application program are transmitted first, then the application program is started, and finally the remaining files are transmitted. A threshold is set according to the number of computing nodes used by the files: when the number of computing nodes used is smaller than the threshold, the application program and its execution environment are deployed directly on the computing nodes through shared storage; when the number of computing nodes used is greater than the threshold, P2P transmission optimized for the specific topology is used;
S3: deploying an execution environment;
Creating an isolated process tree for each user, and killing only the root process of the process tree when the user exits; using an overlay file system with only two layers, with the node directory as the lower layer of the overlay file system and an empty directory superimposed as the upper layer for each user; and synchronizing each user's upper layer to the corresponding computing nodes to realize automatic deployment of the execution environment;
S3.1: point-to-point;
Setting a list of proxy nodes and a list of the subordinate nodes of each proxy node; when the user runs an application program, analyzing the node list used by the user application program to generate a tree structure for topology-aware P2P transmission, wherein the user's login node is the root node of the tree;
S3.2: nodes;
The proxy nodes are located at the top of the tree. If a proxy node used by the application program appears in the node list, each of its subordinate nodes that is used becomes a child node of that proxy node. If a proxy node is not in the node list of the application program, it is in an idle state; in this case the utilization rate of its subordinate nodes is calculated, with the utilization threshold set to 50%. If the utilization rate is greater than 50%, the idle proxy node is added to the P2P tree and temporarily set to the allocated state, and the subordinate nodes in its node list are also added to the tree as its child nodes. If a proxy node cannot be used, its subordinate nodes are treated as isolated nodes, which finally form the last layer of the tree;
S3.3: transmission:
After the tree structure is created on the login node, it is transmitted to the next layer of proxy nodes together with the file; each proxy node then finds its child nodes in the tree structure, continues the transmission, and waits for their transmission-completion signals. When a proxy node has received the completion signals of all its child nodes, it generates its own completion signal and returns it to its parent node. Finally, when the login node receives the confirmation signals from the first-layer proxy nodes, the whole transmission process is complete, and the temporarily occupied proxy nodes are reset to the idle state;
S4: quick response;
S4.1: starting in advance;
If a dependent file of the application program appears in the upper layer, the file is added to the urgent part; the remaining files of the user's upper-layer file system form the lagging part. When transmission of the urgent part is complete, the execution environment is started directly on the corresponding computing nodes and the application program is launched;
S4.2: lagging transmission:
Transmitting the files of the lagging part, and establishing a function performance model of the high-performance computation;
S5: automatic performance modeling;
S5.1: segmented modeling;
By traversing the real function-performance data set C = [C_1, C_2, ..., C_i, ..., C_n], where C_i = [X_i, Y_i] and n is the number of data-point pairs, each C_i is taken in turn as a candidate segmentation point, and [C_1, ..., C_i] and [C_i, ..., C_n] are each fitted using the trust-region reflective least-squares method. The mean square error of each segmentation is computed; if the mean square error of the two segment models selected from the n-2 candidate segmentations is smaller than the threshold and is the minimum, that point is taken as the optimal segmentation point, the segmented performance model built at that point is taken as the performance model of the function, and the performance model of the function is applied to each computing node;
S5.2: cycling;
If the mean square error of both segment models selected from the n-2 candidate segmentations is greater than the threshold, repeating S5.1;
S6: data collection:
Data is collected using each computing node to which the performance model has been applied.
2. The data collection method for improving high performance computing applications of claim 1, wherein: in S3, the tree width is set according to the performance of each computing node.
3. The data collection method for improving high performance computing applications of claim 2, wherein: the user can customize the execution environment, and each proxy node runs a daemon only responsible for starting, stopping and deleting the execution environment and performing topology aware P2P transmissions.
4. A data collection method for improving high performance computing applications as claimed in claim 3, wherein: during P2P transmission, if some nodes fail, transmission through them is blocked in the topology-aware P2P transmission tree; after a timeout, the transmission path affected by the network failure is reported to the root node.
5. The data collection method for improving high performance computing applications of claim 4, wherein: in S5.1, the mean square error is obtained by taking the difference between the predicted value and the true value, squaring it, and averaging; the calculation formula is as follows:
MSE = E[(predicted value − true value)²];
where MSE is the mean square error and E is the mean (expectation) operator.
6. The data collection method for improving high performance computing applications of claim 5, wherein: the topology-aware P2P comprises a multi-layer tree structure constructed from a plurality of multi-port routers, with the switch-chip ports of the routers in alternate layers directly connected; the layer above the layer containing the multi-port router switch chips is provided with a first router group formed by a single router, and the router group of each other layer contains n+2 routers, where n is the layer number.
7. The data collection method for improving high performance computing applications of claim 6, wherein: in S4.2, when a file needs to be read or written, its existence is determined first; if it exists, the native system call is entered directly; if not, the file is still being transmitted in the lagging part and is accessed after the transmission is completed.
8. The data collection method for improving high performance computing applications of claim 7, wherein: in S3.2, if the utilization rate is less than 50%, the idle proxy node is added to the P2P tree and temporarily set to the unassigned state, so that it can be accessed at any time.
CN202211435481.XA 2022-11-16 2022-11-16 Data collection method for improving high-performance computing application Active CN115834594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211435481.XA CN115834594B (en) 2022-11-16 2022-11-16 Data collection method for improving high-performance computing application


Publications (2)

Publication Number Publication Date
CN115834594A CN115834594A (en) 2023-03-21
CN115834594B true CN115834594B (en) 2024-04-19

Family

ID=85528528


Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1747446A (en) * 2005-10-21 2006-03-15 清华大学 Application layer group broadcasting method with integrated type and distributed type combination
CN101883039A (en) * 2010-05-13 2010-11-10 北京航空航天大学 Data transmission network of large-scale clustering system and construction method thereof
CN104022911A (en) * 2014-06-27 2014-09-03 哈尔滨工业大学 Content route managing method of fusion type content distribution network
CN104380660A (en) * 2012-04-13 2015-02-25 思杰系统有限公司 Systems and methods for trap monitoring in multi-core and cluster systems
CN110022299A (en) * 2019-03-06 2019-07-16 浙江天脉领域科技有限公司 A kind of method of ultra-large distributed network computing
CN110177020A (en) * 2019-06-18 2019-08-27 北京计算机技术及应用研究所 A kind of High-Performance Computing Cluster management method based on Slurm
CN110866046A (en) * 2019-10-28 2020-03-06 北京大学 Extensible distributed query method and device
CN110990448A (en) * 2019-10-28 2020-04-10 北京大学 Distributed query method and device supporting fault tolerance
CN111046065A (en) * 2019-10-28 2020-04-21 北京大学 Extensible high-performance distributed query processing method and device
CN111131146A (en) * 2019-11-08 2020-05-08 北京航空航天大学 Multi-supercomputing center software system deployment and incremental updating method in wide area environment
CN112055048A (en) * 2020-07-29 2020-12-08 北京智融云河科技有限公司 P2P network communication method and system for high-throughput distributed account book
CN112445675A (en) * 2019-09-02 2021-03-05 无锡江南计算技术研究所 Large-scale parallel program performance data rapid collection method based on layer tree network
CN113285457A (en) * 2021-05-19 2021-08-20 山东大学 Distributed economic dispatching method and system for regional power system under non-ideal communication
CN113630269A (en) * 2021-07-29 2021-11-09 中国人民解放军国防科技大学 Topology-aware-based high-performance computing system operating environment deployment acceleration method and system
CN114079567A (en) * 2020-08-21 2022-02-22 东北大学秦皇岛分校 Block chain-based universal IP tracing system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8073384B2 (en) * 2006-12-14 2011-12-06 Elster Electricity, Llc Optimization of redundancy and throughput in an automated meter data collection system using a wireless network
US11570111B2 (en) * 2021-03-25 2023-01-31 Itron, Inc. Enforcing access to endpoint resources


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Computing Resource Discovery Mechanism over a P2P Tree Topology; Damia Castellà, Hector Blanco, Francesc Giné & Francesc Solsona; SpringerLink; 20101231; full text *
A survey of simulators for P2P overlay networks with a case study of the P2P tree overlay using an event-driven simulator; Shivangi Surati, Devesh C. Jinwala, Sanjay Garg; ScienceDirect; 20170810; full text *
Research and Implementation of a Video Scheduling Strategy Based on Peer-to-Peer Networks; Xia Zhong; China Masters' Theses Full-text Database; 20220315; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant