CN111343047A - Method and system for monitoring IB network flow - Google Patents

Publication number
CN111343047A
Authority
CN
China
Prior art keywords
network
monitoring
cluster
nodes
port
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010111136.5A
Other languages
Chinese (zh)
Inventor
冯岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-02-23
Filing date
2020-02-23
Publication date
2020-06-26
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010111136.5A
Publication of CN111343047A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and a system for monitoring IB network traffic. The method comprises: deploying a server cluster and equipping the nodes in the cluster with IB network cards and drivers; starting the IB networking service and establishing IB communication among the cluster servers; and running parallel software on multiple nodes while executing a monitoring script on the node to be tested, thereby obtaining the usage of the IB network. Because the monitoring script runs on a node at the same time as the parallel software, it obtains the real-time values of the traffic sent and received as reported by the port performance counters, from which it can be determined whether the IB network on the current node is in use. This overcomes the inability of the prior art to obtain the usage of the IB network and facilitates routine maintenance of the cluster network.

Description

Method and system for monitoring IB network flow
Technical Field
The invention relates to the technical field of network monitoring, in particular to a method and a system for monitoring IB network flow.
Background
IB (InfiniBand) is a computer network communication standard for high-performance computing. It offers very high throughput and very low latency and is used for data interconnection between computers. InfiniBand also serves as a direct or switched interconnect between servers and storage systems, as well as between storage systems. IB is currently widely used in high-performance computing (HPC) clusters and large-scale storage systems. When the computational model is large, scientific research tasks in HPC often need multiple servers and multiple cores computing in parallel to shorten the computation time. Exchanging data over an IB network can greatly improve the efficiency of parallel computation across multiple servers.
The network monitoring tools of a Linux system can only monitor Ethernet traffic and cannot monitor the real-time traffic of an IB network. The usage of the IB network can only be inferred indirectly with IB test tools, which makes daily operation and maintenance inconvenient.
The prior patent application No. 201310253119.5 discloses an InfiniBand network detection method, which comprises: acquiring a first correspondence between the device name and the LID number of each device in an InfiniBand network, and a second correspondence between the physical port number and the logical port number of each port in the device; acquiring the LID number of the device where an error port in the InfiniBand network is located and the logical port number of the error port; and obtaining the device name of the device where the error port is located and the physical port number of the error port from the first correspondence, the second correspondence, the LID number of that device, and the logical port number of the error port. That scheme is mainly used for locating faulty ports and still cannot obtain the real-time usage of the IB network.
Disclosure of Invention
The invention provides a method and a system for monitoring IB network traffic, which solve the problem that operation and maintenance are inconvenient because existing schemes cannot obtain the usage of an IB network in real time.
To achieve this purpose, the invention adopts the following technical solutions.
In a first aspect, the invention provides a method for monitoring IB network traffic, the method comprising the following steps:
deploying a server cluster, and equipping the nodes in the cluster with IB network cards and drivers;
starting the IB networking service, and establishing IB communication among the cluster servers;
running parallel software on multiple nodes, executing a monitoring script on the node to be tested, and obtaining the usage of the IB network.
Further, the monitoring script executes as follows:
setting the data acquisition frequency;
reading, at that frequency, the number of bytes sent and the number of bytes received as reported by the port performance counters;
displaying the readings line by line;
determining from the displayed results whether the IB network is in use.
Further, an Ethernet network used for basic communication and management is also deployed in the server cluster.
Further, starting the IB networking service and establishing IB communication among the cluster servers specifically comprises:
starting the opensm service on any one node;
checking the state of the IB network cards, and configuring the IB network configuration file so that all devices in the cluster are connected to the same IB switch.
Further, the node on which the opensm service is started is the cluster master node or a management node.
Further, the parallel software comprises a program that runs on the servers and supports multi-machine parallel execution across nodes.
Further, the execution of the monitoring script also includes:
setting a time interval, and separately counting the bytes sent and the bytes received, as reported by the port performance counters, within that interval;
recording the statistical results against the time of the statistics, wherein the results include the peak and average values of the number of bytes sent and received.
In a second aspect, the invention provides a system for monitoring IB network traffic, the system comprising:
a cluster deployment module, used for deploying the server cluster and equipping the nodes in the cluster with IB network cards and drivers;
a network configuration module, used for starting the IB networking service and establishing IB communication among the cluster servers;
a traffic monitoring module, which executes a monitoring script on the node to be tested while the parallel software runs on multiple nodes, and obtains the usage of the IB network.
Further, the traffic monitoring module includes:
a setting unit, used for setting the data acquisition frequency;
an information obtaining unit, used for reading, at that frequency, the number of bytes sent and the number of bytes received as reported by the port performance counters;
a display unit, used for displaying the readings line by line;
a judging unit, used for determining from the displayed results whether the IB network is in use.
Further, the traffic monitoring module also includes:
a counting unit, used for setting a time interval and separately counting the bytes sent and the bytes received, as reported by the port performance counters, within that interval;
an analysis unit, used for recording the statistical results against the time of the statistics, wherein the results include the peak and average values of the number of bytes sent and received.
The system for monitoring IB network traffic according to the second aspect can implement the method of the first aspect and each of its implementations, and achieves the same effects.
The effects described in this summary are only those of the embodiments, not all the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
1. Parallel software is run on the cluster nodes while the monitoring script is executed on those nodes to obtain the real-time values of the traffic sent and received as reported by the port performance counters. From these values it is determined whether the IB network on the current node is in use, which overcomes the inability of the prior art to obtain the usage of the IB network and facilitates routine maintenance of the cluster network.
2. On top of the real-time traffic values, further statistical analysis of the sent and received traffic yields the peak and average occupancy of the IB network, giving a fuller picture of how the IB network is used.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic diagram of the operation of the monitoring script of the present invention;
FIG. 3 is a schematic diagram of the system of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
As shown in FIG. 1, the method for monitoring IB network traffic of the present invention includes the following steps:
S1, deploying a server cluster, and equipping the nodes in the cluster with IB network cards and drivers;
S2, starting the IB networking service, and establishing IB communication among the cluster servers;
S3, running parallel software on multiple nodes, executing a monitoring script on the node to be tested, and obtaining the usage of the IB network.
In step S1, taking the deployment of a high-performance cluster as an example, multiple servers are required. Each server is equipped with an InfiniBand network card, and the servers are deployed together to form a cluster. A Linux system and the corresponding IB driver are installed on each server, and the IB driver version is kept the same on all servers. The servers in the cluster are also connected by an Ethernet network for basic communication and management.
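The requirement that every node run the same IB driver version can be checked mechanically. The following is a minimal sketch, not part of the patent: it assumes the version string of each node has already been collected by some means (for example over SSH), and the node names and version strings shown are made up for illustration.

```python
from collections import defaultdict


def group_by_driver_version(versions):
    """Map each reported driver version string to the sorted nodes reporting it."""
    groups = defaultdict(list)
    for node, version in versions.items():
        groups[version.strip()].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def versions_consistent(versions):
    """True when every node reports the same driver version."""
    return len(group_by_driver_version(versions)) <= 1


# Hypothetical inventory; node names and version strings are invented.
reported = {
    "node01": "5.4-1.0.3.0",
    "node02": "5.4-1.0.3.0",
    "node03": "4.9-2.2.4.0",
}
for version, nodes in group_by_driver_version(reported).items():
    print(version, nodes)
print("consistent:", versions_consistent(reported))
```

Grouping by version rather than returning a bare yes/no makes it easy to see which nodes need to be reinstalled.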
In step S2, starting the IB networking service and establishing IB communication among the cluster servers proceeds as follows:
the opensm service is started on any one node; the state of the IB network cards is checked, and the IB network configuration file is configured so that all devices in the cluster are connected to the same IB switch, which establishes IB communication among the cluster servers. The node on which the opensm service is started is the cluster master node or a management node.
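One way to check the state of an IB network card on Linux is to read the port's `state` file under sysfs, which holds a string such as `4: ACTIVE`. The sketch below is an illustration under that assumption and is not taken from the patent; the function names are hypothetical.

```python
from pathlib import Path


def parse_port_state(text):
    """Split a sysfs state string such as '4: ACTIVE' into (code, name)."""
    code, _, name = text.strip().partition(":")
    return int(code), name.strip()


def port_is_active(port_dir):
    """Read <port_dir>/state and report whether the port is ACTIVE."""
    _, name = parse_port_state((Path(port_dir) / "state").read_text())
    return name == "ACTIVE"
```

On a real node the call might look like `port_is_active("/sys/class/infiniband/mlx5_0/ports/1")`, where the device name `mlx5_0` is only an example. A port normally stays in the INIT state until a subnet manager such as opensm has configured the subnet, so an ACTIVE port is a reasonable sign that the opensm step has taken effect.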
In step S3, the parallel software is a program that runs on the servers and supports multi-machine parallel execution across nodes, for example scientific software used in physics, mathematics, and materials science such as VASP and Gaussian.
As shown in FIG. 2, the monitoring script executes as follows: the data acquisition frequency is set; at that frequency, the number of bytes sent and the number of bytes received are read from the port performance counters; the readings are displayed line by line; and whether the IB network is in use is determined from the displayed results.
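The patent does not reproduce the monitoring script itself. The following is a minimal sketch of the four steps, assuming the counters are read from the Linux sysfs files `port_xmit_data` and `port_rcv_data`, which count data in units of 4 octets. The 1 MB/s threshold used to decide that the link is occupied is an arbitrary assumption, and all function names are hypothetical.

```python
import time
from pathlib import Path

WORD_BYTES = 4  # port_xmit_data / port_rcv_data count 4-octet units


def read_port_bytes(counter_dir):
    """Return (sent_bytes, received_bytes) from a sysfs-style counters directory."""
    d = Path(counter_dir)
    sent = int((d / "port_xmit_data").read_text()) * WORD_BYTES
    received = int((d / "port_rcv_data").read_text()) * WORD_BYTES
    return sent, received


def rate(prev, curr, seconds):
    """Bytes per second between two cumulative counter readings."""
    return (curr - prev) / seconds


def is_occupied(tx_rate, rx_rate, threshold=1_000_000):
    """Treat the link as occupied when either direction exceeds the threshold (bytes/s)."""
    return tx_rate > threshold or rx_rate > threshold


def monitor(counter_dir, interval=1.0, samples=5):
    """Print one line per sample: time, send and receive rates, and the verdict."""
    prev_tx, prev_rx = read_port_bytes(counter_dir)
    for _ in range(samples):
        time.sleep(interval)  # the acquisition frequency is 1 / interval
        tx, rx = read_port_bytes(counter_dir)
        tx_rate = rate(prev_tx, tx, interval)
        rx_rate = rate(prev_rx, rx, interval)
        state = "occupied" if is_occupied(tx_rate, rx_rate) else "idle"
        print(f"{time.strftime('%H:%M:%S')}  tx={tx_rate:.0f} B/s  rx={rx_rate:.0f} B/s  {state}")
        prev_tx, prev_rx = tx, rx
```

On a node under test the script might be started as `monitor("/sys/class/infiniband/mlx5_0/ports/1/counters")` while the parallel job runs, with `mlx5_0` again standing in for the actual device name. Because the counters are cumulative, the rate is taken from the difference between consecutive readings rather than from the raw values.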
To give a fuller picture of how the IB network is used, the monitoring script also performs the following steps: a time interval is set, and the bytes sent and the bytes received, as reported by the port performance counters, are counted separately within that interval; the statistical results are recorded against the time of the statistics, and include the peak and average values of the number of bytes sent and received.
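The statistics step can be sketched as a small reduction over the rate samples collected during one time interval. This is an illustrative assumption about how the results might be computed and logged; the record format and the function names are hypothetical and do not come from the patent.

```python
from statistics import mean


def summarize(rates):
    """Return the peak, minimum and average of a window of rate samples (bytes/s)."""
    if not rates:
        raise ValueError("no samples in the interval")
    return {"max": max(rates), "min": min(rates), "avg": mean(rates)}


def record(timestamp, tx_rates, rx_rates):
    """Build one log line for a statistics window, keyed by its timestamp."""
    tx, rx = summarize(tx_rates), summarize(rx_rates)
    return (f"{timestamp} tx_max={tx['max']:.0f} tx_avg={tx['avg']:.0f} "
            f"rx_max={rx['max']:.0f} rx_avg={rx['avg']:.0f}")
```

Keeping the send and receive directions separate matters because a node may be saturated in one direction while nearly idle in the other.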
As shown in FIG. 3, the system for monitoring IB network traffic of the present invention includes a cluster deployment module 1, a network configuration module 2, and a traffic monitoring module 3.
The cluster deployment module 1 is used for deploying the server cluster and equipping the nodes in the cluster with IB network cards and drivers; the network configuration module 2 is used for starting the IB networking service and establishing IB communication among the cluster servers; the traffic monitoring module 3 executes the monitoring script on the node to be tested while the parallel software runs on multiple nodes, and obtains the usage of the IB network.
The traffic monitoring module 3 includes a setting unit 31, an information obtaining unit 32, a display unit 33, a judging unit 34, a counting unit 35, and an analysis unit 36.
The setting unit 31 sets the data acquisition frequency; the information obtaining unit 32 reads, at that frequency, the number of bytes sent and the number of bytes received as reported by the port performance counters; the display unit 33 displays the readings line by line; the judging unit 34 determines from the displayed results whether the IB network is in use. The counting unit 35 sets a time interval and separately counts the bytes sent and the bytes received within that interval; the analysis unit 36 records the statistical results against the time of the statistics, the results including the peak and average values of the number of bytes sent and received.
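How units 31 to 36 might fit together in software can be sketched as a single class. This is only one possible arrangement, assumed here for illustration: the class name, the injected `read_bytes` callable, and the threshold value are all hypothetical, and the demonstration feeds in invented readings instead of real counters.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Tuple


@dataclass
class TrafficMonitor:
    read_bytes: Callable[[], Tuple[int, int]]   # information obtaining unit 32
    interval: float = 1.0                       # setting unit 31 (seconds per sample)
    threshold: float = 1_000_000.0              # assumed cut-off for unit 34, bytes/s
    history: List[Tuple[float, float]] = field(default_factory=list)

    def step(self, previous):
        """Read the counters once; return the new reading and a display line.

        The caller is expected to wait `interval` seconds between steps.
        """
        current = self.read_bytes()
        tx_rate = (current[0] - previous[0]) / self.interval
        rx_rate = (current[1] - previous[1]) / self.interval
        self.history.append((tx_rate, rx_rate))             # counting unit 35
        occupied = max(tx_rate, rx_rate) > self.threshold   # judging unit 34
        line = f"tx={tx_rate:.0f} rx={rx_rate:.0f} occupied={occupied}"  # display unit 33
        return current, line

    def analyze(self):
        """Analysis unit 36: peak and average rate for each direction."""
        tx, rx = zip(*self.history)
        return {"tx_max": max(tx), "tx_avg": mean(tx),
                "rx_max": max(rx), "rx_avg": mean(rx)}


# Invented cumulative byte readings stand in for the real counters.
readings = iter([(0, 0), (4_000_000, 0), (6_000_000, 2_000_000)])
monitor = TrafficMonitor(read_bytes=lambda: next(readings))
previous = monitor.read_bytes()
for _ in range(2):
    previous, line = monitor.step(previous)
    print(line)
print(monitor.analyze())
```

Passing the counter reader in as a callable keeps the units testable without IB hardware; on a real node it would be replaced by a function that reads the port performance counters.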
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the invention. Those skilled in the art should understand that modifications and variations made on the basis of the technical solution of the invention without creative effort still fall within its scope of protection.

Claims (10)

1. A method for monitoring IB network traffic, the method comprising the steps of:
deploying a server cluster, and equipping the nodes in the cluster with IB network cards and drivers;
starting the IB networking service, and establishing IB communication among the cluster servers;
and running parallel software on multiple nodes, executing a monitoring script on the node to be tested, and obtaining the usage of the IB network.
2. The method for monitoring IB network traffic of claim 1, wherein the monitoring script executes as follows:
setting the data acquisition frequency;
reading, at that frequency, the number of bytes sent and the number of bytes received as reported by the port performance counters;
displaying the readings line by line;
and determining from the displayed results whether the IB network is in use.
3. The method for monitoring IB network traffic of claim 2, wherein an Ethernet network for basic communication and management is also deployed in the server cluster.
4. The method for monitoring IB network traffic of claim 2, wherein starting the IB networking service and establishing IB communication among the cluster servers specifically comprises:
starting the opensm service on any one node;
and checking the state of the IB network cards, and configuring the IB network configuration file so that all devices in the cluster are connected to the same IB switch.
5. The method for monitoring IB network traffic of claim 4, wherein the node on which the opensm service is started is the cluster master node or a management node.
6. The method for monitoring IB network traffic of claim 2, wherein the parallel software comprises a program that runs on the servers and supports multi-machine parallel execution across nodes.
7. The method for monitoring IB network traffic of any one of claims 2-6, wherein the execution of the monitoring script further comprises:
setting a time interval, and separately counting the bytes sent and the bytes received, as reported by the port performance counters, within that interval;
and recording the statistical results against the time of the statistics, wherein the results include the peak and average values of the number of bytes sent and received.
8. A system for monitoring IB network traffic, the system comprising:
a cluster deployment module, used for deploying the server cluster and equipping the nodes in the cluster with IB network cards and drivers;
a network configuration module, used for starting the IB networking service and establishing IB communication among the cluster servers;
and a traffic monitoring module, which executes a monitoring script on the node to be tested while the parallel software runs on multiple nodes, and obtains the usage of the IB network.
9. The system for monitoring IB network traffic of claim 8, wherein the traffic monitoring module comprises:
a setting unit, used for setting the data acquisition frequency;
an information obtaining unit, used for reading, at that frequency, the number of bytes sent and the number of bytes received as reported by the port performance counters;
a display unit, used for displaying the readings line by line;
and a judging unit, used for determining from the displayed results whether the IB network is in use.
10. The system for monitoring IB network traffic of claim 9, wherein the traffic monitoring module further comprises:
a counting unit, used for setting a time interval and separately counting the bytes sent and the bytes received, as reported by the port performance counters, within that interval;
and an analysis unit, used for recording the statistical results against the time of the statistics, wherein the results include the peak and average values of the number of bytes sent and received.
CN202010111136.5A 2020-02-23 2020-02-23 Method and system for monitoring IB network flow Pending CN111343047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010111136.5A CN111343047A (en) 2020-02-23 2020-02-23 Method and system for monitoring IB network flow

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010111136.5A CN111343047A (en) 2020-02-23 2020-02-23 Method and system for monitoring IB network flow

Publications (1)

Publication Number Publication Date
CN111343047A true CN111343047A (en) 2020-06-26

Family

ID=71183662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010111136.5A Pending CN111343047A (en) 2020-02-23 2020-02-23 Method and system for monitoring IB network flow

Country Status (1)

Country Link
CN (1) CN111343047A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521119A (en) * 2011-11-15 2012-06-27 浪潮电子信息产业股份有限公司 Method for rapidly detecting cluster parallel efficiency
CN102546202A (en) * 2010-12-17 2012-07-04 无锡江南计算技术研究所 Unlimited bandwidth network flow monitoring method, device and system
CN106201720A (en) * 2016-07-11 2016-12-07 广州高能计算机科技有限公司 Virtual symmetric multi-processors virtual machine creation method, data processing method and system
CN206100022U (en) * 2016-07-21 2017-04-12 广州高能计算机科技有限公司 It calculates cluster system directly to link framework based on infinite bandwidth
CN107040407A (en) * 2017-03-15 2017-08-11 成都中讯创新科技股份有限公司 A kind of HPCC dynamic node operational method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114268568A (en) * 2021-12-22 2022-04-01 快云信息科技有限公司 Network traffic monitoring method, device and equipment
CN114268568B (en) * 2021-12-22 2023-08-25 快云信息科技有限公司 Network traffic monitoring method, device and equipment
CN117579522A (en) * 2023-12-19 2024-02-20 无锡众星微系统技术有限公司 Bandwidth and delay performance measuring method and circuit of IB network switching chip
CN117579522B (en) * 2023-12-19 2024-05-10 无锡众星微系统技术有限公司 Bandwidth and delay performance measuring method and circuit of IB network switching chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200626