CN111817892B - Network management method, system, electronic equipment and storage medium - Google Patents

Publication number
CN111817892B
Authority
CN
China
Prior art keywords
network card
network
main
storage
card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010662970.3A
Other languages
Chinese (zh)
Other versions
CN111817892A (en)
Inventor
刘杰
罗浩
安祥文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Data Technology Co Ltd
Priority to CN202010662970.3A
Publication of CN111817892A
Application granted
Publication of CN111817892B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668: Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application discloses a network management method, which is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management method comprises the following steps: acquiring a network card state of a main network card according to a preset period; if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes; judging whether the network card states of the main network cards of all other storage nodes are abnormal states or not; if yes, switching the network card; if not, the network card switching operation is not executed; when the main network card of the storage node is detected to be recovered to the normal state, judging whether the network card states of the main network cards of other storage nodes are all in the normal state; if yes, switching the network card; if not, the network card switching operation is not executed. The method and the device can accurately detect the network state and improve the master-slave switching efficiency of the network card. The application also discloses a network management system, an electronic device and a storage medium, which have the beneficial effects.

Description

Network management method, system, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a network management method and system, an electronic device, and a storage medium.
Background
With the continuous development of information technology, data storage, one of the core elements of data resources, has entered a period of rapid development. A traditional network storage system uses a centralized storage server to store all data; that server becomes the bottleneck of system performance as well as the focal point for reliability and security, and it cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture that not only improves the reliability, availability and access efficiency of the system but is also easy to expand, and it has therefore been adopted by more and more enterprises.
In a distributed storage networking environment, two network cards are generally connected to two switches respectively so as to implement network redundancy, and if a subnet manager detects that a network card is in an abnormal state, network switching is performed. However, when the distributed storage system is tested, a fault that can interrupt the network needs to be injected into a certain storage node, and under this main/standby switching judgment mechanism the injected fault causes the network card to be switched erroneously.
Therefore, how to accurately detect the network state and improve the main/standby switching efficiency of the network card is a technical problem that currently needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a network management method, a network management system, a storage medium and an electronic device, which can accurately detect the network state and improve the primary and standby switching efficiency of a network card.
In order to solve the above technical problem, the present application provides a network management method, which is applied to storage nodes in a distributed storage system, where the storage nodes include a primary network card and a standby network card, and the network management method includes:
acquiring a network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system;
judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed;
when the network card state of the main network card of the storage node is detected to be recovered to a normal state, judging whether the network card states of the main network cards of other storage nodes are all normal states; if so, switching the network card and accessing the network card into the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
Optionally, the main network cards of all the storage nodes in the distributed storage system are connected to a first Infiniband switch, and the standby network cards of all the storage nodes in the distributed storage system are connected to a second Infiniband switch.
Optionally, the obtaining of the network card states of the main network cards of other storage nodes in the distributed storage system includes:
remotely acquiring the network card states of the main network cards of other storage nodes in the distributed storage system through the management network port of the storage node.
Optionally, the method further includes:
if the network card state of the main network card of the storage node is an abnormal state and the network card states of the main network cards of the other storage nodes are not all abnormal states, judging that a network fault has been injected into the storage node.
Optionally, the method further includes:
if the storage node is accessed into the Infiniband network by using the main network card, setting the current network card identifier of the storage node as the global unique identifier of the main network card of the storage node;
and if the storage node accesses the Infiniband network by using the standby network card, setting the current network card identifier of the storage node as the global unique identifier of the standby network card of the storage node.
Optionally, after obtaining the network card status of the primary network card according to a preset period, the method further includes:
setting the value of the network card flag bit of the storage node according to the network card state of the main network card of the storage node;
correspondingly, obtaining the network card status of the main network cards of other storage nodes in the distributed storage system includes:
acquiring network card flag bits of other storage nodes in the distributed storage system;
and determining the network card state of the main network card of the other storage nodes according to the values of the network card flag bits of the other storage nodes.
The application also provides a network management system, which is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management system comprises:
the first network card state acquisition module is used for acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
the second network card state acquisition module is used for acquiring the network card states of the main network cards of other storage nodes in the distributed storage system if the network card state of the main network card of the storage node is an abnormal state;
the first network card switching module is used for judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if yes, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed;
the second network card switching module is used for judging whether the network card states of the main network cards of other storage nodes are all normal states or not when the network card state of the main network card of the storage node is detected to be recovered to be the normal state; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
Optionally, the second network card status acquiring module is a module configured to remotely acquire, through the management network interface of the storage node, the network card status of the main network card of another storage node in the distributed storage system.
The present application also provides a storage medium having stored thereon a computer program that, when executed, performs the steps performed by the above-described network management method.
The application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps executed by the network management method when calling the computer program in the memory.
The application provides a network management method, which is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management method comprises the following steps: acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card; if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system; judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed; when the network card state of the main network card of the storage node is detected to be recovered to a normal state, judging whether the network card states of the main network cards of other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
The network management method provided by the present application is applied to a storage node in a distributed storage system, which may comprise a plurality of storage nodes. The network card state of the main network card is acquired according to a preset period, and if the main network card is in an abnormal state, the main network card states of the other storage nodes in the distributed storage system are acquired. An abnormal main network card state on the storage node may be caused by a switch fault or by a fault injected into the storage node. If a fault that interrupts the network has been injected into the storage node, the network function cannot be restored no matter which network card the storage node switches to; the present application therefore acquires the main network card states of the other storage nodes to judge whether the abnormal state is caused by a switch fault. If the main network card states of the other storage nodes are all abnormal, a switch fault exists, so the network card can be switched and the standby network card of the storage node used to access the Infiniband network. After the main network card of the storage node recovers, the present application further judges whether the main network cards of the other storage nodes have also recovered, and performs the network card switching operation only if the main network cards of all storage nodes of the distributed storage system have recovered, which avoids erroneous switching caused by misjudging the network card state. The network state can therefore be detected accurately and the efficiency of main/standby network card switching improved. The present application also provides a network management system, an electronic device and a storage medium, which have the same beneficial effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings required for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a network management method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a network management system according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a network management method according to an embodiment of the present disclosure.
The specific steps may include:
S101: acquiring a network card state of a main network card according to a preset period;
This embodiment can be applied to a storage node in a distributed storage system. The distributed storage system may include a plurality of storage nodes, each provided with a main network card and a standby network card; all the main network cards are connected to a main switch, and all the standby network cards are connected to a standby switch. When the storage node switches its current network card to the main network card, it accesses the network through the main network card and the main switch; when it switches its current network card to the standby network card, it accesses the network through the standby network card and the standby switch.
As a possible implementation, when the network card state of the main network card of the storage node is a normal state, the storage node may access the Infiniband network through the main network card. Further, the main network cards of all the storage nodes in the distributed storage system are connected to a first Infiniband switch (i.e., the main switch), and the standby network cards of all the storage nodes in the distributed storage system are connected to a second Infiniband switch (i.e., the standby switch). This embodiment may acquire the network card state of the main network card according to a preset period. Specifically, an operating state parameter (such as the transmission rate) of the main network card may be read and compared with a preset parameter: if the operating state parameter equals the preset parameter or falls within the value range corresponding to the preset parameter, the network card state of the main network card is determined to be a normal state; otherwise it is determined to be an abnormal state. Infiniband, abbreviated IB, is a computer network communication standard for high-performance computing that offers very high throughput and very low latency and is used for data interconnection between computers. Infiniband also serves as a direct or switched interconnect between servers and storage systems, as well as between storage systems.
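The comparison of an operating state parameter with a preset parameter can be sketched as follows. This is only an illustrative sketch: the names `NicState`, `PresetRange` and `classify_nic_state`, and the use of a link rate in Gb/s as the parameter, are assumptions for illustration and are not defined by the application.

```python
from dataclasses import dataclass
from enum import Enum


class NicState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass(frozen=True)
class PresetRange:
    """Hypothetical acceptable value range for one operating parameter."""
    low: float
    high: float


def classify_nic_state(measured: float, preset: PresetRange) -> NicState:
    """Return NORMAL if the measured value lies inside the preset range."""
    if preset.low <= measured <= preset.high:
        return NicState.NORMAL
    return NicState.ABNORMAL


# Example with made-up numbers: a link expected to run at 90 to 100 Gb/s.
expected = PresetRange(low=90.0, high=100.0)
state_when_up = classify_nic_state(100.0, expected)
state_when_down = classify_nic_state(0.0, expected)
```

In a periodic check, a function of this kind would be called once per preset period with the value read from the main network card.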
S102: if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system;
Before this step, the storage node may determine whether the network card state of the main network card is an abnormal state: if it is a normal state, the operation of S101 may be repeated; if it is an abnormal state, the operation of S102 may be executed. This step is premised on the network card state of the main network card having been judged abnormal. The causes of the abnormal state may include a switch fault (such as a sudden power failure of the switch) and may also include a fault injected into the storage node that interrupts the network.
As a possible implementation manner, the present embodiment may obtain the network card status of the primary network card of the other storage node by the following manner: and remotely acquiring the network card states of the main network cards of other storage nodes in the distributed storage system through the management network ports of the storage nodes.
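One way to organize the remote acquisition is sketched below. The application does not specify the transport used over the management network port, so the sketch takes a caller-supplied `fetch` function (hypothetical) that returns `True` when a peer's main network card is normal. Skipping peers that cannot be reached is an assumption of this sketch, not something the application states.

```python
from typing import Callable, Dict, Iterable


def collect_peer_states(
    peers: Iterable[str],
    fetch: Callable[[str], bool],
) -> Dict[str, bool]:
    """Query each peer over the management network.

    `fetch(host)` is a hypothetical callable that returns True when the
    main network card of `host` is in the normal state.
    """
    states: Dict[str, bool] = {}
    for host in peers:
        try:
            states[host] = fetch(host)
        except OSError:
            # Assumption: an unreachable peer provides no evidence either
            # way, so it is left out of the result.
            continue
    return states
```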
S103: judging whether the network card states of the main network cards of all other storage nodes are abnormal states or not; if yes, entering S104; if not, ending the process, and not executing the network card switching operation;
In this step, after the network card state of the main network card of the storage node has been determined to be abnormal, the cause of the abnormal state is determined from the network card states of the main network cards of the other storage nodes. If the switch connected to the main network cards fails, none of the storage nodes in the distributed storage system can access the network through its main network card; if a fault capable of interrupting the network has been injected into the storage node, the other storage nodes in the distributed storage system can still access the network through their main network cards, that is, the network card states of the main network cards of the other storage nodes are normal.
Further, if the network card state of the main network card of the storage node is abnormal and the network card states of the main network cards of the other storage nodes are not all abnormal, it is determined that a network fault has been injected into the storage node.
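The judgment in S103 can be sketched as a small pure function. The names `Decision` and `decide_on_failure` are illustrative assumptions. The application does not say what to do when no peer state is available, so returning `NO_ACTION` in that case is an assumption of this sketch.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    NO_ACTION = "no_action"
    SWITCH_TO_STANDBY = "switch_to_standby"
    LOCAL_FAULT_INJECTED = "local_fault_injected"


def decide_on_failure(
    local_main_abnormal: bool,
    peers_main_abnormal: Iterable[bool],
) -> Decision:
    """Decide what to do after checking the local main network card."""
    if not local_main_abnormal:
        return Decision.NO_ACTION
    peers = list(peers_main_abnormal)
    if not peers:
        # Assumption: without peer evidence, do not switch.
        return Decision.NO_ACTION
    if all(peers):
        # Every main card is abnormal: treat it as a switch fault.
        return Decision.SWITCH_TO_STANDBY
    # At least one peer is still normal: the fault is local (injected).
    return Decision.LOCAL_FAULT_INJECTED
```

Keeping the decision free of I/O makes it straightforward to test each branch separately from the code that gathers the peer states.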
S104: and switching the network card and accessing the Infiniband network by using the standby network card of the storage node.
In this step, once the network card states of the main network cards of all storage nodes in the distributed storage system have been determined to be abnormal, the current network card of the storage node is switched from the main network card to the standby network card, and the standby network card is then used to access the Infiniband network.
As a possible implementation manner, after executing S104, there may be an operation of detecting a network card state of the main network card of the storage node, and when it is detected that the network card state of the main network card of the storage node is recovered to a normal state, it is determined whether the network card states of the main network cards of the other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
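The switch-back condition described above can be sketched as follows. The function name `should_fail_back` is an illustrative assumption, and treating an empty peer list as insufficient evidence is likewise an assumption of this sketch.

```python
from typing import Iterable


def should_fail_back(
    local_main_normal: bool,
    peers_main_normal: Iterable[bool],
) -> bool:
    """Return True only when the local main card and all peers' main cards
    have recovered, so that switching back to the main card is safe."""
    peers = list(peers_main_normal)
    # Assumption: with no peer information, stay on the standby card.
    return local_main_normal and bool(peers) and all(peers)
```

Requiring every peer to be normal mirrors the switch-over condition, so a single node's recovery does not move that node back to the main switch before the rest of the cluster can follow.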
The network management method provided by the embodiment is applied to storage nodes in a distributed storage system, the distributed storage system may include a plurality of storage nodes, the network card state of the primary network card is acquired according to a preset period, and if the primary network card is in an abnormal state, the primary network card state of other storage nodes in the distributed storage system is acquired. The cause of the abnormal state of the main network card of the storage node can be a switch failure or a failure injected into the storage node. If the storage node is injected with a fault that can interrupt the network, no matter which network card the storage node is switched to, the network function cannot be realized, and in this embodiment, whether the state of the main network card is abnormal due to the switch fault or not can be judged by acquiring the state of the main network card of other storage nodes. If the main network card states of other storage nodes are all abnormal states, the fact that the switch fails is indicated, the network cards can be switched, and the standby network cards of the storage nodes can be used for accessing the Infiniband network. In this embodiment, it is further determined whether the network card states of the main network cards of the other storage nodes are all recovered to be normal after the main network card of the storage node is recovered to be normal, and if the main network cards of all the storage nodes of the distributed storage system are all recovered to be normal, a network card switching operation is executed, so that network card wrong switching caused by network card state misjudgment is avoided. Therefore, the network state can be accurately detected, and the primary and standby network card switching efficiency is improved.
As a further description of the embodiment corresponding to Fig. 1, after the network card is switched and the standby network card of the storage node is used to access the Infiniband network, the network card state of the main network card of the storage node may continue to be monitored. When the main network card is detected to have recovered from the abnormal state to the normal state, it may be determined whether the network card states of the main network cards of the other storage nodes are all normal; if so, the network card is switched and the main network card of the storage node is used to access the Infiniband network.
By way of further introduction to the corresponding embodiment of fig. 1, if the storage node accesses the Infiniband network by using a primary network card, setting a current network card identifier of the storage node as a globally unique identifier of the primary network card of the storage node; and if the storage node accesses the Infiniband network by using the standby network card, setting the current network card identifier of the storage node as the global unique identifier of the standby network card of the storage node. The user can determine the network card currently used by the storage node according to the current network card identifier.
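A minimal sketch of tracking the current network card identifier is given below. The class name `NodeNics`, its methods, and the GUID strings in the example are made up for illustration and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class NodeNics:
    """Holds the GUIDs of both cards and the GUID of the card in use."""
    main_guid: str
    standby_guid: str
    current_guid: str = ""

    def use_main(self) -> None:
        self.current_guid = self.main_guid

    def use_standby(self) -> None:
        self.current_guid = self.standby_guid

    def active_card(self) -> str:
        """Report which card the current identifier refers to."""
        if self.current_guid == self.main_guid:
            return "main"
        if self.current_guid == self.standby_guid:
            return "standby"
        return "unknown"


# Example with made-up GUID values.
node = NodeNics(main_guid="0x0002c90300a1b2c3", standby_guid="0x0002c90300a1b2c4")
node.use_standby()
```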
As a further description of the embodiment corresponding to Fig. 1, after the network card state of the main network card is obtained according to the preset period, the value of the network card flag bit of the storage node may also be set according to the network card state of the main network card of the storage node. Correspondingly, the operation of acquiring the network card states of the main network cards of the other storage nodes may include: acquiring the network card flag bits of the other storage nodes in the distributed storage system; and determining the network card states of the main network cards of the other storage nodes according to the values of their network card flag bits.
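The flag-bit variant can be sketched as follows. The encoding (0 for normal, 1 for abnormal), the dictionary keyed by node name, and the function names are assumptions of this sketch; the application only states that a flag bit is set from the main network card state and read by other nodes.

```python
from typing import Dict

FLAG_NORMAL = 0    # assumed encoding, not specified by the application
FLAG_ABNORMAL = 1  # assumed encoding, not specified by the application


def flag_for_state(main_card_normal: bool) -> int:
    """Map the local main network card state to a flag bit value."""
    return FLAG_NORMAL if main_card_normal else FLAG_ABNORMAL


def peers_all_abnormal(flags: Dict[str, int], local_node: str) -> bool:
    """Check the flag bits of every node except the local one."""
    peer_flags = [flag for node, flag in flags.items() if node != local_node]
    # Assumption: an empty peer list is not treated as "all abnormal".
    return bool(peer_flags) and all(f == FLAG_ABNORMAL for f in peer_flags)
```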
The flow described in the above embodiment is explained below through an embodiment in practical use. In actual use, after the Infiniband switch serving as the main switch is powered off, the Infiniband network is interrupted for about 60 s; only during this period does the subnet manager determine that the switch has failed, switch the subnet, and restore network communication. While the Infiniband network is interrupted, storage cluster services are unavailable and customer services are interrupted accordingly. The related art offers no means of quickly detecting an Infiniband network failure, shortening the 60 s outage, and keeping storage traffic from being interrupted. The present application provides a design method for switching between main and standby Infiniband switches without interrupting traffic in a distributed storage environment, which can solve the problems of excessively long main/standby network card switching time and low switching efficiency in the related art. The method specifically comprises the following steps:
Step 1: according to the globally unique identifiers (GUID information) of the two network cards on the storage node, network card A (i.e., the main network card) and network card B (i.e., the standby network card), set the switch to which network card A is connected as the main switch and the switch to which network card B is connected as the standby switch;
Step 2: scan the state of network card A periodically; if network card A is normal, exit the flow and wait for the next check period; if network card A is abnormal, execute Step 3;
Step 3: when network card A is abnormal, remotely check the states of network card A on the other nodes in the cluster through the management network port of the storage node; if network card A on the other nodes is normal, make no local adjustment; if network card A on the other nodes is also abnormal, perform the Infiniband network switch from switch A (i.e., the first Infiniband switch) to switch B (i.e., the second Infiniband switch); after completion, execute Step 4;
Step 4: continue to check the state of network card A; if network card A is working again, check whether network card A on the other nodes in the cluster is working normally; if not, make no adjustment; if so, switch the Infiniband network back to switch A; after completion, execute Step 1.
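The four steps above can be condensed into one check cycle, sketched below as a pure function that takes the currently active switch and the observed states of network card A and returns the switch to use next. The names `ActiveSwitch` and `check_cycle` are illustrative assumptions, and the handling of an empty peer list (no change) is an assumption of this sketch.

```python
from enum import Enum
from typing import Iterable


class ActiveSwitch(Enum):
    A = "A"  # main switch, reached through network card A
    B = "B"  # standby switch, reached through network card B


def check_cycle(
    active: ActiveSwitch,
    local_a_ok: bool,
    peers_a_ok: Iterable[bool],
) -> ActiveSwitch:
    """Run one periodic check and return the switch to use afterwards."""
    peers = list(peers_a_ok)
    if active is ActiveSwitch.A:
        # Steps 2 and 3: switch to B only if card A is down on this node
        # and on every other node in the cluster.
        if not local_a_ok and peers and not any(peers):
            return ActiveSwitch.B
        return ActiveSwitch.A
    # Step 4: switch back to A only if card A works again on this node
    # and on every other node in the cluster.
    if local_a_ok and peers and all(peers):
        return ActiveSwitch.A
    return ActiveSwitch.B
```

A caller would invoke this once per check period, feeding it the locally scanned state of network card A and the states reported by the other nodes.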
Based on the characteristics of AS13000 distributed storage, this embodiment provides a design method for switching between main and standby Infiniband switches without interrupting traffic in a distributed storage environment. After the main Infiniband switch is powered off, the method can quickly detect the network abnormality and complete the subnet switch, so that the Infiniband network interruption is kept within 1 second. Because the storage cluster has an internal cache, an interruption of this length does not cut off traffic during the main/standby switch, and front-end services continue reading and writing without interruption.
In this embodiment, the state of the Infiniband network is actively monitored and the faulty network is actively switched away from each time the network fails, which greatly shortens the network interruption time, keeps services uninterrupted when a switch fails, and solves the problem of traffic interruption during main/standby Infiniband switch switching.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of a network management system according to an embodiment of the present application. The system is applied to a storage node in a distributed storage system, the storage node comprises a main network card and a standby network card, and the network management system comprises:
a first network card status acquiring module 100, configured to acquire a network card status of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
a second network card status obtaining module 200, configured to obtain, if the network card status of the main network card of the storage node is an abnormal status, network card statuses of main network cards of other storage nodes in the distributed storage system;
the first network card switching module 300 is configured to determine whether the network card states of the main network cards of all the other storage nodes are abnormal states; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed;
the second network card switching module 400 is configured to, when it is detected that the network card status of the main network card of the storage node is recovered to a normal status, determine whether the network card statuses of the main network cards of the other storage nodes are all normal statuses; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
The network management method provided by this embodiment is applied to a storage node in a distributed storage system, which may include a plurality of storage nodes. The network card state of the main network card is acquired according to a preset period, and if the main network card is in an abnormal state, the main network card states of the other storage nodes in the distributed storage system are acquired. An abnormal state of the main network card of a storage node may be caused either by a switch failure or by a fault injected into the storage node itself. If a fault that interrupts the network has been injected into a storage node, the network function cannot be restored no matter which network card the storage node switches to; in this embodiment, whether the abnormal state of the main network card is caused by a switch failure can therefore be determined by acquiring the main network card states of the other storage nodes. If the main network card states of the other storage nodes are all abnormal, the switch has failed, so the network card can be switched and the standby network card of the storage node can be used to access the Infiniband network. This embodiment further determines, after the main network card of the storage node has recovered to the normal state, whether the main network cards of the other storage nodes have all recovered to the normal state; the network card switching operation is executed only if the main network cards of all storage nodes of the distributed storage system have recovered, which avoids erroneous switching caused by misjudging the network card state. The network state can therefore be detected accurately, and the efficiency of switching between the main and standby network cards is improved.
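The switching rule described above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the names `Card` and `next_active_card` are hypothetical, and the sketch assumes that the network card states have already been collected as booleans (True for the normal state) during one polling period.

```python
from enum import Enum


class Card(Enum):
    MAIN = "main"
    STANDBY = "standby"


def next_active_card(active, local_main_ok, peer_main_ok):
    """Return the card the storage node should use after one polling period.

    active        -- card currently used to access the Infiniband network
    local_main_ok -- True if this node's main network card is in the normal state
    peer_main_ok  -- main network card states of all other storage nodes
    """
    peers = list(peer_main_ok)
    if active is Card.MAIN and not local_main_ok:
        # Switch to the standby card only if every other node also reports an
        # abnormal main card, which points at the switch rather than this node.
        if all(not ok for ok in peers):
            return Card.STANDBY
    elif active is Card.STANDBY and local_main_ok:
        # Switch back only once every other node's main card has recovered.
        if all(peers):
            return Card.MAIN
    # In every other case no network card switching operation is executed.
    return active
```

With an empty list of peers both conditions are vacuously satisfied in this sketch; how a cluster with a single storage node should behave is not stated in the description and is left open here.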
Further, the main network cards of all the storage nodes in the distributed storage system are connected with the first Infiniband switch, and the standby network cards of all the storage nodes in the distributed storage system are connected with the second Infiniband switch.
Further, the second network card status acquiring module 200 is specifically configured to remotely acquire, through the management network port of the storage node, the network card states of the main network cards of the other storage nodes in the distributed storage system.
Further, the system also comprises:
a fault determining module, configured to determine that a network fault has been injected into the storage node if the network card state of the main network card of the storage node is an abnormal state and the network card states of the main network cards of the other storage nodes are not all abnormal states.
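As a rough illustration of how the cause of an abnormal main network card could be told apart, the following Python sketch classifies one observation. The function name and the three string labels are hypothetical and do not appear in the application; states are assumed to be booleans, with True meaning the normal state.

```python
def classify_main_card_state(local_main_ok, peer_main_ok):
    """Classify the state of this node's main network card.

    Returns one of three illustrative labels:
      "normal"         -- the local main card is in the normal state
      "switch_failure" -- the local main card and all peers' main cards are abnormal
      "injected_fault" -- the local main card is abnormal but not all peers are
    """
    if local_main_ok:
        return "normal"
    if all(not ok for ok in peer_main_ok):
        return "switch_failure"
    return "injected_fault"
```

Only the "switch_failure" case would lead to a switch to the standby network card; in the "injected_fault" case switching cards would not restore the network, so no switch is attempted.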
Further, the system also comprises:
a current network card identifier setting module, configured to set the current network card identifier of the storage node to the globally unique identifier of the main network card of the storage node if the storage node accesses the Infiniband network by using the main network card, and to set the current network card identifier of the storage node to the globally unique identifier of the standby network card of the storage node if the storage node accesses the Infiniband network by using the standby network card.
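The bookkeeping performed by this module might look like the following Python sketch. The class `StorageNode`, its field names, and the identifier strings used in the usage lines are placeholders chosen for illustration; the application does not specify how the identifier is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageNode:
    main_guid: str                       # globally unique identifier of the main card
    standby_guid: str                    # globally unique identifier of the standby card
    current_guid: Optional[str] = None   # identifier of the card currently in use

    def access_via_main(self):
        # The node accesses the Infiniband network through the main card.
        self.current_guid = self.main_guid

    def access_via_standby(self):
        # The node accesses the Infiniband network through the standby card.
        self.current_guid = self.standby_guid


# Usage with placeholder identifiers (not real device GUIDs).
node = StorageNode(main_guid="guid-main", standby_guid="guid-standby")
node.access_via_main()
```

Keeping a single current identifier lets other components look up which card is active without re-deriving it from the switching history.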
Further, the system also comprises:
the flag bit setting module is used for setting the value of the network card flag bit of the storage node according to the network card state of the main network card of the storage node after the network card state of the main network card is acquired according to a preset period;
correspondingly, the second network card status obtaining module 200 is configured to obtain the network card flag bits of the other storage nodes in the distributed storage system, and to determine the network card states of the main network cards of the other storage nodes according to the values of those network card flag bits.
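A minimal Python sketch of the flag-bit exchange is given below. The encoding (0 for the normal state, 1 for the abnormal state), the function names, and the dictionary keyed by node name are all assumptions made for illustration; the application only states that the flag value is set according to the main network card state and read by the other nodes.

```python
FLAG_NORMAL = 0    # assumed encoding: main network card in the normal state
FLAG_ABNORMAL = 1  # assumed encoding: main network card in the abnormal state


def flag_from_state(main_ok):
    """Value a node writes to its own flag bit after each polling period."""
    return FLAG_NORMAL if main_ok else FLAG_ABNORMAL


def peer_states_from_flags(flags):
    """Translate the flag bits read from other nodes into normal/abnormal states.

    flags -- mapping from node name to the flag value read from that node
    Returns a mapping from node name to True (normal) or False (abnormal).
    """
    return {name: value == FLAG_NORMAL for name, value in flags.items()}
```

Publishing a single flag per node means a peer only has to read one value over the management network rather than query the network card itself.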
Since the embodiment of the system part corresponds to the embodiment of the method part, the embodiment of the system part is described with reference to the embodiment of the method part, and is not repeated here.
The present application also provides a storage medium having a computer program stored thereon which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the description of the method part. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (9)

1. A network management method is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management method comprises the following steps:
acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system;
judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed; the main network cards of all storage nodes in the distributed storage system are connected with a first Infiniband switch, and the standby network cards of all storage nodes in the distributed storage system are connected with a second Infiniband switch;
when the network card state of the main network card of the storage node is detected to be recovered to a normal state, judging whether the network card states of the main network cards of other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
2. The network management method according to claim 1, wherein acquiring the network card states of the main network cards of the other storage nodes in the distributed storage system comprises:
remotely acquiring the network card states of the main network cards of the other storage nodes in the distributed storage system through the management network port of the storage node.
3. The network management method of claim 1, further comprising:
if the network card state of the main network card of the storage node is an abnormal state and the network card states of the main network cards of the other storage nodes are not all abnormal states, judging that the storage node is injected with a network fault.
4. The network management method of claim 1, further comprising:
if the storage node is accessed into the Infiniband network by using the main network card, setting the current network card identifier of the storage node as the global unique identifier of the main network card of the storage node;
and if the storage node accesses the Infiniband network by using the standby network card, setting the current network card identifier of the storage node as the global unique identifier of the standby network card of the storage node.
5. The network management method according to claim 1, further comprising, after acquiring the network card state of the main network card according to a preset period:
setting the value of the network card flag bit of the storage node according to the network card state of the main network card of the storage node;
correspondingly, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system includes:
acquiring network card flag bits of other storage nodes in the distributed storage system;
and determining the network card state of the main network card of the other storage nodes according to the values of the network card flag bits of the other storage nodes.
6. A network management system is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management system comprises:
the first network card state acquisition module is used for acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
a second network card status acquiring module, configured to acquire, if the network card status of the main network card of the storage node is an abnormal status, network card statuses of main network cards of other storage nodes in the distributed storage system;
the first network card switching module is used for judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if yes, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed; the main network cards of all storage nodes in the distributed storage system are connected with a first Infiniband switch, and the standby network cards of all storage nodes in the distributed storage system are connected with a second Infiniband switch;
the second network card switching module is used for judging whether the network card states of the main network cards of other storage nodes are all normal states or not when the network card state of the main network card of the storage node is detected to be recovered to be the normal state; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
7. The network management system according to claim 6, wherein the second network card status acquiring module is a module configured to remotely acquire, through the management network port of the storage node, the network card states of the main network cards of the other storage nodes in the distributed storage system.
8. An electronic device, comprising a memory in which a computer program is stored and a processor which, when calling the computer program in the memory, implements the steps of the network management method according to any one of claims 1 to 5.
9. A storage medium having stored thereon computer-executable instructions which, when loaded and executed by a processor, carry out the steps of a network management method according to any one of claims 1 to 5.
CN202010662970.3A 2020-07-10 2020-07-10 Network management method, system, electronic equipment and storage medium Active CN111817892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010662970.3A CN111817892B (en) 2020-07-10 2020-07-10 Network management method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111817892A CN111817892A (en) 2020-10-23
CN111817892B true CN111817892B (en) 2023-04-07

Family

ID=72842670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010662970.3A Active CN111817892B (en) 2020-07-10 2020-07-10 Network management method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111817892B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143176A (en) * 2021-10-31 2022-03-04 广东浪潮智慧计算技术有限公司 Network configuration method, system and related device of virtualization platform management network

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404568A (en) * 2008-11-17 2009-04-08 国电南瑞科技股份有限公司 Double-network card hot backup redundancy method
CN102263660A (en) * 2011-07-19 2011-11-30 中国舰船研究设计中心 Dual-network card redundancy switching method and device
CN102684946A (en) * 2012-05-25 2012-09-19 中国舰船研究设计中心 Dual-network-interface-card switching performance testing method for information integration system
CN103259678A (en) * 2013-04-28 2013-08-21 华为技术有限公司 Main-auxiliary switching method, device, equipment and system
CN106301836A (en) * 2015-05-25 2017-01-04 北京视联动力国际信息技术有限公司 A kind of method of redundancy backup, terminal and regard networked system
CN106713036A (en) * 2016-12-27 2017-05-24 中国建设银行股份有限公司 Fault processing method and system of mobile terminal payment system
CN107995106A (en) * 2017-12-04 2018-05-04 山东超越数控电子股份有限公司 A kind of interchanger redundant system of data storing platform
CN109831341A (en) * 2019-03-19 2019-05-31 中国电子科技集团公司第三十六研究所 A kind of fast switch over method and device of redundancy double netcard
CN111212127A (en) * 2019-12-29 2020-05-29 浪潮电子信息产业股份有限公司 Storage cluster, service data maintenance method, device and storage medium
CN111212451A (en) * 2019-12-26 2020-05-29 曙光信息产业股份有限公司 Method and device for switching network transmission channel

Also Published As

Publication number Publication date
CN111817892A (en) 2020-10-23

Similar Documents

Publication Publication Date Title
CN108847982B (en) Distributed storage cluster and node fault switching method and device thereof
US7802128B2 (en) Method to avoid continuous application failovers in a cluster
CN105187249B (en) A kind of fault recovery method and device
KR100420266B1 (en) Apparatus and method for improving the availability of cluster computer systems
CN110807064B (en) Data recovery device in RAC distributed database cluster system
US20080288812A1 (en) Cluster system and an error recovery method thereof
EP3306476B1 (en) Method and apparatus for hot cpu removal and hot cpu adding during operation
US10698605B2 (en) Multipath storage device based on multi-dimensional health diagnosis
CN104036043A (en) High availability method of MYSQL and managing node
CN109120522B (en) Multipath state monitoring method and device
CN111176888A (en) Cloud storage disaster recovery method, device and system
CN111104283A (en) Fault detection method, device, equipment and medium of distributed storage system
CN111817892B (en) Network management method, system, electronic equipment and storage medium
CN108512753B (en) Method and device for transmitting messages in cluster file system
CN114064374A (en) Fault detection method and system based on distributed block storage
CN111309515B (en) Disaster recovery control method, device and system
CN112491633B (en) Fault recovery method, system and related components of multi-node cluster
CN114189429A (en) System, method, device and medium for monitoring server cluster faults
CN100463373C (en) Centralized control and hierarchical implementing switching control method and device
CN115550287B (en) Method for establishing remote copy relationship and related device
CN113609104B (en) Method and device for accessing distributed storage system by key value of partial fault
CN115150253B (en) Fault root cause determining method and device and electronic equipment
CN113868000B (en) Link fault repairing method, system and related components
US20230090032A1 (en) Storage system and control method
CN112000500A (en) Communication fault determining method, processing method and storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant