CN111817892B - Network management method, system, electronic equipment and storage medium - Google Patents
- Publication number
- CN111817892B (application CN202010662970.3A)
- Authority
- CN
- China
- Prior art keywords
- network card
- network
- main
- storage
- card
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The application discloses a network management method applied to a storage node in a distributed storage system, where the storage node comprises a main network card and a standby network card. The method comprises: acquiring the network card state of the main network card according to a preset period; if the network card state of the main network card of the storage node is abnormal, acquiring the network card states of the main network cards of the other storage nodes; judging whether the main network cards of all the other storage nodes are in the abnormal state; if so, switching the network card; if not, not executing the network card switching operation; when the main network card of the storage node is detected to have recovered to the normal state, judging whether the main network cards of the other storage nodes are all in the normal state; if so, switching the network card; if not, not executing the network card switching operation. The method can accurately detect the network state and improve the main/standby switching efficiency of the network card. The application also discloses a network management system, an electronic device and a storage medium, which have the same beneficial effects.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a network management method and system, an electronic device, and a storage medium.
Background
With the continuous development of information technology, data storage, one of the core elements of data resources, has entered a period of rapid development. A traditional network storage system uses a centralized storage server to store all data; the storage server becomes the bottleneck of system performance as well as the focal point of reliability and security, and cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture, which improves the reliability, availability and access efficiency of the system and is easy to expand, and it has therefore been adopted by more and more enterprises.
In a distributed storage networking environment, two network cards are generally connected to two switches respectively to provide network redundancy, and network switching is performed when the subnet manager detects that a network card is in an abnormal state. However, when a distributed storage system is tested, a fault that can interrupt the network needs to be injected into a particular storage node; under this main/standby switching decision mechanism, the injected fault causes the network card to be switched incorrectly.
Therefore, how to accurately detect the network state and improve the primary/standby switching efficiency of the network card is a technical problem that needs to be solved by technical personnel in the field at present.
Disclosure of Invention
The application aims to provide a network management method, a network management system, a storage medium and an electronic device, which can accurately detect the network state and improve the primary and standby switching efficiency of a network card.
In order to solve the above technical problem, the present application provides a network management method, which is applied to storage nodes in a distributed storage system, where the storage nodes include a primary network card and a standby network card, and the network management method includes:
acquiring a network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system;
judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed;
when the network card state of the main network card of the storage node is detected to be recovered to a normal state, judging whether the network card states of the main network cards of other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
Optionally, the master network cards of all the storage nodes in the distributed storage system are connected to the first Infiniband switch, and the standby network cards of all the storage nodes in the distributed storage system are connected to the second Infiniband switch.
Optionally, the obtaining the network card status of the master network card of the other storage nodes in the distributed storage system includes:
and remotely acquiring the network card states of the main network cards of other storage nodes in the distributed storage system through the management network ports of the storage nodes.
Optionally, the method further includes:
and if the network card state of the main network card of the storage node is an abnormal state and the network card states of the main network cards of the other storage nodes are not all abnormal states, judging that the storage node has been injected with a network fault.
Optionally, the method further includes:
if the storage node is accessed into the Infiniband network by using the main network card, setting the current network card identifier of the storage node as the global unique identifier of the main network card of the storage node;
and if the storage node accesses the Infiniband network by using the standby network card, setting the current network card identifier of the storage node as the global unique identifier of the standby network card of the storage node.
Optionally, after obtaining the network card status of the primary network card according to a preset period, the method further includes:
setting the value of the network card flag bit of the storage node according to the network card state of the main network card of the storage node;
correspondingly, obtaining the network card status of the main network cards of other storage nodes in the distributed storage system includes:
acquiring network card flag bits of other storage nodes in the distributed storage system;
and determining the network card state of the main network card of the other storage nodes according to the values of the network card flag bits of the other storage nodes.
The application also provides a network management system, which is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management system comprises:
the first network card state acquisition module is used for acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
the second network card state acquisition module is used for acquiring the network card states of the main network cards of other storage nodes in the distributed storage system if the network card state of the main network card of the storage node is an abnormal state;
the first network card switching module is used for judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if yes, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed;
the second network card switching module is used for judging whether the network card states of the main network cards of other storage nodes are all normal states or not when the network card state of the main network card of the storage node is detected to be recovered to be the normal state; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
Optionally, the second network card status acquiring module is a module configured to remotely acquire, through the management network interface of the storage node, the network card status of the main network card of another storage node in the distributed storage system.
The present application also provides a storage medium having stored thereon a computer program that, when executed, performs the steps performed by the above-described network management method.
The application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps executed by the network management method when calling the computer program in the memory.
The application provides a network management method, which is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management method comprises the following steps: acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card; if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system; judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed; when the network card state of the main network card of the storage node is detected to be recovered to a normal state, judging whether the network card states of the main network cards of other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
The network management method is applied to a storage node in a distributed storage system, which may comprise a plurality of storage nodes. The network card state of the main network card is acquired according to a preset period, and if the main network card is in an abnormal state, the main network card states of the other storage nodes in the distributed storage system are acquired. An abnormal main network card state on the storage node may be caused by a switch failure, or by a fault injected into the storage node. If the storage node has been injected with a fault that can interrupt the network, the network function cannot be restored no matter which network card the storage node switches to; the main network card states of the other storage nodes are therefore acquired to judge whether the abnormal state is caused by a switch failure. If the main network card states of the other storage nodes are all abnormal, a switch failure exists, the network card can be switched, and the standby network card of the storage node can be used to access the Infiniband network. After the main network card of the storage node recovers, the method further judges whether the main network cards of the other storage nodes have also recovered; the network card switching operation is executed only if the main network cards of all the storage nodes of the distributed storage system have recovered, which avoids erroneous switching caused by misjudging the network card state. Therefore, the network state can be accurately detected, and the main/standby switching efficiency of the network card is improved. The application also provides a network management system, an electronic device and a storage medium, which have the same beneficial effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings required for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a network management method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a network management system according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a network management method according to an embodiment of the present disclosure.
The specific steps may include:
S101: acquiring a network card state of a main network card according to a preset period;
the embodiment can be applied to storage nodes in a distributed storage system, the distributed storage system can include a plurality of storage nodes, each storage node can be provided with a main network card and a standby network card, all the main network cards are connected with a main switch, and all the standby network cards are connected with a standby switch. When the storage node switches the current network card to the main network card, the storage node accesses the network through the main network card and the main switch; and when the storage node switches the current network card to the standby network card, the storage node accesses the network through the standby network card and the standby switch.
As a possible implementation manner, when the network card state of the main network card of the storage node is normal, the storage node may access the Infiniband network through the main network card. Further, the main network cards of all the storage nodes in the distributed storage system are connected to a first Infiniband switch (i.e., the main switch), and the standby network cards of all the storage nodes in the distributed storage system are connected to a second Infiniband switch (i.e., the standby switch).
The embodiment may acquire the network card state of the main network card according to a preset period. Specifically, a running state parameter of the main network card (such as the transmission rate) may be read and compared with a preset parameter. If the running state parameter equals the preset parameter or falls within the value range corresponding to the preset parameter, the network card state of the main network card is determined to be normal; otherwise, it is determined to be abnormal.
Infiniband, abbreviated IB, is a computer network communication standard for high-performance computing that offers very high throughput and very low latency for data interconnection between computers. Infiniband also serves as a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems.
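The state check described above, comparing a sampled running state parameter with a preset value range, can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `NicState`, `RateRange`, `classify_nic_state` and the numeric range are assumptions, and how the rate is actually read from the network card is left out.

```python
from dataclasses import dataclass
from enum import Enum


class NicState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass(frozen=True)
class RateRange:
    """Preset value range for the sampled parameter (hypothetical units: Gb/s)."""
    low: float
    high: float


def classify_nic_state(sampled_rate: float, preset: RateRange) -> NicState:
    """Return NORMAL when the sampled parameter lies inside the preset range."""
    if preset.low <= sampled_rate <= preset.high:
        return NicState.NORMAL
    return NicState.ABNORMAL


# Illustrative values only; the source does not give concrete thresholds.
PRESET = RateRange(low=90.0, high=100.0)

if __name__ == "__main__":
    print(classify_nic_state(100.0, PRESET).value)
    print(classify_nic_state(0.0, PRESET).value)
```

A periodic scheduler would call `classify_nic_state` once per preset period with a fresh sample.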
S102: if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system;
before this step, there may also be an operation in which the storage node determines whether the network card state of the main network card is an abnormal state, if the network card state of the main network card is a normal state, the relevant operation of S101 may be repeatedly executed, and if the network card state of the main network card is an abnormal state, the relevant operation of S102 may be executed. The step is established on the basis that the network card state of the main network card is judged to be an abnormal state, and the reason for causing the abnormal state of the network card can comprise switch faults (such as sudden power failure of the switch) and also can comprise faults that storage nodes are injected to interrupt the network.
As a possible implementation manner, the present embodiment may obtain the network card status of the primary network card of the other storage node by the following manner: and remotely acquiring the network card states of the main network cards of other storage nodes in the distributed storage system through the management network ports of the storage nodes.
S103: judging whether the network card states of the main network cards of all other storage nodes are abnormal states or not; if yes, entering S104; if not, ending the process, and not executing the network card switching operation;
In this step, after the network card state of the main network card of the storage node has been determined to be abnormal, the cause of the abnormal state is determined from the network card states of the main network cards of the other storage nodes. If the switch connected to the main network cards has failed, none of the storage nodes in the distributed storage system can access the network through its main network card; if the storage node has been injected with a fault that can interrupt the network, the other storage nodes in the distributed storage system can still access the network through their main network cards, that is, the network card states of the main network cards of the other storage nodes are normal.
Further, if the network card state of the main network card of the storage node is abnormal and the network card states of the main network cards of the other storage nodes are not all abnormal, it is determined that the storage node has been injected with a network fault.
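The cause determination in S103 can be illustrated with a short sketch. The names `Cause` and `diagnose` are hypothetical, and the handling of an empty peer list (no switch failure is concluded without peer evidence) is an assumption, since the source does not cover that case.

```python
from enum import Enum
from typing import Iterable


class Cause(Enum):
    NONE = "no fault observed"
    SWITCH_FAILURE = "main switch failure"
    INJECTED_FAULT = "fault injected into this node"


def diagnose(local_abnormal: bool, peers_abnormal: Iterable[bool]) -> Cause:
    """Classify why the local main network card is abnormal.

    peers_abnormal holds one flag per other storage node: True when that
    node's main network card is in the abnormal state.
    """
    peers = list(peers_abnormal)
    if not local_abnormal:
        return Cause.NONE
    # Assumption: with no peer data we do not conclude a switch failure.
    if peers and all(peers):
        return Cause.SWITCH_FAILURE
    return Cause.INJECTED_FAULT


if __name__ == "__main__":
    print(diagnose(True, [True, True]).value)
    print(diagnose(True, [False, True]).value)
```

Only the `SWITCH_FAILURE` outcome leads to S104; the other two outcomes leave the current network card unchanged.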
S104: and switching the network card and accessing the Infiniband network by using the standby network card of the storage node.
The method comprises the steps that on the basis that the network card states of main network cards of all storage nodes in the distributed storage system are determined to be abnormal states, the current network card of the storage node is switched from the main network card to a standby network card, and then the standby network card is used for accessing the Infiniband network.
As a possible implementation manner, after executing S104, there may be an operation of detecting a network card state of the main network card of the storage node, and when it is detected that the network card state of the main network card of the storage node is recovered to a normal state, it is determined whether the network card states of the main network cards of the other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
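Taken together, S101 to S104 and the recovery check form a small decision rule over three inputs: the network card currently in use, the local main network card state, and the main network card states of the other nodes. The sketch below is one way to express that rule; `Nic` and `next_active_nic` are hypothetical names, and requiring at least one peer observation before switching is an assumption not stated in the source.

```python
from enum import Enum
from typing import Sequence


class Nic(Enum):
    MAIN = "main"
    STANDBY = "standby"


def next_active_nic(active: Nic, local_main_ok: bool,
                    peers_main_ok: Sequence[bool]) -> Nic:
    """Decide which network card the node should use after one check."""
    if active is Nic.MAIN and not local_main_ok:
        # Fail over only when every other node's main card is also abnormal.
        if peers_main_ok and not any(peers_main_ok):
            return Nic.STANDBY
        return Nic.MAIN
    if active is Nic.STANDBY and local_main_ok:
        # Fail back only when every other node's main card has recovered.
        if peers_main_ok and all(peers_main_ok):
            return Nic.MAIN
        return Nic.STANDBY
    return active


if __name__ == "__main__":
    print(next_active_nic(Nic.MAIN, False, [False, False]).value)
    print(next_active_nic(Nic.STANDBY, True, [True, False]).value)
```

The function is pure, so the same rule can be exercised in tests without any network hardware.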
The network management method provided by the embodiment is applied to storage nodes in a distributed storage system, the distributed storage system may include a plurality of storage nodes, the network card state of the primary network card is acquired according to a preset period, and if the primary network card is in an abnormal state, the primary network card state of other storage nodes in the distributed storage system is acquired. The cause of the abnormal state of the main network card of the storage node can be a switch failure or a failure injected into the storage node. If the storage node is injected with a fault that can interrupt the network, no matter which network card the storage node is switched to, the network function cannot be realized, and in this embodiment, whether the state of the main network card is abnormal due to the switch fault or not can be judged by acquiring the state of the main network card of other storage nodes. If the main network card states of other storage nodes are all abnormal states, the fact that the switch fails is indicated, the network cards can be switched, and the standby network cards of the storage nodes can be used for accessing the Infiniband network. In this embodiment, it is further determined whether the network card states of the main network cards of the other storage nodes are all recovered to be normal after the main network card of the storage node is recovered to be normal, and if the main network cards of all the storage nodes of the distributed storage system are all recovered to be normal, a network card switching operation is executed, so that network card wrong switching caused by network card state misjudgment is avoided. Therefore, the network state can be accurately detected, and the primary and standby network card switching efficiency is improved.
As a further introduction to the embodiment corresponding to Fig. 1, after the network card is switched and the standby network card of the storage node is used to access the Infiniband network, the network card state of the main network card of the storage node may continue to be monitored. When it is detected that the main network card of the storage node has recovered from the abnormal state to the normal state, it may be determined whether the network card states of the main network cards of the other storage nodes are all normal; if so, the network card is switched and the main network card of the storage node is used to access the Infiniband network.
By way of further introduction to the corresponding embodiment of fig. 1, if the storage node accesses the Infiniband network by using a primary network card, setting a current network card identifier of the storage node as a globally unique identifier of the primary network card of the storage node; and if the storage node accesses the Infiniband network by using the standby network card, setting the current network card identifier of the storage node as the global unique identifier of the standby network card of the storage node. The user can determine the network card currently used by the storage node according to the current network card identifier.
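The current network card identifier can be pictured as a single field that always holds the GUID of the card in use. The sketch below is illustrative only: `StorageNode`, its method names, and the GUID strings are made up for the example and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    main_guid: str
    standby_guid: str
    current_nic_id: str = ""

    def use_main(self) -> None:
        """Record that the node now accesses the network via the main card."""
        self.current_nic_id = self.main_guid

    def use_standby(self) -> None:
        """Record that the node now accesses the network via the standby card."""
        self.current_nic_id = self.standby_guid

    def active_nic(self) -> str:
        """Let a user resolve the identifier back to the card in use."""
        if self.current_nic_id == self.main_guid:
            return "main"
        if self.current_nic_id == self.standby_guid:
            return "standby"
        return "unknown"


if __name__ == "__main__":
    # Placeholder GUIDs, for illustration only.
    node = StorageNode(main_guid="guid-A", standby_guid="guid-B")
    node.use_main()
    print(node.active_nic())
    node.use_standby()
    print(node.active_nic())
```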
As a further introduction to the embodiment corresponding to Fig. 1, after the network card state of the main network card is acquired according to the preset period, the value of the network card flag bit of the storage node may also be set according to the network card state of the main network card of the storage node. Correspondingly, the operation of acquiring the network card states of the main network cards of the other storage nodes may include: acquiring the network card flag bits of the other storage nodes in the distributed storage system; and determining the network card states of the main network cards of the other storage nodes according to the values of those flag bits.
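A flag-bit exchange of this kind might look like the following sketch. The encoding (0 for normal, 1 for abnormal), the function names, and the use of a plain mapping from node name to flag are all assumptions; the source specifies neither the bit values nor how the flags are transported between nodes.

```python
from typing import Mapping

FLAG_NORMAL = 0    # assumed encoding
FLAG_ABNORMAL = 1  # assumed encoding


def flag_from_state(main_nic_abnormal: bool) -> int:
    """Value a node would publish after each periodic check."""
    return FLAG_ABNORMAL if main_nic_abnormal else FLAG_NORMAL


def peers_all_abnormal(peer_flags: Mapping[str, int]) -> bool:
    """True when every other node reports an abnormal main network card."""
    return bool(peer_flags) and all(
        flag == FLAG_ABNORMAL for flag in peer_flags.values()
    )


def peers_all_normal(peer_flags: Mapping[str, int]) -> bool:
    """True when every other node reports a normal main network card."""
    return bool(peer_flags) and all(
        flag == FLAG_NORMAL for flag in peer_flags.values()
    )


if __name__ == "__main__":
    flags = {"node2": flag_from_state(True), "node3": flag_from_state(True)}
    print(peers_all_abnormal(flags))
```

Reading one bit per node keeps the remote query cheap compared with re-sampling each peer's network card parameters.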
The flow described in the above embodiment is explained below through an embodiment in practical use. In actual use, after the Infiniband switch serving as the main switch is powered off, the Infiniband network is interrupted for about 60 s; during this time the subnet manager determines that the switch has failed, switches the subnet, and network communication recovers. While the Infiniband network is interrupted, storage cluster services are unavailable and customer traffic is interrupted accordingly. Currently, there is no means to quickly detect Infiniband network failures, shorten the 60 s outage, and keep storage traffic from being interrupted. The application provides a design method for switching between the main and standby Infiniband switches without interrupting traffic in a distributed storage environment, which addresses the overly long main/standby network card switching time and low switching efficiency in the related art. The steps are as follows:
step 1: setting an exchanger where the network card A is located as a main exchanger and an exchanger where the network card B is located as a standby exchanger according to global unique identifiers (guid information) of two network cards A (namely, a main network card) and a network card B (namely, a standby network card) on a storage node;
step 2: scanning the state of the network card A at regular time, and if the state of the network card A is normal, exiting the process to wait for the next check period; if the network card A is abnormal, executing the step 3;
and step 3: remotely checking the states of the network cards A on other nodes in the cluster through the management network ports of the storage nodes when the network cards A are abnormal, and if the network cards A on other nodes are normal, locally carrying out no adjustment; if the network cards A on other nodes are also abnormal, the Infiniband network switching is completed, and the switch A (namely a first Infiniband switch) is switched to a switch B (namely a second Infiniband switch); after the completion, executing the step 4;
and 4, step 4: continuing to check the state of the network card A, if the network card A works again, checking whether the network cards A on other nodes in the cluster work normally, if not, adjusting, and if so, switching the Infiniband network to the switch A again; after completion, step 1 is performed.
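The periodic checks in steps 2 to 4 can be traced on a scripted timeline. The sketch below replays a list of observations and records which switch the node uses after each check; `run_checks`, the `Observation` layout, and the sample timeline are illustrative assumptions, and real polling, remote queries and the actual subnet switch are omitted.

```python
from typing import Iterable, List, Tuple

# (network card A healthy on this node, network card A healthy on each peer)
Observation = Tuple[bool, List[bool]]


def run_checks(observations: Iterable[Observation]) -> List[str]:
    """Return the switch in use ("A" or "B") after each periodic check."""
    active = "A"
    history: List[str] = []
    for local_ok, peers_ok in observations:
        if active == "A":
            # Step 3: switch to B only if card A is abnormal on every node.
            if not local_ok and peers_ok and not any(peers_ok):
                active = "B"
        else:
            # Step 4: switch back only if card A works again on every node.
            if local_ok and peers_ok and all(peers_ok):
                active = "A"
        history.append(active)
    return history


if __name__ == "__main__":
    timeline: List[Observation] = [
        (True, [True, True]),     # healthy cluster
        (False, [True, True]),    # only this node abnormal: no adjustment
        (False, [False, False]),  # all abnormal: switch to B
        (True, [False, True]),    # local recovery only: stay on B
        (True, [True, True]),     # all recovered: switch back to A
    ]
    print(run_checks(timeline))  # ['A', 'A', 'B', 'B', 'A']
```

The second entry shows the injected-fault case: the node stays on switch A because the other nodes still report a healthy network card A.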
According to the characteristics of AS13000 distributed storage, this embodiment provides a design method for switching between the main and standby Infiniband switches without interrupting traffic in a distributed storage environment. After the main Infiniband switch is powered off, the method can quickly detect the network abnormality and complete the subnet switch, so that the Infiniband network interruption is kept within 1 second. Because the storage cluster has an internal cache, this resolves the traffic interruption during main/standby Infiniband switch switching and ensures that front-end services continue reading and writing without interruption.
In this embodiment, the state of the Infiniband network is actively monitored and the network is actively switched away from the faulty path each time a failure occurs, which greatly shortens the network interruption time, keeps services uninterrupted when a switch fails, and solves the problem of traffic interruption during main/standby Infiniband switch switching.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of a network management system according to an embodiment of the present application. The system is applied to a storage node in a distributed storage system, the storage node comprises a main network card and a standby network card, and the network management system comprises:
a first network card status acquiring module 100, configured to acquire a network card status of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
a second network card status obtaining module 200, configured to obtain, if the network card status of the main network card of the storage node is an abnormal status, network card statuses of main network cards of other storage nodes in the distributed storage system;
the first network card switching module 300 is configured to determine whether the network card states of the main network cards of all the other storage nodes are abnormal states; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed;
the second network card switching module 400 is configured to, when it is detected that the network card status of the main network card of the storage node is recovered to a normal status, determine whether the network card statuses of the main network cards of the other storage nodes are all normal statuses; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
The network management method provided by the embodiment is applied to storage nodes in a distributed storage system, the distributed storage system may include a plurality of storage nodes, the network card state of the main network card is acquired according to a preset period, and if the main network card is in an abnormal state, the main network card state of other storage nodes in the distributed storage system is acquired. The cause of the abnormal state of the main network card of the storage node can be a switch failure or a failure injected into the storage node. If a storage node is injected with a fault that can interrupt the network, no matter which network card the storage node is switched to, the network function cannot be realized, and in this embodiment, whether the state of the main network card is abnormal due to the switch fault or not can be determined by acquiring the state of the main network card of other storage nodes. If the main network card states of other storage nodes are all abnormal states, the fact that the switch fails is indicated, the network cards can be switched, and the standby network cards of the storage nodes can be used for accessing the Infiniband network. In this embodiment, it is further determined whether the network card states of the master network cards of the other storage nodes are all recovered to be normal after the master network card of the storage node is recovered to be normal, and if the master network cards of all the storage nodes of the distributed storage system are all recovered to be normal, a network card switching operation is executed, so that network card wrong switching caused by network card state misjudgment is avoided. Therefore, the network state can be accurately detected, and the primary and standby network card switching efficiency is improved.
Further, the main network cards of all the storage nodes in the distributed storage system are connected with the first Infiniband switch, and the standby network cards of all the storage nodes in the distributed storage system are connected with the second Infiniband switch.
Further, the second network card status acquiring module 200 is specifically a module for remotely acquiring the network card status of the main network card of the other storage node in the distributed storage system through the management network port of the storage node.
Further, the system also includes:
a fault determining module, configured to determine that a network fault has been injected into the storage node if the network card state of the main network card of the storage node is an abnormal state and the network card states of the main network cards of the other storage nodes are not all abnormal states.
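As a rough illustration of how the fault determining module could distinguish the two causes named in this embodiment, the following Python sketch classifies one polling result. The function name `classify_fault` and the string labels are hypothetical, chosen only for this example.

```python
from typing import Iterable


def classify_fault(local_main_ok: bool, peers_main_ok: Iterable[bool]) -> str:
    """Classify the cause of an abnormal main network card.

    Returns one of three illustrative labels:
      "healthy"      - the local main card is in the normal state
      "switch_fault" - local and all other main cards are abnormal
      "node_fault"   - local is abnormal but at least one peer is normal,
                       so a network fault was injected into this node
    """
    if local_main_ok:
        return "healthy"
    peers = list(peers_main_ok)
    # With no peers at all, this sketch reports "switch_fault" (an assumption).
    if all(not ok for ok in peers):
        return "switch_fault"
    return "node_fault"
```

Only the "switch_fault" outcome corresponds to the case in which switching to the standby network card is useful.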
Further, the system also includes:
a current network card identifier setting module, configured to set the current network card identifier of the storage node to the globally unique identifier of the main network card of the storage node if the storage node uses the main network card to access the Infiniband network, and further configured to set the current network card identifier of the storage node to the globally unique identifier of the standby network card of the storage node if the storage node uses the standby network card to access the Infiniband network.
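The identifier bookkeeping can be sketched as follows. The class `StorageNode`, its field names, and the placeholder identifier strings in the usage example are assumptions made for illustration; the application does not specify how the identifier is stored.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    main_guid: str          # globally unique identifier of the main network card
    standby_guid: str       # globally unique identifier of the standby network card
    current_guid: str = ""  # identifier of the card currently used for access

    def use_main(self) -> None:
        # The node accesses the Infiniband network through the main card.
        self.current_guid = self.main_guid

    def use_standby(self) -> None:
        # The node accesses the Infiniband network through the standby card.
        self.current_guid = self.standby_guid
```

A caller would invoke `use_standby()` right after a failover and `use_main()` after a failback, so that `current_guid` always names the card in use.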
Further, the system also includes:
the flag bit setting module is used for setting the value of the network card flag bit of the storage node according to the network card state of the main network card of the storage node after the network card state of the main network card is acquired according to a preset period;
correspondingly, the second network card status acquiring module 200 is configured to acquire the network card flag bits of the other storage nodes in the distributed storage system, and is further configured to determine the network card states of the main network cards of the other storage nodes according to the values of those flag bits.
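The flag-bit exchange might look like the sketch below. The encoding (0 for normal, 1 for abnormal), the function names, and the node names in the usage example are all assumptions; this passage does not state which value represents which state.

```python
from typing import Dict

# Assumed encoding for this sketch only.
NORMAL = 0
ABNORMAL = 1


def flag_for_state(main_card_ok: bool) -> int:
    """Value a node writes to its own flag bit after each polling period."""
    return NORMAL if main_card_ok else ABNORMAL


def states_from_flags(flags: Dict[str, int]) -> Dict[str, bool]:
    """Map each peer's flag bit back to a main-card state (True = normal)."""
    return {node: flag == NORMAL for node, flag in flags.items()}
```

Given flags collected from two peers, `states_from_flags({"node-b": 0, "node-c": 1})` marks `node-b` as normal and `node-c` as abnormal.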
Since the embodiment of the system part corresponds to the embodiment of the method part, the embodiment of the system part is described with reference to the embodiment of the method part, and is not repeated here.
The present application also provides a storage medium having a computer program stored thereon which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, power supplies, and the like.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those skilled in the art may make improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Claims (9)
1. A network management method is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management method comprises the following steps:
acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
if the network card state of the main network card of the storage node is an abnormal state, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system;
judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if so, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed; the main network cards of all storage nodes in the distributed storage system are connected with a first Infiniband switch, and the standby network cards of all storage nodes in the distributed storage system are connected with a second Infiniband switch;
when the network card state of the main network card of the storage node is detected to be recovered to a normal state, judging whether the network card states of the main network cards of other storage nodes are all normal states; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
2. The network management method according to claim 1, wherein acquiring the network card states of the main network cards of the other storage nodes in the distributed storage system comprises:
and remotely acquiring the network card states of the main network cards of other storage nodes in the distributed storage system through the management network ports of the storage nodes.
3. The network management method of claim 1, further comprising:
and if the network card state of the main network card of the storage node is an abnormal state and the network card states of the main network cards of the other storage nodes are not all abnormal states, determining that a network fault has been injected into the storage node.
4. The network management method of claim 1, further comprising:
if the storage node is accessed into the Infiniband network by using the main network card, setting the current network card identifier of the storage node as the global unique identifier of the main network card of the storage node;
and if the storage node accesses the Infiniband network by using the standby network card, setting the current network card identifier of the storage node as the global unique identifier of the standby network card of the storage node.
5. The network management method according to claim 1, further comprising, after acquiring the network card state of the main network card according to a preset period:
setting the value of the network card flag bit of the storage node according to the network card state of the main network card of the storage node;
correspondingly, acquiring the network card states of the main network cards of other storage nodes in the distributed storage system includes:
acquiring network card flag bits of other storage nodes in the distributed storage system;
and determining the network card state of the main network card of the other storage nodes according to the values of the network card flag bits of the other storage nodes.
6. A network management system is applied to storage nodes in a distributed storage system, wherein the storage nodes comprise a main network card and a standby network card, and the network management system comprises:
the first network card state acquisition module is used for acquiring the network card state of the main network card according to a preset period; when the network card state of the main network card is a normal state, the storage node is accessed into the Infiniband network through the main network card;
a second network card status acquiring module, configured to acquire, if the network card status of the main network card of the storage node is an abnormal status, network card statuses of main network cards of other storage nodes in the distributed storage system;
the first network card switching module is used for judging whether the network card states of the main network cards of all the other storage nodes are abnormal states or not; if yes, switching the network card and accessing the Infiniband network by using the standby network card of the storage node; if not, the network card switching operation is not executed; the main network cards of all storage nodes in the distributed storage system are connected with a first Infiniband switch, and the standby network cards of all storage nodes in the distributed storage system are connected with a second Infiniband switch;
the second network card switching module is used for judging whether the network card states of the main network cards of other storage nodes are all normal states or not when the network card state of the main network card of the storage node is detected to be recovered to be the normal state; if so, switching the network card and accessing the Infiniband network by using the main network card of the storage node; if not, the network card switching operation is not executed.
7. The network management system according to claim 6, wherein the second network card status acquiring module is a module configured to remotely acquire, through the management network port of the storage node, the network card states of the main network cards of the other storage nodes in the distributed storage system.
8. An electronic device, comprising a memory in which a computer program is stored and a processor which, when calling the computer program in the memory, implements the steps of the network management method according to any one of claims 1 to 5.
9. A storage medium having stored thereon computer-executable instructions which, when loaded and executed by a processor, carry out the steps of a network management method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010662970.3A CN111817892B (en) | 2020-07-10 | 2020-07-10 | Network management method, system, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010662970.3A CN111817892B (en) | 2020-07-10 | 2020-07-10 | Network management method, system, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111817892A (en) | 2020-10-23 |
CN111817892B (en) | 2023-04-07 |
Family
ID=72842670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010662970.3A Active CN111817892B (en) | 2020-07-10 | 2020-07-10 | Network management method, system, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111817892B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114143176A (en) * | 2021-10-31 | 2022-03-04 | 广东浪潮智慧计算技术有限公司 | Network configuration method, system and related device of virtualization platform management network |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101404568A (en) * | 2008-11-17 | 2009-04-08 | 国电南瑞科技股份有限公司 | Double-network card hot backup redundancy method |
CN102263660A (en) * | 2011-07-19 | 2011-11-30 | 中国舰船研究设计中心 | Dual-network card redundancy switching method and device |
CN102684946A (en) * | 2012-05-25 | 2012-09-19 | 中国舰船研究设计中心 | Dual-network-interface-card switching performance testing method for information integration system |
CN103259678A (en) * | 2013-04-28 | 2013-08-21 | 华为技术有限公司 | Main-auxiliary switching method, device, equipment and system |
CN106301836A (en) * | 2015-05-25 | 2017-01-04 | 北京视联动力国际信息技术有限公司 | A kind of method of redundancy backup, terminal and regard networked system |
CN106713036A (en) * | 2016-12-27 | 2017-05-24 | 中国建设银行股份有限公司 | Fault processing method and system of mobile terminal payment system |
CN107995106A (en) * | 2017-12-04 | 2018-05-04 | 山东超越数控电子股份有限公司 | A kind of interchanger redundant system of data storing platform |
CN109831341A (en) * | 2019-03-19 | 2019-05-31 | 中国电子科技集团公司第三十六研究所 | A kind of fast switch over method and device of redundancy double netcard |
CN111212127A (en) * | 2019-12-29 | 2020-05-29 | 浪潮电子信息产业股份有限公司 | Storage cluster, service data maintenance method, device and storage medium |
CN111212451A (en) * | 2019-12-26 | 2020-05-29 | 曙光信息产业股份有限公司 | Method and device for switching network transmission channel |
2020-07-10: CN application CN202010662970.3A, patent CN111817892B (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN111817892A (en) | 2020-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108847982B (en) | Distributed storage cluster and node fault switching method and device thereof | |
US7802128B2 (en) | Method to avoid continuous application failovers in a cluster | |
CN105187249B (en) | A kind of fault recovery method and device | |
KR100420266B1 (en) | Apparatus and method for improving the availability of cluster computer systems | |
CN110807064B (en) | Data recovery device in RAC distributed database cluster system | |
US20080288812A1 (en) | Cluster system and an error recovery method thereof | |
EP3306476B1 (en) | Method and apparatus for hot cpu removal and hot cpu adding during operation | |
US10698605B2 (en) | Multipath storage device based on multi-dimensional health diagnosis | |
CN104036043A (en) | High availability method of MYSQL and managing node | |
CN109120522B (en) | Multipath state monitoring method and device | |
CN111176888A (en) | Cloud storage disaster recovery method, device and system | |
CN111104283A (en) | Fault detection method, device, equipment and medium of distributed storage system | |
CN111817892B (en) | Network management method, system, electronic equipment and storage medium | |
CN108512753B (en) | Method and device for transmitting messages in cluster file system | |
CN114064374A (en) | Fault detection method and system based on distributed block storage | |
CN111309515B (en) | Disaster recovery control method, device and system | |
CN112491633B (en) | Fault recovery method, system and related components of multi-node cluster | |
CN114189429A (en) | System, method, device and medium for monitoring server cluster faults | |
CN100463373C (en) | Centralized control and hierarchical implementing switching control method and device | |
CN115550287B (en) | Method for establishing remote copy relationship and related device | |
CN113609104B (en) | Method and device for accessing distributed storage system by key value of partial fault | |
CN115150253B (en) | Fault root cause determining method and device and electronic equipment | |
CN113868000B (en) | Link fault repairing method, system and related components | |
US20230090032A1 (en) | Storage system and control method | |
CN112000500A (en) | Communication fault determining method, processing method and storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||