CN106130761A

CN106130761A - Method and device for identifying failed network devices in a data center

Info

Publication number: CN106130761A
Application number: CN201610458020.2A
Authority: CN
Inventors: 胡晓赟; 金炜卿; 金国松
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-06-22
Filing date: 2016-06-22
Publication date: 2016-11-16
Anticipated expiration: 2036-06-22
Also published as: CN106130761B

Abstract

This application discloses a method and device for identifying failed network devices in a data center. One specific embodiment of the method includes the following steps. In response to obtaining abnormality information that contains the IP address of a network device, the method searches the network topology graph of the data center for the node representing that network device and updates the node to an abnormal node. For each abnormal node, it establishes an abnormal node set. The set contains the abnormal node itself and every abnormal node whose path to it passes through no non-abnormal node. For each associated abnormal node set, meaning the abnormal nodes that share an identical abnormal node set, it determines whether each abnormal node is directly connected to any non-abnormal node. Finally, it identifies as faulty devices the network devices represented by abnormal nodes that are not directly connected to any non-abnormal node. This embodiment identifies failed network devices in a data center automatically and speeds up fault localization.

Description

Method and device for identifying failed network devices in a data center

Technical field

The present application relates to the field of computer technology, specifically to network operation and maintenance, and in particular to a method and device for identifying failed network devices in a data center.

Background

With the explosive growth of Internet data, data center network structures have become increasingly complex. Manual statistics can no longer reveal the current operating state of a data center or support its management in any area, including device-level network management, fault localization, and traffic management.

Existing data center management has the following problems. First, the data are scattered. A data center holds many kinds of internal data, and no unified, effective way exists to integrate them. Second, the view is limited. Most of the data collected in a data center are single-point data, such as network traffic data or device fault data. From such data, neither the overall operating state of the data center can be judged nor faults effectively located. Third, timeliness is poor. Existing management methods often start from a single data point and rely on manual judgment of which devices or links may be faulty, so abnormality information cannot be turned into an interpretable result quickly.

Summary of the invention

The purpose of the present application is to propose an improved method and device for identifying failed network devices in a data center, which solve the technical problems mentioned in the background section above.

In a first aspect, the present application provides a method for identifying failed network devices in a data center. The method includes the following steps. In response to obtaining abnormality information that contains the IP address of a network device, it searches the network topology graph of the data center for the node representing the network device and updates the node to an abnormal node. For each abnormal node, it establishes an abnormal node set, which includes this abnormal node and the abnormal nodes whose path to it contains no non-abnormal node. For each abnormal node in an associated abnormal node set, it determines whether that node is directly connected to a non-abnormal node. The associated abnormal node set consists of the abnormal nodes that have an identical abnormal node set. Finally, it identifies as a faulty device the network device represented by each abnormal node determined not to be directly connected to any non-abnormal node.

In some embodiments, the method further includes a step of building the network topology graph of the data center. This step includes three parts. First, it collects device information for the network devices in the data center. The device information includes at least one of the following: device identification information, the IP address of the peer device, the MAC address of the peer device, port traffic information of the device, and error information recorded locally on the device. Second, it obtains the network topology of the data center through a network management protocol. Third, it uses the device IP addresses to establish a correspondence between the device information and the network topology.

In some embodiments, identifying a faulty device further includes two steps. The identified device is the network device represented by an abnormal node determined not to be directly connected to any non-abnormal node. The method first queries a preset set of repair operations for a repair operation that repairs the faulty device. It then performs that repair operation.

In some embodiments, performing the repair operation further includes checking whether the faulty device has been successfully repaired. To do so, the method queries operation information for the faulty device and for the abnormal devices associated with it. The operation information includes at least one of the following: port traffic information of the device, CPU usage information of the device, and memory usage information of the device. An abnormal device associated with the faulty device is a network device represented by an abnormal node whose path to the faulty device's node contains no non-abnormal node. If the repair has not succeeded, the method sends a first message to the upper-layer network devices of the faulty device and of the associated abnormal devices. The first message notifies those upper-layer network devices to stop sending data to the faulty device and the associated abnormal devices.

In some embodiments, the method further determines an availability value or a connectivity value for a target device affected by the faulty device. The target device is any network device in the data center other than the faulty device. The value is based on at least one of the following items: the connectivity of the network topology, and the weight, capacity, and redundancy of the links between the faulty device and the target network. The method then compares the availability value with a preset availability threshold, or the connectivity value with a preset connectivity threshold. If the value falls below its threshold, the method sends a second message to the upper-layer network device of the target device. The second message notifies that upper-layer device to stop sending data to the target device and the target device's infrastructure devices.

In a second aspect, the present application provides a device for identifying failed network devices in a data center. The device includes the following units:
- An abnormal node updating unit. In response to abnormality information that contains the IP address of a network device, it searches the network topology graph of the data center for the node representing that device and updates the node to an abnormal node.
- An abnormal node set establishing unit. For each abnormal node, it establishes an abnormal node set, which includes this abnormal node and the abnormal nodes whose path to it contains no non-abnormal node.
- A judging unit. For each abnormal node in an associated abnormal node set, it determines whether that node is directly connected to a non-abnormal node. The associated abnormal node set consists of the abnormal nodes that have an identical abnormal node set.
- An identifying unit. It identifies as a faulty device the network device represented by each abnormal node determined not to be directly connected to any non-abnormal node.

In some embodiments, the device further includes a network topology graph building unit, which performs three tasks. First, it collects device information for the network devices in the data center. The device information includes at least one of the following: device identification information, the IP address of the peer device, the MAC address of the peer device, port traffic information of the device, and error information recorded locally on the device. Second, it obtains the network topology of the data center through a network management protocol. Third, it uses the device IP addresses to establish a correspondence between the device information and the network topology.

In some embodiments, the identifying unit further includes two subunits. A repair operation query subunit queries a preset set of repair operations for a repair operation that repairs the faulty device. A repair operation execution subunit performs that repair operation.

In some embodiments, the repair operation execution subunit also checks whether the faulty device has been successfully repaired. To do so, it queries operation information for the faulty device and for the abnormal devices associated with it. The operation information includes at least one of the following: port traffic information of the device, CPU usage information of the device, and memory usage information of the device. An abnormal device associated with the faulty device is a network device represented by an abnormal node whose path to the faulty device's node contains no non-abnormal node. If the repair has not succeeded, the subunit sends a first message to the upper-layer network devices of the faulty device and of the associated abnormal devices. The first message notifies those upper-layer devices to stop sending data to the faulty device and the associated abnormal devices.

In some embodiments, the device further includes two units. An impact determining unit determines an availability value or a connectivity value for a target device affected by the faulty device. The target device is any network device in the data center other than the faulty device. The value is based on at least one of the following items: the connectivity of the network topology, and the weight, capacity, and redundancy of the links between the faulty device and the target network. A second message sending unit compares the availability value with a preset availability threshold, or the connectivity value with a preset connectivity threshold. If the value falls below its threshold, it sends a second message to the upper-layer network device of the target device. The second message notifies that upper-layer device to stop sending data to the target device and the target device's infrastructure devices.

The method and device provided by the present application work in four stages. First, in response to abnormality information containing the IP address of a network device, they find the node representing that device in the network topology graph of the data center and update it to an abnormal node. Second, they establish an abnormal node set for each abnormal node. Third, they determine, for each abnormal node in an associated abnormal node set, whether it is directly connected to a non-abnormal node. Finally, they identify as faulty devices the network devices represented by abnormal nodes that are not directly connected to any non-abnormal node. This implementation identifies failed network devices in a data center automatically and speeds up fault localization.

Brief description of the drawings

Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the following drawings:

Fig. 1 is an exemplary system architecture diagram to which the present application can be applied;

Fig. 2 is a flowchart of an embodiment of the method for identifying failed network devices in a data center according to the present application;

Fig. 3 is a schematic diagram of an application scenario of the method for identifying failed network devices in a data center according to the present application;

Fig. 4 is a flowchart of another embodiment of the method for identifying failed network devices in a data center according to the present application;

Fig. 5 is a schematic diagram of another application scenario of the method for identifying failed network devices in a data center according to the present application;

Fig. 6 is a schematic structural diagram of an embodiment of the device for identifying failed network devices in a data center according to the present application;

Fig. 7 is a schematic structural diagram of a computer system suitable for implementing the management station of the embodiments of the present application.

Detailed description of the invention

The present application is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here only explain the related invention and do not limit it. For ease of description, the drawings show only the parts related to the invention.

Where they do not conflict, the embodiments of the present application and their features may be combined with one another. The present application is described in detail below with reference to the drawings and the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or device of the present application can be applied.

As shown in Fig. 1, the system architecture 100 may include a management station 101, a router 102, switches 103 and 104, servers 105, 106, 107, and 108, and a network 109. The network 109 provides the transmission links between the management station 101 and the managed devices, which include the router 102, the switches 103 and 104, and the servers 105, 106, 107, and 108. The network 109 may include various connection types, such as wired or wireless transmission links or fiber-optic cables.

Besides those shown in Fig. 1, the managed devices may be any other network devices that support a network management protocol, such as hosts. The network management protocol may be SNMP (Simple Network Management Protocol). A software element called an agent runs on each managed device. Through SNMP, the agent reports the managed device's device information and its specific operating information to the management station 101.

The management station 101 performs the identification in four stages. In response to abnormality information, it searches the network topology graph of the data center for the node representing the network device and updates that node to an abnormal node. It then establishes an abnormal node set for each abnormal node. Next, it determines, for each abnormal node in an associated abnormal node set, whether that node is directly connected to a non-abnormal node. Finally, it identifies as faulty devices the network devices represented by abnormal nodes that are not directly connected to any non-abnormal node. This approach combines the abnormality information from the whole network topology and applies graph-theoretic reasoning to localize faulty devices, which helps eliminate faults quickly.

The numbers of management stations, routers, switches, servers, and networks in Fig. 1 are merely illustrative. An implementation may use any number of management station devices, networks, and servers.

Fig. 2 shows a flow 200 of an embodiment of the method for identifying failed network devices in a data center according to the present application. The method provided by the embodiments of the present application is generally performed by the management station 101 in Fig. 1. The method includes the following steps:

Step 201: In response to abnormality information containing the IP address of a network device, search the network topology graph of the data center for the node representing the network device, and update the node to an abnormal node.

In this embodiment, the method runs on an electronic device, for example the management station shown in Fig. 1. In response to abnormality information containing the IP address of a network device, this device searches the network topology graph of the data center for the node representing that network device and updates the node to an abnormal node. The abnormality information may be sent by the agent software element on a managed device. The management station can obtain information from a managed device in two ways. It can send a request, to which the agent returns the corresponding data. Alternatively, it can preset normal ranges for certain data and receive abnormality information from the agent. Abnormality information is data outside the normal range that the agent reports to the management station without being asked. The monitored data depend on the specific device and may include port traffic data, CPU usage data, and memory usage data.
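The following is a minimal sketch of step 201 for the second reporting mode described above, in which readings are compared with preset normal ranges. In this sketch the check runs on the management station. The metric names, the numeric ranges, and the `TopologyGraph`, `out_of_range`, and `handle_report` names are hypothetical illustrations and do not come from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical normal ranges preset on the management station: metric -> (low, high).
NORMAL_RANGES = {
    "port_traffic_mbps": (0.0, 9000.0),
    "cpu_usage_pct": (0.0, 85.0),
    "memory_usage_pct": (0.0, 90.0),
}


@dataclass
class TopologyGraph:
    ip_to_node: dict                              # device IP address -> node id
    abnormal: set = field(default_factory=set)    # ids of nodes marked abnormal


def out_of_range(metrics: dict) -> list:
    """Return the names of metrics that fall outside their preset normal range."""
    bad = []
    for name, value in metrics.items():
        # Metrics without a configured range are never treated as abnormal.
        low, high = NORMAL_RANGES.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            bad.append(name)
    return bad


def handle_report(graph: TopologyGraph, device_ip: str, metrics: dict) -> bool:
    """Step 201: mark the node for device_ip abnormal if any metric is out of range."""
    if not out_of_range(metrics):
        return False
    node = graph.ip_to_node.get(device_ip)
    if node is None:
        return False  # the reporting device is not in the topology graph
    graph.abnormal.add(node)
    return True
```

For example, `handle_report(TopologyGraph({"10.0.0.7": 7}), "10.0.0.7", {"cpu_usage_pct": 97.0})` marks node 7 as abnormal. The lookup is keyed by IP address because the abnormality information carries only the device's IP address.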

The Simple Network Management Protocol consists of a set of network management standards. These include an application layer protocol, a database schema, and a set of resource objects. The protocol lets network management systems monitor devices attached to the network for any condition that requires administrative attention. It is part of the Internet protocol suite defined by the Internet Engineering Task Force (IETF). SNMP helps network administrators manage networks more efficiently, detect and resolve problems promptly, and plan network growth. Through SNMP, administrators can also receive notifications and alarm reports from network nodes and so learn of problems in the network.

OSPF (Open Shortest Path First) is an Interior Gateway Protocol (IGP) that makes routing decisions within a single autonomous system (AS). It implements link-state routing and, as an IGP, operates inside an autonomous system. IS-IS (Intermediate System to Intermediate System) is also an Interior Gateway Protocol and is one of the IGPs most commonly used by telecom operators.

In some optional implementations of this embodiment, the method further includes a step of building the network topology graph of the data center. This step includes three parts. First, it collects device information for the network devices in the data center. The device information includes at least one of the following: device identification information, the IP address of the peer device, the MAC address of the peer device, port traffic information of the device, and error information recorded locally on the device. Second, it obtains the network topology of the data center through a network management protocol. Third, it uses the device IP addresses to establish a correspondence between the device information and the network topology. The network management protocol may be SNMP, OSPF, or IS-IS. Specifically, an SNMP probe can collect device information inside the data center. An SNMP probe is software that analyzes, monitors, and inspects SNMP traffic. Protocol-level probes can discover the data center network topology automatically. Examples are an OSPF probe, which does the same for OSPF traffic, and an IS-IS probe, which does the same for IS-IS traffic.
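The IP-keyed correspondence between collected device information and the discovered topology can be sketched as follows. The sketch assumes that the probes' output has already been parsed into two inputs: a list of device-information dicts with an `"ip"` key, and a list of `(ip_a, ip_b)` link pairs. Both input shapes and the `build_topology` name are hypothetical, and the sketch does not model any real probe's output format.

```python
from collections import defaultdict


def build_topology(device_infos: list, links: list) -> tuple:
    """Join collected device info with discovered links, keyed by device IP address.

    device_infos: dicts with at least an "ip" key (e.g. gathered by an SNMP probe).
    links: (ip_a, ip_b) pairs (e.g. discovered by an OSPF or IS-IS probe).
    """
    info_by_ip = {info["ip"]: info for info in device_infos}
    adjacency = defaultdict(set)
    for ip_a, ip_b in links:
        # Links are treated as undirected.
        adjacency[ip_a].add(ip_b)
        adjacency[ip_b].add(ip_a)
    # Correspondence between topology nodes and device info (None if nothing was collected).
    nodes = {ip: info_by_ip.get(ip) for ip in adjacency}
    return nodes, dict(adjacency)
```

The resulting adjacency map is the graph that the later steps traverse. A node whose device information has not been collected yet still takes part in the topology, with `None` as its information.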

Step 202: For each abnormal node, establish an abnormal node set. The set includes this abnormal node and the abnormal nodes whose path to it contains no non-abnormal node.

In this embodiment, the electronic device on which the method runs, for example the management station shown in Fig. 1, establishes an abnormal node set for each abnormal node in the network topology graph. The set includes this abnormal node and the abnormal nodes whose path to it contains no non-abnormal node. A non-abnormal node is a node for which no abnormality information has been received. A path contains no non-abnormal node when the two nodes are either directly connected or connected through abnormal nodes only.
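Under this definition, the abnormal node set of a node is its connected component in the subgraph induced by the abnormal nodes. A breadth-first search that only steps onto abnormal nodes computes it. The minimal sketch below assumes that the topology is an adjacency dict mapping each node to the set of its neighbours. The function name is hypothetical.

```python
from collections import deque


def abnormal_node_set(adj: dict, abnormal: set, start) -> frozenset:
    """Step 202: all abnormal nodes reachable from `start` through abnormal nodes only."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            # Only walk onto abnormal nodes, so no accepted path crosses a non-abnormal node.
            if nbr in abnormal and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return frozenset(seen)
```

The result is a `frozenset`, so nodes with identical sets compare equal. This makes the grouping into associated sets in step 203 a simple dictionary lookup.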

Step 203: For each abnormal node in an associated abnormal node set, determine whether it is directly connected to a non-abnormal node.

In this embodiment, the electronic device on which the method runs, for example the management station shown in Fig. 1, determines for each abnormal node in an associated abnormal node set whether it is directly connected to a non-abnormal node. The associated abnormal node set consists of the abnormal nodes that have an identical abnormal node set. Abnormal nodes that share an identical abnormal node set are related. Their abnormality is caused by a fault in the network device represented by at least one of these nodes.
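Step 203 can be sketched in two parts: grouping abnormal nodes whose abnormal node sets are identical, and checking each node for a directly connected non-abnormal neighbour. The sketch assumes the per-node sets from step 202 have already been computed. The function names are hypothetical.

```python
from collections import defaultdict


def associated_sets(sets_by_node: dict) -> list:
    """Group abnormal nodes that share an identical abnormal node set (step 203)."""
    groups = defaultdict(set)
    for node, node_set in sets_by_node.items():
        groups[frozenset(node_set)].add(node)
    return list(groups.values())


def has_normal_neighbor(adj: dict, abnormal: set, node) -> bool:
    """True if `node` is directly connected to at least one non-abnormal node."""
    return any(nbr not in abnormal for nbr in adj.get(node, ()))
```

"Reachable through abnormal nodes only" is a symmetric and transitive relation. As a result, every member of an abnormal node set computes the same set, and each associated set equals that shared set.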

Step 204: Identify as a faulty device the network device represented by each abnormal node determined not to be directly connected to any non-abnormal node.

In this embodiment, the electronic device on which the method runs, for example the management station shown in Fig. 1, identifies faulty devices as follows. A faulty device is the network device represented by an abnormal node determined not to be directly connected to any non-abnormal node. Such a node may be the cause of the abnormality of every device directly connected to it. In this application, a faulty device is also called a fault-suspect device. The number of faulty devices determined by this flow may exceed the number of devices that have actually failed. The abnormality may also come from a link connected to the faulty device, which should be determined according to the actual situation.
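Step 204 can be sketched as a filter over the associated sets from step 203. The filter keeps nodes whose direct neighbours are all abnormal and maps them back to device identifiers. The `device_of` mapping and the function name are hypothetical. Treating an isolated abnormal node as a suspect is also an assumption, because the description does not cover that case.

```python
def identify_faulty_devices(groups: list, adj: dict, abnormal: set, device_of: dict) -> list:
    """Step 204: devices whose abnormal node has no directly connected non-abnormal node.

    groups: associated abnormal node sets from step 203 (iterables of node ids).
    device_of: node id -> device identifier (e.g. its IP address).
    """
    faulty = []
    for group in groups:
        for node in sorted(group):
            neighbours = adj.get(node, ())
            # all() is True for an isolated node, so it is also reported as a suspect.
            if all(nbr in abnormal for nbr in neighbours):
                faulty.append(device_of[node])
    return faulty
```

The output lists fault-suspect devices, not confirmed failures. This matches the caveat above that the flow may report more devices than have actually failed.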

In some optional implementations of this embodiment, identifying a faulty device further includes two steps. The identified device is the network device represented by an abnormal node determined not to be directly connected to any non-abnormal node. The method first queries a preset set of repair operations for a repair operation that repairs the faulty device. It then performs that repair operation.

In some optional implementations of this embodiment, performing the repair operation further includes checking whether the faulty device has been successfully repaired. To do so, the method queries operation information for the faulty device and for the abnormal devices associated with it. The operation information includes at least one of the following: port traffic information of the device, CPU usage information of the device, and memory usage information of the device. An abnormal device associated with the faulty device is a network device represented by an abnormal node whose path to the faulty device's node contains no non-abnormal node. If the repair has not succeeded, the method sends a first message to the upper-layer network devices of the faulty device and of the associated abnormal devices. The first message notifies those upper-layer devices to stop sending data to the faulty device and the associated abnormal devices.
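The following is a minimal sketch of the post-repair check and the first message. It makes several assumptions, none of which comes from the patent:
- A device counts as repaired when every monitored metric is at or below an upper limit.
- The callables `metrics_of`, `upstream_of`, and `send` are supplied by the caller.
- The message is a plain dict whose `"stop_sending"` type is a hypothetical label.
- Affected devices that happen to be upstream of one another are not notified themselves.

```python
def is_repaired(metrics: dict, limits: dict) -> bool:
    """True if every monitored metric is at or below its (hypothetical) upper limit."""
    return all(metrics.get(name, 0.0) <= limit for name, limit in limits.items())


def verify_or_isolate(faulty, associated, metrics_of, upstream_of, limits, send) -> bool:
    """Check the faulty device and its associated abnormal devices after a repair.

    If any of them is still unhealthy, send a "first message" to every upper-layer
    device asking it to stop sending data to the affected devices.
    """
    affected = [faulty, *associated]
    if all(is_repaired(metrics_of(dev), limits) for dev in affected):
        return True
    upstream = set()
    for dev in affected:
        upstream |= set(upstream_of(dev))
    # Assumption: affected devices that are upstream of each other are not notified.
    for up in sorted(upstream - set(affected)):
        send(up, {"type": "stop_sending", "targets": affected})
    return False
```

Because the check includes the associated abnormal devices, a device that still misbehaves after its neighbour's repair also triggers isolation.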

In some optional implementations of this embodiment, the method further determines an availability value or a connectivity value for a target device affected by the faulty device. The target device is any network device in the data center other than the faulty device. The value is based on at least one of the following items: the connectivity of the network topology, and the weight, capacity, and redundancy of the links between the faulty device and the target network. The method then compares the availability value with a preset availability threshold, or the connectivity value with a preset connectivity threshold. If the value falls below its threshold, the method sends a second message to the upper-layer network device of the target device. The second message notifies that upper-layer device to stop sending data to the target device and the target device's infrastructure devices.
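The description lists the inputs to the availability value but gives no formula. The sketch below therefore uses a hypothetical score: the weighted share of a target's link capacity that does not pass through a faulty device. This score reflects link weight, capacity, and redundancy, and a connectivity value could be computed and thresholded the same way. The `Uplink` type, the `dependents` list standing in for the target's infrastructure devices, the message format, and all function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    via: str               # network device the link passes through
    capacity_gbps: float   # link capacity
    weight: float = 1.0    # link weight, e.g. its importance in the topology


def availability_value(uplinks: list, faulty: set) -> float:
    """Hypothetical score: weighted share of link capacity that avoids faulty devices.

    Redundant links count toward the score, so a target with spare paths stays high.
    """
    total = sum(link.capacity_gbps * link.weight for link in uplinks)
    if total == 0:
        return 0.0
    alive = sum(link.capacity_gbps * link.weight for link in uplinks if link.via not in faulty)
    return alive / total


def check_target(target, dependents, uplinks, faulty, threshold, upper_layer, send) -> float:
    """Send a "second message" to the upper-layer devices if availability drops below threshold."""
    value = availability_value(uplinks, faulty)
    if value < threshold:
        for up in upper_layer:
            send(up, {"type": "stop_sending", "targets": [target, *dependents]})
    return value
```

For example, a target with two equal 10 Gbit/s uplinks, one of which passes through the faulty device, scores 0.5. It would be isolated under a threshold of 0.6 but not under 0.4.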

Fig. 3 is a schematic diagram of an application scenario of the method according to this embodiment. In this scenario, the abnormality information indicates that nodes 1, 2, 6, and 7 are abnormal, and the remaining nodes are non-abnormal. The abnormality graph 300 shows that nodes 1, 2, 6, and 7 are interrelated. However, each of nodes 1, 2, and 6 has a directly connected non-abnormal node. Node 1 connects directly to nodes 3, 4, and 5, node 2 to node 8, and node 6 to node 15. It can therefore be determined that the abnormality of nodes 1, 2, 6, and 7 is caused by node 7, that is, by a failure of the network device that node 7 represents. In practice, the judgment can be based on the connectivity of the abnormality graph. Alternatively, a calculation formula can be derived from the association ratios between faults to analyze the cause of the abnormality more accurately.
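The Fig. 3 scenario can be reproduced end to end with a compact version of steps 202 to 204. The figure itself is not available, so the edge list below is a hypothetical topology consistent with the text. In particular, it assumes that node 7 links only to nodes 1, 2, and 6. The function name is also hypothetical.

```python
from collections import deque

# Hypothetical edge list consistent with the Fig. 3 description; the actual figure
# may contain more links. Node 7 is assumed to connect nodes 1, 2 and 6.
EDGES = [(1, 3), (1, 4), (1, 5), (1, 7), (2, 8), (2, 7), (6, 15), (6, 7)]
ABNORMAL = {1, 2, 6, 7}


def fault_suspects(edges: list, abnormal: set) -> set:
    """Steps 202-204 on an undirected graph given as an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    suspects, visited = set(), set()
    for start in sorted(abnormal):
        if start in visited:
            continue
        # Abnormal node set of `start`: its component in the abnormal-only subgraph.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj.get(node, ()):
                if nbr in abnormal and nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        visited |= component
        # A node is a suspect if none of its direct neighbours is non-abnormal.
        suspects |= {n for n in component if all(m in abnormal for m in adj.get(n, ()))}
    return suspects


print(fault_suspects(EDGES, ABNORMAL))  # {7}
```

Nodes 1, 2, 6, and 7 form one associated set. Nodes 1, 2, and 6 each have a non-abnormal neighbour, which leaves node 7 as the only suspect, as in the description.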

The method provided by the above embodiment works in four stages. Upon receiving abnormality information, it finds the node representing the network device in the network topology graph of the data center and updates it to an abnormal node. It then establishes an abnormal node set for each abnormal node. Next, it determines, for each abnormal node in an associated abnormal node set, whether that node is directly connected to a non-abnormal node. Finally, it identifies as faulty devices the network devices represented by abnormal nodes that are not directly connected to any non-abnormal node. This approach combines the abnormality information from the whole network topology and applies graph-theoretic reasoning to localize faulty devices, which helps eliminate faults quickly.

Fig. 4 shows a flow 400 of another embodiment of the method for identifying failed network devices in a data center. The flow 400 includes the following steps:

Step 401: In response to abnormality information containing the IP address of a network device, search the network topology graph of the data center for the node representing the network device, and update the node to an abnormal node.

In this embodiment, the electronic device on which the method runs, for example the management station shown in Fig. 1, responds to abnormality information containing the IP address of a network device. It searches the network topology graph of the data center for the node representing that device and updates the node to an abnormal node.

Step 402: For each abnormal node, establish an abnormal node set.

In this embodiment, the electronic device on which the method runs, for example the management station shown in Fig. 1, establishes an abnormal node set for each abnormal node. The set includes this abnormal node and the abnormal nodes whose path to it contains no non-abnormal node.

Step 403: for each abnormal node in an associated abnormal node set, determine whether it has a directly connected non-abnormal node.

In this embodiment, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may determine, for each abnormal node in an associated abnormal node set, whether the node has a directly connected non-abnormal node. An associated abnormal node set consists of the abnormal nodes that have identical abnormal node sets.

Step 404: identify, as faulty devices, the network devices represented by the abnormal nodes determined to have no directly connected non-abnormal node.

In this embodiment, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may identify, as faulty devices, the network devices represented by the abnormal nodes determined to have no directly connected non-abnormal node.
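Steps 403 and 404 can be sketched together in Python. The sketch assumes the per-node abnormal node sets from step 402 are supplied as a dictionary of frozensets; the function name and the shape of the return value are illustrative choices, not taken from the description.

```python
from collections import defaultdict


def identify_faulty(adjacency, abnormal, node_sets):
    """Steps 403-404 sketch.

    adjacency: node -> set of neighbor nodes
    abnormal:  set of abnormal nodes
    node_sets: abnormal node -> frozenset (its abnormal node set from step 402)
    Returns {associated abnormal node set: set of nodes identified as faulty}.
    """
    # Group the abnormal nodes whose abnormal node sets are identical.
    associations = defaultdict(set)
    for node, node_set in node_sets.items():
        associations[node_set].add(node)

    result = {}
    for key, members in associations.items():
        faulty = set()
        for node in members:
            # Flag the node if none of its direct neighbors is non-abnormal.
            if all(n in abnormal for n in adjacency.get(node, ())):
                faulty.add(node)
        result[key] = faulty
    return result
```

For example, with a switch `core` linked to `agg1` and `agg2`, each of which still has one healthy access switch below it, marking `core`, `agg1`, and `agg2` as abnormal flags only `core`: the two aggregation switches still have a normal neighbor, while every neighbor of `core` is abnormal.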

Step 405: look up, in a preset set of repair operations, a repair operation for repairing the faulty device.

In this embodiment, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may look up, in a preset set of repair operations, a repair operation for repairing the faulty device. Repair operations can be configured differently for different devices. Common repair operations include restarting the faulty device, starting the standby device of the faulty device, or sending a reminder that the faulty device requires manual maintenance.
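A preset set of repair operations can be sketched as a table keyed by device type. The device-type keys, the `RepairOp` enum, and the fallback to a manual-maintenance reminder are hypothetical; the three operation kinds are the ones listed above.

```python
from enum import Enum


class RepairOp(Enum):
    RESTART = "restart the faulty device"
    START_STANDBY = "start the standby device of the faulty device"
    NOTIFY_MANUAL = "send a reminder that manual maintenance is needed"


# Hypothetical preset repair-operation set, keyed by device type.
REPAIR_OPERATIONS = {
    "tor_switch": [RepairOp.RESTART, RepairOp.NOTIFY_MANUAL],
    "core_switch": [RepairOp.START_STANDBY, RepairOp.NOTIFY_MANUAL],
}


def lookup_repair(device_type):
    """Step 405 sketch: return the configured repair operations for a device type,
    falling back to a manual-maintenance reminder for unknown types."""
    # Return a copy so callers cannot modify the preset table by accident.
    return list(REPAIR_OPERATIONS.get(device_type, [RepairOp.NOTIFY_MANUAL]))
```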

Step 406: perform the repair operation.

In this embodiment, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may perform the repair operation found in step 405.

Step 407: query the running information of the faulty device and of the abnormal devices associated with it to determine whether the faulty device has been repaired successfully. If not, proceed to step 408; if so, end the flow.

In this embodiment, after the repair operation is performed in step 406, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may query the running information of the faulty device and of the abnormal devices associated with it to determine whether the faulty device has been repaired successfully. If not, the flow proceeds to step 408; if so, the flow ends. The running information includes at least one of the following: port traffic information of the device, CPU usage information of the device, and memory usage information of the device. The abnormal devices associated with the faulty device are the network devices represented by abnormal nodes whose paths to the node representing the faulty device contain no non-abnormal node.
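The success check in step 407 can be sketched as a health test over the three kinds of running information named above. The thresholds, the `RunningInfo` record, and the caller-supplied `query` function are assumptions; the description does not say how the running information is judged.

```python
from dataclasses import dataclass


@dataclass
class RunningInfo:
    port_traffic_mbps: float  # port traffic information
    cpu_usage: float          # CPU usage, 0.0 to 1.0
    memory_usage: float       # memory usage, 0.0 to 1.0


def is_healthy(info, min_traffic_mbps=1.0, max_cpu=0.9, max_memory=0.9):
    # Hypothetical thresholds: some traffic is flowing, CPU and memory are not saturated.
    return (info.port_traffic_mbps >= min_traffic_mbps
            and info.cpu_usage <= max_cpu
            and info.memory_usage <= max_memory)


def repair_succeeded(faulty, associated, query):
    """Step 407 sketch: the repair counts as successful only if the faulty device
    and every associated abnormal device look healthy again.
    `query` is a caller-supplied function: device name -> RunningInfo."""
    return all(is_healthy(query(device)) for device in [faulty, *associated])
```

Checking the associated abnormal devices as well as the faulty device reflects the description: a restored core switch whose downstream aggregation switches still carry no traffic would not count as repaired.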

Step 408: send a first message to the upper-layer network devices of the faulty device and of the abnormal devices associated with it.

In this embodiment, after step 407 determines that the repair of the faulty device has failed, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may send a first message to the upper-layer network devices of the faulty device and of the abnormal devices associated with it. The first message notifies the upper-layer network devices to stop sending data to the faulty device and the associated abnormal devices. For example, if the faulty device is the core switch of a machine room, the upper-layer servers can be notified to automatically bypass that machine room and stop handing external service requests to it.
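Choosing the recipients of the first message can be sketched as follows, assuming each device carries a layer number (0 at the top, larger further down) and that "upper-layer network device" means a direct neighbor on a higher layer that is not itself affected. The layer numbering and the message format are illustrative assumptions.

```python
def first_message_recipients(adjacency, layer, affected):
    """Step 408 sketch.

    adjacency: device -> set of neighbor devices
    layer:     device -> layer number (0 = top; larger numbers are further down)
    affected:  the faulty device plus its associated abnormal devices
    Returns one message per upper-layer neighbor outside the affected group.
    """
    recipients = set()
    for device in affected:
        for neighbor in adjacency.get(device, ()):
            # Notify neighbors above the affected device that are still working.
            if layer[neighbor] < layer[device] and neighbor not in affected:
                recipients.add(neighbor)
    return [
        {"to": r, "type": "first", "stop_sending_to": sorted(affected)}
        for r in sorted(recipients)
    ]
```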

Step 409: determine the availability value or connectivity value of the target devices affected by the faulty device.

In this embodiment, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may determine the availability value or connectivity value of the target devices affected by the faulty device once the faulty device has been identified in step 404. It may equally do so after step 407 determines that the faulty device has not been repaired successfully.

In this embodiment, the availability value or connectivity value of a target device affected by the faulty device may be determined from at least one of the following items: the connectivity of the network topology, the weight of the link between the faulty device and the target network, the capacity of the link between the faulty device and the target network, and the redundancy of the link between the faulty device and the target network. A target device is any network device in the data center other than the faulty device. Which items are used depends on the type of device. The availability value or connectivity value may also be determined by a machine learning method.

As shown in Fig. 5, the node representing the faulty device is node 11. From the influencing parameters, namely the topology-graph connectivity α, the link weight β, the link capacity γ, and the link redundancy δ (other parameters may be added depending on the device), the degree to which node 111 is affected by the abnormal point is computed as a·α + b·β + c·γ + d·δ, where a, b, c, and d are weights. Because the impact is divided into categories (such as device availability and connectivity), the weights a, b, c, and d change with the category.
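The weighted sum above can be written directly in Python. The weight values per category below are placeholders, since the description only states that a, b, c, and d differ by impact category; how α, β, γ, and δ are normalized is likewise left to the caller.

```python
# Hypothetical weights (a, b, c, d) per impact category; the description only
# says they differ by category, not what their values are.
WEIGHTS = {
    "availability": (0.4, 0.2, 0.2, 0.2),
    "connectivity": (0.6, 0.1, 0.1, 0.2),
}


def impact_degree(alpha, beta, gamma, delta, category):
    """Weighted sum a*alpha + b*beta + c*gamma + d*delta for one impact category.

    alpha: topology-graph connectivity, beta: link weight,
    gamma: link capacity, delta: link redundancy (normalized by the caller).
    Raises KeyError for a category with no configured weights.
    """
    a, b, c, d = WEIGHTS[category]
    return a * alpha + b * beta + c * gamma + d * delta
```

Keeping one weight tuple per category lets the same four parameters yield separate availability and connectivity scores, matching the statement that the weights change with the impact category.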

Step 410: compare the availability value with a preset availability threshold, or compare the connectivity value with a preset connectivity threshold. If the availability value is below the preset availability threshold, or the connectivity value is below the preset connectivity threshold, send a second message to the upper-layer network device of the target device.

In this embodiment, the electronic device on which the method runs (for example, the management station shown in Fig. 1) may compare the availability value with a preset availability threshold, or compare the connectivity value with a preset connectivity threshold. If the availability value is below the preset availability threshold, or the connectivity value is below the preset connectivity threshold, it sends a second message to the upper-layer network device of the target device. The second message notifies the upper-layer network device to stop sending data to the target device and to the infrastructure devices of the target device. For example, if the core switch of a machine room fails and its availability is computed to have dropped to 0.01, a second message is sent to the upper-layer servers, notifying them to automatically bypass that machine room and stop handing external service requests to it. If the target device is a cluster switch of a machine room and its availability is computed to have dropped to 0.01, which indicates a partial fault in the machine room, a second message is sent to the core switch layer, notifying it to stop handing external service requests to the failed cluster switch.
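The threshold comparison in step 410 can be sketched as below. The `TargetState` record, the threshold values used by the caller, and the message dictionary are hypothetical; the recipient of a second message is expected to stop sending data both to the named target and to the infrastructure devices behind it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetState:
    device: str
    upper_layer: str                      # upper-layer device to be notified
    availability: Optional[float] = None  # computed in step 409, if available
    connectivity: Optional[float] = None  # computed in step 409, if available


def second_messages(targets, availability_threshold, connectivity_threshold):
    """Step 410 sketch: emit a second message for each target whose availability
    or connectivity value falls below its preset threshold."""
    messages = []
    for t in targets:
        low_availability = t.availability is not None and t.availability < availability_threshold
        low_connectivity = t.connectivity is not None and t.connectivity < connectivity_threshold
        if low_availability or low_connectivity:
            # The recipient stops sending to the target and its infrastructure devices.
            messages.append({"to": t.upper_layer, "type": "second",
                             "stop_sending_to": t.device})
    return messages
```

With a threshold of, say, 0.5, the machine-room core switch from the example above (availability 0.01) would trigger a message to the upper-layer servers.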

As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method in this embodiment adds fault repair and impact assessment after the faulty device is identified. The solution described in this embodiment can therefore eliminate faults more quickly and better maintain the normal operation of the data center.

With further reference to Fig. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for identifying a faulty network device in a data center. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied to various electronic devices.

As shown in Fig. 6, the apparatus 600 for identifying a faulty network device in a data center according to this embodiment includes: an abnormal node updating unit 601, an abnormal node set establishing unit 602, a judging unit 603, and an identifying unit 604. The abnormal node updating unit 601 is configured to, in response to obtaining exception information that includes the IP address of a network device, search the network topology graph of the data center for the node representing the network device, and mark the node as an abnormal node. The abnormal node set establishing unit 602 is configured to build, for each abnormal node, an abnormal node set, which contains the abnormal node and the abnormal nodes whose paths to it contain no non-abnormal node. The judging unit 603 is configured to determine, for each abnormal node in an associated abnormal node set, whether it has a directly connected non-abnormal node, where an associated abnormal node set consists of the abnormal nodes that have identical abnormal node sets. The identifying unit 604 is configured to identify, as faulty devices, the network devices represented by the abnormal nodes determined to have no directly connected non-abnormal node.

In this embodiment, for the specific processing of the abnormal node updating unit 601, the abnormal node set establishing unit 602, the judging unit 603, and the identifying unit 604 of the apparatus 600, refer to the descriptions of steps 201, 202, 203, and 204 in the embodiment corresponding to Fig. 2; the details are not repeated here.

In some optional implementations of this embodiment, the apparatus further includes a network topology graph construction unit configured to: collect device information of the network devices in the data center, the device information including at least one of the following: device identification information, the IP address of the peer device, the MAC address of the peer device, port traffic information of the device, and error information recorded locally by the device; obtain the network topology structure of the data center through a network management protocol; and establish the correspondence between the device information and the network topology according to the device IP addresses.
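The last step of the construction unit, joining collected device information onto the topology by device IP address, can be sketched as follows. It assumes the topology has already been obtained as a list of IP-address pairs (the protocol used to obtain it is not modeled), and the record field names are hypothetical.

```python
def attach_device_info(topology_links, device_infos):
    """Join collected device information onto the topology by device IP.

    topology_links: iterable of (ip_a, ip_b) pairs, as obtained through a
        network management protocol (collection itself is out of scope here).
    device_infos: list of dicts with at least an "ip" key, plus fields such as
        "device_id", "peer_ip", "peer_mac", "port_traffic", "error_log".
    Returns (adjacency keyed by IP, info keyed by IP, IPs with no topology node).
    """
    adjacency = {}
    for a, b in topology_links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    info_by_ip, unmatched = {}, []
    for info in device_infos:
        if info["ip"] in adjacency:
            info_by_ip[info["ip"]] = info  # device info now tied to its topology node
        else:
            unmatched.append(info["ip"])   # reported device absent from the topology
    return adjacency, info_by_ip, unmatched
```

Returning the unmatched IP addresses makes gaps between the collected device information and the discovered topology visible instead of silently dropping them.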

In some optional implementations of this embodiment, the identifying unit 604 further includes: a repair operation query subunit configured to look up, in a preset set of repair operations, a repair operation for repairing the faulty device; and a repair operation execution subunit configured to perform the repair operation.

In some optional implementations of this embodiment, the repair operation execution subunit is further configured to: query the running information of the faulty device and of the abnormal devices associated with it to determine whether the faulty device has been repaired successfully, where the running information includes at least one of the following: port traffic information of the device, CPU usage information of the device, and memory usage information of the device, and the abnormal devices associated with the faulty device are the network devices represented by abnormal nodes whose paths to the node representing the faulty device contain no non-abnormal node; and, if the repair has not succeeded, send a first message to the upper-layer network devices of the faulty device and of the associated abnormal devices, the first message notifying those upper-layer network devices to stop sending data to the faulty device and the associated abnormal devices.

In some optional implementations of this embodiment, the apparatus further includes: an impact determining unit configured to determine the availability value or connectivity value of a target device affected by the faulty device from at least one of the following items: the connectivity of the network topology, the weight of the link between the faulty device and the target network, the capacity of the link between the faulty device and the target network, and the redundancy of the link between the faulty device and the target network, where the target device is any network device in the data center other than the faulty device; and a second message sending unit configured to compare the availability value with a preset availability threshold, or compare the connectivity value with a preset connectivity threshold, and, if the availability value is below the preset availability threshold or the connectivity value is below the preset connectivity threshold, send a second message to the upper-layer network device of the target device, the second message notifying the upper-layer network device to stop sending data to the target device and the infrastructure devices of the target device.

Referring now to Fig. 7, a schematic structural diagram of a computer system 700 suitable for implementing the management station of the embodiments of the present application is shown.

As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 708 including a hard disk and the like; and a communication portion 709 including a network interface card such as a LAN card or a modem. The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage portion 708 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the method of the present application are performed.

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an abnormal node updating unit, an abnormal node set establishing unit, a judging unit, and an identifying unit. The names of these units do not, in some cases, limit the units themselves. For example, the abnormal node set establishing unit may also be described as "a unit that builds an abnormal node set for each abnormal node".

As another aspect, the present application further provides a non-volatile computer storage medium. It may be the non-volatile computer storage medium included in the apparatus described in the above embodiments, or it may exist separately without being installed in the management station. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: in response to obtaining exception information that includes the IP address of a network device, search the network topology graph of the data center for the node representing the network device, and mark the node as an abnormal node; for each abnormal node, build an abnormal node set, which contains the abnormal node and the abnormal nodes whose paths to it contain no non-abnormal node; determine, for each abnormal node in an associated abnormal node set, whether it has a directly connected non-abnormal node, where an associated abnormal node set consists of the abnormal nodes that have identical abnormal node sets; and identify, as faulty devices, the network devices represented by the abnormal nodes determined to have no directly connected non-abnormal node.

The above description is merely a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims

1. A method for identifying a faulty network device in a data center, characterized in that the method comprises:

in response to obtaining exception information including the IP address of a network device, searching a network topology graph of the data center for the node representing the network device, and marking the node as an abnormal node;

for each abnormal node, establishing an abnormal node set, wherein the abnormal node set comprises the abnormal node and the abnormal nodes whose paths to the abnormal node contain no non-abnormal node;

determining, for each abnormal node in an associated abnormal node set, whether a directly connected non-abnormal node exists, wherein the associated abnormal node set comprises abnormal nodes having identical abnormal node sets; and

identifying, as a faulty device, a network device represented by an abnormal node determined to have no directly connected non-abnormal node.

2. The method according to claim 1, characterized in that the method further comprises:

a step of constructing the network topology graph of the data center, comprising:

collecting device information of network devices of the data center, wherein the device information comprises at least one of: device identification information, an IP address of a peer device, a MAC address of the peer device, port traffic information of the device, and error information recorded locally by the device;

obtaining a network topology structure of the data center through a network management protocol; and

establishing a correspondence between the device information and the network topology according to device IP addresses.

3. The method according to claim 1, characterized in that identifying, as a faulty device, a network device represented by an abnormal node determined to have no directly connected non-abnormal node further comprises:

looking up, in a preset set of repair operations, a repair operation for repairing the faulty device; and

performing the repair operation.

4. The method according to claim 3, characterized in that performing the repair operation further comprises:

querying running information of the faulty device and of the abnormal devices associated with the faulty device to determine whether the faulty device has been repaired successfully, wherein the running information comprises at least one of: port traffic information of the device, CPU usage information of the device, and memory usage information of the device, and wherein the abnormal devices associated with the faulty device are network devices represented by abnormal nodes whose paths to the node representing the faulty device contain no non-abnormal node; and

if not, sending a first message to upper-layer network devices of the faulty device and of the abnormal devices associated with the faulty device, wherein the first message is used to notify the upper-layer network devices to stop sending data to the faulty device and the abnormal devices associated with the faulty device.

5. The method according to any one of claims 1-4, characterized in that the method further comprises:

determining, according to at least one of the following items, an availability value or a connectivity value of a target device affected by the faulty device: connectivity of the network topology, a weight of a link between the faulty device and a target network, a capacity of the link between the faulty device and the target network, and a redundancy of the link between the faulty device and the target network, wherein the target device is a network device in the data center other than the faulty device;

comparing the availability value with a preset availability threshold, or comparing the connectivity value with a preset connectivity threshold; and

if the availability value is below the preset availability threshold, or the connectivity value is below the preset connectivity threshold, sending a second message to an upper-layer network device of the target device, wherein the second message is used to notify the upper-layer network device to stop sending data to the target device and infrastructure devices of the target device.

6. An apparatus for identifying a faulty network device in a data center, characterized in that the apparatus comprises:

an abnormal node updating unit, configured to, in response to obtaining exception information including the IP address of a network device, search a network topology graph of the data center for the node representing the network device, and mark the node as an abnormal node;

an abnormal node set establishing unit, configured to establish, for each abnormal node, an abnormal node set, wherein the abnormal node set comprises the abnormal node and the abnormal nodes whose paths to the abnormal node contain no non-abnormal node;

a judging unit, configured to determine, for each abnormal node in an associated abnormal node set, whether a directly connected non-abnormal node exists, wherein the associated abnormal node set comprises abnormal nodes having identical abnormal node sets; and

an identifying unit, configured to identify, as a faulty device, a network device represented by an abnormal node determined to have no directly connected non-abnormal node.

7. The apparatus according to claim 6, characterized in that the apparatus further comprises a network topology graph construction unit configured to:

obtain a network topology structure of the data center through a network management protocol;

8. The apparatus according to claim 6, characterized in that the identifying unit further comprises:

a repair operation query subunit, configured to look up, in a preset set of repair operations, a repair operation for repairing the faulty device; and

a repair operation execution subunit, configured to perform the repair operation.

9. The apparatus according to claim 8, characterized in that the repair operation execution subunit is further configured to:

10. The apparatus according to any one of claims 6-9, characterized in that the apparatus further comprises:

an impact determining unit, configured to determine, according to at least one of the following items, an availability value or a connectivity value of a target device affected by the faulty device: connectivity of the network topology, a weight of a link between the faulty device and a target network, a capacity of the link between the faulty device and the target network, and a redundancy of the link between the faulty device and the target network, wherein the target device is a network device in the data center other than the faulty device; and

a second message sending unit, configured to compare the availability value with a preset availability threshold, or compare the connectivity value with a preset connectivity threshold; and, if the availability value is below the preset availability threshold, or the connectivity value is below the preset connectivity threshold, send a second message to an upper-layer network device of the target device, wherein the second message is used to notify the upper-layer network device to stop sending data to the target device and infrastructure devices of the target device.