CN107995030B - Network detection method, network fault detection method and system - Google Patents


Info

Publication number
CN107995030B
Authority
CN
China
Prior art keywords
network
data
host
probe
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711216097.XA
Other languages
Chinese (zh)
Other versions
CN107995030A (en
Inventor
杨龙
王金龙
邓谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHEZHI HULIAN (BEIJING) SCIENCE & TECHNOLOGY CO LTD
Original Assignee
CHEZHI HULIAN (BEIJING) SCIENCE & TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHEZHI HULIAN (BEIJING) SCIENCE & TECHNOLOGY CO LTD
Priority to CN201711216097.XA
Publication of CN107995030A
Application granted
Publication of CN107995030B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Abstract

The invention discloses a network detection method, a network fault detection method and a system, wherein the network detection method is suitable for being executed in a network server, the network server is in communication connection with one or more aggregation servers and is accessed to a plurality of internet data centers, and the network detection method comprises the following steps: acquiring network topology information, wherein the network topology information comprises serial numbers and connection relations of all hosts, all switches and all internet data centers; generating a network detection list corresponding to each host according to the network topology information; responding to a list updating request sent by a data probe, sending a network detection list corresponding to a host associated with the data probe to the data probe so as to indicate the data probe to carry out network state detection at a preset first time interval, and sending a detection result to a network server; and receiving the detection results reported by the data probes, forming corresponding index data by the detection results, and sending the index data to the corresponding aggregation server.

Description

Network detection method, network fault detection method and system
Technical Field
The present invention relates to the field of computer networks, and in particular, to a network detection method, a network fault detection method, and a network fault detection system.
Background
An internet company often has thousands of servers, which are distributed across multiple machine rooms throughout the country and are connected with one another to form the company's intranet. Because extensive network communication takes place between the machine rooms of the intranet and between the servers, the quality of that communication is critically important. Communication quality is measured by corresponding index data, and the generation of the index data depends on the detection results obtained by probing the network.
Existing network detection schemes fall into two types. The first is manual detection, which is time-consuming and labor-intensive, lacks data support, and has low accuracy. The second actively detects network conditions by deploying probes on all online servers, but it suffers from incomplete network topology coverage and from detection data that is difficult to use comprehensively. Network fault detection usually builds on the results of network detection, and monitoring the quality of the cross-machine-room network and locating faults in a timely and accurate manner is very important for ensuring the normal operation of company services. A new network detection and network fault detection scheme is therefore needed, so that the diagnosis and location of problems changes from passive to active.
Disclosure of Invention
To this end, the present invention provides a technical solution for network probing and network failure detection in an attempt to solve or at least alleviate the above-existing problems.
According to an aspect of the present invention, there is provided a network probing method, adapted to be executed in a network server, where the network server is provided with a configuration management database, the configuration management database stores latest network topology information, the network server is communicatively connected to one or more aggregation servers and is accessed to a plurality of internet data centers, each internet data center is provided with a plurality of switches, each switch is accessed to a plurality of hosts, and each host is provided with a corresponding data probe in advance and is communicatively connected to the network server, the method includes the following steps: firstly, acquiring network topology information from a configuration management database, wherein the network topology information comprises serial numbers and connection relations of all hosts, all switches and all internet data centers; generating a network detection list corresponding to each host according to the network topology information; responding to a list updating request sent by a data probe, sending a network detection list corresponding to a host associated with the data probe to the data probe so as to indicate the data probe to carry out network state detection at a preset first time interval, and sending a detection result to a network server; and receiving the detection results reported by the data probes, forming corresponding index data by the detection results, and sending the index data to the corresponding aggregation server.
Optionally, in the network probing method according to the present invention, the step of generating the network probing list corresponding to each host according to the network topology information includes: according to the network topology information, setting other hosts under the switch to which the host belongs as target hosts to be detected for each host; and generating a corresponding network detection list according to the IP addresses of the host and each target host.
Optionally, in the network probing method according to the present invention, the step of generating the network probing list corresponding to each host according to the network topology information includes: according to the network topology information, for each host, setting the host with the same number under other switches included in the internet data center to which the host belongs as a target host to be detected; and generating a corresponding network detection list according to the IP addresses of the host and each target host.
Optionally, in the network probing method according to the present invention, the step of generating the network probing list corresponding to each host according to the network topology information includes: according to the network topology information, for each host, setting the hosts with the same number under each switch included in the other internet data centers as target hosts to be detected; and generating a corresponding network detection list according to the IP addresses of the host and each target host.
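The three optional target-selection rules above can be illustrated with a minimal sketch. The `Host` record, its field names, and the sample addresses are assumptions made for illustration only; the patent text specifies only that hosts, switches and internet data centers carry numbers and connection relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical record; field names are illustrative, not from the patent.
    idc: str     # internet data center the host belongs to
    switch: str  # switch the host is attached to (unique within its IDC)
    number: int  # host number under its switch
    ip: str


def same_switch_targets(host, hosts):
    """Other hosts under the switch to which the host belongs."""
    return [h for h in hosts
            if h.idc == host.idc and h.switch == host.switch and h != host]


def same_idc_targets(host, hosts):
    """Hosts with the same number under the other switches of the same IDC."""
    return [h for h in hosts
            if h.idc == host.idc and h.switch != host.switch
            and h.number == host.number]


def other_idc_targets(host, hosts):
    """Hosts with the same number under every switch of the other IDCs."""
    return [h for h in hosts if h.idc != host.idc and h.number == host.number]


def build_probe_list(host, hosts):
    """Combine the three rules into one detection list keyed by IP address."""
    targets = (same_switch_targets(host, hosts)
               + same_idc_targets(host, hosts)
               + other_idc_targets(host, hosts))
    return {"source": host.ip, "targets": [t.ip for t in targets]}


hosts = [
    Host("idc1", "sw1", 1, "10.1.1.1"),
    Host("idc1", "sw1", 2, "10.1.1.2"),
    Host("idc1", "sw2", 1, "10.1.2.1"),
    Host("idc1", "sw2", 2, "10.1.2.2"),
    Host("idc2", "sw1", 1, "10.2.1.1"),
    Host("idc2", "sw1", 2, "10.2.1.2"),
]
probe_list = build_probe_list(hosts[0], hosts)
```

Probing only same-numbered hosts across switches and data centers keeps each list short while every switch pair and data center pair is still exercised by some host, which is one plausible reading of how the scheme limits resource consumption.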
Optionally, in the network probing method according to the present invention, the step of sending the network probing list to the data probe in response to the list update request sent by the data probe includes: responding to a list updating request sent by the data probe, wherein the list updating request is sent to the network server by the data probe according to a preset second time interval; sending the version of the network detection list corresponding to the associated host to the data probe so as to indicate the data probe to judge whether to update the network detection list according to the version; and receiving the determination update information fed back by the data probe, and sending a network detection list corresponding to the host associated with the data probe to the data probe.
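The version-based update exchange described above might look like the following sketch. The class names, the use of an integer version, and the in-process method calls standing in for network requests are all assumptions; the patent does not specify the version format or the transport.

```python
class ListServer:
    """Hypothetical stand-in for the network server's list store."""

    def __init__(self):
        self._lists = {}  # host IP -> (version, targets)

    def publish(self, host_ip, version, targets):
        self._lists[host_ip] = (version, list(targets))

    def current_version(self, host_ip):
        # First reply to a list update request: only the version is sent.
        return self._lists[host_ip][0]

    def fetch(self, host_ip):
        # Sent only after the probe confirms that it wants the update.
        version, targets = self._lists[host_ip]
        return version, list(targets)


class Probe:
    """Hypothetical data probe holding its current detection list."""

    def __init__(self, host_ip):
        self.host_ip = host_ip
        self.version = 0  # assumed: integer versions, 0 means "no list yet"
        self.targets = []

    def refresh(self, server):
        """Run once per second time interval; True if the list was replaced."""
        remote = server.current_version(self.host_ip)
        if remote <= self.version:
            return False  # local list is already current, nothing to download
        self.version, self.targets = server.fetch(self.host_ip)
        return True
```

Sending the version first means that an unchanged list costs one small exchange per interval rather than a full list transfer.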
Optionally, in the network probing method according to the present invention, a cache database is further disposed in the network server, and the method further includes: acquiring network topology information stored in a configuration management database according to a preset third time interval; and if the acquired network topology information is changed from the previous network topology information, generating a new network detection list according to the acquired network topology information, and storing the new network detection list in a cache database for updating.
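A minimal sketch of the periodic refresh follows, assuming the topology can be represented as a JSON-serializable nested dictionary and that a change is detected by comparing a hash of that representation. Both choices, and the names `ProbeListCache` and `same_switch_lists`, are illustrative assumptions; the patent states only that the newly acquired topology is compared with the previous one.

```python
import hashlib
import json


def topology_fingerprint(topology):
    """Hash a canonical JSON form so that dict key order does not matter."""
    canonical = json.dumps(topology, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_switch_lists(topology):
    """Illustrative list builder: each host probes its switch neighbours."""
    lists = {}
    for switches in topology.values():
        for ips in switches.values():
            for ip in ips:
                lists[ip] = [other for other in ips if other != ip]
    return lists


class ProbeListCache:
    """Hypothetical in-memory stand-in for the cache database."""

    def __init__(self, build_lists):
        self._build_lists = build_lists
        self._fingerprint = None
        self.version = 0
        self.lists = {}

    def refresh(self, topology):
        """Call once per third time interval with the latest topology."""
        fingerprint = topology_fingerprint(topology)
        if fingerprint == self._fingerprint:
            return False  # topology unchanged, keep the cached lists
        self._fingerprint = fingerprint
        self.lists = self._build_lists(topology)
        self.version += 1  # new version lets probes notice the update
        return True
```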
Optionally, in the network probing method according to the present invention, the step of forming corresponding index data from each probing result and sending the index data to the corresponding aggregation server includes: checking each detection result; and converting the detection result which is qualified by verification into corresponding index data, and sending the index data to the corresponding aggregation server according to a preset index transmission rule.
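The check, convert and dispatch step could be sketched as below. The result fields, the metric layout, the mapping of unreachable targets to an "abnormal" state, and the hash-based choice of aggregation server are all assumptions; the patent leaves the verification criteria and the index transmission rule unspecified.

```python
import zlib

# Assumed result schema; the patent does not define the fields of a result.
REQUIRED = ("source", "target", "timestamp", "reachable", "rtt_ms")


def is_valid(result):
    """Reject results with missing fields or an impossible round-trip time."""
    if any(key not in result for key in REQUIRED):
        return False
    if result["reachable"] and (result["rtt_ms"] is None or result["rtt_ms"] < 0):
        return False
    return True


def to_metric(result):
    """Convert one verified detection result into index data."""
    return {
        "name": "probe.rtt_ms",
        "tags": {"source": result["source"], "target": result["target"]},
        "timestamp": result["timestamp"],
        "value": result["rtt_ms"] if result["reachable"] else None,
        "state": "normal" if result["reachable"] else "abnormal",
    }


def pick_aggregator(metric, aggregators):
    """Assumed transmission rule: stable hash of the source address."""
    key = metric["tags"]["source"].encode("utf-8")
    return aggregators[zlib.crc32(key) % len(aggregators)]


def dispatch(results, aggregators):
    """Group verified metrics by the aggregation server that should get them."""
    outbox = {name: [] for name in aggregators}
    for result in results:
        if not is_valid(result):
            continue  # results that fail verification are dropped
        metric = to_metric(result)
        outbox[pick_aggregator(metric, aggregators)].append(metric)
    return outbox
```

Routing by source address keeps all index data from one host on the same aggregation server, which simplifies per-switch counting there; other rules would be equally consistent with the text.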
According to yet another aspect of the invention, there is provided a computing device comprising one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing a network probing method according to the invention.
According to yet another aspect of the present invention, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform a network probing method according to the present invention.
According to another aspect of the present invention, there is provided a network fault detection method, adapted to be executed in a network fault detection system, where the system includes one or more network servers, an aggregation server, a database server, and a plurality of data probes, the system is accessed to a plurality of internet data centers, each internet data center is deployed with a plurality of switches, each switch is accessed to a plurality of hosts, each host is pre-configured with a corresponding data probe, and the network server stores a network probe list corresponding to the host associated with each data probe, the method includes the following steps: firstly, a data probe acquires a network detection list corresponding to an associated host from a network server, performs network state detection at a preset first time interval according to the acquired network detection list, and sends a detection result to the network server; the network server receives the detection results reported by the data probes, forms corresponding index data with the detection results and sends the index data to the corresponding aggregation server; the aggregation server receives one or more index data reported by the network server, detects the network connection according to the index data with abnormal states, sends out an alarm if the detection result indicates that the network connection fails, performs aggregation calculation on the index data with normal states to obtain corresponding network quality data, and sends the network quality data to the database server; and the database server receives and stores the network quality data sent by the aggregation server.
Optionally, in the network fault detection method according to the present invention, the step of the data probe acquiring, from the network server, a network probe list corresponding to the associated host includes: according to a preset second time interval, sending a list updating request to a network server to obtain a version of a network detection list corresponding to the associated host; and if the version is newer than the version of the current network detection list of the data probe, acquiring the network detection list corresponding to the associated host from the network server to replace the current network detection list of the data probe.
Optionally, in the network fault detection method according to the present invention, the method further includes: the data probe judges whether the consumed system resources are overloaded or not according to a preset resource occupation rule; if the consumed system resources are overloaded, stopping detection; if the consumed system resources are not overloaded, the detection is continued.
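The overload guard can be sketched as follows. The rule fields (CPU percentage and memory in megabytes) and the callables `sample_usage` and `probe_once` are hypothetical; the patent says only that the probe applies a preset resource occupation rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRule:
    # Assumed thresholds; the patent does not name the resources it limits.
    max_cpu_percent: float
    max_memory_mb: float


def is_overloaded(cpu_percent, memory_mb, rule):
    """True when either measured value exceeds its limit."""
    return cpu_percent > rule.max_cpu_percent or memory_mb > rule.max_memory_mb


def run_probe_cycle(sample_usage, probe_once, rule):
    """Probe only if the probe's own resource usage is within the rule.

    sample_usage() returns (cpu_percent, memory_mb) for the probe process;
    probe_once() performs one round of network state detection.
    """
    cpu_percent, memory_mb = sample_usage()
    if is_overloaded(cpu_percent, memory_mb, rule):
        return False  # stop detection for this cycle
    probe_once()
    return True
```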
Optionally, in the network fault detection method according to the present invention, a configuration management database and a cache database are provided in the network server, the configuration management database stores the latest network topology information, and the network topology information includes the numbers and connection relations of the hosts, the switches, and the internet data centers. The method further includes a step in which the network server generates in advance a network probe list corresponding to the host associated with each data probe, and this step includes: according to the network topology information, for each host, setting the other hosts under the switch to which the host belongs as target hosts to be detected; setting the hosts with the same number under the other switches of the internet data center to which the host belongs as target hosts to be detected; setting the hosts with the same number under all the switches of the other internet data centers, to which the host does not belong, as target hosts to be detected; and generating a corresponding network detection list according to the IP addresses of the host and each target host, and storing the network detection list in the cache database.
Optionally, in the network fault detection method according to the present invention, the method further includes: the network server acquires network topology information stored in a configuration management database according to a preset third time interval; and if the acquired network topology information is changed from the previous network topology information, generating a new network detection list according to the acquired network topology information, and storing the new network detection list in a cache database for updating.
Optionally, in the network fault detection method according to the present invention, the step of forming corresponding index data from each detection result and sending the index data to the corresponding aggregation server includes: checking each detection result; and converting the detection result which is qualified by verification into corresponding index data, and sending the index data to the corresponding aggregation server according to a preset index transmission rule.
Optionally, in the network fault detection method according to the present invention, the step of detecting the network connection according to the index data of each abnormal state includes: according to the index data of each abnormal state, determining, for every two switches under each internet data center, the proportion of the hosts under one switch whose connections to the other switch have network faults; and for each switch whose host proportion exceeds a preset first ratio, judging that a network connection fault exists between the switches.
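One possible reading of this switch-level check is sketched below: for each ordered switch pair, count the distinct source hosts that report a fault toward the other switch, divide by the number of hosts under the source switch, and compare with the first ratio. The metric field names, the assumption that switch identifiers are unique, and the assumption that the input is already limited to one internet data center are illustrative and not taken from the patent.

```python
from collections import defaultdict


def faulty_switch_pairs(abnormal, hosts_per_switch, first_ratio):
    """Return (src_switch, dst_switch, ratio) for pairs judged faulty.

    abnormal: abnormal-state index data, each item holding the hypothetical
        keys "src_host", "src_switch" and "dst_switch".
    hosts_per_switch: total number of hosts attached to each switch.
    first_ratio: the preset first ratio, as a fraction between 0 and 1.
    """
    failing = defaultdict(set)  # (src_switch, dst_switch) -> failing hosts
    for metric in abnormal:
        if metric["src_switch"] != metric["dst_switch"]:
            pair = (metric["src_switch"], metric["dst_switch"])
            failing[pair].add(metric["src_host"])

    faults = []
    for (src, dst), hosts in sorted(failing.items()):
        ratio = len(hosts) / hosts_per_switch[src]
        if ratio > first_ratio:
            faults.append((src, dst, ratio))
    return faults
```

Counting distinct hosts rather than raw failures keeps one noisy host from pushing a healthy switch pair over the threshold.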
Optionally, in the network fault detection method according to the present invention, the step of detecting the network connection according to the index data of each abnormal state includes: for each switch whose host proportion does not exceed the preset first ratio, determining, within the internet data center to which the switch belongs, the proportion of first switches, a first switch being a switch whose connections to the switches under other internet data centers have network faults; and for each internet data center whose proportion of first switches exceeds a preset second ratio, judging that a network connection fault exists between the internet data center and the switches.
Optionally, in the network fault detection method according to the present invention, the step of detecting the network connection according to the index data of each abnormal state includes: for each switch whose host proportion does not exceed the preset first ratio, determining, within the internet data center to which the switch belongs, the proportion of second switches, a second switch being a switch whose connections to other internet data centers have network faults; and for each internet data center whose proportion of second switches exceeds a preset third ratio, judging that a network connection fault exists between the internet data centers.
Optionally, in the network fault detection method according to the present invention, the step of performing aggregation calculation on the index data in each normal state to obtain corresponding network quality data includes: generating quantile data from the index data in the normal state at a preset fourth time interval; aggregating the index data in each normal state by the corresponding switch, and computing the network quality indexes of each switch; aggregating the index data in each normal state by the corresponding internet data center, and computing the network quality indexes of each internet data center; and combining the generated quantile data with the network quality indexes of the switches and the internet data centers to form the corresponding network quality data.
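The aggregation step can be sketched for one time window as follows. The choice of round-trip time as the measured value, the p50/p90/p99 quantiles, and mean RTT plus sample count as the "network quality indexes" are assumptions; the patent does not say which quantiles or indexes are computed. The sketch uses `statistics.quantiles`, which requires Python 3.8 or later and at least two samples.

```python
from collections import defaultdict
from statistics import mean, quantiles


def rtt_quantiles(metrics):
    """Quantile data for one window of normal-state index data."""
    values = [m["rtt_ms"] for m in metrics]
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


def group_quality(metrics, key):
    """Aggregate by switch or by internet data center, selected by `key`."""
    groups = defaultdict(list)
    for m in metrics:
        groups[m[key]].append(m["rtt_ms"])
    return {
        group: {"mean_rtt_ms": mean(values), "samples": len(values)}
        for group, values in groups.items()
    }


def build_quality_data(metrics):
    """Combine quantile data with per-switch and per-IDC quality indexes.

    Intended to be called once per fourth time interval with the
    normal-state index data collected during that window.
    """
    return {
        "quantiles": rtt_quantiles(metrics),
        "by_switch": group_quality(metrics, "switch"),
        "by_idc": group_quality(metrics, "idc"),
    }
```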
According to another aspect of the present invention, there is provided a network fault detection system, including one or more network servers, an aggregation server, a database server, and a plurality of data probes, where the system is accessed to a plurality of internet data centers, each internet data center is deployed with a plurality of switches, each switch is accessed to a plurality of hosts, each host is pre-configured with a corresponding data probe, and the network server stores a network probe list corresponding to the host associated with each data probe, in the system: the data probe is suitable for acquiring a network detection list corresponding to the associated host from the network server, detecting the network state at a preset first time interval according to the acquired network detection list, and sending a detection result to the network server; the network server is suitable for receiving the detection results reported by the data probes, forming corresponding index data by the detection results and then sending the index data to the corresponding aggregation server; the aggregation server is suitable for receiving one or more index data reported by the network server, detecting network connection according to the index data with abnormal states, giving an alarm if the detection result indicates that the network connection fails, performing aggregation calculation on the index data with normal states to obtain corresponding network quality data, and sending the network quality data to the database server; the database server is adapted to receive and store the network quality data sent by the aggregation server.
According to the network detection technical solution of the present invention, a network detection list corresponding to each host is generated based on the network topology information, the network detection list corresponding to the host associated with a data probe is sent to that data probe to instruct it to carry out network state detection, the detection results reported by the data probes are received, and the detection results are formed into corresponding index data and sent to the corresponding aggregation server. In this technical solution, the network topology information comprises the numbers and connection relations of the hosts, the switches and the internet data centers, and the network detection lists generated from it can cover all the hosts. Full network coverage is thereby achieved while the calculation amount and system resource consumption are reduced, the network connection condition between any two hosts can be detected in real time, and issuing the network detection lists allows the scheme to adapt to changes in the network topology.
Furthermore, the network fault detection technical solution of the present invention builds on the network detection solution: abnormal network connection states actively trigger alarms, which greatly shortens the fault response time, and network quality data between any two switches and between any two internet data centers can be obtained in real time through distributed aggregation of the index data.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a network fault detection system 100 according to one embodiment of the invention;
FIG. 2 illustrates a block diagram of a computing device 200, according to an embodiment of the invention;
FIG. 3 shows a flow diagram of a network fault detection method 300 according to one embodiment of the invention; and
FIG. 4 shows a flow diagram of a network probing method 400 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a schematic diagram of a network fault detection system 100 according to one embodiment of the invention. It should be noted that the network fault detection system 100 in fig. 1 is only exemplary. In practice, the network fault detection system 100 may contain different numbers of network servers, aggregation servers, database servers and data probes, and the present invention does not limit the number of each. As shown in fig. 1, the network fault detection system 100 includes a network server 700, an aggregation server 800, a database server 900, and N data probes, namely data probe 1, data probe 2, ..., and data probe N, where N is a positive integer.
The network failure detection system 100 is connected to a plurality of Internet Data Centers (IDC), each of which has a plurality of switches, each of which has a plurality of hosts connected thereto, and each of the hosts has a corresponding Data probe preset therein, so that the number of the Data probes is set according to the number of the hosts, which is also N. The network server 700 stores a network probe list corresponding to the host associated with each data probe.
Specifically, for the data probes 1 to N, each data probe acquires a network probing list corresponding to the associated host from the network server 700, performs network state probing at a preset first time interval according to the acquired network probing list, and sends a probing result to the network server 700. The network server 700 receives the detection results reported by the data probes, forms corresponding index data for each detection result, and sends the index data to the corresponding aggregation server 800. The aggregation server 800 receives one or more index data reported by the network server 700, detects network connection according to the index data in each abnormal state, sends out an alarm if the detection result indicates that a network connection fault occurs, performs aggregation calculation on the index data in each normal state to obtain corresponding network quality data, and sends the network quality data to the database server 900. The database server 900 receives and stores the network quality data transmitted by the aggregation server 800.
FIG. 2 shows a block diagram of a computing device 200, according to one embodiment of the invention. In a basic configuration 202, the computing device 200 typically includes a system memory 206 and one or more processors 204. A memory bus 208 may be used for communication between the processor 204 and the system memory 206.
Depending on the desired configuration, the processor 204 may be any type of processing, including but not limited to: a microprocessor (μ P), a microcontroller (μ C), a Digital Signal Processor (DSP), or any combination thereof. The processor 204 may include one or more levels of cache, such as a level one cache 210 and a level two cache 212, a processor core 214, and registers 216. Example processor cores 214 may include Arithmetic Logic Units (ALUs), Floating Point Units (FPUs), digital signal processing cores (DSP cores), or any combination thereof. The example memory controller 218 may be used with the processor 204, or in some implementations the memory controller 218 may be an internal part of the processor 204.
Depending on the desired configuration, system memory 206 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 206 may include an operating system 220, one or more applications 222, and program data 226. In some implementations, the program 222 can be arranged to execute instructions on the operating system with the program data 224 by the one or more processors 204.
Computing device 200 may also include an interface bus 240 that facilitates communication from various interface devices (e.g., output devices 242, peripheral interfaces 244, and communication devices 246) to the basic configuration 102 via the bus/interface controller 230. The example output device 242 includes a graphics processing unit 248 and an audio processing unit 250. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 252. Example peripheral interfaces 244 can include a serial interface controller 254 and a parallel interface controller 256, which can be configured to facilitate communications with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 258. An example communication device 246 may include a network controller 260, which may be arranged to facilitate communications with one or more other computing devices 262 over a network communication link via one or more communication ports 264.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, program modules, and may include any information delivery media, such as carrier waves or other transport mechanisms, in a modulated data signal. A "modulated data signal" may be a signal that has one or more of its data set or its changes made in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 200 may be implemented as a server, such as a file server, a database server, an application server, a WEB server, etc., or as part of a small-form-factor portable (or mobile) electronic device, such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless WEB-browsing device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 200 may also be implemented as a personal computer, including both desktop and notebook computer configurations. In some embodiments, the computing device 200 may be implemented as a network server, an aggregation server, and/or a database server and configured to perform a network fault detection method in accordance with the present invention. In particular, the one or more programs 222 of computing device 200 include instructions for performing the network fault detection method in accordance with the present invention.
Fig. 3 shows a flow diagram of a network failure detection method 300 according to one embodiment of the invention. The network server 700, aggregation server 800, database server 900, and data probes 1-N are configured to collectively perform the processing of the network fault detection method 300 by data communication therebetween when performing the network fault detection method 300 according to the present invention, where the one or more programs 222 of the computing device 200 respectively embodied as the network server 700, aggregation server 800, database server 900 include instructions for performing the network fault detection method 300 according to the present invention.
As shown in fig. 3, the method 300 begins at step S311. In step S311, the data probe acquires a network probe list corresponding to the associated host from the network server 700. The network probe list is generated in advance by the network server 700 according to the network topology information, and for convenience of subsequent description, the network probe list corresponding to the host associated with each data probe generated in advance by the network server 700 is described here.
Specifically, the network server 700 is provided with a configuration management database and a cache database. The configuration management database stores the latest network topology information, which includes the numbers and connection relationships of the hosts, the switches, and the internet data centers. According to an embodiment of the present invention, when generating the network probe list, the network server 700 obtains the network topology information from the configuration management database, sets, for each host and according to the network topology information, all the other hosts under the switch to which the host belongs as target hosts to be probed, and generates the corresponding network probe list according to the IP addresses of the host and each target host. According to another embodiment of the present invention, when generating the network probe list, the network server 700 obtains the network topology information from the configuration management database, sets, for each host and according to the network topology information, the hosts with the same number under the other switches included in the internet data center to which the host belongs as target hosts to be probed, and generates the corresponding network probe list according to the IP addresses of the host and each target host. According to yet another embodiment of the present invention, when generating the network probe list, the network server 700 obtains the network topology information from the configuration management database, sets, for each host and according to the network topology information, the hosts with the same number under each switch included in the other internet data centers to which the host does not belong as target hosts to be probed, and generates the corresponding network probe list according to the IP addresses of the host and each target host.
Considering that the more target hosts the network probe list covers, the better the network state can be detected, according to another embodiment of the present invention, the network probe list corresponding to the host associated with each data probe can be generated in advance as follows. First, the network topology information is obtained from the configuration management database, and for each host, according to the network topology information, all the other hosts under the switch to which the host belongs are set as target hosts to be probed. Then, the hosts with the same number under the other switches included in the internet data center to which the host belongs are set as target hosts to be probed, and the hosts with the same number under each switch included in the other internet data centers to which the host does not belong are also set as target hosts to be probed. Finally, the corresponding network probe list is generated according to the IP addresses of the host and each target host, and the network probe list is stored in the cache database.
In this embodiment, the network topology information specifically includes the following. The network fault detection system 100 is connected to 2 internet data centers, denoted IDC1 and IDC2. 2 switches are deployed in each of IDC1 and IDC2: the switches deployed in IDC1 are denoted SW1 and SW2, and the switches deployed in IDC2 are denoted SW3 and SW4. 4 hosts are connected under each of the switches SW1, SW2, SW3 and SW4. The hosts connected to SW1 are denoted H1, H2, H3 and H4, numbered 1, 2, 3 and 4 in sequence; the hosts connected to SW2 are denoted H5, H6, H7 and H8, numbered 1, 2, 3 and 4 in sequence; the hosts connected to SW3 are denoted H9, H10, H11 and H12, numbered 1, 2, 3 and 4 in sequence; and the hosts connected to SW4 are denoted H13, H14, H15 and H16, numbered 1, 2, 3 and 4 in sequence. Since there are 16 hosts in total, the number N of data probes is 16, and data probes 1 to 16 are installed in hosts H1 to H16 in sequence. Table 1 shows an example of how the network topology information is stored according to an embodiment of the present invention, specifically as follows:
Figure BDA0001485533720000111
TABLE 1
The following describes the process of generating a network probe list in advance, taking host H1, which is associated with data probe 1, as an example. For host H1, all the other hosts H2, H3 and H4 under the switch SW1 to which it belongs are set as target hosts to be probed; host H5, which has the same number under the other switch SW2 in the internet data center IDC1 to which H1 belongs, is set as a target host to be probed; and hosts H9 and H13, which have the same number under the switches SW3 and SW4 in the other internet data center IDC2, to which H1 does not belong, are set as target hosts to be probed. The corresponding network probe list L1 is generated according to the IP addresses of host H1 and the target hosts H5, H9 and H13, and the network probe list L1 is stored in the cache database. Meanwhile, for the feasibility and convenience of probing, the network probe list L1 further includes the IP addresses of the switches and internet data centers to which host H1 and the target hosts H5, H9 and H13 are connected. Unless otherwise stated, data probe 1 and its associated host H1 are used as the example below to further explain the technical solution of the present invention.
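The three target-selection rules can be illustrated with a minimal Python sketch. The tuple layout of `TOPOLOGY` and the function name `build_probe_lists` are hypothetical and chosen only for illustration; the topology mirrors the 2-data-center, 4-switch, 16-host embodiment. Because the second and third rules both select the host with the same number under a different switch, the sketch merges them into one condition.

```python
from collections import defaultdict

# Hypothetical topology records: (host, host_number, switch, idc).
# Layout follows the embodiment: 2 IDCs, 2 switches each, 4 hosts per switch.
TOPOLOGY = [
    (f"H{4 * s + n}", n, f"SW{s + 1}", f"IDC{s // 2 + 1}")
    for s in range(4)
    for n in range(1, 5)
]


def build_probe_lists(topology):
    """Return {host: [target hosts]} by applying the three selection rules."""
    probe_lists = defaultdict(list)
    for host, number, switch, _idc in topology:
        for other, other_number, other_switch, _other_idc in topology:
            if other == host:
                continue
            # Rule 1: every other host under the same switch.
            same_switch = other_switch == switch
            # Rules 2 and 3: the same-numbered host under any other switch,
            # whether that switch is in the same IDC or in another IDC.
            same_number_elsewhere = other_switch != switch and other_number == number
            if same_switch or same_number_elsewhere:
                probe_lists[host].append(other)
    return dict(probe_lists)


if __name__ == "__main__":
    print(build_probe_lists(TOPOLOGY)["H1"])
```

In a real deployment each entry would carry IP addresses rather than host labels; labels are used here only to keep the sketch readable.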
Since the network probe list is changed with the change of the network topology information, according to another embodiment of the present invention, the network server 700 obtains the network topology information stored in the configuration management database at a preset third time interval, and if the obtained network topology information is changed from the previous network topology information, generates a new network probe list according to the obtained network topology information, and stores the new network probe list in the cache database for updating. In this embodiment, the network topology information stored in the configuration management database may be updated in real time, and the third time interval is preset to 30 minutes.
Regarding the process by which the data probe acquires the network probe list corresponding to its associated host from the network server 700 in step S311, according to an embodiment of the present invention, this may be implemented as follows. First, the data probe sends a list update request to the network server 700 at a preset second time interval to obtain the version of the network probe list corresponding to its associated host. The network server 700 then responds to the list update request by sending the data probe the version of the network probe list corresponding to the host associated with that data probe, so that the data probe can determine from the version whether to update its network probe list. The data probe receives the version of the latest network probe list returned by the network server 700 and compares it with the version of its current network probe list; if the returned version is newer, the data probe sends update confirmation information to the network server 700. On receiving the update confirmation information fed back by the data probe, the network server 700 sends the network probe list corresponding to the associated host to the data probe, and the data probe replaces its current network probe list with the list so acquired.
In this embodiment, the second time interval is preset to 10 minutes, and data probe 1 sends a list update request to the network server 700 every 10 minutes to obtain the version of the network probe list L1 corresponding to its associated host H1. The network server 700 responds to the list update request sent by data probe 1 and sends it the version of the network probe list L1 corresponding to host H1, which is 3.1.5. Data probe 1 receives the version of the latest network probe list L1 returned by the network server 700 and compares it with the version of its current network probe list. Since the version of the current network probe list is 3.1.4 and version 3.1.5 is newer than version 3.1.4, data probe 1 determines that the network probe list needs to be updated and sends update confirmation information to the network server 700. The network server 700 receives the update confirmation information fed back by data probe 1 and sends the latest network probe list L1 corresponding to host H1 to data probe 1, and finally data probe 1 replaces its current network probe list with the network probe list L1.
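The version check performed by the data probe can be sketched as follows. The text does not state how versions are compared, so this sketch assumes dotted numeric versions compared component by component; the function names `parse_version` and `needs_update` are hypothetical.

```python
def parse_version(version):
    """Turn a dotted version string such as '3.1.5' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def needs_update(current_version, latest_version):
    """True when the server's list version is newer than the probe's copy."""
    return parse_version(latest_version) > parse_version(current_version)


if __name__ == "__main__":
    # Versions from the embodiment: the probe holds 3.1.4, the server reports 3.1.5.
    print(needs_update("3.1.4", "3.1.5"))
```

Comparing integer tuples rather than raw strings avoids ranking a version such as 3.1.10 below 3.1.9.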
Subsequently, step S312 is executed, in which the data probe performs network state detection at a preset first time interval according to the acquired network probe list. According to an embodiment of the present invention, the first time interval is preset to 120 seconds, and data probe 1 probes the network state between host H1 and the target hosts H5, H9 and H13 every 120 seconds according to the acquired network probe list L1, typically using the PING (Packet Internet Groper) command to check whether the network is connected. The protocol used for probing is not limited and may be, for example, HTTP, TCP, UDP, or ICMP. Different probing protocols suit different probing requirements: HTTP or TCP can be used when the emphasis is on service-quality indicators, and ICMP can be used when the emphasis is on network indicators. The probing protocols may also be combined arbitrarily, for example HTTP together with ICMP. The probing protocol is generally specified in the network probe list by the network server 700, so that a data probe can obtain the probing protocol to use from its network probe list after acquiring that list.
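As an illustration of one of the permitted protocols, the following minimal sketch measures the time needed to open a TCP connection to a target. It is only an assumed form of a TCP probe; the function name `tcp_probe`, the port, and the timeout are hypothetical, and an ICMP-based PING probe would be implemented differently.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
```

A data probe could call such a function for each target in its probe list once per first time interval and record `None` as a connection failure.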
Next, in step S313, the data probe sends the detection results obtained from network state detection to the network server 700. According to an embodiment of the present invention, each time data probe 1 finishes probing one target host, it generates a corresponding detection result, which generally includes the following fields: operating system, probe start time, probe end time, probing protocol, target host, the switch to which the target host belongs, and/or the internet data center to which the target host belongs. Since sending each detection result individually would place an unnecessary burden on the system, the detection results of a round of probing may be sent in batches; for example, 128 detection results are compressed and packaged and then sent to the network server 700.
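Batch transmission can be sketched as below. The text only says that results are compressed and packaged; the use of JSON for serialization and zlib for compression, as well as the function names, are assumptions made for illustration.

```python
import json
import zlib

BATCH_SIZE = 128  # batch size used as an example in the text


def pack_batches(results, batch_size=BATCH_SIZE):
    """Yield compressed payloads, each holding up to batch_size probe results."""
    for start in range(0, len(results), batch_size):
        batch = results[start:start + batch_size]
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def unpack_batch(payload):
    """Inverse of pack_batches for one payload, as the receiving server might do."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

With 300 results, for example, this produces three payloads holding 128, 128 and 44 results respectively.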
Furthermore, in order to control the occupation of server resources and avoid interfering with the normal execution of the on-line service, each data probe needs to continuously check the system resources consumed by itself and perform subsequent processing according to the consumption. According to one embodiment of the invention, the data probe judges whether the consumed system resource is overloaded according to a preset resource occupation rule, if the consumed system resource is overloaded, the detection is stopped, and if the consumed system resource is not overloaded, the detection is continued. In this embodiment, the resource occupancy rule may be preset to have a CPU occupancy below 99% and a memory occupancy below 95%. For the data probe 1, the occupancy rate of the CPU of the host H1 is 75%, and the occupancy rate of the memory is 50%, and it is known that the system resources consumed by the data probe 1 are not overloaded and the probing can be continued.
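The resource occupancy rule amounts to a simple threshold check, sketched here. How CPU and memory occupancy are measured is outside the scope of the sketch, so the values are passed in as parameters; the function name `is_overloaded` is hypothetical.

```python
CPU_LIMIT = 99.0     # percent, from the example rule in the text
MEMORY_LIMIT = 95.0  # percent, from the example rule in the text


def is_overloaded(cpu_percent, memory_percent,
                  cpu_limit=CPU_LIMIT, memory_limit=MEMORY_LIMIT):
    """The probe is overloaded unless both occupancies are below their limits."""
    return cpu_percent >= cpu_limit or memory_percent >= memory_limit


if __name__ == "__main__":
    # Host H1 in the embodiment: CPU 75%, memory 50%, so probing continues.
    print(is_overloaded(75.0, 50.0))
```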
After receiving the detection results reported by the data probes, the network server 700 executes step S321 to form corresponding index data from the detection results. According to one embodiment of the invention, when the index data is formed, each detection result is checked first, and then the detection result which is qualified in the check is converted into the corresponding index data. The index data includes a current host, a target host, a Metric value, network interaction time and/or a timestamp, the current host is a host executing a probe command, the target host is a probed host, and the Metric value may be a field for measuring an optimal path, such as a number of hops. The checking process usually determines whether the values of the fields in the probing result are within a reasonable range, and if the values of all the fields in the probing result are within the reasonable range, it is determined that the probing result is qualified.
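The check-and-convert step can be sketched as follows. The dictionary keys, the `hops` field used as the Metric value, and the 60-second plausibility bound are all assumptions for illustration; the text only requires that each field fall within a reasonable range.

```python
MAX_RTT_SECONDS = 60.0  # assumed upper bound for a plausible probe duration


def to_index_data(current_host, result):
    """Validate one probe result; return index data, or None if the check fails."""
    start, end = result.get("start_time"), result.get("end_time")
    if start is None or end is None or not result.get("target_host"):
        return None
    elapsed = end - start
    if elapsed < 0 or elapsed > MAX_RTT_SECONDS:
        return None
    return {
        "current_host": current_host,           # host that executed the probe
        "target_host": result["target_host"],   # host that was probed
        "metric": result.get("hops"),           # hypothetical field, e.g. hop count
        "interaction_time": elapsed,
        "timestamp": end,
    }
```

Results that fail the check are simply dropped in this sketch; a real system might also log them.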
After the index data is formed, the network server 700 proceeds to step S322 and sends the index data generated in step S321 to the corresponding aggregation server according to a preset index transmission rule. Although only the aggregation server 800 is shown in fig. 1, a plurality of aggregation servers may be provided in order to speed up subsequent network fault diagnosis. When there is only 1 aggregation server, for example only the aggregation server 800, the network server 700 may send the index data directly to the aggregation server 800. When there are multiple aggregation servers, according to an embodiment of the present invention, the index transmission rule is preset to select the aggregation server that receives the index data by means of a Hash algorithm. First, the addresses of all aggregation servers are sorted to obtain a corresponding sequence. A Hash value is then computed from the Metric value in each index data, a modulo operation is applied to the Hash value to obtain a corresponding index value, and the index data is sent to the aggregation server whose position in the sequence equals the index value. The Hash algorithm itself is mature prior art and is not described here. It should be noted that other algorithms may be adopted for the index transmission rule according to the actual situation; such variations are readily conceivable to a person skilled in the art who understands the scheme of the present invention, also fall within the protection scope of the present invention, and are not described here.
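A minimal sketch of the Hash-based index transmission rule is given below. The text does not name a Hash function or state the modulus, so SHA-256 and the number of aggregation servers are assumed here; the function name `pick_aggregation_server` is hypothetical.

```python
import hashlib


def pick_aggregation_server(metric_value, server_addresses):
    """Hash the Metric value and map it onto the sorted list of server addresses."""
    ordered = sorted(server_addresses)
    # A stable digest is used because Python's built-in hash() of strings
    # varies between processes.
    digest = hashlib.sha256(str(metric_value).encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(ordered)
    return ordered[index]
```

Because the addresses are sorted before indexing, every network server instance maps the same Metric value to the same aggregation server regardless of the order in which the addresses were configured.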
The aggregation server 800 receives one or more index data reported by the network server 700, the index data being formed by the network server 700 from the detection results reported by the data probes. The aggregation server 800 then filters the index data: index data whose network interaction time exceeds a preset threshold, or which indicates a network connection failure, is filtered out and recorded as abnormal-state index data, and the remaining index data is recorded as normal-state index data. According to one embodiment of the invention, a network interaction time exceeding the preset threshold means that the PING value obtained after executing the PING command exceeds the threshold, and an indication of network connection failure means that the PING did not succeed.
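The filtering step can be sketched as follows. The threshold value is not given in the text, so 200 ms is an arbitrary placeholder; the key name `interaction_time_ms` and the convention that `None` represents a failed connection are likewise assumptions.

```python
RTT_THRESHOLD_MS = 200.0  # assumed threshold; the text leaves the value open


def split_by_state(index_data, threshold_ms=RTT_THRESHOLD_MS):
    """Return (normal, abnormal) lists of index data."""
    normal, abnormal = [], []
    for item in index_data:
        rtt = item["interaction_time_ms"]  # None stands for a failed connection
        if rtt is None or rtt > threshold_ms:
            abnormal.append(item)
        else:
            normal.append(item)
    return normal, abnormal
```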
Further, the aggregation server 800 executes step S331 to detect network connections based on the abnormal-state index data. According to an embodiment of the invention, when detecting network connections, for every pair of switches under each internet data center, the proportion of hosts under one switch that have network faults in connecting to the other switch is detected from the abnormal-state index data, and each switch whose host number ratio exceeds a preset first ratio is determined to have a network connection fault between switches. The first ratio is preset to 0.5. In this embodiment, for the internet data center IDC1, the abnormal-state index data indicates that hosts H1, H2 and H3 under switch SW1 have network faults in connecting to switch SW2. Since there are 4 hosts in total under SW1, the proportion of hosts under SW1 that have network faults in connecting to SW2 is 3/4 = 0.75, which is greater than the first ratio, so it is determined that switch SW1 has a network connection fault between switches. For the internet data center IDC2, the abnormal-state index data indicates that host H10 under switch SW3 has a network fault in connecting to switch SW4. Since there are 4 hosts in total under SW3, the proportion of hosts under SW3 that have network faults in connecting to SW4 is 1/4 = 0.25, which is less than the first ratio, so it is determined that switch SW3 does not have a network connection fault between switches.
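The host number ratio test can be sketched as follows, using the figures from this embodiment. The input structures and the function name `switches_with_faults` are hypothetical; in practice the failing hosts would be derived from the abnormal-state index data.

```python
FIRST_RATIO = 0.5  # preset first ratio from the text


def switches_with_faults(hosts_by_switch, failed_hosts_by_pair, first_ratio=FIRST_RATIO):
    """Return (switch, peer switch) pairs whose failing-host ratio exceeds first_ratio.

    hosts_by_switch: {switch: set of all hosts under that switch}
    failed_hosts_by_pair: {(switch, peer): set of hosts under switch that
                           failed to reach the peer switch}
    """
    faulty = []
    for (switch, peer), failed in failed_hosts_by_pair.items():
        ratio = len(failed) / len(hosts_by_switch[switch])
        if ratio > first_ratio:
            faulty.append((switch, peer))
    return faulty


if __name__ == "__main__":
    hosts = {"SW1": {"H1", "H2", "H3", "H4"}, "SW3": {"H9", "H10", "H11", "H12"}}
    failures = {("SW1", "SW2"): {"H1", "H2", "H3"}, ("SW3", "SW4"): {"H10"}}
    print(switches_with_faults(hosts, failures))
```

Here SW1 is flagged because 3/4 = 0.75 exceeds 0.5, while SW3 is not because 1/4 = 0.25 does not.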
According to one embodiment of the present invention, for each switch whose host number ratio does not exceed the preset first ratio, a first switch number ratio is detected, namely the proportion of switches, under the internet data center to which the switch belongs, that have network faults in connecting to other internet data centers; each internet data center whose first switch number ratio exceeds a preset second ratio is determined to have a network connection fault between the internet data center and the switches. The second ratio is preset to 0.5. In this embodiment, the host number ratio of switch SW3 does not exceed the preset first ratio. The abnormal-state index data indicates that, under the internet data center IDC2 to which SW3 belongs, the switches SW3 and SW4 both have network faults in connecting to the internet data center IDC1. With 2 switches counted in total, the first switch number ratio is 2/2 = 100%, which is greater than the second ratio, so it is determined that IDC2 has a network connection fault between the internet data center and the switches.
According to another embodiment of the present invention, for each switch whose host number ratio does not exceed the preset first ratio, a second switch number ratio is detected, namely the proportion of switches in another internet data center to which the internet data center of that switch has network faults in connecting; each internet data center whose second switch number ratio exceeds a preset third ratio is determined to have a network connection fault between internet data centers. The third ratio is preset to 0.5. In this embodiment, the host number ratio of switch SW3 does not exceed the preset first ratio. The abnormal-state index data indicates that the internet data center IDC2 to which SW3 belongs has network faults in connecting to the switches SW1 and SW2 of the internet data center IDC1, and IDC1 has 2 switches in total. The second switch number ratio of IDC2 with respect to the switches under IDC1 is therefore 2/2 = 100%, which is greater than the preset third ratio, so it is determined that IDC2 has a network connection fault between internet data centers.
After the aggregation server 800 completes the network connection detection, step S332 is executed, and if the detection result indicates that a network connection failure occurs, an alarm is issued. According to one embodiment of the invention, the SW1 has a network connection failure between switches, the IDC2 has a network connection failure between the internet data center and the switches and a network connection failure between the internet data centers, thereby determining that the network failure occurs and sending an alarm to relevant workers and systems.
In step S333, the aggregation server 800 performs aggregation calculation on the normal-state index data to obtain corresponding network quality data. According to an embodiment of the present invention, when generating the network quality data, quantile value data is first generated from the normal-state index data at a preset fourth time interval; the normal-state index data is then aggregated by corresponding switch to compute the network quality index of each switch, and aggregated by corresponding internet data center to compute the network quality index of each internet data center; finally, the generated quantile value data and the network quality indexes of the switches and internet data centers are combined to form the corresponding network quality data. In this embodiment, the fourth time interval may be preset to 5 minutes and 10 minutes. The normal-state index data is first classified by source, that is, by current host, using the Hash algorithm and the Metric value in the index data. Then, for each current host serving as a source, the index data ranked at 50% is generated every 5 minutes as the 50-quantile value, and the index data ranked at 99% is generated every 10 minutes as the 99-quantile value; the 50-quantile value and the 99-quantile value together serve as the quantile value data.
The normal-state index data corresponding to the switches SW1, SW2, SW3 and SW4 is aggregated respectively, and the network quality index of each switch is generated from the corresponding statistics; the normal-state index data corresponding to the internet data centers IDC1 and IDC2 is aggregated respectively, and the network quality index of each internet data center is generated from the corresponding statistics; finally, the quantile value data and the network quality indexes of the switches SW1, SW2, SW3 and SW4 and the internet data centers IDC1 and IDC2 are combined to form the corresponding network quality data.
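Picking the value "ranked at 50%" or "ranked at 99%" within a time window can be sketched with the nearest-rank method. The text does not specify which quantile definition is used, so nearest-rank is an assumption, and the function name `nearest_rank_quantile` is hypothetical.

```python
import math


def nearest_rank_quantile(values, percent):
    """Return the value at the given percentile using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Rank is 1-based: the smallest rank whose share of the data reaches percent.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]
```

Applied to the interaction times collected for one source host, `nearest_rank_quantile(times, 50)` over a 5-minute window and `nearest_rank_quantile(times, 99)` over a 10-minute window would give the two quantile values described above.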
After acquiring the network quality data, the aggregation server 800 executes step S334 to transmit the network quality data to the database server 900. According to one embodiment of the invention, aggregation server 800 sends the network quality data to database server 900 for storage.
After receiving the network quality data sent by the aggregation server 800, the database server 900 proceeds to step S341 and stores the network quality data. According to an embodiment of the present invention, in order to accelerate queries, the network quality data generated through aggregation is first written into a memory cache of the database server 900. After the data has been cached for a certain time, for example one day, the corresponding network quality data is transferred to persistent storage, for example OpenTSDB (Open Time Series Database).
Fig. 4 shows a flow diagram of a network probing method 400 according to one embodiment of the invention. When the computing device 200 is implemented as the network server 700, the one or more programs 222 of the computing device 200 include instructions for performing the network probing method 400 according to the present invention. According to an embodiment of the present invention, a configuration management database is disposed in the network server 700, the configuration management database stores the latest network topology information, the network server 700 is communicatively connected to one or more aggregation servers and is connected to a plurality of internet data centers, each internet data center is disposed with a plurality of switches, each switch is connected to a plurality of hosts, and each host is pre-disposed with a corresponding data probe and is communicatively connected to the network server 700.
As shown in fig. 4, the method 400 begins at step S410. In step S410, network topology information is obtained from the configuration management database, and the network topology information includes the numbers and connection relationships of the hosts, the switches, and the internet data centers.
Subsequently, step S420 is executed to generate a network probe list corresponding to each host according to the network topology information. According to an embodiment of the present invention, when generating the network probe list, for each host and according to the network topology information, all the other hosts under the switch to which the host belongs are set as target hosts to be probed, and the corresponding network probe list is generated according to the IP addresses of the host and each target host. According to another embodiment of the present invention, when generating the network probe list, for each host and according to the network topology information, the hosts with the same number under the other switches included in the internet data center to which the host belongs are set as target hosts to be probed, and the corresponding network probe list is generated according to the IP addresses of the host and each target host. According to yet another embodiment of the present invention, when generating the network probe list, for each host and according to the network topology information, the hosts with the same number under each switch included in the other internet data centers to which the host does not belong are set as target hosts to be probed, and the corresponding network probe list is generated according to the IP addresses of the host and each target host.
Considering that the more target hosts the network probe list covers, the better the network state can be detected, according to yet another embodiment of the present invention, the network probe list can be generated as follows. First, according to the network topology information, for each host, all the other hosts under the switch to which the host belongs are set as target hosts to be probed. Then, the hosts with the same number under the other switches included in the internet data center to which the host belongs are set as target hosts to be probed, and the hosts with the same number under each switch included in the other internet data centers to which the host does not belong are also set as target hosts to be probed. Finally, the corresponding network probe list is generated according to the IP addresses of the host and each target host.
Since the network probe list changes with the change of the network topology information, according to another embodiment of the present invention, the network server 700 further includes a cache database, and acquires the network topology information stored in the configuration management database according to a preset third time interval, and if the acquired network topology information changes from the previous network topology information, generates a new network probe list according to the acquired network topology information, and stores the new network probe list in the cache database for updating. The specific processing procedures of steps S410 and S420 and the network probe list updating can refer to the related content of the network probe list generated in advance by the network server 700 in the method 300 and corresponding to the host associated with each data probe, which is not described herein again.
Next, in step S430, in response to a list update request sent by a data probe, the network probing list corresponding to the host associated with that data probe is sent to the data probe, so as to instruct the data probe to perform network state detection at a preset first time interval and to send the detection results to the network server 700. According to an embodiment of the present invention, the network probing list may be sent to the data probe as follows. First, the network server 700 responds to the list update request, which the data probe sends to the network server 700 at a preset second time interval, by sending the data probe the version of the network probing list corresponding to its associated host, so that the data probe can determine from the version whether to update its network probing list. The network server 700 then receives the update confirmation information fed back by the data probe and sends the network probing list corresponding to the associated host to the data probe. For the specific processing in step S430, refer to the description of steps S311, S312, and S313 in the method 300, which is not repeated here.
Finally, step S440 is performed, in which the detection results reported by the data probes are received, and the detection results are sent to the corresponding aggregation server 800 after forming corresponding index data. According to an embodiment of the present invention, each detection result may be transmitted to the corresponding aggregation server 800 after forming corresponding index data as follows. In this embodiment, each detection result is checked, the detection results that are qualified in the check are converted into corresponding index data, and the index data is sent to the corresponding aggregation server 800 according to a preset index transmission rule. The specific processing procedure in step S440 can refer to the related contents of steps S321 and S322 in the method 300, which are not described herein again.
Existing network detection schemes fall into two categories. The first is manual detection, which is time-consuming and labor-intensive, lacks data support, and has low accuracy. The second is active detection of network conditions by deploying probes on all online servers, which suffers from incomplete coverage of the network topology and from detection data that is difficult to use comprehensively. Network fault detection schemes built on such network detection inherit the same problems. According to the network detection scheme of the embodiments of the present invention, a network probe list corresponding to each host is generated based on the network topology information, the network probe list corresponding to the host associated with a data probe is sent to that data probe to instruct it to perform network status detection, the detection results reported by the data probes are received, and the detection results are converted into corresponding index data and sent to the corresponding aggregation server. In this scheme, the network topology information includes the numbers and connection relationships of the hosts, the switches, and the internet data centers, and the network probe lists generated from it cover all hosts. This achieves full network coverage while reducing the amount of computation and the consumption of system resources, allows the network connection between any two hosts to be detected in real time, and adapts to network topology changes by issuing updated network probe lists.
On this basis, the network fault detection scheme according to the embodiments of the present invention actively raises alarms for abnormal network connection states, which greatly shortens the fault response time, and obtains network quality data between any two switches and between any two internet data centers in real time through distributed aggregation of the index data.
A7. The method according to any one of A1-A6, wherein the step of converting each detection result into corresponding index data and sending it to the corresponding aggregation server includes:
checking each detection result;
and converting the detection results that pass the check into corresponding index data, and sending the index data to the corresponding aggregation server according to a preset index transmission rule.
B11. The method as recited in B10, wherein the step of the data probe obtaining the network probe list corresponding to the associated host from the network server comprises:
according to a preset second time interval, sending a list updating request to the network server to obtain a version of a network detection list corresponding to the associated host;
and if the version is newer than the version of the current network detection list of the data probe, acquiring the network detection list corresponding to the associated host from the network server to replace the current network detection list of the data probe.
B12. The method according to B10 or B11, further comprising:
the data probe judges whether the consumed system resources are overloaded or not according to a preset resource occupation rule;
if the consumed system resources are overloaded, stopping detection;
if the consumed system resources are not overloaded, the detection is continued.
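The overload check in B12 could look like the following minimal sketch. The text does not define the "resource occupation rule", so the CPU and memory thresholds, the `ResourceRule` name, and the idea of passing measured usage in as plain numbers are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ResourceRule:
    # Assumed limits on what the probe process itself may consume.
    max_cpu_percent: float = 5.0
    max_memory_mb: float = 100.0


def is_overloaded(cpu_percent: float, memory_mb: float, rule: ResourceRule) -> bool:
    """Return True when either measured value exceeds its limit."""
    return cpu_percent > rule.max_cpu_percent or memory_mb > rule.max_memory_mb


def run_cycle(
    cpu_percent: float,
    memory_mb: float,
    rule: ResourceRule,
    probe: Callable[[], List[dict]],
) -> Optional[List[dict]]:
    """Run one detection cycle, or skip it (return None) when overloaded."""
    if is_overloaded(cpu_percent, memory_mb, rule):
        return None  # stop detecting for this cycle
    return probe()   # continue detecting
```

Measuring the actual usage is left to the caller, because the appropriate mechanism depends on the host operating system.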
B13. The method according to any one of B10-B12, wherein a configuration management database and a cache database are provided in the network server, the configuration management database stores the latest network topology information, the network topology information includes the numbers and connection relationships of the hosts, the switches, and the internet data centers, and the method further includes a step in which the network server generates in advance the network probe list corresponding to the host associated with each data probe, this step including:
according to the network topology information, for each host, setting the other hosts under the switch to which the host belongs as target hosts to be detected;
setting the hosts with the same number as the host under the other switches of the internet data center to which the host belongs as target hosts to be detected;
setting the hosts with the same number as the host under all the switches of the other internet data centers, to which the host does not belong, as target hosts to be detected;
and generating the corresponding network probe list according to the IP addresses of the host and each target host, and storing the network probe list in the cache database.
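The three target-selection rules in B13 can be sketched as follows. The nested-dictionary topology layout and the reading of a host's "number" as its position in its switch's host list are assumptions for illustration; the text does not fix a data model.

```python
from typing import Dict, List

# Assumed layout: internet data center -> switch -> host IPs ordered by host number.
Topology = Dict[str, Dict[str, List[str]]]


def build_probe_list(topology: Topology, idc: str, switch: str, number: int) -> List[str]:
    """Collect the target IPs for the host at position `number` under `switch`."""
    targets: List[str] = []
    for other_idc, switches in topology.items():
        for other_switch, hosts in switches.items():
            if other_idc == idc and other_switch == switch:
                # Rule 1: every other host under the host's own switch.
                targets.extend(ip for i, ip in enumerate(hosts) if i != number)
            elif number < len(hosts):
                # Rules 2 and 3: the same-numbered host under every other
                # switch, in the same data center and in the other ones.
                targets.append(hosts[number])
    return targets


def build_all_probe_lists(topology: Topology) -> Dict[str, List[str]]:
    """Map each host IP to its probe list (what the cache database would hold)."""
    lists: Dict[str, List[str]] = {}
    for idc, switches in topology.items():
        for switch, hosts in switches.items():
            for number, ip in enumerate(hosts):
                lists[ip] = build_probe_list(topology, idc, switch, number)
    return lists
```

Probing only the same-numbered host under each remote switch keeps a host's list roughly proportional to the number of switches rather than the number of hosts, which is consistent with the reduced computation the description claims.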
B14. The method of B13, further comprising:
the network server acquires network topology information stored in the configuration management database according to a preset third time interval;
and if the acquired network topology information is changed compared with the previous network topology information, generating a new network detection list according to the acquired network topology information, and storing the new network detection list in the cache database for updating.
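The periodic refresh in B14 can be sketched as a change check followed by regeneration. Comparing SHA-256 fingerprints of a JSON serialization is one possible way to detect a change and is an assumption here, as are the `ProbeListCache` name and the integer version counter.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional


def fingerprint(topology: dict) -> str:
    """Stable digest of the topology (keys sorted so dict ordering is irrelevant)."""
    payload = json.dumps(topology, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ProbeListCache:
    """Hypothetical stand-in for the cache database holding the probe lists."""

    def __init__(self, generate: Callable[[dict], Dict[str, List[str]]]) -> None:
        self._generate = generate
        self._fingerprint: Optional[str] = None
        self.version = 0
        self.lists: Dict[str, List[str]] = {}

    def sync(self, topology: dict) -> bool:
        """Regenerate the lists only if the topology changed; return True if so."""
        digest = fingerprint(topology)
        if digest == self._fingerprint:
            return False
        self.lists = self._generate(topology)
        self._fingerprint = digest
        self.version += 1  # lets data probes notice the new list
        return True
```

`sync` would be invoked once per third time interval with the topology read from the configuration management database.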
B15. The method according to any one of B10-B14, wherein the step of converting each detection result into corresponding index data and sending it to the corresponding aggregation server includes:
checking each detection result;
and converting the detection results that pass the check into corresponding index data, and sending the index data to the corresponding aggregation server according to a preset index transmission rule.
B16. The method according to any one of B10-B15, wherein the step of detecting network connections according to the index data in an abnormal state includes:
for every two switches under each internet data center, determining, from the index data in an abnormal state, the proportion of hosts under one switch whose connections to the other switch have a network fault;
and determining that a network connection fault exists between the two switches where this host proportion exceeds a preset first ratio.
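The switch-level check in B16 can be sketched as below, under one reading of the clause: for an ordered pair of switches, count the distinct hosts under the source switch that reported an abnormal connection to the destination switch, and divide by the number of hosts under the source switch. The record layout and the function name are assumptions, and the caller is assumed to pass records for switches within a single internet data center.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed record layout: (source host IP, source switch, destination switch).
AbnormalRecord = Tuple[str, str, str]


def faulty_switch_pairs(
    abnormal: Iterable[AbnormalRecord],
    hosts_per_switch: Dict[str, int],
    first_ratio: float,
) -> List[Tuple[str, str]]:
    """Return the (source, destination) switch pairs judged to have a link fault."""
    failing: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for src_host, src_switch, dst_switch in abnormal:
        if src_switch != dst_switch:
            # A set ensures each host is counted once per switch pair.
            failing[(src_switch, dst_switch)].add(src_host)

    faulty: List[Tuple[str, str]] = []
    for (src_switch, dst_switch), hosts in failing.items():
        total = hosts_per_switch.get(src_switch, 0)
        if total and len(hosts) / total > first_ratio:
            faulty.append((src_switch, dst_switch))
    return sorted(faulty)
```

Using a proportion rather than a single failed probe avoids raising a switch-level alarm when only one host is misbehaving.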
B17. The method according to B16, wherein the step of detecting network connections according to the index data in an abnormal state includes:
for each switch whose host proportion does not exceed the preset first ratio, determining, within the internet data center to which the switch belongs, the proportion of first switches, a first switch being a switch whose connections to switches under other internet data centers have a network fault;
and determining, for each internet data center in which the proportion of first switches exceeds a preset second ratio, that a network connection fault exists between the internet data center and a switch.
B18. The method according to B16 or B17, wherein the step of detecting network connections according to the index data in an abnormal state includes:
for each switch whose host proportion does not exceed the preset first ratio, determining, within the internet data center to which the switch belongs, the proportion of second switches, a second switch being a switch whose connections to other internet data centers have a network fault;
and determining, for each internet data center in which the proportion of second switches exceeds a preset third ratio, that a network connection fault exists between internet data centers.
B19. The method according to any one of B10-B18, wherein the step of performing aggregate calculation on the index data in a normal state to obtain corresponding network quality data includes:
generating quantile value data from the index data in a normal state at a preset fourth time interval;
aggregating the index data in a normal state by corresponding switch, and computing the network quality indexes of each switch;
aggregating the index data in a normal state by corresponding internet data center, and computing the network quality indexes of each internet data center;
and combining the generated quantile value data with the network quality indexes of the switches and the internet data centers to form the corresponding network quality data.
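The aggregation in B19 can be illustrated with a short sketch for one time window. The metric layout (`idc`, `switch`, `rtt_ms`), the choice of round-trip time as the quality index, the p50/p90/p99 quantiles, and the nearest-rank method are all assumptions; the text names none of them.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list (0 < p <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(metrics: List[dict]) -> dict:
    """Combine quantiles with per-switch and per-data-center mean latency."""
    if not metrics:
        return {"quantiles": {}, "switch": {}, "idc": {}}

    by_switch: Dict[str, List[float]] = defaultdict(list)
    by_idc: Dict[str, List[float]] = defaultdict(list)
    for m in metrics:
        by_switch[m["switch"]].append(m["rtt_ms"])
        by_idc[m["idc"]].append(m["rtt_ms"])

    all_rtts = [m["rtt_ms"] for m in metrics]
    return {
        "quantiles": {f"p{p}": percentile(all_rtts, p) for p in (50, 90, 99)},
        "switch": {name: mean(v) for name, v in by_switch.items()},
        "idc": {name: mean(v) for name, v in by_idc.items()},
    }
```

The caller would invoke `aggregate` once per fourth time interval on the normal-state index data collected in that window and forward the result to the database server.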
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or groups of devices in the examples disclosed herein may be arranged in a device as described in this embodiment, or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. Modules or units or groups in embodiments may be combined into one module or unit or group and may furthermore be divided into sub-modules or sub-units or sub-groups. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the network probing method and/or the network failure detection method of the present invention according to instructions in said program code stored in the memory.
By way of example, and not limitation, computer-readable media include both computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (18)

1. A network probing method, adapted to be executed in a network server, wherein a configuration management database is provided in the network server and stores the latest network topology information, the network server is communicatively connected to one or more aggregation servers and is connected to a plurality of internet data centers, a plurality of switches are deployed in each internet data center, a plurality of hosts are connected to each switch, and each host has a corresponding data probe pre-installed and is communicatively connected to the network server, the method comprising:
acquiring network topology information from the configuration management database, wherein the network topology information comprises serial numbers and connection relations of all hosts, all switches and all internet data centers;
generating a network detection list corresponding to each host according to the network topology information;
responding to a list updating request sent by the data probe, sending a network detection list corresponding to a host associated with the data probe to the data probe so as to indicate the data probe to perform network state detection at a preset first time interval, and sending a detection result to the network server;
receiving detection results reported by each data probe, forming corresponding index data by each detection result, and sending the index data to the corresponding aggregation server;
wherein, the step of sending the network probing list corresponding to the host associated with the data probe to the data probe in response to the list update request sent by the data probe comprises:
responding to a list updating request sent by the data probe, wherein the list updating request is sent to the network server by the data probe according to a preset second time interval;
sending the version of the network detection list corresponding to the associated host to the data probe so as to indicate the data probe to judge whether to update the network detection list according to the version;
and receiving the update confirmation information fed back by the data probe, and sending the network detection list corresponding to the host associated with the data probe to the data probe.
2. The method of claim 1, wherein generating the network probe list corresponding to each host according to the network topology information comprises:
according to the network topology information, setting other hosts under the switch to which the host belongs as target hosts to be detected for each host;
and generating a corresponding network detection list according to the IP addresses of the host and each target host.
3. The method according to claim 1 or 2, wherein the step of generating a network probe list corresponding to each host according to the network topology information comprises:
according to the network topology information, for each host, setting the host with the same number under other switches included in the internet data center to which the host belongs as a target host to be detected;
and generating a corresponding network detection list according to the IP addresses of the host and each target host.
4. The method according to claim 1 or 2, wherein the step of generating a network probe list corresponding to each host according to the network topology information comprises:
according to the network topology information, for each host, setting the host with the same number under each switch included in the other internet data centers as a target host to be detected;
and generating a corresponding network detection list according to the IP addresses of the host and each target host.
5. The method according to claim 1 or 2, wherein a cache database is further provided in the network server, the method further comprising:
acquiring network topology information stored in the configuration management database according to a preset third time interval;
and if the acquired network topology information is changed compared with the previous network topology information, generating a new network detection list according to the acquired network topology information, and storing the new network detection list in the cache database for updating.
6. The method according to claim 1 or 2, wherein the step of sending each probe result after forming corresponding index data to the corresponding aggregation server comprises:
checking each detection result;
and converting the detection results that pass the check into corresponding index data, and sending the index data to the corresponding aggregation server according to a preset index transmission rule.
7. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-6.
8. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-6.
9. A network fault detection method, adapted to be executed in a network fault detection system, wherein the system comprises one or more network servers, an aggregation server, a database server, and a plurality of data probes, the network servers are communicatively connected to the aggregation server, the system is connected to a plurality of internet data centers, a plurality of switches are deployed in each internet data center, a plurality of hosts are connected to each switch, each host has a corresponding data probe pre-installed and is communicatively connected to the network server, and the network probe list corresponding to the host associated with each data probe is stored in the network server, the method comprising:
the data probe acquires a network detection list corresponding to the associated host from the network server, performs network state detection at a preset first time interval according to the acquired network detection list, and sends a detection result to the network server;
the network server receives the detection results reported by the data probes, forms corresponding index data with the detection results and sends the index data to the corresponding aggregation server;
the aggregation server receives one or more index data reported by the network server, detects network connection according to the index data with abnormal states, sends out an alarm if the detection result indicates that the network connection fails, performs aggregation calculation on the index data with normal states to obtain corresponding network quality data, and sends the network quality data to the database server;
the database server receives and stores the network quality data sent by the aggregation server;
the network server is provided with a configuration management database, the configuration management database stores the latest network topology information, and a network detection list stored in the network server is generated according to the following mode:
the network server acquires network topology information from the configuration management database, wherein the network topology information comprises serial numbers and connection relations of all hosts, all switches and all internet data centers;
and the network server generates a network detection list corresponding to each host according to the network topology information.
10. The method of claim 9, wherein the step of the data probe obtaining the network probe list corresponding to the associated host from the network server comprises:
according to a preset second time interval, sending a list updating request to the network server to obtain a version of a network detection list corresponding to the associated host;
and if the version is newer than the version of the current network detection list of the data probe, acquiring the network detection list corresponding to the associated host from the network server to replace the current network detection list of the data probe.
11. The method of claim 9 or 10, further comprising:
the data probe judges whether the consumed system resources are overloaded or not according to a preset resource occupation rule;
if the consumed system resources are overloaded, stopping detection;
if the consumed system resources are not overloaded, the detection is continued.
12. The method according to claim 9 or 10, wherein a configuration management database and a cache database are provided in the network server, the configuration management database stores the latest network topology information, the network topology information includes the numbers and connection relationships of the hosts, the switches, and the internet data centers, and the method further includes a step in which the network server generates in advance the network probe list corresponding to the host associated with each data probe, this step including:
according to the network topology information, for each host, setting the other hosts under the switch to which the host belongs as target hosts to be detected;
setting the hosts with the same number as the host under the other switches of the internet data center to which the host belongs as target hosts to be detected;
setting the hosts with the same number as the host under all the switches of the other internet data centers, to which the host does not belong, as target hosts to be detected;
and generating the corresponding network probe list according to the IP addresses of the host and each target host, and storing the network probe list in the cache database.
13. The method of claim 12, further comprising:
the network server acquires network topology information stored in the configuration management database according to a preset third time interval;
and if the acquired network topology information is changed compared with the previous network topology information, generating a new network detection list according to the acquired network topology information, and storing the new network detection list in the cache database for updating.
14. The method of claim 13, wherein the step of sending each probe result to the corresponding aggregation server after forming the corresponding index data comprises:
checking each detection result;
and converting the detection results that pass the check into corresponding index data, and sending the index data to the corresponding aggregation server according to a preset index transmission rule.
15. The method of claim 14, wherein the step of detecting network connections according to the index data in an abnormal state comprises:
for every two switches under each internet data center, determining, from the index data in an abnormal state, the proportion of hosts under one switch whose connections to the other switch have a network fault;
and determining that a network connection fault exists between the two switches where this host proportion exceeds a preset first ratio.
16. The method of claim 15, wherein the step of detecting network connections according to the index data in an abnormal state comprises:
for each switch whose host proportion does not exceed the preset first ratio, determining, within the internet data center to which the switch belongs, the proportion of first switches, a first switch being a switch whose connections to switches under other internet data centers have a network fault;
and determining, for each internet data center in which the proportion of first switches exceeds a preset second ratio, that a network connection fault exists between the internet data center and a switch.
17. The method of claim 16, wherein the step of detecting network connections according to the index data in an abnormal state comprises:
for each switch whose host proportion does not exceed the preset first ratio, determining, within the internet data center to which the switch belongs, the proportion of second switches, a second switch being a switch whose connections to other internet data centers have a network fault;
and determining, for each internet data center in which the proportion of second switches exceeds a preset third ratio, that a network connection fault exists between internet data centers.
18. The method of claim 17, wherein the step of performing aggregate calculation on the index data in a normal state to obtain corresponding network quality data comprises:
generating quantile value data from the index data in a normal state at a preset fourth time interval;
aggregating the index data in a normal state by corresponding switch, and computing the network quality indexes of each switch;
aggregating the index data in a normal state by corresponding internet data center, and computing the network quality indexes of each internet data center;
and combining the generated quantile value data with the network quality indexes of the switches and the internet data centers to form the corresponding network quality data.
CN201711216097.XA 2017-11-28 2017-11-28 Network detection method, network fault detection method and system Active CN107995030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711216097.XA CN107995030B (en) 2017-11-28 2017-11-28 Network detection method, network fault detection method and system

Publications (2)

Publication Number Publication Date
CN107995030A CN107995030A (en) 2018-05-04
CN107995030B true CN107995030B (en) 2021-09-14

Family

ID=62033818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711216097.XA Active CN107995030B (en) 2017-11-28 2017-11-28 Network detection method, network fault detection method and system

Country Status (1)

Country Link
CN (1) CN107995030B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101291232A (en) * 2008-06-03 2008-10-22 北京星网锐捷网络技术有限公司 Ethernet port, Ethernet switch and signal receiving and sending method of Ethernet equipment
CN102055626A (en) * 2010-12-31 2011-05-11 北京中创信测科技股份有限公司 Internet protocol (IP) network quality detecting method and system
CN102891779A (en) * 2012-09-27 2013-01-23 北京网瑞达科技有限公司 Large-scale network performance measuring system and method for IP network
CN105871634A (en) * 2016-06-01 2016-08-17 北京蓝海讯通科技股份有限公司 Method and application for detecting cluster anomalies and cluster managing system
CN106991033A (en) * 2017-04-01 2017-07-28 北京蓝海讯通科技股份有限公司 Alarm information notification method, device, server and readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9836908B2 (en) * 2014-07-25 2017-12-05 Blockchain Technologies Corporation System and method for securely receiving and counting votes in an election

Also Published As

Publication number Publication date
CN107995030A (en) 2018-05-04

Similar Documents

Publication Publication Date Title
CN107995030B (en) Network detection method, network fault detection method and system
CN107835098B (en) Network fault detection method and system
US11799735B2 (en) Information processing method in M2M and apparatus
CN110365748A (en) Business data processing method and apparatus, storage medium and electronic device
JP6062034B2 (en) Processing control system, processing control method, and processing control program
CN110365765A (en) Bandwidth scheduling method and device for cache server
US20160036665A1 (en) Data verification based upgrades in time series system
CN114172829B (en) Server health monitoring method and system and computing equipment
CN113259428A (en) Data access request processing method and device, computer equipment and medium
US10659289B2 (en) System and method for event processing order guarantee
CN112051771B (en) Multi-cloud data acquisition method and device, computer equipment and storage medium
US8677003B1 (en) Distributed processing of streaming data on an event protocol
Snyder et al. A case for epidemic fault detection and group membership in HPC storage systems
CN113472670B (en) Method for computer network, network device and storage medium
CN111162929B (en) Hierarchical management method and system
CN114553747A (en) Method, device, terminal and storage medium for detecting abnormality of redis cluster
CN115333917A (en) CDN anomaly detection method and device
US11190432B2 (en) Method and first node for managing transmission of probe messages
CN112559325B (en) Application program testing system, method, computing device and readable storage medium
US20180270102A1 (en) Data center network fault detection and localization
WO2024047775A1 (en) Determination of machine learning model to be used for given predictive purpose for communication system
CN113379208B (en) Index calculation method, apparatus and readable storage medium
CN114826867B (en) Method, device, system and storage medium for processing data
CN114697319B (en) Tenant service management method and device for public cloud
CN114422324B (en) Alarm information processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant