CN114095394B - Network node fault detection method and device, electronic equipment and storage medium

Info

Publication number: CN114095394B
Application number: CN202111415301.7A
Authority: CN
Inventors: 李莹; 王存
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-11-25
Filing date: 2021-11-25
Publication date: 2023-09-19
Anticipated expiration: 2041-11-25
Also published as: CN114095394A

Abstract

The disclosure provides a network node fault detection method, a network node fault detection device, electronic equipment and a storage medium, and relates to the technical field of data processing, in particular to the technical field of network fault localization. The specific implementation scheme is as follows: acquiring detection data obtained by probing nodes of a content distribution network, and analyzing and processing the detection data based on multiple dimensions of the content distribution network to obtain multi-dimensional detection data for current fault judgment; performing fault judgment on the detection data of each dimension separately to obtain a fault detection result for each dimension; and combining the fault detection results to determine the dimension in which the node of the content distribution network has failed. The method and the device can locate the dimension in which the content distribution network has failed more accurately.

Description

Network node fault detection method and device, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of data processing, in particular to the technical field of network fault localization, and specifically to a network node fault detection method, a network node fault detection device, electronic equipment and a storage medium.

Background

The content delivery network (Content Delivery Network, CDN) technology is an intelligent virtual network technology built on top of existing networks. A CDN comprises a plurality of CDN nodes. To ensure the service quality of the CDN nodes, probe machines are used to probe the network IP addresses in the CDN nodes in real time, and whether a CDN node is faulty is determined from the resulting detection data.

Disclosure of Invention

The disclosure provides a network node fault detection method, a network node fault detection device, electronic equipment and a storage medium.

According to a first aspect of the present disclosure, there is provided a network node failure detection method applied to a content distribution network, comprising:

acquiring detection data obtained by detecting the content distribution network, and analyzing and processing the detected data based on multiple dimensions of the content distribution network to obtain multi-dimensional detection data for current fault judgment; performing fault judgment on the detection data of each dimension respectively to obtain a fault detection result under each dimension; and combining the fault detection result to determine the dimension of the current fault of the content distribution network.

According to a second aspect of the present disclosure, there is provided a network node failure detection apparatus applied to a content distribution network, comprising:

the processing module is used for acquiring detection data obtained by detecting the content distribution network, and analyzing and processing the detection data based on multiple dimensions of the content distribution network to obtain multi-dimensional detection data for fault judgment; the processing module is also used for carrying out fault judgment on the detection data of each dimension respectively to obtain a fault detection result under each dimension; and the determining module is used for determining the dimension of the fault of the content distribution network by combining the fault detection results.

According to a third aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the first aspect.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

fig. 1 is a flow chart of a network node fault detection method provided in an embodiment of the present disclosure;

fig. 2 is a flowchart of a method for acquiring probe data according to an embodiment of the present disclosure;

fig. 3 is a schematic flow chart of processing probe data in a network node fault detection method according to an embodiment of the disclosure;

FIG. 4 is a schematic flow chart of fault determination provided by an embodiment of the present disclosure;

FIG. 5 is a flow chart of a fault detection threshold determination method for detecting a fault provided by an embodiment of the present disclosure;

FIG. 6 is a flow chart of determining a fault detection result provided by an embodiment of the present disclosure;

FIG. 7 is a schematic flow chart diagram of determining fault dimensions provided by an embodiment of the present disclosure;

FIG. 8 is a flow chart of a fault determination method provided by an embodiment of the present disclosure;

fig. 9 is a schematic structural diagram of a network node fault detection device according to an embodiment of the present disclosure;

fig. 10 is a block diagram of an electronic device for implementing a network node failure detection method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In CDN technology, CDN node failures generally include physical failures of service node machines, performance failures of service node machines, network transmission failures, and other failures. The physical faults of the service node machine can be faults such as power failure of a machine room, disk damage and the like. The performance failure of the service node machine may be a failure of insufficient central processing unit (central processing unit, CPU) capability, insufficient disk input/output capability, etc. The network transmission failure may be a backbone failure, a network congestion, or the like. Other faults may be network attacks, node blocking, etc.

Therefore, in order to ensure the quality of service of the CDN nodes, anomalies of each CDN node must be identified quickly and accurately by effective technical means. In the related art, an anomaly of a CDN node generally falls into one of two cases: a failure of the CDN node itself, or a failure of a network IP link. In the related art, the detection data is used to screen out fault data directly, the CDN nodes corresponding to the screened fault data are determined, and those CDN nodes are removed from the network.

However, directly removing the corresponding CDN nodes from the network according to the screened fault data can easily cause the local traffic carried by those CDN nodes to be scheduled to a remote area. This results in excessive disaster recovery and damages the service quality; in particular, the service quality in the local area is greatly reduced.

Based on the above, the disclosure provides a network node fault detection method, which accurately locates the dimension in which the CDN has failed by analyzing the detection data of each dimension, and thereby avoids excessive disaster recovery processing and the damage it would cause to the service provided to customers.

The following embodiments will describe a network node failure detection method with reference to the accompanying drawings.

Fig. 1 shows a flow chart of a network node fault detection method provided by an embodiment of the present disclosure, and as shown in fig. 1, the method may include:

Step S110: acquiring detection data obtained by detecting the nodes of the content distribution network, and analyzing and processing the detection data based on multiple dimensions of the content distribution network to obtain multi-dimensional detection data for current fault judgment.

In the disclosed embodiments, the plurality of dimensions of the content distribution network include a node dimension, a network IP dimension, and a network link dimension. In the present disclosure, the network IP dimension may also be referred to as the VIP dimension. Each node includes a plurality of network IPs, and a plurality of network links are connected to each network IP. A network link may be the link from a regional operator to a network IP. It is to be understood that in this disclosure, the network link dimension refers to the regional operator to node dimension.

In the present disclosure, nodes of a content distribution network are distributed in each region, and different numbers of probes are set in each region according to the requirements of the corresponding network in that region. The data that a probe collects from a content distribution network node is associated with the operator providing the probe's network; in other words, the probe data is aggregated from the data of regional operators to the network IP. In this disclosure, a network IP may also be referred to as a VIP.

Fig. 2 is a schematic flow chart of a method for acquiring probe data according to an embodiment of the present disclosure. As shown in fig. 2, a certain operator in 3 areas is taken as an example, and the 3 areas are denoted area A, area B, and area C. Area A includes sub-areas a, b, and c, whose probes are denoted a-operator-1, b-operator-1, and c-operator-1, respectively. Area B includes sub-areas d, e, and f, whose probes are denoted d-operator-1, e-operator-1, and f-operator-1, respectively. Area C includes sub-areas g, h, and i, whose probes are denoted g-operator-1, h-operator-1, and i-operator-1, respectively. These probes (a-operator-1, b-operator-1, c-operator-1, d-operator-1, e-operator-1, f-operator-1, g-operator-1, h-operator-1, and i-operator-1) probe the VIPs under a certain node of the operator (for example, x.x.x.35 or x.x.x.36 under the node BJCT), which yields the probe data of that node as detected by the operator in the 3 areas.

Therefore, in the present disclosure, different regional operators may probe the same network IP through a probe machine to obtain probe data of a plurality of regional operators of a certain network IP, and the sum of the probe data of all regional operators of the network IP is used as probe data of the network IP dimension. And taking the sum of the detection data of all network IP under a certain node as the detection data of the dimension of the node. Taking the example of fig. 2 described above, the sum of the detection data of a certain operator detecting x.x.x.35 or x.x.x.36 in 3 areas is determined as the detection data of x.x.x.35 or x.x.x.36. In other words, in the present disclosure, probe data of the network IP dimension may be obtained by aggregating probe data of the network link dimension. And further aggregating the network IP dimension detection data to obtain the node dimension detection data.
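The aggregation described above can be illustrated with a minimal Python sketch. The `Counts` container, the `aggregate` function, and the sample success and failure counts are hypothetical and are introduced only for illustration; the disclosure does not prescribe a data structure. The sketch assumes that each link-level record is keyed by node, VIP, and regional-operator probe, and sums link-level counts into VIP-level counts and then into node-level counts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Counts:
    """Probe outcome counts (hypothetical helper, not named in the disclosure)."""
    success: int = 0
    failure: int = 0

    def add(self, other: "Counts") -> None:
        self.success += other.success
        self.failure += other.failure


def aggregate(link_counts):
    """Sum link-level counts into VIP-level counts, then into node-level counts.

    link_counts maps (node, vip, area_isp) -> Counts.
    """
    vip_counts = defaultdict(Counts)
    node_counts = defaultdict(Counts)
    for (node, vip, _area_isp), counts in link_counts.items():
        vip_counts[(node, vip)].add(counts)
    for (node, _vip), counts in vip_counts.items():
        node_counts[node].add(counts)
    return dict(vip_counts), dict(node_counts)


# Illustrative values only; the counts are made up for this example.
link_counts = {
    ("BJCT", "x.x.x.35", "a-operator-1"): Counts(success=9, failure=1),
    ("BJCT", "x.x.x.35", "d-operator-1"): Counts(success=8, failure=2),
    ("BJCT", "x.x.x.36", "g-operator-1"): Counts(success=10, failure=0),
}
vip_counts, node_counts = aggregate(link_counts)
```

Keeping the link-level records as the single source and deriving the other two dimensions from them mirrors the text: the VIP and node views are sums, not independently collected data.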

In the present disclosure, after the multi-dimensional probe data is obtained, the number of probe data for each dimension is determined. The number of probe data determines whether the probe data can be used for the current fault detection. If it can, the manner of fault detection to be used is further determined.

Step S120: performing fault judgment on the detection data of each dimension separately to obtain a fault detection result under each dimension.

In the embodiment of the disclosure, fault judgment is performed on the detection data of the network IP dimension and the detection data of the network link dimension respectively by using a fault detection mode for each dimension, so as to obtain a fault detection result of the network IP dimension and a fault detection result of the network link dimension. And judging the fault detection results of all network IP dimensions under each node to obtain the fault detection results of the node dimensions, namely, the fault detection results of the node depend on the fault detection results of all corresponding network IP dimensions.

Step S130: determining the dimension of the current failure of the node of the content distribution network according to the fault detection results.

In the embodiment of the disclosure, the fault detection result of the network IP dimension and the fault detection result of the network link dimension are processed jointly, each being checked against the other, to determine the dimension in which the content distribution network is currently failing.

With the network node fault detection method provided by the embodiment of the disclosure, the fault detection results of multiple dimensions are processed jointly, which prevents a fault in one dimension from distorting the judgment of other dimensions through its effect on their detection success rates, so that the fault detection results are more accurate.

Fig. 3 is a schematic flow chart of processing probe data in a network node fault detection method according to an embodiment of the disclosure, where, as shown in fig. 3, the method may include:

Step S210: analyzing and processing the detection data to obtain multi-dimensional detection data.

Step S220: determining the dimensions of the content distribution network in which a fault has already occurred, and removing the detection data of those dimensions from the multi-dimensional detection data to obtain the multi-dimensional detection data for judging the current faults.

In the embodiment of the disclosure, as described above, the probe data in the node dimension, the probe data in the network IP dimension, and the probe data in the network link dimension are obtained by analyzing the probe data.

Among the node, network IP, and network link dimensions, the dimensions in which a fault has already occurred are determined, the detection data of those dimensions are removed, and the remaining multi-dimensional detection data are used as the multi-dimensional detection data for the current fault judgment.

According to the embodiment of the disclosure, because the detection data of the dimensions already known to be faulty are removed, the remaining detection data represent the dimension of the current actual fault more accurately, and interference from the detection data of known faults is avoided.
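The removal of data from dimensions already known to be faulty might be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `exclude_known_faults`, the key layout `(node, vip, area_isp)`, and the three sets of known-faulty items are hypothetical and do not come from the disclosure.

```python
def exclude_known_faults(link_records, faulty_nodes, faulty_vips, faulty_links):
    """Keep only link-level records whose node, VIP, and link are not already faulty.

    link_records maps (node, vip, area_isp) -> (success_count, failure_count).
    faulty_nodes holds node names, faulty_vips holds (node, vip) pairs, and
    faulty_links holds (node, vip, area_isp) triples.
    """
    kept = {}
    for key, counts in link_records.items():
        node, vip, _area_isp = key
        if node in faulty_nodes:
            continue  # the whole node is already marked faulty
        if (node, vip) in faulty_vips:
            continue  # this VIP is already marked faulty
        if key in faulty_links:
            continue  # this regional-operator link is already marked faulty
        kept[key] = counts
    return kept


# Illustrative values only.
records = {
    ("BJCT", "x.x.x.35", "a-operator-1"): (9, 1),
    ("BJCT", "x.x.x.36", "a-operator-1"): (0, 10),
    ("BJCT", "x.x.x.35", "d-operator-1"): (2, 8),
}
remaining = exclude_known_faults(
    records,
    faulty_nodes=set(),
    faulty_vips={("BJCT", "x.x.x.36")},
    faulty_links={("BJCT", "x.x.x.35", "d-operator-1")},
)
```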

The following embodiment will explain fault detection results obtained in each dimension by performing fault judgment on the detection data in each dimension.

Fig. 4 is a schematic flow chart of fault determination provided by an embodiment of the disclosure, and as shown in fig. 4, the method may include:

Step S310: determining, based on the number of probe data for each dimension, a fault detection threshold corresponding to each dimension.

In the embodiment of the disclosure, two manners of fault judgment are provided, selected according to the acquired number of detection data of each dimension. One manner compares the success rate of the detection data with a success rate threshold to determine whether a fault exists in the dimension corresponding to the detection data. The other manner compares the number of failed detections with a failure number threshold to determine whether a fault exists in the dimension corresponding to the detection data.

If the number of the detection data is greater than or equal to the analysis value, the success rate threshold is selected as the fault detection threshold; if the number of the detection data is smaller than the analysis value, the failure number threshold is selected as the fault detection threshold.

In the present disclosure, a fault detection threshold may be determined separately for the detection data of each dimension according to the number of detection data available for fault judgment. That is, for the network IP dimension detection data and for the network link dimension detection data, it is decided separately whether the success rate threshold or the failure number threshold is used to judge whether a fault exists.

Step S320: detecting the detection data of each dimension based on the fault detection threshold to obtain a fault detection result of each dimension.

In the embodiment of the disclosure, network IP dimension detection data is detected based on a fault detection threshold value of the network IP dimension detection data, and a fault detection result of the network IP dimension detection data is obtained. And detecting the network link dimension detection data based on a fault detection threshold value of the network link dimension detection data to obtain a fault detection result of the network link dimension detection data. And obtaining fault detection results of all network IP dimensions under each node, and obtaining fault detection results of the node dimensions corresponding to the fault detection results by judging the fault detection results of all network IP dimensions under each node.

The following embodiments will explain determination of a corresponding failure detection threshold for each dimension of the detection data based on the number of the detection data.

Fig. 5 is a flowchart of a method for determining the fault detection threshold according to an embodiment of the present disclosure. As shown in fig. 5, the keywords in the probe data include nodes, network IPs, and network links. The probe data is analyzed to obtain the number of detection data in each dimension. Whether the amount of detection data is sufficient is then judged; that is, the preset analysis value used for selecting the fault judgment threshold and the number of detection data of each dimension are determined.

The number of detection data of the network IP dimension is compared with the preset analysis value, and the number of detection data of the network link dimension is compared with the preset analysis value.

If the number of detection data is greater than or equal to the preset analysis value, that is, the amount of detection data is sufficient, the success rate threshold is used as the fault detection threshold. If the number of detection data is smaller than the preset analysis value, that is, the amount of detection data is insufficient, the failure number threshold is used as the fault detection threshold.
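The threshold selection of fig. 5 can be sketched in a few lines of Python. The function name `select_threshold`, the string labels, and the numeric values in the usage lines are hypothetical; the disclosure does not give concrete values for the analysis value or for either threshold.

```python
def select_threshold(sample_count, analysis_value,
                     success_rate_threshold, failure_count_threshold):
    """Pick the fault detection threshold for one dimension.

    With enough samples (sample_count >= analysis_value) the success rate
    threshold is used; otherwise the failure count threshold is used.
    Returns a (kind, value) pair.
    """
    if sample_count >= analysis_value:
        return ("success_rate", success_rate_threshold)
    return ("failure_count", failure_count_threshold)


# Illustrative values only: analysis value 50, thresholds 0.9 and 2.
vip_choice = select_threshold(120, 50, 0.9, 2)   # sufficient data
link_choice = select_threshold(10, 50, 0.9, 2)   # insufficient data
```

The selection is applied per dimension, so the network IP dimension and the network link dimension may end up using different kinds of threshold for the same round of detection.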

The following embodiments will explain fault detection results obtained by performing fault detection on detection data of each dimension based on a fault detection threshold.

Fig. 6 is a schematic flow chart of determining a fault detection result according to an embodiment of the disclosure, where the method may include:

Step S410: calculating the ratio of the number of successfully detected data in the detection data to the total number of the detection data to obtain the success rate of the detection data; and counting the number of data in the detection data for which detection failed to obtain the failure number of the detection data.

In the disclosed embodiment, the detection data of each dimension include the number of successful detections and the number of failed detections. When the number of detection data is greater than or equal to the analysis value, the number of detection data is considered relatively large; the success rate, that is, the proportion of successful detections in the total number of detection data, then reflects more accurately whether the content distribution network has failed, so the success rate is compared with the success rate threshold to determine whether a fault exists. When the number of detection data is smaller than the analysis value, the number of detection data is considered relatively small, and the failure number is compared with the failure number threshold to determine whether a fault exists. For example, if there are 10 probe machines, the number of detection data is considered relatively small; if 3 of the detection data are failed detections, the content distribution network corresponding to the detection data is considered to have a fault in at least one dimension.

Step S420: determining that the fault detection result of the dimension is a fault in response to the success rate being smaller than or equal to the success rate threshold or the failure number being greater than the failure number threshold.

In the embodiment of the disclosure, for the network IP dimension detection data, according to the fault detection threshold corresponding to the network IP dimension, the detection data whose success rate is not greater than the success rate threshold are determined, or the detection data whose failure number is greater than the failure number threshold are determined.

Likewise, for the network link dimension detection data, according to the fault detection threshold corresponding to the network link dimension, the detection data whose success rate is not greater than the success rate threshold are determined, or the detection data whose failure number is greater than the failure number threshold are determined.

In the embodiment of the disclosure, according to the comparison result of the detection data of each dimension and the fault detection threshold value, determining the network IP dimension and/or the network link dimension with fault, thereby determining the content distribution network dimension with possible fault.
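Steps S410 and S420 can be combined into a single per-dimension check, sketched below in Python. The function name `is_faulty` and all numeric values are hypothetical. The comparisons follow step S420 (success rate less than or equal to its threshold, failure number greater than its threshold); treating an empty sample as "no fault" is an assumption of this sketch, since the disclosure does not address that case.

```python
def is_faulty(success, failure, analysis_value,
              success_rate_threshold, failure_count_threshold):
    """Judge one dimension from its success and failure counts."""
    total = success + failure
    if total == 0:
        # Assumption: with no detection data, no fault is reported.
        return False
    if total >= analysis_value:
        # Enough data: compare the success rate with its threshold.
        return success / total <= success_rate_threshold
    # Too little data: compare the failure count with its threshold.
    return failure > failure_count_threshold


# Illustrative values only: analysis value 50, thresholds 0.9 and 2.
few_samples = is_faulty(7, 3, 50, 0.9, 2)      # 10 samples, 3 failures
many_healthy = is_faulty(95, 5, 50, 0.9, 2)    # success rate 0.95
many_degraded = is_faulty(85, 15, 50, 0.9, 2)  # success rate 0.85
```

The first usage line matches the example in the text: with only 10 probes, 3 failed detections exceed the assumed failure count threshold of 2 and the dimension is judged faulty.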

The following embodiments will describe determining the dimension of a failure of a content distribution network in connection with the failure detection result.

In the present disclosure, a node includes one or more network IPs. The network links comprise links of a regional operator to the content distribution network nodes or links of a regional operator to the network IP of the content distribution network. The probe data for the node dimension includes probe data for all network link dimensions of the node connection. The probe data for the network IP dimension includes probe data for all network link dimensions of the network IP connection.

Fig. 7 illustrates a schematic flow chart for determining fault dimensions according to an embodiment of the disclosure, where the method may include:

Step S510: determining the fault detection result of the network IP dimension based on the fault detection result of the network link dimension.

Step S520: determining the fault detection result of the node dimension based on the fault detection result of the network IP dimension.

In the embodiment of the disclosure, the dimensions in which a fault has occurred among the node dimension, the network IP dimension, and the network link dimension are determined one by one from the fault detection result of the network IP dimension and the fault detection result of the network link dimension, according to the relationship among the three dimensions. As described above, the network link dimension detection data are aggregated to obtain the network IP dimension detection data, and the network IP dimension detection data are further aggregated to obtain the node dimension detection data. Accordingly, the faulty network links are determined based on the fault detection result of the network link dimension; the fault detection result of each network IP of the node is determined according to the fault detection result of the network IP dimension and the faulty network links; and/or the fault detection result of the node dimension is determined based on the fault detection result of the network IP dimension and the faulty network IPs.

Fig. 8 shows a flow chart of a fault determining method provided by an embodiment of the present disclosure, where, as shown in fig. 8, VIP1, VIP2, VIP3, VIP4 represent different network IPs, area-isp1, area-isp2, area-isp3, area-isp4 represent different network links, that is, represent links from a certain operator to VIP1, VIP2, VIP3, VIP4 in different areas, respectively.

If VIP1 and/or VIP4 fail, at least one of area-isp1, area-isp2, area-isp3, and area-isp4 connected with VIP1 and/or VIP4 fails. If area-isp1 and/or area-isp2 fails, the success rate of the detection data of the connected VIP1, VIP2, VIP3, and VIP4 is reduced.

In the disclosure, if the fault detection result indicates that the network IP dimension has failed but a fault-free network link exists among the network links connected to the network IP, it is determined that the network IP dimension has no failure. Otherwise, if the fault detection result of the network IP dimension meets a preset condition, it is determined that the network IP corresponding to the fault detection result has failed.

In the embodiment of the present disclosure, if the success rate of the probe data of all network links connected to the network IP is less than or equal to the success rate threshold, and either the success rate of the network IP dimension detection data is less than or equal to the success rate threshold or the failure number in the network IP dimension detection data is greater than the failure number threshold, it is determined that the network IP dimension has failed.

In other words, the network IP dimension is determined to have failed only if the following conditions are satisfied simultaneously:

Condition 1: the success rate of the network IP dimension detection data is smaller than or equal to the success rate threshold, or the failure number in the network IP dimension detection data is greater than the failure number threshold;

Condition 2: the network IP dimension detection data comprise the network link dimension detection data of the network links connected to the network IP;

Condition 3: the success rate of the detection data of each of those network links is smaller than or equal to the success rate threshold.
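The linkage between the network link dimension and the network IP dimension can be sketched as follows. This Python fragment is an illustration under assumptions: the function name `vip_is_faulty` is hypothetical, the per-link verdicts are assumed to have been computed beforehand, and returning "no fault" when a VIP has no link-level verdicts is a choice made for this sketch rather than something the disclosure states.

```python
def vip_is_faulty(vip_level_faulty, link_faulty_flags):
    """Confirm a VIP fault only when every connected link is also faulty.

    vip_level_faulty: verdict from the VIP-level data alone.
    link_faulty_flags: one boolean per network link connected to the VIP.
    """
    if not vip_level_faulty:
        return False
    if not link_faulty_flags:
        # Assumption: without link-level verdicts the fault is not confirmed.
        return False
    # A single fault-free link means the VIP itself is reachable, so the
    # problem is attributed to the faulty links rather than to the VIP.
    return all(link_faulty_flags)


# Illustrative values only, using the link names of fig. 8.
links_vip1 = {"area-isp1": True, "area-isp2": True,
              "area-isp3": True, "area-isp4": True}
links_vip2 = {"area-isp1": True, "area-isp2": True,
              "area-isp3": False, "area-isp4": False}
vip1_fault = vip_is_faulty(True, list(links_vip1.values()))
vip2_fault = vip_is_faulty(True, list(links_vip2.values()))
```

In the second case the degraded VIP-level success rate is explained by the two faulty regional links, so the VIP is not removed from service.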

In the disclosure, if the number of failed network IPs in the fault detection result of the network IP dimension is greater than a failure number threshold, it is determined that the node dimension to which the network IP dimension belongs has failed. Alternatively, if the percentage of failed network IPs among all network IPs in the fault detection result of the network IP dimension is greater than a failure percentage threshold, it is determined that the node dimension to which the network IP dimension belongs has failed.
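The node-level judgment can be sketched in Python as below. The function name `node_is_faulty`, the parameter names, and the numeric thresholds are hypothetical. The sketch accepts either criterion (failed-VIP count or failed-VIP percentage) as sufficient, and treating a node with no VIP verdicts as fault-free is an assumption.

```python
def node_is_faulty(vip_faulty_flags, failed_count_threshold,
                   failed_ratio_threshold):
    """Judge a node from the confirmed fault verdicts of its VIPs."""
    total = len(vip_faulty_flags)
    if total == 0:
        # Assumption: a node without VIP verdicts is not reported as faulty.
        return False
    failed = sum(1 for flag in vip_faulty_flags if flag)
    return (failed > failed_count_threshold
            or failed / total > failed_ratio_threshold)


# Illustrative values only.
by_count = node_is_faulty([True, True, True, False], 2, 0.9)   # 3 > 2
by_ratio = node_is_faulty([True, False], 5, 0.4)               # 0.5 > 0.4
healthy = node_is_faulty([True, True, False, False], 2, 0.6)   # neither holds
```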

Excessive disaster recovery can be avoided through linkage processing of different dimensions, so that damage to a client server can be avoided.

Based on the same principle as the method shown in fig. 1, fig. 9 shows a schematic structural diagram of a network node fault detection device provided by an embodiment of the present disclosure, and as shown in fig. 9, the network node fault detection device 100 may include:

the processing module 101 is configured to obtain probe data obtained by probing the content distribution network, and analyze and process the probe data based on multiple dimensions of the content distribution network, so as to obtain multi-dimensional probe data for fault judgment; the processing module 101 is further configured to perform fault judgment on the detection data of each dimension separately to obtain a fault detection result under each dimension; and a determining module 102, configured to determine, in combination with the fault detection results, the dimension in which a node of the content distribution network fails.

In the present disclosure, a processing module 101 is configured to perform analysis processing on the detection data to obtain multi-dimensional detection data; determining the dimension of the content distribution network with faults, and eliminating the fault detection data from the multi-dimensional detection data to obtain the multi-dimensional detection data for judging the current faults.

In the present disclosure, the processing module 101 is configured to determine, based on the number of the probe data of each dimension, a fault detection threshold corresponding to each dimension; and performing fault detection on the detection data of each dimension based on the fault detection threshold value to obtain a fault detection result of each dimension.

In the present disclosure, the determining module 102 is configured to determine, in response to the number of the detection data being greater than or equal to a preset analysis value, a success rate threshold as the fault detection threshold; and/or, in response to the number of the detection data being smaller than the preset analysis value, to determine a failure number threshold as the fault detection threshold.

In the present disclosure, a determining module 102 is configured to calculate a ratio of a number of data detected successfully in the detected data to a number of the detected data, to obtain a success rate of the detected data; calculating the number of data in detection failure in the detection data to obtain the failure number of the multi-dimensional detection data; and determining that the fault detection result of the dimension is fault in response to the success rate being smaller than or equal to the success rate threshold or the failure number being larger than the failure number threshold.

The dimensions of the content distribution network include a node dimension, a network IP dimension, and a network link dimension. The node comprises one or more network IPs. The network link comprises a link from a regional operator to the content distribution network node, or a link from a regional operator to a network IP of the content distribution network node. The detection data of the node dimension comprises the detection data of all network links connected to the node, and the detection data of the network IP dimension comprises the detection data of all network links connected to the network IP.

In the present disclosure, the determining module 102 is configured to determine the fault detection result of the network IP dimension based on the fault detection result of the network link dimension; and to determine the fault detection result of the node dimension based on the fault detection result of the network IP dimension.
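The relationship between the three dimensions can be sketched as a grouping of raw probe records. This is an illustration only: the record layout `Probe` and the function `group_by_dimension` are hypothetical, and a link is assumed to be identified by the pair (network IP, regional operator).

```python
from collections import defaultdict
from typing import NamedTuple


class Probe(NamedTuple):
    # Hypothetical record layout for one probe result.
    node: str             # CDN node identifier
    ip: str               # network IP inside the node
    region_operator: str  # regional operator the probe originated from
    ok: bool              # True if the probe succeeded


def group_by_dimension(probes: list[Probe]):
    """Group probe outcomes by link, network IP, and node.

    The IP-level data is the union of the data of all links reaching that IP,
    and the node-level data is the union of the data of all links reaching the node.
    """
    by_link = defaultdict(list)
    by_ip = defaultdict(list)
    by_node = defaultdict(list)
    for p in probes:
        by_link[(p.ip, p.region_operator)].append(p.ok)
        by_ip[p.ip].append(p.ok)
        by_node[p.node].append(p.ok)
    return dict(by_link), dict(by_ip), dict(by_node)
```

Each of the three returned mappings can then be passed, one key at a time, to whatever per-dimension fault judgment is in use.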

In the present disclosure, the determining module 102 is configured to determine that the network IP dimension is fault-free in response to at least one fault-free network link being present among all network links connected to the network IP.

In the present disclosure, the determining module 102 is configured to determine that the network IP dimension fails in response to the success rate of the detection data of every network link connected to the network IP being smaller than or equal to a success rate threshold, and either the success rate of the detection data of the network IP dimension being smaller than or equal to a success rate threshold or the failure number in the detection data of the network IP dimension being larger than a failure number threshold.
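The two rules for the network IP dimension can be combined in a short sketch. The function name `ip_is_faulty`, its parameters, and the default threshold values are hypothetical; separate link-level and IP-level success rate thresholds are assumed here so that the IP-level check is not redundant.

```python
def success_rate(results: list[bool]) -> float:
    """Fraction of successful probes; callers must pass a non-empty list."""
    return sum(results) / len(results)


def ip_is_faulty(
    link_results: dict[str, list[bool]],
    link_rate_threshold: float = 0.8,  # assumed value
    ip_rate_threshold: float = 0.5,    # assumed value
    ip_failure_threshold: int = 3,     # assumed value
) -> bool:
    """Judge one network IP from the probe outcomes of the links connected to it."""
    links = [r for r in link_results.values() if r]  # ignore links with no data
    if not links:
        return False  # assumption: no data means no fault verdict

    # Rule 1: a single fault-free link is enough to clear the IP.
    if any(success_rate(r) > link_rate_threshold for r in links):
        return False

    # Rule 2: every link is at or below the threshold, so check the IP-level data,
    # which is the union of the data of all connected links.
    merged = [outcome for r in links for outcome in r]
    failures = merged.count(False)
    return success_rate(merged) <= ip_rate_threshold or failures > ip_failure_threshold
```

The first rule keeps a fault on a single regional link from being escalated to the IP, which matches the intent of locating the fault at the narrowest dimension that explains it.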

In the present disclosure, the determining module 102 is configured to determine that the node dimension fails in response to the fault detection result of the network IP dimension indicating that the number of failed network IPs is greater than a fault threshold, or that the percentage of failed network IPs among all network IPs is greater than a fault percentage threshold.
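The node-level rule can be sketched in the same way. The function name `node_is_faulty` and the default threshold values are hypothetical, and the percentage is expressed as a fraction between 0 and 1.

```python
def node_is_faulty(
    ip_faults: dict[str, bool],
    fault_count_threshold: int = 2,             # assumed value
    fault_percentage_threshold: float = 0.5,    # assumed value, as a fraction
) -> bool:
    """Judge a node from the per-IP fault results (True means the IP is faulty)."""
    total = len(ip_faults)
    if total == 0:
        return False  # assumption: a node with no judged IPs gets no fault verdict

    faulty = sum(ip_faults.values())
    return (
        faulty > fault_count_threshold
        or faulty / total > fault_percentage_threshold
    )
```

Using both an absolute count and a percentage lets the same rule cover nodes with many IPs, where a count is more telling, and nodes with few IPs, where a percentage is.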

In the technical solution of the present disclosure, the acquisition, storage, and use of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

Fig. 10 shows a schematic block diagram of an example electronic device 200 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 10, the device 200 includes a computing unit 201 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 202 or a computer program loaded from a storage unit 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data required for the operation of the device 200 can also be stored. The computing unit 201, ROM 202, and RAM 203 are connected to each other through a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.

Various components in device 200 are connected to I/O interface 205, including: an input unit 206 such as a keyboard, a mouse, etc.; an output unit 207 such as various types of displays, speakers, and the like; a storage unit 208 such as a magnetic disk, an optical disk, or the like; and a communication unit 209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 209 allows the device 200 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 201 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 201 performs the various methods and processes described above, such as the network node fault detection method. For example, in some embodiments, the network node fault detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 200 via the ROM 202 and/or the communication unit 209. When the computer program is loaded into the RAM 203 and executed by the computing unit 201, one or more steps of the network node fault detection method described above may be performed. Alternatively, in other embodiments, the computing unit 201 may be configured to perform the network node fault detection method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed herein are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A network node fault detection method, applied to a content distribution network, comprising:

acquiring detection data obtained by detecting nodes of the content distribution network, and analyzing and processing the detection data based on multiple dimensions of the content distribution network to obtain multi-dimensional detection data for current fault judgment;

the content distribution network dimension comprises a node dimension, a network IP dimension and a network link dimension; the node comprises one or more of the network IPs; the network link comprises a link from a regional operator to the content distribution network node or a link from a regional operator to a network IP of the content distribution network node; the detection data of the node dimension comprises detection data of all network link dimensions connected by the node; the network IP dimension detection data comprise detection data of all network link dimensions of the network IP connection;

performing fault judgment on the detection data of each dimension respectively to obtain a fault detection result under each dimension;

determining a fault detection result of the network IP dimension based on the fault detection result of the network link dimension;

and determining the fault detection result of the node dimension based on the fault detection result of the network IP dimension.

2. The method of claim 1, wherein the analyzing the probe data based on the plurality of dimensions of the content distribution network to obtain multi-dimensional probe data for current fault determination comprises:

analyzing and processing the detection data to obtain multi-dimensional detection data;

determining the dimensions of the content distribution network in which a fault exists, and removing the detection data of the faulty dimensions from the multi-dimensional detection data, to obtain the multi-dimensional detection data for current fault judgment.

3. The method of claim 1, wherein the performing fault determination on the detection data of each dimension to obtain a fault detection result in each dimension includes:

determining a fault detection threshold corresponding to each dimension based on the number of the detection data of each dimension;

and performing fault detection on the detection data of each dimension based on the fault detection threshold value to obtain a fault detection result of each dimension.

4. A method according to claim 3, wherein the determining a fault detection threshold corresponding to each dimension based on the number of probe data for each dimension comprises:

determining a success rate threshold as a fault detection threshold for detecting a fault in response to the number of detection data being greater than or equal to a preset analysis value; and/or

determining a failure number threshold as a fault detection threshold for detecting a fault in response to the number of the detection data being smaller than the preset analysis value.

5. The method of claim 4, wherein performing fault detection on the detection data of each dimension based on the fault detection threshold value to obtain a fault detection result of each dimension comprises:

calculating the ratio of the number of successfully detected data in the detection data to the number of the detection data, to obtain the success rate of the detection data; counting the data in the detection data for which detection failed, to obtain the failure number of the detection data;

and determining that the fault detection result of the dimension is fault in response to the success rate being smaller than or equal to the success rate threshold or the failure number being larger than the failure number threshold.

6. The method of claim 1, wherein the determining the failure detection result for the network IP dimension based on the failure detection result for the network link dimension comprises:

and determining that the network IP dimension is fault-free in response to the presence of a fault-free network link in all network links of the network IP connection.

7. The method of claim 1, wherein the determining the failure detection result for the network IP dimension based on the failure detection result for the network link dimension comprises:

determining that the network IP dimension fails in response to the success rate of the detection data of all network links connected to the network IP being smaller than or equal to a success rate threshold, and either the success rate of the detection data of the network IP dimension being smaller than or equal to a success rate threshold or the failure number in the detection data of the network IP dimension being larger than a failure number threshold.

8. The method of claim 1, wherein the determining the failure detection result for the node dimension based on the failure detection result for the network IP dimension comprises:

determining that the node dimension fails in response to the fault detection result of the network IP dimension indicating that the number of failed network IPs is greater than a fault threshold, or that the percentage of failed network IPs among all network IPs is greater than a fault percentage threshold.

9. A network node failure detection apparatus for use in a content distribution network, comprising:

a processing module, used for acquiring detection data obtained by detecting the nodes of the content distribution network, and analyzing and processing the detection data based on multiple dimensions of the content distribution network to obtain multi-dimensional detection data for current fault judgment; the processing module being further used for performing fault judgment on the detection data of each dimension respectively, to obtain a fault detection result under each dimension;

a determining module, used for determining the fault detection result of the network IP dimension based on the fault detection result of the network link dimension, and determining the fault detection result of the node dimension based on the fault detection result of the network IP dimension.

10. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

11. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.