WO2020259551A1 - Method and device for handling network connection failures - Google Patents

Method and device for handling network connection failures

Info

Publication number
WO2020259551A1
Authority
WO
WIPO (PCT)
Prior art keywords
queue
semi
server
overflow
network connection
Prior art date
Application number
PCT/CN2020/097989
Other languages
English (en)
French (fr)
Inventor
赵帅
Original Assignee
北京金山云网络技术有限公司
北京金山云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司, 北京金山云科技有限公司
Publication of WO2020259551A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management

Definitions

  • This application relates to the field of Internet technology, and in particular to a method and device for handling network connection failures.
  • a container includes one or more applications and environment files necessary for the operation of these applications. Deploying applications in containers can reduce application operating differences caused by changes in the host operating system release version and other basic environments.
  • the server can be deployed in a container, so that even if the host where the server is located changes, the server can provide stable services to the client.
  • A connection between the client and the server deployed in the container must be established through TCP (Transmission Control Protocol) before data can be transmitted.
  • However, the connection establishment may fail. A failed connection breaks communication between the client and the server, which in turn significantly affects the availability of the service provided by the server.
  • the purpose of the embodiments of the present application is to provide a method and device for processing network connection failures, so as to implement container-based automated network connection failure processing.
  • the specific technical solutions are as follows:
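  • The embodiment of the application provides a method for processing network connection failures. The method includes:
  • when establishment of the network connection between the client and the server deployed in the container fails, determining whether the semi-connection queue of the server has overflowed;
  • if it is determined that the semi-connection queue of the server has overflowed, increasing the queue length of the semi-connection queue according to a preset adjustment rule until the semi-connection queue of the server no longer overflows;
  • if it is determined that the semi-connection queue of the server has not overflowed, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining the node to be repaired, so as to repair the node to be repaired.
  • Optionally, after the queue length of the semi-connection queue is increased according to the preset adjustment rule until the semi-connection queue of the server no longer overflows, the method further includes:
  • if establishment of the network connection between the client and the server fails when the semi-connection queue of the server no longer overflows, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining the node to be repaired, so as to repair the node to be repaired.
  • Optionally, the determining whether the semi-connection queue of the server has overflowed includes:
  • obtaining the overflow information of the semi-connection queue of the server;
  • if the overflow information is obtained, determining that the semi-connection queue of the server has overflowed;
  • if the overflow information is not obtained, determining that the semi-connection queue of the server has not overflowed.
  • Optionally, the overflow information includes an overflow quantity; the increasing the queue length of the semi-connection queue according to a preset adjustment rule until the semi-connection queue of the server no longer overflows includes:
  • each time the queue length of the semi-connection queue is increased, obtaining the overflow quantity of the semi-connection queue after that increase;
  • comparing the overflow quantity obtained after that increase with the previously obtained overflow quantity, and if the two are the same, determining that the semi-connection queue of the server no longer overflows.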
  • the obtaining the overflow information of the semi-connected queue of the server includes:
  • logging in to the container, entering a network information query instruction in the container to obtain the network information of the container, and querying the network information for character information in a predetermined format corresponding to the overflow information, which is taken as the overflow information.
  • the increasing the queue length of the semi-connection queue according to a preset adjustment rule includes:
  • N is a natural number, and the value of N is greater than 1;
  • the queue parameter is used to define the queue length of the semi-connection queue of the server.
  • the performing failure detection on a node in the network connection path and determining a node to be repaired includes:
  • the detection message includes identification information; the obtaining the number of detection messages received by each node in the network connection path includes:
  • the number of detection messages received by each node in the network connection path is obtained.
  • the embodiment of the present application also provides a network connection failure processing device, the device includes:
  • the determining module is configured to determine whether the semi-connection queue of the server has overflowed when the network connection between the client and the server deployed in the container fails to be established;
  • An adjustment module configured to increase the queue length of the semi-connection queue according to a preset adjustment rule if it is determined that the semi-connection queue of the server side overflows, until the semi-connection queue of the server side no longer overflows;
  • the first detection module is configured to, if it is determined that the semi-connection queue of the server has not overflowed, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine the node to be repaired, so as to repair the node to be repaired.
  • the device further includes:
  • the second detection module is configured to, if establishment of the network connection between the client and the server fails when the semi-connection queue of the server no longer overflows, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine the node to be repaired, so as to repair the node to be repaired.
  • the determining module is specifically set as:
  • the overflow information includes an overflow quantity;
  • the adjustment module is specifically set as:
  • the determining module is specifically set as:
  • log in to the container, enter a network information query instruction in the container to obtain the network information of the container, and query the network information for character information in a predetermined format corresponding to the overflow information, which is taken as the overflow information.
  • the adjustment module is specifically set as follows:
  • N is a natural number, and the value of N is greater than 1;
  • the queue parameter is used to define the queue length of the semi-connection queue of the server.
  • the first detection module is specifically set to:
  • the detection message includes identification information; the first detection module is specifically set to:
  • the number of detection messages received by each node in the network connection path is obtained.
  • An embodiment of the present application also provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
  • the memory is configured to store a computer program;
  • the processor is configured to implement any one of the above-mentioned network connection failure processing methods when executing the program stored in the memory.
  • the embodiment of the present application also provides a computer-readable storage medium, which stores a computer program in the computer-readable storage medium, and when the computer program is executed by a processor, implements any one of the aforementioned network connection failure processing methods.
  • the embodiment of the present application also provides an executable program code, the executable program code is set to be executed to execute any one of the aforementioned network connection failure processing methods.
  • the embodiment of the present application also provides a computer program product containing instructions, which when running on a computer, causes the computer to execute any of the aforementioned network connection failure processing methods.
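  • In the network connection failure processing method and device provided by the embodiments of the application, when establishment of the network connection between the client and the server deployed in the container fails, it is determined whether the semi-connection queue of the server has overflowed. If it has, the queue length of the semi-connection queue is increased according to a preset adjustment rule until the semi-connection queue no longer overflows. If it has not, the network connection path between the client and the server is determined, fault detection is performed on the nodes in the network connection path, and the node to be repaired is determined so that it can be repaired.
  • In this way, the network fault can be diagnosed by judging whether the semi-connection queue of the server has overflowed. If the semi-connection queue has overflowed, it is determined that the network connection failure is caused by the semi-connection queue being full, and the failure is repaired by increasing the queue length of the semi-connection queue. If the semi-connection queue has not overflowed, it is determined that the network connection failure is caused by a node failure in the network connection path, and the failed node is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network fault handling.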
  • Figure 1 is a schematic flow diagram of a method for TCP to establish a connection
  • FIG. 2 is a schematic flowchart of a method for processing network connection failures according to an embodiment of the application
  • FIG. 3 is a schematic structural diagram of a network connection failure processing device provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • the server can be deployed in a container, so that even if the host where the server is located changes, the server can provide stable services to the client.
  • the client and the server deployed in the container need to establish a connection through TCP (Transmission Control Protocol) before data transmission can be performed.
  • A connection is established via TCP as follows: first, the client sends a SYN (Synchronize Sequence Numbers) message to the server; after receiving the SYN message, the server returns a SYN+ACK (Acknowledgement) message; the client then sends an ACK message to the server, thereby establishing a connection between the client and the server.
  • After receiving the SYN message, the server generates an entry corresponding to the SYN message and stores the entry in the semi-connection queue. After receiving the ACK message corresponding to that SYN message, the server moves the entry from the semi-connection queue to the fully connected queue.
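  • The two queues described above can be illustrated with a toy model. This is only a sketch for intuition, not a real TCP stack: the class name `ListenSocketModel`, its methods, and the `overflow_count` counter are hypothetical names introduced here, and real kernels apply additional logic (timers, SYN cookies, retransmissions) that is omitted.

```python
from collections import deque


class ListenSocketModel:
    """Toy model of a listening socket's two queues (hypothetical, simplified)."""

    def __init__(self, syn_queue_len: int):
        self.syn_queue_len = syn_queue_len
        self.syn_queue = deque()     # semi-connection queue: SYN seen, ACK pending
        self.accept_queue = deque()  # fully connected queue: handshake complete
        self.overflow_count = 0      # cumulative number of SYNs dropped

    def on_syn(self, conn_id: str) -> bool:
        """Handle a SYN; return True if a SYN+ACK would be sent back."""
        if len(self.syn_queue) >= self.syn_queue_len:
            self.overflow_count += 1  # queue is full, so the SYN is dropped
            return False
        self.syn_queue.append(conn_id)
        return True

    def on_ack(self, conn_id: str) -> bool:
        """Handle the final ACK; move the entry to the fully connected queue."""
        if conn_id not in self.syn_queue:
            return False
        self.syn_queue.remove(conn_id)
        self.accept_queue.append(conn_id)
        return True
```

  • In this model, a third SYN arriving at a queue of length 2 is dropped and only raises the cumulative counter, which mirrors the failure mode the application attributes to a full semi-connection queue.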
  • connection establishment may fail.
  • For example, the server may be unable to reply with the SYN+ACK message in time because its semi-connection queue is full, which causes the connection establishment to fail; the connection may also fail because of packet loss at a node in the network connection path, and so on.
  • At present, network connection failures can usually only be handled manually by operation and maintenance personnel, which is labor-intensive and inefficient. An automated, container-based method for handling network connection failures is lacking.
  • this application provides a network connection fault handling method, which can be applied to any electronic device, such as the host of the container where the server is located, other computers in the network, mobile terminals, etc.
  • the application embodiment does not limit this.
  • the following generally describes the network connection failure processing method provided by the embodiments of the present application, and the foregoing network connection failure processing method includes:
  • when establishment of the network connection between the client and the server deployed in the container fails, determining whether the semi-connection queue of the server has overflowed; if it has overflowed, increasing the queue length of the semi-connection queue according to a preset adjustment rule until the semi-connection queue no longer overflows; if it has not overflowed, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining the nodes to be repaired, so as to repair the nodes to be repaired.
  • The network connection failure processing method provided by the embodiments of the application diagnoses network failures by judging whether the semi-connection queue of the server has overflowed. If the semi-connection queue has overflowed, it is determined that the network connection failure is caused by the semi-connection queue being full, and the failure is repaired by increasing the queue length of the semi-connection queue. If the semi-connection queue has not overflowed, it is determined that the network connection failure is caused by a node failure in the network connection path, and the failed node is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network fault handling.
  • As shown in FIG. 2, the method for handling network connection failures includes the following steps:
  • S201 In the case that the establishment of the network connection between the client and the server deployed in the container fails, determine whether the semi-connection queue of the server overflows. If it overflows, execute S202, if it does not overflow, execute S203.
  • the server is deployed in a container, where the container refers to a complete operating environment.
  • The container can include the server, class libraries, other binary files, configuration files, and other files that provide the operating environment for the server.
  • Whether the semi-connection queue of the server has overflowed can be determined by obtaining the overflow information of the semi-connection queue. If the overflow information is obtained, it is determined that the semi-connection queue of the server has overflowed; if the overflow information is not obtained, it is determined that the semi-connection queue of the server has not overflowed.
  • the overflow information may be an identification information, that is, when the semi-connection queue of the server overflows, an identification information may be generated to indicate that the semi-connection queue of the server overflows.
  • the overflow information can also include the overflow quantity, that is, when the server's semi-connected queue overflows, the number of entries that overflow the semi-connected queue is output.
  • The overflow quantity can be a cumulative value. If the semi-connection queue overflows, the overflow quantity increases. If the semi-connection queue does not overflow, either no overflow quantity is output, or the output overflow quantity is unchanged from the previous output.
  • To obtain the overflow information of the semi-connection queue of the server, one can log in to the container where the server is located and enter a network information query command in the container to obtain the network information of the container. The network information is then searched for character information in the predetermined format corresponding to the overflow information. If such character information is found, it is parsed to obtain the overflow information of the semi-connection queue of the server; if no character information conforming to the predetermined format is found, it is determined that the overflow information has not been obtained.
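  • A minimal sketch of the "query for character information in a predetermined format" step follows. The source does not name the query command or the format; the pattern below assumes the line that Linux `netstat -s` commonly prints for dropped SYNs, and the function name `parse_overflow_count` and the sample text are hypothetical.

```python
import re
from typing import Optional

# Assumption: the predetermined format is the "SYNs to LISTEN sockets dropped"
# line found in Linux `netstat -s` output. The source does not specify this.
OVERFLOW_PATTERN = re.compile(
    r"^\s*(\d+)\s+SYNs to LISTEN sockets dropped", re.MULTILINE
)


def parse_overflow_count(network_info: str) -> Optional[int]:
    """Return the cumulative overflow quantity, or None if no line matches.

    None corresponds to "overflow information not obtained" in the text.
    """
    match = OVERFLOW_PATTERN.search(network_info)
    return int(match.group(1)) if match else None


# Made-up sample text, for illustration only.
SAMPLE_INFO = """TcpExt:
    120 times the listen queue of a socket overflowed
    345 SYNs to LISTEN sockets dropped
"""
```

  • Returning `None` rather than `0` keeps the two cases in the text distinct: no matching line means the overflow information was not obtained, while a number means it was.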
  • the queue length of the semi-connection queue can be increased until the semi-connection queue on the server side no longer overflows, so that the semi-connection queue on the server side can store new entries.
  • The queue length of the semi-connection queue can be increased as follows: first, obtain the queue parameter of the server, where the queue parameter is used to define the queue length of the semi-connection queue of the server; then increase the value of the queue parameter according to the parameter adjustment rule.
  • Increasing the value of the queue parameter can mean increasing it to N times its current value, where N is a natural number greater than 1. For example, the value of the queue parameter is increased to twice the current value.
  • Alternatively, increasing the value of the queue parameter according to the parameter adjustment rule can mean increasing it by a preset amount. For example, the value of the queue parameter is increased by 1000 from the current value, and so on.
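  • The two adjustment rules described above (multiplying by N, or adding a preset amount) can be sketched as two small functions. The function names and default values are illustrative assumptions; only the factor of 2 and the step of 1000 come from the examples in the text.

```python
def increase_by_factor(value: int, n: int = 2) -> int:
    """Multiplicative rule: new value is N times the current value, N > 1."""
    if n <= 1:
        raise ValueError("N must be a natural number greater than 1")
    return value * n


def increase_by_step(value: int, step: int = 1000) -> int:
    """Additive rule: new value is the current value plus a preset amount."""
    if step <= 0:
        raise ValueError("step must be positive")
    return value + step
```

  • The multiplicative rule reaches a large queue length in few rounds, while the additive rule grows more conservatively; the source leaves the choice to the preset adjustment rule.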
  • The queue parameters include the net.core.somaxconn parameter and the backlog parameter, where the net.core.somaxconn parameter defines the maximum length of the listening queue allowed by the server, and the backlog parameter defines the maximum length of the fully connected queue allowed by the server.
  • the maximum queue length of the semi-connected queue is the minimum of the net.core.somaxconn parameter and the backlog parameter.
  • the queue parameter may also include the tcp_max_syn_backlog parameter.
  • the maximum queue length of the semi-connected queue is the minimum value among the net.core.somaxconn parameter, the backlog parameter, and the tcp_max_syn_backlog parameter.
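  • The minimum relation stated above can be written as a small helper. This sketch only encodes the relation as the text describes it; actual kernel behavior may differ between versions, and the function name `effective_syn_queue_len` is hypothetical.

```python
from typing import Optional


def effective_syn_queue_len(
    somaxconn: int,
    backlog: int,
    tcp_max_syn_backlog: Optional[int] = None,
) -> int:
    """Maximum semi-connection queue length as the minimum of the parameters.

    tcp_max_syn_backlog is optional, matching the text, which says the
    queue parameter "may also include" it.
    """
    candidates = [somaxconn, backlog]
    if tcp_max_syn_backlog is not None:
        candidates.append(tcp_max_syn_backlog)
    return min(candidates)
```

  • One consequence of the minimum relation is that raising a single parameter has no effect while another parameter remains the smallest, so an adjustment step may need to raise all of them together.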
  • the client can try to establish a network connection with the server. Specifically, an instruction to establish a connection can be sent to the client, and after receiving the instruction, the client re-establishes a network connection with the server deployed in the container. Alternatively, the client can continuously try to establish a connection with the server deployed in the container at a preset time interval.
  • When the overflow information contains the overflow quantity, the overflow quantity of the semi-connection queue can be obtained after each increase of the queue length and compared with the previously obtained overflow quantity. If the two are the same, it is determined that the semi-connection queue no longer overflows.
  • If the two differ, the queue length of the semi-connection queue can be increased again according to the preset adjustment rule until the semi-connection queue no longer overflows.
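  • The adjust-and-compare loop can be sketched as follows. All four callables are hypothetical hooks supplied by the caller (for example, reading the counter from inside the container and writing the queue parameter); the `max_rounds` cap is an assumption added here so the loop always terminates, and the source does not mention one. A real implementation would also wait between adjusting and re-reading the counter so that new connection attempts can arrive.

```python
from typing import Callable


def grow_until_stable(
    read_overflow_count: Callable[[], int],
    get_queue_len: Callable[[], int],
    set_queue_len: Callable[[int], None],
    adjust: Callable[[int], int],
    max_rounds: int = 10,
) -> bool:
    """Increase the queue length until the cumulative overflow count stops changing.

    Returns True once two consecutive readings are equal (no new overflow),
    or False if max_rounds adjustments were not enough.
    """
    previous = read_overflow_count()
    for _ in range(max_rounds):
        set_queue_len(adjust(get_queue_len()))
        current = read_overflow_count()
        if current == previous:
            return True  # counter unchanged: the queue no longer overflows
        previous = current
    return False
```

  • Comparing consecutive readings, rather than checking for zero, matches the text's note that the overflow quantity can be a cumulative value.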
  • If the network connection between the client and the server is successfully established once the semi-connection queue of the server no longer overflows, the previous network connection failure was caused by the semi-connection queue of the server being full, and increasing the queue length of the semi-connection queue has resolved it.
  • If the network connection between the client and the server still fails to be established when the semi-connection queue of the server no longer overflows, the network connection failure is caused by packet loss at a node in the network connection path.
  • S203 Determine the network connection path between the client and the server, perform fault detection on nodes in the network connection path, determine the node to be repaired, and repair the node to be repaired.
  • the network connection path between the client and the server can be determined, each node in the network connection path can be detected for failure, and the node to be repaired can be determined.
  • The network connection path between the client and the server can be determined by querying the routing table, or by querying information from the operator. The specific manner is not limited.
  • Fault detection of the nodes in the network connection path and determination of the nodes to be repaired can be performed as follows: first, send a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection messages to the server; then obtain the number of detection messages received by each node in the network connection path, and determine whether the number received by each node is the same as the preset number. If it is not the same, the node is determined as a node to be repaired.
  • The fault detection instruction may be the command ping -c 1000 -Q 0x2 <server_ip>. The preset number of detection messages to send may be carried in the fault detection instruction, or may be randomly generated by the client.
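  • The comparison step can be sketched as below. How the per-node counts are collected is not specified in the text, so the `received` mapping is assumed to be given; the function name and the node names in the usage note are hypothetical.

```python
from typing import Dict, List


def nodes_to_repair(
    path: List[str], received: Dict[str, int], expected: int
) -> List[str]:
    """Return the nodes whose received detection-message count differs from expected.

    `path` lists the nodes in order from client to server, and the result
    keeps that order. A node with no recorded count is treated as having
    received 0 messages.
    """
    return [node for node in path if received.get(node, 0) != expected]
```

  • For example, with a path of `["gateway", "lb", "host"]`, an expected count of 1000, and counts of 1000, 980 and 980, the function returns `["lb", "host"]`: packets are first lost at `lb`, and `host` is flagged because it also received fewer than 1000.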
  • each node to be repaired can be repaired to deal with network connection failures as soon as possible.
  • It is also possible to repair the nodes to be repaired one at a time, in the order in which the network connection path passes through them from the client to the server, and to try to establish a network connection between the client and the server after each repair. It can be understood that the node at which packets are first lost is more likely to be at fault than the nodes at which packets are lost subsequently, so this approach can reduce the resource consumption of handling network connection failures.
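  • The sequential repair strategy can be sketched as a loop that stops as soon as the connection succeeds. The `repair` and `try_connect` callables are hypothetical hooks; the source does not define how a node is repaired or how the connection attempt is triggered.

```python
from typing import Callable, List


def repair_in_path_order(
    to_repair: List[str],
    repair: Callable[[str], None],
    try_connect: Callable[[], bool],
) -> List[str]:
    """Repair nodes one at a time in path order, retrying the connection after each.

    `to_repair` is assumed to be ordered from client to server. Returns the
    nodes that were actually repaired before the connection succeeded.
    """
    repaired = []
    for node in to_repair:
        repair(node)
        repaired.append(node)
        if try_connect():
            break  # connection restored: later nodes need no repair
    return repaired
```

  • Stopping at the first successful connection is what saves resources here: a downstream node may only appear faulty because an upstream node dropped the packets before they arrived.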
  • the detection message may also include identification information, and the identification information is used to identify the type of the message as a detection message.
  • The identification information can be located in the DSCP (Differentiated Services Code Point) field of the detection message.
  • The DSCP field can be set to a specific value, which indicates that the message type is a detection message.
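  • As a sketch of how a node might recognize marked detection messages: the DSCP occupies the upper six bits of the 8-bit field formerly called the TOS byte in the IPv4 header, so converting between the two is a 2-bit shift. The marker value `MARKER_DSCP = 10` and the function names below are hypothetical; the source does not state which DSCP value is used.

```python
MARKER_DSCP = 10  # hypothetical marker value, not taken from the source


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the 8-bit field."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    """Extract the DSCP from the 8-bit field, discarding the low two bits."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in 8 bits")
    return tos >> 2


def is_detection_message(tos: int, marker_dscp: int = MARKER_DSCP) -> bool:
    """Return True if the packet carries the marker DSCP value."""
    return tos_to_dscp(tos) == marker_dscp
```

  • A node that counts only packets for which `is_detection_message` is true can separate detection messages from ordinary traffic, which is the purpose the text gives for the identification information.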
  • The network connection failure processing method provided by the embodiments of the application diagnoses network failures by judging whether the semi-connection queue of the server has overflowed. If the semi-connection queue has overflowed, it is determined that the network connection failure is caused by the semi-connection queue being full, and the failure is repaired by increasing the queue length of the semi-connection queue. If the semi-connection queue has not overflowed, it is determined that the network connection failure is caused by a node failure in the network connection path, and the failed node is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network fault handling.
  • an embodiment of the present application also provides a network connection failure processing device.
  • FIG. 3 is a schematic structural diagram of the foregoing network connection failure processing device.
  • the device includes:
  • the determining module 310 is configured to determine whether the semi-connection queue of the server has overflowed when the network connection between the client and the server deployed in the container fails to be established;
  • the adjustment module 320 is configured to increase the queue length of the semi-connection queue according to a preset adjustment rule if it is determined that the semi-connection queue of the server side overflows, until the semi-connection queue of the server side no longer overflows;
  • the first detection module 330 is configured to, if it is determined that the semi-connection queue of the server has not overflowed, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine the node to be repaired, so as to repair the node to be repaired.
  • the device further includes:
  • the second detection module (not shown in the figure) is configured to, if establishment of the network connection between the client and the server fails when the semi-connection queue of the server no longer overflows, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine the node to be repaired, so as to repair the node to be repaired.
  • the determining module 310 is specifically set as follows:
  • the overflow information includes the overflow quantity; the adjustment module 320 is specifically set as follows:
  • the determining module 310 is specifically set as follows:
  • log in to the container, enter a network information query command in the container to obtain the network information of the container, and query the network information for character information in a predetermined format corresponding to the overflow information, which is taken as the overflow information.
  • the adjustment module 320 is specifically set as follows:
  • N is a natural number, and the value of N is greater than 1;
  • the queue parameter is used to define the queue length of the semi-connected queue of the server.
  • the first detection module 330 is specifically set as:
  • the detection message includes identification information; the first detection module 330 is specifically set to:
  • the number of detection messages received by each node in the network connection path is obtained.
  • The network connection failure processing device provided by the embodiments of the application diagnoses network failures by judging whether the semi-connection queue of the server has overflowed. If the semi-connection queue has overflowed, it is determined that the network connection failure is caused by the semi-connection queue being full, and the failure is repaired by increasing the queue length of the semi-connection queue. If the semi-connection queue has not overflowed, it is determined that the network connection failure is caused by a node failure in the network connection path, and the failed node is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network fault handling.
  • An embodiment of the present application also provides an electronic device, as shown in FIG. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404.
  • the processor 401, the communication interface 402, and the memory 403 communicate with each other through the communication bus 404;
  • the memory 403 is configured to store a computer program;
  • the network connection path between the client and the server is determined, the nodes in the network connection path are detected for failure, and the node to be repaired is determined to repair the node to be repaired.
  • The electronic device provided by the embodiment of the present application diagnoses network faults by judging whether the semi-connection queue of the server has overflowed. If the semi-connection queue has overflowed, it is determined that the network connection failure is caused by the semi-connection queue being full, and the failure is repaired by increasing the queue length of the semi-connection queue. If the semi-connection queue has not overflowed, it is determined that the network connection failure is caused by a node failure in the network connection path, and the failed node is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network fault handling.
  • the communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
  • the communication bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the aforementioned electronic device and other devices.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
  • the memory may also be at least one storage device located far away from the foregoing processor.
  • The above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • A computer-readable storage medium is also provided, which stores a computer program; when the computer program is executed by a processor, the steps of any of the network connection failure processing methods described above are implemented.
  • an executable program code is also provided, and the executable program code is configured to be executed to execute any one of the aforementioned network connection failure handling methods.
  • a computer program product containing instructions is also provided, which when running on a computer, causes the computer to execute any network connection failure processing method in the foregoing embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, wireless, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

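Embodiments of this application provide a network connection failure handling method and apparatus. When establishment of a network connection between a client and a server deployed in a container fails, it is determined whether the half-connection queue of the server has overflowed. If the half-connection queue has overflowed, its queue length is increased according to a preset adjustment rule until the queue no longer overflows. If the half-connection queue has not overflowed, the network connection path between the client and the server is determined, fault detection is performed on the nodes in that path, and a node to be repaired is determined so that it can be repaired. In this way, checking whether the server's half-connection queue has overflowed makes it possible to diagnose the network failure, which reduces manual effort and improves the efficiency of network failure handling.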

Description

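A network connection failure handling method and apparatus
This application claims priority to Chinese patent application No. 201910578595.1, entitled "A network connection failure handling method and apparatus", filed with the China Patent Office on June 28, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of Internet technologies, and in particular to a network connection failure handling method and apparatus.
Background
A container includes one or more applications together with the environment files needed to run them. Deploying an application in a container reduces differences in application behavior caused by changes in the host operating system distribution and other underlying environments. In some scenarios, a server can be deployed in a container so that it can provide a stable service to clients even if the host on which it runs changes.
Usually, a client and a server deployed in a container must establish a connection through TCP (Transmission Control Protocol) before they can transfer data. Connection establishment over TCP can fail, however, and a failed connection means that communication between the client and the server fails, which significantly affects the availability of the service the server provides. At present, network connection failures are usually handled only through manual troubleshooting by operations staff, which is labor-intensive and inefficient.
Therefore, an automated network connection failure handling method for services on container platforms is urgently needed.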
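Summary
The purpose of the embodiments of this application is to provide a network connection failure handling method and apparatus, so as to achieve automated, container-based handling of network connection failures. The specific technical solutions are as follows:
An embodiment of this application provides a network connection failure handling method, the method including:
when establishment of a network connection between a client and a server deployed in a container fails, determining whether the half-connection queue of the server has overflowed;
if it is determined that the half-connection queue of the server has overflowed, increasing the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
if it is determined that the half-connection queue of the server has not overflowed, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining a node to be repaired, so that the node to be repaired is repaired.
Optionally, after the increasing the queue length of the half-connection queue according to the preset adjustment rule until the half-connection queue of the server no longer overflows, the method further includes:
if establishment of the network connection between the client and the server fails while the half-connection queue of the server no longer overflows, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining a node to be repaired, so that the node to be repaired is repaired.
Optionally, the determining whether the half-connection queue of the server has overflowed includes:
obtaining overflow information of the half-connection queue of the server;
if the overflow information is obtained, determining that the half-connection queue of the server has overflowed;
if the overflow information is not obtained, determining that the half-connection queue of the server has not overflowed.
Optionally, the overflow information contains an overflow count; the increasing the queue length of the half-connection queue according to the preset adjustment rule until the half-connection queue of the server no longer overflows includes:
each time the queue length of the half-connection queue is increased, obtaining the overflow count of the half-connection queue after that increase;
comparing the overflow count of the half-connection queue after that increase with the previously obtained overflow count, and if the two are the same, determining that the half-connection queue of the server no longer overflows.
Optionally, the obtaining overflow information of the half-connection queue of the server includes:
logging in to the container, entering a network information query instruction in the container to obtain network information of the container, and searching the network information for character information in a predetermined format corresponding to the overflow information, as the overflow information.
Optionally, the increasing the queue length of the half-connection queue according to the preset adjustment rule includes:
multiplying the value of a queue parameter of the server by N, where N is a natural number greater than 1; or
increasing the value of the queue parameter of the server by a preset value;
where the queue parameter is used to define the queue length of the half-connection queue of the server.
Optionally, the performing fault detection on the nodes in the network connection path and determining a node to be repaired includes:
sending a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection packets to the server;
obtaining the number of detection packets received by each node in the network connection path;
determining whether the number of detection packets received by each node is the same as the preset number, and if not, determining that node as a node to be repaired.
Optionally, the detection packets include identification information; the obtaining the number of detection packets received by each node in the network connection path includes:
obtaining, according to the identification information, the number of detection packets received by each node in the network connection path.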
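An embodiment of this application further provides a network connection failure handling apparatus, the apparatus including:
a determining module, configured to determine, when establishment of a network connection between a client and a server deployed in a container fails, whether the half-connection queue of the server has overflowed;
an adjusting module, configured to, if it is determined that the half-connection queue of the server has overflowed, increase the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
a first detection module, configured to, if it is determined that the half-connection queue of the server has not overflowed, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
Optionally, the apparatus further includes:
a second detection module, configured to, if establishment of the network connection between the client and the server fails while the half-connection queue of the server no longer overflows, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
Optionally, the determining module is specifically configured to:
obtain overflow information of the half-connection queue of the server;
if the overflow information is obtained, determine that the half-connection queue of the server has overflowed;
if the overflow information is not obtained, determine that the half-connection queue of the server has not overflowed.
Optionally, the overflow information contains an overflow count; the adjusting module is specifically configured to:
each time the queue length of the half-connection queue is increased, obtain the overflow count of the half-connection queue after that increase;
compare the overflow count of the half-connection queue after that increase with the previously obtained overflow count, and if the two are the same, determine that the half-connection queue of the server no longer overflows.
Optionally, the determining module is specifically configured to:
log in to the container, enter a network information query instruction in the container to obtain network information of the container, and search the network information for character information in a predetermined format corresponding to the overflow information, as the overflow information.
Optionally, the adjusting module is specifically configured to:
multiply the value of a queue parameter of the server by N, where N is a natural number greater than 1; or
increase the value of the queue parameter of the server by a preset value;
where the queue parameter is used to define the queue length of the half-connection queue of the server.
Optionally, the first detection module is specifically configured to:
send a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection packets to the server;
obtain the number of detection packets received by each node in the network connection path;
determine whether the number of detection packets received by each node is the same as the preset number, and if not, determine that node as a node to be repaired.
Optionally, the detection packets include identification information; the first detection module is specifically configured to:
obtain, according to the identification information, the number of detection packets received by each node in the network connection path.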
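An embodiment of this application further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement any of the network connection failure handling methods described above when executing the program stored in the memory.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the network connection failure handling methods described above.
An embodiment of this application further provides executable program code, configured to be run to perform any of the network connection failure handling methods described above.
An embodiment of this application further provides a computer program product containing instructions which, when run on a computer, causes the computer to perform any of the network connection failure handling methods described above.
Beneficial effects of the embodiments of this application:
With the network connection failure handling method and apparatus provided by the embodiments of this application, when establishment of a network connection between a client and a server deployed in a container fails, it is determined whether the half-connection queue of the server has overflowed. If it has overflowed, the queue length of the half-connection queue is increased according to a preset adjustment rule until the queue no longer overflows. If it has not overflowed, the network connection path between the client and the server is determined, fault detection is performed on the nodes in the path, and a node to be repaired is determined so that it can be repaired. In this way, a network failure can be diagnosed by determining whether the server's half-connection queue has overflowed. If the queue has overflowed, the failure is attributed to a full half-connection queue and is repaired by increasing the queue length. If the queue has not overflowed, the failure is attributed to a faulty node in the network connection path, and the faulty node to be repaired is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network failure handling.
Of course, a product or method implementing this application does not necessarily need to achieve all of the above advantages at the same time.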
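Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application and of the related art more clearly, the drawings used in the embodiments and the related art are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for establishing a TCP connection;
FIG. 2 is a schematic flowchart of a network connection failure handling method provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a network connection failure handling apparatus provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.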
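In some scenarios, a server can be deployed in a container so that it can provide a stable service to clients even if the host on which it runs changes. Usually, a client and a server deployed in a container must first establish a connection through TCP (Transmission Control Protocol) before they can transfer data.
As shown in FIG. 1, a TCP connection is established as follows: first, the client sends a SYN (Synchronize Sequence Numbers) packet to the server; after receiving the SYN packet, the server returns a SYN+ACK (Acknowledgement) packet; the client then sends an ACK packet to the server, which establishes the connection between the client and the server.
After receiving the SYN packet, the server creates an entry corresponding to that SYN packet and stores it in the half-connection queue; after receiving the ACK packet corresponding to that SYN packet, the server moves the entry from the half-connection queue to the full-connection queue.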
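As an illustration only, the two queues described above can be modeled with a short Python sketch. The class ListenSocket and its method names are hypothetical and exist only for this example; this is a toy model of the bookkeeping, not kernel code, and it assumes that a SYN arriving at a full half-connection queue is simply dropped and counted.

```python
from collections import deque


class ListenSocket:
    """Toy model of a listening socket's two queues (hypothetical, not kernel code)."""

    def __init__(self, syn_queue_len: int):
        self.syn_queue_len = syn_queue_len
        self.syn_queue = deque()     # half-connection queue: entries awaiting the final ACK
        self.accept_queue = deque()  # full-connection queue: established connections
        self.overflows = 0           # cumulative number of SYNs dropped because the queue was full

    def on_syn(self, conn_id: str) -> bool:
        """Handle a SYN; return True if a SYN+ACK would be sent back."""
        if len(self.syn_queue) >= self.syn_queue_len:
            self.overflows += 1      # queue full: the SYN is dropped and counted
            return False
        self.syn_queue.append(conn_id)
        return True

    def on_ack(self, conn_id: str) -> bool:
        """Handle the final ACK; move the entry to the full-connection queue."""
        if conn_id not in self.syn_queue:
            return False
        self.syn_queue.remove(conn_id)
        self.accept_queue.append(conn_id)
        return True


sock = ListenSocket(syn_queue_len=2)
results = [sock.on_syn(c) for c in ("a", "b", "c")]  # the third SYN finds the queue full
sock.on_ack("a")                                     # "a" completes the handshake
```

In this model the third client never receives a SYN+ACK, which corresponds to the connection failure caused by a full half-connection queue discussed in this application.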
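However, connection establishment over TCP can fail. For example, the server's half-connection queue may be full, so that the server cannot reply with a SYN+ACK packet in time and the connection fails; or a node in the network connection path may drop packets, causing the connection to fail; and so on. At present, network connection failures are usually handled only through manual troubleshooting by operations staff, which is labor-intensive and inefficient, and an automated, container-based network connection failure handling method is lacking.
To solve the above technical problem, this application provides a network connection failure handling method. The method can be applied to any electronic device, such as the host of the container in which the server runs, another computer in the network, or a mobile terminal; the embodiments of this application do not limit this.
The network connection failure handling method provided by the embodiments of this application is first described in general terms. The method includes:
when establishment of a network connection between a client and a server deployed in a container fails, determining whether the half-connection queue of the server has overflowed;
if it is determined that the half-connection queue of the server has overflowed, increasing the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
if it is determined that the half-connection queue of the server has not overflowed, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining a node to be repaired, so that the node to be repaired is repaired.
As can be seen from the above, the network connection failure handling method provided by the embodiments of this application diagnoses a network failure by determining whether the server's half-connection queue has overflowed. If the queue has overflowed, the failure is attributed to a full half-connection queue and is repaired by increasing the queue length. If the queue has not overflowed, the failure is attributed to a faulty node in the network connection path, and the faulty node to be repaired is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network failure handling.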
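The overall decision flow can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the function handle_connection_failure and its four callback parameters are hypothetical names, and the fall-through to path detection when the connection still fails after the queue has been enlarged follows the optional step described in the summary.

```python
from typing import Callable, List


def handle_connection_failure(
    queue_overflowed: Callable[[], bool],
    grow_queue_until_stable: Callable[[], None],
    can_connect: Callable[[], bool],
    find_faulty_nodes: Callable[[], List[str]],
) -> List[str]:
    """Return the nodes to repair (empty if enlarging the queue was enough)."""
    if queue_overflowed():            # is the half-connection queue overflowing?
        grow_queue_until_stable()     # enlarge it until it no longer overflows
        if can_connect():             # retry the connection
            return []                 # the full queue was the cause
    # Queue not overflowing, or still failing after the adjustment:
    # look for a faulty node on the client-to-server path.
    return find_faulty_nodes()


# Hypothetical usage: the queue did not overflow, so path detection runs.
nodes = handle_connection_failure(
    queue_overflowed=lambda: False,
    grow_queue_until_stable=lambda: None,
    can_connect=lambda: True,
    find_faulty_nodes=lambda: ["switch-2"],
)
```

Passing the individual checks as callbacks keeps the flow independent of how the overflow information and per-node packet counts are actually collected.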
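The network connection failure handling method provided by the embodiments of this application is described in detail below through specific embodiments.
FIG. 2 is a schematic flowchart of a network connection failure handling method provided by an embodiment of this application, which includes the following steps:
S201: When establishment of a network connection between a client and a server deployed in a container fails, determine whether the half-connection queue of the server has overflowed. If it has overflowed, perform S202; if it has not overflowed, perform S203.
In this step, the server is deployed in a container. A container is a complete runtime environment: a container package can include the server together with the libraries, other binaries, configuration files, and other files that provide the server's runtime environment. Containerizing the server and its runtime environment reduces differences in server behavior caused by the operating system distribution and other underlying environments.
In one implementation, whether the server's half-connection queue has overflowed can be determined by obtaining overflow information of the half-connection queue. If the overflow information is obtained, it is determined that the server's half-connection queue has overflowed; if the overflow information is not obtained, it is determined that the server's half-connection queue has not overflowed.
The overflow information may be an identifier; that is, when the server's half-connection queue overflows, an identifier can be generated to indicate the overflow. The overflow information may also contain an overflow count; that is, when the half-connection queue overflows, the number of entries that overflowed the queue is output. The overflow count may be a cumulative value: if the queue overflows, the count increases; if the queue does not overflow, either no overflow count is output, or the count is unchanged from the previous output.
To obtain the overflow information of the server's half-connection queue, one can first log in to the container in which the server runs, and then enter a network information query instruction in the container to obtain the container's network information. The network information is then searched for character information in the predetermined format corresponding to the overflow information. If character information in the predetermined format is found, it is analyzed to obtain the overflow information of the server's half-connection queue; if no character information in the predetermined format is found, it is determined that no overflow information has been obtained.
For example, the docker exec command (the container execution command) can be used to enter the container in which the server runs. Then the instruction "netstat -s | grep overflow" is entered in the container to obtain the container's network information. If the output contains a line similar to "XXX times listen queue of socket overflow", the network information contains characters in the predetermined format, where "XXX" stands for a specific number, namely the overflow count of the server's half-connection queue.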
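The text search described above can be sketched in Python. This is only an illustration: the function name parse_overflow_count and the regular expression are assumptions made for this example, and the exact wording of the netstat statistics line can differ between systems, so the pattern accepts both the wording quoted above and a variant with articles. The input text is assumed to have been captured already, for instance from the output of the netstat -s command run inside the container.

```python
import re
from typing import Optional

# Assumed pattern: tolerant of "the"/"a" so that slightly different wordings match.
OVERFLOW_RE = re.compile(
    r"(\d+)\s+times\s+(?:the\s+)?listen queue of (?:a\s+)?socket overflow"
)


def parse_overflow_count(netstat_output: str) -> Optional[int]:
    """Return the overflow count found in the text, or None if no line matches."""
    for line in netstat_output.splitlines():
        match = OVERFLOW_RE.search(line)
        if match:
            return int(match.group(1))
    return None  # no overflow information obtained


sample = "    1234 times the listen queue of a socket overflowed\n"
count = parse_overflow_count(sample)
```

Returning None when nothing matches mirrors the rule that the queue is treated as not overflowed when no overflow information is obtained.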
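S202: Increase the queue length of the half-connection queue according to the preset adjustment rule until the half-connection queue of the server no longer overflows.
In this step, if it is determined that the server's half-connection queue has overflowed, the queue is currently full; that is, it can be preliminarily concluded that the network connection failure is caused by a full half-connection queue. In this case, the queue length of the half-connection queue can be increased until the queue no longer overflows, so that the server's half-connection queue can store new entries.
The queue length of the half-connection queue can be increased according to the preset adjustment rule as follows: first, obtain the server's queue parameter, which defines the queue length of the server's half-connection queue; then increase the value of the queue parameter according to a parameter adjustment rule.
Increasing the value of the queue parameter according to the parameter adjustment rule may mean multiplying the value by N, where N is a natural number greater than 1; for example, increasing the value of the queue parameter to twice its current value.
Alternatively, increasing the value of the queue parameter according to the parameter adjustment rule may mean increasing the value by a preset amount; for example, increasing the value of the queue parameter by 1000 on top of its current value.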
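The two adjustment rules can be written as small Python functions. This is a sketch only; the function names grow_by_factor and grow_by_step are hypothetical, and the default values 2 and 1000 are taken from the examples in the text above.

```python
def grow_by_factor(current: int, n: int = 2) -> int:
    """Multiply the queue parameter by N, where N is a natural number greater than 1."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    return current * n


def grow_by_step(current: int, step: int = 1000) -> int:
    """Increase the queue parameter by a preset positive amount."""
    if step <= 0:
        raise ValueError("step must be positive")
    return current + step


doubled = grow_by_factor(128)   # 128 * 2
stepped = grow_by_step(128)     # 128 + 1000
```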
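For example, in one implementation, the queue parameters include the net.core.somaxconn parameter and the backlog parameter, where net.core.somaxconn defines the maximum length of the listen queue allowed by the server, and backlog defines the maximum length of the full-connection queue allowed by the server. In this case, the maximum queue length of the half-connection queue is the smaller of the net.core.somaxconn parameter and the backlog parameter.
In another implementation, the queue parameters may also include the tcp_max_syn_backlog parameter. In this case, the maximum queue length of the half-connection queue is the smallest of the net.core.somaxconn parameter, the backlog parameter, and the tcp_max_syn_backlog parameter.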
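The rule stated above, namely that the effective length is the minimum of the configured parameters, can be sketched as follows. The function name effective_half_queue_len is hypothetical, and the sketch only restates the rule given in this description; how a particular operating system version actually derives the limit is not covered here.

```python
from typing import Optional


def effective_half_queue_len(
    somaxconn: int,
    backlog: int,
    tcp_max_syn_backlog: Optional[int] = None,
) -> int:
    """Return the smallest of the queue parameters that are in use."""
    limits = [somaxconn, backlog]
    if tcp_max_syn_backlog is not None:   # second implementation: three parameters
        limits.append(tcp_max_syn_backlog)
    return min(limits)


two_params = effective_half_queue_len(somaxconn=128, backlog=511)
three_params = effective_half_queue_len(somaxconn=4096, backlog=511, tcp_max_syn_backlog=256)
```

One consequence of this rule is that raising only one parameter has no effect unless it is currently the smallest one, so an adjustment may need to raise several parameters together.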
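After the queue length of the half-connection queue has been increased according to the preset adjustment rule, the client can try to establish a network connection with the server. Specifically, an instruction to establish a connection can be sent to the client, and after receiving the instruction the client again tries to establish a network connection with the server deployed in the container. Alternatively, the client can keep trying to establish a connection with the server deployed in the container at a preset time interval.
In one implementation, if the overflow information contains an overflow count, then each time the queue length of the half-connection queue is increased, the overflow count of the half-connection queue after that increase can be obtained and compared with the previously obtained overflow count.
If the two are the same, it can be determined that the server's half-connection queue no longer overflows. If the overflow count after the increase is greater than the previously obtained overflow count, it can be determined that the server's half-connection queue is still overflowing, and the queue length can be increased again according to the preset adjustment rule until the half-connection queue no longer overflows.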
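The repeated adjustment can be sketched as a loop in Python. The function grow_until_stable, its callback parameters, and the max_rounds safety limit are assumptions added for this example and do not come from this application; in practice the overflow count would be read only after the client has had time to make new connection attempts.

```python
from typing import Callable


def grow_until_stable(
    read_overflow_count: Callable[[], int],
    get_length: Callable[[], int],
    set_length: Callable[[int], None],
    grow: Callable[[int], int],
    max_rounds: int = 10,  # assumed safety limit, not part of the described method
) -> int:
    """Enlarge the queue until the cumulative overflow count stops increasing."""
    previous = read_overflow_count()
    for _ in range(max_rounds):
        set_length(grow(get_length()))     # apply the preset adjustment rule
        current = read_overflow_count()    # count after this increase
        if current <= previous:            # no new overflows: queue is large enough
            return get_length()
        previous = current
    raise RuntimeError("overflow count kept growing after max_rounds adjustments")


# Hypothetical usage with simulated readings: 10, then 15 (still overflowing), then 15.
state = {"length": 128}
samples = iter([10, 15, 15])
final_length = grow_until_stable(
    read_overflow_count=lambda: next(samples),
    get_length=lambda: state["length"],
    set_length=lambda value: state.update(length=value),
    grow=lambda value: value * 2,
)
```

In the simulated run the first doubling (to 256) is followed by a higher count, so the queue is doubled again (to 512), after which the count is unchanged and the loop stops.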
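If the network connection between the client and the server is established successfully once the server's half-connection queue no longer overflows, the earlier network connection failure was caused by a full half-connection queue, and the failure has been handled by increasing the queue length of the half-connection queue.
If establishment of the network connection between the client and the server still fails once the server's half-connection queue no longer overflows, the network connection failure is caused by packet loss at a node in the network connection path. In this case, the network connection path between the client and the server can be determined, fault detection can be performed on the nodes in the path, and a node to be repaired can be determined and repaired.
S203: Determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
In this step, if it is determined that the server's half-connection queue has not overflowed, it can be inferred that a node in the network connection path is faulty and has dropped packets, which caused the network failure. The network connection path between the client and the server can then be determined, and fault detection can be performed on each node in the path to determine the node to be repaired.
Specifically, the network connection path between the client and the server can be determined by querying the routing table, or by querying the network operator for the information; this is not specifically limited.
Fault detection on the nodes in the network connection path, and determination of the node to be repaired, can be performed as follows: first, send a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection packets to the server; then obtain the number of detection packets received by each node in the network connection path, and determine whether the number received by each node is the same as the preset number; if not, determine that node as a node to be repaired.
The fault detection instruction may be the command "ping -c 1000 -Q 0x2 <server_ip>". The preset number of detection packets to send may be carried in the fault detection instruction, or may be generated randomly by the client.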
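The per-node comparison described above can be sketched in Python. The function find_nodes_to_repair and the node names are hypothetical; the sketch assumes that the per-node receive counts have already been collected into a dictionary and treats a node with no reported count as having received nothing.

```python
from typing import Dict, List


def find_nodes_to_repair(path: List[str], received: Dict[str, int], sent: int) -> List[str]:
    """Return, in path order, the nodes whose received count differs from the number sent."""
    return [node for node in path if received.get(node, 0) != sent]


# Hypothetical path and counts for 1000 detection packets sent by the client.
path = ["client-host", "switch-1", "router-1", "server-host"]
received = {"client-host": 1000, "switch-1": 1000, "router-1": 990, "server-host": 990}
to_repair = find_nodes_to_repair(path, received, sent=1000)
```

Note that every node downstream of the first lossy node also reports a lower count, which is why the result can contain more than one node.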
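For the nodes to be repaired, that is, the nodes whose number of received detection packets differs from the preset number, every such node may be repaired so that the network connection failure is handled as quickly as possible. Alternatively, the nodes to be repaired may be repaired one by one in the order in which the path from the client to the server passes through them, attempting to establish the network connection between the client and the server after each repair. It can be understood that the first node that drops packets is more likely to be faulty than the later ones, so this approach can reduce the resources consumed in handling the network connection failure.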
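The ordered repair strategy can be sketched as follows. The function repair_in_path_order and its repair and can_connect callbacks are hypothetical placeholders for whatever repair action and connection test are actually used.

```python
from typing import Callable, List, Set


def repair_in_path_order(
    path: List[str],
    faulty: Set[str],
    repair: Callable[[str], None],
    can_connect: Callable[[], bool],
) -> List[str]:
    """Repair faulty nodes from client to server, stopping once the connection works."""
    repaired = []
    for node in path:                 # path is ordered from client to server
        if node not in faulty:
            continue
        repair(node)
        repaired.append(node)
        if can_connect():             # retry after each repair
            break                     # later suspects were only downstream of the fault
    return repaired


# Hypothetical usage: fixing router-1 is enough to restore the connection.
fixed = set()
repaired = repair_in_path_order(
    path=["switch-1", "router-1", "server-host"],
    faulty={"router-1", "server-host"},
    repair=fixed.add,
    can_connect=lambda: "router-1" in fixed,
)
```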
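The detection packets may also include identification information that marks the packet type as a detection packet. In this way, the number of detection packets received by each node in the network connection path can be obtained according to the identification information, which improves the accuracy of fault detection. For example, the identification information may be located in the DSCP (Differentiated Services Code Point) field of the detection packet; for instance, the DSCP field can be set to a specific value indicating that the packet is a detection packet.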
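Counting by identification information can be sketched in Python. The sketch relies on one general fact, that the DSCP field occupies the upper six bits of the former IP type-of-service byte; the marker value DETECTION_DSCP and the function names are assumptions made for this example, and the list of bytes stands in for packets observed at one node.

```python
from typing import Iterable

DETECTION_DSCP = 0x2E  # hypothetical marker value chosen for this example


def tos_to_dscp(tos: int) -> int:
    """Extract the 6-bit DSCP value from the 8-bit type-of-service byte."""
    return (tos & 0xFF) >> 2


def count_detection_packets(tos_bytes: Iterable[int], marker_dscp: int = DETECTION_DSCP) -> int:
    """Count the observed packets whose DSCP field carries the detection marker."""
    return sum(1 for tos in tos_bytes if tos_to_dscp(tos) == marker_dscp)


# Hypothetical capture at one node: two marked packets among ordinary traffic.
observed = [0xB8, 0x00, 0xB8, 0x02]
marked = count_detection_packets(observed)
```

Filtering on the marker keeps ordinary traffic passing through the same node from distorting the per-node count.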
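In the above, the overflow count of the half-connection queue after an increase of the queue length is compared with the previously obtained overflow count. If the count after the increase is greater than the previously obtained count, the method can return to the step of increasing the queue length of the half-connection queue according to the preset adjustment rule, until an obtained overflow count is no greater than the one obtained before it.
As can be seen from the above, the network connection failure handling method provided by the embodiments of this application diagnoses a network failure by determining whether the server's half-connection queue has overflowed. If the queue has overflowed, the failure is attributed to a full half-connection queue and is repaired by increasing the queue length. If the queue has not overflowed, the failure is attributed to a faulty node in the network connection path, and the faulty node to be repaired is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network failure handling.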
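Corresponding to the above method, an embodiment of this application further provides a network connection failure handling apparatus. FIG. 3 is a schematic structural diagram of the apparatus, which includes:
a determining module 310, configured to determine, when establishment of a network connection between a client and a server deployed in a container fails, whether the half-connection queue of the server has overflowed;
an adjusting module 320, configured to, if it is determined that the half-connection queue of the server has overflowed, increase the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
a first detection module 330, configured to, if it is determined that the half-connection queue of the server has not overflowed, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
In one implementation, the apparatus further includes:
a second detection module (not shown in the figure), configured to, if establishment of the network connection between the client and the server fails while the half-connection queue of the server no longer overflows, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
In one implementation, the determining module 310 is specifically configured to:
obtain overflow information of the half-connection queue of the server;
if the overflow information is obtained, determine that the half-connection queue of the server has overflowed;
if the overflow information is not obtained, determine that the half-connection queue of the server has not overflowed.
In one implementation, the overflow information contains an overflow count; the adjusting module 320 is specifically configured to:
each time the queue length of the half-connection queue is increased, obtain the overflow count of the half-connection queue after that increase;
compare the overflow count of the half-connection queue after that increase with the previously obtained overflow count, and if the two are the same, determine that the half-connection queue of the server no longer overflows.
In one implementation, the determining module 310 is specifically configured to:
log in to the container, enter a network information query instruction in the container to obtain network information of the container, and search the network information for character information in a predetermined format corresponding to the overflow information, as the overflow information.
In one implementation, the adjusting module 320 is specifically configured to:
multiply the value of a queue parameter of the server by N, where N is a natural number greater than 1; or
increase the value of the queue parameter of the server by a preset value;
where the queue parameter is used to define the queue length of the half-connection queue of the server.
In one implementation, the first detection module 330 is specifically configured to:
send a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection packets to the server;
obtain the number of detection packets received by each node in the network connection path;
determine whether the number of detection packets received by each node is the same as the preset number, and if not, determine that node as a node to be repaired.
In one implementation, the detection packets include identification information; the first detection module 330 is specifically configured to:
obtain, according to the identification information, the number of detection packets received by each node in the network connection path.
As can be seen from the above, the network connection failure handling apparatus provided by the embodiments of this application diagnoses a network failure by determining whether the server's half-connection queue has overflowed. If the queue has overflowed, the failure is attributed to a full half-connection queue and is repaired by increasing the queue length. If the queue has not overflowed, the failure is attributed to a faulty node in the network connection path, and the faulty node to be repaired is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network failure handling.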
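An embodiment of this application further provides an electronic device. As shown in FIG. 4, it includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404;
the memory 403 is configured to store a computer program;
the processor 401 is configured to implement the following steps when executing the program stored in the memory 403:
when establishment of a network connection between a client and a server deployed in a container fails, determining whether the half-connection queue of the server has overflowed;
if it is determined that the half-connection queue of the server has overflowed, increasing the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
if it is determined that the half-connection queue of the server has not overflowed, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining a node to be repaired, so that the node to be repaired is repaired.
As can be seen from the above, the electronic device provided by the embodiments of this application diagnoses a network failure by determining whether the server's half-connection queue has overflowed. If the queue has overflowed, the failure is attributed to a full half-connection queue and is repaired by increasing the queue length. If the queue has not overflowed, the failure is attributed to a faulty node in the network connection path, and the faulty node to be repaired is located automatically so that it can be repaired. This reduces manual effort and improves the efficiency of network failure handling.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and so on; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.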
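In yet another embodiment provided by this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of any of the network connection failure handling methods described above.
In yet another embodiment provided by this application, executable program code is further provided, configured to be run to perform any of the network connection failure handling methods described above.
In yet another embodiment provided by this application, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to perform any of the network connection failure handling methods in the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, and storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.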

Claims (19)

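  1. A network connection failure handling method, wherein the method comprises:
    when establishment of a network connection between a client and a server deployed in a container fails, determining whether the half-connection queue of the server has overflowed;
    if it is determined that the half-connection queue of the server has overflowed, increasing the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
    if it is determined that the half-connection queue of the server has not overflowed, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining a node to be repaired, so that the node to be repaired is repaired.
  2. The method according to claim 1, wherein the method further comprises:
    if establishment of the network connection between the client and the server fails while the half-connection queue of the server no longer overflows, determining the network connection path between the client and the server, performing fault detection on the nodes in the network connection path, and determining a node to be repaired, so that the node to be repaired is repaired.
  3. The method according to claim 1 or 2, wherein the determining whether the half-connection queue of the server has overflowed comprises:
    obtaining overflow information of the half-connection queue of the server;
    if the overflow information is obtained, determining that the half-connection queue of the server has overflowed;
    if the overflow information is not obtained, determining that the half-connection queue of the server has not overflowed.
  4. The method according to claim 3, wherein the overflow information contains an overflow count; the increasing the queue length of the half-connection queue according to the preset adjustment rule until the half-connection queue of the server no longer overflows comprises:
    each time the queue length of the half-connection queue is increased, obtaining the overflow count of the half-connection queue after that increase;
    comparing the overflow count of the half-connection queue after that increase with the previously obtained overflow count, and if the two are the same, determining that the half-connection queue of the server no longer overflows.
  5. The method according to claim 3, wherein the obtaining overflow information of the half-connection queue of the server comprises:
    logging in to the container, entering a network information query instruction in the container to obtain network information of the container, and searching the network information for character information in a predetermined format corresponding to the overflow information, as the overflow information.
  6. The method according to any one of claims 1 to 5, wherein the increasing the queue length of the half-connection queue according to the preset adjustment rule comprises:
    multiplying the value of a queue parameter of the server by N, where N is a natural number greater than 1; or
    increasing the value of the queue parameter of the server by a preset value;
    wherein the queue parameter is used to define the queue length of the half-connection queue of the server.
  7. The method according to any one of claims 1 to 6, wherein the performing fault detection on the nodes in the network connection path and determining a node to be repaired comprises:
    sending a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection packets to the server;
    obtaining the number of detection packets received by each node in the network connection path;
    determining whether the number of detection packets received by each node is the same as the preset number, and if not, determining that node as a node to be repaired.
  8. The method according to claim 7, wherein the detection packets include identification information;
    the obtaining the number of detection packets received by each node in the network connection path comprises:
    obtaining, according to the identification information, the number of detection packets received by each node in the network connection path.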
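  9. A network connection failure handling apparatus, wherein the apparatus comprises:
    a determining module, configured to determine, when establishment of a network connection between a client and a server deployed in a container fails, whether the half-connection queue of the server has overflowed;
    an adjusting module, configured to, if it is determined that the half-connection queue of the server has overflowed, increase the queue length of the half-connection queue according to a preset adjustment rule until the half-connection queue of the server no longer overflows;
    a first detection module, configured to, if it is determined that the half-connection queue of the server has not overflowed, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
  10. The apparatus according to claim 9, wherein the apparatus further comprises:
    a second detection module, configured to, if establishment of the network connection between the client and the server fails while the half-connection queue of the server no longer overflows, determine the network connection path between the client and the server, perform fault detection on the nodes in the network connection path, and determine a node to be repaired, so that the node to be repaired is repaired.
  11. The apparatus according to claim 9 or 10, wherein the determining module is configured to:
    obtain overflow information of the half-connection queue of the server;
    if the overflow information is obtained, determine that the half-connection queue of the server has overflowed;
    if the overflow information is not obtained, determine that the half-connection queue of the server has not overflowed.
  12. The apparatus according to claim 11, wherein the overflow information contains an overflow count; the adjusting module is configured to:
    each time the queue length of the half-connection queue is increased, obtain the overflow count of the half-connection queue after that increase;
    compare the overflow count of the half-connection queue after that increase with the previously obtained overflow count, and if the two are the same, determine that the half-connection queue of the server no longer overflows.
  13. The apparatus according to claim 11, wherein the determining module is configured to:
    log in to the container, enter a network information query instruction in the container to obtain network information of the container, and search the network information for character information in a predetermined format corresponding to the overflow information, as the overflow information.
  14. The apparatus according to any one of claims 9 to 13, wherein the adjusting module is configured to:
    multiply the value of a queue parameter of the server by N, where N is a natural number greater than 1; or
    increase the value of the queue parameter of the server by a preset value;
    wherein the queue parameter is used to define the queue length of the half-connection queue of the server.
  15. The apparatus according to any one of claims 9 to 14, wherein the first detection module is configured to:
    send a fault detection instruction to the client, so that the client uses ping (Packet Internet Groper) to send a preset number of detection packets to the server;
    obtain the number of detection packets received by each node in the network connection path;
    determine whether the number of detection packets received by each node is the same as the preset number, and if not, determine that node as a node to be repaired.
  16. The apparatus according to claim 15, wherein the detection packets include identification information; the first detection module is configured to:
    obtain, according to the identification information, the number of detection packets received by each node in the network connection path.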
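  17. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
    the memory is configured to store a computer program;
    the processor is configured to implement the method steps of any one of claims 1 to 8 when executing the program stored in the memory.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method steps of any one of claims 1 to 8.
  19. Executable program code, wherein the executable program code is configured to be run to perform the method steps of any one of claims 1 to 8.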
PCT/CN2020/097989 2019-06-28 2020-06-24 一种网络连接故障处理方法及装置 WO2020259551A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910578595.1 2019-06-28
CN201910578595.1A CN110300026A (zh) 2019-06-28 2019-06-28 一种网络连接故障处理方法及装置

Publications (1)

Publication Number Publication Date
WO2020259551A1 true WO2020259551A1 (zh) 2020-12-30

Family

ID=68029463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097989 WO2020259551A1 (zh) 2019-06-28 2020-06-24 一种网络连接故障处理方法及装置

Country Status (2)

Country Link
CN (1) CN110300026A (zh)
WO (1) WO2020259551A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110300026A (zh) * 2019-06-28 2019-10-01 北京金山云网络技术有限公司 一种网络连接故障处理方法及装置
CN112019499A (zh) * 2020-07-15 2020-12-01 上海趣蕴网络科技有限公司 一种握手过程中对连接请求的优化方法和系统
CN113726553A (zh) * 2021-07-29 2021-11-30 浪潮电子信息产业股份有限公司 一种节点故障恢复方法、装置、电子设备及可读存储介质
CN114116128B (zh) * 2021-11-23 2023-08-08 抖音视界有限公司 容器实例的故障诊断方法、装置、设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1674485A (zh) * 2004-03-25 2005-09-28 国际商业机器公司 动态提供计算机系统资源的方法和系统
CN101808021A (zh) * 2010-04-16 2010-08-18 华为技术有限公司 故障检测方法、装置及系统以及报文统计方法、节点设备
CN104754003A (zh) * 2013-12-30 2015-07-01 腾讯科技(深圳)有限公司 传输数据的方法及系统
CN107342885A (zh) * 2016-05-03 2017-11-10 中兴通讯股份有限公司 终端最大传输单元的调整方法、装置和终端设备
CN109245955A (zh) * 2017-07-10 2019-01-18 阿里巴巴集团控股有限公司 一种数据处理方法、装置及服务器
CN110300026A (zh) * 2019-06-28 2019-10-01 北京金山云网络技术有限公司 一种网络连接故障处理方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100397829C (zh) * 2002-12-27 2008-06-25 华为技术有限公司 一种频发性离散事件性故障的告警方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1674485A (zh) * 2004-03-25 2005-09-28 国际商业机器公司 动态提供计算机系统资源的方法和系统
CN101808021A (zh) * 2010-04-16 2010-08-18 华为技术有限公司 故障检测方法、装置及系统以及报文统计方法、节点设备
CN104754003A (zh) * 2013-12-30 2015-07-01 腾讯科技(深圳)有限公司 传输数据的方法及系统
CN107342885A (zh) * 2016-05-03 2017-11-10 中兴通讯股份有限公司 终端最大传输单元的调整方法、装置和终端设备
CN109245955A (zh) * 2017-07-10 2019-01-18 阿里巴巴集团控股有限公司 一种数据处理方法、装置及服务器
CN110300026A (zh) * 2019-06-28 2019-10-01 北京金山云网络技术有限公司 一种网络连接故障处理方法及装置

Also Published As

Publication number Publication date
CN110300026A (zh) 2019-10-01

Similar Documents

Publication Publication Date Title
WO2020259551A1 (zh) 一种网络连接故障处理方法及装置
WO2019100921A1 (zh) 消息推送方法及装置
WO2022016847A1 (zh) 一种适用于云平台的自动化测试方法及装置
CN109586959B (zh) 一种故障检测的方法及装置
CN112929241B (zh) 一种网络测试方法及装置
WO2021164261A1 (zh) 云网络设备的测试方法、存储介质和计算机设备
CN113472607B (zh) 应用程序网络环境检测方法、装置、设备及存储介质
CN111953770B (zh) 一种路由转发方法、装置、路由设备及可读存储介质
CN113141405B (zh) 服务访问方法、中间件系统、电子设备和存储介质
CN111711533B (zh) 故障诊断方法、装置、电子设备及存储介质
CN110851290A (zh) 一种数据同步方法、装置、电子设备及存储介质
WO2020125074A1 (zh) 消息到达率确定方法、装置、数据统计服务器及存储介质
CN109246189B (zh) 网络数据分发方法及装置、存储介质、服务端
CN111246406A (zh) 一种短信发送方法、系统、存储介质及终端设备
CN111147310A (zh) 一种日志跟踪处理的方法、装置、服务器及介质
US20160337216A1 (en) System and method for testing a coap server
CN106571975B (zh) 一种通信数据的容错方法及装置
CN114153668A (zh) 自动化测试方法、装置、电子设备及存储介质
CN116170235B (zh) 一种数据库优化访问方法、系统、设备及介质
CN112822248A (zh) 一种ota升级方法、装置、可读介质及电子设备
CN111866921A (zh) 一种5g基站业务故障查找方法、装置、设备及可存储介质
CN111130941B (zh) 一种网络错误检测方法、装置以及计算机可读存储介质
CN110611678B (zh) 一种识别报文的方法及接入网设备
CN109743232B (zh) 一种接口探测方法及装置
CN111858379A (zh) 应用的测试方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20832846

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.06.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20832846

Country of ref document: EP

Kind code of ref document: A1