CN116781564B

CN116781564B - Network detection method, system, medium and electronic equipment of container cloud platform

Info

Publication number: CN116781564B
Application number: CN202310932548.9A
Authority: CN
Inventors: 蓝维洲; 李冀; 程锐; 颜开; 郭峰; 潘远航; 张红兵
Original assignee: Shanghai Daoke Network Technology Co ltd
Current assignee: Shanghai Daoke Network Technology Co ltd
Priority date: 2023-07-26
Filing date: 2023-07-26
Publication date: 2024-02-13
Anticipated expiration: 2043-07-26
Also published as: CN116781564A

Abstract

The application relates to the technical field of cloud native computing, and provides a network detection method and system of a container cloud platform. A network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed on at least one node of the container cloud platform. A first network detection agent component sends a detection request to a specified target component based on a network detection strategy recorded in a custom network detection resource file. The network detection strategy comprises request parameters of the detection request, and the request parameters specify the sending mode of the detection request. The network detection controller generates a network detection result based on the response to the detection request. The request parameters of the network detection task are thus defined through the custom network detection resource file, the network detection task is executed by the network detection controller and the network detection agent component on at least one node, and the network detection result is generated automatically from the response, so that active, automatic and distributed network detection of the container cloud platform is realized.

Description

Network detection method, system, medium and electronic equipment of container cloud platform

Technical Field

The application relates to the technical field of cloud native computing, and in particular to a network detection method and system of a container cloud platform, a computer readable storage medium and electronic equipment.

Background

With the continued development of container technology, using a container cloud platform to deploy and manage containerized application instances in a production environment has become a popular choice for enterprises. Inside the container cloud platform, the network is one of the core factors in the normal operation of the whole platform: a healthy network ensures smooth communication among the container groups in which application instances are deployed and ensures that services can respond to access requests efficiently. If the network fails, the operating efficiency of the entire container cloud platform is severely affected, and application instances become unavailable or slow to respond. Ensuring good network health is therefore not only a very important task for enterprise operators, but also a critical factor in ensuring that application instances run reliably in a production environment.

To ensure healthy operation of the network, network operating conditions need to be detected. Existing network detection approaches fall mainly into two types. The first uses auxiliary diagnostic tools to perform periodic, passive network detection. It relies mainly on collecting metrics from the container cloud platform or its applications to determine the state of the platform as a whole and of each application, so it cannot perform targeted, active detection of the network for different network detection requirements. In the second, operation and maintenance personnel actively deploy an application instance in the container cloud platform according to the network detection requirement, initiate access requests to that application instance from inside or outside the container cloud platform, and check network connectivity from the responses to those requests. However, as container cloud platforms grow in scale, network detection requirements multiply and the network detection flow becomes increasingly complex; manual active detection by operation and maintenance personnel is inefficient, and network detection results are difficult to obtain quickly and automatically. In addition, because the network topology of a container cloud platform is complex, existing methods struggle to locate faults in a short time, so network connectivity and network performance during operation cannot be guaranteed, and the reliability of the container cloud platform suffers.

Therefore, a targeted, periodic, automated, and fully interconnected (Full Mesh) network detection scheme suitable for cloud native scenarios is needed to achieve full-coverage detection in a container cloud platform.

Disclosure of Invention

An object of the present application is to provide a network detection method, system, computer readable storage medium and electronic device for a container cloud platform, so as to solve or alleviate the above-mentioned problems in the prior art.

In order to achieve the above object, the present application provides the following technical solutions:

The application provides a network detection method of a container cloud platform, wherein a network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform, and the method comprises the following steps:

the first network detection agent component sends a detection request to the target component based on a network detection strategy recorded by the user-defined network detection resource file;

the target component is any component in the container cloud platform, the target component is determined based on the content recorded by the user-defined network detection resource file, the network detection strategy at least comprises request parameters of the detection request, and the request parameters are used for specifying a sending mode of the detection request;

The network detection controller generates a network detection result based on the response corresponding to the detection request.

In the above technical solution, the network detection policy further includes a start time and a number of execution rounds;

the first network detection agent component sends a detection request to the target component based on a network detection strategy recorded by a user-defined network detection resource file, and specifically comprises the following steps:

beginning at the start time, the first network detection agent component repeatedly sends the detection request to the target component for the specified number of execution rounds.
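
As an illustration only, the start time and execution round count can be sketched in Python as follows. The names plan_rounds and run_rounds, the interval_s parameter and the injected send callable are assumptions made for this sketch; the application specifies only a start time and a number of execution rounds, not the spacing between rounds.

```python
from datetime import datetime, timedelta
from typing import Callable, List


def plan_rounds(start: datetime, rounds: int, interval_s: float) -> List[datetime]:
    """Return the planned send time of each round (hypothetical helper).

    `interval_s` is an assumption: the text does not say how far apart
    successive rounds are.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return [start + timedelta(seconds=i * interval_s) for i in range(rounds)]


def run_rounds(plan: List[datetime], send: Callable[[int], bool]) -> List[bool]:
    """Call `send(round_index)` once per planned round and collect the outcomes.

    Waiting until each planned time is omitted to keep the sketch short.
    """
    return [send(i) for i, _ in enumerate(plan)]


plan = plan_rounds(datetime(2023, 7, 26, 12, 0, 0), rounds=3, interval_s=60)
results = run_rounds(plan, send=lambda i: True)  # stand-in for a real detection request
```

Injecting the send callable keeps the scheduling logic separate from the transport, so the same plan could drive an HTTP request or a domain name resolution request.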

In the above technical solution, the network detection policy further includes a decision criterion;

after the network detection controller generates a network detection result based on the response corresponding to the detection request, the method further comprises:

analyzing the network detection result to obtain a network quality index value;

and comparing the network quality index value with the judging standard to determine whether the container cloud platform is in a health state.
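
A minimal Python sketch of this comparison is given below. The application does not say which network quality index values are used, so the success rate and mean latency here, together with the names ProbeResponse, Criterion and is_healthy, are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResponse:
    ok: bool                     # whether the request received a successful response
    latency_ms: Optional[float]  # None when no response was received


@dataclass
class Criterion:
    min_success_rate: float      # e.g. 1.0 means every request must succeed
    max_mean_latency_ms: float   # upper bound on the mean latency of successes


def is_healthy(responses: List[ProbeResponse], criterion: Criterion) -> bool:
    """Derive two index values from the responses and compare them with the criterion."""
    if not responses:
        return False  # no data: treated as unhealthy in this sketch
    success_rate = sum(r.ok for r in responses) / len(responses)
    latencies = [r.latency_ms for r in responses if r.ok and r.latency_ms is not None]
    mean_latency = sum(latencies) / len(latencies) if latencies else float("inf")
    return (success_rate >= criterion.min_success_rate
            and mean_latency <= criterion.max_mean_latency_ms)


responses = [ProbeResponse(True, 10.0), ProbeResponse(True, 30.0), ProbeResponse(False, None)]
lenient = is_healthy(responses, Criterion(min_success_rate=0.5, max_mean_latency_ms=50.0))
strict = is_healthy(responses, Criterion(min_success_rate=1.0, max_mean_latency_ms=50.0))
```

With one failed request out of three, the lenient criterion reports a healthy state while the strict criterion, which requires every request to succeed, does not.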

In the above technical solution, the first network detection agent component is disposed in a first node of the container cloud platform, the target component is a second network detection agent component, and the second network detection agent component is disposed in a second node of the container cloud platform; the request parameters comprise at least one communication mode, wherein the communication mode is used for defining a network detection path through which the detection request passes, and the detection request is an HTTP access request;

the first network detection agent component sends the HTTP access request to a preset API interface of at least one second network detection agent component through at least one network detection path.

In the above technical solution, the communication manner includes any one of container group IP address communication, node port communication, platform load balancing communication, and platform entry component communication.

In the above technical solution, the target component is a domain name resolution component;

the first network detection agent component sends a domain name resolution request to the domain name resolution component based on a network detection strategy recorded by a user-defined network detection resource file; the domain name in the domain name resolution request is the domain name of the first network detection agent component.
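
A domain name resolution check of this kind might look like the Python sketch below. It uses the system resolver through socket.getaddrinfo, which inside a container group is normally (under the default DNS policy) directed at the cluster's domain name resolution component; that routing, the function name probe_dns and the injectable resolver parameter are assumptions of this sketch rather than details from the application.

```python
import socket
import time
from typing import Callable, List, Tuple


def probe_dns(domain: str,
              resolver: Callable = socket.getaddrinfo) -> Tuple[bool, List[str], float]:
    """Resolve `domain` once and report (success, addresses, elapsed milliseconds).

    `resolver` defaults to the system resolver and can be replaced for testing.
    """
    start = time.monotonic()
    try:
        infos = resolver(domain, None)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
        addresses = sorted({info[4][0] for info in infos})
        ok = bool(addresses)
    except socket.gaierror:
        addresses, ok = [], False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return ok, addresses, elapsed_ms


def _fake_resolver(host, port):
    """Stand-in resolver so the example runs without network access."""
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("10.0.0.1", 0))]


ok, addresses, elapsed_ms = probe_dns("agent.example.internal", _fake_resolver)
```

Recording the elapsed time alongside the success flag lets slow resolution be reported as well as outright resolution failure.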

An embodiment of the application provides a network detection system of a container cloud platform, wherein a network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform, the system comprising:

a sending unit, configured such that the first network detection agent component sends a detection request to the target component based on the network detection strategy recorded in the custom network detection resource file;

and the response unit is configured to generate a network detection result based on the response corresponding to the detection request by the network detection controller.

The embodiment of the application further provides a computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the network detection method of the container cloud platform according to any embodiment.

An embodiment of the application also provides electronic equipment, which comprises a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the network detection method of the container cloud platform according to any embodiment.

The beneficial effects are that:

According to the technical solution provided by the embodiments of the application, network detection tasks are completed cooperatively by the network detection controller deployed in the container cloud platform and the network detection agent components deployed on the nodes. The target component to be detected is determined from the content recorded in the custom network detection resource file, and the sending mode of the detection request is defined by the network detection strategy. The network detection agent component executes the network detection task based on the network detection strategy, and the network detection controller then generates the network detection result based on the response to the detection request. The custom network detection resource file thus defines the target, timing and detection range of the network detection task. Operation and maintenance personnel can actively initiate a network detection task simply by creating a custom network detection resource file, or updating its content, according to the network detection requirement. The network detection agent component and the network detection controller then automatically execute the task according to the network detection strategy recorded in the file, and the network detection result is generated automatically once detection is complete. Active and automatic network detection of the container cloud platform is thereby realized, network faults can be found and located promptly and accurately, and the reliability of the container cloud platform network is improved.

Drawings

The accompanying drawings, which form a part of this application, are included to provide a further understanding of the application. They illustrate embodiments of the application and, together with the description, serve to explain the application; they do not unduly limit it. In the drawings:

Fig. 1 is a logic schematic diagram of a network detection method of a container cloud platform according to an embodiment of the present application;

Fig. 2 is a flow chart of a network detection method of a container cloud platform according to an embodiment of the present application;

Fig. 3 is a logic schematic diagram of a network detection method using a container group IP address communication mode according to an embodiment of the present application;

Fig. 4 is a logic schematic diagram of a network detection method using a node port communication mode according to an embodiment of the present application;

Fig. 5 is a logic schematic diagram of a network detection method using a node port communication mode according to another embodiment of the present application;

Fig. 6 is a logic schematic diagram of a network detection method using platform load balancing communication or platform entry component communication according to an embodiment of the present application;

Fig. 7 is a logic schematic diagram of network detection of a domain name resolution component according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a network detection system of a container cloud platform according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 10 is a hardware configuration diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In the disclosure, the container cloud platform is a cloud computing platform based on container technology, which is managed by a management system of the container cloud platform, allows users to easily create, deploy, manage and expand containerized applications, and provides a way to simplify and automate container lifecycle management, so that developers and operation and maintenance teams can be more focused on the development and deployment of applications without paying attention to underlying infrastructure details.

Specifically, container cloud platforms typically contain the following major components and functions:

(1) Container runtime (Container Runtime, CR): a common container runtime is, for example, Docker or containerd, and is used to create and manage containers on individual nodes.

(2) Orchestrator (also called scheduler): container cloud platforms typically contain an orchestrator for automatically dispatching containers to different nodes and ensuring high availability and efficient resource utilization of containers.

(3) Storage management: the container cloud platform supports the persistent storage requirements of containerized applications and provides reliable storage solutions, such as persistent volumes (Persistent Volumes).

(4) Automatic scaling: the container cloud platform can automatically adjust the number of replicas of a container according to the load of the application, so as to scale out or scale in horizontally.

In addition, the container cloud platform can further comprise a monitoring component for collecting operation indexes to monitor the operation state of the platform, a permission and safety component for ensuring communication safety and the like.

One particular implementation of a container cloud platform is the Kubernetes system. Kubernetes is currently one of the most popular container orchestrators; it provides comprehensive container management functionality, supports highly automated container lifecycle management, and can be deployed with a variety of cloud service providers or in private data centers. The embodiments describe the technical solution using the Kubernetes system as an example.

Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to them.

(1) A Kubernetes cluster is a cluster of nodes, including a plurality of nodes, deployed with a Kubernetes system.

(2) CRD (Custom Resource Definition) is an extension of the Kubernetes API that allows developers to create custom resource types in a Kubernetes cluster and to create corresponding custom resources (Custom Resource, CR) based on those types. These custom resource types can be used alongside existing Kubernetes resource types (e.g., Deployments, Pods and Services) and can be accessed and managed through the Kubernetes API.

(3) HTTP (Hypertext Transfer Protocol) governs how hypertext (text, pictures, video, etc.) is transferred through data exchange between a browser and a server; the protocol specifies the rules that hypertext transfer must follow.

(4) CNI (Container Network Interface) is an open standard for managing networks in a containerized environment. It provides a simple set of command line interfaces that can be used to establish connections between containers and networks, assign IP addresses, and configure network parameters. A CNI plug-in is software that implements the CNI interface and actually performs the tasks of configuring the network and connecting containers. Kubernetes can use various CNI plug-ins to support different network models, such as bridge networks, overlay networks, and VLAN networks.

(5) Kube-proxy is a component of a Kubernetes cluster that runs on each node. It is responsible for forwarding network traffic between nodes in the cluster and for implementing the service discovery and load balancing functions of the Kubernetes Service. It also provides a reverse proxy function that can forward external traffic to services within the cluster. For example, if a domain name created outside the cluster points to a service within the cluster, Kube-proxy may be used to forward traffic to that service.

(6) A Controller (Controller), typically composed of a set of programs written in the Go language, runs on a Kubernetes Master node. The Controller monitors the API-Server component, receives events and data from the API-Server component, and controls and manages the running of the application instances of the containerized deployment according to predefined policies and rules.

(7) Pod, also called a container group, is the smallest unit in a Kubernetes cluster that can be scheduled and is the basic unit for deploying application instances in Kubernetes. A Pod may contain one or more containers and may be assigned to run on any node.

(8) Pod IP is a private IP address assigned to each Pod by the Kubernetes system. It is visible only inside the cluster and is used by application instances deployed in Pods to communicate over the network inside the cluster; it cannot be used to communicate with the public network or other external networks.

(9) Cluster IP, also known as the virtual IP, is a virtual IP address assigned by the Kubernetes system to each service. It is visible only inside the cluster and is used for communication between services.

(10) NodePort, also known as a node port, is a special Service Type (Service Type) and, for services supporting the NodePort communication method, the Cluster IP address and port number of the Service can be mapped to one port number (i.e., nodePort) on each node in the Cluster, so as to realize access to the Service by using the node IP address and the NodePort port number. For the service which does not support the NodePort communication mode, the IP address and the port number of a certain Pod corresponding to the service are mapped to one port number on each node in the cluster, so that the Pod is directly accessed by using the node IP address and the NodePort port number.

(11) The LoadBalancer is an external load balancer allocated to each service by the Kubernetes cluster, and is configured to receive an external access request, and distribute the external access request to an application instance corresponding to the service in the cluster according to a preset load balancing policy.

(12) The Ingress component is a special load balancer for routing external access requests to application instances corresponding to services within the cluster.

(13) IPv4 and IPv6 are two different versions of the internet protocol that are used to transfer data between computers. IPv4 is an early-used version of the internet protocol that uses a 32-bit IP address to uniquely identify a network device. IPv6 is a later-introduced version of the internet protocol that uses a 128-bit IP address to uniquely identify a network device.
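
The difference in address length can be checked with Python's standard ipaddress module, as in the short sketch below. The helper name address_bits and the two sample addresses are illustrative only.

```python
import ipaddress
from typing import Tuple


def address_bits(text: str) -> Tuple[int, int]:
    """Return (IP version, address length in bits) for an address literal."""
    addr = ipaddress.ip_address(text)
    return addr.version, addr.max_prefixlen


v4 = address_bits("10.244.1.5")      # an IPv4 literal
v6 = address_bits("fd00:10:244::5")  # an IPv6 literal
```

A detection agent that supports both protocol versions can use the same check to decide, for instance, whether an address must be bracketed when placed in a URL.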

(14) Full Mesh is a fully interconnected network topology, i.e. every network communication node is directly connected to every other node.
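
For detection purposes, Full Mesh coverage means that every node probes every other node. The Python sketch below enumerates those ordered source and target pairs; the function name and node names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def full_mesh_pairs(nodes: Sequence[str]) -> List[Tuple[str, str]]:
    """Return every ordered (source, target) pair of distinct nodes."""
    return list(permutations(nodes, 2))


pairs = full_mesh_pairs(["node-a", "node-b", "node-c"])
```

For n nodes this yields n * (n - 1) ordered pairs, which is why the number of detection requests grows quickly as a cluster is enlarged.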

As described in the background, to implement network detection in the related art, operation and maintenance personnel typically deploy an application instance in a cluster and then initiate access requests from different locations to test network connectivity with that application instance as the target, or use a diagnostic tool to perform periodic, passive network detection. However, as clusters continue to grow in size, verifying network connectivity with the above schemes runs into the following problems:

1. Inability to determine which communication mode caused the network communication failure

In order to meet communication requirements in different scenarios in a cluster, Kubernetes provides five communication modes: container group IP address communication (Pod IP), virtual IP address communication (Cluster IP), node port communication (NodePort), platform load balancing communication (LoadBalancer) and platform entry component communication (Ingress). Any of them may develop problems that cause network communication failure, and for a Kubernetes cluster that uses several communication modes, even when a network communication failure is found it is difficult to determine which communication mode is at fault.

Pod IP: pod is the smallest unit in the Kubernetes cluster that can be scheduled, pod IP is the IP address assigned to the Pod by the Kubernetes system when the Pod is created, and application instances deployed in different Pods can access each other through the Pod IP. Because Pod IP addresses are dynamically allocated, when an application instance deployed in a Pod uses Pod IP for communication, once the Pod restarts or fails, the Pod-bound IP address may change, resulting in network communication failure.

Cluster IP: in order to facilitate the mutual access between application instances in the Kubernetes Cluster, the Kubernetes system provides a Service discovery mechanism, a virtual IP address (i.e., a Cluster IP) is allocated to each Service (Service), and the Pod where all application instances corresponding to the same Service are located is mapped to the Cluster IP, so that load balancing can be realized at the Service level. An application instance in the Kubernetes Cluster may issue an access request directed to the Cluster IP of a Service to implement access to a certain application instance corresponding to the Service. However, if the Service does not operate normally or the listening port of the Service is set to be wrong, the application instance cannot access a certain application instance corresponding to the Service through the Cluster IP of the Service. In addition, if the network policy of the Service corresponding to the application instance that issues the access request is configured to be limited, the application instance cannot access other services using the Cluster IP.

NodePort: in order to allow an external requester to access a Service deployed in a Kubernetes cluster, a static port of a node (NodePort) is used to expose the Service outside the cluster. An access request sent by the external requester to NodeIP:NodePort is routed to a preset port of the corresponding Service, or directly to a preset port of the Pod of an application instance corresponding to the Service, thereby providing access to that application instance. However, if the node port designated for the Service is already occupied by another process, or the network security group is configured incorrectly, external access requests cannot reach the designated node port, and the Service cannot respond normally to them.

LoadBalancer: in order to allow an external requester to access the Services deployed in a Kubernetes cluster, a load balancer (LoadBalancer) is set up to forward external access requests to the corresponding Service; specifically, a load balancer and an external access IP address must be set up separately for each Service. An access request sent by the external requester to the external access IP address of a Service is routed to the Cluster IP of that Service, thereby providing access to an application instance corresponding to the Service. If the node where the load balancer is located fails, or the load balancing policy is configured incorrectly, the Service cannot respond normally to external access requests.

Ingress: because the LoadBalancer communication mode described above requires a separate load balancer and external access IP address for each Service, which is too costly, an Ingress can instead be used to forward external access requests to the corresponding Service. Specifically, only one Ingress and one external access IP address need to be set up for a Kubernetes cluster, and the Ingress routes each external access request to the Cluster IP of the corresponding Service according to the URI path in the request, thereby providing access to an application instance corresponding to the Service. In addition, an Ingress must be managed by a corresponding Ingress Controller in order to function properly. If the Ingress is configured incorrectly, or the Ingress Controller process crashes or exits unexpectedly, the Service cannot respond normally to external access requests.

2. Failure to determine the cause of network communication failure

Based on the foregoing description, the Kubernetes system provides multiple communication modes, and each communication mode works normally only while its corresponding network components remain in a normal state. Once a network component fails, the communication mode that depends on it develops problems, and a problem in any one communication mode may cause network communication failure across the whole Kubernetes cluster. By their external manifestations, faults occurring in network components can be categorized as functional faults and sporadic faults.

A functional fault means that the network component itself is abnormal or inoperable, which manifests as an inability to provide network forwarding, so that application instances can never communicate; such faults are usually relatively easy to discover. For example: a configuration error in the CNI plug-in that manages the container network prevents application instances from correctly accessing other application instances or external networks; a configuration error in the Kube-proxy component, which forwards network traffic between Services and Pods, prevents access requests directed to a Service from being routed to the corresponding application instance; or a configuration error in a Service or Ingress prevents an application instance from correctly accessing an application instance corresponding to the Service. A sporadic fault means that the network component normally works, but in certain special situations, such as major changes to control plane components, an excessively large cluster, or excessively high communication traffic, its network forwarding capability fails intermittently, causing sporadic network communication failures; such faults occur infrequently and are difficult to detect.

3. Network topology complexity increases resulting in difficult fault localization

With the increasing size of clusters, once network communication failure occurs, it is difficult for operation and maintenance personnel to locate faults in a short time, and network detection of Full Mesh is required to determine all faults.

Based on the foregoing, it can be seen that the network detection scheme in the cloud native scenario needs to meet the following requirements:

(1) Network detection over the IPv4 and IPv6 network protocols and over the various communication modes, such as Pod IP, Cluster IP, NodePort, LoadBalancer and Ingress, must be supported simultaneously.

(2) The Pod IP of the detected target application instance may change at any time, and the network detection component needs to be able to acquire the Pod IP of the target application instance in real time.

(3) Certain problems with the network component itself may cause imperceptible network jitter and sporadic network communication failures, requiring timely discovery of such problems with the network component itself.

(4) In order to be able to perform network detection of the Kubernetes cluster with full coverage, it is necessary to issue access requests from all nodes in the cluster to the target application instance, ensuring full-scale distributed detection within the entire cluster.

Therefore, the embodiments of the application provide a network detection method, a system, a computer-readable storage medium and electronic equipment of a container cloud platform suitable for network detection in cloud native scenarios. Using the custom resource definition mechanism of the Kubernetes system, the scheme can automatically initiate active, full-coverage and fully distributed network detection for different network detection requirements, quickly determine the health state of the whole network, determine the cause of a communication failure and locate the fault, and keep track of the network connectivity and network performance of the container cloud platform during operation, thereby improving the reliability of the container cloud platform.

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. Various examples are provided by way of explanation of the present application and not limitation of the present application. Indeed, it will be apparent to those skilled in the art that modifications and variations can be made in the present application without departing from the scope or spirit of the application. For example, features illustrated or described as part of one embodiment can be used on another embodiment to yield still a further embodiment. Accordingly, it is intended that the present application include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

In the following description, the terms "first/second/third" are used merely to distinguish between similar objects and do not represent a particular ordering of the objects, it being understood that the "first/second/third" may be interchanged with a particular order or precedence where allowed, to enable embodiments of the present application described herein to be implemented in other than those illustrated or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the present disclosure only and is not intended to be limiting of the present disclosure.

Exemplary method

The embodiment of the application provides a network detection method for a container cloud platform, as shown in fig. 1 to 7, a network detection controller is deployed on the container cloud platform, and a network detection Agent component (Agent for short) is correspondingly deployed in at least one node of the container cloud platform, where the method includes:

In step S101, the first network detection agent component sends a detection request to the target component based on the network detection policy recorded in the custom network detection resource file.

In this embodiment, in order to detect the network health status of the cluster, a network detection controller is deployed in the container cloud platform (such as a Kubernetes cluster). The network detection controller is a custom controller (Controller) that extends the functions of Kubernetes and implements network detection logic according to the network detection requirements.

Specifically, referring to fig. 1, the nodes in the Kubernetes cluster may be divided by role into control nodes and working nodes. The network detection controller is typically deployed on a control node; it should be understood that it may also be deployed on a working node, and this embodiment does not limit the deployment location of the network detection controller.

Using the custom resource definition (Custom Resource Definition, CRD) mechanism of the Kubernetes system, this embodiment introduces a new custom resource type into the Kubernetes cluster, namely the custom network detection resource, which is defined by a corresponding custom network detection resource file. It should be understood that the custom network detection resource may have different resource types for different network detection tasks. For example, it may be an AppHttp resource for detecting the network connectivity of certain service applications within a specific range, a NetHttp resource for detecting the basic network connectivity of the whole cluster, or a NetDNS resource for detecting the connectivity of a particular network component (such as a DNS component). Custom network detection resources are managed by creating or updating the custom network detection resource file, after which the network detection controller creates the corresponding resource object (Custom Resource, CR).

The custom network detection resource file may be in YAML (YAML Ain't Markup Language) format or JSON format, both of which are human-readable data serialization formats, and it records the network detection policy. The network detection policy is defined for the requirements of a specific network detection task and may include the purpose, timeliness and detection range of the network detection task.

The network detection controller can create a custom network detection resource object with the corresponding attributes and specification according to the content of the custom network detection resource file. Control logic matching the requirements of the network detection task is written for the creation, update and deletion events of the custom network detection resource object, and the controller can monitor the state of that object while the cluster is running, so as to control the associations and dependencies among the network detection controller, the network detection agent component and the custom network detection resource object. It should be understood that these associations and dependencies may be interface call relationships or parameter transfer relationships, which is not limited in this embodiment.

In this embodiment, the network detection agent component is an application provided for implementing the network detection method. It is deployed in at least one node of the container cloud platform and executes the network detection task according to the network detection policy recorded in the custom network detection resource file.

It can be understood that the first network detection agent component may be the network detection agent component deployed on any single node of the container cloud platform, a set of network detection agent components deployed on a plurality of nodes, or the set of network detection agent components deployed on all nodes. When the first network detection agent component is deployed on a single node, it sends a detection request to the target component so as to detect the network connectivity between its own node and the node where the target component is located, as well as the application-level connectivity between the first network detection agent component and the target component. When the first network detection agent component is a set of network detection agent components deployed on a plurality of nodes, it can be used to detect the network connectivity between each of those nodes and the node where the target component is located, as well as the application-level connectivity between the target component and the first network detection agent component.

The target component is any component in the container cloud platform, and the target component can be determined based on content recorded by the user-defined network detection resource file. It will be appreciated that the number of target components may be one or more, and that the types of these components, as well as the functions implemented, may be the same or different.

Because the target component is determined based on the content recorded in the custom network detection resource file, operation and maintenance personnel can specify target components of different ranges and types in the custom network detection resource file according to the requirements of different network detection tasks, and the network communication condition of the target components can then be detected comprehensively.

In this embodiment, the detection request may be an HTTP access request, a TCP request, or a UDP request, and the network protocol used for the detection request is not limited in this embodiment.

Illustratively, network connectivity detection and application connectivity detection can be classified according to the purpose of network detection. In the network detection task of determining network connectivity, a control message packet based on ICMP protocol may be sent to the target component to determine the network connectivity status, whether the node is reachable, whether the route is available, etc. of the network itself, or other access requests may be sent to determine the network connectivity of the application layer. In application connectivity detection, HTTP access requests with different parameters are generally used to determine whether the network between service applications is smooth and the performance meets the requirements in different request modes and under different request protocols.

In this embodiment, the network detection policies are recorded in a custom network detection resource file, the content of which is defined for the network detection task requirements, and each custom network detection resource is recorded with a corresponding network detection policy, where the network detection policy corresponds to one network detection task. That is, the network detection policy corresponding to each network detection task is determined according to the corresponding customized network detection resource, and is essentially defined by the content of the corresponding customized network detection resource file.

In order to realize different network detection task demands, the network detection policy at least comprises request parameters of the detection request, wherein the request parameters are used for specifying a sending mode of the detection request.

Specifically, different sending modes achieve different network detection targets. For example, if the sending mode is set to the Pod IP detection mode, the first network detection agent component sends the detection request directly to the Pod IP of the target component to detect Pod IP connectivity between the two. If the sending mode is set to the multi-network-card detection mode, the first network detection agent component can use the different network cards on its own node to send detection requests to the different network cards of the target component, so as to determine the network connectivity of the multiple network cards between the first network detection agent component and the target component. The network detection agent component thus determines the sending mode from the request parameters and executes different network detection tasks accordingly.

In addition, as part of the network detection policy, the request parameters may differ between application scenarios. For example, to determine the sending mode of the detection request, the request parameters may include the sending path of the detection request, the request body (http-body), the namespace (Namespace) to which the target component belongs, the security authentication information for the requested access (tlsSecretName), and so on. The request parameters are recorded in the custom network detection resource file as key-value pairs. The value of a request parameter may be a character string, a numerical value or a Boolean, or a file stored under a specified path of a container group, for example a configuration mapping file (ConfigMap file).

It should be noted that the configuration mapping file is an object for storing configuration data and is also a mechanism for passing configuration parameters to the applications in a container group. Its content may be key-value pairs or may nest other files. After the configuration mapping file is mounted to a specific path in a container, the container can read the configuration data under the mount path and supply it to the application running in the container. This decouples the application's configuration information from the container image, so that a user can modify the application's configuration dynamically without rebuilding the image. In this embodiment, the configuration mapping file may be used to define a complex network detection task and to pass complex configuration information into the container group where the network detection agent component is located. For example, by defining the HTTP request body in the configuration mapping file, the network detection agent component can perform network detection aimed at reproducing a bug: it sends the target component detection requests that can reproduce the bug, observes how the target component processes them, and locates where and why the bug occurs. In another scenario, an E2E (End to End) test procedure may be defined in the configuration mapping file and carried out by the network detection agent component, or parameters of a random test may further be defined to achieve the effect of chaos testing.
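The bug-reproduction use of a mounted configuration mapping file can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: the file name `body`, the request path `/api/orders` and the helper names are hypothetical, and a temporary directory stands in for the ConfigMap mount path.

```python
import json
import tempfile
from pathlib import Path


def load_request_body(mount_dir: str, key: str = "body") -> bytes:
    """Read a request body from a file under a mounted ConfigMap directory.

    When a ConfigMap is mounted as a volume, each key appears as a file.
    `mount_dir` and `key` are hypothetical names chosen for this sketch.
    """
    return (Path(mount_dir) / key).read_bytes()


def build_probe_request(path: str, body: bytes) -> dict:
    """Describe the HTTP detection request that would be sent to the target."""
    return {
        "method": "POST",
        "path": path,
        "body": body,
        "headers": {"Content-Length": str(len(body))},
    }


# Usage: simulate the mount with a temporary directory.
with tempfile.TemporaryDirectory() as mount_dir:
    payload = json.dumps({"orderId": 0, "items": []}).encode()
    (Path(mount_dir) / "body").write_bytes(payload)
    request = build_probe_request("/api/orders", load_request_body(mount_dir))
```

Because the body lives in the mounted file rather than in the image, changing the reproduction payload only requires updating the configuration mapping file.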

Taking an AppHttp resource as an example, the content of the custom network detection resource file may be as follows:

the meaning of each field is as follows:

apiVersion: specifies the version of the Kubernetes API;

kind: the resource type, here AppHttp, meaning that a custom network detection resource of type AppHttp is created;

metadata: defines the metadata of the custom network detection resource, including information such as its name (name);

spec: specifies the specification and attributes of the custom network detection resource, that is, configures the network detection policy;

spec.request: specifies the performance requirement indexes of the network detection task;

spec.request.durationInSecond: defines the longest duration (seconds) of one round of the network detection task;

spec.request.perRequestTimeoutInMs: defines the request timeout threshold, that is, the longest duration (milliseconds) of a single HTTP access request;

spec.request.qps: defines the minimum number of access requests per second that are to be responded to;

spec.schedule: defines the execution mode of the network detection task;

spec.schedule.roundNumber: defines the number of rounds the network detection task needs to execute;

spec.schedule.roundTimeoutMinute: defines the maximum duration (minutes) of a single round of the network detection task;

spec.expect: defines the desired network detection result;

spec.expect.meanAccessDelayInMs: a reasonable threshold for the average response time;

spec.expect.successRate: the communication success rate threshold;

spec. Status code of successful communication;

spec.target: the request parameters;

spec.target.bodyConfigmapName: the name of the ConfigMap file holding the request body;

spec.target.bodyConfigmapNamespace: the namespace in which the ConfigMap file holding the request body is located.

In this embodiment, because the network detection policy is recorded in the custom network detection resource file, and the value of the request parameter can be flexibly customized, the operation and maintenance personnel can start or update a network detection task only by creating or modifying the content of the custom network detection resource file, and then the network detection agent component automatically executes the complex network detection tasks according to the network detection policy, so as to realize active and automatic network detection.
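As an illustration of how a policy built from the fields listed above might be represented and sanity-checked before an agent acts on it, consider the following sketch. Everything concrete in it is an assumption: the `apiVersion` string, the resource name, all values, the exact capitalisation of the field names, the reading of `successRate` as a fraction between 0 and 1, and the `validate_policy` helper itself.

```python
from typing import List

# Illustrative AppHttp resource; apiVersion, name and every value are placeholders.
app_http = {
    "apiVersion": "example.invalid/v1",
    "kind": "AppHttp",
    "metadata": {"name": "demo-app-http"},
    "spec": {
        "request": {"durationInSecond": 10, "perRequestTimeoutInMs": 1000, "qps": 10},
        "schedule": {"roundNumber": 2, "roundTimeoutMinute": 1},
        "expect": {"meanAccessDelayInMs": 500, "successRate": 1.0},
        "target": {"bodyConfigmapName": "demo-body", "bodyConfigmapNamespace": "default"},
    },
}


def validate_policy(resource: dict) -> List[str]:
    """Return a list of problems found in the policy (empty if none)."""
    errors = []
    spec = resource.get("spec", {})
    request = spec.get("request", {})
    if resource.get("kind") != "AppHttp":
        errors.append("kind must be AppHttp")
    for field in ("durationInSecond", "perRequestTimeoutInMs", "qps"):
        if request.get(field, 0) <= 0:
            errors.append(f"spec.request.{field} must be positive")
    if spec.get("schedule", {}).get("roundNumber", 0) < 1:
        errors.append("spec.schedule.roundNumber must be at least 1")
    rate = spec.get("expect", {}).get("successRate", -1)
    if not 0 <= rate <= 1:
        errors.append("spec.expect.successRate must be within [0, 1]")
    return errors


problems = validate_policy(app_http)
```

Starting or updating a detection task then amounts to editing this structure; the check simply rejects obviously unusable values before any request is sent.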

It will be appreciated that certain problems in the target component, or in the individual network components that implement network communication, may cause barely perceptible network jitter and sporadic communication failures, and such problems need to be discovered in time. To this end, in some embodiments, the network detection policy further includes a start time and a number of execution rounds. Sending, by the first network detection agent component, a detection request to the target component based on the network detection policy recorded in the custom network detection resource file specifically comprises: beginning at the start time, the first network detection agent component repeatedly sends detection requests to the target component for the specified number of execution rounds.

The start time is the moment at which the network detection task is started, and the number of execution rounds is the number of times the network detection task is to be repeated. Optionally, an execution interval between rounds may be defined in the network detection policy, and a default execution interval may be set, for example one round every 2 minutes. Thus, the first network detection agent component may begin the first round of network detection at the start time, perform the second round 2 minutes later, and so on, until the number of execution rounds is reached.

With this setting, on the one hand, the network detection task is executed automatically at the specified time and produces a network detection result without intervention by operation and maintenance personnel. On the other hand, in an enterprise production environment, executing network detection inevitably has some impact on service access; setting the start time allows the execution of the network detection task to be staggered from the peak periods of the service applications, avoiding adverse effects on them. Because the start time and the number of execution rounds specify when sending of the detection request begins and how many times it is repeated, distortion of the detection result by network jitter can be avoided, so that the network detection result better reflects the actual situation.

It should be understood that the time required to execute one round of the network detection task may differ between network health states. For this reason, a task timeout threshold may also be set in the network detection policy. After the first network detection agent component sends a detection request to the target component at the start time, the network detection result and the time taken by this round are recorded. When that time exceeds the task timeout threshold, the round is judged to have failed and the remaining detection in this round is stopped, so as to save resources.
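The scheduling behaviour described above (a start time, a number of execution rounds, an execution interval defaulting to 2 minutes, and a task timeout threshold) can be sketched as follows. The function names are hypothetical and the sketch only computes the schedule and the per-round verdict; it does not send any requests.

```python
from datetime import datetime, timedelta
from typing import List


def round_start_times(start: datetime, rounds: int, interval_minutes: int = 2) -> List[datetime]:
    """Start time of each round: the first at `start`, then one per interval."""
    return [start + timedelta(minutes=interval_minutes * i) for i in range(rounds)]


def judge_round(elapsed_seconds: float, timeout_seconds: float) -> str:
    """A round fails when its duration exceeds the task timeout threshold."""
    return "failed" if elapsed_seconds > timeout_seconds else "succeeded"


# Usage: three rounds starting at 02:00, away from an assumed daytime traffic peak.
times = round_start_times(datetime(2023, 7, 26, 2, 0), rounds=3)
verdict = judge_round(elapsed_seconds=75.0, timeout_seconds=60.0)
```

With the default interval the three rounds start at 02:00, 02:02 and 02:04, and a round that took 75 seconds against a 60-second threshold is judged to have failed.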

Because service applications communicate in complex ways, production practice usually requires comprehensive network detection at the application layer. To this end, in some embodiments, the first network detection agent component is deployed in a first node of the container cloud platform, the target component is a second network detection agent component, and the second network detection agent component is deployed in a second node of the container cloud platform. The request parameters include at least one communication mode, which defines the network detection path through which the detection request passes, and the detection request is an HTTP access request. Sending, by the first network detection agent component, a detection request to the target component based on the network detection policy recorded in the custom network detection resource file specifically comprises: the first network detection agent component sends an HTTP access request to a preset API interface of at least one second network detection agent component over at least one network detection path.

In this embodiment, the detection request is an HTTP access request, that is, the network detection communicates using the HTTP protocol. The HTTP protocol is a protocol for transmitting hypertext and other resources between a client and a server. An HTTP access request operates at the application layer of network communication; it is encapsulated and processed layer by layer, transmitted to the server over the underlying network, and the response data is returned to the client. In other words, HTTP communication presupposes that the underlying network is connected normally. When the network detection agent components on any two nodes perform network detection through HTTP-based request-response communication, they can therefore determine both whether the underlying network is healthy and the network state of the application layer, thereby achieving comprehensive detection of network health.

Specifically, referring to fig. 1, the first network detection agent component is configured as a network detection agent component that initiates a detection request, and is disposed on a first node, where the first node may be a working node 1, a working node 2, a working node 3, or any multiple working nodes. The target component is a second network detection agent component that is deployed on a second node in the cluster as a component that receives the detection request.

It should be understood that when the target component is the second network detection agent component, each network detection agent component reports its Pod IP to the network detection controller, and the network detection controller then synchronizes the Pod IPs of all network detection agent components to each network detection agent component, so that the first network detection agent component can send a detection request to the second network detection agent component. When the Pod IP of a network detection agent component changes, the changed Pod IP is immediately reported to the network detection controller and synchronized to the other network detection agent components, so that the first network detection agent component can obtain the Pod IP of the second network detection agent component in real time.
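The report-and-synchronize exchange described above can be sketched with an in-memory stand-in for the controller. The class and method names are hypothetical, and the per-agent dictionaries merely simulate the peer tables that real agents would hold; no Kubernetes API is involved.

```python
from typing import Dict


class PodIPRegistry:
    """In-memory stand-in for the controller's view of agent Pod IPs."""

    def __init__(self) -> None:
        self._ips: Dict[str, str] = {}                 # agent name -> reported Pod IP
        self._tables: Dict[str, Dict[str, str]] = {}   # agent name -> its synced peer table

    def report(self, agent: str, pod_ip: str) -> None:
        """An agent reports (or re-reports) its Pod IP; all peer tables are refreshed."""
        self._ips[agent] = pod_ip
        self._tables.setdefault(agent, {})
        self._sync()

    def _sync(self) -> None:
        """Push every other agent's Pod IP into each agent's peer table."""
        for name, table in self._tables.items():
            table.clear()
            table.update({n: ip for n, ip in self._ips.items() if n != name})

    def peers_of(self, agent: str) -> Dict[str, str]:
        """Peer table as currently seen by `agent`."""
        return dict(self._tables[agent])


# Usage: two agents register, then one Pod is rescheduled and gets a new IP.
registry = PodIPRegistry()
registry.report("agent-a", "10.0.0.1")
registry.report("agent-b", "10.0.0.2")
before = registry.peers_of("agent-a")
registry.report("agent-b", "10.0.0.9")
after = registry.peers_of("agent-a")
```

After the second report from `agent-b`, the table held by `agent-a` already contains the new address, which is the property the first agent relies on when choosing where to send the detection request.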

In some embodiments, the second node may be every node other than the first node. The first network detection agent component on the first node then sends detection requests to the second network detection agent components on all other nodes, and all nodes on which network detection agent components are deployed form a fully interconnected (full-mesh) network, so that full-coverage detection of all nodes is achieved.
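The full-mesh arrangement can be illustrated by enumerating every ordered (sender, receiver) pair of nodes. The node names and the helper name are hypothetical; the sketch only shows which probes a full mesh implies.

```python
from itertools import permutations
from typing import List, Tuple


def full_mesh_pairs(agent_nodes: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (first node, second node) pair, excluding self-pairs."""
    return list(permutations(agent_nodes, 2))


# Usage: three nodes yield 3 * 2 = 6 directed probes.
pairs = full_mesh_pairs(["node-1", "node-2", "node-3"])
```

Ordered pairs are used because each agent acts as sender towards every other agent, so connectivity is checked in both directions; the number of probes grows as n(n-1) with the number of nodes n.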

When the target component is the second network detection agent component, the first and second network detection agent components are distinguished by their roles as sender and receiver. It can be understood that the network detection agent component on a given node can be either the sender or the receiver of a detection request: when it sends the detection request it is called the first network detection agent component, and when it receives the detection request it is called the second network detection agent component. Detection requests are sent and received between network detection agent components. After the second network detection agent component receives a detection request through the preset API interface, it responds according to the purpose of the network detection and returns the response to the first network detection agent component.

Further, since detection requests are sent and received between network detection agent components, a selector (selector) parameter may be set in the network detection policy to specify the range of first nodes and thereby define the first network detection agent components. In this case, network connectivity detection within the range of the specified nodes can be achieved.
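One common way to express such a selector is equality-based label matching, sketched below. The text does not specify the selector's form, so label matching, the label keys and the helper name are all assumptions made for illustration.

```python
from typing import Dict, List


def select_nodes(nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Names of nodes whose labels contain every key/value pair in `selector`."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Usage: restrict the sending agents to nodes in one (hypothetical) zone.
cluster_nodes = {
    "node-1": {"zone": "a", "role": "worker"},
    "node-2": {"zone": "b", "role": "worker"},
    "node-3": {"zone": "a", "role": "worker"},
}
first_nodes = select_nodes(cluster_nodes, {"zone": "a"})
```

An empty selector matches every node, which corresponds to not restricting the range of first nodes at all.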

As can be seen from the foregoing description, there may be various communication modes of the network in the cluster, and in this embodiment, the request parameter includes at least one communication mode, and the communication mode is used to define a network detection path through which the detection request passes. When network detection is executed, a first network detection agent component sends an HTTP access request to a preset API interface of a second network detection agent component through at least one network detection path according to a communication mode specified by request parameters, a network detection result is sent to a network detection controller, and the network detection controller judges whether a network is in a health state or not based on the summarized network detection result, so that detection of network performances of different network paths in a container cloud platform is realized.

In some embodiments, the communication mode includes any one of container group IP address (Pod IP) communication, node port (NodePort) communication, platform load balancing (LoadBalancer) communication, and platform entry (Ingress) component communication.

The following describes in detail the process of the first network detection agent component sending HTTP access requests using different communication means.

Fig. 3 shows an example of transmitting an HTTP access request using the communication scheme of Pod IP. As shown in fig. 3, the network detection agent component is deployed in a container group, and when the first network detection agent component uses Pod IP to send an HTTP access request to the second network detection agent component, the HTTP access request is directed to a container group port of the container group in which the second network detection agent component is located.

When the HTTP access request is sent using the node port communication mode, the network path through which the HTTP access request passes differs depending on whether the second network detection agent component supports node-port-type services.

Fig. 4 shows an example of transmitting an HTTP access request using the node port (NodePort) communication mode. As shown in fig. 4, when the node port is used to transmit an HTTP access request to a second network detection agent component that does not support node-port-type services, the HTTP access request is directed to the node IP and node port (NodeIP:NodePort) bound to the service to which the second network detection agent component belongs, and is then routed directly to the container group port of the container group where the second network detection agent component is located.

Fig. 5 illustrates another example of sending an HTTP access request using node port communication. As shown in fig. 5, when the first network detection agent component uses NodePort to send an HTTP access request to a second network detection agent component that supports NodePort-type services, the HTTP access request is directed to the NodeIP:NodePort bound to the service to which the second network detection agent component belongs, is then routed to a preset port of that service, and is finally forwarded by the load balancing mechanism of the service to the container group port of the container group to which the second network detection agent component belongs.

Fig. 6 illustrates an example of sending an HTTP access request using platform load balancing communication or platform entry component communication. As shown in fig. 6, when the first network detection agent component uses LoadBalancer/Ingress to send an HTTP access request to the second network detection agent component, the HTTP access request is directed to the external access IP address and external access port of the service to which the second network detection agent component belongs, is then routed to the Cluster IP and preset port of that service, and is finally routed to the container group port of the container group to which the second network detection agent component belongs.

Different communication modes therefore lead the HTTP access request through different network detection paths. In this embodiment, because the request parameters include at least one communication mode, the first network detection agent component can send the HTTP access request to the second network detection agent component based on the request parameters, so that the request reaches the preset API interface of the second network detection agent component along the specified network detection path. The network path taken by the HTTP access request is thus controlled through the request parameters, and the network connectivity and network performance observed along each path are recorded, so that fully distributed, fully automatic and active network detection of any network path is achieved.
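How a communication mode selects the first hop of the detection path can be sketched as a mapping from mode to request URL. The mode strings, the `AgentEndpoints` structure, the `/healthz` path standing in for the preset API interface, and all addresses are assumptions; only IPv4 literals are handled.

```python
from dataclasses import dataclass


@dataclass
class AgentEndpoints:
    """Addresses through which a second agent can be reached (illustrative)."""
    pod_ip: str
    pod_port: int
    node_ip: str
    node_port: int
    external_ip: str
    external_port: int


def probe_url(mode: str, target: AgentEndpoints, api_path: str = "/healthz") -> str:
    """Build the URL for the first hop implied by the communication mode."""
    if mode == "PodIP":
        host, port = target.pod_ip, target.pod_port
    elif mode == "NodePort":
        host, port = target.node_ip, target.node_port
    elif mode in ("LoadBalancer", "Ingress"):
        # Both enter through the externally reachable address of the service.
        host, port = target.external_ip, target.external_port
    else:
        raise ValueError(f"unknown communication mode: {mode}")
    return f"http://{host}:{port}{api_path}"


# Usage: the same target agent reached along three different paths.
target = AgentEndpoints("10.244.1.5", 8080, "192.168.0.11", 30080, "203.0.113.7", 80)
urls = {mode: probe_url(mode, target) for mode in ("PodIP", "NodePort", "LoadBalancer")}
```

Only the entry point differs between modes; the subsequent routing to the container group port is performed by the cluster network, which is exactly what each probe exercises.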

In some embodiments, the network detection agent component is deployed on all nodes of the container cloud platform, or the network detection agent component is deployed on a designated node of the container cloud platform.

Specifically, in this embodiment, the network detection agent component may be deployed on all nodes of the container cloud platform as a DaemonSet, or on designated nodes of the container cloud platform as a Deployment. The two deployment modes cover different ranges of nodes and therefore different network detection ranges.

When the network detection agent component is deployed as a DaemonSet, the DaemonSet controller deploys one network detection agent component on each node in the Kubernetes cluster, so that the network of all nodes can be detected. When it is deployed as a Deployment, the network detection agent component is deployed only on the designated nodes in the Kubernetes cluster, so that the network of the designated nodes can be detected.

That is, in the method provided in this embodiment, a custom network detection resource (such as a NetHttp resource) is deployed in the Kubernetes cluster, the corresponding network detection controller (Controller) is deployed in the Kubernetes cluster as a Deployment, and the corresponding network detection agent component (Agent) is deployed as a DaemonSet on all nodes (or as a Deployment on designated nodes). Together these form the basic architecture of the whole network detection system and allow different network detection tasks to be carried out.

It should be noted that full-mesh (Full Mesh) network detection across the whole cluster requires deployment as a DaemonSet. In some scenarios, network detection is needed only for certain nodes that may have problems; deployment as a Deployment is then sufficient, which saves detection time and network resources.

Taking NetHttp resource as an example, the content of the custom network detection resource file in this embodiment may be as follows:


the above is the configuration file content for creating NetHttp resources, and the meaning of each field is as follows:

spec.schedule: defines the execution parameters of the network detection task, wherein:

roundNumber: defines the number of rounds the network detection task needs to execute;

intervalMinute: defines the execution interval (minutes) between rounds of the network detection task;

startAfterMinute: defines the delay (minutes) before the network detection task starts executing;

timeoutMinute: defines the longest execution time (minutes) of the network detection task; if the task times out, the network detection result is judged to be that the network has a fault.

spec.request: specifies the performance requirement indexes of the network detection task, wherein:

durationInSecond: defines the longest duration (seconds) of one round of the network detection task;

perRequestTimeoutInMs: defines the request timeout threshold, that is, the longest duration (milliseconds) of a single HTTP access request;

qps: defines the minimum number of access requests per second to which the second network detection agent component responds.

spec.target.targetAgent: specifies the network communication modes used to probe the second network detection agent component, wherein:

testEndpoint: indicates whether an HTTP access request needs to be sent to the second network detection agent component in the cluster in Endpoint mode;

testIPv4: indicates whether the IPv4 address needs to be probed;

testIPv6: indicates whether the IPv6 address needs to be probed;

testClusterIp: indicates whether an HTTP access request needs to be sent to a second network detection agent component in the cluster that provides service in ClusterIP mode;

testIngress: indicates whether an HTTP access request needs to be sent to a second network detection agent component in the cluster that provides service in Ingress mode;

testNodePort: indicates whether an HTTP access request needs to be sent to a second network detection agent component in the cluster that provides service in NodePort mode;

testLoadBalancer: indicates whether an HTTP access request needs to be sent to a second network detection agent component in the cluster that provides service in LoadBalancer mode.

spec.success: defines the conditions for judging the performance of the cluster network, wherein:

meanAccessDelayInMs: defines a reasonable threshold for the average response time; if the actual average response time of the HTTP access requests exceeds the threshold, the network detection result is judged to be that the network has a fault;

successRate: defines the communication success rate threshold of the HTTP access requests; if the communication success rate of the HTTP access requests is lower than the threshold, the network detection result is judged to be that the network has a fault.

spec.status: records the execution state of the network detection task, wherein:

doneRound: indicates the number of rounds the network detection task has already completed;

expectedRound: indicates the number of rounds the network detection task is expected to execute;

finish: indicates whether the network detection task has finished;

lastRoundStatus: indicates the execution state of the previous round of the network detection task;

history: an array that records the historical execution results of the network detection task.
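Since the resource file listing itself is not reproduced in this text, the fields described above can be pictured with the following minimal sketch, which writes a NetHttp resource as a Python dictionary and checks that every described field is present. The API group and version, the resource name, all numeric values, and the camelCase spellings of the field names are assumptions made for illustration; only the field meanings come from the description.

```python
# Hypothetical NetHttp manifest as a Python dict.
# apiVersion, metadata.name and every value below are illustrative assumptions.
net_http = {
    "apiVersion": "example.io/v1",           # assumed; the text does not state the API group
    "kind": "NetHttp",
    "metadata": {"name": "nethttp-sample"},  # assumed name
    "spec": {
        "schedule": {
            "roundNumber": 3,        # rounds to execute
            "intervalMinute": 5,     # interval between rounds
            "startAfterMinute": 2,   # delay before the first round
            "timeoutMinute": 10,     # longest execution time of the task
        },
        "request": {
            "durationInSecond": 30,         # longest duration of one round
            "perRequestTimeoutInMS": 1000,  # timeout of a single HTTP request
            "qps": 10,
        },
        "target": {
            "targetAgent": {
                "testEndpoint": True,
                "testIPv4": True,
                "testIPv6": False,
                "testClusterIp": True,
                "testIngress": False,
                "testNodePort": True,
                "testLoadBalancer": False,
            }
        },
        "success": {"meanAccessDelayInMs": 500, "successRate": 1.0},
    },
}

# Fields the description names, grouped by their path under spec.
EXPECTED = {
    ("schedule",): ["roundNumber", "intervalMinute", "startAfterMinute", "timeoutMinute"],
    ("request",): ["durationInSecond", "perRequestTimeoutInMS", "qps"],
    ("target", "targetAgent"): [
        "testEndpoint", "testIPv4", "testIPv6", "testClusterIp",
        "testIngress", "testNodePort", "testLoadBalancer",
    ],
    ("success",): ["meanAccessDelayInMs", "successRate"],
}


def missing_fields(manifest: dict) -> list:
    """Return the dotted paths of described fields that the manifest lacks."""
    missing = []
    for path, fields in EXPECTED.items():
        node = manifest.get("spec", {})
        for key in path:
            node = node.get(key, {})
        for field in fields:
            if field not in node:
                missing.append(".".join(("spec",) + path + (field,)))
    return missing
```

A check of this kind would normally be done by the CRD schema on the API server; the helper here only makes the field layout explicit.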

When the NetHttp resource file is deployed to the Kubernetes cluster and the network detection controller (Controller) observes through the API-Server component that the NetHttp resource has been created, it acquires the execution parameters of the network detection task from the content recorded in the spec.

Likewise, once all network detection agent components in the cluster observe through the API-Server component that the NetHttp resource has been created, they start executing the network detection task at the designated time point (after 2 minutes). Every network detection agent component sends HTTP access requests to the other network detection agent components by way of Pod IP, ClusterIP, NodePort, and Ingress. The requested APIs are provided by the network detection agent components specifically for network detection, so that all-round network detection based on the HTTP protocol is realized.
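As a rough illustration of how an agent might turn the switches under the target section into a list of probes, the sketch below enumerates the enabled access modes and IP families. The function name, the labels, and the choice to combine every enabled mode with every enabled IP family are assumptions, not details given in the description.

```python
# Maps each mode switch to a human-readable label (labels are illustrative).
MODE_FLAGS = {
    "testEndpoint": "Pod IP (Endpoint)",
    "testClusterIp": "ClusterIP",
    "testNodePort": "NodePort",
    "testIngress": "Ingress",
    "testLoadBalancer": "LoadBalancer",
}

FAMILY_FLAGS = {"testIPv4": "IPv4", "testIPv6": "IPv6"}


def planned_probes(target_agent: dict) -> list:
    """List (mode, ip_family) pairs for every switch set to True.

    Assumption: each enabled mode is probed once per enabled IP family.
    """
    families = [name for flag, name in FAMILY_FLAGS.items() if target_agent.get(flag)]
    modes = [label for flag, label in MODE_FLAGS.items() if target_agent.get(flag)]
    return [(mode, family) for mode in modes for family in families]
```

For example, enabling only testEndpoint, testClusterIp, and testIPv4 yields two probes, one per access mode, both over IPv4.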

It should be appreciated that a cluster contains various types of network components, such as CNI components and DNS components, and that a fault in any of them, or their failure to work properly, can cause network failures. Take the DNS component as an example: an application's access to the DNS component for domain name resolution is critical to the proper operation of the application, but the following problems may arise in that process: the DNS component fails or is misconfigured, so that it cannot respond to domain name resolution requests; the DNS component has insufficient performance or improper resource configuration, so that it becomes unavailable or shows high response latency when handling a large number of domain name resolution requests; the CNI component fails to assign an IP address to the DNS component, so that the DNS component cannot work properly. In the related art, functional and performance testing of DNS components relies on traditional DNS tools such as nslookup, dig, and DNSperf. These tools, however, run single-machine tests: they cannot simulate realistic multi-user, high-concurrency scenarios, and they cannot test the security and reliability of the DNS component.

To this end, in the methods provided in some embodiments of the present application, the target component is a domain name resolution component (DNS component); the first network detection agent component sends a detection request to the target component based on a network detection strategy recorded by a user-defined network detection resource file, and specifically comprises the following steps: the first network detection agent component sends a domain name resolution request to the domain name resolution component based on a network detection strategy recorded by the user-defined network detection resource file; the domain name in the domain name resolution request is the domain name of the first network detection agent component.

It should be understood that the domain name resolution component may be a DNS component for the domain name resolution service of the whole platform provided for the container cloud platform, or may be a DNS component customized by a user and deployed on the container cloud platform only for a part of applications, which is not limited in this embodiment.

In one embodiment, the custom network detection resource may be a custom domain name resolution network detection resource, such as a NetDNS resource. The resource is used for defining a network detection strategy corresponding to the domain name resolution network detection task. Accordingly, the network detection controller may be specifically a domain name resolution network detection controller, which is configured to manage and control the status of the custom domain name resolution network detection resource.

To enable network detection of DNS components, in this example, the network detection policy may include: the identification of the DNS component; the access address and port of the DNS component, namely the destination address and port to which the domain name resolution request is sent; and the IP address protocol (IPv4 or IPv6) and the communication protocol (TCP or UDP) used for the domain name resolution request.

As one example, the contents of the NetDNS resource file can be as follows:

The meaning of each field is as follows:

apiVersion: the API version of the NetDNS resource object.

kind: the type of the NetDNS resource object.

metadata: a metadata object containing basic information about the NetDNS resource object, such as its name and labels, where metadata.name is the name of the NetDNS resource object.

spec: defines the execution requirements of the domain name resolution network detection task, namely the network detection policy.

spec.schedule: defines how the network detection task is executed, wherein:

startAfterMinute: defines how long (in minutes) execution of the domain name resolution network detection task is delayed;

roundNumber: defines the number of rounds the domain name resolution network detection task needs to execute;

intervalMinute: defines the execution interval between rounds of the domain name resolution network detection task;

timeoutMinute: defines the longest execution time of the domain name resolution network detection task; if the task exceeds this time, the domain name resolution network detection result is judged to be that the network has a fault.

spec.request: specifies the performance requirement indexes of the domain name resolution network detection task, wherein:

testIPv4: indicates whether the connectivity of the IPv4 network needs to be tested;

testIPv6: indicates whether the connectivity of the IPv6 network needs to be tested;

durationInSecond: defines the duration (in seconds) of one round of the domain name resolution network detection task;

qps: defines the minimum number of access requests per second that the DNS component is expected to respond to;

perRequestTimeoutInMS: defines the maximum duration of a single domain name resolution request.

spec.protocol: defines the protocol used by the domain name resolution request, which may be TCP or UDP.

spec.success: defines the decision conditions for domain name resolution network connectivity and domain name resolution component performance, wherein:

successRate: defines the threshold of the domain name resolution request success rate; if the success rate of the domain name resolution requests is greater than the threshold, the domain name resolution network detection result is judged to be that the network is in a healthy state;

meanAccessDelayInMs: defines a reasonable threshold for the average response time; if the actual average response time of the domain name resolution requests exceeds the threshold, the domain name resolution network detection result is judged to be that the network has a performance problem.

In this embodiment, based on the network detection policy, the first network detection proxy component sends a domain name resolution request over TCP or UDP to the access address (IPv4 or IPv6) and port of the domain name resolution component, using its own domain name as the domain name to be resolved, so that whether the domain name resolution component is in a healthy state can be determined from the resolution feedback.

Fig. 7 illustrates one example of network detection of the domain name resolution component. As shown in fig. 7, when network detection is performed on the domain name resolution component, the cluster may include a custom domain name resolution network detection resource file (NetDNS resource file), a domain name resolution network detection controller deployed as a Deployment, and network detection proxy components deployed on all nodes as a DaemonSet.

The network detection proxy components (first network detection proxy components) deployed on all nodes (or on designated nodes) watch, through the API-Server component, for change events of NetDNS resources in the Kubernetes cluster. When they observe that a NetDNS resource has been added or an existing NetDNS resource has been modified, that is, a new NetDNS resource file has been created or the content of an existing one has been changed, they execute the domain name resolution network detection task according to the requirements defined in the NetDNS resource file. That is, they send domain name resolution requests to the domain name resolution component over at least one of TCP and UDP, using the domain name of the first network detection proxy component as the domain name to be resolved, so as to detect whether the domain name resolution component network is in a healthy state.
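The request that the agent sends can be pictured with a minimal sketch that builds a DNS query message by hand, following the standard DNS wire format (12-byte header, then the question section). The function names, the default query ID, and the sample domain name used in the usage note are hypothetical; the description does not specify how the agent constructs its queries. Over TCP, a DNS message is preceded by a 2-byte length field, which `frame_for_tcp` adds.

```python
import struct


def build_dns_query(domain: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query for `domain` (qtype 1 = A record, 28 = AAAA)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix the message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def rcode_of(response: bytes) -> int:
    """Extract the 4-bit response code from the flags word of a DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F
```

With UDP the bytes returned by `build_dns_query` would be sent as one datagram to the access address and port of the DNS component; with TCP the framed form would be written to the connection. An agent would pass its own domain name, for instance a hypothetical `agent.example.svc`.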

Step S102, the network detection controller generates a network detection result based on the response corresponding to the detection request.

In practice, the network detection task includes at least one round of network detection. In each round, the first network detection agent component actively sends detection requests to the target component at a certain rate based on the network detection policy recorded in the custom network detection resource file, and determines in real time whether the response to the current detection request meets the preset performance and function requirements, according to the performance indexes of the detection request and the status code of successful communication defined in the network detection policy.

The sending rate of the detection requests may be set as needed. For example, in the early stage of network detection, when the network connectivity is still unknown, detection requests may be sent at a lower rate; once functional network faults have been ruled out, detection requests may be sent at a higher rate to determine the carrying capacity and performance of the network.
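The two-stage pacing described above can be sketched as a send schedule: a list of time offsets at which requests are issued, first at a low rate and then at a higher one. The function names and the idea of precomputing offsets are assumptions for illustration; a real agent would more likely drive a timer or token bucket.

```python
def send_offsets(qps: int, duration_s: int) -> list:
    """Evenly spaced send times (seconds from start) for `qps` over `duration_s`."""
    if qps <= 0 or duration_s <= 0:
        return []
    return [i / qps for i in range(qps * duration_s)]


def two_phase_plan(low_qps: int, high_qps: int, phase_s: int) -> list:
    """Low-rate phase to confirm connectivity, then a high-rate load phase.

    Assumption: both phases last `phase_s` seconds and run back to back.
    """
    warm_up = send_offsets(low_qps, phase_s)
    load = [phase_s + t for t in send_offsets(high_qps, phase_s)]
    return warm_up + load
```

In this sketch the load phase would simply be skipped by the caller if the warm-up phase already shows a connectivity fault.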

It should be understood that each round of network detection may include sending one or more detection requests, and after each round of network detection, each first network detection agent component records responses corresponding to all detection requests of the round, and the network detection controller generates network detection results according to the records of all network detection agent components.

In some embodiments, the network detection policy further includes a decision criterion; the network detection controller generates a network detection result based on the response corresponding to the detection request, and then further comprises: analyzing the network detection result to obtain a network quality index value; and comparing the network quality index value with a judging standard to determine whether the container cloud platform is in a health state.

Specifically, the network quality indicator value may include a communication success rate and an average response time.

Analyzing the network detection result to obtain a network quality index value can be realized as follows. The response status code returned for each detection request in the network detection result is compared with the status code of successful communication; if the two satisfy the preset condition, the communication is judged successful, otherwise it is judged failed, and these outcomes are aggregated into the communication success rate of the round of network detection. The response times of all detection requests in the network detection result are averaged to obtain the average response time. The communication success rate and the average response time together form the network quality index value of that round of network detection.

For example, when the detection request is an HTTP access request, the communication success rate of the HTTP access requests may be calculated from the status codes of the HTTP responses, and the average response time of the HTTP access requests may be calculated from the average time taken by the responses.
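The aggregation just described can be sketched as follows, under the assumption that each probe is recorded as a status code (or None when no response arrived) plus an elapsed time in milliseconds. The record type, the function name, and the use of a set of accepted status codes as the "preset condition" are illustrative choices, not details from the description.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional, Tuple


@dataclass
class ProbeRecord:
    status_code: Optional[int]  # None when no response was received
    elapsed_ms: float           # time spent on the request


def summarize_round(records: Iterable[ProbeRecord],
                    success_codes: Iterable[int]) -> Tuple[float, float]:
    """Return (communication success rate, average response time in ms)."""
    records = list(records)
    if not records:
        return 0.0, 0.0
    accepted = set(success_codes)
    successes = sum(1 for r in records if r.status_code in accepted)
    # The average is taken over all requests of the round, as in the text.
    return successes / len(records), mean(r.elapsed_ms for r in records)
```

The returned pair corresponds to the two network quality index values, which can then be compared with the decision criteria.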

The status code of the HTTP protocol uses three digits to represent different response outcomes and falls into five classes: 1xx, 2xx, 3xx, 4xx, and 5xx. A 1xx status code indicates a provisional response; the client should be ready to receive one or more 1xx responses before receiving the regular response. A 2xx status code indicates that the server successfully accepted the client request. A 3xx status code indicates that the client browser must take further action to fulfill the request; for example, the browser may have to request a different page on the server, or repeat the request through a proxy server. A 4xx status code indicates that an error occurred and that the client appears to be at fault; for example, the client requested a page that does not exist, or did not provide valid authentication information. A 5xx status code indicates that the server could not complete the request because it encountered an error. It can be seen that when the status code is in the 2xx or 3xx class, the service is in a healthy state, so the communication success rate of the HTTP access requests can be calculated by counting the classes of the status codes of all HTTP responses.

The average response time refers to the average time taken by multiple HTTP access request responses, where the response time of one HTTP access request may include: the time taken to resolve the domain name and obtain the server's IP address, the time taken to establish a connection with the target component, the time taken to transmit the HTTP access request to the target component, the time spent waiting for the target component to process the request and return a response, and so on. In this way, the connectivity and quality of the network are determined from the communication success rate and the average response time of the HTTP access requests.
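The status-code rule above amounts to treating 2xx and 3xx responses as successful. A minimal sketch, with hypothetical function names:

```python
def is_successful_http_status(code: int) -> bool:
    """2xx and 3xx responses count as successful communication."""
    return 200 <= code <= 399


def http_success_rate(status_codes: list) -> float:
    """Share of responses whose status code is in the 2xx or 3xx class."""
    if not status_codes:
        return 0.0
    ok = sum(1 for code in status_codes if is_successful_http_status(code))
    return ok / len(status_codes)
```

Requests that never received a response carry no status code; how they enter the rate is left to the caller in this sketch.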

For another example, when the target component is a domain name resolution component, the communication success rate of domain name resolution can be calculated from the DNS response status codes. The DNS response status codes considered here are of six types, 0, 1, 2, 3, 4, and 5, which respectively indicate that the query completed successfully (NOERROR), the query was malformed (FORMERR), the server failed to process the query (SERVFAIL), the queried domain name does not exist (NXDOMAIN), the function is not implemented (NOTIMP), and the server refused to answer the query (REFUSED). It can be seen that a response status code of 0 means the domain name resolution request succeeded. The request success rate of domain name resolution can therefore be calculated by counting the response status codes of all domain name resolution requests, and when that rate is higher than a reasonable threshold, the domain name resolution network is judged to be in a healthy state.
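The mapping from DNS response codes to outcomes, and the resulting success rate, can be sketched as below. The function and table names are hypothetical; the code values and their names follow the standard DNS response codes 0 to 5.

```python
RCODE_NAMES = {
    0: "NOERROR",   # query completed successfully
    1: "FORMERR",   # query was malformed
    2: "SERVFAIL",  # server failed to process the query
    3: "NXDOMAIN",  # queried domain name does not exist
    4: "NOTIMP",    # function not implemented
    5: "REFUSED",   # server refused to answer
}


def dns_success_rate(rcodes: list) -> float:
    """Share of domain name resolution responses whose code is 0 (NOERROR)."""
    if not rcodes:
        return 0.0
    return sum(1 for rcode in rcodes if rcode == 0) / len(rcodes)
```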

In this embodiment, the decision criteria are used to define decision conditions of the cluster network functions or performances, for example, may include a reasonable threshold of average response time, a communication success rate threshold, a status code of successful communication, and so on.

The communication success rate and the average response time generated after the network detection task is executed are then compared with the communication success rate threshold and the reasonable average-response-time threshold in the decision criteria, so as to determine whether the container cloud platform is in a healthy state.

It should be noted that, after the first network detection agent component sends a detection request to the target component, it can receive a response only if the network between it and the target component is connected; only then does it go on to judge, from the status code in the response, whether the application communication or domain name resolution succeeded. If the first network detection proxy component receives no response within a preset time period (for example, 20 seconds), the network between the first network detection proxy component and the target component is not connected, and it can be judged that the network of the node where the first network detection proxy component is located has a fault.
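The comparison with the decision criteria can be sketched as a single predicate. The function name is hypothetical, the dictionary keys mirror the successRate and meanAccessDelayInMs fields described earlier, and treating values exactly equal to a threshold as healthy is an assumption.

```python
def is_healthy(success_rate: float, mean_delay_ms: float, success_spec: dict) -> bool:
    """Healthy when the success rate is not below its threshold and the
    average response time does not exceed its threshold."""
    return (
        success_rate >= success_spec["successRate"]
        and mean_delay_ms <= success_spec["meanAccessDelayInMs"]
    )
```

For instance, with thresholds of 0.95 and 500 ms, a round with a success rate of 0.99 and an average response time of 120 ms is judged healthy, while a round averaging 800 ms is not.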

In addition, if the network detection task probes the domain name resolution network, the domain name used in the domain name resolution request is the domain name of the first network detection proxy component, so a response with response code 0 can be checked further. If the IP address in the response is the IP address of the first network detection proxy component itself, the domain name resolution component is working entirely normally. If the IP address in the response is not the IP address of the first network detection proxy component, the domain name resolution component is abnormal and may have been compromised by a third party, and further investigation is required. In this way, whether the first network detection agent component receives a response within the preset time period is used to judge the connectivity of the domain name resolution network; the response status code is used to judge whether the function of the domain name resolution component is normal; the IP address in the response is used to judge whether the domain name resolution component is secure; and the average response time is used to judge whether the performance of the domain name resolution network or the domain name resolution component has a problem. Comprehensive detection of the connectivity, function, and performance of the domain name resolution network and the DNS component is thereby realized.
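The layered judgment for a single domain name resolution probe can be sketched as follows. The function name and the verdict strings are hypothetical, and representing a timed-out probe by `rcode=None` is an assumption. Addresses are compared through the standard `ipaddress` module so that different textual forms of the same IPv6 address still match.

```python
import ipaddress
from typing import Optional, Sequence


def diagnose_dns_probe(rcode: Optional[int],
                       answer_ips: Sequence[str],
                       own_ip: str) -> str:
    """Classify one probe: connectivity, then function, then answer check."""
    if rcode is None:
        # No response within the preset time period.
        return "connectivity-fault"
    if rcode != 0:
        # The DNS component answered, but not with a successful code.
        return "function-fault"
    own = ipaddress.ip_address(own_ip)
    answers = {ipaddress.ip_address(ip) for ip in answer_ips}
    if own not in answers:
        # The resolved address is not the agent's own: needs investigation.
        return "suspicious-answer"
    return "healthy"
```

Performance is judged separately from the average response time over the whole round, so it is not part of this per-probe check.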

In summary, in the embodiment of the present application, a network detection controller is deployed on the container cloud platform, and a network detection proxy component is correspondingly deployed in at least one node of the container cloud platform. The first network detection agent component sends a detection request to the target component based on the network detection policy recorded in the custom network detection resource file, where the target component is any component in the container cloud platform, the network detection policy is determined by the content recorded in the custom network detection resource file and includes at least the request parameters of the detection request, and the request parameters specify how the detection request is sent. The network detection controller generates a network detection result based on the response corresponding to the detection request. In this way, the custom resource definition mechanism provided by the container cloud platform is used to introduce a new custom resource type into the platform in the form of a CRD resource, and the targets, timing, and detection scope of a network detection task are defined through the custom network detection resource file. Operation and maintenance personnel can actively initiate a network detection task simply by creating a custom network detection resource file or updating its content. The network detection proxy components on all nodes and the network detection controller then automatically execute the task according to the network detection policy recorded in the file and automatically generate the network detection result once detection finishes, thereby realizing active, automatic, and full-coverage distributed network detection of the container cloud platform.

The embodiment of the application provides a periodic, automatic network detection method suited to cloud-native scenarios. The relevant parameters of a network detection task can be configured simply by modifying specific fields of the custom network detection resource, and the method supports connectivity detection across all nodes, all network paths, and all network protocols, achieving all-round coverage of network detection.

Exemplary System

The embodiment of the application provides a network detection system of a container cloud platform. A network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform. As shown in fig. 8, the system comprises a sending unit 801 and a response unit 802, wherein:

The sending unit 801 is configured to cause the first network detection proxy component to send a detection request to the target component based on the network detection policy recorded in the custom network detection resource file.

The target component is any component in the container cloud platform; the network detection policy is determined by the content recorded in the custom network detection resource file and includes at least the request parameters of the detection request, which specify how the detection request is sent.

The response unit 802 is configured to cause the network detection controller to generate a network detection result based on the response corresponding to the detection request.

The network detection system of the container cloud platform provided by the embodiment of the application can implement the steps and flow of the network detection method of the container cloud platform provided by any of the foregoing embodiments and achieves the same technical effects, which are not described again here.

Exemplary Apparatus

Fig. 9 is a schematic structural diagram of an electronic device provided according to some embodiments of the present application. A management system of a container cloud platform runs on the electronic device and is used for managing the container cloud platform; a network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform. As shown in fig. 9, the electronic device comprises:

one or more processors 901;

a computer-readable storage medium, which may be configured to store one or more programs 902; when executing the one or more programs 902, the one or more processors 901 implement the following steps: the first network detection agent component sends a detection request to the target component based on the network detection policy recorded in the custom network detection resource file, where the target component is any component in the container cloud platform, the network detection policy is determined by the content recorded in the custom network detection resource file and includes at least the request parameters of the detection request, and the request parameters specify how the detection request is sent; the network detection controller generates a network detection result based on the response corresponding to the detection request.

Fig. 10 shows the hardware structure of an electronic device provided according to some embodiments of the present application. As shown in fig. 10, the hardware structure of the electronic device may include: a processor 1001, a communication interface 1002, a computer-readable storage medium (also referred to as memory) 1003, and a communication bus 1004.

Wherein the processor 1001, the communication interface 1002, and the computer-readable storage medium 1003 communicate with each other via a communication bus 1004.

A management system of a container cloud platform runs on the electronic device and is used for managing the container cloud platform; a network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform.

The computer-readable storage medium 1003 may be configured to store one or more programs.

Alternatively, the communication interface 1002 may be an interface of a communication module, such as an interface of a GSM module.

The processor 1001 may be specifically configured so that: the first network detection agent component sends a detection request to the target component based on the network detection policy recorded in the custom network detection resource file, where the target component is any component in the container cloud platform, the network detection policy is determined by the content recorded in the custom network detection resource file and includes at least the request parameters of the detection request, and the request parameters specify how the detection request is sent; the network detection controller generates a network detection result based on the response corresponding to the detection request.

The processor 1001 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. It may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:

(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communications. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) Servers: a server comprises a processor, a hard disk, a memory, a system bus, and the like. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements in terms of processing capacity, stability, reliability, security, scalability, manageability, and the like.

(5) Other electronic devices with data interaction function.

It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present application may be split into more components/steps, and two or more components/steps or part of operations of the components/steps may be combined into new components/steps, so as to achieve the purposes of the embodiments of the present application.

The above-described methods according to embodiments of the present application may be implemented in hardware or firmware, or as software or computer code storable in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the network detection methods of the container cloud platform described herein. Furthermore, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing those methods.

Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.

It should be noted that the embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The above-described apparatus and system embodiments are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.
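As one non-limiting illustration of how such modules might be realized in software, the following Python sketch models a network detection policy (start time, number of execution rounds, communication modes, decision criterion) and derives network quality index values from collected responses. All class names, field names, and the two thresholds are hypothetical and chosen only for illustration; in particular, the use of a success rate and a mean latency as the quality index values is an assumption, not a statement of the claimed method.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Dict, List, Tuple


class CommunicationMode(Enum):
    # The four communication modes named in claim 4; string values are illustrative.
    POD_IP = "pod-ip"
    NODE_PORT = "node-port"
    LOAD_BALANCER = "load-balancer"
    INGRESS = "ingress"


@dataclass(frozen=True)
class DetectionPolicy:
    # Hypothetical field names; the specification only names the concepts.
    start_time: str                       # when the task starts, e.g. an ISO 8601 string
    rounds: int                           # number of execution rounds
    modes: Tuple[CommunicationMode, ...]  # network detection paths to exercise
    min_success_rate: float               # assumed decision criterion
    max_mean_latency_ms: float            # assumed decision criterion


@dataclass(frozen=True)
class ProbeResponse:
    # One response to an HTTP detection request, as reported by an agent.
    mode: CommunicationMode
    status_code: int
    latency_ms: float


def evaluate(policy: DetectionPolicy, responses: List[ProbeResponse]) -> Dict[str, object]:
    """Compute quality index values and compare them with the decision criterion."""
    if not responses:
        # No response at all is treated as a failed detection in this sketch.
        return {"success_rate": 0.0, "mean_latency_ms": None, "passed": False}
    succeeded = [r for r in responses if 200 <= r.status_code < 300]
    success_rate = len(succeeded) / len(responses)
    mean_latency_ms = mean(r.latency_ms for r in responses)
    passed = (
        success_rate >= policy.min_success_rate
        and mean_latency_ms <= policy.max_mean_latency_ms
    )
    return {
        "success_rate": success_rate,
        "mean_latency_ms": mean_latency_ms,
        "passed": passed,
    }
```

In this sketch the controller side would call `evaluate` once the agents have reported their responses for a round. The policy is kept immutable so that every round of the same task is judged against the same criterion.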

The foregoing describes only preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and variations to it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims

1. A network detection method for a container cloud platform, wherein a network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform, the method comprising:

the network detection controller generates a network detection result based on the response corresponding to the detection request;

wherein the first network detection agent component is deployed in a first node of the container cloud platform, the target component comprises a second network detection agent component, and the second network detection agent component is deployed in a second node of the container cloud platform; the request parameters comprise at least one communication mode, wherein the communication mode is used for defining a network detection path through which the detection request passes, and the detection request is an HTTP access request;

2. The network detection method of a container cloud platform of claim 1, wherein the network detection policy further comprises a start time and a number of execution rounds;

3. The network detection method of a container cloud platform of claim 1, wherein the network detection policy further comprises a decision criterion;

analyzing the network detection result to obtain a network quality index value;

4. The network detection method of a container cloud platform according to claim 1, wherein the communication mode includes any one of container group IP address communication, node port communication, platform load balancing communication, and platform entry component communication.

5. The network detection method of a container cloud platform according to claim 1, wherein the network detection agent component is deployed on all nodes of the container cloud platform or on a designated node of the container cloud platform.

6. A network detection system of a container cloud platform, wherein a network detection controller is deployed on the container cloud platform, and a network detection agent component is correspondingly deployed in at least one node of the container cloud platform, the system comprising:

a response unit configured to generate, by the network detection controller, a network detection result based on the response corresponding to the detection request;

the sending unit is specifically configured to send the HTTP access request to a preset API interface of at least one second network detection agent component through at least one network detection path by using the first network detection agent component.

7. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the network detection method of the container cloud platform according to any of claims 1 to 5.

8. An electronic device, comprising: a memory, a processor, and a program stored in the memory and executable on the processor, the processor implementing the network detection method of the container cloud platform according to any one of claims 1 to 5 when the program is executed.