NL2030719B1 - Microservice application observability system

Info

Publication number: NL2030719B1
Application number: NL2030719A
Authority: NL
Inventors: Jiang Shuomiao; Lv Lixing; Cheng Xuelin; Fu Enhui
Original assignee: Univ Zhejiang; Shanghai Zhuyun Information Tech Co Ltd
Priority date: 2021-11-24
Filing date: 2022-01-26
Publication date: 2023-06-15
Also published as: CN114143169A

Abstract

The present disclosure discloses a microservice application observability system. The system includes a collector, a data gateway and a data center; the collector is connected to the data center through the data gateway; the collector is deployed in an application system in user environment; the collector is configured to collect various types of data generated by an application to form application data; the data gateway is configured to transmit the application data to the data center; the data center is configured to view the application data; store the application data by category; perform anomaly detection on the application data and raise an alarm; and use a built-in data analysis module to perform corresponding analysis. The present disclosure is easy to deploy and facilitates the realization of observability services.

Description

MICROSERVICE APPLICATION OBSERVABILITY SYSTEM

TECHNICAL FIELD

[01] The present disclosure relates to the field of observability construction, in particular to a microservice application observability system.

BACKGROUND ART

[02] Existing open-source products, such as ELK, Prometheus, Grafana, etc., are mostly purely monitoring products and cannot provide overall observability services.

[03] ELK is an overall monitoring and detection solution developed by Elastic.co based on ElasticSearch, which is composed of ElasticSearch, Logstash, and Kibana.

Problems and shortcomings of ELK: installation and deployment are extremely complicated, the learning cost is high, and the overall operation and maintenance costs are high, since maintaining a cluster is expensive.

[04] Prometheus is a recently popular observability product based on a time series database. It synchronizes data with a scrape model, has formed an independent Exporter system, and is a cloud-native monitoring solution recommended by the Cloud Native Computing Foundation (CNCF). Problems and shortcomings of Prometheus: the overall construction complexity is high and it is not easy to use; the open-source code carries many potential technical risks; and the default data storage mode is stand-alone only, which cannot support long-term storage and querying of massive numbers of indicators.

[05] Grafana is an open-source data visualization tool for the monitoring field. It does not process data itself and is valuable only when integrated with the aforementioned products.

Problems and shortcomings of Grafana: it does not process data and is only a presentation layer; there is no unified data query language, so different data layers require different query languages to build the presentation layer; and, because of its open-source nature, connecting a different data source requires redesigning the corresponding data integration of the presentation layer, which makes building the presentation layer quite cumbersome.

[06] It can be seen that deploying the existing open-source products is relatively cumbersome and complicated.

SUMMARY

[07] Based on this, the embodiment of the present disclosure provides a microservice application observability system that is easy to deploy and easy to implement.

[08] In order to achieve the above objectives, the present disclosure provides the following solutions:

[09] A microservice application observability system includes a collector, a data gateway and a data center; wherein the collector is connected to the data center through the data gateway; the collector is deployed in an application system in user environment; the collector is configured to collect various types of data generated by an application to form application data; the data gateway is configured to transmit the application data to the data center; the data center is configured to view the application data, store the application data by category, perform anomaly detection on the application data and raise an alarm, and use a built-in data analysis module to perform corresponding analysis.

[10] Optionally, the collector specifically includes a collecting module and a transmitting module; both the collecting module and a third-party collector are connected to the transmitting module; the collecting module is configured to collect target data, the target data includes indicator data, log data and file data; the transmitting module is configured to send the application data formed by the target data and data collected by the third-party collector to the data gateway.

[11] Optionally, the transmitting module specifically includes an HTTP module and an IO module; the third-party collector is connected to the IO module through the HTTP module; the collecting module is connected to the IO module; the HTTP module is configured to feed data collected by the third-party collector to the IO module; the IO module is configured to transmit the data sent by the collecting module and the HTTP module to the data gateway.

[12] Optionally, the collector further includes a configuration loading module, a service management module, a tool chain module, a pipeline module, an election module and a document module;

[13] the configuration loading module is configured to load configuration data; the service management module is configured to manage services; the tool chain module is configured to implement document viewing, restarting services, and updating; the pipeline module is configured to process the application data; the election module is configured to ensure that at most one collector performs collection in each time period; the document module is configured to access a document list in the application data by accessing a page.

[14] Optionally, the data center specifically includes a data classification module, an anomaly detection module, a data analysis module, a storage medium, and a visual operation interface;

[15] the data classification module is configured to determine a data type according to a format or a label of the application data; the storage medium is configured to store the application data in different storage media according to the data type; the anomaly detection module is configured to use a decision tree to detect anomalies in the application data and issue alarms according to a configured alarm rule; the data analysis module is configured to analyze the application data; the visual operation interface is configured for users to view the application data and configure the alarm rule.

[16] Optionally, the data center further includes a new indicator generation module and an indicator prediction module;

[17] the new indicator generation module is configured to process the application data by using aggregation methods to generate a target indicator; the aggregation methods comprise counting numbers, counting an average, counting a maximum and counting a minimum value; the indicator prediction module is configured to put the application data at a current moment into a multi-layer perceptron to predict the application data at a future moment.

[18] Optionally, the collector further includes a cache module; and the cache module is configured to cache the application data when the application data fails to be sent.

[19] Optionally, the microservice application observability system further includes a disk; wherein the disk is configured to store the application data when a network abnormality causes the data gateway to fail to send or a flow of the data gateway is greater than a set flow value, and send the application data to the data gateway when the network returns to normal or the flow of the data gateway is less than or equal to the set flow value.

[20] Optionally, the IO module sends data to the data gateway at a set frequency.

[21] Optionally, the storage medium includes InfluxDB and Elasticsearch.

[22] Compared with existing technologies, the beneficial effects of the present disclosure are:

[23] A microservice application observability system is proposed in the embodiment of the present disclosure, which includes a collector, a data gateway and a data center; wherein the collector is connected to the data center through the data gateway; the collector is deployed in an application system in user environment; the collector is configured to collect various types of data generated by an application to form application data; the data gateway is configured to transmit the application data to the data center; the data center is configured to view the application data, store the application data by category, perform anomaly detection on the application data and raise an alarm, and use a built-in data analysis module to perform corresponding analysis. The observability system including the collector, the data gateway and the data center of the present disclosure is easy to install and deploy.

BRIEF DESCRIPTION OF THE DRAWINGS

[24] In order to explain the embodiments of the present disclosure or technical solutions in the prior art more clearly, the drawings needed in the embodiments will be briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure; those of ordinary skill in the art can obtain other drawings based on these drawings without creative labor.

[25] FIG. 1 is a schematic structural diagram of a microservice application observability system provided by an embodiment;

[26] FIG. 2 is a schematic structural diagram of a collector provided by an embodiment;

[27] FIG. 3 is a schematic structural diagram of a data center provided by an embodiment.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[28] The technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all the embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.

[29] In order to make the above objectives, features and advantages of the present disclosure more obvious and understandable, the present disclosure will be further described in detail below with reference to the accompanying drawings and specific implementations.

[30] The microservice application observability system provided in the embodiment is designed to provide observability for each microservice application in systems of the cloud computing and cloud-native era.

[31] Referring to FIG. 1, a microservice application observability system of an embodiment includes a collector, a data gateway and a data center; wherein the collector is connected to the data center through the data gateway; the collector is deployed in an application system in user environment; the collector is configured to collect various types of data generated by an application to form application data; the data gateway is configured to transmit the application data to the data center; the data center is configured to view the application data, store the application data by category, perform anomaly detection on the application data and raise an alarm, and use a built-in data analysis module to perform corresponding analysis.

[32] The collector in the embodiment can be deployed into the system in a variety of ways. The collector can be deployed on a host to collect information such as the host's CPU, memory and ports, or deployed on Kubernetes (K8s) to collect the running states of containers and other data. The collector can also be embedded in application code to obtain the data generated inside the application.

[33] In one embodiment, referring to FIG. 2, the collector is divided into three layers from top to bottom, namely a top layer, a transport layer and a collecting layer.

[34] The top layer includes a program entry module and some public modules.

Specifically, the top layer includes: a configuration loading module, a service management module, a tool chain module, a pipeline module, an election module and a document module. In addition to the main configuration of the collector, each collector is configured separately; the configuration loading module is therefore configured to load the configuration data. The service management module is mainly responsible for service management of the entire collector. As a client program, the collector provides many peripheral functions besides collecting data, such as viewing documents, restarting services and updating; these functions are implemented in the tool chain module. The pipeline module is configured to process the application data. For example, in log processing, a pipeline script cuts the log and converts unstructured log data into structured data; corresponding processing can also be applied to non-log data. When many collectors are deployed, users can give all collectors the same configuration and then distribute it to each collector through automated batch deployment.
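The log-cutting step of the pipeline module can be illustrated with a minimal sketch. The log line layout, the regular expression and the function name `cut_log_line` are assumptions made for illustration only; the embodiment does not specify the syntax of the pipeline script.

```python
import re
from typing import Optional

# Hypothetical log layout: "<date> <time> <LEVEL> <service> <message>".
LOG_PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<service>\S+) "
    r"(?P<message>.*)$"
)


def cut_log_line(line: str) -> Optional[dict]:
    """Cut one unstructured log line into structured fields.

    Returns None when the line does not match the assumed layout.
    """
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


record = cut_log_line("2021-11-24 10:15:02 ERROR order-service payment timed out")
```

In this sketch, lines that do not match are returned as `None` so that the caller can decide whether to drop them or forward them unparsed.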

The purpose of the election module is that, in a cluster, certain data (such as Kubernetes cluster indicators) should be collected by only one collector; otherwise the data is duplicated and the collected party is put under pressure. Therefore, when all collectors in the cluster have the same configuration, the election module ensures that at most one collector performs the collection at any time. The documentation of the collector is installed along with it, and the document module allows users to access a document list in the application data by visiting a page.
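One possible way to obtain the "at most one collector at any time" behavior is a time-limited lease. The following is a sketch under stated assumptions: the embodiment does not disclose the election algorithm, and `LeaseStore`, `try_acquire` and the 30-second lease duration are hypothetical. An in-memory object stands in for a store shared by all collectors; a real shared store would need an atomic compare-and-set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseStore:
    """In-memory stand-in for a store shared by all collectors (assumption)."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, collector_id: str, now: float, ttl: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == collector_id:
            self._lease = Lease(collector_id, now + ttl)
            return True
        return False


def elected_collectors(store: LeaseStore, collector_ids: List[str],
                       now: float, ttl: float = 30.0) -> List[str]:
    """Return the collectors allowed to collect in this time period."""
    return [cid for cid in collector_ids if store.try_acquire(cid, now, ttl)]
```

Because every identically configured collector calls `try_acquire` before collecting, only the current lease holder proceeds, and another collector takes over once the lease expires.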

[35] The transport layer is responsible for the input and output of almost all data. The transport layer includes a transmitting module, and the transmitting module specifically includes an HTTP module and an IO module; the third-party collector is connected to the IO module through the HTTP module; the collecting module is connected to the IO module; the HTTP module is configured to feed data collected by the third-party collector to the IO module; the IO module is configured to transmit the data sent by the collecting module and the HTTP module to the data gateway. Each data collector sends its data to the IO module after each collection is completed. The IO module encapsulates a unified interface for data construction, processing and sending, which makes it convenient to accept the data collected by each collector plug-in. In addition, the IO module sends data to the data gateway via HTTP at a set frequency. The third-party collector can be Telegraf, Prometheus and so on.
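The behavior of the IO module described above, namely a unified entry point plus sending at a set frequency, can be sketched as follows. The class name `IOModule`, the point layout and the 10-second interval are assumptions; the `send` callable stands in for an HTTP request to the data gateway, whose endpoint is not specified in the embodiment.

```python
import json
from typing import Callable, List


class IOModule:
    """Buffers points from collector plug-ins and flushes them in batches."""

    def __init__(self, send: Callable[[bytes], None], interval_s: float = 10.0) -> None:
        self._send = send              # e.g. an HTTP POST to the data gateway
        self._interval_s = interval_s  # the "set frequency", as a period
        self._buffer: List[dict] = []
        self._last_flush = 0.0

    def feed(self, source: str, measurement: str, fields: dict, ts: float) -> None:
        """Unified entry point for built-in and third-party collectors."""
        self._buffer.append(
            {"source": source, "measurement": measurement, "fields": fields, "ts": ts}
        )

    def tick(self, now: float) -> bool:
        """Flush the buffer if the interval has elapsed; return True if sent."""
        if not self._buffer or now - self._last_flush < self._interval_s:
            return False
        self._send(json.dumps(self._buffer).encode("utf-8"))
        # The buffer is cleared only after send() returns without raising.
        self._buffer = []
        self._last_flush = now
        return True
```

A caller would invoke `tick` from a timer loop; passing the clock value in explicitly keeps the sketch deterministic.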

[36] The collecting layer is responsible for collecting various data and sending the data to the transport layer after some processing. The collecting layer includes the collecting module, wherein the collecting module and the third-party collector are connected to the transmitting module; the collecting module is configured to collect target data; the target data includes not only indicator data, log data, and file data, but also application performance data, event report data, and security monitoring data, etc.; the transmitting module is configured to send the application data formed by the target data and the data collected by the third-party collector to the data gateway. Data collection is the first stage of application observation in the embodiment. The data collection function of the embodiment supports multiple collection methods. Users can use the built-in collecting module, or use other third-party collectors. Most of the data collectors in the embodiment implement data collection based on bypass technology, and can complete the task of data collection while affecting the business system as little as possible.

[37] In one embodiment, referring to FIG. 3, the data center specifically includes: a data classification module, an anomaly detection module, a data analysis module, a storage medium, and a visual operation interface. The data classification module is configured to determine a data type according to a format or a label of the application data. The storage medium is configured to store the application data in different storage media according to the data type. For example, the stored application data includes CPU occupancy, memory, disk occupancy and so on. The anomaly detection module is configured to use a decision tree to detect anomalies in the application data and issue alarms according to a configured alarm rule. Specifically, the anomaly detection module builds the decision tree based on historical application data and uses the decision tree to determine whether the current application data is abnormal; if there is an abnormality, a prompt is issued. The data analysis module is configured to analyze the application data. The visual operation interface is configured for users to view the application data and configure the alarm rule. The storage medium includes InfluxDB and Elasticsearch. The visual operation interface includes: a user login interface, an indicator view interface, a log view interface, an application performance monitoring interface, an anomaly detection management interface and an infrastructure management interface.
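The classification-then-storage step can be sketched as follows. This is an illustration only: the embodiment names the two storage media but does not state which data type is stored in which, so the mapping below (indicators to InfluxDB, logs to Elasticsearch), the record layout and the function names are assumptions. The sketch returns the name of the target store and does not call any database client.

```python
from typing import Any, Dict

# Assumed mapping from data type to storage medium.
STORE_BY_TYPE = {"metric": "influxdb", "log": "elasticsearch"}


def classify(record: Dict[str, Any]) -> str:
    """Determine the data type from an explicit label, else from the format."""
    label = record.get("type")
    if label in STORE_BY_TYPE:
        return label
    # Format heuristic (assumption): purely numeric fields indicate an
    # indicator point, a free-text "message" indicates a log entry.
    fields = record.get("fields")
    if isinstance(fields, dict) and fields and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in fields.values()
    ):
        return "metric"
    if isinstance(record.get("message"), str):
        return "log"
    raise ValueError("unrecognized application data format")


def route(record: Dict[str, Any]) -> str:
    """Return the name of the storage medium for this record."""
    return STORE_BY_TYPE[classify(record)]
```

Checking the label first and falling back to the format mirrors the "format or label" wording of the data classification module.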

[38] In one embodiment, the data center further includes: a new indicator generation module and an indicator prediction module; the new indicator generation module is configured to process the application data using aggregation methods to generate a target indicator; the aggregation methods include counting numbers, counting an average, counting a maximum, and counting a minimum value; the indicator prediction module is configured to put the application data at the current moment into a multi-layer perceptron to predict the application data at the future moment.
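The four aggregation methods named above can be sketched as a small lookup table. The method keys (`count`, `avg`, `max`, `min`) and the function name `generate_indicator` are hypothetical; the multi-layer perceptron prediction is not illustrated here.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Aggregation methods listed in the embodiment; the key names are assumed.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "count": len,
    "avg": mean,
    "max": max,
    "min": min,
}


def generate_indicator(values: Sequence[float], method: str) -> float:
    """Aggregate raw indicator values into a new target indicator."""
    if method not in AGGREGATIONS:
        raise KeyError(f"unknown aggregation method: {method}")
    if not values:
        raise ValueError("cannot aggregate an empty series")
    return AGGREGATIONS[method](values)
```

For example, applying `avg` to per-minute CPU samples would yield an average CPU indicator for the sampled window.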

[39] In one embodiment, the collector collects various indicators of the application at regular intervals, and then sends the data to the data gateway periodically and in fixed quantities via HTTP(S). Because the collector may fail to send data for network reasons, the collector further includes a cache module; the cache module is configured to cache the application data when the application data fails to be sent. The cache module can cache up to one thousand data points.
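A bounded cache of this kind can be sketched with a fixed-length queue. The limit of one thousand points comes from the embodiment; the class name `CachingSender`, the use of `OSError` to signal a send failure, and the policy of dropping the oldest points when the cache is full are assumptions.

```python
from collections import deque
from typing import Callable, Deque, List

MAX_CACHED_POINTS = 1000  # upper bound stated in the embodiment


class CachingSender:
    """Sends points to the data gateway and caches them when sending fails."""

    def __init__(self, send: Callable[[List[dict]], None]) -> None:
        self._send = send
        self._cache: Deque[dict] = deque(maxlen=MAX_CACHED_POINTS)

    def submit(self, points: List[dict]) -> bool:
        """Send cached points plus new ones; cache the batch on failure."""
        batch = list(self._cache) + points
        try:
            self._send(batch)
        except OSError:
            # Assumed eviction policy: when the cache is full, the oldest
            # points are discarded by the bounded deque.
            self._cache.clear()
            self._cache.extend(batch)
            return False
        self._cache.clear()
        return True

    @property
    def cached(self) -> int:
        return len(self._cache)
```

On the next successful call the cached points are sent ahead of the new ones, so ordering is preserved for whatever survived in the cache.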

[40] In one embodiment, the microservice application observability system further includes a disk. After receiving the data, the data gateway forwards the data to the data center. If the data fails to be sent to the data center for some reason, or cannot be sent in time because the flow is too large, the data gateway persists the data to the disk and sends it to the data center when the flow subsequently decreases or the network recovers. Therefore, the disk is configured to store the application data when a network abnormality causes the data gateway to fail to send or the data gateway's flow is greater than a set flow value, and send the application data to the data gateway when the network returns to normal or the data gateway's flow is less than or equal to the set flow value.
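The persist-and-replay behavior of the data gateway can be sketched as follows. The class name `SpillingGateway`, the JSON-lines spill file and the way the current flow is passed in as a number are assumptions; the embodiment does not specify the on-disk format or how flow is measured.

```python
import json
import os
from typing import Callable, List


class SpillingGateway:
    """Forwards batches to the data center, spilling to disk when it cannot."""

    def __init__(self, forward: Callable[[list], None], spill_path: str,
                 max_flow: float) -> None:
        self._forward = forward
        self._spill_path = spill_path
        self._max_flow = max_flow  # the "set flow value"

    def _write(self, batches: List[list], mode: str) -> None:
        with open(self._spill_path, mode, encoding="utf-8") as fh:
            for batch in batches:
                fh.write(json.dumps(batch) + "\n")

    def handle(self, batch: list, current_flow: float) -> str:
        """Forward a batch, or persist it on overload or send failure."""
        if current_flow > self._max_flow:
            self._write([batch], "a")
            return "spilled"
        try:
            self._forward(batch)
        except OSError:
            self._write([batch], "a")
            return "spilled"
        return "forwarded"

    def replay(self, current_flow: float) -> int:
        """Resend spilled batches once the flow is within the set value."""
        if current_flow > self._max_flow or not os.path.exists(self._spill_path):
            return 0
        with open(self._spill_path, encoding="utf-8") as fh:
            batches = [json.loads(line) for line in fh if line.strip()]
        sent = 0
        try:
            for batch in batches:
                self._forward(batch)
                sent += 1
        except OSError:
            pass  # network failed again; keep the unsent remainder on disk
        if batches[sent:]:
            self._write(batches[sent:], "w")
        else:
            os.remove(self._spill_path)
        return sent
```

Rewriting only the unsent remainder after a partial replay avoids resending batches that the data center has already received.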

[41] In the embodiment, a message queuing technology is adopted on the transmission link from the data collector to the data gateway, and a multi-try mechanism is adopted on the transmission link from the data gateway to the data center, thereby ensuring data consistency.
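The multi-try mechanism on the link from the data gateway to the data center can be sketched as a bounded retry loop. The number of attempts, the exponential delay between attempts and the function name `send_with_retries` are assumptions; the embodiment states only that multiple tries are made.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retries(send: Callable[[], T], attempts: int = 3,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError:
            if attempt == attempts:
                raise  # give up; the caller may persist the data instead
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the delay can be replaced in tests or tied to an event loop.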

[42] In actual application, users adopt the microservice application observability system in the above embodiment, which can realize the following functions:

[43] 1. Collecting data: the embodiment supports multiple collection methods. Users can use the built-in collector of the embodiment (the collecting module) to collect data, or use third-party collectors such as Telegraf, Prometheus and so on.

The built-in collector of the embodiment can collect many important data, such as indicator data, log data, and file data. Each built-in collector sends the data to the IO module after each collection. Users can also use the third-party collector to collect data. The third-party collector can transmit the collected data to the IO module through the HTTP module, and the IO module uniformly transmits the data to the data gateway.

[44] 2. Data observation: users can observe various data of applications through the embodiment. The visual operation interface of the embodiment displays the collected data clearly and allows users to conveniently perform secondary processing on the data. Users can perform the following operations:

[45] Viewing indicators: users view the various collected indicator data through the visual interface and use these data to analyze possible problems in the current application system. Users can view all link data of the application; search, filter and export the link data; view link details; and analyze overall link performance through flame graphs, span lists and other views, so as to clearly track the details of each link's performance.

[46] Generating new indicators: the embodiment supports the generation of new indicators based on currently existing collected data. Users can obtain new indicator data through aggregation methods. Users can aggregate original indicators into new indicators by choosing a suitable aggregation method to make the indicators more practical.

[47] Anomaly detection: the embodiment has multiple built-in anomaly detection libraries and supports multiple detection rules, such as rules for CPU, memory, and ports. Anomalies can be found in real time from the detection data, and an alarm can be triggered to notify users. Users can configure the alarm rules through the visual interface to realize the anomaly detection function. The embodiment also stores all alarm history records as events; related events can be aggregated with one click, searched and analyzed.
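The evaluation of user-configured alarm rules can be sketched as a simple threshold check. The rule layout (`indicator`, `op`, `threshold`), the rule names and the function `evaluate` are hypothetical; this sketch covers only configured threshold rules and does not implement the decision-tree detection of the anomaly detection module.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class AlarmRule:
    name: str
    indicator: str   # e.g. "cpu", "memory"
    op: str          # one of the keys of _OPS
    threshold: float


def evaluate(rules: List[AlarmRule], sample: Dict[str, float]) -> List[str]:
    """Return the names of the rules triggered by one sample.

    Rules whose indicator is missing from the sample are skipped.
    """
    fired = []
    for rule in rules:
        value = sample.get(rule.indicator)
        if value is not None and _OPS[rule.op](value, rule.threshold):
            fired.append(rule.name)
    return fired
```

Each triggered rule name could then be recorded as an alarm event, in line with the alarm history described above.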

[48] The microservice application observability system in the above embodiments can be run simply by copying an installation instruction from the description document to a server. Users can also customize the configuration files provided by the embodiment to quickly obtain the required observability service. By contrast, existing open-source products such as Prometheus are more complicated to install and often require a large number of commands to be run on the server before installation succeeds, which is not user-friendly.

[49] The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same or similar parts between the various embodiments can be referred to each other.

[50] Specific embodiments are used in this specification to illustrate the principles and implementation methods of the present disclosure. The description of the above embodiments is only intended to help understand the method and core idea of the present disclosure; at the same time, those of ordinary skill in the art may, according to the idea of the present disclosure, make changes to the specific implementation methods and the scope of application. In summary, the content of this specification should not be construed as a limitation of the present disclosure.

Claims

1. A microservice application observability system, comprising a collector, a data gateway and a data center; wherein the collector is connected to the data center via the data gateway; the collector is deployed in an application system in user environment; the collector is configured to collect various kinds of data generated by an application to form application data; the data gateway is configured to transmit the application data to the data center; the data center is configured to view the application data, store the application data by category, perform anomaly detection on the application data and raise alarms, and use a built-in data analysis module to perform corresponding analysis.

2. The microservice application observability system according to claim 1, wherein the collector specifically comprises a collecting module and a transmitting module; both the collecting module and a third-party collector are connected to the transmitting module; the collecting module is configured to collect target data, the target data includes indicator data, log data and file data; the transmitting module is configured to transmit the application data formed by the target data and the data collected by the third-party collector to the data gateway.

3. The microservice application observability system of claim 2, wherein the transmitting module comprises an HTTP module and an IO module; the third-party collector is connected to the IO module via the HTTP module; the collecting module is connected to the IO module; the HTTP module is configured to feed data collected by the third-party collector to the IO module; the IO module is configured to transmit the data sent by the collecting module to the data gateway.

4. The microservice application observability system of claim 2, wherein the collector further comprises: a configuration loading module, a service management module, a tool chain module, a pipeline module, an election module, and a document module; wherein the configuration loading module is configured to load configuration data; the service management module is configured to manage services; the tool chain module is configured to implement document viewing, service restart and updating; the pipeline module is configured to process the application data; the election module is configured to ensure that at most one of the collectors performs collection in each time period; the document module is configured to access a document list in the application data by accessing a page.

5. The microservice application observability system of claim 1, wherein the data center specifically comprises: a data classification module, an anomaly detection module, a data analysis module, a storage medium, and a visual operation interface; wherein the data classification module is configured to determine a data type according to a classification or label of the application data; the storage medium is configured to store the application data according to the data type; the anomaly detection module is configured to use a decision tree so as to detect anomalies in the application data and issue alarms according to a configured alarm rule; the data analysis module is configured to analyze the application data; the visual operation interface is configured so that users can view the application data and configure the alarm rule.

6. The microservice application observability system of claim 1, wherein the data center further comprises a new indicator generation module and an indicator prediction module; wherein the new indicator generation module is configured to process the application data using aggregation methods to generate a target indicator; the aggregation methods include counting numbers, counting an average, counting a maximum and counting a minimum value; the indicator prediction module is configured to put the application data at a current time into a multi-layer perceptron so as to predict the application data at a future time.

7. The microservice application observability system of claim 1, wherein the collector further comprises a cache module; wherein the cache module is configured to cache the application data if the application data fails to be sent.

8. The microservice application observability system of claim 1, further comprising a disk; wherein the disk is configured to store the application data if a network abnormality causes the data gateway to fail to send or if a flow of the data gateway exceeds a set flow value, and to send the application data to the data gateway when the network returns to normal or when the flow of the data gateway is less than or equal to the set flow value.

9. The microservice application observability system of claim 3, wherein the IO module sends data to the data gateway at a set frequency.

10. The microservice application observability system of claim 5, wherein the storage medium comprises InfluxDB and Elasticsearch.