CN110311817B

CN110311817B - Container log processing system for Kubernetes cluster

Info

Publication number: CN110311817B
Application number: CN201910578033.7A
Authority: CN
Inventors: 白伟
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2021-09-28
Anticipated expiration: 2039-06-28
Also published as: CN110311817A

Abstract

The invention relates to the technical field of containers and discloses a container log processing system for a Kubernetes cluster, which solves the problem of how to collect, search and archive container logs randomly distributed across the Kubernetes cluster. Services are deployed dynamically on Kubernetes and marked with specific labels, and the log collection components are configured to collect, search and archive logs according to the labels on the deployed services. A message queue is used as the buffer storage for log collection, and archived logs are stored in compressed form, which addresses the impact on service performance and the high storage cost of log collection. The drop event of Filebeat is configured against the specific label of the deployed service, so that log collection can be switched on and off dynamically. The invention is suitable for data center transmission control.

Description

Container log processing system for Kubernetes cluster

Technical Field

The invention relates to the technical field of containers, in particular to a container log processing system for a Kubernetes cluster.

Background

With the popularization of micro-service architecture, more and more companies build their service platforms from micro-services, manage those micro-services on container platforms represented by Kubernetes, and use Kubernetes for container orchestration operations such as resource scheduling and dynamic scaling. Logs record the running state of containers and are key data for diagnosing and locating problems in daily production, so their importance is increasingly recognized. In a large-scale container cluster in particular, one micro-service usually runs as multiple replicas that are randomly distributed across different host nodes. How to collect the randomly distributed container log data, and how to provide archiving and search functions for distributed container logs, therefore become challenges that must be faced during containerized deployment.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: a container log processing system for a Kubernetes cluster is provided, which solves the problem of how to collect, search and archive container logs randomly distributed in the Kubernetes cluster.

In order to solve the problems, the invention adopts the technical scheme that: the container log processing system for the Kubernetes cluster comprises a log acquisition module, a log collection module, a log consumption module, a log archiving program, a log buffer storage module, a search analysis service module and two specific tags;

two specific tags are attached to applications deployed in a Kubernetes cluster, wherein the value of one tag, label A, is the same as the application name, and the other tag, label B, determines whether the application logs need to be collected and archived;

the log acquisition module is used for acquiring application log data;

the log collection module is used for writing the logs acquired by the log acquisition module into the log buffer storage module and for configuring the condition of the log drop event, wherein the log data written into the log buffer storage module needs to contain the two specific tags;

the log consumption module is used for consuming the log data in the buffer storage module and writing the consumed log data into the search analysis service module, wherein the log data written into the search analysis service module needs to contain the two specific tags;

the log archiving program is used for archiving the log data collected by the search analysis service module; before archiving, the log archiving program determines through label B whether the application service log needs to be archived, and during archiving, label A is used to query the search analysis service module to retrieve the data to be archived.

Furthermore, the parameters of the log acquisition module need to be set before it acquires logs, so that the log of a single container can be rolled over and is prevented from growing too large.

Further, the log acquisition module may be Docker.

Further, the log buffer storage module may be Kafka.

Further, the log collection module may be Filebeat.

Furthermore, Filebeat is deployed in the Kubernetes cluster as a DaemonSet, which ensures that each host node in the Kubernetes cluster runs one pod replica. When a new node joins the Kubernetes cluster or an old node is removed, a pod is automatically scheduled to the new node or the redundant replica is deleted, so that the logs of every node can be collected correctly.

Further, the log consumption module may be a Logstash.

Further, the archive program may also have a retry mechanism to ensure that archive data is not lost.

Further, the search service module may be an Elasticsearch.

The beneficial effects of the invention are as follows: container logs in the Kubernetes cluster are collected and archived asynchronously, that is, the logs collected by the log collection module are first written into the log buffer storage module, which reduces the performance impact on services during cluster log collection and archiving. The invention also provides both real-time search of distributed logs and archiving of historical logs, ensures that log data is not lost, and archives logs in compressed form, which greatly reduces log storage cost.

Drawings

FIG. 1 is a schematic flow chart of an embodiment.

FIG. 2 is a schematic structural diagram of the embodiment.

Detailed Description

In order to solve the problem of how to collect, search and archive the logs of containers randomly distributed in the Kubernetes cluster, two specific labels are marked when an application is deployed in the Kubernetes cluster. The value of one label, A, is the same as the application name, which makes it convenient to archive and search the distributed logs. The value of the other label, B, determines whether the application logs need to be collected and archived. This makes collection easy to control and avoids wasting system resources on applications whose logs do not need to be collected.
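As an illustrative sketch only, the two labels could be attached to a Deployment manifest as shown here. The label keys `matrix-application` and `matrix-logger` and the values `on` and `off` are taken from the embodiment; the helper name `build_deployment`, the image name and the replica count are hypothetical.

```python
def build_deployment(app_name: str, image: str, collect_logs: bool, replicas: int = 2) -> dict:
    """Return a Kubernetes apps/v1 Deployment manifest carrying the two labels."""
    labels = {
        # Label A: its value is the same as the application name.
        "matrix-application": app_name,
        # Label B: decides whether logs are collected and archived.
        "matrix-logger": "on" if collect_logs else "off",
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"matrix-application": app_name}},
            "template": {
                # Pod-level labels are what a node-level collector sees.
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": app_name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    import json
    # "example/app:1.0" is a placeholder image name.
    print(json.dumps(build_deployment("app", "example/app:1.0", True), indent=2))
```

Putting the labels on the pod template, not only on the Deployment itself, matters in this sketch because every replica then carries them regardless of which host node it lands on.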

The application writes its log to standard output, and every node in the Kubernetes cluster writes the application log to the host node's file system through the log acquisition module, Docker. Docker handles the application log through its LogDriver, the module Docker uses to process a container's standard output. Docker supports several different log drivers, and the invention uses Docker's default, JSON File. In a large-scale container cluster, log files grow very quickly and would soon exhaust the disk space of the host node, so the rolling size of the container log needs to be set.

According to the invention, the log collection module, Filebeat, is deployed in the Kubernetes cluster and uploads the application logs that Docker has written on each host node to the centralized log storage. Filebeat is deployed in the Kubernetes cluster as a DaemonSet. The DaemonSet ensures that each host node in the Kubernetes cluster runs one pod replica; when a new node joins the Kubernetes cluster or an old node is removed, a pod is automatically scheduled to the new node or the redundant replica is deleted, so that the logs of every node can be collected correctly. Because application logs are generated in real time and in large volume, having every Filebeat instance upload application logs to the centralized storage at the same time could put heavy I/O pressure on that storage and even cause log data loss. Filebeat is therefore configured to send log data directly to the multiple partitions of a specified Topic in the log buffer storage module, a Kafka cluster, so that the application logs are cached temporarily by taking advantage of Kafka's support for very high concurrent writes. In addition, Filebeat must be configured so that the log data it sends carries the labels marked when the application was deployed, and the log drop event must be enabled; whether a log is sent to Kafka depends on whether log collection was enabled when the application was deployed.
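The one-pod-per-node guarantee of a DaemonSet can be pictured as a small reconciliation step. The following is a simplified sketch of that idea, not the actual Kubernetes controller logic; the function name `reconcile_daemonset` and the node and pod names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reconcile_daemonset(nodes: Set[str], pods_by_node: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare current nodes with running collector pods.

    nodes:        names of the host nodes currently in the cluster
    pods_by_node: mapping of node name -> name of the collector pod on it
    Returns (nodes that need a new pod, pods that should be deleted).
    """
    # A node without a collector pod (for example a newly joined node) needs one.
    to_create = sorted(node for node in nodes if node not in pods_by_node)
    # A pod whose node has left the cluster is redundant.
    to_delete = sorted(pod for node, pod in pods_by_node.items() if node not in nodes)
    return to_create, to_delete


if __name__ == "__main__":
    create, delete = reconcile_daemonset(
        {"node-1", "node-2", "node-3"},
        {"node-1": "filebeat-a", "node-4": "filebeat-d"},
    )
    print("schedule pods on:", create)
    print("delete pods:", delete)
```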

According to the invention, the log consumption module, Logstash, is deployed in the Kubernetes cluster. Logstash consumes the messages that Filebeat sent to the specified Kafka Topic and writes the application logs into the centralized log storage, the search service module Elasticsearch. The template that Logstash uses to write into Elasticsearch is configured to create one Index per day, which prevents a single Index from growing so large that it degrades log search performance, and the log data written into Elasticsearch contains the specific labels marked when the application was deployed. Because all logs in the cluster are written centrally into Elasticsearch, storage usage is very high; a scheduled task is therefore configured so that Elasticsearch keeps only the log data of the last 6 months, which preserves the performance of the log search interface.
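The per-day Index and the 6-month retention can be sketched as two small helpers. This is only an illustration under stated assumptions: the index prefix `matrix`, the `prefix-YYYY.MM.DD` naming pattern, the function names, and the approximation of 6 months as 180 days are not specified by the source.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def daily_index(prefix: str, day: date) -> str:
    """Name of the index that holds the logs of one calendar day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(indices: Iterable[str], prefix: str, today: date,
                    retention_days: int = 180) -> List[str]:
    """Return the daily indices older than the retention window.

    180 days is an assumed approximation of the 6 months in the text.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in indices:
        if not name.startswith(prefix + "-"):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    today = date(2019, 6, 28)
    print(daily_index("matrix", today))
    print(expired_indices(["matrix-2018.12.01", "matrix-2019.06.01"], "matrix", today))
```

A scheduled task could call `expired_indices` once a day and delete whatever it returns; the deletion call itself is left out because it depends on the Elasticsearch client in use.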

The invention deploys a log archiving program in the Kubernetes cluster, which archives the scattered, multi-replica log data collected in Elasticsearch by cluster, space and application level. Archiving runs once per hour; during archiving, label A is used to query the search service module to retrieve the data to be archived, and the archived log data is uploaded to the object storage Ceph. The log archiving program uses the Kubernetes watch mechanism to track resource changes in the cluster and archives only the logs of applications that carry the log collection label. The archiving program compresses the application logs before uploading them to Ceph; compression greatly reduces the storage space required, makes permanent log storage practical, and overcomes the limitation that log search only covers the last 6 months.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following examples.

Examples

With reference to FIG. 1 and FIG. 2, an embodiment provides a method for processing container logs in a Kubernetes cluster, which mainly includes the following steps:

Step one: an application named app is deployed in the system with log collection enabled; the system marks the application app with two specific labels, matrix-application: app and matrix-logger: on.

Step two: the Docker log acquisition parameters are modified so that the log of a single container can be rolled over, and log-opt max-size is set to 100m to prevent a single container log from growing too large.
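One way to apply such a setting for all containers on a node is the Docker daemon configuration file. The sketch below generates the relevant fragment; `log-driver`, `log-opts`, `max-size` and `max-file` are Docker's own option names, while the `max-file` value of 3 and the helper name `docker_daemon_config` are assumptions that do not come from the source.

```python
import json


def docker_daemon_config(max_size: str = "100m", max_file: int = 3) -> str:
    """Build a daemon.json fragment that enables rolling json-file logs."""
    config = {
        "log-driver": "json-file",       # Docker's default log driver
        "log-opts": {
            "max-size": max_size,        # roll the log file once it reaches this size
            "max-file": str(max_file),   # assumed: number of rolled files to keep
        },
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(docker_daemon_config())
```

Docker expects the values under `log-opts` to be strings, which is why `max_file` is converted before serialization.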

Step three: the system deploys Filebeat in the cluster and configures it to write log data asynchronously to the Kafka Topic named matrix. A drop_event is added to the Filebeat configuration template so that, when matrix-logger is off, the event is discarded and no log data is collected. add_kubernetes_metadata is added under processors, and the label matrix-application marked at deployment must be added to include_fields so that the distributed logs can later be searched and archived.
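The effect of the drop condition can be illustrated with a small stand-alone function. This is a sketch of the decision logic only and is not Filebeat configuration or Filebeat code; the event layout (`kubernetes.labels`) and the function name `process_event` are assumptions made for illustration.

```python
from typing import Optional

LOGGER_LABEL = "matrix-logger"      # label B: "on" or "off"
APP_LABEL = "matrix-application"    # label A: application name


def process_event(event: dict) -> Optional[dict]:
    """Return the event to forward to the buffer, or None if it is dropped."""
    labels = event.get("kubernetes", {}).get("labels", {})
    # Drop the event when log collection is switched off for this application.
    # Events without the label are forwarded here; the source only specifies
    # the behaviour for the value "off".
    if labels.get(LOGGER_LABEL) == "off":
        return None
    # Keep the two labels on the forwarded record so that later stages
    # can search and archive by application.
    kept = {key: labels[key] for key in (APP_LABEL, LOGGER_LABEL) if key in labels}
    return {"message": event.get("message", ""), "labels": kept}


if __name__ == "__main__":
    on_event = {"message": "started",
                "kubernetes": {"labels": {APP_LABEL: "app", LOGGER_LABEL: "on"}}}
    off_event = {"message": "noise",
                 "kubernetes": {"labels": {APP_LABEL: "other", LOGGER_LABEL: "off"}}}
    print(process_event(on_event))
    print(process_event(off_event))
```

Because the decision depends only on a label value, changing the label on a deployed application is enough to switch its log collection on or off.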

Step four: Logstash is deployed in the system to consume messages from the Kafka Topic named matrix and write the consumed data into Elasticsearch; the Logstash output plug-in is configured as Elasticsearch, and the index creation format is set to one index per day. The system provides a retrieval interface for searching the logs of the application app in Elasticsearch in real time.

Step five: a log archiving program is deployed in the system to archive the scattered, multi-replica log data collected in Elasticsearch by cluster, space and application level. Using the Kubernetes watch mechanism, the log archiving program detects that matrix-logger is on, that is, that log collection is enabled for the application app. The archiving program retrieves the log data stored in Elasticsearch by the attribute matrix-application: app, writes the logs of the application app to local storage in chronological order, compresses the local logs with gz, uploads them to Ceph once compression finishes, and writes the result to a database so that services can download and inspect historical archived logs. If the archiving process is interrupted at any point, the archiving program retries and archives the data again, so that no log data is lost.
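The compress, upload and retry sequence of the archiving program can be sketched as follows. This is a simplified illustration: it compresses in memory instead of writing to local storage first, and the `upload` callable stands in for the transfer to Ceph. The function name `archive_logs`, its parameters and the retry count are hypothetical.

```python
import gzip
import time
from typing import Callable, Iterable


def archive_logs(lines: Iterable[str], upload: Callable[[bytes], None],
                 max_attempts: int = 3, delay_seconds: float = 0.0) -> int:
    """Compress log lines with gzip and upload them, retrying on failure.

    Returns the number of attempts that were needed. If every attempt
    fails, the last error is re-raised so the caller can reschedule.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Lines are assumed to arrive already sorted in chronological order.
    payload = gzip.compress("\n".join(lines).encode("utf-8"))
    for attempt in range(1, max_attempts + 1):
        try:
            upload(payload)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)  # wait before archiving again


if __name__ == "__main__":
    stored = []
    attempts = archive_logs(["line 1", "line 2"], stored.append)
    print("attempts:", attempts, "compressed bytes:", len(stored[0]))
```

Compressing once and reusing the same payload for each retry keeps the retries cheap; a fuller implementation would also record the outcome in a database, as the step describes.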

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. The container log processing system for the Kubernetes cluster is characterized by comprising a log acquisition module, a log collection module, a log consumption module, a log archiving program, a log buffer storage module, a search service module and two specific tags;

the log acquisition module is used for acquiring application log data;

the log collection module is used for writing the logs acquired by the log acquisition module into the log buffer storage module and for configuring the condition of the log drop event, wherein the log data written into the log buffer storage module needs to contain the two specific tags;

the log consumption module is used for consuming the log data in the buffer storage module and writing the consumed log data into the search service module, wherein the log data written into the search service module needs to comprise the two specific tags;

the log archiving program is used for archiving the log data collected by the search service module; before archiving, the log archiving program determines through label B whether the application service log needs to be archived, and during archiving, label A is used to query the search service module to retrieve the data to be archived.

2. The container log processing system for a Kubernetes cluster of claim 1, wherein parameters of the log acquisition module are set before the log acquisition module acquires logs, to ensure that the log of a single container can be rolled over and is prevented from growing too large.

3. The container log processing system for a Kubernetes cluster of claim 1, wherein the log acquisition module is Docker.

4. The container log processing system for a Kubernetes cluster of claim 1, wherein the log buffer storage module is Kafka.

5. The container log processing system for a Kubernetes cluster of claim 1, wherein the log collection module is Filebeat.

6. The container log processing system for a Kubernetes cluster of claim 5, wherein Filebeat is deployed in the Kubernetes cluster as a DaemonSet to ensure that each host node in the Kubernetes cluster runs one pod replica, and when a new node is added to the Kubernetes cluster or an old node is removed, a pod is automatically scheduled to the new node or the redundant replica is deleted, to ensure that the logs of each node can be collected correctly.

7. The container log processing system for a Kubernetes cluster of claim 1, wherein the log consumption module is Logstash.

8. The container log processing system for a Kubernetes cluster of claim 1, wherein the log archiving program is further provided with a retry mechanism.

9. The container log processing system for a Kubernetes cluster of claim 1, wherein the search service module is Elasticsearch.