CN115168314A - Log data processing method and system - Google Patents

Log data processing method and system

Info

Publication number
CN115168314A
CN115168314A
Authority
CN
China
Prior art keywords
log
server
file
ftp
application server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211094169.9A
Other languages
Chinese (zh)
Inventor
魏鹏飞
桂升
宋春岭
崔培升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Esafenet Science & Technology Co ltd
Original Assignee
Beijing Esafenet Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Esafenet Science & Technology Co ltd filed Critical Beijing Esafenet Science & Technology Co ltd
Priority to CN202211094169.9A priority Critical patent/CN115168314A/en
Publication of CN115168314A publication Critical patent/CN115168314A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1805Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815Journaling file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Abstract

The invention relates to the technical field of computer data processing and discloses a log data processing method and system. The method comprises the following steps: unifying the format of the log data on the application servers to form a log file for each application server; uploading the log file on each application server to an FTP server; importing the log files on the FTP server into an ES server cluster according to configured log transmission parameters, the log transmission parameters controlling the frequency and flow threshold at which log files are imported into the ES server cluster; the ES server cluster creating indexes partitioned by time period according to a preset script; and, in response to external request information asking whether the index for the current time period exists, calling a query interface to check whether the index exists and, if not, creating the index for the current time period. The invention provides a complete set of methods covering log data production, collection, transmission, storage and retrieval.

Description

Log data processing method and system
Technical Field
The invention relates to the technical field of computer data processing, in particular to a method and a system for processing log data.
Background
In the security application of the enterprise information system, a large amount of log information is generated every day, such as a user login log, a user system operation log, a mail sending and receiving log, a software installation log, a user file encryption and decryption operation log, a user behavior recording log and the like. How to effectively collect, process, store, manage, retrieve, and monitor these logs is particularly important for enterprises.
In complex enterprise informatization projects, the information systems of the departments and subsidiaries under a group enterprise are often deployed independently and isolated from one another. The various system logs generated every day by each branch are typically recorded on its own application servers, and the logging formats vary widely. As the number of group branches grows, so do the number of branch application systems, the services those systems provide, and, explosively, the volume of logs each system generates. For the group headquarters, therefore, the problem of how to efficiently collect, store and centrally manage the huge volume and variety of logs produced daily by the branches and subsidiaries must be faced and solved.
In the traditional log collection approach, data reporting components or programs are integrated into every branch application system, and storage tasks are submitted directly to a headquarters relational or non-relational database. Because each branch's applications are complicated and their reporting formats diverse, the headquarters log collection platform has to interface with each branch's application-specific log reporting, so maintenance and development costs are very high. Under highly concurrent, high-volume log reporting, the various reporting parameters and thresholds on each server cannot be controlled efficiently, the stability and integrity of the group headquarters data store cannot be effectively guaranteed, and log data cannot be traced once lost.
Furthermore, storing logs in a conventional relational database faces the following problems: as the number of logged rows grows, capacity expansion becomes difficult once disk utilization is exhausted; when a single table holds too many log records, query efficiency drops and keyword searches over the logs wait too long or fail to respond at all; when the concurrency of log insertion requests is too high, database connections are exhausted and, under enough load, the database goes down; and while each branch reports its mass of data to the group headquarters log collection platform, log data can be lost to network failures, program failures, database downtime and the like.
In one known approach, a file operation monitoring module is placed between the virtual file and the real file in the operating system; the module includes a log manager, which obtains file logs from a log cache queue and sends them to a log monitoring server over a Socket.
In addition, when collecting logs, group operations staff must log in to each branch server, switch among many servers, inspect the branch logs and check system health; tracking and analyzing user operation behavior, abnormal system records and the like makes the operations process extremely laborious. Because the log data on each branch application server cannot be shared, the group cannot actively, efficiently and centrally collect each branch's logs, and thus cannot achieve unified management of branch logs, early warning of branch log anomalies, tracking of illegal user operations in each branch, or deep mining and analysis of branch business log data.
Disclosure of Invention
In order to solve the above problems, the present invention provides a method and a system for processing log data, which can provide a complete set of methods including log data production, collection, transmission, storage and retrieval, so as to solve the above problems in the background art.
In order to achieve the above object, a first aspect of the present invention provides a method for processing log data, including the following steps:
s1: carrying out unified specification on the format of the log data on the application servers to form a log file of each application server;
s2: uploading the log file on each application server to an FTP server;
s3: according to the configured log transmission parameters, the log files on the FTP server are imported into an ES server cluster, and the log transmission parameters are used for controlling the frequency and the flow threshold value of the log files imported into the ES server cluster;
s4: the ES server cluster establishes indexes according to time period partitions according to a preset script;
s5: in response to external request information asking whether the index for the current time period exists, calling a query interface to check whether the index exists, and if not, creating the index for the current time period.
Further, the step S1 specifically includes the following steps:
s11: the application server imports the expanded log production component;
s12: calling a log production component to modify the configuration information of each application server, and configuring the output format and the log output file path of a log file;
s13: and the application server outputs the log file according to the configuration information.
Further, the step S2 specifically includes the following steps:
s21: deploying a system batch script on an application server;
s22: configuring, in the system batch script, the upload target server IP, path, FTP user name and FTP password for the log files;
s23: periodically executing, through the system batch script, the FTP upload command among the DOS commands of the Windows system, and uploading the local log files of the application server to the FTP server according to the configuration.
Further, the step S23 specifically includes the following steps:
s231: if the local log file of the application server is successfully uploaded to the FTP server, deleting the successfully uploaded log file on the application server;
s232: and if the uploading of the local log file of the application server fails, the local log file is uploaded again when the timing task is executed next time.
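The delete-on-success / retry-next-run behavior of steps S231 and S232 can be sketched in a few lines. This is a hypothetical Python illustration (the patent's actual implementation is a Windows batch script); the FTP transfer is injected as upload_fn so the logic is shown without a live server:

```python
import os

def upload_and_cleanup(log_path, upload_fn):
    """Try to upload one local log file; delete it only on success.

    upload_fn(path) -> bool is an injected transport (e.g. an FTP put).
    On failure the file is kept, so the next scheduled run retries it (S232).
    """
    try:
        ok = upload_fn(log_path)
    except OSError:
        ok = False  # network error: keep the file for the next timed run
    if ok and os.path.exists(log_path):
        os.remove(log_path)  # S231: delete the successfully uploaded file
    return ok
```

The design point is that deletion is gated strictly on a confirmed transfer, which is what makes the timed retry safe.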
Further, the log transmission parameters are the number of started instances, the number of threads and/or the memory size of the log transmission management application.
In a second aspect of the present invention, there is provided a log data processing system, including:
the application server is configured to carry out unified specification on the format of the log data to form a log file; uploading the log file to an FTP server;
the FTP server is configured to import the stored log files into the ES server cluster according to the configured log transmission parameters, and the log transmission parameters are used for controlling the frequency and the flow threshold value of the log files imported into the ES server cluster;
the ES server cluster is configured to create indexes according to preset scripts and time section partitions; responding to external request information for judging whether the index of the current time interval exists or not, calling a query interface to judge whether the index exists or not, and if not, creating the index according to the current time interval.
Further, the at least one application server is further configured to:
importing the expanded log production component;
calling a log production component to modify the configuration information of each application server, and configuring the output format and the log output file path of a log file;
and outputting the log file according to the configuration information.
Further, the at least one application server is further configured to:
deploying a system batch script;
configuring, in the system batch script, the upload target server IP, path, FTP user name and FTP password for the log files;
periodically executing, through the system batch script, the FTP upload command among the DOS commands of the Windows system, and uploading the local log files to the FTP server according to the configuration.
Further, the at least one application server is further configured to:
if the local log file is successfully uploaded to the FTP server, deleting the log file which is successfully uploaded locally; and if the uploading of the local log file fails, the local log file is uploaded again when the timing task is executed next time.
Further, the log transmission parameters are the number of started instances, the number of threads and/or the memory size of the log transmission management application.
The log data processing method and system have the following beneficial effects:
(1) A complete set of method and device are provided for each link of log data production, collection, transmission, storage and retrieval;
(2) The log production end standardizes the log output form and unifies the log output format by importing the log expansion component;
(3) Through system batch scripts, the FTP service built into Windows Server is called and log data files of all types are uploaded to the headquarters FTP server for centralized storage; this avoids the complexity and diversity of each branch application server reporting and storing logs independently, and reduces maintenance cost;
(4) The headquarter centrally controls the starting and related parameters of the log transmission management application program, so that the frequency and flow of log data introduced into the ES server cluster are easily controlled, and the stability of log data storage of the ES server cluster is ensured;
(5) The logs are uploaded to a headquarter FTP server in a uniform format and are intensively warehoused, the warehousing caliber is standardized, and the maintenance cost is reduced;
(6) Log data is stored on a headquarter FTP server in a file format, and after logs are stored in a warehouse, abnormal records can be traced effectively;
(7) The log collection platform of the headquarter FTP server monitors the index state, flexibly builds indexes according to configuration parameters by year, month or day, avoids the problem of performance reduction caused by overlarge index data, and improves the insertion performance and retrieval performance of logs.
Drawings
FIG. 1 is a general deployment architecture diagram of a log data processing system as disclosed in one embodiment of the invention;
fig. 2 is a schematic diagram illustrating an interactive deployment of an FTP server and an ES cluster server of a log data processing system according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating steps of a log data processing method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that although the terms first, second, third, etc. may be used to describe the acquisition modules in the embodiments of the present invention, these acquisition modules should not be limited to these terms. These terms are only used to distinguish the acquisition modules from each other.
The word "if" as used herein may be interpreted as "at 8230; \8230;" or "when 8230; \8230;" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (a stated condition or event)" may be interpreted as "upon determining" or "in response to determining" or "upon detecting (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.
It should be noted that the directional terms such as "upper", "lower", "left" and "right" described in the embodiments of the present invention are described in the angles shown in the drawings, and should not be construed as limiting the embodiments of the present invention. In addition, in this context, it is also to be understood that when an element is referred to as being "on" or "under" another element, it can be directly formed on "or" under "the other element or be indirectly formed on" or "under" the other element through an intermediate element.
The invention provides a complete log data processing method and system covering log data production, collection, transmission, storage and retrieval, mainly comprising the following parts:
1. log data production end
Log4j is an open-source Apache project characterized by thread safety, fast file writing, highly extensible configuration and a flexible log output format. The log data production end in the invention extends and enhances Log4j.
The extended Log4j package defines the user-defined LogFilter class and Log4jRollingFileAppender class. The LogFilter class inherits the Filter abstract class in the log4j package and implements the OptionHandler interface. An enum LevelType { } enumeration is defined in LogFilter, and log types and codes are defined in the enumeration, for example the file log: CDGFILE_LOG_LEVEL (code). Log4jRollingFileAppender inherits the RollingFileAppender class in the log4j package and implements the Appender and OptionHandler interfaces. The extension package defines a String getLog4jConfigservice() interface to obtain the log4j.xml configuration and a void updateLog4jConfigservice() interface to modify it, so the headquarters log collection platform can centrally manage the configuration of every branch's log production component, which simplifies maintenance.
The log output file format and path are defined in the log4j.xml configuration file, and a custom appender node is added in log4j.xml:
<appender name="xx"
class="com.esafenet.log.util.Log4jRollingFileAppender"></appender>
The node name is the log type name, such as the file log. Log4jRollingFileAppender is the implementation class that outputs the log data file; the name format and the generation format of the log content data are specified in this class.
2. Log collection end
The log collection end uploads, via FTP, the log data generated by the application systems on every enterprise branch application server to log-type folders pre-configured on the headquarters FTP server.
Log collection uses the timed tasks and FTP file upload commands built into Windows Server, which removes the step of installing third-party FTP client software and scheduling software, avoiding the unnecessary service overhead and maintenance cost of third-party tools. The log collection end makes full use of the strong scripting capability of the Windows system: the underlying principle is that a bat script executes the relevant DOS commands of the Windows system and uploads the log data files generated by the log production end to the group headquarters FTP server according to the specified configuration.
Parameters such as a log path folder to be uploaded, an IP of an FTP server, a user name of the FTP server, a password of the FTP server, an enterprise branch name and the like are mainly defined in bat script configuration:
set folder=log folder path
set ftpHost=ip
set ftpUser=xx
set ftpPass=xx
the configurations can be uniformly configured and maintained through the group log collection platform.
Then, the script traverses the log data files of the specified types in the folder and calls a custom upload function, uploadFile, to upload them:
for /R %folder% %%f in (log_type_name_*.log) do (
call :uploadFile 1,%%f,"file_log"
)
Uploading function:
:uploadFile
if ["%~1"]==["1"] (
set putCmd=put "%~2"
echo %~3
)
echo open %ftpHost%>ftp.up
echo %ftpUser%>>ftp.up
echo %ftpPass%>>ftp.up
echo cd %~3>>ftp.up
echo binary>>ftp.up
echo %putCmd%>>ftp.up
echo bye>>ftp.up
for/F "usebackq tokens=1" %%A in (`FTP -s:ftp.up`) do (
if %%A == 226 (
echo %%A transfer file %~2 successful..>> logs.txt
echo begin del file %~2 ..>> logs.txt
del %~2>> logs.txt
)
)
del ftp.up /q
goto:eof
The bat batch script is triggered and executed by the Windows system's built-in timed task. When a local log file is successfully uploaded to the headquarters FTP server, the successfully uploaded log file on the application server is deleted immediately. When a log file fails to upload because of the network or other reasons, it is uploaded again the next time the timed task runs.
3. Log data import
Logstash is a pipeline with strong real-time data transmission capability; it can uniformly filter the data log files uploaded from the log production end and input them into the ElasticSearch storage server cluster (ES server cluster) according to a specified format.
The invention imports log data into ElasticSearch by deploying the Logstash service on the group headquarters FTP server and exploiting Logstash's strong data transmission capability.
To enable centralized management of Logstash service configuration and service start/stop, operation and management are performed through the group's log collection platform, so operations staff can configure and start or stop the Logstash service without mastering Logstash's complex commands and configuration.
The bat scripts are triggered and executed in batch by the system timed task, and the logs of every enterprise branch's application servers are uploaded to the group server side, so the logs are managed and stored centrally. By setting parameters such as the number of Logstash instances, the thread count and the memory size appropriately, the group headquarters can limit the flow threshold of data entering the store, improve log transmission and storage performance, avoid the flow bottlenecks, data loss and service downtime common in high-concurrency scenarios, and raise the system's processing capacity. Meanwhile, operations staff can flexibly configure the Logstash parameters without deploying separate programs and components at every branch, greatly reducing operation and maintenance cost.
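As a concrete sketch, the instance-count, thread and memory knobs described above map onto standard Logstash settings; the values below are illustrative assumptions, not figures from the patent:

```yaml
# logstash.yml -- per-pipeline throughput settings (illustrative values)
pipeline.workers: 4        # number of worker threads per pipeline
pipeline.batch.size: 250   # events pulled per worker batch (acts as a flow throttle)
pipeline.batch.delay: 50   # ms to wait before flushing an underfilled batch
```

The memory size of each Logstash instance is set in its jvm.options file (e.g. -Xms1g / -Xmx1g), and the instance count is simply how many Logstash processes the platform starts.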
4. Log storage and retrieval
The ElasticSearch engine is a Lucene-based non-relational search database. It provides mass data storage across separate indexes and distributed full-text retrieval over that data, so data can be retrieved quickly from massive stores, and its RESTful web interface makes keyword matching convenient and fast. ElasticSearch is characterized by fast retrieval, large data capacity and high match quality, and because it is distributed by nature it scales out horizontally with ease. These characteristics make up for problems that traditional databases cannot solve at all.
The invention stores logs in ElasticSearch and improves both storage efficiency and log query efficiency. To keep ElasticSearch stable, the invention configures the number of Logstash instances and the related transmission parameters through the group log collection platform, controlling the frequency and threshold at which log data enters ElasticSearch; this avoids the excessive concurrency and data volume caused by each branch application server reporting independently, the rejection of insert requests once ElasticSearch's insertion limit is exceeded, and the resulting risk of data loss and ElasticSearch downtime.
To improve ElasticSearch query and insert efficiency, and to avoid the performance degradation caused by an oversized single index, the log collection platform creates indexes partitioned by time period, by day, month or year, according to the preset index script and rules. At a specified frequency, the log collection platform asks the ElasticSearch cluster whether the index for the current time period exists by calling the interface GET /index_name-YYYYMM/_search; if the index does not exist, it is created for the current time period.
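The periodic index check can be sketched as follows. This is a hypothetical Python illustration in which the two REST calls (index-existence check and index creation, e.g. HEAD /<name> and PUT /<name>) are injected as callables, so the sketch runs without a live cluster:

```python
from datetime import date

def period_index_name(base, day, granularity="month"):
    """Build the time-partitioned index name, e.g. 'filelog-202209'."""
    fmt = {"day": "%Y%m%d", "month": "%Y%m", "year": "%Y"}[granularity]
    return f"{base}-{day.strftime(fmt)}"

def ensure_current_index(base, day, index_exists, create_index,
                         granularity="month"):
    """Create the current period's index if it is missing.

    index_exists(name) -> bool and create_index(name) are injected, so the
    sketch needs no Elasticsearch cluster; in practice they would wrap the
    cluster's REST calls.
    """
    name = period_index_name(base, day, granularity)
    if not index_exists(name):
        create_index(name)
    return name
```

Partitioning the index name by period is what keeps any single index from growing without bound; the check-then-create step only runs at the platform's configured polling frequency.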
The present invention will be described in detail with reference to the accompanying drawings.
One embodiment of the invention provides a log data processing system. Referring to fig. 1, the system includes at least one application server located at an enterprise branch, an FTP server located at an enterprise headquarters, and a cluster of ES servers.
1. Application server
The application servers are located in each enterprise division or subsidiary and are used to standardize the log data format into log files and upload the log files to the FTP server.
In particular, the application server is illustratively configured to implement the following functions:
and importing the expanded log4j packet, namely a log production device, into the application server of each enterprise division. And calling log4j service through a log collection platform, modifying the configuration information of the application servers of all enterprise branches, and configuring a log output file format and a log output file path. The log output platform modifies log4j. Xml relevant configuration by calling a log4j relevant interface, and the output file log is taken as an example as follows:
<appender name="file_log"
class="com.esafenet.log.util.Log4jRollingFileAppender">
<param name="File" value="D:\loges\filelog\"></param>
<param name="DatePattern"
value="'.'yyyy-MM-dd'.log'"></param>
....
</appender>
The file_log in <appender name="file_log"> is the user-defined business log type name to be generated; the value of the <param name="File"> child node of <appender> is the log output path, and log data files of this type are output under the D:\loges\filelog\ path. <param name="DatePattern" value="'.'yyyy-MM-dd'.log'"> means the log file is generated per day, in the format file_log_yyyyMMdd.log.
Next, a Logger object is initialized in the log output service class:
protected Logger logger = Logger.getLogger(this.getClass());
The LOG(FILE_LOG_LEVEL, objectClass) method is then invoked in the log output platform. Here, FILE_LOG_LEVEL is a custom log code, and objectClass is a log object in which the keywords of the log output are defined. Calling the method appends one row of JSON in a preset format to the D:\loges\filelog\file_log_yyyyMMdd.log file; the JSON string contains the values of all keywords defined by the data source system, the source ip and the objectClass.
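The appended JSON record can be illustrated with a short sketch. The field names here (sourceSystem, sourceIp, plus the keyword fields taken from the log object) are assumptions based on the description, not the patent's exact schema:

```python
import json

def build_log_line(source_system, source_ip, obj_fields):
    """Serialize one log record as a single-line JSON string of the kind
    appended to the daily file_log_yyyyMMdd.log file (illustrative fields)."""
    record = {"sourceSystem": source_system, "sourceIp": source_ip}
    record.update(obj_fields)  # keyword values defined on the log object
    return json.dumps(record, ensure_ascii=False)
```

One record per line keeps the file compatible with line-oriented tailing by the downstream import pipeline.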
A system batch (bat) script is then deployed on each branch server, and the upload target server IP, upload path, FTP user name, FTP password and enterprise branch name for the logs are configured in the bat script.
And then, configuring a timing task and executing the system batch processing bat script.
After the timed task executes, a folder named after the enterprise branch is created under the root directory of the headquarters FTP server, a file_log folder is created under it, and the file log data files uploaded by that branch are stored there.
2. FTP server and ES server cluster
The FTP server is located at the end of the enterprise group and is configured to import the stored log files into the ES server cluster according to the configured log transmission parameters, and the log transmission parameters are used for controlling the frequency and the flow threshold value of the log files imported into the ES server cluster.
The ES server cluster is a storage server cluster of an elastic search engine and is configured to create indexes according to preset scripts and time interval partitions; responding to external request information for judging whether the index of the current time interval exists or not, calling a query interface to judge whether the index exists or not, and if not, creating the index according to the current time interval.
Specifically, referring to fig. 2, a log transmission management application instance, for example Logstash, is deployed on each headquarters FTP server, and the input and output parameters of each instance are configured.
This embodiment is still illustrated by using filelog as an example:
input related parameters
input {
# read log file path
file {
path => "/data/loges/B3/filelog*.log"
tags => ["filelog"]
start_position => "beginning"
codec => "json"
}
}
The path in the above configuration is the log file path that Logstash monitors, where /data/loges/ is the FTP root directory path and B3 is a branch name; the configuration means Logstash monitors all log data files beginning with filelog under the /data/loges/B3/ directory. Logstash reads newly added log data at a pre-configured frequency and imports it into the ES cluster.
output related parameter
output {
# export to the Enterprise subsection elastic search
if "file2" in [tags] {
elasticsearch {
hosts => ["ip1:port","ip2:port",..]
index = > "index name- { + yyymm }".
document_type => "filelog2"
document_id => "%{id}"
pipeline => "my_timestamp_pipeline"
}
}
}
hosts defines the set of ES cluster ip addresses, and index is the index name into which the configured data is imported.
The indexes of the ES cluster are created by year, month or day according to the configuration center; when created by month, as here, the index expression is index_name-%{+YYYYMM}.
The log collection platform configuration center periodically detects, at a preset frequency, whether the index for the current time period has been created; if it does not exist, the index is created according to the script preset by the configuration center.
Referring to fig. 3, another embodiment of the present invention further provides a method for processing log data, including the following steps:
Step S101: unifying and normalizing the format of the log data on the application servers to form a log file for each application server.
Specifically, an extended log4j log production component, that is, an extended log4j package, is introduced into the application server of each enterprise branch. The log collection platform then calls the log4j service to modify the configuration information of each enterprise branch, configuring the log output file format and the log output file path. The log collection platform modifies the relevant log4j.xml configuration by calling the relevant log4j interfaces; the output file log is taken as an example here:
<appender name="file_log"
class="com.esafenet.log.util.Log4jRollingFileAppender">
<param name="File" value="D:\loges\filelog\"></param>
<param name="DatePattern"
value="'.'yyyy-MM-dd'.log'"></param>
....
</appender>
The file_log in <appender name="file_log"> is the custom service log type name to be generated; the value of the <param name="File"> child node of <appender> is the log output path, so log data files of this type are output under the D:\loges\filelog\ path. <param name="DatePattern" value="'.'yyyy-MM-dd'.log'"> indicates that a log file is generated per day, named in the format file_log_yyyymmdd.log.
Then, initializing a Logger object in the log output service class:
protected Logger logger = Logger.getLogger(this.getClass());
The LOG(FILE_LOG_LEVEL, objectClass) method is then called in the log output method. Here, FILE_LOG_LEVEL is a custom log code; objectClass is a log object in which the keywords of the log output are defined. When the method is called, a line of JSON in the preset format is appended to the D:\loges\filelog\file_log_yyyymmdd.log file; the JSON string contains the values of all keywords defined by the data source system, the source IP, and the objectClass.
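As a minimal sketch of the behaviour described above, a daily file name plus one JSON line per log call, the following illustration uses Python rather than the extended log4j component; the field names (`source_system`, `source_ip`, `action`) and the helper name are assumptions for illustration only.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def write_log_line(log_dir: Path, source_system: str, source_ip: str,
                   keywords: dict, today: date) -> Path:
    """Append one JSON-formatted log line to the daily file
    file_log_yyyymmdd.log, as the extended log4j appender does."""
    log_file = log_dir / f"file_log_{today.strftime('%Y%m%d')}.log"
    record = {"source_system": source_system, "source_ip": source_ip, **keywords}
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return log_file

tmp = Path(tempfile.mkdtemp())
path = write_log_line(tmp, "B3", "10.0.0.1", {"action": "open"}, date(2022, 9, 8))
print(path.name)  # file_log_20220908.log
```

One JSON object per line is exactly the shape the downstream Logstash `codec => "json"` input expects.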
By extending the log4j component, the log data production end of the invention unifies the log output format of each server and standardizes the criteria for log data output; the server end manages and configures it centrally, so maintenance is simple and extensibility is strong.
Step S102, uploading the log files on each application server to an FTP server;
Specifically, a system batch (bat) script is deployed on the application server of each enterprise branch. The server IP for log upload, the upload path, the FTP user name, the FTP password, the enterprise branch name and so on are then configured in the bat script. A timed task is then configured to execute the bat script; after the timed task is executed, a folder named after the branch is created under the FTP root directory of the configured server, and a file_log folder is created below it, which stores the file log data files uploaded by the application servers of that enterprise branch.
In the method, the bat script is executed through the Windows task scheduler, and the script uploads the log file by calling the ftp upload command built into Windows, which removes the steps of installing third-party FTP client software and timed-task software and reduces the unnecessary server performance loss and maintenance cost of installing third-party service software. In addition, log data files are uploaded to the headquarters FTP server for centralized management and warehousing, the warehousing criteria are unified, the complexity and diversity of each branch application server reporting and warehousing independently are avoided, and maintenance cost is reduced.
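The upload-and-clean-up behaviour of the scheduled script (delete the local file on success, leave it for retry on the next run on failure, as later detailed in steps S231 and S232) can be sketched as below. The patent itself uses the Windows ftp command from a bat script; this Python sketch with a duck-typed `ftp` client, a `_StubFTP` stand-in, and an assumed directory layout is for illustration only.

```python
import tempfile
from pathlib import Path

def upload_logs(ftp, local_dir: Path, remote_dir: str) -> list[str]:
    """Upload every local log file; delete a file only after a
    successful upload, otherwise leave it in place so the next
    scheduled run retries it. Returns the names uploaded."""
    uploaded = []
    for f in sorted(local_dir.glob("*.log")):
        try:
            with f.open("rb") as fh:
                # ftplib-style call: STOR <remote path>
                ftp.storbinary(f"STOR {remote_dir}/{f.name}", fh)
        except OSError:
            continue  # keep the file; it is retried on the next timer run
        f.unlink()    # success: remove the local copy
        uploaded.append(f.name)
    return uploaded

class _StubFTP:  # stand-in for ftplib.FTP that fails on one file
    def storbinary(self, cmd, fh):
        if "bad" in cmd:
            raise OSError("transfer failed")

tmp = Path(tempfile.mkdtemp())
(tmp / "good.log").write_text("{}")
(tmp / "bad.log").write_text("{}")
done = upload_logs(_StubFTP(), tmp, "/B3/file_log")
print(done)  # ['good.log']
```

Deleting only after a confirmed upload is what makes the scheduled task idempotent: a failed transfer simply leaves its file for the next run.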
Step S103: importing the log file on the FTP server into the ES server cluster according to configured log transmission parameters, the log transmission parameters being used to control the frequency and flow threshold at which log files are imported into the ES server cluster;
Step S104: the ES server cluster creates indexes partitioned by time period according to a preset script;
Step S105: in response to external request information asking whether the index of the current time period exists, calling a query interface to judge whether the index exists and, if it does not, creating the index for the current time period.
Specifically, steps S103 to S105 may be implemented by deploying one Logstash instance on each FTP server and, after deployment, configuring the input- and output-related parameters of each instance.
Here, we still take filelog as an example:
Input-related parameters:
input {
# read the log file path
file {
path => "/data/loges/B3/filelog*.log"
tags => ["filelog"]
start_position => "beginning"
codec => "json"
}
}
The path in the above configuration is the log file path monitored by Logstash, where /data/loges/ is the path of the FTP root directory and B3 is a branch name; the pattern matches all data files in .log format beginning with filelog under the /data/loges/B3/ directory. Logstash reads the newly added log data at a pre-configured frequency and imports it into the ES cluster.
Output-related parameters:
output {
# output to the enterprise branch Elasticsearch
if "file2" in [tags] {
elasticsearch {
hosts => ["ip1:port","ip2:port",..]
index => "indexname-%{+YYYYMM}"
document_type => "filelog2"
document_id => "%{id}"
pipeline => "my_timestamp_pipeline"
}
}
}
hosts defines the set of ES cluster IP addresses, and index is the name of the index that the data is imported into.
Indexes in the ES cluster are created partitioned by year, month, or day according to the configuration center. For example, the index expression here, indexname-%{+YYYYMM}, creates one index per month.
The log collection platform configuration center periodically detects, at a preset frequency, whether the index for the current time period has been created; if it does not exist, the index is created according to the script preset by the configuration center. Specifically, the log collection platform asks the Elasticsearch cluster at the specified frequency whether the index of the current time period exists, calling the interface GET /indexname-YYYYMM/_search to judge whether the index exists; if it does not exist, the index is created for the current time period.
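The periodic check described above can be sketched as follows. The HTTP calls are abstracted into two injected callables so the control flow is visible; the endpoint shapes (an existence probe such as GET /&lt;name&gt;/_search and an index-creation call) and all names are assumptions based on the description, not the patent's own code.

```python
from datetime import datetime, timezone
from typing import Callable

def ensure_current_index(prefix: str, now: datetime,
                         index_exists: Callable[[str], bool],
                         create_index: Callable[[str], None]) -> str:
    """Build the current-period index name and create it when the
    existence probe reports it missing."""
    name = f"{prefix}-{now.strftime('%Y%m')}"
    if not index_exists(name):
        create_index(name)
    return name

created = []
name = ensure_current_index(
    "filelog",
    datetime(2022, 9, 8, tzinfo=timezone.utc),
    index_exists=lambda n: False,   # pretend the cluster reports "not found"
    create_index=created.append,    # record what would be created
)
print(name, created)  # filelog-202209 ['filelog-202209']
```

Run at a fixed frequency by the configuration center, this check is idempotent: once the index exists, the probe succeeds and no creation call is made.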
According to the invention, because the log data source files are uploaded to the headquarters FTP server in a centralized manner, the log data can be traced, and headquarters can reasonably and effectively control parameters such as the number of started Logstash instances, the number of logs read and transmitted at a time, the transmission frequency and the memory size according to the reported log data traffic, the server hardware configuration and the ES cluster performance indicators, thereby controlling the threshold of data entering ES and improving the stability of Logstash data processing.
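For illustration, the parameters mentioned above map onto standard Logstash settings; the values below are placeholders chosen for the sketch, not values prescribed by the patent.

```yaml
# logstash.yml - throughput/threshold tuning (illustrative values)
pipeline.workers: 4        # number of pipeline worker threads
pipeline.batch.size: 125   # events read per batch before flushing to outputs
pipeline.batch.delay: 50   # ms to wait while filling an incomplete batch
```

The JVM memory size is set separately in jvm.options via the -Xms and -Xmx heap flags.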
In addition, the invention also supports the log collection platform in centrally managing and configuring each Logstash instance and its related parameters and in monitoring the health state of the whole device, so that headquarters can adjust strategies and parameters in time to ensure the healthy operation of the device. To improve the efficiency of querying and inserting Elasticsearch logs and to avoid a single oversized index degrading Elasticsearch insert and query performance, the device reasonably partitions indexes by time period through the log collection platform, improving the performance of log retrieval, analysis and statistics.
The above description covers only the preferred embodiment of the invention. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combinations of features described above, but also encompasses other combinations of those features, or of their equivalents, that do not depart from the spirit of the disclosure, for example technical solutions formed by interchanging the features described above with (but not limited to) features of similar function disclosed in the invention.

Claims (10)

1. A method for processing log data is characterized by comprising the following steps:
s1: carrying out unified specification on the format of the log data on the application servers to form a log file of each application server;
s2: uploading the log file on each application server to an FTP server;
s3: according to configured log transmission parameters, the log file on the FTP server is imported into an ES server cluster, and the log transmission parameters are used for controlling the frequency and the flow threshold value of the log file imported into the ES server cluster;
s4: the ES server cluster establishes indexes according to time period partitions according to a preset script;
s5: responding to the request information whether the index of the current time interval exists or not from the outside, calling a query interface to judge whether the index exists or not, and if not, creating the index according to the current time interval.
2. The method for processing log data according to claim 1, wherein the step S1 specifically includes the steps of:
s11: the application server imports the expanded log production component;
s12: calling a log production component to modify the configuration information of each application server, and configuring the output format and the log output file path of a log file;
s13: and the application server outputs the log file according to the configuration information.
3. The method for processing log data according to claim 2, wherein the step S2 specifically includes the following steps:
s21: deploying a system batch script on an application server;
s22: configuring server IP, path, FTP user name and FTP password uploaded by log files in system batch processing scripts;
s23: and regularly executing an FTP uploading command in the dos command of the windows system through the system batch processing script, and uploading the local log file of the application server to the FTP server according to the configuration.
4. The method for processing log data according to claim 3, wherein the step S23 specifically includes the steps of:
s231: if the local log file of the application server is successfully uploaded to the FTP server, deleting the successfully uploaded log file on the application server;
s232: and if the uploading of the local log file of the application server fails, the local log file is uploaded again when the timing task is executed next time.
5. The method according to claim 1, wherein the log transmission parameters are the number of started log transmission management applications, the number of threads, and/or the memory size.
6. A log data processing system, comprising:
the application server is configured to carry out unified specification on the format of the log data to form a log file; uploading the log file to an FTP server;
the FTP server is configured to import the stored log files into the ES server cluster according to the configured log transmission parameters, and the log transmission parameters are used for controlling the frequency and the flow threshold value of the log files imported into the ES server cluster;
the ES server cluster is configured to create indexes according to preset scripts and time period partitions; responding to external request information for judging whether the index of the current time interval exists or not, calling a query interface to judge whether the index exists or not, and if not, creating the index according to the current time interval.
7. The log data processing system of claim 6, wherein the at least one application server is further configured to:
importing the expanded log production component;
calling a log production component to modify the configuration information of each application server, and configuring the output format and the log output file path of a log file;
and outputting the log file according to the configuration information.
8. The log data processing system of claim 7, wherein the at least one application server is further configured to:
deploying a system batch script;
configuring server IP, path, FTP user name and FTP password uploaded by log files in system batch processing scripts;
regularly executing the FTP upload command among the DOS commands of the Windows system through the system batch script, and uploading the local log file to the FTP server according to the configuration.
9. The log data processing system of claim 8, wherein the at least one application server is further configured to:
if the local log file is successfully uploaded to the FTP server, deleting the successfully uploaded log file locally; and if the uploading of the local log file fails, the local log file is uploaded again when the timing task is executed next time.
10. The log data processing system of claim 6, wherein the log transmission parameter is a number of log transmission management applications started, a number of threads, and/or a memory size.
CN202211094169.9A 2022-09-08 2022-09-08 Log data processing method and system Pending CN115168314A (en)


Publications (1)

Publication Number Publication Date
CN115168314A true CN115168314A (en) 2022-10-11




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination