CN106649461A - Method for automatically cleaning and maintaining elastic search log index file - Google Patents

Method for automatically cleaning and maintaining elastic search log index file

Info

Publication number
CN106649461A
Authority
CN
China
Prior art keywords
index
log
task
elasticsearch
policy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610849348.7A
Other languages
Chinese (zh)
Inventor
金洪殿
赵仁明
亓开元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-09-26
Filing date
2016-09-26
Publication date
2017-05-10
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201610849348.7A
Publication of CN106649461A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4887 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for automatically cleaning and maintaining ElasticSearch log index files. Index files are stored separately along the time dimension, and a log index deletion policy is defined according to business requirements and turned into a scheduled task, which a scheduling framework then runs. When historical data must be removed, every index that matches the deletion policy is deleted as a whole, which avoids the inefficiency of deleting documents with DeleteByQuery. The method deletes index files quickly and efficiently without degrading the performance of current indexing and queries, and solves the problem that Elasticsearch is inefficient when DeleteByQuery is used to delete large volumes of indexed data.

Description

A method for automatically cleaning and maintaining ElasticSearch log index files

Technical Field

The invention relates to the technical field of big data, and in particular to a method for automatically cleaning and maintaining ElasticSearch log index files.

Background

In information technology, big data refers to large, complex data collections, made up of huge quantities of structurally complex and heterogeneous data, whose content cannot be captured, managed, stored, searched, shared, analyzed, and visualized within an acceptable time using conventional software tools (such as existing database management tools or data processing applications). Big data has four characteristics: high volume (Volume), high velocity (Velocity), variety (Variety), and low value density (Value). The challenge of big data lies in processing it in real time; moreover, data itself has shifted from structured to unstructured, which makes processing big data with relational databases very difficult.

Big data is often used to describe the large volumes of unstructured and semi-structured data that a company creates, which would take too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires frameworks such as MapReduce and HBase to distribute work across tens, hundreds, or even thousands of computers. Compared with traditional data warehouse applications, big data analysis involves large data volumes and complex queries and analyses. Big data requires special techniques to process large volumes of data efficiently within a tolerable elapsed time. Technologies suited to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

ElasticSearch is a Lucene-based search server. It provides a distributed, multi-user full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and is easy to integrate with enterprise applications; it is a popular enterprise search engine that meets requirements for real-time search, stability, reliability, and speed.

However, because of how Elasticsearch is implemented internally, when an index becomes very large and a large amount of indexed data must be deleted, many low-level operations on the index files are required (matching documents are only marked as deleted, and their space is reclaimed later through segment merges). This process is time-consuming and often has a significant impact on applications.

In the current field of IT operations and maintenance, log analysis and monitoring tools based on the ELK (ElasticSearch + Logstash + Kibana) platform are used by more and more operations staff. Because of the nature of such a system and the scale of the systems it monitors, large numbers of log files are often generated, and they must be available promptly. Consequently, when the data volume is large and grows quickly, the index files become very large, which degrades indexing and query performance and puts pressure on storage space. When querying logs, users generally care only about recent data, and historical data can be deleted; therefore, deleting historical index data automatically and quickly is key to implementing this architecture. In view of this, the present invention proposes a method for automatically cleaning and maintaining ElasticSearch log index files.

Summary of the Invention

To remedy the shortcomings of the prior art, the present invention provides a simple and efficient method for automatically cleaning and maintaining ElasticSearch log index files.

The present invention is achieved through the following technical solution:

A method for automatically cleaning and maintaining ElasticSearch log index files, characterized in that: index files are stored separately along the time dimension (each time period gets its own index); a log index deletion policy is formulated according to business needs and turned into a scheduled task; and a scheduling framework runs the log deletion task. When historical data indexes need to be deleted, it suffices to delete, as whole indexes, those indexes that match the log index deletion policy, which solves the efficiency problem of deleting with DeleteByQuery.
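The time-dimension storage in this solution can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: it assumes a hypothetical daily naming scheme `<prefix>-YYYY.MM.DD` (modeled on the Logstash default `logstash-%{+YYYY.MM.dd}`) and a hypothetical `@timestamp` date field, and it only builds request tuples instead of contacting a cluster. `DELETE /<index>` and `POST /<index>/_delete_by_query` are real Elasticsearch REST endpoints (the latter is built in from Elasticsearch 5.0 onward).

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Hypothetical naming convention: one index per day, "<prefix>-YYYY.MM.DD",
# modeled on the Logstash default "logstash-%{+YYYY.MM.dd}".
DATE_FORMAT = "%Y.%m.%d"


def index_name_for(prefix: str, day: date) -> str:
    """Name of the daily index that stores the logs written on `day`."""
    return f"{prefix}-{day.strftime(DATE_FORMAT)}"


def index_date(prefix: str, name: str) -> Optional[date]:
    """Recover the day from a daily index name; None if the name does not match."""
    head = prefix + "-"
    if not name.startswith(head):
        return None
    try:
        return datetime.strptime(name[len(head):], DATE_FORMAT).date()
    except ValueError:
        return None


def delete_index_request(name: str) -> Tuple[str, str, Optional[dict]]:
    """Whole-index deletion: one request, no per-document work."""
    return ("DELETE", f"/{name}", None)


def delete_by_query_request(target: str, before: date) -> Tuple[str, str, Optional[dict]]:
    """The approach the method avoids: find and delete each matching document.

    "@timestamp" is an assumed field name for the log event time.
    """
    body = {"query": {"range": {"@timestamp": {"lt": before.isoformat()}}}}
    return ("POST", f"/{target}/_delete_by_query", body)
```

With one index per day, retiring a day of logs is a single `DELETE` of that day's index, whereas the delete-by-query request has to locate and delete every matching document inside a shared index.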

The log index deletion policy is formulated according to business needs and specifies either the maximum retention time of indexes or the maximum storage space that retained indexes may occupy.
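One way to represent this policy is a small value object that holds exactly one of the two limits. The sketch below is a hypothetical Python illustration: the class and field names are assumptions, since the patent only says the policy fixes a maximum retention time or a maximum storage space.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeletionPolicy:
    """Log index deletion policy: bound retained indexes by age OR by size.

    Field names are hypothetical illustrations of the two policy types.
    """
    prefix: str                             # which family of log indexes it governs
    max_age_days: Optional[int] = None      # time policy
    max_total_bytes: Optional[int] = None   # storage-space policy

    def __post_init__(self) -> None:
        # Exactly one limit must be set, mirroring the "either/or" wording.
        if (self.max_age_days is None) == (self.max_total_bytes is None):
            raise ValueError("set exactly one of max_age_days or max_total_bytes")
        limit = self.max_age_days if self.max_age_days is not None else self.max_total_bytes
        if limit <= 0:
            raise ValueError("the limit must be positive")

    @property
    def is_time_based(self) -> bool:
        """True for a time policy, False for a storage-space policy."""
        return self.max_age_days is not None
```

For example, `DeletionPolicy("logs", max_age_days=30)` keeps a month of daily log indexes, while `DeletionPolicy("logs", max_total_bytes=50 * 1024 ** 3)` caps them at 50 GiB.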

The method for automatically cleaning and maintaining ElasticSearch log index files of the present invention comprises the following steps:

(1) Create a log index deletion policy, and create a scheduled task based on that policy;

(2) Start the scheduled task, which runs the corresponding background task to clean up logs according to the log index deletion policy;

(3) Determine whether the task is scheduled under a time policy. If so, traverse the indexes and delete those that match the time policy; if not, delete indexes according to the storage space requirement. After the indexes are deleted, return to step (2).
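Step (3) can be sketched as a pure selection function that decides which whole indexes to delete. This Python sketch assumes the same hypothetical daily naming scheme `<prefix>-YYYY.MM.DD`, takes index sizes as a plain dictionary (in a real cluster they might come from the `_cat/indices` API), and interprets the storage policy as "delete the oldest indexes until the remainder fits the budget". That oldest-first ordering is an assumption; the patent does not say which indexes to remove first.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List, Optional

DATE_FORMAT = "%Y.%m.%d"  # hypothetical daily naming: "<prefix>-YYYY.MM.DD"


def index_day(prefix: str, name: str) -> Optional[date]:
    """Day encoded in a daily index name, or None for indexes this policy ignores."""
    head = prefix + "-"
    if not name.startswith(head):
        return None
    try:
        return datetime.strptime(name[len(head):], DATE_FORMAT).date()
    except ValueError:
        return None


def select_by_time(prefix: str, names: List[str], today: date, max_age_days: int) -> List[str]:
    """Time policy: keep the newest `max_age_days` days (today included), delete the rest."""
    cutoff = today - timedelta(days=max_age_days)
    victims = []
    for name in names:  # traverse the indexes
        day = index_day(prefix, name)
        if day is not None and day <= cutoff:
            victims.append(name)
    return sorted(victims)


def select_by_size(prefix: str, sizes: Dict[str, int], max_total_bytes: int) -> List[str]:
    """Storage policy: delete the oldest indexes until the rest fit in the budget."""
    dated = []
    for name in sizes:
        day = index_day(prefix, name)
        if day is not None:
            dated.append((day, name))
    dated.sort()  # oldest first
    total = sum(sizes[name] for _, name in dated)
    victims = []
    for _, name in dated:
        if total <= max_total_bytes:
            break
        victims.append(name)
        total -= sizes[name]
    return victims


def plan_deletions(prefix: str, sizes: Dict[str, int], today: date,
                   max_age_days: Optional[int] = None,
                   max_total_bytes: Optional[int] = None) -> List[str]:
    """Step (3): branch on the policy type and return whole indexes to delete."""
    if max_age_days is not None:
        return select_by_time(prefix, list(sizes), today, max_age_days)
    if max_total_bytes is not None:
        return select_by_size(prefix, sizes, max_total_bytes)
    raise ValueError("no deletion policy configured")
```

Deleting each returned name with one `DELETE /<index>` request would then complete the step before control returns to step (2).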

Beneficial effects of the present invention: the method for automatically cleaning and maintaining ElasticSearch log index files deletes index files quickly and efficiently without affecting the performance of current indexing and queries, and solves the problem that Elasticsearch is inefficient when DeleteByQuery is used to delete large volumes of indexed data.

Description of Drawings

Figure 1 is a schematic diagram of the method for automatically cleaning and maintaining ElasticSearch log index files according to the present invention.

Detailed Description

To make the technical problem, technical solution, and beneficial effects of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawing and an embodiment. It should be noted that the specific embodiment described here only serves to explain the present invention and is not intended to limit it.

The method stores index files separately along the time dimension, formulates a log index deletion policy according to business needs, turns it into a scheduled task, and uses a scheduling framework such as Quartz to schedule the log deletion task. When historical data indexes need to be deleted, it suffices to delete, as whole indexes, those indexes that match the log index deletion policy, which solves the efficiency problem of deleting with DeleteByQuery.
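Quartz is a Java scheduling library; the sketch below uses Python's standard `sched` module as a stand-in to show the loop of steps (2) and (3), where each run performs the cleanup and then re-registers itself for the next interval. It is not Quartz's API. `FakeClock` is a hypothetical helper that lets the loop run instantly and deterministically, and `runs` bounds the loop so the example terminates; a production job would use a real clock and run indefinitely.

```python
import sched
from typing import Callable


class FakeClock:
    """Hypothetical deterministic clock so the demo runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def schedule_cleanup(scheduler: sched.scheduler, interval_s: float,
                     cleanup: Callable[[], None], runs: int) -> None:
    """Register a cleanup job that re-schedules itself every `interval_s` seconds."""

    def tick(remaining: int) -> None:
        cleanup()  # step (2): run the background cleanup (step (3) happens inside)
        if remaining > 1:
            # Step (3) "returns to step (2)": queue the next run.
            scheduler.enter(interval_s, 1, tick, argument=(remaining - 1,))

    scheduler.enter(interval_s, 1, tick, argument=(runs,))


if __name__ == "__main__":
    clock = FakeClock()
    s = sched.scheduler(clock.time, clock.sleep)
    schedule_cleanup(s, 86400, lambda: print("cleanup at t =", clock.now), runs=3)
    s.run()
```

Passing `time.time` and `time.sleep` to `sched.scheduler` instead of the fake clock would turn this into a daily job in real time.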

The log index deletion policy is formulated according to business needs and specifies either the maximum retention time of indexes or the maximum storage space that retained indexes may occupy.

The method for automatically cleaning and maintaining ElasticSearch log index files of the present invention comprises the following steps:

(1) Create a log index deletion policy, and create a scheduled task based on that policy;

(2) Start the scheduled task, which runs the corresponding background task to clean up logs according to the log index deletion policy;

(3) Determine whether the task is scheduled under a time policy. If so, traverse the indexes and delete those that match the time policy; if not, delete indexes according to the storage space requirement. After the indexes are deleted, return to step (2).

Claims (3)

1. A method for automatically cleaning and maintaining ElasticSearch log index files, characterized in that index files are stored separately along the time dimension; a log index deletion policy is formulated according to business needs and turned into a scheduled task; and a scheduling framework schedules the log deletion task, so that when historical data indexes need to be deleted, the indexes that match the log index deletion policy are simply deleted as whole indexes, which solves the efficiency problem of deleting with DeleteByQuery.

2. The method for automatically cleaning and maintaining ElasticSearch log index files according to claim 1, characterized in that the log index deletion policy is formulated according to business needs and specifies the maximum retention time of indexes or the maximum storage space of retained indexes.

3. The method for automatically cleaning and maintaining ElasticSearch log index files according to claim 1 or 2, characterized by comprising the following steps:
(1) creating a log index deletion policy, and creating a scheduled task based on that policy;
(2) starting the scheduled task, which runs the corresponding background task to clean up logs according to the log index deletion policy;
(3) determining whether the task is scheduled under a time policy; if so, traversing the indexes and deleting those that match the time policy; if not, deleting indexes according to the storage space requirement; and, after the indexes are deleted, returning to step (2).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610849348.7A CN106649461A (en) 2016-09-26 2016-09-26 Method for automatically cleaning and maintaining elastic search log index file

Publications (1)

Publication Number Publication Date
CN106649461A (en) 2017-05-10

Family

ID=58854129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610849348.7A Pending CN106649461A (en) 2016-09-26 2016-09-26 Method for automatically cleaning and maintaining elastic search log index file

Country Status (1)

Country Link
CN (1) CN106649461A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2144177A2 (en) * 2008-07-11 2010-01-13 Day Management AG System and method for a log-based data storage
CN105117271A (en) * 2015-08-17 2015-12-02 广东电网有限责任公司电力科学研究院 Historical data emulation method of IEC61850 based status monitoring emulation system test platform
CN105740410A (en) * 2016-01-29 2016-07-06 浪潮电子信息产业股份有限公司 Data statistics method based on Hbase secondary index

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804497A (en) * 2018-04-02 2018-11-13 北京国电通网络技术有限公司 Log-based big data analysis method
CN108959501A (en) * 2018-06-26 2018-12-07 新华三大数据技术有限公司 Method and device for deleting ES index
CN110515898A (en) * 2019-07-31 2019-11-29 济南浪潮数据技术有限公司 Log processing method and device
CN110515898B (en) * 2019-07-31 2022-04-22 济南浪潮数据技术有限公司 Log processing method and device
CN111930735A (en) * 2020-08-14 2020-11-13 中国工商银行股份有限公司 Data cleaning method and device and electronic equipment
CN112328587A (en) * 2020-11-18 2021-02-05 山东健康医疗大数据有限公司 Data processing method and device for ElasticSearch
CN113515409A (en) * 2021-03-04 2021-10-19 浪潮云信息技术股份公司 Log timing backup method and system based on ELK
CN114090507A (en) * 2021-11-16 2022-02-25 新华三大数据技术有限公司 Log file cleaning method, system, device and storage medium
CN114546999A (en) * 2022-01-24 2022-05-27 北京北信源软件股份有限公司 Data cleaning method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170510