CN106649461A - Method for automatically cleaning and maintaining elastic search log index file - Google Patents
- Publication number
- CN106649461A (application number CN201610849348.7A)
- Authority
- CN
- China
- Prior art keywords
- index
- log
- task
- elasticsearch
- policy
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2272—Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The invention relates to the field of big data technology, and in particular to a method for automatically cleaning and maintaining ElasticSearch log index files.
Background
In information technology, big data refers to large, complex data sets, made up of huge volumes of data with complex structures and many types, whose content cannot be captured, managed, stored, searched, shared, analyzed, and visualized within a given time using conventional software tools (such as existing database management tools or data processing applications). Big data has four characteristics: high volume (Volume), high velocity (Velocity), variety (Variety), and low value density (Value). The challenge of big data lies in processing it in real time, and the data itself has shifted from structured to unstructured data, so processing big data with a relational database is very difficult.
Big data is often used to describe the large amounts of unstructured and semi-structured data that a company creates, which would cost too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires frameworks such as MapReduce and HBase to distribute work across tens, hundreds, or even thousands of computers. Compared with traditional data warehouse applications, big data analysis is characterized by large data volumes and complex queries and analysis. Big data requires special techniques to process large amounts of data efficiently within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
ElasticSearch is a search server based on Lucene. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, which makes it easy to integrate with enterprise applications. It is a popular enterprise search engine that meets requirements for real-time search, stability, reliability, and speed.
However, because of how Elasticsearch is implemented internally, when index files grow too large and a large amount of indexed data has to be deleted, many low-level operations on the index files are required. The process therefore takes a long time and often has a significant impact on applications.
In IT operations and maintenance today, log analysis and monitoring tools based on the ELK (ElasticSearch+Logstash+Kibana) platform are used by more and more operations staff. Because of the nature of such a system and the scale of the systems it monitors, large numbers of log files are usually generated, and the requirements on their timeliness are high. When the data volume is large and the incremental data is also large, the index files become very large, which degrades indexing and query performance and puts pressure on storage space. When querying logs, users generally care only about recent data, and historical data can be deleted, so automatically and quickly deleting historical index data is key to making this architecture work. In view of this, the present invention proposes a method for automatically cleaning and maintaining ElasticSearch log index files.
Summary of the Invention
To remedy the defects of the prior art, the present invention provides a simple and efficient method for automatically cleaning and maintaining ElasticSearch log index files.
The present invention is achieved through the following technical solution:
A method for automatically cleaning and maintaining ElasticSearch log index files, characterized in that: index files are stored separately along the time dimension; a log index deletion policy is formulated according to business needs and turned into a scheduled task; and a scheduling framework is used to schedule the log deletion task. When historical data indexes need to be deleted, the indexes that match the log index deletion policy are simply deleted as a whole, which resolves the efficiency problem of deleting data with DeleteByquery.
The log index deletion policy is formulated according to business needs and specifies either the maximum time for which indexes are retained or the maximum storage space that the retained indexes may occupy.
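A deletion policy of this kind could be represented as a small value object. The following is a minimal sketch, not the patent's implementation: the class name `RetentionPolicy` and the fields `max_age_days` and `max_total_bytes` are hypothetical, and the sketch assumes that exactly one of the two limits is set for a given policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical log index deletion policy: limit by age OR by total size."""
    max_age_days: Optional[int] = None     # longest time an index is retained
    max_total_bytes: Optional[int] = None  # most storage the retained indexes may use

    def __post_init__(self) -> None:
        # Exactly one limit must be given, mirroring the "time or space" choice.
        if (self.max_age_days is None) == (self.max_total_bytes is None):
            raise ValueError("set exactly one of max_age_days or max_total_bytes")
        limit = self.max_age_days if self.is_time_based else self.max_total_bytes
        if limit <= 0:
            raise ValueError("the limit must be positive")

    @property
    def is_time_based(self) -> bool:
        """True for a time policy, False for a storage-space policy."""
        return self.max_age_days is not None
```

For example, `RetentionPolicy(max_age_days=30)` would describe keeping thirty days of logs, while `RetentionPolicy(max_total_bytes=500 * 1024**3)` would describe a storage budget of roughly 500 GiB.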
The method of the present invention for automatically cleaning and maintaining ElasticSearch log index files comprises the following steps:
(1) Create a log index deletion policy, and create a scheduled task according to that policy;
(2) Start the scheduled task, which runs the corresponding background task to clean up logs according to the log index deletion policy;
(3) Determine whether the task is scheduled under a time policy. If so, traverse the indexes and delete those that match the time policy; if not, delete indexes according to the storage space requirement. After the indexes are deleted, return to step (2).
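The decision in step (3) can be sketched as a pure function that only chooses which indexes to drop. This is an illustrative sketch under stated assumptions: the `IndexInfo` record, the function name, and the `logs-YYYY.MM.DD` index names are hypothetical, and deleting the oldest indexes first under the storage policy is an assumption, since the text only says that indexes are deleted according to the storage space requirement.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class IndexInfo(NamedTuple):
    name: str        # e.g. "logs-2016.09.26" (hypothetical naming scheme)
    day: date        # the day whose logs the index holds
    size_bytes: int  # on-disk size of the index


def select_for_deletion(indices: List[IndexInfo], today: date,
                        max_age_days: Optional[int] = None,
                        max_total_bytes: Optional[int] = None) -> List[str]:
    """Return names of whole indexes to drop under a time or a storage policy."""
    if max_age_days is not None:
        # Time policy: traverse the indexes and pick every one older than the cutoff.
        cutoff = today - timedelta(days=max_age_days)
        return [i.name for i in indices if i.day < cutoff]
    if max_total_bytes is None:
        raise ValueError("either max_age_days or max_total_bytes is required")
    # Storage policy (assumed oldest-first): drop indexes until the rest fit.
    doomed: List[str] = []
    total = sum(i.size_bytes for i in indices)
    for info in sorted(indices, key=lambda i: i.day):
        if total <= max_total_bytes:
            break
        doomed.append(info.name)
        total -= info.size_bytes
    return doomed
```

Keeping the selection separate from the actual delete calls makes the policy easy to check against a list of index names and sizes before anything is removed from the cluster.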
The beneficial effects of the present invention are as follows: the method for automatically cleaning and maintaining ElasticSearch log index files deletes index files quickly and efficiently without affecting the performance of current indexing and queries, and it solves the inefficiency of Elasticsearch when large volumes of indexed data are deleted with DeleteByquery.
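The difference between the two deletion styles can be seen in the shape of the REST requests involved. The sketch below only builds request descriptions and sends nothing. The helper names, the `logs-...` index names, and the `@timestamp` field are assumptions for illustration, and the `_delete_by_query` path is the one used by recent Elasticsearch releases, which may differ from what the version current at filing exposed.

```python
import json
from typing import Optional, Tuple

Request = Tuple[str, str, Optional[str]]  # (HTTP method, path, JSON body)


def drop_whole_index(index_name: str) -> Request:
    """Whole-index delete: a single request that names the index, with no query."""
    return ("DELETE", f"/{index_name}", None)


def delete_by_query(index_name: str, field: str, before: str) -> Request:
    """Document-level delete of everything older than `before` in one large index."""
    body = {"query": {"range": {field: {"lt": before}}}}
    return ("POST", f"/{index_name}/_delete_by_query", json.dumps(body))


# One index per day lets cleanup target an entire index by name ...
print(drop_whole_index("logs-2016.09.01"))
# ... whereas a single shared index needs a range query over its documents.
print(delete_by_query("logs", "@timestamp", "2016-09-02"))
```

With time-partitioned indexes, cleanup never has to match individual documents, which is the efficiency gain the method relies on.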
Brief Description of the Drawings
Figure 1 is a schematic diagram of the method of the present invention for automatically cleaning and maintaining ElasticSearch log index files.
Detailed Description
To make the technical problem to be solved, the technical solution, and the beneficial effects of the present invention clearer, the invention is described in detail below with reference to the accompanying drawing and embodiments. The specific embodiments described here serve only to explain the present invention and are not intended to limit it.
In this method for automatically cleaning and maintaining ElasticSearch log index files, index files are stored separately along the time dimension; a log index deletion policy is formulated according to business needs and turned into a scheduled task; and a scheduling framework such as Quartz is used to schedule the log deletion task. When historical data indexes need to be deleted, the indexes that match the log index deletion policy are simply deleted as a whole, which resolves the efficiency problem of deleting data with DeleteByquery.
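Storing indexes along the time dimension is commonly done by putting the date into the index name. The sketch below assumes one index per day and a hypothetical `logs-` prefix with a `YYYY.MM.DD` suffix; the text does not prescribe a granularity or a naming scheme.

```python
from datetime import date, datetime
from typing import Optional

PREFIX = "logs-"         # hypothetical index prefix
DAY_FORMAT = "%Y.%m.%d"  # one index per day, e.g. logs-2016.09.26


def index_name_for(day: date) -> str:
    """Name of the index that stores the logs written on `day`."""
    return PREFIX + day.strftime(DAY_FORMAT)


def day_of_index(name: str) -> Optional[date]:
    """Recover the day from an index name, or None if the name does not match."""
    if not name.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(name[len(PREFIX):], DAY_FORMAT).date()
    except ValueError:
        return None
```

Because the date can be recovered from the name alone, a cleanup task can decide which indexes are old without reading any documents.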
The log index deletion policy is formulated according to business needs and specifies either the maximum time for which indexes are retained or the maximum storage space that the retained indexes may occupy.
The method of the present invention for automatically cleaning and maintaining ElasticSearch log index files comprises the following steps:
(1) Create a log index deletion policy, and create a scheduled task according to that policy;
(2) Start the scheduled task, which runs the corresponding background task to clean up logs according to the log index deletion policy;
(3) Determine whether the task is scheduled under a time policy. If so, traverse the indexes and delete those that match the time policy; if not, delete indexes according to the storage space requirement. After the indexes are deleted, return to step (2).
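The loop between steps (2) and (3) amounts to a recurring job. The text names Quartz as one possible scheduling framework; the sketch below instead uses Python's standard `sched` module purely for illustration, with a hypothetical `schedule_cleanup` helper and a simulated clock so that the example finishes immediately. A real deployment would pass `time.time` and `time.sleep` and a cleanup function that actually deletes indexes.

```python
import sched
from typing import Callable, List


def schedule_cleanup(scheduler: sched.scheduler, interval: float,
                     cleanup: Callable[[], None], runs: int) -> None:
    """Run `cleanup` every `interval` time units, `runs` times in total."""
    def tick(remaining: int) -> None:
        cleanup()          # step (3): apply the deletion policy
        if remaining > 1:  # then go back to step (2) and wait for the next run
            scheduler.enter(interval, 1, tick, (remaining - 1,))
    if runs > 0:
        scheduler.enter(interval, 1, tick, (runs,))


class FakeClock:
    """Simulated clock: sleeping just advances the current time."""
    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
run_times: List[float] = []  # simulated times at which the cleanup ran
s = sched.scheduler(clock.time, clock.sleep)
schedule_cleanup(s, 3600.0, lambda: run_times.append(clock.time()), runs=3)
s.run()
print(run_times)
```

Each run re-arms the next one only after the cleanup has finished, so a slow cleanup delays the following run rather than overlapping with it.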
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610849348.7A CN106649461A (en) | 2016-09-26 | 2016-09-26 | Method for automatically cleaning and maintaining elastic search log index file |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610849348.7A CN106649461A (en) | 2016-09-26 | 2016-09-26 | Method for automatically cleaning and maintaining elastic search log index file |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649461A true CN106649461A (en) | 2017-05-10 |
Family
ID=58854129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610849348.7A Pending CN106649461A (en) | 2016-09-26 | 2016-09-26 | Method for automatically cleaning and maintaining elastic search log index file |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649461A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2144177A2 (en) * | 2008-07-11 | 2010-01-13 | Day Management AG | System and method for a log-based data storage |
CN105117271A (en) * | 2015-08-17 | 2015-12-02 | 广东电网有限责任公司电力科学研究院 | Historical data emulation method of IEC61850 based status monitoring emulation system test platform |
CN105740410A (en) * | 2016-01-29 | 2016-07-06 | 浪潮电子信息产业股份有限公司 | Data statistics method based on Hbase secondary index |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804497A (en) * | 2018-04-02 | 2018-11-13 | 北京国电通网络技术有限公司 | A kind of big data analysis method based on daily record |
CN108959501A (en) * | 2018-06-26 | 2018-12-07 | 新华三大数据技术有限公司 | Delete the method and device of ES index |
CN110515898A (en) * | 2019-07-31 | 2019-11-29 | 济南浪潮数据技术有限公司 | Log processing method and device |
CN110515898B (en) * | 2019-07-31 | 2022-04-22 | 济南浪潮数据技术有限公司 | Log processing method and device |
CN111930735A (en) * | 2020-08-14 | 2020-11-13 | 中国工商银行股份有限公司 | Data cleaning method and device and electronic equipment |
CN112328587A (en) * | 2020-11-18 | 2021-02-05 | 山东健康医疗大数据有限公司 | Data processing method and device for ElasticSearch |
CN113515409A (en) * | 2021-03-04 | 2021-10-19 | 浪潮云信息技术股份公司 | Log timing backup method and system based on ELK |
CN114090507A (en) * | 2021-11-16 | 2022-02-25 | 新华三大数据技术有限公司 | Log file cleaning method, system, device and storage medium |
CN114546999A (en) * | 2022-01-24 | 2022-05-27 | 北京北信源软件股份有限公司 | Data cleaning method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106649461A (en) | Method for automatically cleaning and maintaining elastic search log index file | |
KR102627690B1 (en) | Dimensional context propagation techniques for optimizing SKB query plans | |
CN109684352B (en) | Data analysis system, data analysis method, storage medium, and electronic device | |
Tao et al. | Minimal mapreduce algorithms | |
US8918363B2 (en) | Data processing service | |
CN102799622B (en) | Distributed structured query language (SQL) query method based on MapReduce expansion framework | |
WO2011146452A1 (en) | Data storage and processing service | |
JP7030831B2 (en) | Manage large association sets with optimized bitmap representations | |
JP2014502762A (en) | Filtering query data in the data store | |
US10929370B2 (en) | Index maintenance management of a relational database management system | |
US20140229427A1 (en) | Database management delete efficiency | |
CN104917627A (en) | Log cluster scanning and analysis method used for large-scale server cluster | |
US8694503B1 (en) | Real-time indexing of data for analytics | |
Zhi et al. | Research of Hadoop-based data flow management system | |
CN107330098A (en) | A kind of querying method of self-defined report, calculate node and inquiry system | |
Sathya et al. | Application of Hadoop MapReduce technique to Virtual Database system design | |
CN117171135A (en) | User behavior analysis modeling method, analysis method and system | |
Pothuganti | Big data analytics: Hadoop-Map reduce & NoSQL databases | |
CN110019152A (en) | A kind of big data cleaning method | |
KR20180077830A (en) | Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method | |
Wang et al. | Event Indexing and Searching for High Volumes of Event Streams in the Cloud | |
Darius et al. | From Data to Insights: A Review of Cloud-Based Big Data Tools and Technologies | |
Lou et al. | Research on data query optimization based on SparkSQL and MongoDB | |
US8849833B1 (en) | Indexing of data segments to facilitate analytics | |
CN105989117B (en) | A method and system for fast joint processing of semi-structured data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |