CN111026621A

CN111026621A - Monitoring alarm method, apparatus, device and medium for Elasticsearch cluster

Info

Publication number: CN111026621A
Application number: CN201911342583.5A
Authority: CN
Inventors: 蒋方禹; 范渊; 史光庭
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2020-04-17
Anticipated expiration: 2039-12-23
Also published as: CN111026621B

Abstract

The application discloses a monitoring alarm method, apparatus, device and medium for an Elasticsearch cluster. The method comprises the following steps: acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster; acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node; performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal; and if the operation state of the Elasticsearch cluster is abnormal, giving a corresponding alarm in a preset alarm mode. In this way, the operating condition of the Elasticsearch cluster can be monitored and a corresponding alarm generated for abnormal operating conditions, with a low false alarm rate, high reliability and low cost.

Description

Monitoring alarm method, apparatus, device and medium for Elasticsearch cluster

Technical Field

The application relates to the technical field of Elasticsearch clusters, and in particular to a monitoring alarm method, apparatus, device and medium for an Elasticsearch cluster.

Background

With the advent of the big data era, Elasticsearch is gaining more and more favor as a distributed full-text search engine. How to monitor and manage an Elasticsearch cluster effectively and at low cost has long been a major problem. At present, monitoring methods for an Elasticsearch cluster mainly collect cluster operation data on a master node in the Elasticsearch cluster, judge whether the cluster operation data is greater than or equal to a preset alarm threshold, and give a corresponding alarm when it is. This produces a large number of false alarms, and after an alarm, operation and maintenance personnel are required to investigate its cause. As a result, the false alarm rate is high, the cost is high, and the reliability is low.

Disclosure of Invention

In view of this, an object of the present application is to provide a monitoring alarm method, apparatus, device, and medium for an Elasticsearch cluster, which can monitor an operating condition of the Elasticsearch cluster and generate a corresponding alarm for an abnormal operating condition, and has a low false alarm rate, high reliability, and low cost. The specific scheme is as follows:

In a first aspect, the application discloses an Elasticsearch cluster-oriented monitoring alarm method, which includes:

acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster;

acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node;

performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal;

and if the operation state of the Elasticsearch cluster is abnormal, giving a corresponding alarm in a preset alarm mode.

Optionally, the acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster includes:

acquiring, through the RESTful API on the master node in the Elasticsearch cluster, target cluster operation data comprising any one or a combination of several of a child node IP, a child node name, an index number, an index health, an index state, a merge thread number, a task name, a task operation time, a segment number, a segment size, a query delay, a JVM (Java Virtual Machine) usage amount, a GC time, a GC number, a storage space occupied by the index, and a shard volume.

Optionally, the acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster includes:

acquiring, through the data collection agent on the working node in the Elasticsearch cluster, target node operation data comprising any one or a combination of several of a CPU usage rate, a hard disk usage rate, a memory usage rate, a hard disk read rate, a hard disk write rate, and a hard disk IO blocking rate.

Optionally, before performing the correlation analysis on the target cluster operation data and the target node operation data, the method further includes:

cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing the cleaned target cluster operation data and target node operation data into a corresponding database according to date.

Optionally, after the target cluster operation data and the target node operation data are cleaned according to the preset association rule, the method further includes:

visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.

Optionally, after the corresponding alarm is given in the preset alarm mode, the method further includes:

analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying the cause of the abnormality.

Optionally, the performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal includes:

judging whether the index operation state of the Elasticsearch cluster is abnormal by analyzing the index name, the index number, the index health, the index state, the merge thread number, the segment size, the query delay and the shard volume;

judging whether the operating efficiency of the Elasticsearch cluster is abnormal by analyzing the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate;

judging whether the operation of the related tasks of the Elasticsearch cluster is abnormal by analyzing the task operation time, the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate;

and judging whether the system operation load of the Elasticsearch cluster is abnormal by analyzing the JVM usage amount, the GC time, the GC number, the storage space occupied by the index, the CPU usage rate and the hard disk usage rate.

In a second aspect, the present application discloses an Elasticsearch cluster-oriented monitoring alarm apparatus, including:

a first data acquisition module, configured to acquire target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster;

a second data acquisition module, configured to acquire target node operation data through a data collection agent on a working node in the Elasticsearch cluster, where the data collection agent is an executable file deployed in the working node and is configured to collect the target node operation data of the working node;

a data analysis module, configured to perform correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal;

and an alarm module, configured to give a corresponding alarm in a preset alarm mode when the operation state of the Elasticsearch cluster is abnormal.

In a third aspect, the present application discloses an Elasticsearch cluster-oriented monitoring alarm device, including:

a memory and a processor;

wherein the memory is used for storing a computer program;

the processor is configured to execute the computer program to implement the aforementioned disclosed monitoring alarm method for the Elasticsearch cluster.

In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned disclosed monitoring alarm method for an Elasticsearch cluster.

It can be seen that the present application first acquires target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster, and acquires target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node. Correlation analysis is then performed on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal, and if it is abnormal, a corresponding alarm is given in a preset alarm mode. Because the judgment correlates the target cluster operation data on the master node with the target node operation data on the working node, the operating condition of the Elasticsearch cluster can be monitored and a corresponding alarm given for abnormal operating conditions, with a low false alarm rate, high reliability and low cost.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flowchart of an Elasticsearch cluster-oriented monitoring alarm method disclosed in the present application;

FIG. 2 is a flowchart of a specific Elasticsearch cluster-oriented monitoring alarm method disclosed in the present application;

FIG. 3 is a schematic structural diagram of an Elasticsearch cluster-oriented monitoring alarm apparatus disclosed in the present application;

FIG. 4 is a structural diagram of an Elasticsearch cluster-oriented monitoring alarm device disclosed in the present application;

FIG. 5 is a diagram of a server structure disclosed in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

At present, monitoring methods for an Elasticsearch cluster mainly collect cluster operation data on a master node in the Elasticsearch cluster, judge whether the cluster operation data is greater than or equal to a preset alarm threshold, and give a corresponding alarm when it is. This produces a large number of false alarms, and after an alarm, operation and maintenance personnel are required to investigate its cause, so the false alarm rate is high, the cost is high, and the reliability is low. In view of this, the application provides an Elasticsearch cluster-oriented monitoring alarm method, which can monitor the operating condition of an Elasticsearch cluster and generate a corresponding alarm for an abnormal operating condition, with a low false alarm rate, high reliability and low cost.

Referring to fig. 1, an embodiment of the present application discloses an Elasticsearch cluster-oriented monitoring alarm method, including:

Step S11: acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster.

In this embodiment, the target cluster operation data on the master node in the Elasticsearch cluster needs to be acquired first. Specifically, the target cluster operation data may be acquired through a RESTful API (Application Programming Interface) on the master node. The target cluster operation data includes any one or a combination of several of a child node IP, a child node name, an index number, an index health, an index state, a merge thread number, a task name, a task operation time, a segment number, a segment size, a query delay, a JVM (Java Virtual Machine) usage amount, a GC (Garbage Collection) time, a GC number, a storage space occupied by the index, and a shard volume. The target cluster operation data may also include other cluster operation data.
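As an illustration of step S11, the following Python sketch shows one way a collector might query the master node over HTTP. The endpoints listed are part of the Elasticsearch REST API, but the application does not state which endpoints are actually used, so the selection here, the host name es-master and the helper names are assumptions made for illustration only.

```python
import json
import urllib.request

# Endpoints that exist in the Elasticsearch REST API. Which of them the
# described method calls is not stated in the text; this choice is an assumption.
ENDPOINTS = {
    "cluster_health": "/_cluster/health",
    "indices": "/_cat/indices?format=json",
    "nodes": "/_cat/nodes?format=json",
    "tasks": "/_tasks",
}


def build_urls(master_base_url):
    """Return the full URL of each endpoint on the given master node."""
    base = master_base_url.rstrip("/")
    return {name: base + path for name, path in ENDPOINTS.items()}


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (requires a reachable cluster)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_indices(cat_indices_rows):
    """Reduce `_cat/indices` rows to index fields named in the description."""
    summary = []
    for row in cat_indices_rows:
        summary.append({
            "index": row.get("index"),
            "health": row.get("health"),          # green / yellow / red
            "status": row.get("status"),          # open / close
            "store_size": row.get("store.size"),  # storage occupied by the index
        })
    return summary


# Offline example: a hand-written row shaped like a `_cat/indices` result.
# The host name "es-master" is hypothetical.
sample_rows = [{"index": "logs-2019.12.23", "health": "yellow",
                "status": "open", "store.size": "1.2gb"}]
urls = build_urls("http://es-master:9200/")
print(urls["cluster_health"])
print(summarize_indices(sample_rows))
```

In a live deployment, fetch_json(urls["indices"]) would supply the rows passed to summarize_indices; the example above avoids the network call so that it can be run without a cluster.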

Step S12: acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node.

In this embodiment, the target node operation data on the working node in the Elasticsearch cluster also needs to be acquired. Specifically, the target node operation data is acquired through a data collection agent on the working node, where the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node. The target node operation data includes any one or a combination of several of a CPU usage rate, a hard disk usage rate, a memory usage rate, a hard disk read rate, a hard disk write rate, and a hard disk IO blocking rate, and may further include other node operation data.
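The application does not describe the internals of the data collection agent. As a minimal sketch only, the Python code below shows how such an agent might compute a hard disk usage rate and derive a read rate from two readings of a cumulative counter. The function names, the node name worker-1 and the counter values are hypothetical.

```python
import shutil
import time


def disk_usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def rate_per_second(previous_total, current_total, interval_seconds):
    """Turn two readings of a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current_total - previous_total) / interval_seconds


def build_sample(node_name, read_bytes_before, read_bytes_after, interval_seconds):
    """Assemble one record of node operation data for the collector."""
    return {
        "node": node_name,
        "timestamp": int(time.time()),
        "disk_usage_percent": round(disk_usage_percent("/"), 2),
        "disk_read_bytes_per_s": rate_per_second(
            read_bytes_before, read_bytes_after, interval_seconds),
    }


# Hypothetical counter readings taken 10 seconds apart on node "worker-1".
sample = build_sample("worker-1", 1_000_000, 4_000_000, 10)
print(sample)
```

A real agent would read the counters from the operating system at a fixed interval and would add the CPU, memory and IO blocking figures in the same record.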

Step S13: performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal.

After the target cluster operation data and the target node operation data are obtained, correlation analysis needs to be performed on them to judge whether the operation state of the Elasticsearch cluster is abnormal. The target cluster operation data on the master node and the target node operation data on the working node in the Elasticsearch cluster are correlated. By analyzing the correlated operation data, it can be determined whether the operation data of the working node and the operation data of the master node correspond to and are consistent with each other, and thus whether the operation state of the Elasticsearch cluster is abnormal.

Step S14: if the operation state of the Elasticsearch cluster is abnormal, giving a corresponding alarm in a preset alarm mode.

In a specific implementation process, after correlation analysis is performed on the target cluster operation data and the target node operation data, if the operation state of the Elasticsearch cluster is abnormal, a corresponding alarm is given in a preset alarm mode. Specifically, the alarm mode includes, but is not limited to, a visual information prompt and a voice prompt. For example, the alarm may be sent by mail.
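For the mail example mentioned above, a minimal Python sketch using the standard library is given below. The addresses, the subject format and the helper names are assumptions; the application does not specify the content of the alarm mail or how it is delivered.

```python
import smtplib
from email.message import EmailMessage


def build_alarm_mail(sender, recipient, category, detail):
    """Compose an alarm mail; the subject format is a hypothetical choice."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[Elasticsearch alarm] {category}"
    message.set_content(detail)
    return message


def send_alarm_mail(message, smtp_host, smtp_port=25):
    """Deliver the message through an SMTP relay (requires a reachable server)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(message)


# Example addresses are placeholders; nothing is sent here.
mail = build_alarm_mail(
    "monitor@example.com",
    "ops@example.com",
    "system operation load",
    "JVM usage and CPU usage are both above their limits on worker-1.",
)
print(mail["Subject"])
```

Calling send_alarm_mail(mail, smtp_host) with the address of an SMTP relay would deliver the message; it is left out of the example so that the sketch runs without a mail server.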

Referring to fig. 2, an embodiment of the present application discloses a specific Elasticsearch cluster-oriented monitoring alarm method, where the method includes:

Step S21: acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster.

Step S22: acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node.

Step S23: cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing the cleaned target cluster operation data and target node operation data into a corresponding database according to date.

In a specific implementation process, after the target cluster operation data and the target node operation data are obtained, the target cluster operation data and the target node operation data need to be cleaned according to a preset association rule, and the cleaned target cluster operation data and the cleaned target node operation data are stored in corresponding databases. Specifically, the target cluster operation data and the target node operation data are cleaned according to a preset rule by taking a time line as a basis, so that the target cluster operation data and the target node operation data are associated.
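The preset association rule is not spelled out in the application. The following Python sketch assumes one simple rule for illustration: records with missing values are dropped, cluster-side and node-side records are joined when they fall in the same one-minute bucket, and the joined rows are stored in one SQLite table per date. The field names, bucket width and table naming are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def minute_bucket(epoch_seconds):
    """Round a timestamp down to the start of its minute."""
    return epoch_seconds - (epoch_seconds % 60)


def associate(cluster_records, node_records):
    """Join cluster-side and node-side records that fall in the same minute."""
    nodes_by_minute = {}
    for record in node_records:
        if record.get("cpu_percent") is None:  # cleaning: drop incomplete rows
            continue
        nodes_by_minute[minute_bucket(record["ts"])] = record
    joined = []
    for record in cluster_records:
        bucket = minute_bucket(record["ts"])
        node = nodes_by_minute.get(bucket)
        if node is not None:
            joined.append((bucket, record["query_latency_ms"], node["cpu_percent"]))
    return joined


def store_by_date(connection, rows):
    """Write each row into a table named after its UTC date."""
    for bucket, latency, cpu in rows:
        day = datetime.fromtimestamp(bucket, tz=timezone.utc).strftime("%Y%m%d")
        table = f"metrics_{day}"  # built from digits only, safe to interpolate
        connection.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(ts INTEGER, query_latency_ms REAL, cpu_percent REAL)")
        connection.execute(f"INSERT INTO {table} VALUES (?, ?, ?)",
                           (bucket, latency, cpu))
    connection.commit()


# Hypothetical records; 1577059200 is 2019-12-23 00:00:00 UTC.
cluster = [{"ts": 1577059205, "query_latency_ms": 120.0},
           {"ts": 1577059330, "query_latency_ms": 80.0}]
nodes = [{"ts": 1577059230, "cpu_percent": 91.5},
         {"ts": 1577059290, "cpu_percent": None}]
rows = associate(cluster, nodes)
db = sqlite3.connect(":memory:")
store_by_date(db, rows)
stored = db.execute("SELECT * FROM metrics_20191223").fetchall()
print(stored)
```

Only the first cluster record has a node record in the same minute, so a single associated row is stored; the second cluster record and the incomplete node record are discarded by the cleaning step.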

Step S24: visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.

It can be understood that after the cleaned target cluster operation data and target node operation data are stored in the corresponding databases, the target cluster operation data and the target node operation data in the databases need to be visually displayed through ECharts charts, so that operation and maintenance personnel can know the current operating condition of the whole cluster.
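ECharts renders a chart from an option object supplied by the page. As a hedged sketch of how a backend might prepare that object, the Python code below builds a line-chart option and serializes it to JSON. The chart title, the series names and the sample values are hypothetical, and the application does not state how the option is produced.

```python
import json


def build_line_chart_option(title, timestamps, series_by_name):
    """Build an ECharts option dict for a multi-series line chart."""
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},
        "legend": {"data": list(series_by_name)},
        "xAxis": {"type": "category", "data": timestamps},
        "yAxis": {"type": "value"},
        "series": [
            {"name": name, "type": "line", "data": values}
            for name, values in series_by_name.items()
        ],
    }


# Hypothetical cleaned data for one working node.
option = build_line_chart_option(
    "worker-1",
    ["00:00", "00:01", "00:02"],
    {"CPU usage (%)": [55, 61, 90], "Query delay (ms)": [40, 42, 310]},
)
option_json = json.dumps(option)
print(option_json)
```

On the page, the decoded JSON would be passed to the setOption method of a chart instance created with echarts.init.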

Step S25: performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal.

It can be understood that correlation analysis needs to be performed on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal. Specifically, whether the index operation state of the Elasticsearch cluster is abnormal is judged by analyzing the index name, the index number, the index health, the index state, the merge thread number, the segment size, the query delay and the shard volume; whether the operating efficiency of the Elasticsearch cluster is abnormal is judged by analyzing the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate; whether the operation of the related tasks of the Elasticsearch cluster is abnormal is judged by analyzing the task operation time, the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate; and whether the system operation load of the Elasticsearch cluster is abnormal is judged by analyzing the JVM usage amount, the GC time, the GC number, the storage space occupied by the index, the CPU usage rate and the hard disk usage rate. When any of the index operation state, the operating efficiency, the related task operation or the system operation load of the Elasticsearch cluster is abnormal, the operation state of the Elasticsearch cluster is judged to be abnormal.
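The four judgments can be sketched in Python as follows. The application names the metrics analyzed in each judgment but not the decision rules, so the limits, the way metrics are combined and the reduced set of metrics used here are assumptions chosen only to show the shape of such an analysis.

```python
def over(data, key, limit):
    """True when a metric is present and above its limit."""
    value = data.get(key)
    return value is not None and value > limit


def judge_cluster(cluster, node, limits):
    """Return one abnormal/normal flag per category (hypothetical rules)."""
    io_blocked = over(node, "io_blocking_rate", limits["io_blocking_rate"])
    disk_busy = (over(node, "disk_read_mb_s", limits["disk_read_mb_s"])
                 or over(node, "disk_write_mb_s", limits["disk_write_mb_s"]))
    return {
        "index operation state": (
            cluster.get("index_health") == "red"
            or over(cluster, "query_latency_ms", limits["query_latency_ms"])),
        "operating efficiency": io_blocked and disk_busy,
        "related task operation": (
            over(cluster, "task_running_s", limits["task_running_s"])
            and io_blocked),
        "system operation load": (
            over(cluster, "jvm_heap_percent", limits["jvm_heap_percent"])
            and over(node, "cpu_percent", limits["cpu_percent"])),
    }


# Hypothetical measurements and limits.
cluster = {"index_health": "green", "query_latency_ms": 40,
           "task_running_s": 900, "jvm_heap_percent": 70}
node = {"io_blocking_rate": 0.35, "disk_read_mb_s": 20,
        "disk_write_mb_s": 180, "cpu_percent": 55}
limits = {"query_latency_ms": 500, "io_blocking_rate": 0.2,
          "disk_read_mb_s": 150, "disk_write_mb_s": 150,
          "task_running_s": 600, "jvm_heap_percent": 85, "cpu_percent": 90}

results = judge_cluster(cluster, node, limits)
cluster_abnormal = any(results.values())
print(results, cluster_abnormal)
```

In this example a long-running task is flagged only because the node-side IO blocking rate is also high; requiring agreement between cluster-side and node-side data in this way is one possible reading of how correlation lowers the false alarm rate compared with a single threshold.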

Step S26: if the operation state of the Elasticsearch cluster is abnormal, giving a corresponding alarm in a preset alarm mode.

Step S27: analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying the cause of the abnormality.
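How the cause of the abnormality is derived is not detailed in the application. One simple possibility, sketched below in Python, is to map each abnormal category to a short cause description and attach the metrics that triggered it. The category names, cause texts and metric names are hypothetical.

```python
# Hypothetical cause descriptions, keyed by abnormal category.
CAUSE_HINTS = {
    "index operation state": "an index is unhealthy or queries are slow",
    "operating efficiency": "disk I/O on the working node is saturated",
    "related task operation": "a long-running task is waiting on disk I/O",
    "system operation load": "JVM and CPU are both under heavy load",
}


def explain(results, evidence):
    """List a readable cause line for every category flagged as abnormal."""
    causes = []
    for category in sorted(results):
        if not results[category]:
            continue
        hint = CAUSE_HINTS.get(category, "cause not classified")
        metrics = ", ".join(
            f"{name}={value}"
            for name, value in sorted(evidence.get(category, {}).items()))
        causes.append(f"{category}: {hint} [{metrics}]")
    return causes


# Hypothetical analysis result and the metrics that triggered it.
results = {"index operation state": False, "operating efficiency": True}
evidence = {"operating efficiency": {"io_blocking_rate": 0.35,
                                     "disk_write_mb_s": 180}}
causes = explain(results, evidence)
print(causes)
```

The resulting lines could then be shown next to the charts so that operation and maintenance personnel do not have to investigate the alarm from scratch.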

Referring to fig. 3, an embodiment of the present application discloses an Elasticsearch cluster-oriented monitoring alarm apparatus, including:

a first data acquisition module 11, configured to acquire target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster;

a second data acquisition module 12, configured to acquire target node operation data through a data collection agent on a working node in the Elasticsearch cluster, where the data collection agent is an executable file deployed in the working node and is configured to collect the target node operation data of the working node;

a data analysis module 13, configured to perform correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal;

and an alarm module 14, configured to give a corresponding alarm in a preset alarm mode when the operation state of the Elasticsearch cluster is abnormal.

Further, referring to fig. 4, an embodiment of the present application further discloses an Elasticsearch cluster-oriented monitoring alarm device, including: a processor 21 and a memory 22.

Wherein the memory 22 is used for storing a computer program; the processor 21 is configured to execute the computer program to implement the monitoring alarm method facing the Elasticsearch cluster disclosed in the foregoing embodiment.

For a specific process of the monitoring alarm method for the Elasticsearch cluster, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated herein.

Further, fig. 5 is a schematic diagram of a server structure provided in an embodiment of the present application. The server 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, a sensor 25, and a communication bus 26. The memory 22 is configured to store a computer program, and the computer program is loaded and executed by the processor 21 to implement relevant steps in the Elasticsearch cluster-oriented monitoring alarm method disclosed in any of the foregoing embodiments.

In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the server 20; the communication interface 24 can create a data transmission channel between the server 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the sensor 25 is configured to acquire sensor data, and specific sensor types include, but are not limited to, a speed sensor, a temperature sensor, an infrared sensor, a light sensor, a sound sensor, an image sensor, and the like.

In addition, the memory 22 serves as a carrier for storing resources and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored thereon include an operating system 221, a computer program 222, data 223, and the like, and the storage may be transient or permanent.

The operating system 221 is used to manage and control the hardware devices on the server 20 and the computer program 222, so that the processor 21 can operate on and process the mass data 223 in the memory 22; it may be Windows, Unix, Linux, or the like. In addition to the computer program that can be used to perform the Elasticsearch cluster-oriented monitoring alarm method disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks. The data 223 may include data received by the server from an external device, or data collected by the sensor 25 itself.

Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:

acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster; acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster, wherein the data collection agent is an executable file deployed in the working node and is used for collecting the target node operation data of the working node; performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal; and if the operation state of the Elasticsearch cluster is abnormal, giving a corresponding alarm in a preset alarm mode.

In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: acquiring, through the RESTful API on the master node in the Elasticsearch cluster, target cluster operation data comprising any one or a combination of several of a child node IP, a child node name, an index number, an index health, an index state, a merge thread number, a task name, a task operation time, a segment number, a segment size, a query delay, a JVM (Java Virtual Machine) usage amount, a GC time, a GC number, a storage space occupied by the index, and a shard volume.

In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: acquiring, through the data collection agent on the working node in the Elasticsearch cluster, target node operation data comprising any one or a combination of several of a CPU usage rate, a hard disk usage rate, a memory usage rate, a hard disk read rate, a hard disk write rate, and a hard disk IO blocking rate.

In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: cleaning the target cluster operation data and the target node operation data according to a preset association rule, and storing the cleaned target cluster operation data and target node operation data into a corresponding database according to date.

In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: visually displaying the cleaned target cluster operation data and the cleaned target node operation data through ECharts charts.

In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following step may be specifically implemented: analyzing the result of the correlation analysis to obtain the cause of the abnormality, and visually displaying the cause of the abnormality.

In this embodiment, when the computer program stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: judging whether the index operation state of the Elasticsearch cluster is abnormal by analyzing the index name, the index number, the index health, the index state, the merge thread number, the segment size, the query delay and the shard volume; judging whether the operating efficiency of the Elasticsearch cluster is abnormal by analyzing the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate; judging whether the operation of the related tasks of the Elasticsearch cluster is abnormal by analyzing the task operation time, the hard disk read rate, the hard disk write rate and the hard disk IO blocking rate; and judging whether the system operation load of the Elasticsearch cluster is abnormal by analyzing the JVM usage amount, the GC time, the GC number, the storage space occupied by the index, the CPU usage rate and the hard disk usage rate.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The detailed description is given above to the monitoring alarm method, device, equipment and medium for the Elasticsearch cluster provided by the present application, a specific example is applied in the present application to explain the principle and the implementation manner of the present application, and the description of the above embodiment is only used to help understanding the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. An Elasticsearch cluster-oriented monitoring alarm method, characterized by comprising the following steps:

2. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 1, wherein the acquiring target cluster operation data through a RESTful API on a master node in an Elasticsearch cluster comprises:

3. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 2, wherein the acquiring target node operation data through a data collection agent on a working node in the Elasticsearch cluster comprises:

4. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 3, wherein before the performing correlation analysis on the target cluster operation data and the target node operation data, the method further comprises:

5. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 4, wherein after the target cluster operation data and the target node operation data are cleaned according to a preset association rule, the method further comprises:

6. The Elasticsearch cluster-oriented monitoring alarm method as claimed in claim 5, wherein after the corresponding alarm is given in the preset alarm mode, the method further comprises:

7. The Elasticsearch cluster-oriented monitoring alarm method as claimed in any one of claims 3 to 6, wherein the performing correlation analysis on the target cluster operation data and the target node operation data to judge whether the operation state of the Elasticsearch cluster is abnormal comprises:

8. An Elasticsearch cluster-oriented monitoring alarm apparatus, characterized by comprising:

9. An Elasticsearch cluster-oriented monitoring alarm device, comprising:

a memory and a processor;

wherein the memory is used for storing a computer program;

the processor is configured to execute the computer program to implement the monitoring alarm method for the Elasticsearch cluster of any of claims 1 to 7.

10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the method for monitoring and alarming for Elasticsearch cluster as claimed in any of claims 1 to 7.